HOWTO use JSONB, the binary formatted JSONs in PostgreSQL

A previous blog I wrote on JSON functionality in PostgreSQL went wildly popular and it got me thinking about writing a follow-up HOWTO about JSONB.

JSONB, like JSON, is a native data type in PostgreSQL and was introduced in version 9.4. The major difference between the types is signified in the name; the extra ‘B’ stands for binary format. Some more detailed comparison is given in the table below:

JSON	JSONB
Stores data in text format	Stores data in decomposed binary format
Input is as fast, as no conversions are required	Input is slightly slower, as there is conversion overhead involved
Processing functions must re-parse the data on each execution	Re-parsing is not needed, making data processing significantly faster
Indexing is not supported	Indexing is supported
All white space and line feeds in the input are preserved as-is	Extra white space and line feeds are stripped
Duplicate keys are retained, processing functions only consider the last value	Duplicate keys are purged at input, only the last value is stored
Order of the keys is preserved	Order is not preserved

In general, unless there are some highly specialized needs (like legacy applications designed to assume an order of keys), it is highly recommended that JSONB be used.

Basic handling provided by PostgreSQL for both JSON and JSONB is pretty much the same. Because I have already talked about JSON in a previous blog, I will focus on the difference JSONB provides below.

Ignoring extra white space

Let’s start with the following valid object:

{
“key1”: “value1”,
“key2”: [1, 2],
“key3”: “value3”
}

JSON data type shows the following:

json_sample=# SELECT ‘{
“key1”: “value1”,
“key2”: [1, 2],
“key3”: “value3”
}’::JSON;
json
———————–
{ +
“key1”: “value1”,+
“key2”: [1, 2], +
“key3”: “value3” +
}
(1 row)

With JSONB, however:

json_sample=# SELECT ‘{
“key1”: “value1”,
“key2”: [1, 2],
“key3”: “value3”
}’::JSONB;
jsonb
——————————————————
{“key1”: “value1”, “key2”: [1, 2], “key3”: “value3”}
(1 row)

As you can see, all extra white space and line feeds in case JSONB were stripped by the database.

Handling duplicate keys
Behavior with JSON data type:

json_sample=# SELECT ‘{“key1”: “value1”, “key2”: [1, 2, 3, 4], “key2”:[1,2], “key2″:”value2”, “key3”: “value3”}’::JSON;
json
——————————————————————————————-
{“key1”: “value1”, “key2”: [1, 2, 3, 4], “key2”:[1,2], “key2″:”value2”, “key3”: “value3”}
(1 row)

Behavior with JSONB data type:

json_sample=# SELECT ‘{“key1”: “value1”, “key2”: [1, 2, 3, 4], “key2”:[1,2], “key2″:”value2”, “key3”: “value3”}’::JSONB;
jsonb
——————————————————–
{“key1”: “value1”, “key2”: “value2”, “key3”: “value3”}
(1 row)

The above demonstrates how duplicate keys are essentially ignored by JSONS and only the last value is kept.

Indexing
In my personal opinion, indexing is the biggest differentiating factor between JSON and JSONB data types in PostgreSQL, making it the biggest reason to opt for JSONB wherever possible. Let’s illustrate with an example below:

Let’s create 2 tables with the same structure, one with an index and one without an index:

json_sample=# CREATE TABLE sales_jsonb (id INTEGER, sale JSONB);

CREATE TABLE

json_sample=# CREATE TABLE sales_jsonb_index (id INTEGER, sale JSONB);

CREATE TABLE

json_sample=# CREATE INDEX idxgin ON sales_jsonb_index USING gin (sale);

CREATE INDEX

Now let’s add some data to both these tables. Data used in this example (512,000 rows) can be downloaded from here.

json_sample=# COPY sales_jsonb(sale) FROM ‘/Users/Shared/json.txt’;

COPY 512000

json_sample=# COPY sales_jsonb_index(sale) FROM ‘/Users/Shared/json.txt’;

COPY 512000

Running EXPLAIN on both these tables illustrates the cost difference very clearly:

json_sample=# EXPLAIN (FORMAT JSON) select * from sales_jsonb where sale @> ‘{“item”: “205386681-0”}’;

QUERY PLAN

——————————————————————

[ +

{ +

“Plan”: { +

“Node Type”: “Seq Scan”, +

“Relation Name”: “sales_jsonb”, +

“Alias”: “sales_jsonb”, +

“Startup Cost”: 0.00, +

“Total Cost”: 14748.00, +

“Plan Rows”: 512, +

“Plan Width”: 103, +

“Filter”: “(sale @> ‘{“item”: “205386681-0”}’::jsonb)”+

} +

]

(1 row)

json_sample=# EXPLAIN (FORMAT JSON) select * from sales_jsonb_index where sale @> ‘{“item”: “205386681-0”}’;

QUERY PLAN

————————————————————————–

[ +

{ +

“Plan”: { +

“Node Type”: “Bitmap Heap Scan”, +

“Relation Name”: “sales_jsonb_index”, +

“Alias”: “sales_jsonb_index”, +

“Startup Cost”: 39.97, +

“Total Cost”: 1670.57, +

“Plan Rows”: 512, +

“Plan Width”: 103, +

“Recheck Cond”: “(sale @> ‘{“item”: “205386681-0″}’::jsonb)”, +

“Plans”: [ +

{
+

“Node Type”: “Bitmap Index Scan”, +

“Parent Relationship”: “Outer”, +

“Index Name”: “idxgin”, +

“Startup Cost”: 0.00, +

“Total Cost”: 39.84, +

“Plan Rows”: 512, +

“Plan Width”: 0, +

“Index Cond”: “(sale @> ‘{“item”: “205386681-0”}’::jsonb)”+

} +

] +

} +

]

(1 row)

As you can see, the total cost for the same query on the same table structure for the same data set goes from 14,748 to 1,670.57 when the JSONB column is indexed.

HOWTO use JSONB, the binary formatted JSONs in PostgreSQL

Ignoring extra white space

Leave A Comment Cancel Comment

Our Projects

Quick Links

Contact Info

Follow us at

Archives

Categories