A previous blog I wrote on JSON functionality in PostgreSQL went wildly popular and it got me thinking about writing a follow-up HOWTO about JSONB.
JSONB, like JSON, is a native data type in PostgreSQL and was introduced in version 9.4. The major difference between the types is signified in the name; the extra ‘B’ stands for binary format. Some more detailed comparison is given in the table below:
JSON | JSONB |
Stores data in text format | Stores data in decomposed binary format |
Input is as fast, as no conversions are required | Input is slightly slower, as there is conversion overhead involved |
Processing functions must re-parse the data on each execution | Re-parsing is not needed, making data processing significantly faster |
Indexing is not supported | Indexing is supported |
All white space and line feeds in the input are preserved as-is | Extra white space and line feeds are stripped |
Duplicate keys are retained, processing functions only consider the last value | Duplicate keys are purged at input, only the last value is stored |
Order of the keys is preserved | Order is not preserved |
In general, unless there are some highly specialized needs (like legacy applications designed to assume an order of keys), it is highly recommended that JSONB be used.
Basic handling provided by PostgreSQL for both JSON and JSONB is pretty much the same. Because I have already talked about JSON in a previous blog, I will focus on the difference JSONB provides below.
Ignoring extra white space
Let’s start with the following valid object:
{
“key1”: “value1”,
“key2”: [1, 2],
“key3”: “value3”
}
JSON data type shows the following:
json_sample=# SELECT ‘{
“key1”: “value1”,
“key2”: [1, 2],
“key3”: “value3”
}’::JSON;
json
———————–
{ +
“key1”: “value1”,+
“key2”: [1, 2], +
“key3”: “value3” +
}
(1 row)
With JSONB, however:
json_sample=# SELECT ‘{
“key1”: “value1”,
“key2”: [1, 2],
“key3”: “value3”
}’::JSONB;
jsonb
——————————————————
{“key1”: “value1”, “key2”: [1, 2], “key3”: “value3”}
(1 row)
As you can see, all extra white space and line feeds in case JSONB were stripped by the database.
Handling duplicate keys
Behavior with JSON data type:
json_sample=# SELECT ‘{“key1”: “value1”, “key2”: [1, 2, 3, 4], “key2”:[1,2], “key2″:”value2”, “key3”: “value3”}’::JSON;
json
——————————————————————————————-
{“key1”: “value1”, “key2”: [1, 2, 3, 4], “key2”:[1,2], “key2″:”value2”, “key3”: “value3”}
(1 row)
Behavior with JSONB data type:
json_sample=# SELECT ‘{“key1”: “value1”, “key2”: [1, 2, 3, 4], “key2”:[1,2], “key2″:”value2”, “key3”: “value3”}’::JSONB;
jsonb
——————————————————–
{“key1”: “value1”, “key2”: “value2”, “key3”: “value3”}
(1 row)
The above demonstrates how duplicate keys are essentially ignored by JSONS and only the last value is kept.
Indexing
In my personal opinion, indexing is the biggest differentiating factor between JSON and JSONB data types in PostgreSQL, making it the biggest reason to opt for JSONB wherever possible. Let’s illustrate with an example below:
Let’s create 2 tables with the same structure, one with an index and one without an index:
json_sample=# CREATE TABLE sales_jsonb (id INTEGER, sale JSONB);
CREATE TABLE
json_sample=# CREATE TABLE sales_jsonb_index (id INTEGER, sale JSONB);
CREATE TABLE
json_sample=# CREATE INDEX idxgin ON sales_jsonb_index USING gin (sale);
CREATE INDEX
Now let’s add some data to both these tables. Data used in this example (512,000 rows) can be downloaded from here.
json_sample=# COPY sales_jsonb(sale) FROM ‘/Users/Shared/json.txt’;
COPY 512000
json_sample=# COPY sales_jsonb_index(sale) FROM ‘/Users/Shared/json.txt’;
COPY 512000
Running EXPLAIN on both these tables illustrates the cost difference very clearly:
json_sample=# EXPLAIN (FORMAT JSON) select * from sales_jsonb where sale @> ‘{“item”: “205386681-0”}’;
QUERY PLAN
——————————————————————
[ +
{ +
“Plan”: { +
“Node Type”: “Seq Scan”, +
“Relation Name”: “sales_jsonb”, +
“Alias”: “sales_jsonb”, +
“Startup Cost”: 0.00, +
“Total Cost”: 14748.00, +
“Plan Rows”: 512, +
“Plan Width”: 103, +
“Filter”: “(sale @> ‘{\”item\”: \”205386681-0\”}’::jsonb)”+
} +
} +
]
(1 row)
json_sample=# EXPLAIN (FORMAT JSON) select * from sales_jsonb_index where sale @> ‘{“item”: “205386681-0”}’;
QUERY PLAN
————————————————————————–
[ +
{ +
“Plan”: { +
“Node Type”: “Bitmap Heap Scan”, +
“Relation Name”: “sales_jsonb_index”, +
“Alias”: “sales_jsonb_index”, +
“Startup Cost”: 39.97, +
“Total Cost”: 1670.57, +
“Plan Rows”: 512, +
“Plan Width”: 103, +
“Recheck Cond”: “(sale @> ‘{\”item\”: \”205386681-0\”}’::jsonb)”, +
“Plans”: [ +
{ +
“Node Type”: “Bitmap Index Scan”, +
“Parent Relationship”: “Outer”, +
“Index Name”: “idxgin”, +
“Startup Cost”: 0.00, +
“Total Cost”: 39.84, +
“Plan Rows”: 512, +
“Plan Width”: 0, +
“Index Cond”: “(sale @> ‘{\”item\”: \”205386681-0\”}’::jsonb)”+
} +
] +
} +
} +
]
(1 row)
As you can see, the total cost for the same query on the same table structure for the same data set goes from 14,748 to 1,670.57 when the JSONB column is indexed.