JSON is a widely used format for exchanging and storing information, mostly because of its simple yet expressive syntax. It can represent simple key-value pairs just as well as a deeply nested document. Since 2012, PostgreSQL has continuously added support for storing and operating on JSON documents within table columns, closing the gap between traditional RDBMS and their NoSQL alternatives. It has proven vital for fast prototyping and refactoring, where a fixed schema would impose additional cost or slow down operations.
Preparing a sample environment
For this post, we are going to create a simplified table named json_table, containing just one column named json_column. Into that testing table, we insert a single row containing the following JSON document:
{
"name": "Samantha",
"nicknames": ["Sammy", "Sam"],
"tattoos": [
{
"location": "shoulder",
"shape": "heart"
},
{
"location": "wrist",
"shape": "key"
}
],
"work": {
"profession": "baker",
"employer": {
"name": "Sample Bakery Inc",
"address": {
"street": "Sample Street 1",
"zipcode": 12345,
"city": "Houston",
"state": "Texas"
}
}
}
}
To create the table and insert the document, execute the following lines in your PostgreSQL shell:
CREATE TABLE json_table (json_column JSONB);
INSERT INTO json_table (json_column) VALUES ('{"name":"Samantha","nicknames":["Sammy","Sam"],"tattoos":[{"location":"shoulder","shape":"heart"},{"location":"wrist","shape":"key"}],"work":{"profession":"baker","employer":{"name":"Sample Bakery Inc","address":{"street":"Sample Street 1","zipcode":12345,"city":"Houston","state":"Texas"}}}}');
This may get broken up by your shell's automatically generated formatting. You can disable this in psql with:
\a
\t on
The \a command toggles between aligned and unaligned output mode, while \t on instructs psql to return only the data you selected, without generated table header/footer lines.
To improve readability when querying JSON documents, you can always wrap a JSON result in jsonb_pretty() to have it nicely formatted.
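For example, to pretty-print our entire sample document:
SELECT jsonb_pretty(json_column) FROM json_table;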
JSON and JSONB
You may have noticed that we set the data type to JSONB, not JSON. PostgreSQL provides both of these types, but they are very different internally.
JSON was the first data type for working with JSON documents in PostgreSQL. It stores documents as plain text, running only very basic validation before storing them. The document is kept as-is, preserving superfluous whitespace, key order and even duplicate keys. Running queries on this data type is expensive, as the document has to be re-parsed every time. It should only be used in cases where you never intend to update or query the document beyond reading it in its entirety, for example to store dynamic attachments to log data or objects that will be interpreted by your application only.
JSONB was added shortly after, mainly to address the high performance penalty of the JSON type. It stores the document in a binary format, allowing for more efficient querying and selective updates. It transforms documents on insert, removing duplicate keys and optional whitespace and discarding the original key order. Since the JSON specification attaches no meaning to key order and leaves duplicate-key handling undefined, this is usually not a problem, and often even desirable behaviour. This data type is what almost all applications will want to use for JSON documents.
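A quick way to see the difference is to cast the same literal to both types: the JSON version is kept verbatim, while the JSONB version drops the duplicate key (keeping the last value), normalizes whitespace and reorders keys:
SELECT '{"b": 1, "a": 1, "a": 2}'::JSON;
SELECT '{"b": 1, "a": 1, "a": 2}'::JSONB;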
For the remainder of this post, we will exclusively work with the JSONB column of our sample table.
Selecting fields with the -> operator
PostgreSQL provides a number of operators and functions to query JSON documents. The most basic one is ->, used to select a field within a document.
Selecting the name field from our test table would look like this:
SELECT json_column->'name' FROM json_table;
This returns "Samantha". You may have noticed that it returned the value in quotes. That's by design: the returned value is a JSONB document, so the value is not postgres TEXT, but a JSON string. This allows us to chain operations together, for example to select nested fields:
SELECT json_column->'work'->'profession' FROM json_table;
If you want to retrieve the value as normal postgres TEXT instead, you can use the ->> variant of this operator:
SELECT json_column->>'name' FROM json_table;
This time, it simply returns Samantha, without quotes. Note that you can only use this operator at the end of an operator chain, as the returned TEXT data type isn't compatible with JSONB operators anymore.
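For example, to retrieve the nested profession field as postgres TEXT, the chain uses -> for the intermediate step and ->> only for the final one:
SELECT json_column->'work'->>'profession' FROM json_table;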
The -> operator can also select elements from arrays, by providing it the index of the desired element (starting at 0):
SELECT json_column->'nicknames'->0 FROM json_table;
This returns the first element in the nicknames array from our document. If provided with a negative index, it will count from the end of the array instead (-1 being the last element in the array):
SELECT json_column->'nicknames'->-1 FROM json_table;
This time, we received the last element from the array.
Understanding path syntax
A more condensed alternative for specifying a key in a JSON document is the path syntax. This syntax is available in SELECT queries through the #> operator, and it is the only supported form of specifying keys for update or delete operations.
A path for a JSON document is constructed as a list of keys, in order, separated by commas and enclosed in curly braces. Selecting the profession field from our sample document using path syntax would look like this: {work,profession}.
This approach is significantly shorter than the -> operator for deeply nested queries, for example:
SELECT json_column->'work'->'employer'->>'name' FROM json_table;
SELECT json_column #>> '{work,employer,name}' FROM json_table;
Both queries return the same result, but the first is slightly more complex, requiring quotes around each field name and careful placement of the ->> at the end of the chain, whereas the path syntax looks the same whether you retrieve JSONB results with #> or TEXT results with #>>.
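For comparison, the same path used with #> returns the value as a JSONB string, quotes included:
SELECT json_column #> '{work,employer,name}' FROM json_table;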
Just like the -> operator, the path syntax can also specify array elements using their index, or a negative index to count elements in reverse:
SELECT json_column #>> '{nicknames,0}' FROM json_table; -- returns first nickname from array
SELECT json_column #>> '{nicknames,-1}' FROM json_table; -- returns last nickname from array
Adding and updating fields in a JSON document
To change values in a JSON document, you typically use the jsonb_set() function. It takes the complete document to be edited, a path to the target field within that document, and a new value for it. A fourth, optional boolean parameter controls what happens if path does not point to an existing key: by default, a new key with that name is simply created; changing the parameter to false leaves the document unchanged instead. This only applies to the last key specified by path; if any parent key is missing, the operation will quietly leave the document unchanged.
UPDATE json_table
SET json_column = jsonb_set(
json_column,
'{name}',
to_jsonb('Samantha Doe'::text)
);
This will change the name field to Samantha Doe. Note that we used to_jsonb() on the new value to turn it from a postgres TEXT into a JSONB string. While we could have provided it as a literal JSON string, like '"Samantha Doe"'::JSONB, the double quotes are a frequent cause of issues, especially when inserting other data types like INT, which must be provided without quotes. The to_jsonb() function takes normal postgres data types and handles the exact conversion for us.
This approach can also be used to create new fields, for example adding an age field with value 21 to our document:
UPDATE json_table
SET json_column = jsonb_set(
json_column,
'{age}',
to_jsonb(21)
);
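To illustrate the fourth parameter, the following statement tries to set a middlename field (a made-up key that does not exist in our document) with the parameter set to false, and therefore leaves the document unchanged:
UPDATE json_table
SET json_column = jsonb_set(
    json_column,
    '{middlename}',
    to_jsonb('Marie'::text),
    false -- do not create the key if it is missing
);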
Merging documents and appending to arrays
Merging two JSON documents means taking one object and assigning all of its keys to another one, overwriting existing keys and creating missing ones. In PostgreSQL, this is done using the || operator. Let's see an example by merging the following document into our sample document:
{
"name":"Benjamin",
"nicknames":[
"Ben",
"Bennie"
],
"hair":{
"type":"Strands",
"color":"Black"
}
}
The SQL query to achieve this would look as follows:
UPDATE json_table
SET json_column = ( json_column || '{"name": "Benjamin", "nicknames": ["Ben", "Bennie"], "hair": {"type": "Strands", "color": "Black"}}' );
The result is as expected: all keys from the document to the right of the || operator have either been created in or have overwritten existing values in the left document (the contents of the column json_column): the fields name and nicknames have been changed, while hair was added as a new field. Note that merging created all missing parent keys as well, unlike jsonb_set(). Also note that || merges only at the top level: a nested object on the right replaces the corresponding object on the left entirely instead of being merged into it recursively. Merging can be a fast way of updating multiple fields of a JSON document at once, but it can also lead to accidental data loss by overwriting fields with unclean merge documents.
By combining the merging operator and jsonb_set(), you can append values to an array:
UPDATE json_table
SET json_column = jsonb_set(
json_column,
'{nicknames}',
( json_column->'nicknames' || to_jsonb('Benny'::text) )
);
This is slightly more complex, but all it does is set the field nicknames to its old value concatenated with the string Benny, effectively appending it to the array.
Removing keys from documents
To remove a key from a JSONB document, you have two choices: the - operator to remove a top-level key by name, or the #- operator to remove a key by path.
Let's look at - first, by removing the nicknames field from our sample document:
UPDATE json_table SET json_column = json_column - 'nicknames';
The field nicknames is now gone. Instead of a field name, we could also provide an index to remove an element from an array (an example follows at the end of this section). This is handy, but restricted to top-level keys; if we wanted to remove the address of the employer field, this approach wouldn't work, as it can't reach nested keys. This is where #- comes in, allowing us to specify a target by path:
UPDATE json_table SET json_column = json_column #- '{work,employer,address}';
And just like that, even nested keys can be removed with ease.
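As mentioned above, the - operator also accepts an integer index to delete an element from an array. The arrays in our sample document are all nested, so this sketch combines it with jsonb_set(), just like the append example earlier, to remove the first entry from the tattoos array:
UPDATE json_table
SET json_column = jsonb_set(
    json_column,
    '{tattoos}',
    ( json_column->'tattoos' ) - 0
);
The same effect can be achieved even more concisely with json_column #- '{tattoos,0}'.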
Checking for existence, type and containment
One of JSON's main benefits is that it does not follow a rigid schema, but that also leads to new challenges: a key may or may not exist. To check for the existence of a key in PostgreSQL, the ? operator is used. Retrieving only the name fields of documents that also have a work field could look like this:
SELECT json_column->>'name' FROM json_table WHERE json_column ? 'work';
This operator is restricted to top-level keys only. If you want to check the existence of a nested key, say the employer field under work, you will need to select the surrounding object first and apply the operator to that:
SELECT json_column->>'name' FROM json_table WHERE json_column->'work' ? 'employer';
There are variants of this operator, namely ?| to check if a document contains any of the keys given in an array, and ?& to check if it contains all of them. Retrieving the name of documents that have an employer field containing both name and address values:
SELECT json_column->>'name' FROM json_table WHERE json_column->'work'->'employer' ?& ARRAY['name', 'address'];
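The ?| variant works the same way, matching if any of the given keys exists; here, a document qualifies if it has either an age key (added in an earlier example) or a work key at the top level:
SELECT json_column->>'name' FROM json_table WHERE json_column ?| ARRAY['age', 'work'];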
But knowing that a key exists may not be enough, as we might still not know what kind of value is stored there. This is where the function jsonb_typeof() comes into play:
SELECT jsonb_typeof(json_column->'name') FROM json_table;
The result of this function is a postgres TEXT value with the name of the JSON data type stored there, one of object, array, string, number, boolean, or null.
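For example, inspecting the tattoos and work fields of our sample document reports array and object respectively:
SELECT jsonb_typeof(json_column->'tattoos'), jsonb_typeof(json_column->'work') FROM json_table;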
For more complex checks for keys and values, the containment operator @> can be used. This has two uses: checking if an array contains a value, or checking if a document contains a set of keys with given values. Let's see how checking if the nicknames array contains the string Sam would look:
SELECT json_column->'nicknames' @> to_jsonb('Sam'::text) FROM json_table;
To check if a document contains a name field with the value Samantha, and an employer with the name Sample Bakery Inc, you use the containment operator with a JSONB document:
SELECT json_column @> '{"name": "Samantha", "work": {"employer": {"name": "Sample Bakery Inc"}}}' FROM json_table;
This can be used to quickly check for the existence of several keys with values, even top-level and nested keys mixed with array elements all at once.
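For instance, a single containment check can test for a nickname inside the nicknames array and a nested profession value at the same time:
SELECT json_column @> '{"nicknames": ["Sam"], "work": {"profession": "baker"}}' FROM json_table;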
JSONB subscripting
Added in version 14, JSONB subscripting provides an alternative syntax to carry out simple operations on JSONB documents. Subscripting treats the entire JSONB column like an associative array:
SELECT json_column['name'] FROM json_table;
While this may not be all that different for SELECT queries, it vastly improves UPDATE queries:
UPDATE json_table SET json_column['name'] = to_jsonb('Ben'::text);
Subscripting allows for very simple updates to specific fields of a JSONB document, without the need to use jsonb_set() at all.
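Subscripts can also be chained to reach nested fields and array elements, both when reading and when writing; the star value below is made up purely for illustration:
SELECT json_column['work']['employer']['name'] FROM json_table;
UPDATE json_table SET json_column['tattoos'][0]['shape'] = to_jsonb('star'::text);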
Note that subscripting has no alternative form that returns postgres values instead of JSONB objects, like ->> and #>> do, and it cannot be used to delete keys from documents.
SQL/JSON path language
Also commonly referred to as JSONPath, this feature is a query language defined in the SQL standard to provide a unified way of traversing and referring to JSON documents and keys, similar to what XPath offers for XML. The main benefit of this feature is that JSONPath is not unique to PostgreSQL, but has implementations in a variety of different programming languages and databases. At first glance, simple queries using the dot notation variant of JSONPath should be familiar to people who have worked with JavaScript before:
SELECT jsonb_path_query(json_column, '$.name') FROM json_table;
But JSONPath supports much more flexible querying, for example retrieving all name keys no matter how deeply they are nested:
SELECT jsonb_path_query(json_column, '$.**.name') FROM json_table;
This returns both the name of the employer and the top-level name key.
The full specification allows for much more complex queries, including variables and filter expressions on path evaluations, but the query language as a whole is beyond the scope of this article.
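Just to give a brief taste of filter expressions, this query returns only the tattoo objects whose location is wrist:
SELECT jsonb_path_query(json_column, '$.tattoos[*] ? (@.location == "wrist")') FROM json_table;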
Note that JSONPath can only be used to read documents; it cannot be used to update documents or delete keys.
PostgreSQL support for JSON documents through the JSONB data type has gained significantly in features and performance over recent versions, slowly becoming a serious contender to the purely relational approach. That said, JSON integration in PostgreSQL is still an ongoing effort and has its issues: most notably, the query planner lacks statistics for fields inside documents, which can result in very inefficient query plans; queries tend to be more complex; and indexing requires a different approach, as indexing an entire document quickly bloats the index and can slow down even operations that use the index instead of reading the records off disk. The JSONB data type has reached reasonable maturity for production use, but there is still a lot of work to be done. This post covered just the bare essentials of working with JSON in PostgreSQL; if you want to dive deeper into the topic, the PostgreSQL documentation for JSON data types and JSON functions & operators is a good starting point.