What I Learned at Work this Week: Parsing JSON with PostgreSQL

Mike Diaz
4 min readMar 6, 2022
credit: Pixabay on Pexels

I think that my favorite way to learn is to run someone else’s code and log/debug/edit it to understand how it works. I got that opportunity this week when I was trying to figure out how certain rows were filtered in a Postgres database. The query looked something like this:

select *
from setup_configs
where 'p1' in (
select jsonb_array_elements(setup_json -> 'ap' -> 'trg') ->> 'pt'
)

I had never seen arrows in a SQL query before, so this subquery was pretty confusing for me. I took some steps to simplify it, though I couldn’t isolate it completely because it’s running a function on setup_json, which is a column in the setup_configs table. After adding and removing different pieces of the code and reading a bunch of errors (and a little documentation), I figured out what was going on here.

->

To build a better understanding, I opened a new tab in my browser and typed what I saw:

After a bit of reading, it became clear that the arrows are used for traversing JSON in Postgres. This became obvious in retrospect because the column in question, setup_json, is populated with JSON! Here’s a sample of what one of the cells might look like:

'{
"ap": {
"trg": [
{
"nm":"trg1",
"pt":"p1"
},
{
"nm":"trg2",
"pt":"p2"
}
],
"enr": {
"enr1":"abc",
"enr2":"xyz"
}
}
}'

What does this mean? It’s hard to read for two reasons: First, this is a minified configuration designed to take up as little space on a site as possible. Second, I’ve replaced all the real values with placeholders because I don’t want to give too much insight into the actual production code. While we can’t know what this means, it’s a great example if we want to practice using Postgres JSON Functions and Operators. The -> operator is used to select the value of a specific key from the top layer of the JSON object. So if we wrote:

select setup_json
from setup_configs;

We’d get that whole string that I wrote above in our resulting table. But by adding ->:

select setup_json -> 'ap'
from setup_configs;

We jump down one level:

"trg": [
{
"nm":"trg1",
"pt":"p1"
},
{
"nm":"trg2",
"pt":"p2"
}
],
"enr": {
"enr1":"abc",
"enr2":"xyz"
}
}

In the original query, we use the -> twice, which means we’re selecting from ap and then from trg:

[
{
"nm":"trg1",
"pt":"p1"
},
{
"nm":"trg2",
"pt":"p2"
}
]

jsonb_array_elements

There were still two other pieces of this query that I didn’t understand, firstly jsonb_array_elements. This is one of those functions that has a really great name because it takes array elements and returns them as jsonb. Take another look at the JSON that I printed above — it’s an array! With jsonb_array_elements, we can break each element in this array into its own row in a table. Let’s update our query to reflect that:

select jsonb_array_elements(setup_json -> 'ap' -> 'trg')
from setup_configs;

Our results would be:

  | jsonb_array_elements
--|---------------------
1 |{"nm":"trg1","pt":"p1"}
2 |{"nm":"trg2","pt":"p2"}

This is perfect! With two distinct rows, I can pull only the specific elements that I’m looking for within this array. If you’re like me, you might be asking how jsonb is distinct compared to json. They’re practically the same, but jsonb is stored in a decomposed binary form, so it’s more efficient.

-> vs ->>

Looking once more at our original query, we see that we’re trying to find elements where the pt key equals p1, so we have to dig a little deeper into our object. This time, however, we use ->> instead of ->. This Stack Overflow question was really helpful in illustrating that they’ll both pull the same piece of data, but they’ll return different data types. -> returns JSON or jsonb, while ->> returns text. That’s important since we’re comparing our result to a string. I tested out the query without the conversion just to see what would happen, and got this error:

invalid input syntax for type json

I assumed that the query would just return no results, but it actually caught the fact that our data types did not match before it even started the comparison. With what we know now, this error message makes total sense. Looking one more time at our original query, which does use the “as text” arrow operator:

select *
from setup_configs
where 'p1' in (
select jsonb_array_elements(setup_json -> 'ap' -> 'trg') ->> 'pt'
)

We can see that we’ll pull the entire row from every setup_config entry where the string p1 matches the pt value of every element in the ap -> trg array. Once we have that data, we could do a bulk update, or maybe deactivate all of those configurations when they are no longer relevant.

The wonders of SQL

As I learn more about SQL, I continue to be surprised at what different databases are capable of. There’s a lot more that I didn’t discuss in the PostgreSQL docs, so definitely check those out if you’re working with JSON in your DB. And remember, don’t be afraid to take apart someone else’s code! You’ll probably learn something.

Sources

--

--