Over the past few months, I’ve spent more time working on back-end processes than on my usual Python script work. You might know that the script in question reads a series of Postgres tables to determine the data each client wants in their specific report. Now, I’m maintaining a Java API that helps users create and edit rows in these tables. I’m learning a lot of Java, but I’m also doing new things with SQL. In the past, I’ve mainly edited tables manually because it’s easier than looking at documentation to find the proper syntax for a simple edit. But now it’s time to build long-lasting processes, so let’s do things the right way.
Aggregators (array_agg, array_remove)
In the past, I’ve written about why I’m not always happy we chose to use jsonb in our database. It’s difficult to parse with SQL, making reads, edits, and filters a pain. For example, one of our JSON properties is a list of strings, each representing a column that we’ll put in the CSV report we’re writing. Unless you work with a lot of jsonb, you probably can’t think of an easy way to, say, update this list to remove a deprecated column.
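To make that concrete, here’s a rough sketch of one way to remove a single string from a jsonb list. The names here (a report_config table whose config column has a columns array) are hypothetical stand-ins for our real schema; the point is that you have to unnest the array, filter it, and re-aggregate it just to drop one value:
-- Hypothetical schema: report_config(config jsonb), where config -> 'columns'
-- is a jsonb array of column-name strings.
UPDATE report_config
SET config = jsonb_set(
    config,
    '{columns}',
    COALESCE(
        (SELECT jsonb_agg(col)
         FROM jsonb_array_elements_text(config -> 'columns') AS col
         WHERE col <> 'deprecated_column'),
        '[]'::jsonb
    )
)
WHERE config -> 'columns' ? 'deprecated_column';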
For a different aspect of the report, we use a more traditional mapping table relationship, like this:
 event_map_id | config_id | event_name
--------------+-----------+---------------
            1 |         1 | 'JOIN'
            2 |         1 | 'UNSUBSCRIBE'
            3 |         2 | 'JOIN'
Here we’re configuring which “events” we’d like to report on. Instead of putting them into a list like our column names, we’re saying “config 1 is going to report on JOIN and UNSUBSCRIBE events” and “config 2 is going to only report on JOIN.” If we decided “JOIN” isn’t a good name anymore, it would be much easier to write SQL that replaces that value, or deletes those rows, or solves the problem however we like. But there is a disadvantage here: as this table grows, it becomes harder for the human eye to organize and understand the data.
In my case, I have to handle an API call that displays the details of any one configuration in a UI. If I say
SELECT * FROM event_map WHERE config_id = 1;
I’m going to end up with two rows. But I want to build a single payload, so I have to figure out how to put this all into one object.
Most SQL users are probably already thinking GROUP BY, which we can combine with ARRAY_AGG to turn these various rows into a single list (DB Fiddle here if you want to try it out):
-- CREATING THE TABLE
CREATE TABLE event_map (
    event_map_id integer NOT NULL,
    config_id integer NOT NULL,
    event_name varchar NOT NULL
);
-- POPULATING THE TABLE
INSERT INTO event_map VALUES (1, 1, 'JOIN');
INSERT INTO event_map VALUES (2, 1, 'UNSUBSCRIBE');
INSERT INTO event_map VALUES (3, 2, 'JOIN');
-- QUERYING THE TABLE
SELECT config_id,
       ARRAY_AGG(event_name)
FROM event_map
GROUP BY config_id;
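With the sample rows above, the query should return something close to this (Postgres doesn’t guarantee the order of elements inside the aggregate unless you add an ORDER BY to ARRAY_AGG):
 config_id |     array_agg
-----------+--------------------
         1 | {JOIN,UNSUBSCRIBE}
         2 | {JOIN}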
In my case, we had to filter out some NULL events that were coming through. If we want to remove anything from our list, we can use ARRAY_REMOVE:
SELECT config_id,
       ARRAY_REMOVE(ARRAY_AGG(event_name), 'UNSUBSCRIBE')
FROM event_map
GROUP BY config_id;
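Stripping out the stray NULL events I mentioned works the same way; ARRAY_REMOVE will take NULL as the value to remove:
SELECT config_id,
       ARRAY_REMOVE(ARRAY_AGG(event_name), NULL)
FROM event_map
GROUP BY config_id;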
JSON Aggregation
In our first example, we could collapse everything into a flat list because each row only carried one value we cared about: the event name. But what if we want to group and encapsulate even more data per row? Here’s an example:
-- CREATING THE SECOND TABLE
CREATE TABLE user_config (
    user_map_id integer NOT NULL,
    config_id integer NOT NULL,
    user_name varchar NOT NULL,
    is_active boolean NOT NULL
);
-- POPULATING THE SECOND TABLE
INSERT INTO user_config VALUES (1, 1, 'mdiaz', true);
INSERT INTO user_config VALUES (2, 1, 'jbrown', false);
INSERT INTO user_config VALUES (3, 1, 'dlee', false);
SELECT *
FROM user_config;
The result here will be three rows. For the sake of argument, let’s say we want to roll all of this data up into a single JSON value so we can better build our payload for the UI. First, we have to turn each row into JSON:
SELECT config_id,
       JSONB_BUILD_OBJECT(
           'user_name', user_name,
           'is_active', is_active
       )
FROM user_config;
JSONB_BUILD_OBJECT can combine data from different columns into one object. It accepts any number of argument pairs, with the first being a property name and the second being the column that populates it. Now we can group these objects together:
SELECT config_id,
       JSONB_AGG(JSONB_BUILD_OBJECT(
           'user_name', user_name,
           'is_active', is_active
       ))
FROM user_config
GROUP BY config_id;
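For config 1, the aggregated column should come back as one jsonb array with an object per user, roughly:
[{"is_active": true, "user_name": "mdiaz"}, {"is_active": false, "user_name": "jbrown"}, {"is_active": false, "user_name": "dlee"}]
(jsonb normalizes key order and doesn’t guarantee element order, so the exact layout may differ.)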
One of the reasons user_config is its own table is that not every config has user data. To ensure the aggregated value always has the correct data type, we can use COALESCE to handle the nulls. Let’s create the config table and then run the COALESCE:
-- CREATING THE THIRD TABLE
CREATE TABLE config (
    config_id integer NOT NULL,
    is_active boolean NOT NULL
);
-- POPULATING THE THIRD TABLE
INSERT INTO config VALUES (1, true);
INSERT INTO config VALUES (2, false);
SELECT c.config_id,
       COALESCE(
           JSONB_AGG(JSONB_BUILD_OBJECT(
               'user_name', uc.user_name,
               'is_active', uc.is_active
           )) FILTER (WHERE uc.user_name IS NOT NULL),
           '[]'::jsonb
       )
FROM config c
LEFT JOIN user_config uc ON c.config_id = uc.config_id
GROUP BY c.config_id;
We wrap our JSON in COALESCE and add a FILTER that removes the unwanted rows. Finally, we add a second argument indicating what we want to populate instead. In my case, I used this to standardize any non-conforming rows so that I could easily include them in a GROUP BY. This syntax took me a long time to find online, but the key is that the fallback has to match the data type (jsonb), so I pass an empty array as a string and then cast it with ::.
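With the sample data above, the result should look roughly like this; config 1 gets its array of three user objects, and config 2, which has no users, gets an empty array:
 config_id | coalesce
-----------+-----------------------------------------------------------------------------------------------------------------------------
         1 | [{"is_active": true, "user_name": "mdiaz"}, {"is_active": false, "user_name": "jbrown"}, {"is_active": false, "user_name": "dlee"}]
         2 | []
Without the FILTER, the LEFT JOIN would hand JSONB_AGG a row of NULLs for config 2 and we’d get an array holding one all-null object; without the COALESCE, filtering away every row would leave us with NULL instead of [].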
Adding a column constraint
Relative to the last section, this is a layup. Let’s say we want to make sure that there are no duplicates in our user_config table. If we just want to avoid duplicates in one column, we could run this migration:
ALTER TABLE user_config
ADD CONSTRAINT unique_user_name UNIQUE(user_name);
We name our CONSTRAINT, which is helpful for referencing it later, and then we declare what we want to be unique. Now, we won’t be able to add mdiaz as a user name in another row.
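As a quick sanity check, with that constraint in place a second mdiaz row should be rejected with an error along these lines:
INSERT INTO user_config VALUES (4, 2, 'mdiaz', true);
-- ERROR:  duplicate key value violates unique constraint "unique_user_name"
-- DETAIL:  Key (user_name)=(mdiaz) already exists.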
If we instead want to allow the same user name to be associated with a second config_id, we can swap that constraint out for a UNIQUE constraint that is more specific:
ALTER TABLE user_config
DROP CONSTRAINT unique_user_name,
ADD CONSTRAINT unique_user_name UNIQUE(config_id, user_name);
Now, we’ll be able to add a new row with mdiaz, but we’ll hit an exception if that row also uses 1 as its config_id.
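In other words, with the two-column constraint the first insert below should succeed and the second should fail:
INSERT INTO user_config VALUES (4, 2, 'mdiaz', true);   -- OK: (2, 'mdiaz') is a new pair
INSERT INTO user_config VALUES (5, 1, 'mdiaz', true);   -- fails: (1, 'mdiaz') already exists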
ON CONFLICT
We’ve got an API that can read and write to these tables, and we’ve just introduced a constraint saying we can’t write duplicate rows. We can use that constraint to create a dynamic query that will INSERT or UPDATE depending on what’s in the table.
Let’s say our UI allows users to edit a bunch of fields in their config. They could add users, change user names, or deactivate users. We could write three different pages for these options, but that wouldn’t make for a good UI. Instead, we’ll pass the same type of payload regardless of the action and let the back end, in this case an ON CONFLICT statement, determine the action. The way our constraint is currently set up, we can’t support name editing, but we can support toggling is_active:
INSERT INTO user_config (user_map_id, config_id, user_name, is_active)
VALUES (:userMapId, :configId, :userName, :isActive)
ON CONFLICT ON CONSTRAINT unique_user_name DO UPDATE
SET user_map_id = :userMapId, config_id = :configId, user_name = :userName, is_active = :isActive
WHERE user_config.user_map_id = :userMapId;
We’ve written our front end to always pass back a fully populated configuration, even if the user is trying to edit rather than add. If they do want to add, this query will do the INSERT defined on the first two lines. But if they’re trying to change the row from active to inactive, we’ll hit the alternate logic after ON CONFLICT. This runs SET instead, which will update the existing row.
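Here’s a concrete version of the same upsert with the parameters filled in: we pass back mdiaz’s existing row on config 1 with is_active flipped to false. The (config_id, user_name) pair conflicts with the existing row, so the DO UPDATE branch deactivates that row instead of inserting a new one.
INSERT INTO user_config (user_map_id, config_id, user_name, is_active)
VALUES (1, 1, 'mdiaz', false)
ON CONFLICT ON CONSTRAINT unique_user_name DO UPDATE
SET is_active = EXCLUDED.is_active;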
Putting it together
My recent work has been very challenging because my Java is rusty and it’s my first experience with Spring, GraphQL, and JDBI. But clearly it’s a great learning opportunity — I’m picking up a ton of new skills and honing my existing ones. We always work to keep our momentum going and stay positive. If we’ve got a good team around us, we’ll go far.
Sources
- DB Fiddle
- PostgreSQL ARRAY_AGG Function, PostgreSQL Tutorial
- JSON Functions and Operators, PostgreSQL Docs
- PostgreSQL COALESCE, PostgreSQL Tutorial
- Add Constraint to Table in PostgreSQL, TutorialsTeacher
- PostgreSQL Upsert Using INSERT ON CONFLICT statement, PostgreSQL Tutorial