What I Learned at Work this Week: Recurring MySQL Syntax
I like writing SQL. Whether it’s Snowflake, Postgres, or MySQL (these are surely the only querying languages), it all just makes sense. I don’t have to worry about classes or interfaces, asynchronous or parallel functions; I just have to try and write a simple, efficient query. Piece of cake.
Of course I realize that everything I dread about coding exists in query languages as well, or at least would need to be implemented in a service that ultimately makes a query, and this week I experienced it firsthand. A teammate of mine write a MySQL PR with a lot of functions that I’ve never seen before. I approved it, and it worked, but to make sure I really understand it, I’m going to try and repeat what he taught me.
The Goal
My team maintains a DB table with a metadata
json column. That column is parsed by a Java service, which looks for certain keys and executes different logic based on their values. For example, there might be a key called “accountId” which we use to determine what to append to an API endpoint before sending a message.
In this case, my teammate was trying to standardize the metadata to eliminate a redundant use case. We had two different pieces of logic that did the same thing, but one of them looked for a value called source
and the other called list
. It would be an obvious improvement to get rid of one of them, but then half of our metadata rows wouldn’t be properly processed. So, in addition to deleting logic that references “list,” we wanted to replace the metadata keys that said “list” with “source.”
A stored procedure
As usual, I’ll recreate a simplified version of the code that doesn’t give away all the details of our schema. To start:
DROP PROCEDURE IF EXISTS restructure_metadata;
DELIMITER $$
CREATE PROCEDURE restructure_metadata()
BEGIN
We’re about to create a new stored procedure, which is like a SQL function. We define it so that we can call it multiple times or in different circumstances, like if we want to pass an argument. Or procedure must be named and, if we try to define something that already exists, we’ll get an error. So we start with DROP PROCEDURE IF EXISTS
just to give us a clean slate.
Next, we define a delimiter. In MySQL, the standard delimiter is a semicolon, ;
. This character acts as the break between statements, but this can cause a problem if we’re using SQL to define SQL. If there’s one delimiter, how can we determine what’s part of the stored procedure and what’s part of the code surrounding it? So we create a new delimiter that we’re going to use to help close things out when we’re done defining the procedure.
Next, we start creating the stored procedure with the CREATE PROCEDURE
command. BEGIN
will indicate that whatever follows is part of the procedure.
Declaring variables
The way we use a stored procedure here is almost like a class in Java. We’re going to establish some values in our outermost scope so that we can reference them later:
DECLARE done INT DEFAULT 0;
DECLARE currentRowId INTEGER;
DECLARE rowCursor CURSOR FOR SELECT c.id
FROM configurations c
WHERE c.metadata -> '$.values.sourceIds' LIKE '"[%';
DECLARE
creates variables. The first line here is saying that done
is a variable of type INT
which defaults to 0
. currentRowId
is an INTEGER
which does not have a default. Finally, we declare rowCursor
as a CURSOR, which is a database object used for iterating the result of a SELECT statement inside a stored procedure. We want to execute some logic on every relevant row in our table and our cursor will help us collect a group to iterate through and keep track of where we are within that group.
We filter by a condition: c.metadata -> '$.values.sourceIds' LIKE '"[%'
. The ->
arrow allows us to parse json and the string that follows is the path to the value we seek. $
indicates that what we’re looking for is going to be a key and anything else, separated by .
, will drill down. We can expect our data to look something like this:
{
"values":
"sourceIds":
"[...
Note that the value of sourceIds
is an array, or at least we are specifying that it is with the second half of our clause: LIKE '"[%
. The %
is a wildcard saying “as long as the characters before this match, we’ll accept anything that comes after.” So our rowCursor
will now iterate through all rows that meet this condition.
The loop
We’ve got our cursor all set up. What do we do with it?
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN rowCursor;
label: LOOP
FETCH rowCursor INTO currentRowId;
IF done = 1 THEN LEAVE label;
END IF;
We declare a handler, which is a statement that checks conditions to determine whether a loop should continue. FOR NOT FOUND
means if there are no more rows to iterate through. When that condition is met, we set our previously established done
variable to 1
.
OPEN
means we’re ready to start using our cursor. We use a begin_label for our loop so that we can reference it by name later. The loop is literally called “label,” which I find confusing but seems to be a pattern based on documentation.
Inside the loop, we FETCH
the value of our cursor, which is an ID of one of the rows we want to edit, INTO
the variable we previously established, currentRowId
. This means we’re setting that variable to the value of the next ID. IF done = 1
, it means there is no next value, so we exit the loop. END IF
matches the BEGIN
we used to open this process. If the IF
condition is met, we close things out.
Parsing the metadata
If our metadata objects were simple and consistent, we would have just represented them as columns in the table instead of resorting to json. Sadly, there’s a bit of gnarly nesting, so we have to go through a few steps to pull out the data we’re looking for:
SET @sources = JSON_EXTRACT(CAST(JSON_UNQUOTE(
(SELECT c.metadata -> '$.values.sourceIds' as signupSource
FROM configurations c
WHERE c.id = currentRowId)) AS JSON), '$[*]');
SELECT @sources;
SET @result = '[]';
SET @i = 0;
SET @key_count = JSON_LENGTH(@sources);
We want to dig into the json, copy the data associated with each “list,” remove it, and then re-add it, but with the key being “source” instead of “list.” We remember that the “list” objects are contained inside of arrays that are inside of sourceIds
objects that are inside of a values
object in the metadata. In this code, we dig down into the json to pull out that array and associate it with the variable @sources
. But we don’t just save it in its original format, because of course there’s another wrinkle.
The list we’re extracting is json within json, so its quotes are escaped with a /
. For example, it looks like this:
"sourceIds":
"[{\"list\":\"12345\",\"someOtherValue\":\"54321\"}]"
We really don’t like this pattern, but that’s what the code expects and we’re not ready to change it, so we’re stuck with it. For our SQL to understand it, we need to change its format. JSON_UNQUOTE, combined with CAST, can take this string and transform it into JSON that MySQL can work with. That’s all wrapped in JSON_EXTRACT, which pulls out specific parts of a document that match the path provided. In this case, the path '$[*]'
means “everything inside the list.”
Let’s iterate
Next we declare three variables that will help us iterate through that list.
WHILE @i < @key_count DO
SET @element = JSON_EXTRACT(@sources, CONCAT('$[', @i, ']'));
SET @sourceValue = JSON_UNQUOTE(JSON_EXTRACT(@element, '$.list'));
SET @removedList = JSON_REMOVE(@element, '$.list');
SET @updatedElement = JSON_INSERT(@removedList, '$.source', @sourceValue);
SET @result = JSON_ARRAY_APPEND(@result, '$', CAST(@updatedElement as JSON));
SET @i = @i + 1;
END WHILE;
Let’s break down each step here:
WHILE
: we’re going to increase an iterator number, which is starting at 0, until it reaches the total number of elements in a collection, a classic pattern.SET @element
: We again useJSON_EXTRACT
, but this time to pull a specific element out of the@sources
list. As we iterate through that list, the index we use will increase incrementally.SET @sourceValue
: We expect the@element
to be an object containing key-value pairs where one of the keys may be"list"
and the value would be some ID. JSON_UNQUOTE and JSON_EXTRACT pull out that value, if it exists (looking for the “path” of'$.list’
) and transform it into a string we can use later.SET @removedList
: We remove the element we just copied from the collection with JSON_REMOVE. This function returns the updated “json document,” in other words, our collection without the unwanted element.SET @updatedElement
: JSON_INSERT takes three arguments: a json document (the collection we just removed the “list” from), a path (the “key” in our key-value pair), and a value (the second half of the key-value pair). We have created a copy of our existing element, but with “source” instead of “list.”SET @result
: Remember that we declared@result
as an empty array earlier. Now we’re using JSON_ARRAY_APPEND to put the new element into it. This way, if the array has three elements and two of them have keys of “list,” they’ll all be re-inserted in the same order.SET @i
: As is crucial in this type of WHILE pattern, we must increment the index to avoid an infinite loop. Our loop will end once the index is equal to the key_count, which will trigger theEND WHILE
line.
To conceptualize the change, imagine this:
"sourceIds":
"[
{\"list\":\"8\",\"someOtherValue\":\"North\"},
{\"location\":\"United States\",\"Zip\":\"12345\"},
{\"list\":\"10\",\"someOtherValue\":\"West\"},
]"
Becomes this:
"sourceIds":
"[
{\"source\":\"8\",\"someOtherValue\":\"North\"},
{\"location\":\"United States\",\"Zip\":\"12345\"},
{\"source\":\"10\",\"someOtherValue\":\"West\"},
]"
The larger loop
Remember, there are two loops. We’ve drilled all the way down to create an updated collection of sourceIds
, but we haven’t updated our row yet. Our final step is:
SET @updatedMetadata = JSON_OBJECT('values', JSON_OBJECT('sourceIds', @result));
SET @finalUpdatedMetadata = JSON_MERGE_PATCH(
(SELECT c.metadata
FROM configurations c
WHERE c.id = currentRowId),
@updatedMetadataCSSWithValues
);
UPDATE configurations c
SET metadata =@finalUpdatedMetadata
WHERE c.id = currentRowId;
SET @updatedMetadata
: Our updatedsourceIds
are nested undervalues
in themetadata
column. In this step, we’re rebuilding that object using the@result
object that we just established.SET @finalUpdatedMetadata
: We select the row we want to update, which we know because we’re keeping track of the ID, and we use JSON_MERGE_PATCH to merge our new metadata into the old. The merge helps us account for any other children ofvalues
which exist in the metadata — we didn’t include them in our new object, but we don’t want to remove them from the json.UPDATE
: We drop the new metadata into our row!
The end
We made it! But don’t forget to close the loops:
end loop;
end $$
CALL restructure_metadata();
We end the outer loop and then end the procedure by using the delimiter we declared at the very beginning of the code. Since this is a migration file, we want to invoke the procedure right away and just one time. So we call it using CALL
.
If you find this confusing…I agree with you! But the practice of going through it step-by-step was a huge benefit. Understanding part of it, even a small part, is much better than understanding nothing at all!
Sources
- Defining Stored Programs, MySQL Docs
- MySQL Variables, Tutorials Point
- Cursors, MySQL Tutorial
- DECLARE … HANDLER Statement, MySQL Docs
- Statement Labels, MySQL Docs
- Functions That Modify JSON Values, MySQL Docs
- Cast Functions and Operators, MySQL Docs
- Functions That Search JSON Values, MySQL Docs