What I Learned at Work this Week: Recurring MySQL Syntax

Mike Diaz
8 min readApr 6, 2025
Photo by Elizaveta Rukhtina : https://www.pexels.com/photo/reflection-of-a-woman-on-the-mirrors-while-taking-photos-using-a-dslr-camera-10013237/

I like writing SQL. Whether it’s Snowflake, Postgres, or MySQL (these are surely the only querying languages), it all just makes sense. I don’t have to worry about classes or interfaces, asynchronous or parallel functions; I just have to try and write a simple, efficient query. Piece of cake.

Of course I realize that everything I dread about coding exists in query languages as well, or at least would need to be implemented in a service that ultimately makes a query, and this week I experienced it firsthand. A teammate of mine write a MySQL PR with a lot of functions that I’ve never seen before. I approved it, and it worked, but to make sure I really understand it, I’m going to try and repeat what he taught me.

The Goal

My team maintains a DB table with a metadata json column. That column is parsed by a Java service, which looks for certain keys and executes different logic based on their values. For example, there might be a key called “accountId” which we use to determine what to append to an API endpoint before sending a message.

In this case, my teammate was trying to standardize the metadata to eliminate a redundant use case. We had two different pieces of logic that did the same thing, but one of them looked for a value called source and the other called list. It would be an obvious improvement to get rid of one of them, but then half of our metadata rows wouldn’t be properly processed. So, in addition to deleting logic that references “list,” we wanted to replace the metadata keys that said “list” with “source.”

A stored procedure

As usual, I’ll recreate a simplified version of the code that doesn’t give away all the details of our schema. To start:

DROP PROCEDURE IF EXISTS restructure_metadata;
DELIMITER $$

CREATE PROCEDURE restructure_metadata()
BEGIN

We’re about to create a new stored procedure, which is like a SQL function. We define it so that we can call it multiple times or in different circumstances, like if we want to pass an argument. Or procedure must be named and, if we try to define something that already exists, we’ll get an error. So we start with DROP PROCEDURE IF EXISTS just to give us a clean slate.

Next, we define a delimiter. In MySQL, the standard delimiter is a semicolon, ;. This character acts as the break between statements, but this can cause a problem if we’re using SQL to define SQL. If there’s one delimiter, how can we determine what’s part of the stored procedure and what’s part of the code surrounding it? So we create a new delimiter that we’re going to use to help close things out when we’re done defining the procedure.

Next, we start creating the stored procedure with the CREATE PROCEDURE command. BEGIN will indicate that whatever follows is part of the procedure.

Declaring variables

The way we use a stored procedure here is almost like a class in Java. We’re going to establish some values in our outermost scope so that we can reference them later:

DECLARE done INT DEFAULT 0;
DECLARE currentRowId INTEGER;
DECLARE rowCursor CURSOR FOR SELECT c.id
FROM configurations c
WHERE c.metadata -> '$.values.sourceIds' LIKE '"[%';

DECLARE creates variables. The first line here is saying that done is a variable of type INT which defaults to 0. currentRowId is an INTEGER which does not have a default. Finally, we declare rowCursor as a CURSOR, which is a database object used for iterating the result of a SELECT statement inside a stored procedure. We want to execute some logic on every relevant row in our table and our cursor will help us collect a group to iterate through and keep track of where we are within that group.

We filter by a condition: c.metadata -> '$.values.sourceIds' LIKE '"[%' . The -> arrow allows us to parse json and the string that follows is the path to the value we seek. $ indicates that what we’re looking for is going to be a key and anything else, separated by ., will drill down. We can expect our data to look something like this:

{
"values":
"sourceIds":
"[...

Note that the value of sourceIds is an array, or at least we are specifying that it is with the second half of our clause: LIKE '"[%. The % is a wildcard saying “as long as the characters before this match, we’ll accept anything that comes after.” So our rowCursor will now iterate through all rows that meet this condition.

The loop

We’ve got our cursor all set up. What do we do with it?

DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN rowCursor;

label: LOOP
FETCH rowCursor INTO currentRowId;

IF done = 1 THEN LEAVE label;
END IF;

We declare a handler, which is a statement that checks conditions to determine whether a loop should continue. FOR NOT FOUND means if there are no more rows to iterate through. When that condition is met, we set our previously established done variable to 1.

OPEN means we’re ready to start using our cursor. We use a begin_label for our loop so that we can reference it by name later. The loop is literally called “label,” which I find confusing but seems to be a pattern based on documentation.

Inside the loop, we FETCH the value of our cursor, which is an ID of one of the rows we want to edit, INTO the variable we previously established, currentRowId. This means we’re setting that variable to the value of the next ID. IF done = 1, it means there is no next value, so we exit the loop. END IF matches the BEGIN we used to open this process. If the IF condition is met, we close things out.

Parsing the metadata

If our metadata objects were simple and consistent, we would have just represented them as columns in the table instead of resorting to json. Sadly, there’s a bit of gnarly nesting, so we have to go through a few steps to pull out the data we’re looking for:

SET @sources = JSON_EXTRACT(CAST(JSON_UNQUOTE(
(SELECT c.metadata -> '$.values.sourceIds' as signupSource
FROM configurations c
WHERE c.id = currentRowId)) AS JSON), '$[*]');
SELECT @sources;

SET @result = '[]';
SET @i = 0;
SET @key_count = JSON_LENGTH(@sources);

We want to dig into the json, copy the data associated with each “list,” remove it, and then re-add it, but with the key being “source” instead of “list.” We remember that the “list” objects are contained inside of arrays that are inside of sourceIds objects that are inside of a values object in the metadata. In this code, we dig down into the json to pull out that array and associate it with the variable @sources. But we don’t just save it in its original format, because of course there’s another wrinkle.

The list we’re extracting is json within json, so its quotes are escaped with a /. For example, it looks like this:

"sourceIds":
"[{\"list\":\"12345\",\"someOtherValue\":\"54321\"}]"

We really don’t like this pattern, but that’s what the code expects and we’re not ready to change it, so we’re stuck with it. For our SQL to understand it, we need to change its format. JSON_UNQUOTE, combined with CAST, can take this string and transform it into JSON that MySQL can work with. That’s all wrapped in JSON_EXTRACT, which pulls out specific parts of a document that match the path provided. In this case, the path '$[*]' means “everything inside the list.”

Let’s iterate

Next we declare three variables that will help us iterate through that list.

WHILE @i < @key_count DO
SET @element = JSON_EXTRACT(@sources, CONCAT('$[', @i, ']'));
SET @sourceValue = JSON_UNQUOTE(JSON_EXTRACT(@element, '$.list'));
SET @removedList = JSON_REMOVE(@element, '$.list');
SET @updatedElement = JSON_INSERT(@removedList, '$.source', @sourceValue);

SET @result = JSON_ARRAY_APPEND(@result, '$', CAST(@updatedElement as JSON));

SET @i = @i + 1;
END WHILE;

Let’s break down each step here:

  1. WHILE: we’re going to increase an iterator number, which is starting at 0, until it reaches the total number of elements in a collection, a classic pattern.
  2. SET @element: We again use JSON_EXTRACT, but this time to pull a specific element out of the @sources list. As we iterate through that list, the index we use will increase incrementally.
  3. SET @sourceValue: We expect the @element to be an object containing key-value pairs where one of the keys may be "list" and the value would be some ID. JSON_UNQUOTE and JSON_EXTRACT pull out that value, if it exists (looking for the “path” of '$.list’) and transform it into a string we can use later.
  4. SET @removedList: We remove the element we just copied from the collection with JSON_REMOVE. This function returns the updated “json document,” in other words, our collection without the unwanted element.
  5. SET @updatedElement: JSON_INSERT takes three arguments: a json document (the collection we just removed the “list” from), a path (the “key” in our key-value pair), and a value (the second half of the key-value pair). We have created a copy of our existing element, but with “source” instead of “list.”
  6. SET @result: Remember that we declared @result as an empty array earlier. Now we’re using JSON_ARRAY_APPEND to put the new element into it. This way, if the array has three elements and two of them have keys of “list,” they’ll all be re-inserted in the same order.
  7. SET @i: As is crucial in this type of WHILE pattern, we must increment the index to avoid an infinite loop. Our loop will end once the index is equal to the key_count, which will trigger the END WHILE line.

To conceptualize the change, imagine this:

"sourceIds":
"[
{\"list\":\"8\",\"someOtherValue\":\"North\"},
{\"location\":\"United States\",\"Zip\":\"12345\"},
{\"list\":\"10\",\"someOtherValue\":\"West\"},
]"

Becomes this:

"sourceIds":
"[
{\"source\":\"8\",\"someOtherValue\":\"North\"},
{\"location\":\"United States\",\"Zip\":\"12345\"},
{\"source\":\"10\",\"someOtherValue\":\"West\"},
]"

The larger loop

Remember, there are two loops. We’ve drilled all the way down to create an updated collection of sourceIds, but we haven’t updated our row yet. Our final step is:

SET @updatedMetadata = JSON_OBJECT('values', JSON_OBJECT('sourceIds', @result));

SET @finalUpdatedMetadata = JSON_MERGE_PATCH(
(SELECT c.metadata
FROM configurations c
WHERE c.id = currentRowId),
@updatedMetadataCSSWithValues
);

UPDATE configurations c
SET metadata =@finalUpdatedMetadata
WHERE c.id = currentRowId;
  1. SET @updatedMetadata: Our updated sourceIds are nested under values in the metadata column. In this step, we’re rebuilding that object using the @result object that we just established.
  2. SET @finalUpdatedMetadata: We select the row we want to update, which we know because we’re keeping track of the ID, and we use JSON_MERGE_PATCH to merge our new metadata into the old. The merge helps us account for any other children of values which exist in the metadata — we didn’t include them in our new object, but we don’t want to remove them from the json.
  3. UPDATE: We drop the new metadata into our row!

The end

We made it! But don’t forget to close the loops:

end loop;
end $$


CALL restructure_metadata();

We end the outer loop and then end the procedure by using the delimiter we declared at the very beginning of the code. Since this is a migration file, we want to invoke the procedure right away and just one time. So we call it using CALL.

If you find this confusing…I agree with you! But the practice of going through it step-by-step was a huge benefit. Understanding part of it, even a small part, is much better than understanding nothing at all!

Sources

--

--

Mike Diaz
Mike Diaz

No responses yet