What I Learned at Work this Week: Parsing Lists with Python


After facing gRPC last week, it was a real relief to see that I could actually connect to the API I was looking for in Python. As I continued to receive feedback about the integration, I could lean on that connection as a constant and go down other routes to debug. This week, I got one piece of feedback that had nothing to do with how I was connecting to the API, but instead how I was parsing data in Python:

Why is my text being cut off when I use the create function to add a new row to the table?

Since we were in the final stages of the project, I got some quick help from a more experienced Pythonista. But now that I have some more time, let’s break down the reasoning here.

Let’s take a step back and talk about what I was actually asked to build. My company uses a slackbot integration to allow non-engineers to submit database requests through Slack. Their commands are parsed and validated by a Python script before the changes are made, which helps prevent careless errors that could affect other processes that rely on the DB in question. It’s a pretty good system: increased productivity without having to build out a whole UI or grant additional permissions to hundreds of employees.

My task was to add commands that boiled down to the ability to VIEW, CREATE, and UPDATE in a new DB. I worked with another engineer who wrote the proto file that defined my gRPC requests, so I didn’t have to worry about the syntax of my DB queries. All I had to do was make sure the right arguments were being passed when someone wanted to execute one of these commands.

The domain of our data was messages that would be sent by companies to their subscribers. For simplicity, let’s say each row in my table was a message_template that contained a company_id, message_name, and message_text. After setting up the new command, I got the aforementioned feedback: in some cases, the message_text in the DB wasn’t fully reflecting what had been put into the command. Commands looked like this:

create template for company_id 3 title my new template text here is a template with text

But my result would look like this:

[Image: text isn’t reflecting a value!]

The issue is that the text should say “here is a template with text” but it’s totally blank. So how does this logic actually work?

This integration was very cleverly built on a parsing algorithm that takes a string (the “command”), breaks it up, identifies key words, and then passes the values as arguments into command-specific methods. Here’s the first time that happens:

def handle_command(command):
    args = command.split()
    module_params = module.params.copy()
    command_dict = {
        "param_dict": module_params
    }
    command = parse_command(args, command_dict)

As usual, this is way simpler than what’s actually in our codebase, but it provides some context. The first thing we’ll notice is that we use the split method, which breaks up our command string into a list of elements:

args = ["create", "template", "for", "company_id", "3", "title", "my", "new", "template", "text", "here", "is", "a", "template", "with", "text"]
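To see that split in isolation, here is a minimal sketch that can be pasted into a Python shell. The variable names are just for illustration. Calling split() with no arguments splits on any run of whitespace, so an accidental double space between words doesn’t produce empty elements.

```python
command = "create template for company_id 3 title my new template text here is a template with text"

# With no arguments, str.split() splits on runs of whitespace
args = command.split()

print(len(args))   # 16
print(args[:5])    # ['create', 'template', 'for', 'company_id', '3']
```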

Next, we take something called module.params, make a copy, and put it into a command dictionary (command_dict). This slackbot has a lot of different functionality, so tasks are grouped into “modules” depending on what table they’re editing or what team they’re used by. The correct module is also identified by a keyword, but we’ll ignore that in this simple example and just go straight to the params defined in the module I wrote:

params = {
    "company_id": None,
    "title": None,
    "text": None,
    "template_id": None,
}

Things are starting to come together. We’ve got a list of words as args and we’ve got a separate dictionary where some of the keys match words we see in the list. As you’ve probably guessed, these params are used to distinguish the difference between a column name and its content.
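As a quick illustration (this is not part of the slackbot itself), we can ask Python which positions in args hold a word that is also a key in params. The in operator on a dictionary checks its keys, which is the same membership test the parser relies on.

```python
args = ["create", "template", "for", "company_id", "3", "title", "my", "new",
        "template", "text", "here", "is", "a", "template", "with", "text"]

params = {
    "company_id": None,
    "title": None,
    "text": None,
    "template_id": None,
}

# `word in params` is True when the word matches one of the dictionary's keys
param_positions = [(i, word) for i, word in enumerate(args) if word in params]

print(param_positions)
# [(3, 'company_id'), (5, 'title'), (9, 'text'), (15, 'text')]
```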

Finally, we pass our two new variables to a new function: parse_command.

def parse_command(command, command_dict):
    method = command[0]
    verb_index = 1
    while method not in module.methods and verb_index < len(command):
        method += "_"
        method += command[verb_index]
        verb_index += 1
    if method not in module.secured_actions:
        return f"Bad command, {command[1]} is not a supported action."
    command_dict["action"] = module.methods[method]
    for param_index in range(verb_index, len(command)):
        if command[param_index] in command_dict["param_dict"]:
            curr_param = command[param_index]
            param_index += 1
            value = ""
            while (
                param_index < len(command)
                and command[param_index] not in command_dict["param_dict"]
                and command[param_index] not in PREPOSITIONS
            ):
                value += command[param_index] + " "
                param_index += 1
            command_dict["param_dict"][curr_param] = value.strip()
    return command_dict

There are two things going on here. First, we have to figure out which of our module’s methods we’re using. Each module corresponds to a group of DB commands, so when a user references that module, they’re expected to identify their desired functionality. In our example, that method is create_template. The methods can be listed out by calling module.methods, so we can write a while loop to walk through those and check if our command list contains one.

Before our loop, we define method as the very first element in our command list: “create”. As I mentioned, the method is called create_template, so this doesn’t match the name of any existing methods. That activates the loop, which also runs on the condition that verb_index, a variable we set as 1, is less than the number of elements in the command list (16). Once inside the loop, we append an underscore to our method string and then append command[verb_index]. verb_index is set to 1, so we’re adding the element at index 1 in our list: “template”. Before completing our first run through the loop, we increment verb_index so that the loop will actually progress through our list and not run forever.

We’ve now changed the value of method to be “create” + “_” + “template”, or create_template. This matches the name of one of my methods, so the loop ends. If the loop got all the way to the end of our command list without finding a match, we would have served an error saying that the command is not an existing method.
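Here is that method-matching loop pulled out into a standalone sketch. METHODS and find_method are hypothetical stand-ins for illustration. In the real code, module.methods maps method names to handler functions, and the placeholder strings below only exist so the example runs on its own.

```python
# Hypothetical stand-in for module.methods
METHODS = {
    "view_template": "view handler",
    "create_template": "create handler",
    "update_template": "update handler",
}


def find_method(command):
    """Join leading words with underscores until they name a known method."""
    method = command[0]
    verb_index = 1
    while method not in METHODS and verb_index < len(command):
        method += "_" + command[verb_index]
        verb_index += 1
    # verb_index now points at the first word after the method name
    return method, verb_index


print(find_method(["create", "template", "for", "company_id", "3"]))
# ('create_template', 2)
```

If no prefix of the command matches, the loop runs out of words and the returned name is not a key in METHODS, which is the case the error message is there to catch.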

Next, we build another loop using for. We’re going to run our loop once for each index from verb_index (which is now set to 2) up to, but not including, the length of our command list (still 16), so the last index we visit is 15. Inside the loop, we continue to determine the role played by elements in the command list. We first had to determine which elements were part of the method name (the first two, in this case) and now we have to figure out which words are variable names and which ones are the arguments associated with those names. So first we check to see if the current element (command[param_index]) is one of the words in the param_dict that I passed as an argument. If it is, we establish the variable curr_param and run yet another loop:

while (
    param_index < len(command)
    and command[param_index] not in command_dict["param_dict"]
    and command[param_index] not in PREPOSITIONS
):
    value += command[param_index] + " "
    param_index += 1

We have three conditions here:

  • param_index < len(command): by now we can recognize this as our bounds check. The loop will end if we run out of command elements.
  • command[param_index] not in command_dict["param_dict"]: We also want our loop to end if the element is part of our param_dict. We see that this loop is building a string of words separated by spaces that it eventually assigns as a value associated with a parameter in the param_dict. For example, we want our script to know that the value of “title” is “my new template” but also that the next element, “text”, is part of a different argument.
  • Finally, command[param_index] should not be part of PREPOSITIONS. And here, at the very bottom of this function, was the answer to my original question.

PREPOSITIONS = {"for", "in", "on", "with"}

PREPOSITIONS is a global variable in this script that defines four words that are meant to be used as helpers to make our commands easier to read. When we type the command “create template for company_id 3…”, the word for is more-or-less decoration — the condition above means it’s ignored when parsing our command.
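To make the three conditions concrete, here is a small sketch of just the value-collecting step. collect_value and PARAM_NAMES are hypothetical names used only for this example, and it builds the string with a list and join rather than the += used in the original, which gives the same result here.

```python
PREPOSITIONS = {"for", "in", "on", "with"}
PARAM_NAMES = {"company_id", "title", "text", "template_id"}


def collect_value(words, start):
    """Gather words from `start` until a param name, a preposition, or the end."""
    collected = []
    index = start
    while (
        index < len(words)
        and words[index] not in PARAM_NAMES
        and words[index] not in PREPOSITIONS
    ):
        collected.append(words[index])
        index += 1
    return " ".join(collected)


# Stops at the next param name ("text")
print(collect_value(["title", "my", "new", "template", "text", "hello"], 1))
# my new template

# Stops at a preposition ("on"), even though it sits in the middle of the value
print(collect_value(["text", "see", "you", "on", "monday"], 1))
# see you
```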

Hitting a word in PREPOSITIONS ends the inner loop, and whatever has been collected so far is stored in command_dict as the value of the current param. Control then returns to the outer for loop, which keeps iterating through the command list until it finds an element that is also a key in our param_dict. With “for” this is harmless: we aren’t collecting a value yet when we reach it, so the outer loop simply skips it and lands on the next element, “company_id”. And here’s param_dict to jog your memory:

params = {
    "company_id": None,
    "title": None,
    "text": None,
    "template_id": None,
}

This doesn’t work quite so well the second time, when we find the preposition with in our list:

args = ["create", "template", "for", "company_id", "3", "title", "my", "new", "template", "text", "here", "is", "a", "template", "with", "text"]

When we hit with, we were building a string to submit as the value associated with the text param in our param_dict. Since detecting the preposition breaks out of the inner loop, text was set to “here is a template” and then the outer loop resumed. If it continued iterating through the command list and never found another param name, nothing else would have been added and the text value would have remained “here is a template.” This is what happened in all of the bug cases that were reported to me. But actually, in this case, that’s not what would happen.

When the outer loop resumes, it walks through the remaining words and eventually reaches our final element: “text”, which it recognizes as a param name. We therefore meet the condition command[param_index] in command_dict["param_dict"] and reset the text param in our dictionary. But this time, since there aren’t any more elements in the list, it is just set to an empty string.
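The whole sequence can be reproduced with a stripped-down version of the param loop. This is a sketch under a few assumptions: parse_params is a hypothetical name, the method-matching step is skipped (we start at index 2), and a print call has been added so each assignment is visible.

```python
PREPOSITIONS = {"for", "in", "on", "with"}


def parse_params(command, param_dict, start):
    """Simplified copy of the param loop from parse_command, with tracing."""
    for param_index in range(start, len(command)):
        if command[param_index] in param_dict:
            curr_param = command[param_index]
            # Reassigning param_index only affects this pass;
            # range() still supplies the next index on the following pass.
            param_index += 1
            value = ""
            while (
                param_index < len(command)
                and command[param_index] not in param_dict
                and command[param_index] not in PREPOSITIONS
            ):
                value += command[param_index] + " "
                param_index += 1
            print(f"{curr_param!r} set to {value.strip()!r}")
            param_dict[curr_param] = value.strip()
    return param_dict


args = ("create template for company_id 3 title my new template "
        "text here is a template with text").split()
params = {"company_id": None, "title": None, "text": None, "template_id": None}

result = parse_params(args, params, start=2)
# 'company_id' set to '3'
# 'title' set to 'my new template'
# 'text' set to 'here is a template'
# 'text' set to ''
```

The last two lines of the trace are the bug in miniature: text is first cut short at “with”, then overwritten with an empty string when the final word of the message happens to match a param name.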

This turned out to be a classic case of solving a problem by stepping through the logic of a long script until we found the step that generated the unexpected behavior. The code I’ve shared here is only a small portion of what I was working with, but I didn’t have to dive as deeply into those other areas because I established that they were not responsible for parsing my strings (and also, in fairness, because I had help from a colleague). Python is not my strength, but I’ve ended up with a strong understanding of this code because I was able to focus, take a deep breath, and read through it one line at a time.
