What I Learned at Work this Week: Conditional Logic for Empty Files

Mike Diaz
6 min readNov 27, 2021
Photo by Karolina Grabowska from Pexels

Loyal readers know that I have been working off-and-on with a Python script that generates a CSV which is meant to be shared with a user via Slack. This week, my coworker and I were able to successfully get such a file into Slack, meaning we were almost done with the project. We began testing more thoroughly and in an environment more similar to what we’d use in production.

On Friday, we found what we’re hoping is one of the last bugs we encounter on this journey: our script doesn’t perform well if it gets no results when pulling data. In our case, this happened because we were testing on a development server, meaning most queries we tried to make didn’t return any results. The code runs and creates an empty file, but the Slack API doesn’t post that file to the channel. Instead, we get no response, as if our code had never run at all.

We want our users to know if their query returned no results, rather than questioning whether the script was taking extra time to run or if they had used an incorrect command. That led us to try and learn how to write conditional logic on the basis that a file has no content, or is empty.

A sample script

I found a few different methods for checking on file size or content online and wanted to see if I could implement them myself. I wrote a simplified script that imitates the logic of my professional work so that I could more easily share it with all of you. Feel free to try it out at home:

import csvfetchall_with_content = [(1, 'row 1', True), (2, 'row 2', False)]
fetchall_without_content = []
def generate_file_from_list(list_data):
with open('output.csv', 'w+') as output:
csv_writer = csv.writer(output)
for row in list_data:
id = row[0]
name = row[1]
boolean_result = row[2]
csv_writer.writerow([id, name, boolean_result])
generate_file_from_list(fetchall_with_content)
# generate_file_from_list(fetchall_without_content)
print('file generated')

At work, our script uses PostgresConnection to query a database and then makes the data into an iterable data object with cursor.fetchall:

def get_join_data(company_domain):
with PostgresConnection(
environment=ENVIRONMENT,
mode='w',
dict_cursor=True,
s3_key='signup_properties.json',
) as pg_connection, pg_connection.cursor() as cursor:
cursor.execute(
JOIN_QUERY,
{'attentive_domain': company_domain})
return cursor.fetchall()

I wanted to skip the query step and just return something similar to what I’d get from this function, so I checked on what cursor.fetchall would return. According to PYnative (added emphasis mine):

cursor.fetchall() fetches all the rows of a query result. It returns all the rows as a list of tuples. An empty list is returned if there is no record to fetch.

As you can see in my code, I created two possible results of fetchall. One was a list of tuples:

fetchall_with_content = [(1, 'row 1', True), (2, 'row 2', False)]

And the other was an empty list, which is what I was seeing in my development environment:

fetchall_without_content = []

Next I defined a function that would accept one of these lists and iterate through the elements in that list, adding them as values to a file it creates called output.csv. I wrote two rows that invoke the function and commented out whichever one I didn’t want to test at the moment. Finally, I added a print statement to indicate that my logic had run without issue.

Running this script generates output.csv in the same directory of the script itself, so I can view the results in my IDE or any other software that can read CSVs:

so far, so good

os.path.getsize

I found three distinct methods of checking filesize. The first two make use of the os module, which brings functionality from our operating system into our Python script. From there, we can access the path module, which gives us methods that can read information about files based on their pathnames. Getsize works just how you’d expect it to:

generate_file_from_list(fetchall_with_content)
filesize = os.path.getsize('output.csv')
print(f'The filesize is {filesize}')
# The filesize is 29

If the file is empty, os.path.getsize(‘output.csv’) returns 0, so I can update my logic to say:

generate_file_from_list(fetchall_without_content)
filesize = os.path.getsize('output.csv')
if filesize == 0:
print('The report produced no data')
else:
print('The filesize is data has been saved')

os.stat

While os.path will give us access to a series of methods for our file path, os.stat accepts a path and returns an object with relevant properties that describe the file. Our logic will look almost exactly the same — we just have to change the definition of filesize:

filesize = os.stat('output.csv').st_size

I was surprised that this worked so well because I got an error when I tried to implement it at work. I realized that I had been using a different syntax before, which looked something like this:

with open('output.csv', 'w+') as output:
filesize = os.stat(output).st_size
if filesize == 0:
print('The report produced no data')
else:
print('The data has been saved')

And indeed that generated the error I had seen previously:

success! (in finding the error)

Now that I’ve done some research into the function here, I recognize that it’s not happy receiving the file data type that open() returns. It requires a path, which is why it worked when we passed it a string earlier.

Reading the file’s first character

Before I knew about any built-in Python functionality, my first thought was to simply check the contents of the file I had created. Empty files are still truthy, but if there are no contents, then the first element will be falsey. I can use read to pull that data:

with open('output.csv', 'r') as output:
first_char = output.read(1)
print(first_char)
if not first_char:
print('The report produced no data')
else:
print('The data has been saved')

One thing that’s important to note is that I have to use r with my open function, whereas before using w+ was okay. When I generate a file from fetchall_with_content, first_char is set as 1 (because that’s the very first value in output.csv) and my else is executed. If I use fetchall_without_content instead, first_char is an empty string, which is falsey and returns causes the logic to return “The report produced no data.”

An alternate (better?) solution

As I was researching for this blog, it occurred to me that my methodology might not be the best way to solve this problem. My goal was to write conditional logic based on a Python-created file being empty, but the file is finalized as one of the final steps in my script. I think it probably makes more sense to short-circuit the logic immediately after querying the database if those queries don’t return any data. Here’s how I can apply that idea in my professional script:

join_data = get_join_data(company_domain)
click_data = get_click_data(company_domain)
if not join_data and not click_data:
return utils.create_response('NO CONTENT')
else:
create_and_move_file(counter, join_data, click_data, link_type)
return utils.create_response('SEND FILE TO SLACK')

get_join_data and get_click_data are the functions that query the database. Once those have run, we can check the results to see if they are empty lists or not. An empty list is falsey in Python, so we can use:

if not join_data and not click_data:

The alternative was to allow create_and_move_file to run, which will produce a CSV that can be checked for size or contents. The upside of this solution is that it stops the code as soon as we know we won’t have any data for our report. The downside is that it creates a previously undefined path for our logic to take. We see two possible return values here: SEND FILE TO SLACK and NO CONTENT. These will be handled by another file, but right now that file doesn’t know what to do with NO CONTENT. As always, we’ll have to consider a series of factors when deciding how to proceed: what’s easiest to read, what’s most efficient, what’s most iterable or sustainable. Likewise, as always, it’s helpful to have the knowledge to offer multiple solutions and the ability to implement them based on feedback from your team. There’s rarely one single correct answer!

Sources

--

--