SQL is one of the few frameworks where I feel confident in the code I’m writing. Whether it’s the wrong JOIN type, a GROUP BY column not being aggregated, or even an extra comma, I can usually figure out what I need to change to make my query work. But most of my SQL work has been in querying tables rather than creating or writing to them. This week, I was reminded that there’s still quite a bit for me to learn in the world of SQL.
My Table
I wanted to move data from a Postgres DB to a MySQL DB to better reflect domain ownership (this wasn’t a reflection of a preference between Postgres and MySQL, it just so happened that one team owned a Postgres DB and the other a MySQL DB). The first cool thing I learned, thanks to a teammate, is that the IntelliJ IDE can generate SQL for table creation:
Thanks to this, I already had a ready-made CREATE TABLE command. Here’s a simplified version:
create table daily_reports
(
daily_report_id serial
primary key,
company_id integer not null,
active boolean not null,
public_key_fingerprint varchar(255) default NULL::character varying
);
The real table had a lot more columns than this, but this is all we need to illustrate what I learned this week. We’ll start at the end and work backward.
Postgres to MySQL
I was so excited about the IntelliJ feature that I forgot to check if it was actually valid syntax. Believe it or not, the SQL used to generate this Postgres table didn’t fit seamlessly into MySQL. The main obstacle I came across was:
default NULL::character varying
This is Postgres casting unknown NULL values to varchar, but the ::
syntax doesn’t work in MySQL. I had to remove that part and just make the default value NULL for the code to run successfully. I may be missing part of the functionality here, but it seems redundant anyway since we’ve previously stated that this column’s value is varchar(255)
.
Unsigned Integers
I had a dilligent teammate who closely reviewed my SQL before approving the migration. He pointed out that the company_id
column could be an unsigned integer:
company_id integer unsigned not null,
I linked a piece of IBM documentation, but like most explanations of this concept, I found it a bit confusing. The main takeaway is that “unsigned” means there’s no indication of whether the integer is negative or positive. It’s assumed positive.
This works perfectly for something like an ID, since we should assume that value will always be > 0. By making the value unsigned, we can repurpose the bit that would represent the sign into the ID permutations themselves. In other words, the largest possible signed value is 2,147,483,647, but the highest possible unsigned value is 4,294,967,295, because we’re ignoring all the bit permutations that represent negative numbers. I doubt we’ll ever want to create more than 2 billion scheduled reports, but hey, you never know.
Serial Primary Key Bug
My teammate also pointed out that actually running daily_report_id serial primary key
creates two indexes:
It turns out this is a known bug in MySQL. As that StackExchange post explains, SERIAL is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE
. These terms mean about what you’d expect, but for reference:
- BIGINT is an integer that uses up to 8 bytes of storage. Meaning it can represent 2⁶³ value combinations.
- UNSIGNED scroll up if you want to read the explanation again.
- NOT NULL means this will always have an integer value
- AUTO_INCREMENT means the integer is going to increase in sequence. This is something we’d want for a primary key, as you can imagine.
- UNIQUE means no two values can be the same, which fits because this is an ID.
These are all qualities we’d want for a primary key, but what I didn’t know, or didn’t intend, was to also actually create an index through use of SERIAL. I’d prefer to leave that up to PRIMARY KEY
.
The post explained a workaround that also worked for me. If SERIAL is shorthand for a longer series of commands, but also has an unintended consequence…why not just go the long way? By typing out BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE
, I was able to avoid the extra index.
Default Collation
The last thing in my PR review was a note about MySQL’s collation. My teammate explained to me that we might want to check a DB’s collation rules to make sure any tables we add will behave as expected. In the linked blog post, Lukas Vileikis explains that “collations are sets of rules for comparing characters in a character set.” The collation setting will determine how different characters are read.
In our case, our DB was using the default collation utf8mb4_unicode_ci
. So the characters will be interpreted in UTF-8, unicode, and “ci,” which stands for “case insensitive.” I was warned that this means a row with public_key_fingerprint
of abcd555
would be read as the same as AbCD555
(these are just examples, not valid public key fingerprints), but I don’t foresee that impacting production, so I decided to leave it alone.
The Little Things
In my posts, I often focus on little bits of information like what we reviewed today. In my experience, it’s much easier than trying to understand grand concepts or frameworks and, since I try to fit my writing within a few hours on the weekend, makes it more likely that I’ll actually finish. My *hope* is that at some point, I’ll understand enough small things to make bigger concepts easier. But even if that doesn’t happen, I’m still learning and that’s always a good thing.