What I Learned at Work this Week: .htm(l)
This week I picked up a ticket to create an email template for a client who was looking to send “you left this in your cart” reminders to their shoppers. It was fun to learn a new platform (we use Bee for our email templates), but the one thing that really stood out to me was the file extension I used to upload my templates. As you might expect, Bee exports templates as HTML files. But for some reason, I was asked to change the filename extension from .html to .htm.
The Filename Extension
A filename is…the name of a file. The extension generally comes after the last period in the filename and is commonly used to indicate the file’s format. For audio files, we might see extensions like .wav or .mp3; for documents, .doc or .pdf; and for images, .jpg or .gif. We as users treat these files differently: we know they contain different types of media, we expect them to be different sizes, and we know that certain programs only handle certain file types. My text editor can’t open a .jpg file.
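To make "the part after the period" concrete, here is a small sketch using Python's standard pathlib module. The helper name split_extension and the sample filenames are made up for illustration.

```python
from pathlib import PurePath


def split_extension(filename: str) -> tuple[str, str]:
    """Return (stem, extension); the extension is whatever follows the last period."""
    path = PurePath(filename)
    return path.stem, path.suffix


print(split_extension("voice-memo.mp3"))  # ('voice-memo', '.mp3')
print(split_extension("archive.tar.gz"))  # ('archive.tar', '.gz')
```

Note that only the final period counts: in archive.tar.gz, the extension is .gz.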
Image, video, and audio files aren’t restricted to just one filename extension each. Likewise, HTML files can have either a .html or .htm filename extension. So what’s the difference between the two?
.html vs .htm
Based on my reading, the primary difference is…personal preference! We shouldn’t be expecting different file sizes or compression rates if we choose .html or .htm. We want to choose one or the other and be consistent to prevent confusion and unexpected behavior, but I wonder what made my company choose .htm in the first place. So I did a bit more digging.
It turns out that filename extensions used to be a bit more important than they are today. As I’ve discussed in previous blog posts, one of the greatest advancements we as programmers enjoy today is an abundance of space and memory on our machines. This wasn’t such a luxury in the early days of DOS (disk operating system) and Microsoft Windows (think pre-95). To conserve a bit of space, filenames were truncated using the 8.3 filename convention.
The numbers 8 and 3 represent the allotted characters in the naming convention: 8 for the filename and 3 for the extension. Older Microsoft systems would convert user-created filenames to an 8.3 version by taking the first 6 characters of the name and appending a tilde and a number as a counter. 8.3 filenames are also case-insensitive and are stored in uppercase. So if my file was called OurFunTextFile.txt, the result would be OURFUN~1.TXT. Special characters are either ignored or converted to underscores. Spaces are ignored, as are periods (apart from the final one, which separates the extension).
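The basic conversion described above can be sketched in a few lines of Python. This is a rough illustration, not a faithful copy of what Windows does: the function name basic_short_name is made up, it always appends ~1 (a real system leaves names alone when they already fit the 8.3 pattern), and it assumes that anything other than letters, digits, underscores, and hyphens becomes an underscore, even though the real rules allow a few more punctuation characters.

```python
import re


def basic_short_name(long_name: str) -> str:
    """Sketch of the simplest 8.3 conversion: six characters, '~1', three-character extension."""
    # Split at the last period; earlier periods belong to the base name.
    base, dot, ext = long_name.rpartition(".")
    if not dot:  # no period at all, so there is no extension
        base, ext = long_name, ""
    # Spaces and leftover periods are dropped, and everything is uppercased.
    base = base.replace(" ", "").replace(".", "").upper()
    ext = ext.replace(" ", "").upper()
    # Assumption: every other character outside this small set becomes an underscore.
    base = re.sub(r"[^A-Z0-9_-]", "_", base)
    ext = re.sub(r"[^A-Z0-9_-]", "_", ext)
    short = base[:6] + "~1"
    return f"{short}.{ext[:3]}" if ext else short


print(basic_short_name("OurFunTextFile.txt"))  # OURFUN~1.TXT
print(basic_short_name("a+b notes.txt"))       # A_BNOT~1.TXT
print(basic_short_name("index.html"))          # INDEX~1.HTM
```

The last line hints at where this is going: a four-character extension like html gets cut down to three.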
The counter would increment if I had another file with a similar name. So OurFunTextFilePartTwo.txt becomes OURFUN~2.TXT. On Windows 95, 98 and ME, if a prefix was shared by more than 9 files, the last character before the tilde would be cut off to allow the counter to grow to two digits. Eventually, we’d start to see OURFU~10.TXT and so on. Starting with Windows 2000, the limit changed to 4 files with the same 6-character prefix. On the fifth, the convention would take the first two characters from the long filename, append four hexadecimal characters from a hash of the filename, and then finish with a tilde and a single digit. So OurFunTextFileFifthTime.txt might become OU01F2~1.TXT. Not only does this scale better, but it’s arguably more secure because it’s much more difficult to guess the pattern and recreate a shortened filename from outside the system.
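Here is a rough sketch of the Windows 95/98/ME counter behavior described above. The function name short_name_with_counter and the taken set (standing in for the short names already present in a folder) are made up for this example. It skips the special-character handling, and it doesn't attempt the hash-based scheme from Windows 2000 onward, since the hash itself isn't something I can reproduce here.

```python
def short_name_with_counter(long_name: str, taken: set[str]) -> str:
    """Pick the first free NAME~N.EXT, trimming NAME so that NAME + '~N' fits in 8 characters."""
    base, dot, ext = long_name.rpartition(".")
    if not dot:  # no period, so no extension
        base, ext = long_name, ""
    base = base.replace(" ", "").replace(".", "").upper()
    ext = ext.upper()[:3]
    counter = 1
    while True:
        tail = f"~{counter}"
        # Keep at most six characters, and fewer once the counter needs more digits.
        prefix = base[: max(0, min(6, 8 - len(tail)))]
        candidate = f"{prefix}{tail}.{ext}" if ext else f"{prefix}{tail}"
        if candidate not in taken:
            return candidate
        counter += 1


print(short_name_with_counter("OurFunTextFilePartTwo.txt", {"OURFUN~1.TXT"}))  # OURFUN~2.TXT

nine_taken = {f"OURFUN~{n}.TXT" for n in range(1, 10)}
print(short_name_with_counter("OurFunTextFileAgain.txt", nine_taken))  # OURFU~10.TXT
```

The second call shows the prefix shrinking from OURFUN to OURFU once the counter reaches two digits.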
The key part here is the .3: in this convention, extensions can only have 3 characters. To follow the convention, we can’t use .docx, .jpeg, or .html. Since .htm isn’t functionally different, it’s an ideal substitute.
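For what it's worth, the swap itself is a one-liner. The sketch below uses Python's pathlib to change the extension and the standard mimetypes module to look up what type each extension maps to. The helper name to_htm and the filename cart-reminder.html are made up for illustration; this is not the actual process my company uses.

```python
import mimetypes
from pathlib import PurePath


def to_htm(filename: str) -> str:
    """Swap a .html extension for .htm; leave every other filename alone."""
    path = PurePath(filename)
    if path.suffix.lower() == ".html":
        return str(path.with_suffix(".htm"))
    return filename


print(to_htm("cart-reminder.html"))  # cart-reminder.htm

# Look up the media type Python associates with each extension.
print(mimetypes.guess_type("cart-reminder.htm")[0])
print(mimetypes.guess_type("cart-reminder.html")[0])
```

With Python's built-in mapping, both lookups should report the same HTML media type, which lines up with the idea that the two extensions aren't functionally different.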
How was this relevant to my case?
This convention has a lot to do with 20+-year-old operating systems, which are not at all relevant to the work I do on a daily basis. So why did I have to change my filename extensions to .htm in the first place? I actually couldn’t find any obvious explanation. I learned that certain microcontrollers may use the 8.3 filename convention, but it wasn’t clear that I was working with any of those.
Ultimately, my guess is that this is driven by the convention of consistency: at some point, my company decided to use .htm files for this practice and our code or a vendor’s code likely searches for or reads .htm files specifically when executing a task related to the email templates. What’s great is that even though I didn’t find the answer I was looking for, I am now well equipped to understand the answer if I ask around the office. I’ll be sure to update this post if I learn more. The building blocks of knowledge can be used to reach a multitude of destinations!
- Stack Overflow pointed me in some helpful directions. Check out the threads here and here.
- A nifty Wikipedia explanation for 8.3 filename conversion.
- Nostalgia Nerd: Why does DOS use 8.3 Filenames?
- Arduino documentation on a microcontroller using 8.3 formatting for filenames.