What I Learned at Work This Week: Base64 Encoding

Another week as a Solutions Engineer, another term I didn’t understand. While working on some new tickets, I saw base64 written a few times in our codebase and documentation. I was able to infer that it was changing the format of some data to encode it, but I wasn’t sure why or how…which made it the perfect subject for this week’s blog!

What and Why?

<img src="…" />

At work, I was exposed to base64 encoding because Fivetran’s REST API requires API keys and secrets to be base64 encoded in requests. I can’t say for certain why they have this requirement, since base64 is trivially decoded and is an encoding rather than a form of encryption. What may also stand out is that API keys and secrets are generally character strings. So if base64 encoding converts binary data to text, how would it work…on a string?
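Here is a minimal sketch of what that requirement might look like in practice. I'm assuming the API follows the standard HTTP Basic scheme, where the key and secret are joined with a colon before encoding; the function name and the credentials below are hypothetical placeholders, not anything from Fivetran's documentation. It also hints at the answer to the question above: the string is turned into bytes first, and those bytes are what get encoded.

```python
import base64


def basic_auth_header(api_key: str, api_secret: str) -> str:
    """Build an HTTP Basic Authorization header value (hypothetical helper)."""
    raw = f"{api_key}:{api_secret}".encode("utf-8")  # string -> bytes
    token = base64.b64encode(raw).decode("ascii")    # bytes -> base64 text
    return f"Basic {token}"


# Placeholder credentials for illustration only.
header = basic_auth_header("my_key", "my_secret")
print(header)
```

Anyone who sees the header can reverse it with `base64.b64decode`, which is why this step is about formatting and not about secrecy.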

Base64 Encoding of a String

In this image, we can see that the binary values of all the characters have 6 digits (6 bits), which allows for 2^6 = 64 distinct values. This is the key to base64 encoding, because the data we’re translating from is stored as 8-bit bytes (octets), the standard unit of digital storage and communication.

I’m still not sure how commonly base64 encoding is meant to be used on a string, but my real-life client needed it for their real-life API, so I figure it’s worth exploring. Let’s start with a simple string:

Ahoy

To start, let’s plug this into a base64 encoder to see what we should expect to end up with:

Ahoy => QWhveQ==
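If you don't have an online encoder handy, Python's standard `base64` module does the same job. This is just a quick check of the expected result, assuming the string is plain ASCII:

```python
import base64

# b64encode works on bytes, so the string is converted first.
encoded = base64.b64encode("Ahoy".encode("ascii"))
print(encoded.decode("ascii"))  # QWhveQ==

# Decoding reverses the process and returns the original text.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)  # Ahoy
```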

We know that base64 encoding works with binary data, so we’ll first have to translate these ASCII characters into their 8-bit binary values. The letters in our string end up looking like this:

A => 01000001
h => 01101000
o => 01101111
y => 01111001

We’ve converted our string of characters to binary and here it is:

01000001 | 01101000 | 01101111 | 01111001
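The same translation can be sketched in a few lines of Python. The helper name `to_octets` is my own invention for this example, and it assumes the input is pure ASCII:

```python
def to_octets(text: str) -> list[str]:
    """Return each ASCII character of text as an 8-bit binary string."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


octets = to_octets("Ahoy")
print(" | ".join(octets))  # 01000001 | 01101000 | 01101111 | 01111001
```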

We’ve got a 32-bit input that was originally made up of octets. To convert it to base64, we’ll ignore the octet boundaries and read the same bits six at a time rather than eight. Let’s consult our base64 encoding table to translate the first sextet:

010000 => Q

And we’ll continue along until…

010000 | 010110 | 100001 | 101111 | 011110 | 01
Q W h v e ?
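To see the regrouping in code, here is a rough sketch that joins the bits, slices them into groups of six, and looks up each complete group in the standard base64 alphabet (A-Z, a-z, 0-9, + and /). The variable names are mine, and the leftover group is deliberately left untranslated:

```python
import string

# The standard base64 alphabet: index 0 is 'A', index 63 is '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Join the octets of "Ahoy" into one 32-bit string.
bits = "".join(format(byte, "08b") for byte in "Ahoy".encode("ascii"))

# Slice into groups of six; the final group may be shorter.
sextets = [bits[i:i + 6] for i in range(0, len(bits), 6)]
print(" | ".join(sextets))  # 010000 | 010110 | 100001 | 101111 | 011110 | 01

# Translate only the complete sextets for now.
translated = "".join(ALPHABET[int(s, 2)] for s in sextets if len(s) == 6)
print(translated)  # QWhve
```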

Oh no! We ran out of complete sextets because we started with 32 bits, which is not evenly divisible by 6. You probably noticed that our base64 output’s last three characters were a Q and two = symbols. When there are leftover bits, the encoder fills out the rest of the sextet with 0s, so in trying to translate 01, it actually translates 010000, which is Q. So what’s up with the =?

Base64 uses the symbol = as a padding character to signify that the 0s used to fill out the code did not come from the original string. Because the input is always a whole number of octets, the leftover will always be 0, 2, or 4 bits, so we’ll never need more than 4 placeholder 0s, and each = symbol represents one pair of 0s. That’s why we see two at the end of our output: we needed two pairs of 0s to fill out a sextet. To illustrate this principle further, let’s amend our input by adding a character:

Ahoy!

By adding a character, our binary translation now contains 40 bits. 36 of those bits fit neatly into sextets, so we know that our remainder of 4 will need one set of 0s added before translation. That’s represented by one = symbol.
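The arithmetic behind the padding can be written down as a small sketch. The function name `padding_for` is hypothetical; it just counts leftover bits for an input of a given number of bytes and turns them into a number of = symbols, one per pair of 0s:

```python
import base64


def padding_for(num_bytes: int) -> tuple[int, int]:
    """Return (zero bits added, '=' symbols appended) for num_bytes of input."""
    leftover_bits = (num_bytes * 8) % 6   # always 0, 2, or 4
    zero_bits = (6 - leftover_bits) % 6   # 0, 4, or 2 respectively
    return zero_bits, zero_bits // 2      # one '=' per pair of zeros


print(padding_for(4))  # (4, 2) -> "Ahoy" needs two '='
print(padding_for(5))  # (2, 1) -> "Ahoy!" needs one '='
print(padding_for(6))  # (0, 0) -> six bytes split evenly into sextets

# Cross-check against the standard library for a few input lengths.
for n in range(1, 10):
    assert base64.b64encode(b"a" * n).count(b"=") == padding_for(n)[1]
```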

String:  Ahoy!
Octets:  01000001 | 01101000 | 01101111 | 01111001 | 00100001
Sextets: 010000 | 010110 | 100001 | 101111 | 011110 | 010010 | 0001
Base64:  Q W h v e S E=
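Putting all of the steps together, here is a hand-rolled encoder that follows the walkthrough: octets to bits, zero-fill the last sextet, look up each sextet, then append the = padding. It is a teaching sketch and not how you'd encode in real code (the built-in `base64` module is the right tool for that), and `manual_b64encode` is a name I made up for it.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def manual_b64encode(data: bytes) -> str:
    """Base64-encode data step by step, mirroring the walkthrough."""
    # 1. Write every byte as an octet and join them into one bit string.
    bits = "".join(format(byte, "08b") for byte in data)
    # 2. Fill out the final sextet with 0s if the bits don't divide by 6.
    pad_zeros = (6 - len(bits) % 6) % 6
    bits += "0" * pad_zeros
    # 3. Translate each sextet using the base64 alphabet.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # 4. Append one '=' for each pair of 0s that was added.
    return "".join(chars) + "=" * (pad_zeros // 2)


for text in ("Ahoy", "Ahoy!"):
    result = manual_b64encode(text.encode("ascii"))
    # Compare with the standard library to confirm the steps are right.
    assert result == base64.b64encode(text.encode("ascii")).decode("ascii")
    print(text, "=>", result)
```

Running it prints `Ahoy => QWhveQ==` and `Ahoy! => QWhveSE=`, matching the tables above.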

More Uses
