What I Learned at Work this Week: GPG, PGP, Encryption

The title is a little misleading this week since…I haven’t actually learned what I want to know about encryption yet. This is extra unusual because I work with PGP as part of my job. When I get a request to set up file encryption for a client, I re-read documentation that I always have trouble following, try my best to make it work, usually fail to do that, and have to consult a more seasoned engineer. This is not an ideal workflow, so I’m spending some time this weekend trying to better understand the language and logic around encryption.

The acronyms GPG and PGP have consistently been a source of confusion for me because they’re so similar and frequently show up in the same domain. I had previously used them interchangeably, not realizing that they mean two different things:

PGP stands for (if you can believe it) Pretty Good Privacy. In short, it’s an encryption program that handles signing, encryption, and decryption. Signing means attaching a digital signature to our data (document, email, disk partition) that verifies its authenticity (it was sent by the right person) and integrity (it wasn’t altered in transit).

PGP can use public and private keys for encryption. Encrypting with a public key is known as asymmetric encryption because, though data can be encrypted by anyone (since the encryption key is public), only one party holds the matching decryption key, which is private. Symmetric encryption involves the sender and the receiver both holding the same secret key, also known as a shared secret, which is used to encrypt and to decrypt. It’s not uncommon for these methods to be used together, whereby the shared secret needed to decrypt a piece of data is itself encrypted asymmetrically before being sent from one party to another.
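To convince myself of how the two methods fit together, here is a minimal sketch using only Python's standard library. Everything in it is a toy assumption of mine, not what PGP actually runs: the key pair is textbook RSA built from two tiny primes, and the symmetric step is a simple XOR against a SHA-256 keystream. It is not secure, and the names (`hybrid_encrypt`, `hybrid_decrypt`, `xor_with_keystream`) are made up for illustration.

```python
import hashlib
import secrets

# Toy RSA key pair from tiny primes (61 and 53). For illustration only.
MODULUS = 61 * 53        # 3233, part of both the public and the private key
PUBLIC_EXPONENT = 17     # anyone may know this
PRIVATE_EXPONENT = 2753  # only the receiver knows this; (17 * 2753) % 3120 == 1


def xor_with_keystream(data: bytes, shared_secret: int) -> bytes:
    """Symmetric step: the same call with the same secret encrypts and decrypts."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        block = hashlib.sha256(f"{shared_secret}:{counter}".encode()).digest()
        keystream.extend(block)
        counter += 1
    return bytes(d ^ k for d, k in zip(data, keystream))


def hybrid_encrypt(message: bytes) -> tuple[int, bytes]:
    """Sender side: needs only the public half of the key pair."""
    shared_secret = secrets.randbelow(MODULUS - 2) + 2
    wrapped_secret = pow(shared_secret, PUBLIC_EXPONENT, MODULUS)  # asymmetric
    ciphertext = xor_with_keystream(message, shared_secret)        # symmetric
    return wrapped_secret, ciphertext


def hybrid_decrypt(wrapped_secret: int, ciphertext: bytes) -> bytes:
    """Receiver side: the private exponent recovers the shared secret."""
    shared_secret = pow(wrapped_secret, PRIVATE_EXPONENT, MODULUS)
    return xor_with_keystream(ciphertext, shared_secret)


wrapped, encrypted = hybrid_encrypt(b"quarterly report")
print(hybrid_decrypt(wrapped, encrypted))
```

The sender never touches the private exponent, and the bulk of the data only ever goes through the fast symmetric step, which is the reason the two methods get combined.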

I recognize that I might not ever understand how encryption actually works, but this research helped me decode some of the terms I had seen around it. For example, I learned that a public key fingerprint is not the same thing as a public key. The fingerprint is a short hash computed from the key, so it identifies the key while being much easier to communicate, share, and compare.
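As I understand the OpenPGP specification (RFC 4880), a version 4 fingerprint is the SHA-1 hash of the byte 0x99, a two-byte length, and the public key packet itself. The sketch below follows that recipe, but the "key packet" is placeholder bytes I made up, so the resulting fingerprint does not belong to any real key. The function name `v4_fingerprint` is also my own.

```python
import hashlib


def v4_fingerprint(public_key_packet_body: bytes) -> str:
    """SHA-1 over 0x99, a two-byte big-endian length, then the key packet body."""
    header = b"\x99" + len(public_key_packet_body).to_bytes(2, "big")
    return hashlib.sha1(header + public_key_packet_body).hexdigest().upper()


# Placeholder bytes, NOT a real key packet. A real one holds a version,
# a creation time, an algorithm identifier, and the key material.
fake_packet_body = bytes(range(256)) * 2
fingerprint = v4_fingerprint(fake_packet_body)

print(f"{len(fake_packet_body)} bytes of key data -> {len(fingerprint)} hex characters")
print(fingerprint)
```

However long the key is, the fingerprint stays at 40 hex characters, which is what makes it practical to read aloud or paste into a ticket.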

GnuPG, often abbreviated as GPG, stands for GNU Privacy Guard. It’s a free, open-source program that implements OpenPGP, the open standard that grew out of PGP, and libraries built on top of it let us encrypt data in our programs. So while PGP is the original program and the name behind the standard, GPG is one specific implementation of that standard.

Now that I understood encryption a little better, I took another look at my codebase to see how the logic flowed. I saw this on the second-to-last line of the Python script:

gpg = gnupg.GPG(gnupghome=args.gpg_keyring_path)

We assign to a variable called gpg an instance of the GPG class from the gnupg module, which appears to be the python-gnupg library, a Python wrapper around the GnuPG program. We pass a keyword argument called gnupghome, which in this case is set to a location passed as an argument when the script is invoked. That argument contains the path to GnuPG’s home directory, where it keeps its keyrings and configuration.
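For context, here is a rough sketch of how a value like args.gpg_keyring_path could reach the script. I'm assuming the script uses argparse and that the flag is spelled `--gpg-keyring-path`; neither is confirmed by the code shown here, and the path is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Encrypt a file for a client")
    # Assumed flag name. argparse turns the dashes into underscores,
    # so the value lands on args.gpg_keyring_path.
    parser.add_argument(
        "--gpg-keyring-path",
        required=True,
        help="GnuPG home directory holding the keyrings",
    )
    return parser


# Simulate invoking the script with a placeholder path.
args = build_parser().parse_args(["--gpg-keyring-path", "/path/to/gnupg-home"])
print(args.gpg_keyring_path)

# With python-gnupg installed, this value would then be handed over as:
#     gpg = gnupg.GPG(gnupghome=args.gpg_keyring_path)
```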

The GPG() constructor returns a gpg object, which is used in a function called encrypt_file, so it definitely seems like we’re on the right track!

def encrypt_file(public_key_fingerprint, input_file, output_file):
    input_file.seek(0)
    text = ''.join(input_file.readlines())
    text = text.encode('utf8')
    result = gpg.encrypt(text, [public_key_fingerprint, OUR_OWN_PGP_PUBLIC])

    if not result.ok:
        raise ValueError(f'Could not encrypt file {result.stderr}')
    else:
        output_file.write(str(result))
        output_file.flush()

This is a simplified version of the function for illustration purposes. The first three lines read the contents of the file we want to encrypt into a string and encode it with UTF-8. So our variable text is a bytes object containing the encoded (but not encrypted) contents of the file in question.

Once we have the text, we run it through the gpg.encrypt method. The first argument is the data we want to encrypt and the second argument is a list of recipients. More accurately, it’s a list of encryption key fingerprints, one for each potential receiver of our encrypted text. In this case, we’re encrypting for the key fingerprint provided by the client and also for my own company’s public key, so either party can decrypt the result. As I understand it, the data itself is only encrypted once with a shared secret, and that secret is then encrypted separately for each recipient. This argument only has to be a list if we’re providing multiple encryption key fingerprints. If we only wanted to use one, we could just pass in that string.
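To show the two ways of passing recipients side by side, here is a small sketch. The `encrypt_for` helper and the `FakeGPG` class are both hypothetical: the helper is only there to make the single-string case explicit, and the fake object stands in for a real gnupg.GPG instance so the example runs without GnuPG installed. The fingerprints are placeholders.

```python
def encrypt_for(gpg, data, recipients):
    """Hypothetical helper: accept one fingerprint or several."""
    if isinstance(recipients, str):
        recipients = [recipients]
    return gpg.encrypt(data, list(recipients))


class FakeGPG:
    """Stand-in that records each call instead of encrypting anything."""

    def __init__(self):
        self.calls = []

    def encrypt(self, data, recipients):
        self.calls.append((data, recipients))
        return "fake ciphertext"  # a real gnupg.GPG returns a result object


fake_gpg = FakeGPG()
encrypt_for(fake_gpg, b"hello", "CLIENT_FINGERPRINT")
encrypt_for(fake_gpg, b"hello", ["CLIENT_FINGERPRINT", "OUR_FINGERPRINT"])

for data, recipients in fake_gpg.calls:
    print(recipients)
```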

The encrypt method returns a result object that has an ok attribute. If the encryption fails, ok is False and the status attribute describes what went wrong (in this case we use the stderr attribute to get the raw details of the error). In our code, a successful encryption results in writing to the output file and then calling the file object’s flush method. This pushes whatever is sitting in Python’s internal write buffer out to the operating system. For efficiency, the write method doesn’t necessarily update the file right away, but collects data in a buffer first. By emptying that buffer, flush makes sure the file has been updated and can be reliably read by other parts of the code.
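The buffering is easy to observe with a quick experiment. This sketch writes a short placeholder string to a temporary file and checks the file's size on disk before and after flush. The file name and helper name are my own, and the exact size before the flush depends on Python's buffering, though for a write this small I'd expect it to still be zero.

```python
import os
import tempfile


def write_and_measure(payload: str) -> tuple[int, int]:
    """Return the file's size on disk before and after flush() for one write."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "encrypted_output.asc")
        with open(path, "w", encoding="utf8") as output_file:
            output_file.write(payload)
            size_before = os.path.getsize(path)  # data may still be in the buffer
            output_file.flush()
            size_after = os.path.getsize(path)   # buffer handed to the OS
    return size_before, size_after


before, after = write_and_measure("placeholder ciphertext")
print(f"before flush: {before} bytes, after flush: {after} bytes")
```

Closing the file flushes it too, so the explicit call mostly matters when the file stays open and something else needs to read it in the meantime.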

Encryption is a hugely complex subject that requires a lot of background to understand. But just because we can’t understand every piece of a concept doesn’t mean we shouldn’t try to chip away at it. I’ll probably never be able to write a useful encryption function, but now I’m in a much better position to use encryption in future Python scripts and, as a bonus, I learned a few handy new methods too!
