What I Learned at Work this Week: Using protobuf in Java

Mike Diaz
6 min read · Jul 30, 2023

True, I am using APIs. Photo by ThisIsEngineering: https://www.pexels.com/photo/woman-writing-on-whiteboard-3861943/

Okay the title is misleading for a couple of reasons:

  1. I didn’t really learn about protobuf this week. I started using it weeks ago but never felt confident enough to write a post about it.
  2. Also I haven’t really learned about it because I am *struggling*.

This isn’t too different from a lot of my posts. What I’m about to share might be useful to someone looking for a beginner’s understanding of Protocol Buffers, but it’s definitely going to be helpful to me, as I try to organize my thoughts around the concept and fill in some of the blanks that come up when you learn something for the first time. Let’s see how I implemented this framework.

Protocol Buffers

Some of the code we see here might seem familiar, but let’s start from the beginning just to be safe. Protocol Buffers are a language-neutral, platform-neutral, extensible mechanism for serializing structured data. Serializing data means taking an in-memory object and turning it into a sequence of characters or bytes that can be stored or sent over a network, like a JSON string. We frequently see serialization when we have to share data between disparate platforms, like when we make an API call.

But Protocol Buffers have advantages over JSON: we specify the format up front in a schema, and because the data is serialized in a compact binary encoding, the result is usually smaller than the equivalent JSON or XML. In my case, I have to use Protocol Buffers to communicate between front end and back end. The work I’ve done so far has been sending data *to* the front end, so it’s written in Java. If seeing some of that code would be helpful, read on.
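
To make the size difference concrete, here’s a minimal sketch that hand-encodes two fields the way the protobuf wire format lays them out (a tag, then the value) and compares the result to a JSON string carrying the same data. WireSizeSketch, its method names, and the sample values are all made up for illustration. Real code would use the classes generated from the .proto file rather than writing bytes by hand.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: a hand-rolled encoder for one string field and one enum field.
class WireSizeSketch {

    // Writes a non-negative int as a base-128 varint: 7 bits per byte, lowest bits
    // first, with the high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes field 1 as a length-delimited string and field 3 as a varint enum number.
    static byte[] encode(String etlConfigId, int enabledNumber) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] id = etlConfigId.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (1 << 3) | 2); // tag: field number 1, wire type 2 (length-delimited)
        writeVarint(out, id.length);    // length prefix
        out.write(id, 0, id.length);    // the UTF-8 bytes themselves
        writeVarint(out, (3 << 3) | 0); // tag: field number 3, wire type 0 (varint)
        writeVarint(out, enabledNumber);
        return out.toByteArray();
    }

    // The same data as a hand-written JSON string, for comparison.
    static String toJson(String etlConfigId, String enabledName) {
        return "{\"etl_config_id\":\"" + etlConfigId + "\",\"enabled\":\"" + enabledName + "\"}";
    }

    public static void main(String[] args) {
        byte[] binary = encode("1234", 2);
        String json = toJson("1234", "DISABLED");
        System.out.println("protobuf-style bytes: " + binary.length);
        System.out.println("JSON bytes: " + json.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The field names never appear in the binary version, only the field numbers, which is a big part of why it comes out smaller.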

My Proto File

To specify our format, we must write a .proto file and then, if we’re using Java, translate that structure to a class. As loyal readers know, I’m always working with an ETL script (extract, transform, load…it’s basically data reporting). Various clients want various customizations for their reports, so they each get their own config row in the DB. If we’re going to read from the DB and display that data on the front end, we’ll want to use Protocol Buffers to make sure that the serialized config data we send is always in the same shape. So here’s a small, simplified version of what I’ve got for my EtlConfig.proto file:

syntax = "proto3";
option java_outer_classname = "EtlConfigProtos";

// RPC interface. The request/response messages and the import that defines
// the auth option are omitted from this simplified version.
service EtlReportService {
  option (auth.service) = {
    requires_authentication: true
  };
  rpc ListEtlConfigs(ListEtlConfigsRequest) returns (ListEtlConfigsResponse);
}

enum EtlConfigFeature {
  ETL_CONFIG_FIRST_FEATURE = 0;
  ETL_CONFIG_SECOND_FEATURE = 1;
}

enum EtlEnabled {
  ENABLED = 0;
  CUSTOM_SCHEDULE = 1;
  DISABLED = 2;
}

message EtlConfig {
  string etl_config_id = 1;
  repeated EtlConfigFeature features = 2;
  optional EtlEnabled enabled = 3;
}

We’re writing in proto3, and the Java class generated from this file will be called EtlConfigProtos. Below that, we define a service. In a .proto file, a service describes an RPC (remote procedure call) interface: a set of methods that each take a request message and return a response message. Ours has one method, ListEtlConfigs. On the back end, the implementation of that method reads from our DB and returns a list of ETL configs, serialized in the format we’re about to define. If you’re not familiar with RPC, you can read one of my other posts or just think of it as an API call, which is why it receives a request and returns a response.

Next we have two enums. When creating a message in Protocol Buffers, we can use scalar types (strings, numbers, booleans) for the fields, or we can use custom types such as other messages or, in this case, enums. EtlEnabled, for example, used to be a boolean, but there is now a third possible option, so we changed it to an enum. Note that the enum values all have numbers associated with them. The serializer writes these numbers, rather than the names, into the encoded message. The fields of our message are numbered too, and those numbers identify each field in the encoded data.
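
The enums that protoc generates for Java expose this name-to-number mapping through getNumber() and forNumber(int). The sketch below is a hand-written stand-in that mimics that shape without needing the protobuf library. EtlEnabledSketch and EnumNumberDemo are made-up names, and the real generated enum contains more than this.

```java
class EnumNumberDemo {
    public static void main(String[] args) {
        // The number is what would be written into the serialized message.
        System.out.println(EtlEnabledSketch.DISABLED.getNumber());
        // A reader turns the number back into a constant.
        System.out.println(EtlEnabledSketch.forNumber(1));
    }
}

// Hand-written stand-in that mimics the shape of a generated protobuf enum.
enum EtlEnabledSketch {
    ENABLED(0),
    CUSTOM_SCHEDULE(1),
    DISABLED(2);

    private final int number;

    EtlEnabledSketch(int number) {
        this.number = number;
    }

    // The number associated with this constant in the .proto file.
    int getNumber() {
        return number;
    }

    // Looks up the constant for a number; returns null if the number is unknown.
    static EtlEnabledSketch forNumber(int number) {
        for (EtlEnabledSketch value : values()) {
            if (value.number == number) {
                return value;
            }
        }
        return null;
    }
}
```

Because only the number travels in the message, renaming a value is harmless to existing data, while renumbering one changes what older messages mean.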

Finally, we have our message, which is what our serialized content will actually be expected to match. It has three properties:

  • An ID, which is a string.
  • A feature or features, which is a list of enums. This is why we use repeated.
  • An enabled status. It’s set to optional specifically because I’m adding this property after the API has already been set up on both sides. Existing requests coming from the front end won’t include the new field, and I can’t update those calls before I update the back end to accept the new message shape. Marking the field optional lets the back end check whether a value was actually sent, instead of silently reading a missing one as the default, ENABLED. This is our safest option, and we can update it in the future.
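
Here’s a minimal sketch of why presence matters for the enabled field. For a proto3 field marked optional, the generated Java class has a has-method (hasEnabled() in our case) next to the getter. The classes below are hand-written stand-ins rather than generated code, and PresenceSketch, Config, and describe are hypothetical names.

```java
// Illustration only: hand-written stand-ins, not generated protobuf classes.
class PresenceSketch {

    enum Enabled { ENABLED, CUSTOM_SCHEDULE, DISABLED }

    static final class Config {
        private final Enabled enabled; // null means the sender never set the field

        Config(Enabled enabled) {
            this.enabled = enabled;
        }

        // True only when the sender explicitly provided a value.
        boolean hasEnabled() {
            return enabled != null;
        }

        // Falls back to the zero-numbered value when nothing was sent.
        Enabled getEnabled() {
            return enabled != null ? enabled : Enabled.ENABLED;
        }
    }

    // Hypothetical policy: treat "not sent" differently from an explicit ENABLED.
    static String describe(Config config) {
        if (!config.hasEnabled()) {
            return "not specified";
        }
        return config.getEnabled().name();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Config(null)));
        System.out.println(describe(new Config(Enabled.ENABLED)));
    }
}
```

Without the presence check, a request from an older client and a request that explicitly says ENABLED would look identical to the getter.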

The Generated Java File

In protobuf documentation, you’ll commonly see protoc as the command suggested for compiling your protos. At work, we compile many of our protos as part of our Gradle build process (Gradle is a build automation tool). If you’re really patient and good with Java, you can probably make sense of the generated file, but I think it’s more effective to let your IDE help lead the way while you write your own class that implements your new methods.

We started with a file called EtlConfig.proto, which defines what the proto class will look like. When we compile our code, EtlConfigProtos.java is generated, which allows us to instantiate the Protocol Buffer with Java. Finally, we’ll write a class called EtlConfig.java where we build a Java version of this class and create a method that can transform it into the proto:

@Value.Immutable
public interface EtlConfig {

    static Builder builder() {
        return ImmutableEtlConfig.builder();
    }

    OptionalLong getEtlConfigId();
    EtlEnabled getEnabled();
    Set<EtlConfigFeature> getFeatures();

    // Convert this Java object into the generated protobuf message.
    default EtlConfigProtos.EtlConfig toProto() {
        return EtlConfigProtos.EtlConfig.newBuilder()
            .setEtlConfigId(Long.toString(getEtlConfigId().orElse(0L)))
            .setEnabled(getEnabled())
            .addAllFeatures(getFeatures())
            .build();
    }

    // Build a Java object from an incoming protobuf request.
    static Either<Problem, EtlConfig> fromRequest(EtlConfigProtos.AddNewEtlConfigRequest request) {
        try {
            // Wrap the success case so it matches the Either return type.
            return Either.right(builder()
                .etlConfigId(request.getEtlConfigId())
                .enabled(EtlEnabled.ENABLED)
                .addAllFeatures(request.getFeaturesList())
                .build());
        } catch (NullPointerException e) {
            String message = "A required value was missing, please check input and try again";
            LOG.error(message, e);
            return Either.left(Problem.invalidArgument(message));
        }
    }

    // The generated ImmutableEtlConfig builder implements this interface.
    interface Builder {

        Builder etlConfigId(long etlConfigId);

        Builder enabled(EtlEnabled enabled);

        Builder addAllFeatures(Iterable<EtlConfigFeature> features);

        EtlConfig build();
    }
}

We’re missing some imports here because this is a simplified version of what I really wrote, so I’m sorry to say it won’t work if you try to run it at home (it’s also using a bunch of Spring features, which we haven’t set up today). But hopefully we can gain some understanding of the pieces of an Immutable.

We use the @Value.Immutable annotation, which comes from the Immutables library, because we’re describing an object that should not change once it has been created. It’s critical that no subsequent code changes it, because if it is tweaked, the results won’t match our protobuf and everything else will fail. Interfaces are like classes, but they are implemented rather than instantiated, so we can’t say new EtlConfig(). Instead, when we compile our code, the Immutables annotation processor generates ImmutableEtlConfig, which is a class that implements our interface.
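
As a rough picture of what a generated implementation provides, here’s a minimal hand-written sketch: an interface plus a final class that implements it with final fields and a defensive copy. ConfigView and ImmutableConfigView are made-up names, and the real generated ImmutableEtlConfig will look different and do much more.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class ImmutableSketchDemo {
    public static void main(String[] args) {
        Set<String> input = new LinkedHashSet<>();
        input.add("FIRST");
        ConfigView config = new ImmutableConfigView("42", input);

        input.add("SECOND"); // changing the original set does not affect config
        System.out.println(config.getFeatures());
    }
}

// We cannot write new ConfigView(); something has to implement it.
interface ConfigView {
    String getId();
    Set<String> getFeatures();
}

// Hand-written stand-in for a generated immutable implementation.
final class ImmutableConfigView implements ConfigView {
    private final String id;
    private final Set<String> features;

    ImmutableConfigView(String id, Set<String> features) {
        this.id = id;
        // Copy, then wrap, so neither the caller nor a reader can modify our state.
        this.features = Collections.unmodifiableSet(new LinkedHashSet<>(features));
    }

    @Override
    public String getId() {
        return id;
    }

    @Override
    public Set<String> getFeatures() {
        return features;
    }
}
```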

When we want to create a new instance in the shape of an EtlConfig, we use a generated builder method which is based on the interface we define all the way at the bottom of the code block. I honestly don’t know why we do this, but it’s the syntax of our framework now. You can see that builder is used in our fromRequest method. To invoke it, we chain on the relevant properties and then add .build() at the end:

builder()
    .etlConfigId(request.getEtlConfigId())
    .enabled(EtlEnabled.ENABLED)
    .addAllFeatures(request.getFeaturesList())
    .build();
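
For anyone who hasn’t seen the builder pattern spelled out, here’s a minimal hand-written sketch of the idea: a mutable builder collects values through chained calls, and build() hands back a finished object. BuilderSketch and its members are hypothetical, and throwing IllegalStateException for a missing value is a choice made for this sketch, not a description of how our generated builder behaves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: a hand-written builder, not the generated one.
class BuilderSketch {

    static final class Config {
        final long id;
        final String enabled;
        final List<String> features;

        private Config(Builder builder) {
            this.id = builder.id;
            this.enabled = builder.enabled;
            this.features = Collections.unmodifiableList(new ArrayList<>(builder.features));
        }
    }

    static final class Builder {
        private Long id;          // null until the caller sets it
        private String enabled;   // null until the caller sets it
        private final List<String> features = new ArrayList<>();

        // Each setter returns the builder itself, which is what makes chaining work.
        Builder id(long id) {
            this.id = id;
            return this;
        }

        Builder enabled(String enabled) {
            this.enabled = enabled;
            return this;
        }

        Builder addAllFeatures(Iterable<String> newFeatures) {
            for (String feature : newFeatures) {
                features.add(feature);
            }
            return this;
        }

        // Checks that the required values are present, then creates the object.
        Config build() {
            if (id == null || enabled == null) {
                throw new IllegalStateException("id and enabled are required");
            }
            return new Config(this);
        }
    }

    static Builder builder() {
        return new Builder();
    }

    public static void main(String[] args) {
        List<String> features = new ArrayList<>();
        features.add("FIRST");
        Config config = builder().id(7L).enabled("ENABLED").addAllFeatures(features).build();
        System.out.println(config.id + " " + config.enabled + " " + config.features);
    }
}
```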

This is a really important principle: why do we have a fromRequest method? What does that mean?

The request is a generated protobuf type that lives inside EtlConfigProtos. It’s built by some obscure generated code and is sort of similar to the Java object we’re defining in EtlConfig.java, but it’s not exactly the same. So if we want to actually work with it in our Java code, we have to pull out the relevant properties and reattach them to an object of type EtlConfig. That’s what the builder does.

Likewise, if we’re going to send data to the front end, we have to convert it to a protobuf. That’s the point of all this, after all! So we write a method called toProto and use a builder from our generated EtlConfigProtos class, and let the magic of Protocol Buffers do the property assignments.
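
The two directions of conversion can be sketched with plain Java stand-ins. This assumes, as in the .proto file, that the ID travels as a string while the Java side prefers a number. WireConfig and DomainConfig are made-up names that stand in for the generated message and for EtlConfig, and fromProto is a hypothetical counterpart to toProto.

```java
// Illustration only: stand-ins for a generated message and a domain object.
class ConversionSketch {

    // Stand-in for the generated protobuf message: the ID is a string.
    static final class WireConfig {
        final String etlConfigId;

        WireConfig(String etlConfigId) {
            this.etlConfigId = etlConfigId;
        }
    }

    // Stand-in for the Java object we work with: the ID is a long.
    static final class DomainConfig {
        final long etlConfigId;

        DomainConfig(long etlConfigId) {
            this.etlConfigId = etlConfigId;
        }

        // Outgoing direction: Java object to wire shape.
        WireConfig toProto() {
            return new WireConfig(Long.toString(etlConfigId));
        }

        // Incoming direction: wire shape to Java object.
        // Long.parseLong throws NumberFormatException if the string is not a number.
        static DomainConfig fromProto(WireConfig wire) {
            return new DomainConfig(Long.parseLong(wire.etlConfigId));
        }
    }

    public static void main(String[] args) {
        DomainConfig original = new DomainConfig(42L);
        DomainConfig roundTripped = DomainConfig.fromProto(original.toProto());
        System.out.println(roundTripped.etlConfigId);
    }
}
```

Keeping both conversions next to each other makes it easier to spot a field whose type differs between the two shapes.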

Keeping the Conversions Straight

If you’ve ever worked in Java, you can probably imagine the stress that comes with keeping these types in order and the compilation errors that constantly arise when you’re passing EtlEnabled.ENABLED instead of EtlConfigProtos.EtlEnabled.ENABLED. That’s about where I got stuck this week, as I have to figure out how to get an enabled status from a Java object and write it in the form of a proto. As usual, I’m sure it’ll be something obvious, and as usual, my ability to ask others for help will be improved by the research I’ve done over this weekend.
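
One common way to handle this kind of mismatch is an explicit mapping between the two enum types. The sketch below assumes there are two separate enums, a hand-written one and a generated one. Both enums here are stand-ins (ProtoEtlEnabled plays the role of the generated type), so this shows the general technique and is not a fix taken from the real codebase. Generated proto3 enums in Java also carry an UNRECOGNIZED constant, which the sketch includes.

```java
// Illustration only: both enums are hand-written stand-ins.
class EnumMappingSketch {

    // Stand-in for the enum used by the Java domain object.
    enum EtlEnabled { ENABLED, CUSTOM_SCHEDULE, DISABLED }

    // Stand-in for the generated protobuf enum.
    enum ProtoEtlEnabled { ENABLED, CUSTOM_SCHEDULE, DISABLED, UNRECOGNIZED }

    // Domain value to proto value, spelled out case by case.
    static ProtoEtlEnabled toProto(EtlEnabled enabled) {
        switch (enabled) {
            case ENABLED:
                return ProtoEtlEnabled.ENABLED;
            case CUSTOM_SCHEDULE:
                return ProtoEtlEnabled.CUSTOM_SCHEDULE;
            case DISABLED:
                return ProtoEtlEnabled.DISABLED;
            default:
                throw new IllegalArgumentException("No proto value for " + enabled);
        }
    }

    // Proto value to domain value; UNRECOGNIZED has no domain equivalent.
    static EtlEnabled fromProto(ProtoEtlEnabled enabled) {
        switch (enabled) {
            case ENABLED:
                return EtlEnabled.ENABLED;
            case CUSTOM_SCHEDULE:
                return EtlEnabled.CUSTOM_SCHEDULE;
            case DISABLED:
                return EtlEnabled.DISABLED;
            default:
                throw new IllegalArgumentException("No domain value for " + enabled);
        }
    }

    public static void main(String[] args) {
        System.out.println(toProto(EtlEnabled.CUSTOM_SCHEDULE));
        System.out.println(fromProto(ProtoEtlEnabled.DISABLED));
    }
}
```

Spelling out each case is more typing than converting by name, but it forces a decision about values that exist on only one side.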
