About a week ago, my manager posted an @channel in Slack asking someone to pick up a semi-urgent ticket. The result, of course, was a silent standoff where (I imagine) everyone on the team read the message, held their breath, and hoped someone else would respond. I lasted for about 10 minutes…which means I ended up with the ticket.
It was a pretty standard request, albeit one that I didn’t have much experience with. But when I asked the person who did have experience, they pointed out that there was one unusual aspect in this case: we were being asked to use a tool that normally accesses a MySQL database to instead query a Postgres database. And that meant we’d have to develop a connection to that new DB.
The tool in question is built on Python, so one of my teammates pointed me in the direction of a Python example I could try. After further investigation, I was told that it was an older solution and that it would be preferable to use gRPC. Great! So all I had to do was learn what gRPC is and how to use it in Python.
What is gRPC?
RPC stands for Remote Procedure Call. The g is commonly read as a nod to Google, which developed this particular RPC framework and open-sourced it in 2015. The idea here is that we can build an API whose methods can be called across servers, so that we can make requests between different machines and different programming languages.
To make this work, we implement a gRPC server that has access to the data being queried. The client side uses what’s called a stub to call methods on the server. Here’s a great image from the documentation:
If we’re going to be using methods to communicate between different machines in potentially different languages, we need some way to serialize our data. Serialization takes data from disparate languages or sources and converts it into something that can be put into a request and universally understood. One common format for serialization is JSON, which turns an object into a string so that it can, for example, be stored as a cookie. gRPC can use JSON as well, but the documentation suggests that we use protocol buffers.
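As a quick illustration of that round trip, here is a minimal sketch using Python’s built-in json module. The person record is made up for the example and does not come from the gRPC docs.

```python
import json

# A made-up record, used only for illustration.
person = {"name": "Ada", "id": 7, "has_ponycopter": True}

# Serialize: turn the in-memory object into a plain string.
payload = json.dumps(person)

# Deserialize: any program that understands JSON can rebuild the data.
restored = json.loads(payload)

print(payload)
print(restored == person)
```

The string in payload is what would travel over the wire; the receiving side never sees the original Python object, only the text it can parse back into its own data structures.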
Protocol Buffers
Protocol buffers are a serialization mechanism, just like JSON. When we use protocol buffers, we first have to define the structure of our objects in a .proto file. Here’s an example of a Person data type as a proto:
message Person {
  string name = 1;
  int32 id = 2;
  bool has_ponycopter = 3;
}
Note that protocol buffer data are called messages. So this message is called a Person and has three fields: name, id, and has_ponycopter. The data type of each field is written to the left of the field name, so name is a string, id is an integer of up to 32 bits, and has_ponycopter is a boolean.
The numeric values are not defaults; they are unique field numbers that identify each field in the serialized message. When we compile our proto file, the compiler creates a class with accessors for each field (in Python, these are plain attributes) that we can instantiate and reference on the client side alongside our stub. That sounds easy, right?
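To make the role of those field numbers concrete, here is a hand-rolled sketch of how the Person message could be laid out in the protocol buffer binary format. It is not the protobuf library: the function names are invented for this example, it only handles non-negative integers, and it writes every field, whereas a real encoder may skip fields that hold default values.

```python
VARINT = 0            # wire type used for int32 and bool
LENGTH_DELIMITED = 2  # wire type used for string


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """A tag packs the field number and the wire type into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_person(name, person_id, has_ponycopter):
    """Hand-encode the three Person fields (illustration only)."""
    name_bytes = name.encode("utf-8")
    return (
        encode_tag(1, LENGTH_DELIMITED) + encode_varint(len(name_bytes)) + name_bytes
        + encode_tag(2, VARINT) + encode_varint(person_id)
        + encode_tag(3, VARINT) + encode_varint(int(has_ponycopter))
    )


encoded = encode_person("Ada", 7, True)
print(encoded.hex())  # 0a0341646110071801
```

Notice that the field names never appear in the output. Only the numbers 1, 2, and 3 do, tucked into the tag bytes 0a, 10, and 18, which is why both sides need the same .proto definition to make sense of the bytes.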
Implementation
The gRPC docs provide a quick start example, which I always find helpful when trying to understand abstract concepts. I followed the instructions to download the dependencies and the repo. The next step was to run two separate processes, a server and a client (since gRPC is all about communication between the two):
# from the terminal
python greeter_server.py
# in a different terminal window
python greeter_client.py
The first command refers to the server, or the entity that is going to receive our request. Let’s look at the code in that file:
"""The Python implementation of the GRPC helloworld.Greeter server."""
from concurrent import futures
import logging
import grpc
import helloworld_pb2
import helloworld_pb2_grpc

class Greeter(helloworld_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        return helloworld_pb2.HelloReply(message='Hello, %s!' % request.name)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()

if __name__ == '__main__':
    logging.basicConfig()
    serve()
We want to pay attention to the imports here. We import logging, grpc, and then two files that mention pb2. helloworld_pb2_grpc becomes relevant as we define our Greeter class. In Python, putting a class name in parentheses after the name of the class we’re defining declares inheritance. So let’s see what helloworld_pb2_grpc.GreeterServicer is bringing to the table:
class GreeterServicer(object):
    """The greeting service definition.
    """

    def SayHello(self, request, context):
        """Sends a greeting
        """
        context.set_code(grpc.StatusCode.UNIMPLEMENTED)
        context.set_details('Method not implemented!')
        raise NotImplementedError('Method not implemented!')
It contains one method, SayHello, but all the method does is raise an error. If we look back at our implementation of this class, we’re overriding this method with useful logic:
def SayHello(self, request, context):
    return helloworld_pb2.HelloReply(message='Hello, %s!' % request.name)
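The same override pattern can be sketched without gRPC at all. In the example below, GreeterBase and the dictionary used as a request are hypothetical stand-ins, not part of the generated code:

```python
class GreeterBase(object):
    """Stand-in for a generated servicer: declares the method, does no work."""

    def SayHello(self, request, context):
        raise NotImplementedError('Method not implemented!')


class Greeter(GreeterBase):
    """Our subclass replaces the placeholder with real logic."""

    def SayHello(self, request, context):
        return 'Hello, %s!' % request['name']


# A plain dict plays the role of the request in this sketch.
print(Greeter().SayHello({'name': 'you'}, None))

try:
    GreeterBase().SayHello({'name': 'you'}, None)
except NotImplementedError as error:
    print('Base class refused:', error)
```

Calling the method on the subclass returns a greeting, while calling it on the base class fails loudly, which is a useful reminder that the base class only defines the shape of the service.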
From my perspective (based on my Java background), the class created by helloworld_pb2_grpc is more like an interface. It has a method, but we’re expected to redefine the method if we plan to use it; it doesn’t give us any functionality off the bat. When we do implement it, we use a class from the other pb2 file: helloworld_pb2.HelloReply. Now helloworld_pb2 is much harder to read than helloworld_pb2_grpc. My best guess is that this is where the code we’re invoking comes from:
_HELLOREPLY = _descriptor.Descriptor(
  name='HelloReply',
  full_name='helloworld.HelloReply',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='message', full_name='helloworld.HelloReply.message', index=0,
      number=1, type=9, cpp_type=9, label=1,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  options=None,
  is_extendable=False,
  syntax='proto3',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=62,
  serialized_end=91,
)
I’m paying attention to the fields attribute, which has a name field with a value set to message. If you happened to look at the quick start documentation, this might look familiar. Here’s the proto file that generated this pb2 data:
// The greeting service definition.
service Greeter {
  // Sends a greeting
  rpc SayHello (HelloRequest) returns (HelloReply) {}
}

// The request message containing the user's name.
message HelloRequest {
  string name = 1;
}

// The response message containing the greetings
message HelloReply {
  string message = 1;
}
We’re looking at the final message definition here: HelloReply. Note that it has one defined field, which is called “message.” That matches what we see in the pb2 file. Now if we look back at the implementation, we see that SayHello builds the message field dynamically from the name on the request object that is passed as an argument. The rest of the server-side logic just opens up a port and starts listening for input.
It should be clear now that those pb2 imports are critical to our use of gRPC. Any methods we write in our server or client files must use data types we have defined in our proto file, which is translated into Python in the pb2 files. If we deviate, we can’t expect our data to be properly serialized, so the defined data types hold us accountable for passing requests and responses that gRPC can interpret on both ends.
We have some understanding of our server, but what’s going on with the client? Here’s the client side code:
"""The Python implementation of the GRPC helloworld.Greeter client."""
from __future__ import print_function
import logging
import grpc
import helloworld_pb2
import helloworld_pb2_grpc

def run():
    # NOTE(gRPC Python Team): .close() is possible on a channel and
    # should be used in circumstances in which the with statement does
    # not fit the needs of the code.
    with grpc.insecure_channel('localhost:50051') as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        response = stub.SayHello(helloworld_pb2.HelloRequest(name='you'))
    print("Greeter client received: " + response.message)

if __name__ == '__main__':
    logging.basicConfig()
    run()
grpc.insecure_channel specifies where to find the server, which matches the port we previously set. Way back at the beginning of this article, we explained that clients use a stub to call methods on the server. The stub is defined in the pb2_grpc code. On the next line, where we define response, we invoke SayHello from the stub and once again pass it an argument built from a helloworld_pb2 class. But this time, it’s HelloRequest instead of HelloReply. We give the request a name parameter, which we know will be interpolated into the HelloReply on the server side. The expected result is this line printed in the client’s terminal:
Greeter client received: Hello, you!
Coding out gRPC
We’ve seen two files that are not too difficult to read: our proto file that defines our data objects and our server/client files that we were able to walk through on our own with basic Python knowledge. It might have been easy to miss the origin of our ever-important pb2 files, which are actually generated by a proto compiler. This process is usually kicked off by the protoc command, which reads our proto files and then creates the pb2 files based on what’s inside.
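For Python, the compiler ships in the grpcio-tools package and can be run as the grpc_tools.protoc module. The sketch below only builds the command so the pieces are easy to see; the helper names and the protos directory are assumptions made for this example, and generate_stubs will only work where grpcio-tools is installed.

```python
import subprocess
import sys


def build_protoc_command(proto_dir, proto_file, out_dir):
    """Return the argument list for running grpc_tools.protoc as a module."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        "-I" + proto_dir,                # where to look for .proto files
        "--python_out=" + out_dir,       # message classes (*_pb2.py)
        "--grpc_python_out=" + out_dir,  # servicer and stub classes (*_pb2_grpc.py)
        proto_dir + "/" + proto_file,
    ]


def generate_stubs(proto_dir, proto_file, out_dir):
    """Run the compiler; requires the grpcio-tools package to be installed."""
    subprocess.run(build_protoc_command(proto_dir, proto_file, out_dir), check=True)


# The "protos" directory is an assumption for this example.
command = build_protoc_command("protos", "helloworld.proto", ".")
print(" ".join(command[1:]))
```

Calling generate_stubs("protos", "helloworld.proto", ".") would then write the two pb2 files next to the server and client scripts, ready to be imported.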
This tutorial shows us how to set up a really simple gRPC server and client, but there’s still a lot to be learned. At work, I’ll have to figure out how it relates to an existing database and how I can use gRPC to make DB queries. Still, the first step is learning the concept and understanding the framework. I think we can all say we’re a lot closer than we were yesterday.
Sources
- Introduction to gRPC, gRPC docs
- Data Serialization, Developedia
- Protocol Buffers, Google Developers
- Quick start, gRPC docs