What I Learned at Work this Week: Python Arrow Library (and a bonus)

Mike Diaz
Nov 20, 2021

After a few weeks on different subjects, I’ve returned to the adventure of reading through a long Python script. This week, I found a function that contained some syntax worth investigating. When defining a variable called date_range, the code invoked a library called Arrow:

if args.end_date:
    date_range = arrow.Arrow.range(
        'day', arrow.get(args.date), arrow.get(args.end_date)
    )

What is arrow.Arrow? What arguments does range accept and what data type does it return? It’s time to consult the documentation!

Arrow

Those docs describe Arrow as:

a Python library that offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps

It brings together a lot of functionality that is otherwise disparate in the Python landscape. For example, using only Arrow, we can:

  • Generate a timestamp
  • Shift the time in that timestamp (i.e., add or remove one hour, day, month, etc.)
  • Change the time zone of our timestamp
  • Change the format of our timestamp
  • Change the language of our timestamp

Arrow can work with various data types (date, time, datetime, tzinfo, timedelta, relativedelta, etc.) and can parse ISO 8601-compliant strings. If you’re not familiar with those, here’s the example provided:

>>> arrow.get('2013-09-30T15:34:00.000-07:00')
<Arrow [2013-09-30T15:34:00-07:00]>

Note that this returns an Arrow object rather than a string or anything else. To get a string we can print, we call the object’s format method:

>>> arrow.utcnow().format('YYYY-MM-DD HH:mm:ss ZZ')
'2013-05-07 05:23:16 -00:00'
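
Changing the time zone works in a similar way. As a rough sketch (it reuses the timestamp from the parsing example above, and the variable names are only for illustration), to converts an Arrow object to another zone and format renders the result as a string:

```python
import arrow

# Parse the ISO 8601 string from the earlier example (offset -07:00).
local = arrow.get('2013-09-30T15:34:00.000-07:00')

# Convert the same instant to UTC; this returns a new Arrow object.
utc = local.to('UTC')

# Render both as plain strings using format tokens.
local_text = local.format('YYYY-MM-DD HH:mm')
utc_text = utc.format('YYYY-MM-DD HH:mm')

print(local_text)  # 2013-09-30 15:34
print(utc_text)    # 2013-09-30 22:34
```

Both objects describe the same moment; only the zone used to display it differs.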

Let’s look at one more cool method before re-examining the code we’re trying to understand. It’s called humanize:

>>> past = arrow.utcnow().shift(hours=-1)
>>> past.humanize()
'an hour ago'

On the first line, we create an Arrow object for the current timestamp in UTC, then call the shift method on it, passing the argument hours=-1.

Shift returns a new Arrow object with time added or removed. In this case, we move back one hour (hours=-1). The shifted object doesn’t remember that an hour was removed, though. When we call humanize with no arguments, Arrow compares the object’s time to the current time and describes the difference in terms a human can understand. Here, that’s a string that says “an hour ago.”
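
One way to see that humanize is describing a difference between two times: it also accepts another Arrow object to compare against. The sketch below uses a fixed timestamp (chosen arbitrarily for illustration) so the result doesn't depend on when it runs:

```python
import arrow

# A fixed reference time, so the output is the same on every run.
reference = arrow.get('2013-05-07T05:00:00+00:00')

# shift returns a new object; `reference` itself is unchanged.
past = reference.shift(hours=-1)

# Passing another Arrow object makes humanize describe the gap between
# the two, instead of the gap between `past` and the current time.
print(past.humanize(reference))  # an hour ago
print(reference.humanize(past))  # in an hour
```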

I’m not sure how commonly that’s used, so rather than look deeper into it, let’s check out our code sample again:

if args.end_date:
    date_range = arrow.Arrow.range(
        'day', arrow.get(args.date), arrow.get(args.end_date)
    )

The first thing I noticed here was that we’re not referencing the methods in the same way as the examples from the documentation. Instead of:

arrow.range

We’ve got:

arrow.Arrow.range

At first I thought this was because we were importing the arrow object differently, but it’s exactly the same as the documentation. So I did a search for arrow.arrow.range and found some new documentation explaining the range method:

Returns an iterator of Arrow objects, representing points in time between two inputs.

Ignoring the arrow.Arrow weirdness for a second, we see that range returns an iterator, meaning a data type that we can loop through. It contains Arrow objects that each mark a point in time between inputs, which must be the arguments we’re passing to the method. I searched for an example to see if the syntax matched what I was seeing:

>>> start = datetime(2013, 5, 5, 12, 30)
>>> end = datetime(2013, 5, 5, 17, 15)
>>> for r in arrow.Arrow.range('hour', start, end):
...     print(repr(r))
...
<Arrow [2013-05-05T12:30:00+00:00]>
<Arrow [2013-05-05T13:30:00+00:00]>
<Arrow [2013-05-05T14:30:00+00:00]>
<Arrow [2013-05-05T15:30:00+00:00]>
<Arrow [2013-05-05T16:30:00+00:00]>

And there it is on the third line! As I read more of the documentation, I learned that range is a class method of Arrow, so it must be called on the Arrow class rather than directly on the module. I don’t yet understand why it was written this way, but the original syntax makes more sense now:

date_range = arrow.Arrow.range(
    'day', arrow.get(args.date), arrow.get(args.end_date)
)

We’re creating a range between the date argument and the end_date argument, with each element in the iterable separated by one day. The script goes on to execute additional logic for each date in the range, of course.
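
To make that concrete, here is a minimal sketch with hard-coded dates. The two strings are hypothetical stand-ins for args.date and args.end_date, since the original script's arguments aren't shown here:

```python
import arrow

# Hypothetical stand-ins for args.date and args.end_date.
start_date = '2021-11-01'
end_date = '2021-11-03'

date_range = arrow.Arrow.range(
    'day', arrow.get(start_date), arrow.get(end_date)
)

# Each element is an Arrow object; format turns it into a plain string.
days = [day.format('YYYY-MM-DD') for day in date_range]
print(days)  # ['2021-11-01', '2021-11-02', '2021-11-03']
```

Note that the end date is included in the result when it falls exactly on a one-day step from the start.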

Bonus: [key] vs get(key)

This wasn’t enough to write a whole post about, but I did learn something else about Python as it relates to selecting values from a dictionary. Within the same function, I saw both of these styles implemented:

# Named flags rather than dict to avoid shadowing the built-in type.
flags = {'a': True, 'b': False}
print(flags['a'])
# True
print(flags.get('a'))
# True

Both of these return the same value by referencing the a key in the dictionary, so why use one over the other? It turns out that get returns a default value instead of raising a KeyError when the key is missing:

print(flags['c'])
# KeyError: 'c'
print(flags.get('c'))
# None

Get avoids the error by returning None, so we don’t have to worry about our script failing if a certain key is not found in our dictionary. That’s not necessarily better: during development, for example, we might want the error raised so we’re aware of missing values. As with both subjects of this post, it’s helpful to know more about our tools so that we can use them properly and understand why others made certain decisions in their code.
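
A short sketch of the trade-off, using an illustrative dictionary named settings (not from the original script). It shows get's optional second argument, a membership check, and catching the KeyError explicitly:

```python
settings = {'a': True, 'b': False}

# With no second argument, get falls back to None for a missing key.
print(settings.get('c'))         # None

# A second argument replaces None as the fallback.
print(settings.get('c', False))  # False

# get alone can't tell "missing" apart from "present but None";
# a membership check can.
print('c' in settings)           # False

# Square brackets still raise, which we can catch when we want to know.
try:
    settings['c']
except KeyError as err:
    missing_key = err.args[0]
    print(f'missing key: {missing_key}')  # missing key: c
```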
