My company uses an SFTP server (SFTP is the SSH File Transfer Protocol, often glossed as “secure file transfer protocol”) to share report files with clients. This week, we took some time to focus on the capacity of that server and on contingencies if we ever have to add more space. A member of my team was asked to build a dashboard and a monitor in Datadog and, though I had no idea where to start, I volunteered to help if I could.
Luckily for us, we got an assist from another member of our team who has a lot more experience with Datadog. I quickly got lost watching him deftly navigate their UI to develop a dashboard so I knew that this was a perfect opportunity for me to spend some time blogging. As he was working, he shared some entries from Datadog’s incredible blog. If you’re like me, reading about cloud computing requires a bit of decoding. So today, I’ll be working my way through an entry on Key Metrics for Amazon EBS monitoring and researching terms and concepts that I’m not familiar with. Welcome aboard!
The post starts out: “Amazon Elastic Block Storage (EBS) is persistent block-level storage as a service that works in conjunction with EC2 instances.” I found two things confusing about this sentence:
- Wikipedia explains that block-level storage is a type of cloud storage where data behaves as it would if it were stored on a physical device like a hard drive. Such devices are also known as block devices, hence the name. In block-level storage, the storage is divided into “blocks” and each block is given an arbitrary identifier, just like physical blocks. Traditional file systems are then mapped over these blocks so that we can traverse and access our files. Other cloud storage formats include bucket stores (like Amazon S3) and instance stores (the temporary storage that comes with an Amazon EC2 instance).
- I couldn’t make it to the end of a sentence before I had to look up something else. EC2 instances are virtual computers, generally used for computing rather than long-term storage because the storage that comes with them (the instance store) is temporary. This is why block-level storage works well in conjunction with EC2 instances: the instance can run the process while it needs to be run and the blocks can persist the data even after the instance has been terminated. The Datadog blog says as much: “Unlike EC2 instance store volumes, which are ephemeral and lose any data once the instance is destroyed, EBS volumes maintain their state when stopped or detached from an instance.”
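To make the block idea concrete for myself, I sketched a toy version in Python. Everything here is hypothetical and made up for illustration (`ToyBlockDevice`, `ToyFileSystem`, the 4-byte block size); it is not how EBS is implemented, just a picture of numbered blocks with a file table mapped over them.

```python
BLOCK_SIZE = 4  # bytes per block; unrealistically tiny so the mapping is easy to see


class ToyBlockDevice:
    """A fixed number of same-sized blocks, addressed only by block id."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, block_id, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        # Pad short writes with zero bytes so every block has the same size.
        self.blocks[block_id] = data.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyFileSystem:
    """Maps file names onto block ids; the device itself knows nothing about files."""

    def __init__(self, device):
        self.device = device
        self.file_table = {}  # name -> (list of block ids, file length in bytes)
        self.next_free = 0    # naive allocator: hand out blocks in order

    def write_file(self, name, content):
        block_ids = []
        for offset in range(0, len(content), BLOCK_SIZE):
            if self.next_free >= len(self.device.blocks):
                raise OSError("toy device is full")
            chunk = content[offset:offset + BLOCK_SIZE]
            self.device.write_block(self.next_free, chunk)
            block_ids.append(self.next_free)
            self.next_free += 1
        self.file_table[name] = (block_ids, len(content))

    def read_file(self, name):
        block_ids, length = self.file_table[name]
        data = b"".join(self.device.read_block(i) for i in block_ids)
        return data[:length]  # drop the padding from the last block


device = ToyBlockDevice(num_blocks=16)
fs = ToyFileSystem(device)
fs.write_file("report.csv", b"id,total\n1,42\n")
print(fs.file_table["report.csv"])  # which blocks hold the file, and its length
print(fs.read_file("report.csv"))
```

The part that helped me is that the device only ever sees block ids; the notion of a “file” lives entirely in the table layered on top.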
Throughput and Bandwidth Capacity
After the introduction to EBS, the Datadog blog starts a section called Staying Connected. I had to take another step back when I got to this sentence: “If multiple high-throughput EBS volumes are attached to an instance with limited bandwidth capacity, the drives will never reach their maximum capacity.”
So we already know that EBS volumes can be attached to an EC2 instance to provide additional storage. I’ve seen the word throughput a lot (especially when reading Datadog documentation) but I realized I couldn’t confidently explain what it meant. I consulted the trusty TechTerms, which told me that throughput refers to how much data can be transferred from one location to another in a given amount of time. So I’m imagining how much data can be put through this volume over a specific period of time, usually on a per-second basis.
Looking back on the sentence that challenged me, it seems to indicate that we can add storage capacity to an EC2 instance with multiple EBS volumes, but the instance itself is limited by its bandwidth capacity. The blog’s example is an “r4.large instance” with a capacity of 425 Mbps (megabits per second). Throughput is measured in MB/s, or megabytes per second (remember there are eight bits in a byte). No matter how many EBS volumes we add to our r4.large instance, we’re never going to be able to exceed about 53 MB/s of throughput, because 425 megabits per second divided by eight bits per byte is roughly 53 megabytes per second.
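Here is that arithmetic as a small Python sketch. The 425 Mbps figure comes from the Datadog example; the function names and the two 250 MB/s volumes are my own hypothetical illustration, and the model (the instance cap versus the sum of the volumes) is a simplification.

```python
BITS_PER_BYTE = 8


def mbps_to_mb_per_s(megabits_per_second):
    """Convert megabits per second (Mbps) to megabytes per second (MB/s)."""
    return megabits_per_second / BITS_PER_BYTE


def effective_throughput_mb_per_s(instance_bandwidth_mbps, volume_throughputs_mb_per_s):
    """Simplified model: throughput is capped by whichever is smaller,
    the instance's bandwidth or the combined throughput of its volumes."""
    instance_cap = mbps_to_mb_per_s(instance_bandwidth_mbps)
    return min(instance_cap, sum(volume_throughputs_mb_per_s))


r4_large_cap = mbps_to_mb_per_s(425)
print(r4_large_cap)  # 53.125

# Two hypothetical volumes rated at 250 MB/s each still hit the instance cap.
print(effective_throughput_mb_per_s(425, [250, 250]))  # 53.125
```

Adding a third or fourth volume to the list changes nothing, which is the point the Datadog sentence was making.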
I/O and KiB
The next part of the blog covers EBS disk types, which are broken up into two primary categories: solid-state drives (SSD) and hard disk drives (HDD). The author starts with SSD, explaining that the main attribute of SSD drives is high levels of I/O operations per second, or IOPS.
Searching I/O wasn’t specific enough to return tech-specific results (apparently I/O is also a type of psychology), but a search for IOPS brought me to Lunavi.com, where I learned that it stands for input/output operations per second. More practically speaking, we can think of this as the number of read and write operations a drive is capable of each second. For hard disk drives, this is limited by how quickly the drive’s disk can spin and how fast the read/write head can move into position. Solid-state drives have no moving parts, which is why their IOPS is higher than that of hard disk drives.
The Lunavi post goes on to explain block sizes, which determine how much space is taken up by our I/O operations. If we’re moving large files, we might want to make sure our block sizes can reach up to 1 MB. But this would mean that we might not be able to take advantage of a high IOPS because we’ll be limited by throughput. We can’t put multiple files into a single block, so we want to be thoughtful about block size — if we’re moving a lot of 4 kilobyte files, it would be inefficient to store them in 1 MB blocks.
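The relationship between IOPS, block size, and throughput clicked for me once I wrote it down as multiplication. This is a rough sketch with made-up numbers (3,000 IOPS and a 50 MB/s ceiling are assumptions for illustration, not figures from either blog), using decimal units where 1 MB is 1,000,000 bytes.

```python
KB = 1_000       # bytes in a kilobyte (decimal)
MB = 1_000_000   # bytes in a megabyte (decimal)


def throughput_mb_per_s(iops, block_size_bytes):
    """Throughput is operations per second times bytes moved per operation."""
    return iops * block_size_bytes / MB


def iops_at_throughput_cap(throughput_cap_mb_per_s, block_size_bytes):
    """How many operations per second fit under a throughput ceiling."""
    return throughput_cap_mb_per_s * MB / block_size_bytes


# Small blocks: plenty of operations, but not much data actually moved.
print(throughput_mb_per_s(3000, 4 * KB))   # 12.0

# Large blocks: a hypothetical 50 MB/s ceiling is reached at only 50 IOPS,
# so a drive rated for thousands of IOPS would never get to use them.
print(iops_at_throughput_cap(50, 1 * MB))  # 50.0
```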
This fits in perfectly with what I read on Datadog: because they work best with smaller operations, the individual I/O size, or block size, is capped at 256 KiB for SSD volumes. I hadn’t seen the abbreviation KiB before, and at first I assumed it was just another way to say kilobyte. It actually stands for kibibyte, which is 1,024 bytes, while a kilobyte (kB) is strictly 1,000 bytes, although “KB” is often used loosely for both. So SSD provides high IOPS with smaller block size and HDD works with fewer IOPS, but larger block sizes. Per Datadog, HDD is especially useful for handling large, sequential I/O quantities. As I understand it, this is because EBS can merge small I/O operations into a single larger operation when they are sequential, so a long contiguous read or write counts as fewer operations.
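A quick sketch of the units helped me keep KiB and kB straight. The 256 KiB cap is the SSD figure quoted from Datadog; the `operations_for_request` helper is my own hypothetical illustration and assumes that a request larger than the cap is simply split into cap-sized operations.

```python
import math

KB = 1000   # kilobyte (decimal)
KIB = 1024  # kibibyte (binary)

SSD_MAX_IO_BYTES = 256 * KIB  # the 256 KiB cap for SSD volumes


def operations_for_request(request_bytes, max_io_bytes=SSD_MAX_IO_BYTES):
    """Assumption: a request above the cap is split into cap-sized operations."""
    return math.ceil(request_bytes / max_io_bytes)


print(SSD_MAX_IO_BYTES)                    # 262144 bytes
print(256 * KB)                            # 256000 bytes, noticeably fewer
print(operations_for_request(4 * KIB))     # 1
print(operations_for_request(1024 * KIB))  # 4
```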
I started this post because I was trying to catch up to my colleagues as they built a monitor and dashboard for disk space. Once I finally understood the terms used to measure EBS performance, I was able to start reading about Datadog-specific metrics for evaluating the health and efficiency of our EBS volumes. It was encouraging to see this:
I had just taken the time to understand what these categories are and how they relate to each other! Though it took me 2–3 hours to get through the first few paragraphs of Datadog’s post (extra long because I was also writing about them), I made it through the rest much more quickly. We can use Datadog to make sure that we’re making the most of our block size by measuring VolumeReadBytes and VolumeWriteBytes, as well as IOPS with VolumeReadOps and VolumeWriteOps. I’m looking forward to returning to this project with the guidance of an experienced Datadog user and the new domain knowledge I’ve gained.
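To check my understanding of how those four metrics fit together, I wrote a small sketch. It assumes each metric arrives as a total over a collection period (I used a five-minute period), and the sample numbers are invented; the function names are mine, not part of Datadog or AWS.

```python
def average_iops(ops_total, period_seconds):
    """Operations per second, from an operations total (e.g. VolumeReadOps)."""
    return ops_total / period_seconds


def average_io_size_kib(bytes_total, ops_total):
    """Average size of one operation in KiB, from a bytes total
    (e.g. VolumeReadBytes) and the matching operations total."""
    if ops_total == 0:
        return 0.0  # an idle volume has no meaningful average size
    return bytes_total / ops_total / 1024


def average_throughput_mib_per_s(bytes_total, period_seconds):
    """Data moved per second in MiB, from a bytes total."""
    return bytes_total / period_seconds / (1024 * 1024)


# Invented sample: totals over one 300-second period.
period = 300
read_ops = 90_000
read_bytes = 1_474_560_000

print(average_iops(read_ops, period))                    # 300.0
print(average_io_size_kib(read_bytes, read_ops))         # 16.0
print(average_throughput_mib_per_s(read_bytes, period))  # 4.6875
```

An average I/O size far below the block size cap alongside high IOPS would suggest lots of small operations, which is the pattern SSD volumes are described as handling well.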
- Key Metrics for Amazon EBS monitoring, Maxim Brown (Datadog)
- Block-level storage, Wikipedia
- Throughput, TechTerms
- Know Your Storage Constraints: IOPS and Throughput, Joe Kozlowicz
- Files size units: “KiB” vs “KB” vs “kB”, Stack Exchange