Looking into performance of the new EBS gp3 volumes

Just recently, I found out that AWS has introduced a new Elastic Block Store (EBS) volume type called gp3, in addition to the popular gp2 volume type.

The new gp3 volumes promise a baseline of 3000 IOPS at no additional cost, whereas on gp2 the baseline IOPS depend on the size of the volume. To get the same baseline on gp2, you’d have to provision a 1000 GB volume (3 IOPS/GB). Before gp3, if you needed a higher IOPS baseline but didn’t need the extra size, you were stuck with either ephemeral SSDs or io2 volumes, which are ridiculously expensive. gp3 solves this by cranking the default all the way up to 3000 IOPS. Very nice! They’re also 20% cheaper than gp2… so what’s the catch?
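To make the arithmetic concrete, here is a small sketch of the gp2 baseline rule described above (3 IOPS per GB). The 100 IOPS floor and 16,000 IOPS ceiling are the documented gp2 limits as I understand them; the function names are my own, and gp2 burst credits are ignored.

```python
# Sketch of the gp2 baseline IOPS rule; burst credits are deliberately ignored.
GP2_IOPS_PER_GIB = 3       # baseline scales with volume size
GP2_MIN_IOPS = 100         # floor for very small volumes
GP2_MAX_IOPS = 16_000      # ceiling for very large volumes
GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume, regardless of size


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume of the given size."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 size (GiB) whose baseline reaches target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("gp2 baseline cannot exceed 16,000 IOPS")
    return -(-target_iops // GP2_IOPS_PER_GIB)  # ceiling division


for size in (100, 500, 1000):
    print(f"{size} GiB gp2: {gp2_baseline_iops(size)} baseline IOPS "
          f"(gp3: {GP3_BASELINE_IOPS})")
```

A 100 GB gp2 volume therefore has only a tenth of the baseline that any gp3 volume gets.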

Should I convert all gp2 volumes to gp3 now?

Why not convert everything to gp3 right now, save the money and get the extra benefit of increased IOPS?

We don’t yet know the exact details, as gp3 volumes are generally being marketed as “gp2, but better and cheaper”. Their product information table shows gp2 and gp3 to be identical, except for this little clue:

gp2: General Purpose SSD volume that balances price performance for a wide variety of transactional workloads

gp3: Lowest cost SSD volume that balances price performance for a wide variety of transactional workloads

So, it’s down to “Lowest cost SSD” vs “General Purpose SSD”, and what does that actually mean? “Lowest cost SSD” does sound like something is degraded, but what is it? We know that they know, so can we please know as well?

So I decided to just try it out on something under heavy load, but not production-critical.
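If you want to try the same on a single volume, the sketch below shows one way to request the change. It assumes you use boto3, whose EC2 client has a modify_volume call; this post does not depend on any particular tool, and the helper name and the volume ID in the usage comment are placeholders.

```python
# Sketch: request an in-place change of one EBS volume to gp3.
# `ec2_client` is expected to behave like boto3.client("ec2").
def convert_to_gp3(ec2_client, volume_id: str) -> str:
    """Ask EC2 to change the volume type to gp3; return the reported state."""
    response = ec2_client.modify_volume(VolumeId=volume_id, VolumeType="gp3")
    # The modification runs in the background; the response reports its state.
    return response["VolumeModification"]["ModificationState"]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   state = convert_to_gp3(boto3.client("ec2"), "vol-xxxxxxxx")  # placeholder ID
```

Passing the client in as an argument keeps the helper easy to try against a test account or a stub before pointing it at real volumes.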

Test results

I’ve converted a few volumes on an internal Elasticsearch analytics cluster. The most striking differences are in latency and queue length.

(The red pointer is where I made the switch from gp2 to gp3)

Average Read Latency

Significant jump in read latency. Whoa!

Average Write Latency

Significant jump in write latency. Not as high as for reads, but still a good jump.

Average Queue Length

Here, we see an increase as well, compared to the previous two days. The queue length is my default go-to metric when I need to identify disks that are not performing well, and this is definitely not something that pleases my eye.
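If you want to reproduce these numbers for your own volumes, average latency can be derived from the per-volume CloudWatch metrics: total read (or write) time divided by the number of read (or write) operations in the same period. The sketch below assumes you have already fetched the period sums of VolumeTotalReadTime (seconds) and VolumeReadOps; the function name is my own.

```python
# Sketch: average I/O latency from two CloudWatch period sums.
def avg_latency_ms(total_time_seconds: float, ops: float) -> float:
    """Average latency in ms per operation.

    total_time_seconds: sum of VolumeTotalReadTime (or VolumeTotalWriteTime)
    ops:                sum of VolumeReadOps (or VolumeWriteOps), same period
    """
    if ops == 0:
        return 0.0  # no I/O in the period, so there is no latency to report
    return total_time_seconds / ops * 1000.0


# Example: 1.5 s spent on 3000 reads in one period.
print(f"{avg_latency_ms(1.5, 3000):.2f} ms/op")
```

Comparing this value for the days before and after a switch gives the same kind of before/after picture as the graphs above.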

What does this mean?

It shows, as expected, that there is no such thing as a free lunch. When AWS writes “Lowest cost SSD”, it does actually come at the cost of some degraded metrics. That’s perfectly fine; it’s just a shame that their product specification does everything it can to convince you that gp3 is the same as gp2, just cheaper and better. I would have appreciated getting this information up front, to be able to make more informed decisions.

I’m aware that this was a very high-level and brief overview. I just wanted to put it out there for those of you wondering, like I did, whether you should just convert all volumes immediately.

As with everything else, it depends.

Let me know

If anyone has any information on the actual difference between these two types of volumes, feel free to let me know and I’ll add it here.

CTO of Cludo. Passionate about devops and distributed systems. Father of two. Opinions are my own. @silasmorkgard