Published 2025-06-05
So y’all may have been a bit surprised that I brought up redis (or valkey) as a data-persistence option and then said “nah, we’re not doing that.”
“But wait,” I hear you say. “Redis and Valkey persist their data, just like postgres and sqlite… why are you giving that up? They’re fast, keep their data in-memory. What’s not to love?”
True, as an SRE, I prefer to over-build things (to a point). I go for twisted pair wrapped in insulating plastic, not dry-core, paper-wrapped lines. You might say “ah, that’s overkill,” but I like to plan for a little bit of future-proofing.
So what’s wrong with valkey, or any other in-memory sharded solution? Durability and Scale, at precisely the point where you do not want scaling problems or questions about your persistence.
Let’s look back at Part 1. In it, we had a single service instance that could only write as fast as the underlying filesystem (and operating system) could handle.
A 7200rpm spindle drive can handle 50-100 IOPS. An older, middle-rank SATA SSD (like a Samsung 870 EVO) hits around 100k IOPS. SATA supports upwards of 6 Gbit/s, but unless you’ve got 10Gbit-or-faster network interfaces (a step up from most common consumer hardware, which caps out at 2.5 or 5Gbit), the service will hit a bottleneck at the network card before it hits the disk limit. Let’s not even consider the (basic) NVMe drives that can easily hit 10x the SATA SSD IOPS.
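The napkin math behind that claim, as a quick sketch (the link speeds are the rough numbers above, nothing here is measured):

```python
# Which link saturates first: the NIC or the SATA bus feeding the SSD?
# These are the rough numbers from above, not measurements.
SATA_GBIT = 6.0  # SATA III link speed

for nic_gbit in (2.5, 5.0, 10.0):
    first_limit = "network card" if nic_gbit < SATA_GBIT else "disk link"
    print(f"{nic_gbit:>4} Gbit NIC vs {SATA_GBIT:.0f} Gbit SATA -> "
          f"first limit hit: {first_limit}")
```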
Even beyond that, the operating system, the CPU, memory, or the service implementation itself can all impose hard limits on what can be handled at maximum speed and, ultimately, those limits are constrained to a single instance, a single machine.
Running a simple stress test against an HTTP service that replies entirely from memory, we can get anywhere from 7k RPS to 40k RPS depending on how the implementation is optimized. That’s about the limit of a single host – you can push past ~100k RPS with some clever tweaking or by reducing the amount of work per request, but let’s consider that a single instance doing meaningful work will theoretically max out at 100k RPS.
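For context, the service under test is nothing fancier than this sort of thing – a minimal sketch of that kind of service, not the one actually stress-tested, and real throughput will vary wildly by language and framework:

```python
# Minimal in-memory "accept text, return a key" HTTP service (sketch only).
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from itertools import count

STORE = {}          # key (int) -> body (bytes), memory only
NEXT_KEY = count(1)

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Accept a value, hand back a key -- everything lives in memory.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        key = next(NEXT_KEY)
        STORE[key] = body
        self.send_response(200)
        self.end_headers()
        self.wfile.write(str(key).encode())

    def do_GET(self):
        # Look the key back up; nothing survives a restart.
        try:
            body = STORE[int(self.path.lstrip("/"))]
        except (ValueError, KeyError):
            self.send_error(404)
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```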
Once we start spawning multiple instances, across multiple machines, our limits are now (theoretically) unconstrained by a single disk or a single network adapter. That “100k RPS” can be multiplied by N, the number of additional instances. We’re now limited by how quickly those instances can communicate with each other or some other, external system.
This brings us to Redis/Valkey. Note: I’m going to refer to Valkey as “redis”, but I do understand the distinction, both in implementation and licensing. Let’s set aside the licensing problem and look at the available functions.
You can read directly from the Redis team about benchmarking redis and pitfalls of the service here. If we review some of the risks/issues associated with redis, it says (emphasis mine):
The general indication you should use both persistence methods is if you want a degree of data safety comparable to what PostgreSQL can provide you.
This statement alone should make us all pause for a moment: if postgres is safer than redis, what is redis doing for persistence?
It has point-in-time snapshots (RDB) that fork the process and write the dataset to disk, and an append-only file (AOF) that writes operations to disk as they are sent to the server. Both of these persist a recovery-style file to the filesystem. If your data is important, the combination of RDB and AOF is necessary to ensure complete recovery of your persisted data.
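If you do go that route, the knobs look roughly like this. (A sketch: it assumes the redis-py client and a local instance; in production you’d set the same save, appendonly, and appendfsync directives in redis.conf/valkey.conf rather than at runtime.)

```python
# Sketch: enabling both persistence modes at runtime.
# Assumes the redis-py client and a redis/valkey instance on localhost;
# normally these live in redis.conf / valkey.conf instead.
import redis

r = redis.Redis(host="localhost", port=6379)

# RDB: snapshot the dataset if at least 1 key changed in the last 900s.
r.config_set("save", "900 1")

# AOF: append every write command to disk, fsync roughly once per second.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")

print(r.config_get("save"), r.config_get("appendonly"))
```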
We have to ask ourselves, “is this data critical to our service?” Considering our only function is to accept a value and return a key that corresponds to the value, yes, this data is very critical to our service.
Postgres may be looked at as “just another database”, but it is well-known for being customizable, flexible, and highly reliable, unencumbered by licenses or restrictions on its use.
Postgres is a disk-first storage system – while it will hold data temporarily in memory, its first priority is persisting the relevant bits to disk. This means that your maximum dataset size is not limited by how much memory your host has, but by the filesystem.
Due to the SQL nature of its implementation, Postgres is also able to handle complex data objects in a way the other systems cannot. While that is not necessarily relevant to our current system (text -> key), allowing more complexity – additional fields/columns, ownership, access controls, etc. – would require extensions in other environments. With any SQL-based relational database, such a request is usually “just another column/table.”

That additional complexity often comes at a performance cost, but what we lose in performance is buoyed by multiple levels of validation and write-durability. Postgres prioritizes accuracy in a query over performance, such that a single INSERT (or even a SELECT) can trigger multiple writes to disk. Over time, Postgres will optimize which files it keeps in memory to ensure speedy delivery of commonly requested data and, at any time, can show where indexes or tables are consulted via the EXPLAIN directive. Anecdotally, postgres handles 10TB+ databases with the same speed and efficiency as it would a 10GB database.
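To make the “text -> key” shape concrete, here’s what it could look like as a single table. (A sketch only: the entries table, its columns, and the psycopg2 client are my own illustrative choices, not anything prescribed so far.)

```python
# Sketch: the whole "accept text, return a key" service as one table.
# Assumes psycopg2; the database, table, and column names are made up.
import psycopg2

conn = psycopg2.connect("dbname=shortener")
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS entries (
            key  BIGSERIAL PRIMARY KEY,  -- the 64-bit number we hand back
            body TEXT NOT NULL           -- the text we were given
        )
    """)

    # "Accept a value, return a key" is one round trip.
    cur.execute("INSERT INTO entries (body) VALUES (%s) RETURNING key",
                ("hello, world",))
    print("stored as key", cur.fetchone()[0])

    # Growing the model later really is "just another column".
    cur.execute("ALTER TABLE entries ADD COLUMN IF NOT EXISTS owner TEXT")
```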
By comparison, redis with AOF writes every command to disk and can support some complex structures such as transactions. It even has an FT.EXPLAIN directive, yet the operations are expected to occur entirely in memory.
Compared to disk, memory is both expensive and limited in size. The largest memory capacities (as of 2025) are 512GB or 1TB per module, at approximately $15,000 USD per module. Compare this to a refurbished 26TB hard drive that costs $290 USD (around $11/TB).
Let’s also consider this separately from cost: should a dataset exceed the size of available system memory, not only will redis be unable to hold more data, it won’t be able to persist said data to our AOF or RDB files.
Durability with redis, therefore, is limited by what memory your system supports – and, per the discussion in Part 1, memory is a limited-availability resource.
If we stick to single-item GET and SET, redis can handle 120k RPS on a multi-core machine with gigabytes of memory allocated. The documentation also flags these as O(1) operations, indicating that they can be completed in constant time.
If we start using MSET and MGET, the throughput drops by roughly a third (to about 70% of the original) and only gets worse as the set size increases. If we try to mitigate this by switching to the hash-oriented operations, we run into the same single- vs. multiple-key drop in performance.
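For reference, the call shapes being compared look like this (a redis-py sketch; the throughput numbers below come from valkey-benchmark, not from this snippet):

```python
# Sketch: single-key vs multi-key vs hash operations (assumes redis-py).
import redis

r = redis.Redis()

# O(1), single-key operations -- the 120-135k RPS class of commands.
r.set("key:1", "value one")
r.get("key:1")

# Multi-key operations touch N keys per command; per-command cost grows
# with N, which is where the MSET throughput drop comes from.
r.mset({f"key:{i}": f"value {i}" for i in range(10)})
r.mget([f"key:{i}" for i in range(10)])

# Hash operations have the same single- vs multi-field split.
r.hset("hash:1", mapping={"field1": "a", "field2": "b"})
r.hgetall("hash:1")
```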
Testing Valkey locally on a relatively unloaded dual-Xeon machine, the system supported ~135k RPS using SET and GET, but only up to 95k RPS (a 30% drop!) as soon as “only 10 keys” were written per command using MSET:
$ valkey-benchmark | grep -e "======" -e "throughput summary"
====== PING_INLINE ======
throughput summary: 131406.05 requests per second
====== PING_MBULK ======
throughput summary: 135135.14 requests per second
====== SET ======
throughput summary: 135501.36 requests per second <--
====== GET ======
throughput summary: 134589.50 requests per second <--
...
====== HSET ======
throughput summary: 136239.78 requests per second <--
...
====== LPUSH (needed to benchmark LRANGE) ======
throughput summary: 134408.59 requests per second
====== LRANGE_100 (first 100 elements) ======
throughput summary: 73800.73 requests per second
====== LRANGE_300 (first 300 elements) ======
throughput summary: 30646.64 requests per second
====== LRANGE_500 (first 500 elements) ======
throughput summary: 18910.74 requests per second
====== LRANGE_600 (first 600 elements) ======
throughput summary: 15933.72 requests per second
====== MSET (10 keys) ======
throughput summary: 95969.28 requests per second <--
...
====== FCALL ======
throughput summary: 136798.91 requests per second
You can also see a precipitous drop in throughput as the LRANGE operators grow in size. None of this is exactly a problem – it tells us where the boundary lines are for our solution!
If redis can handle up to 135k RPS for small values, then so long as we limit our throughput to around 115k RPS (85% of max), we’re great!
While 115k RPS seems like a lot, consider that you have to divide that up across all of your service instances. If you have ten instances concurrently accessing the redis server, an even distribution works out to 115_000 / 10 = 11_500 per instance. In other words, each instance can only sustain around 11_500 operations per second if they’re all working continuously.
How does Postgres compare? Running pgbench against a relatively stock install gives us a starting point:

(baseline)
$ pgbench -c 10 -j 2 -t 10000
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
number of transactions per client: 10000
latency average = 5.271 ms
initial connection time = 16.112 ms
tps = 1897.185198 (without initial connection time)
--
$ pgbench -c 100 -j 12 -t 10000
...
number of clients: 100
latency average = 16.658 ms
tps = 6003.288229 (without initial connection time)
--
$ pgbench -c 1000 -j 12 -t 10000
...
number of clients: 1000
latency average = 171.768 ms
tps = 5821.795094 (without initial connection time)
Postgres looks like it can handle about 6k transactions* per second (TPS) across a multithreaded system. Anecdotally, I’ve seen systems hit upwards of 8k TPS, but let’s assume a relatively unoptimized machine is our first attempt.
6k is definitely below 115k, but we’re also talking about a transaction-based, unoptimized benchmark. Depending on how our dataset grows and what the EXPLAIN command tells us, we can add indices, optimize how we query the data, or even add read replicas to offload read-only queries to a separate instance.
* Note: We refer to “transactions” rather than “requests” because Postgres handles many operations through a (potentially) multi-statement transaction, rather than the simplicity of a “GET/SET” statement in redis.
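To see what that EXPLAIN-driven tuning loop looks like in practice, here’s a sketch reusing the hypothetical entries table from earlier (still assuming psycopg2):

```python
# Sketch: letting EXPLAIN drive the optimization work.
# Reuses the hypothetical `entries` table; assumes psycopg2.
import psycopg2

conn = psycopg2.connect("dbname=shortener")
with conn, conn.cursor() as cur:
    # Ask the planner how it intends to satisfy a lookup...
    cur.execute("EXPLAIN ANALYZE SELECT body FROM entries WHERE key = %s", (42,))
    for (plan_line,) in cur.fetchall():
        print(plan_line)

    # ...and if it reports sequential scans on a hot path, add an index.
    # (The primary key already covers `key`; this is the pattern for any
    # future column we start filtering on, like `owner`.)
    cur.execute("CREATE INDEX IF NOT EXISTS entries_owner_idx ON entries (owner)")
```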
One thing we’ve hinted at but not developed is how much data our service is going to need. The way our design is constructed, we “accept text” but never defined how much text is “too much” – what if someone sends us the entire body of War and Peace as a single text string?
While intermediary systems might reject that particular example before it ever gets to us, we should be in a position to set an upper limit on the bytes we are handling per request, per transaction, with our service.
Let’s say we want to accept a maximum length of 100 characters in a single request, and we’ll return at most a 64-bit unsigned number (i.e., up to 18_446_744_073_709_551_615). Because UTF-8 encoding can use up to 4 bytes per “character”, we need to account for 4 x 100 bytes (plus some padding for encoding, wraps, etc.) for the text we are to receive. Because very few implementations will allocate exactly “400 bytes” for an object, we should probably round up our measurement to 1KiB (1024 bytes) per received object. (This also lets us raise our acceptable length limit in the future, should we choose.)
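Spelled out as napkin math (same assumptions: 100 characters, a 4-byte UTF-8 worst case, rounded up to a 1KiB allocation):

```python
# Napkin math: worst-case size of a single record.
MAX_CHARS = 100
MAX_BYTES_PER_CHAR = 4                            # UTF-8 worst case
raw_text_bytes = MAX_CHARS * MAX_BYTES_PER_CHAR   # 400 bytes of text
record_budget = 1024                              # rounded up to 1 KiB
key_bytes = 8                                     # the 64-bit key we return

print(f"{raw_text_bytes} bytes of text -> {record_budget} byte budget "
      f"per record (+{key_bytes} bytes for the key)")
```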
If every entry used the maximum size, napkin math can give us some early “breaking points” for the various solutions.
Redis is limited to memory, so let’s say the “average” server has at least 64GB of memory to dedicate to the process. If we save ~4GB for the operating system, that leaves us with 60GB to allocate. At ~1KiB per entry, that leaves us with 62_914_560, or around 60M records, before we start to exhaust redis’ ability to store data. We can spend significantly more money and get a machine with 256GB of memory, but the added cost only buys us up to ~250M records (a 4x increase).
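The memory ceiling, in the same napkin-math style (the 64GB server and the ~4GB OS reservation are the assumptions from above):

```python
# Napkin math: how many 1 KiB records fit in memory after the OS takes its cut?
RECORD_BYTES = 1024
TOTAL_GB = 64
OS_RESERVE_GB = 4

usable_bytes = (TOTAL_GB - OS_RESERVE_GB) * 1024**3
records = usable_bytes // RECORD_BYTES
print(f"{TOTAL_GB} GB host -> {records:,} records (~{records / 1e6:.0f} million)")
# A 256 GB host moves the ceiling roughly 4x, into the ~250M-record range.
```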
After that, in order to scale, we need to talk about sharding the dataset, running multiple instances of redis, and coordinating their operations and backups.
For Postgres, as long as our operations per second don’t exceed ~5500, we can persist as much data as our server disks can hold. The Dell R750 rack server comes, by default, with a single 600GB disk. If we carve out 100GB for the operating system and other things, that leaves us with 500GB for the database, which gives us a minimum of 500M records before we start looking for additional storage options. Since we know this will be a database server, we can purchase six or seven third-party SAS drives for ~$14/TB and immediately add 100TB of usable storage.
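The disk-side equivalent, with the same 1KiB-per-record assumption and the 500GB left after the OS carve-out:

```python
# Napkin math: records per disk for a disk-first store (1 KiB per record).
RECORD_BYTES = 1024

def records_for(gigabytes: int) -> int:
    return gigabytes * 1024**3 // RECORD_BYTES

print(f"500 GB usable on the default disk: {records_for(500):,} records")
print(f"100 TB of added SAS drives:        {records_for(100 * 1024):,} records")
```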
(If you want to use AWS or some other cloud provider to manage your machines, you can always use the EBS equivalent and scale a single disk up to 64TiB, according to their documentation as of 2025. That gives you roughly 63 billion records without changing anything about your database processor or memory.)
Ultimately, what we have in the choice of postgres, valkey/redis, or any other data storage solution, is a set of trade-offs. While redis may be extremely high-performance, it has a limited storage capacity without expensive scaling options. Postgres has significantly higher storage capacity, but is limited to much lower rates of execution.
Each one of the solutions can be scaled, but when it needs to scale matters just as much as how it can be scaled. This leads us to a set of directives we can apply to any storage or execution question:
If scaling requires more compute capacity, it will be more expensive than adding storage capacity. If we observe significantly more throughput demand than storage demand, lean in to the higher-performance option (redis). If we observe low-throughput, long-term data storage, and/or need flexibility in our future storage systems, take the higher-durability option (postgres).