Deploying Redis Cluster and Sentinel on a two-node cluster

The application behind the flow simulator at redis.kappenhagen.io uses Redis for two different jobs, and it runs two separate Redis deployments to do them. One is a cache, where losing a value costs you a slightly slower request. The other holds sessions, where losing a value logs someone out. Those are opposite tolerances, so they get opposite setups. Here is how both run on a small two-node cluster, and the one placement detail that makes two nodes worth anything for availability.

Why two deployments

A cache and a session store want different things from Redis. A cache is allowed to evict keys when it runs low on memory, and it is allowed to come back empty after a restart, because the database is still the source of truth and the cache just refills. A session store can do neither. Evicting a session or dropping it on restart kicks a user out. So the cache runs as a Redis Cluster, which shards data across nodes and is happy to evict, and the sessions run behind Redis Sentinel, which keeps replicas and fails over to keep the data reachable. Same Redis, two configurations chosen around what each one is allowed to lose.

The cache: a three-shard cluster

The cache is a Redis Cluster of six pods, three masters and three replicas, which is three shards. Redis Cluster splits its keyspace into 16384 hash slots and hands roughly a third of them (about 5,461) to each master. Every master has a replica that can be promoted if the master fails. On a single node this all just works, which hides the only hard question: where those six pods actually land.
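To make the slot split concrete, here is a small sketch that divides the 16384 slots across a given number of masters by rounding the boundaries. The function name slot_ranges and the master-N labels are made up for illustration, and the boundaries a real cluster picks may differ by a slot.

```shell
# Divide the 16384 hash slots across N masters as evenly as possible.
# Illustrative only: slot_ranges and the master-N labels are hypothetical.
slot_ranges() {
  masters=$1
  total=16384
  start=0
  i=0
  while [ "$i" -lt "$masters" ]; do
    # Exclusive upper bound for this master, rounded to the nearest slot.
    end_excl=$(( (2 * (i + 1) * total + masters) / (2 * masters) ))
    echo "master-$i: slots $start-$(( end_excl - 1 )) ($(( end_excl - start )) slots)"
    start=$end_excl
    i=$(( i + 1 ))
  done
}

slot_ranges 3
# master-0: slots 0-5460 (5461 slots)
# master-1: slots 5461-10922 (5462 slots)
# master-2: slots 10923-16383 (5461 slots)
```

Losing both copies of any one of these ranges is what leaves the cluster with a hole in its keyspace.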

Each shard keeps its master and its replica on different nodes. Lose a node and every shard still has a surviving copy.

The placement that matters

With only two nodes, the trap is letting a shard put its master and its replica on the same node. If that node dies, both copies of that shard go with it, and the cluster is missing a third of its slots. With the default cluster-require-full-coverage setting, the cluster will not serve until those slots come back. The fix is a pod anti-affinity rule that tells Kubernetes to keep a shard's master and replica apart. With that in place, losing a node costs you at most one copy of each shard, the replicas on the surviving node get promoted, and the cluster stays up on three masters. One caveat: a replica is promoted only after a majority of masters vote for it, so automatic promotion also depends on how many masters were on the node that failed. This is the single setting that decides whether a two-node cluster actually survives a node failure or just looks like it should.
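One way to check the placement described above is to cross-reference the CLUSTER NODES output with the node each pod runs on. The sketch below assumes pods announce themselves by pod IP (IPv4) and that both inputs have been saved to files. The function name check_placement, the file names, and the pod name redis-cluster-0 are assumptions for illustration, not part of the chart.

```shell
# check_placement PODS_FILE NODES_FILE
#   PODS_FILE:  one "<pod-ip> <kubernetes-node>" pair per line (must not be empty)
#   NODES_FILE: raw output of the Redis CLUSTER NODES command
# Exits non-zero if any replica shares a Kubernetes node with its master.
check_placement() {
  awk '
    NR == FNR { node_of[$1] = $2; next }      # first file: pod IP -> Kubernetes node
    {
      split($2, addr, ":")                    # field 2 looks like ip:port@cport
      k8s[$1] = node_of[addr[1]]              # Redis node ID -> Kubernetes node
      if ($3 ~ /slave/) master_of[$1] = $4    # a replica names its master in field 4
    }
    END {
      bad = 0
      for (r in master_of) {
        m = master_of[r]
        if (k8s[r] == "" || k8s[m] == "") {
          printf "shard %s: could not map a pod IP to a node\n", substr(m, 1, 8)
          bad = 1
        } else if (k8s[r] == k8s[m]) {
          printf "shard %s: master and replica are both on %s\n", substr(m, 1, 8), k8s[r]
          bad = 1
        }
      }
      if (!bad) print "ok: every replica is on a different node from its master"
      exit bad
    }
  ' "$1" "$2"
}

# Possible usage (pod name assumed from the release name):
#   kubectl get pods -n data --no-headers \
#     -o custom-columns=IP:.status.podIP,NODE:.spec.nodeName > pods.txt
#   kubectl exec -n data redis-cluster-0 -- \
#     redis-cli -a YOUR_PASSWORD cluster nodes > nodes.txt
#   check_placement pods.txt nodes.txt
```

The non-zero exit status makes it easy to run this as a gate after every install or reschedule, since pods can move and roles can swap after a failover.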

The sessions: Sentinel and persistence

The session store is a separate Redis deployment with Sentinel turned on. Sentinel is a small companion process that watches the master, and when enough Sentinels agree the master is gone, they elect a replica to take over and steer clients to the new master. The quorum is how many of them have to agree before that happens. This deployment also turns on persistence from the first install, both the append-only file and RDB snapshots, so sessions survive a full restart of the pod as well as a failover. The cache does not bother with this, because an empty cache is fine. The session store cannot afford to come back empty.
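The same settings can be kept in a values file instead of a row of --set flags. The following is a sketch under assumptions, not the configuration actually used here. The file name sessions-values.yaml is made up, the save intervals are illustrative, and auth.password is the key recent versions of the bitnami/redis chart use for the password.

```shell
# Write a Helm values file for the session store.
# Assumptions: the file name, the save intervals, and the auth.password key.
cat > sessions-values.yaml <<'EOF'
sentinel:
  enabled: true
  quorum: 2            # Sentinels that must agree the master is down
replica:
  replicaCount: 2      # check how the chart counts pods when Sentinel is enabled
auth:
  password: YOUR_PASSWORD
# Overriding commonConfiguration replaces the chart's default block,
# so both persistence mechanisms are stated explicitly here.
commonConfiguration: |-
  appendonly yes
  save 900 1
  save 300 10
EOF

# Then install with:
#   helm install redis-sentinel bitnami/redis --namespace data -f sessions-values.yaml
```

A file like this is easier to review than a long command line, and it keeps the persistence settings next to the Sentinel settings they protect.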

Deploying both

Add the chart repository.

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

Install the cache as a Redis Cluster.

helm install redis-cluster bitnami/redis-cluster \
  --namespace data \
  --set password=YOUR_PASSWORD \
  --set image.repository=bitnamilegacy/redis-cluster \
  --set image.tag=7.4.1-debian-12-r0

Note: The chart creates three masters and three replicas by default, which is the three shards. Bitnami moved their images off the free Docker Hub tier, so the repository is pointed at the bitnamilegacy image to keep pulls working.

Make sure a shard's master and replica never share a node.

kubectl get pods -n data -o wide

Note: This is the placement from the diagram. The chart exposes pod anti-affinity settings for it; the goal is that each shard's two pods land on different nodes. Confirm it with the command above, which prints the node each pod is on. If a master and its replica share a node, fix the affinity before you rely on the cluster.

Install the session store with Sentinel.

helm install redis-sentinel bitnami/redis \
  --namespace data \
  --set sentinel.enabled=true \
  --set sentinel.quorum=2 \
  --set replica.replicaCount=2 \
  --set auth.password=YOUR_PASSWORD

Note: This is the bitnami/redis chart, not redis-cluster. Each pod runs two containers, the Redis server and its Sentinel. A quorum of 2 means two Sentinels must agree before a failover starts.

Turn on persistence so sessions survive a restart.

--set commonConfiguration="appendonly yes"

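Note: Pass this flag on the helm install command from the previous step, so persistence is on from the first install. Redis takes RDB snapshots by default; this adds the append-only file on top, so the store can rebuild from disk after a full restart rather than coming back empty. Setting commonConfiguration replaces the chart's default block rather than adding to it, so compare it against the chart's default values before overriding.

Verify both are healthy.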

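redis-cli -a YOUR_PASSWORD cluster info

redis-cli -a YOUR_PASSWORD -p 6379 info replication

Note: Run each command inside a pod of the matching deployment, for example through kubectl exec. The cluster is ready when cluster info reports cluster_state:ok and cluster_slots_assigned:16384. The session store is ready when info replication, run against the Redis port 6379, reports a master with its replicas online. Sentinel listens separately on port 26379 and tracks the master itself, so the app asks Sentinel for the current master rather than hardcoding an address.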
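Asking Sentinel for the current master can be sketched from the command line as follows. The function name current_master, the REDIS_PASSWORD variable, the service name redis-sentinel, and the master set name mymaster are assumptions about this setup; substitute whatever your deployment uses.

```shell
# Ask a Sentinel which address currently holds the master role and
# print it as "host:port".
# Assumptions: the master set is called "mymaster" unless overridden,
# and Sentinel accepts the password stored in REDIS_PASSWORD.
current_master() {
  sentinel_host=$1
  master_set=${2:-mymaster}
  # SENTINEL GET-MASTER-ADDR-BY-NAME replies with two lines: address, then port.
  redis-cli -h "$sentinel_host" -p 26379 -a "$REDIS_PASSWORD" --no-auth-warning \
    sentinel get-master-addr-by-name "$master_set" | paste -sd: -
}

# Possible usage from a pod in the same namespace:
#   REDIS_PASSWORD=YOUR_PASSWORD current_master redis-sentinel
```

Application client libraries with Sentinel support do the same lookup internally and repeat it after a failover, which is why the app is configured with Sentinel addresses rather than a Redis address.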

Where Sentinel failover gets complicated on Kubernetes

Once both are running, it is worth understanding how the session store behaves when something actually breaks, because Kubernetes changes the picture Sentinel was designed for. Sentinel expects to watch a master at a fixed address, decide it is down, and promote a replica. On Kubernetes a pod's address is not fixed: pods get rescheduled and come back with new IPs, so you reach the master through a stable name instead of a raw IP. That stable name is where the subtlety lives.

Testing this on the live cluster turned up two very different failure modes. Deleting the master pod outright was a non-event. The StatefulSet recreated it before Sentinel's down-after-milliseconds threshold expired, so Kubernetes healed the pod before Sentinel ever decided to fail over, and the session data was never at risk. Freezing the process with a SIGSTOP behaved much the same once it was resumed.
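To see what threshold a pod restart is racing against, you can read it from Sentinel itself. SENTINEL MASTER prints a flat list of field names and values, one item per line when piped, and the small filter below picks one field out of it. The function name sentinel_field and the master set name mymaster are assumptions for illustration.

```shell
# Read one field from the alternating name/value lines that
# `redis-cli ... sentinel master <name>` prints when its output is piped.
sentinel_field() {
  awk -v want="$1" '
    NR % 2 == 1 { key = $0; next }   # odd lines carry field names
    key == want { print; exit }      # even lines carry the matching value
  '
}

# Possible usage:
#   redis-cli -p 26379 -a YOUR_PASSWORD sentinel master mymaster \
#     | sentinel_field down-after-milliseconds
```

If the value printed is longer than the time the StatefulSet needs to bring the pod back, a plain pod deletion will be healed by Kubernetes before Sentinel acts.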

Taking the whole node down was the interesting one. Cordoning and deleting the node took the master pod with it, and the name Sentinel was using to reach the master stopped resolving. Instead of promoting a replica, Sentinel sat there reporting that it could not resolve the hostname. A name that no longer exists is a different problem from an address that stops answering, and hostname-based monitoring does not fail over cleanly through it. Recovery came from bringing the node back, not from a Sentinel failover.

The lesson is that on Kubernetes you have two recovery systems stacked on top of each other, the Kubernetes controllers and Sentinel, and they do not always cooperate the way the textbook Sentinel diagram implies. Sometimes Kubernetes heals the problem before Sentinel acts. Sometimes the very thing that makes a pod reachable, its name, is what stops Sentinel from doing its job. It is worth knowing which layer is actually handling a given failure rather than assuming Sentinel covers all of them. It is also why the failure scenarios in the flow simulator are walked through as simulations rather than triggered for real on the running cluster. The deployment builder in front of it exists for the same reason: a place to run those failures against any layout without touching anything real.

The result is two Redis deployments sized to two different risks: a sharded cache that is allowed to lose data, and a replicated, persisted session store that is not. The anti-affinity is what makes the two-node layout meaningful, since without it a single node failure can still take a shard offline. The failover behavior above is what keeps it honest about where its limits are.
