Embrace and Replace: Migrating ZooKeeper into Kubernetes

We recently migrated hundreds of ZooKeeper instances from individual server instances to Kubernetes without downtime. Our approach used powerful Kubernetes features like endpoints to ease the process, so we’re sharing the high level outline of the approach for anyone who wants to follow in our footsteps. See the end for important networking prerequisites.

Traditional ZooKeeper Migrations

ZooKeeper is the foundation of many distributed systems, allowing them a powerful platform to rendezvous to take attendance and form clusters. Behind the scenes, it relies on a comparatively basic approach to forming clusters: each server instance has a config file listing all of the member hostnames and numeric ids, and all servers have the same list of servers, like this:

server.1=host1:2888:3888

server.2=host2:2888:3888

server.3=host3:2888:3888

Each server has a unique file called myid to tell it which numeric id it corresponds to in that list.

Adding and removing hosts can be done as long as a key rule isn’t violated: each server must be able to reach a quorum, defined as a simple majority, of the servers listed in its config file. The traditional way to migrate a ZooKeeper server to a new instance involves, at a high level:

Configure and start a new host with “server.4=host:4…” in its server list
Update the config files on the existing hosts to add the new server entry and remove the retired host from their server list
Rolling restart the old hosts (no dynamic server configuration in the 3.4x branch)
Update connection strings in clients (perhaps just changing CNAME records if clients re-resolve DNS on errors)

The downside of this approach is many config file changes and rolling restarts, which you might or might not have solid automation for. When we set out to move ZooKeeper to Kubernetes we started thinking about this approach, but figured out an easier way. Safer too, because in our experience each new leader election has a small risk of taking long enough to bring down the systems relying on them.

New Approach

Our approach involves wrapping existing ZooKeeper servers in Kubernetes services and then does one-for-one server-to-pod replacements using the same ZooKeeper id. This requires just one rolling restart to reconfigure existing ZK instances, then shutting down the servers one by one. We won’t get into the weeds on the ways to configure Kubernetes topologies for ZooKeeper here, nor the low-level readiness checks, because there are many ways to do that with various pros and cons. The concepts discussed below work the same no matter the top-level topology.

We’ll proceed in five steps:

Complete prerequisites to ensure our ZooKeeper cluster is ready to migrate
Create ClusterIP services in Kubernetes that wrap ZooKeeper services
Configure ZooKeeper clients to connect to ClusterIP services
Configure ZooKeeper server instances to perform peer-to-peer transactions over the ClusterIP service addresses
Replace each ZooKeeper instance running on a server with a ZooKeeper instance in a Kubernetes pod

For each of the steps below, we’ll include a diagram of our infrastructure topology. The diagrams will contain only two ZooKeeper instances for ease of understanding, even though one wouldn’t want to ever create a cluster with fewer than three.

Complete Prerequisites

Starting with a working ZooKeeper cluster, we’ll want to ensure that services on the host are capable of communicating with our Kubernetes cluster. We include a few ways to accomplish that at the end of this article.

ZK1 Figure 1: Our starting state. A two-instance ZooKeeper cluster and some clients

Create ClusterIP Services

Create a ClusterIP service with matching Endpoint resources for each ZooKeeper server. They should pass the client port (2181) as well as the cluster-internal ports (2888,3888). With that done you should be able to connect to your ZooKeeper cluster via those service hostnames. Kubernetes ClusterIP services are useful here because they give you static IP addresses that act as load balancers to backend pods. In this case we're using them with a 1:1 mapping of service to pod so that we have a static IP address for each pod.

ZK_2 Figure 2: Our cluster, with ZooKeeper still on physical hardware, is reachable via ClusterIP Services

Reconfigure Clients

Once you’re able to connect to your ZooKeeper cluster via Kubernetes ClusterIP services, this is a good time to pause to reconfigure all your clients. If you’re using CNAME records in ZooKeeper connection strings, change the DNS records. Restart all the clients if they don’t have recent clients that will re-resolve DNS entries on connection failures. If you’re not using CNAME records then you’ll need to roll out new connection strings and restart all client processes. At this time, old and new connection strings will still work.

ZK_3 Figure 3: Clients are now communicating with our ZooKeeper cluster using ClusterIP service instances

Reconfigure ZooKeeper Instances

Next up, we’ll make our ZooKeeper servers do their peer-to-peer communication via those ClusterIP services. To do this, we’ll modify the config files to incorporate the addresses of the ClusterIP services. It’s also essential to configure the zk_quorum_listen_all_ips flag here: without it the ZK instance will unsuccessfully attempt to bind to an ip address that doesn’t exist on any interface on the host, because it's a Kube service IP.

server.1=zk1-kube-svc-0:2888:3888
server.2=zk2-kube-svc-1:2888:3888
server.3=zk3-kube-svc-2:2888:3888
zk_quorum_listen_all_ips: true

Rolling restart those hosts and now we’re ready to start replacing hosts with pods.

ZK-K8s Migration Blog Post Images

Figure 4: ZooKeeper instances are now communicating with their peers using ClusterIP service instances

Replace ZooKeeper Hosts with Pods

One server at a time we’ll do the following steps:

Select a ZK server and its corresponding ClusterIP service
Shut down ZK process on the server
Start a pod configured with the same server list and myid file as the shut-down ZK server
Wait until ZK in the pod has started and synced data from the other ZK nodes

That’s it, your ZooKeeper cluster is now running in Kubernetes with all of the previous data.

ZK_5

Figure 5: Cluster after one round of pod-replacement. ZK1 is now running in a pod without ZK2 knowing anything has changed

Networking Prerequisites

For these steps to work well, there’s some network setup to handle. You need to take steps to ensure the following:

Kubernetes pod IP addresses need to be routable from all servers that need to connect to ZooKeeper
All servers connecting to ZooKeeper must be able to resolve Kubernetes service hostnames
Kube-proxy must be running on all servers that need to connect to ZooKeeper so that they can reach the ClusterIp services

These can be accomplished in a few ways. We use an in-house network plugin similar to Lyft's https://github.com/lyft/cni-ipvlan-vpc-k8s plugin or AWS's https://github.com/aws/amazon-vpc-cni-k8s which assign AWS VPC IP addresses to pods directly, instead of using a virtual overlay network, so all of our pod IPs are routable from any instance. Overlay networks like flannel (https://github.com/coreos/flannel) would work too, as long as all of your servers are attached to the overlay network.

Embrace and Replace: Migrating ZooKeeper into Kubernetes

Traditional ZooKeeper Migrations

New Approach

Complete Prerequisites

Create ClusterIP Services

Reconfigure Clients

Reconfigure ZooKeeper Instances

Replace ZooKeeper Hosts with Pods

Networking Prerequisites

Recommended Articles

Six Benefits of Posting Your Questions Publicly

Slacking hard, or hardly Slacking: Automating infrastructure at scale

Boston Scalable Architecture Meetup: A Night of Learning

Join our subscribers

Embrace and Replace: Migrating ZooKeeper into Kubernetes

Traditional ZooKeeper Migrations

New Approach

Complete Prerequisites

Create ClusterIP Services

Reconfigure Clients

Reconfigure ZooKeeper Instances

Replace ZooKeeper Hosts with Pods

Networking Prerequisites

Recommended Articles

Six Benefits of Posting Your Questions Publicly

Slacking hard, or hardly Slacking: Automating infrastructure at scale

Boston Scalable Architecture Meetup: A Night of Learning

Join our subscribers

Get Connected