Introduction to Zookeeper System Design


SVCIT Editorial Jul 9, 2021


Apache Zookeeper

Apache Zookeeper is a pillar of many distributed applications because of its unique features. It serves as a coordination service for distributed applications and exposes a simple set of primitives on which higher-level services for synchronization, configuration maintenance, groups, and naming can be built. Zookeeper is designed to be easy to program against. It is written in Java and has bindings for Java, C, and Python.

Apache Zookeeper is an open-source, centralized service for distributed coordination. It is commonly used for:

Maintaining configuration information: sharing configuration information across all nodes.
Naming: naming the servers in a cluster of thousands of machines.
Providing distributed synchronization: locks, barriers, and queues.
Providing group services: leader election.

Companies Using Zookeeper System Design

Yahoo
Twitter
Netflix
Facebook

Why We Need Apache Zookeeper

Coordination services: services that integrate and coordinate the components of a distributed environment.
Coordination services are hard to get right. They are especially prone to errors such as race conditions and deadlocks.
Race condition: two or more operations access shared data at the same time, and the result depends on which one runs first.
Deadlock: two or more operations each wait for the other to finish, so none of them can proceed.
Zookeeper relieves distributed applications of the responsibility of implementing coordination services from scratch.

“Primitive” Operations in a Distributed System

Master Election
- One node registers itself as a master and holds a “lock” on that data
- Other nodes cannot become masters until that lock is released
- Only one node is allowed to hold the lock for processing at a time
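
The master-election steps above can be sketched in Python with a toy, in-memory stand-in for Zookeeper. The `ToyZk` class, its `create` and `close_session` methods, the `NodeExistsError` exception, and the `/master` path are all hypothetical and are not a Zookeeper client API. They model only two behaviors assumed from the description: creating a node that already exists fails, and an ephemeral node disappears when its owner's session ends.

```python
class NodeExistsError(Exception):
    """Raised when creating a path that is already taken (hypothetical)."""


class ToyZk:
    """Hypothetical in-memory model of a few Zookeeper behaviors."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session or None if persistent)

    def create(self, path, data, session, ephemeral=False):
        # Creation is all-or-nothing: exactly one caller can win a given path.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = (data, session if ephemeral else None)

    def close_session(self, session):
        # Ephemeral nodes owned by a closed session are removed automatically.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


def try_become_master(zk, session):
    """Return True if `session` now holds the /master "lock"."""
    try:
        zk.create("/master", session, session, ephemeral=True)
        return True
    except NodeExistsError:
        return False


zk = ToyZk()
print(try_become_master(zk, "node-a"))  # True
print(try_become_master(zk, "node-b"))  # False, node-a holds the lock
zk.close_session("node-a")              # node-a crashes or disconnects
print(try_become_master(zk, "node-b"))  # True
```

Making the master node ephemeral is what releases the lock automatically: if the master crashes, its session ends and another node can take over without manual cleanup.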

Crash Detection
- “Ephemeral” data on a node’s availability automatically goes away if the node disconnects or fails to refresh itself after some time-out period.

Group Management

Metadata
- List of outstanding tasks, task assignments
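
Crash detection and group management can both be built on ephemeral data. The sketch below is a hypothetical model, not Zookeeper's implementation: the `SessionTracker` class, its method names, the `/workers` group path, and the integer timestamps passed in as `now` are assumptions chosen so the behavior is deterministic. A member joins a group by creating an ephemeral child node, and a session that does not refresh itself within the timeout is treated as failed.

```python
class SessionTracker:
    """Hypothetical model of sessions, heartbeats, and ephemeral group members."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # session id -> time of the last heartbeat
        self.ephemeral = {}  # ephemeral znode path -> owning session id

    def heartbeat(self, session, now):
        # A client refreshes itself by pinging before its timeout expires.
        self.last_seen[session] = now

    def join_group(self, group, member, session, now):
        # Joining a group means creating an ephemeral child under the group path.
        self.heartbeat(session, now)
        self.ephemeral[f"{group}/{member}"] = session

    def expire(self, now):
        # Sessions silent for longer than the timeout are treated as crashed,
        # and every ephemeral znode they own is removed with them.
        dead = {s for s, t in self.last_seen.items() if now - t > self.timeout}
        for s in dead:
            del self.last_seen[s]
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s not in dead}
        return dead

    def members(self, group):
        prefix = group + "/"
        return sorted(p[len(prefix):] for p in self.ephemeral if p.startswith(prefix))


tracker = SessionTracker(timeout=10)
tracker.join_group("/workers", "w1", "s1", now=0)
tracker.join_group("/workers", "w2", "s2", now=0)
tracker.heartbeat("s1", now=8)      # w1 keeps refreshing; w2 goes silent
print(tracker.expire(now=12))       # {'s2'}
print(tracker.members("/workers"))  # ['w1']
```

Because membership is derived from ephemeral nodes, the group list stays accurate without members having to announce their own departure.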

Apache Zookeeper Guarantees

Sequential Consistency

Updates from a client are applied in the order in which they were sent.

Atomicity

Updates either succeed or fail; there are no partial results.

Single System Image

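A client will see the same view of the system regardless of which server it connects to. If a client reconnects to a different server, that server will not accept the connection until it has caught up with the state the client has already seen.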
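
The single-system-image rule can be sketched as a comparison of transaction numbers. This is a simplified, hypothetical model: the `Server` class, `accepts`, and `connect` are illustrative names, and each replica's progress is reduced to the number of the last transaction it has applied (`last_zxid`). A server refuses a client that has already seen a newer transaction than the server itself has applied.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    last_zxid: int  # number of the last transaction this replica has applied

    def accepts(self, client_zxid):
        # Refuse a client that has already seen newer state than this replica;
        # otherwise the client could observe an older view of the system.
        return self.last_zxid >= client_zxid


def connect(servers, client_zxid):
    """Return the first server willing to accept the client, or None."""
    for server in servers:
        if server.accepts(client_zxid):
            return server
    return None


servers = [Server("zk1", last_zxid=100), Server("zk2", last_zxid=107)]
print(connect(servers, client_zxid=105).name)  # zk2, because zk1 is behind
print(connect(servers, client_zxid=99).name)   # zk1
```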

Durability

Once an update has succeeded, it will persist and will not be undone.

Timeliness

A client's view of the system is guaranteed to be up to date within a certain time bound. Rather than allow a client to see very stale data, a server will shut down.

Features

Apache Zookeeper also has the following characteristics:

It is simple
It is replicated
It is ordered
It is fast

Apache Zookeeper Design Goals

Simple

The shared hierarchical namespace looks like a standard file system. The namespace consists of data registers, called znodes, which are similar to files and directories.
Zookeeper allows distributed processes to coordinate with each other through this shared hierarchical namespace.
Unlike a typical file system designed for storage, Zookeeper keeps its data in memory, which allows it to achieve high throughput and low latency.

High performance
- Used in large, distributed systems

Highly available
- No single point of failure

Strictly ordered access
- Makes it possible to build synchronization primitives
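
To make the file-system analogy concrete, here is a minimal in-memory sketch of a znode tree, assuming slash-separated paths as in Zookeeper. The `ZnodeTree` class and its methods are hypothetical; real Zookeeper also replicates this tree and persists transaction logs and snapshots, which the sketch omits.

```python
import posixpath


class ZnodeTree:
    """Hypothetical in-memory model of Zookeeper's hierarchical namespace."""

    def __init__(self):
        self.data = {"/": b""}  # every znode, including the root, holds bytes

    def create(self, path, value=b""):
        parent = posixpath.dirname(path)
        # Like a file system, a znode can only be created under an existing parent.
        if parent not in self.data:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.data:
            raise KeyError(f"{path} already exists")
        self.data[path] = value

    def get(self, path):
        return self.data[path]

    def children(self, path):
        # Unlike a plain directory, every znode can hold data and have children.
        return sorted(p.rsplit("/", 1)[1] for p in self.data
                      if p != "/" and posixpath.dirname(p) == path)


tree = ZnodeTree()
tree.create("/app")
tree.create("/app/config", b"timeout=30")
tree.create("/app/workers")
print(tree.children("/app"))    # ['config', 'workers']
print(tree.get("/app/config"))  # b'timeout=30'
```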

Apache Zookeeper is Replicated

Zookeeper itself is intended to be replicated over a set of hosts called an ensemble.
The servers that make up the Zookeeper service must all know about each other.
They maintain an in-memory image of the state, along with transaction logs and snapshots in a persistent store.
As long as a majority of the servers are available, the Zookeeper service will be available.
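
The majority rule translates into simple arithmetic. The helper functions below (`quorum_size`, `tolerated_failures`, `is_available`) are illustrative names, not part of any Zookeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def is_available(ensemble_size, servers_up):
    """The service stays available while a majority of servers is up."""
    return servers_up >= quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

The numbers show why ensembles are usually deployed with an odd number of servers: a four-server ensemble needs three servers up, so it tolerates only one failure, the same as a three-server ensemble.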

Ordered

Zookeeper also stamps each update with a number that reflects the order of all Zookeeper transactions.

The number:

Reflects the order of all Zookeeper transactions.
Can be used by subsequent operations to implement higher-level abstractions, such as synchronization primitives.
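
In Zookeeper this number is the transaction id, or zxid: a 64-bit value whose high 32 bits hold the leader's epoch and whose low 32 bits hold a counter within that epoch. The sketch below assumes that layout; the `Leader` class and the `make_zxid` and `split_zxid` helpers are hypothetical and ignore counter overflow and everything else a real leader does.

```python
def make_zxid(epoch, counter):
    """Pack a leader epoch (high 32 bits) and a counter (low 32 bits)."""
    return (epoch << 32) | counter


def split_zxid(zxid):
    """Recover (epoch, counter) from a packed zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


class Leader:
    """Hypothetical leader that stamps each write with the next zxid."""

    def __init__(self, epoch):
        self.epoch = epoch
        self.counter = 0

    def stamp(self, op):
        self.counter += 1
        return make_zxid(self.epoch, self.counter), op


old_leader = Leader(epoch=1)
a = old_leader.stamp("set /x 1")
b = old_leader.stamp("set /x 2")
new_leader = Leader(epoch=2)  # elected after the old leader fails
c = new_leader.stamp("set /x 3")

# Sorting by zxid recovers the global order, even across a leader change.
print([op for _, op in sorted([c, a, b])])  # ['set /x 1', 'set /x 2', 'set /x 3']
print(split_zxid(c[0]))                     # (2, 1)
```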

Zookeeper is Fast

It is especially fast in “read-dominant” workloads.
Zookeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.

Multiple Updates

Batches multiple operations together
Either all operations succeed or all fail
Makes it possible to implement transactions
Other clients never observe an inconsistent intermediate state
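
The all-or-nothing behavior can be sketched by staging every operation on a copy and publishing the copy only if all of them succeed. This is a hypothetical single-process model: the `Store` class, its `multi` method, the `MultiOpError` exception, and the `(kind, path, value)` tuple format are assumptions for illustration. The real Java client exposes this feature as `ZooKeeper.multi`, which accepts a collection of `Op` objects.

```python
class MultiOpError(Exception):
    """Raised when any operation in a batch fails (hypothetical)."""


class Store:
    """Hypothetical in-memory store with an all-or-nothing multi()."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def multi(self, ops):
        # Work on a copy so a failure part-way through leaves self.data untouched.
        staged = dict(self.data)
        for kind, path, value in ops:
            if kind == "create":
                if path in staged:
                    raise MultiOpError(f"{path} already exists")
                staged[path] = value
            elif kind in ("set", "delete"):
                if path not in staged:
                    raise MultiOpError(f"{path} does not exist")
                if kind == "set":
                    staged[path] = value
                else:
                    del staged[path]
            else:
                raise MultiOpError(f"unknown operation {kind!r}")
        # Every operation succeeded: swap in the new state with a single assignment.
        self.data = staged


# Move a task from a queue to a "done" area as one unit.
store = Store({"/queue/task-1": b"pending"})
store.multi([("delete", "/queue/task-1", None), ("create", "/done/task-1", b"ok")])
print(store.data)  # {'/done/task-1': b'ok'}

try:
    # The second operation fails, so the first one is not applied either.
    store.multi([("create", "/done/task-2", b"ok"), ("delete", "/missing", None)])
except MultiOpError:
    pass
print(store.data)  # {'/done/task-1': b'ok'}
```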

Author: SVCIT Editorial
Copyright Silicon Valley Cloud IT, LLC.
