Apache Kafka Architecture — Kafka Component Overview
Let’s start by answering the question “What is Kafka?”.
Kafka is a Distributed Streaming Platform or a Distributed Commit Log.
Kafka is a distributed streaming platform which can be used for building real-time data pipelines and streaming apps. It is a highly scalable, fault-tolerant distributed system.
Though Kafka started as a publish-subscribe messaging system (similar to message queues or enterprise messaging systems, but not exactly the same), over the years it has evolved into a complete streaming platform.
The main Kafka components are topics, producers, consumers, consumer groups, clusters, brokers, partitions, replicas, leaders, and followers.
The following diagram offers a simplified look at the interrelations between these components.
Use Cases:
- Messaging system
- Activity tracking
- Gathering metrics from many different locations
- Application log gathering
- Stream processing
- Decoupling of system dependencies
- Integration with big data technologies such as Hadoop and Spark
Examples:
Netflix: to power real-time recommendations of TV shows
LinkedIn: to prevent spam
Uber: to gather user, taxi, and trip data in real time
Kafka API Architecture
Producer API
The Kafka Producer API enables an application to publish a stream of records to one or more Kafka topics.
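As a rough sketch of this API (the broker address localhost:9092 and the topic name demo-topic are placeholder values), a minimal producer using the official Java client could look like this:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer and flushes pending records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish a single record to the (hypothetical) demo-topic topic
            producer.send(new ProducerRecord<>("demo-topic", "hello kafka"));
        }
    }
}
```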
Consumer API
The Kafka Consumer API enables an application to subscribe to one or more Kafka topics. It also makes it possible for the application to process streams of records that are produced to those topics.
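A minimal consumer sketch in the same spirit (broker address, group ID, and topic name are again placeholders):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "demo-group");              // placeholder consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                // poll blocks for up to one second waiting for new records
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```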
Streams API
The Kafka Streams API allows an application to process data in Kafka using a streams processing paradigm. With this API, an application can consume input streams from one or more topics, process them with streams operations, and produce output streams and send them to one or more topics. In this way, the Streams API makes it possible to transform input streams into output streams.
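As an illustrative sketch (the application ID and topic names are made up), here is a Streams application that consumes an input topic, uppercases each record value, and produces the result to an output topic:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app"); // placeholder ID
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume the input stream, transform each value, send to the output topic
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(value -> value.toUpperCase()).to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```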
Connect API
The Kafka Connect API connects applications or data systems to Kafka topics. It provides options for building and managing reusable producers and consumers that move data between Kafka and existing systems. For instance, a connector could capture every update to a database and make those changes available within a Kafka topic.
Kafka Cluster Architecture
Topics, Partitions and Offsets
Kafka Topics: a particular stream of data
- Similar to a table in a database
- You can have as many topics as you want
- A topic is identified by its name
- Topics are split into partitions
- Each partition is ordered
- Each message within a partition gets an incremental ID, called an offset
- Offsets only have meaning within a specific partition
Example: offset 3 in partition 0 does not represent the same data as offset 3 in partition 1
- Order is guaranteed only within a partition (not across partitions)
- Data is kept only for a limited period of time (the default retention is 1 week)
- Once data is written to a partition, it cannot be changed (immutability)
- Data is assigned randomly to a partition unless a key is provided
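Topics can also be created programmatically. Here is a minimal sketch using the Java AdminClient (the topic name, partition count, and replication factor are placeholder values):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // A topic with 3 partitions and a replication factor of 2 (both placeholders)
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```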
Kafka Brokers: A Kafka broker manages the storage of messages in the topic(s). If Kafka has more than one broker, that is what we call a Kafka cluster.
- A Kafka cluster is made of multiple brokers (servers)
- Each broker is identified by its ID
- Each broker contains certain topic partitions
- After connecting to any broker (called bootstrap broker), you will be connected to the entire cluster
- A good number to get started with is 3 brokers, but some big clusters have more than 100 brokers
Example: in these examples we number brokers starting at 100 (an arbitrary choice)
Brokers and Topics
Topic Replication Factor
- Topics should have a replication factor > 1 (usually between 2 and 3)
- This way, if a broker is down, another broker can serve the data
Example: Topic-A with 2 partitions and a replication factor of 2
Example: we lose Broker 102
- Result: Brokers 101 and 103 can still serve the data
Concept of Leader for Partition
- At any time, only one broker can be the leader for a given partition
- Only that leader can receive and serve data for that partition
- The other brokers will synchronize the data
- Therefore, each partition has one leader and multiple ISRs (in-sync replicas)
Kafka Producers: A Kafka producer serves as a data source that optimizes, writes, and publishes messages to one or more Kafka topics. Kafka producers also serialize, compress, and load balance data among brokers through partitioning.
- Producers write data to topics (which are made of partitions)
- Producers automatically know which broker and partition to write to
- In case of broker failure, producers will automatically recover
- Producers can choose to receive acknowledgement of data writes; you can configure this behavior with the acks setting:
acks=0: the producer won't wait for acknowledgement (possible data loss)
acks=1: the producer will wait for the leader's acknowledgement (limited data loss)
acks=all: leader + replica acknowledgement (no data loss)
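acks is just a configuration property on the producer. A minimal sketch (placeholder broker and topic names) of the safest setting:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SafeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all: the leader waits for all in-sync replicas before acknowledging.
        // Alternatives: "1" (leader only) or "0" (no acknowledgement at all).
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "durable write"));
        }
    }
}
```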
Producer Message Keys
- Producers can choose to send a key with the message (string, number, etc.)
- If key=null, data is sent round-robin (broker 101, then 102, then 103, and so on)
- If a key is sent, then all the messages for that key will always go to the same partition
- A key is basically sent if you need message ordering for a specific field (e.g. truck_id), as in the sketch below
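As a sketch of keyed writes (the topic truck-positions and the key truck_42 are hypothetical), every record sent with the same key lands on the same partition:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class KeyedProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 3; i++) {
                // All records keyed "truck_42" hash to the same partition,
                // so this truck's position updates stay in order.
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("truck-positions", "truck_42", "position-" + i);
                RecordMetadata meta = producer.send(record).get(); // block to read the metadata
                System.out.println("wrote to partition " + meta.partition());
            }
        }
    }
}
```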
Kafka Consumers: Consumers read data by reading messages from the topics to which they subscribe. Consumers belong to a consumer group. Each consumer within a particular consumer group is responsible for reading a subset of the partitions of each topic it is subscribed to.
- Consumers read data from a topic (identified by name)
- Consumers know which broker to read from
- In case of broker failure, consumers know how to recover
- Data is read in order (within each partition)
Consumer Groups: A Kafka consumer group includes related consumers with a common task. Kafka sends messages from the partitions of a topic to the consumers in the consumer group. Each partition is read by only a single consumer within the group at any given time. A consumer group has a unique group ID and can run multiple processes or instances at once. Multiple consumer groups can each have one consumer read from the same partition. If the number of consumers within a group is greater than the number of partitions, some consumers will be inactive.
- Consumers read data in consumer groups
- Each consumer within a group reads from exclusive partitions
- If you have more consumers than partitions, some consumers will be inactive
Consumer Offsets
- Kafka stores the offsets at which a consumer group has been reading
- The committed offsets live in a Kafka topic named __consumer_offsets
- When a consumer in a group has processed data retrieved from Kafka, it should commit the offsets (a sketch follows below)
- If a consumer dies, it will be able to read from where it left off, thanks to the committed consumer offsets
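A minimal sketch of committing offsets manually with the Java client (broker address, group ID, topic name, and the process step are placeholders); committing only after the records have been processed gives the at-least-once behavior described next:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommittingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("enable.auto.commit", "false"); // commit offsets ourselves
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // hypothetical processing step
                }
                // Commit only after processing: if the consumer crashes first,
                // it re-reads from the last committed offset instead of losing data.
                consumer.commitSync();
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}
```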
Delivery semantics for consumers
- There are 3 delivery semantics:
- At most once: offsets are committed as soon as the message is received; if the processing goes wrong, the message is lost and will not be read again
- At least once (usually preferred): offsets are committed after the message is processed; if the processing goes wrong, the message is read again, so duplicates are possible and processing should be idempotent
- Exactly once: can be achieved for Kafka-to-Kafka workflows using the Kafka Streams API
Kafka Broker Discovery
- Every Kafka broker is also called a "bootstrap server"
- That means you only need to connect to one broker, and you will be connected to the entire cluster
- Each broker knows about all brokers, topics, and partitions (metadata)
Zookeeper: Kafka does not function without Zookeeper (at least for now; there are plans to deprecate Zookeeper in the near future). Zookeeper works as the central configuration and consensus management system for Kafka. It tracks brokers, topics, partition assignments, and leader elections: basically all the metadata about the cluster.
- Zookeeper manages brokers (it keeps a list of them)
- It helps in performing leader election for partitions
- It sends notifications to Kafka in case of changes (a broker dies, a broker comes up, a topic is created or deleted, etc.)
- By design, Zookeeper operates with an odd number of servers (3, 5, 7, ...)
- Zookeeper has a leader (which handles writes); the other servers are followers (which handle reads)
- Kafka cannot work without Zookeeper
Hope It Helps :)