Beginning with Apache Kafka

Sakib Sami
Published in LiveKlass
Jun 13, 2019


What is Apache Kafka?

Apache Kafka is a distributed, publish-subscribe based, append-only messaging platform originally developed at LinkedIn and later donated to the Apache Software Foundation.


Use Cases

  • Messaging
  • Log aggregation
  • Stream processing
  • Commit log
  • Event sourcing, etc.

How Does It Work?

Kafka works on the pub-sub model, so there are producers (publishers) on one side and consumers, or consumer groups, on the other.

A producer publishes data to a specific topic, and consumers subscribed to that topic consume the data. Multiple producers can publish to the same topic, but within a consumer group only one consumer consumes a topic's data at a time.

Now you may ask: if only one consumer is allowed to consume from a topic at a time, and there are thousands of events coming in, how do we scale?

Okay. In that case, we can split the topic into partitions to achieve scalability.

When we partition a topic, each message in a partition gets an offset_id/message_id. Say we create 3 partitions: each producer sends an event to a specific partition, and each consumer reads from a specific partition. Now if 300 events/sec are being sent, they get divided across the 3 partitions, so each partition handles about 100 messages/sec. In this way we achieve load balancing. Keep in mind that with partitioning, Kafka only guarantees message order within a partition; to keep related messages in order across a partitioned topic, we have to use a message key, because messages with the same key always land in the same partition.
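For example, giving every event about the same user the same key keeps those events in order, because they always land in the same partition. A tiny sketch with the confluent-kafka-go client from the resources below (the topic, key, and value are made up for illustration, and producer is a *kafka.Producer that has already been created):

topic := "users.notification"
producer.Produce(&kafka.Message{
    TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
    Key:            []byte("user-42"), // same key -> same partition -> same order
    Value:          []byte("signed-in"),
}, nil)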

If the event rate keeps growing, we add more partitions and more consumers to scale out and keep the load balanced.

Once we have processed an event, we have to commit its offset back to Kafka, so Kafka knows the event has been handled and won't send it again. If our app crashes in the middle of processing, Kafka will send that event again to be processed.
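With auto-commit disabled ("enable.auto.commit": false in the consumer config), the commit only happens after our own processing succeeds. A rough sketch with confluent-kafka-go (handleEvent is a hypothetical processing function, and consumer is a *kafka.Consumer like the one in the full example further down):

msg, err := consumer.ReadMessage(-1) // block until a message arrives
if err == nil {
    handleEvent(msg) // hypothetical business logic
    // Committing marks this offset as processed; if we crash before this line,
    // Kafka will deliver the message again to this consumer group.
    consumer.CommitMessage(msg)
}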

Replication Factor:

The replication factor controls how many Kafka brokers keep a copy of the data. Say we choose a replication factor of N = 3: the data is written to one broker and replicated to 2 more. Kafka keeps working even if N-1 brokers go down, because there is still at least one broker alive with the data.
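For example, a topic with 3 partitions and a replication factor of 3 could be created with the CLI that ships with Kafka (this sketch assumes a cluster of at least 3 brokers reachable at localhost:9092; older Kafka versions use --zookeeper localhost:2181 instead of --bootstrap-server):

kafka-topics.sh --create \
    --bootstrap-server localhost:9092 \
    --topic users.notification \
    --partitions 3 \
    --replication-factor 3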

What is Apache Zookeeper?

Apache Zookeeper is a centralized service that gives distributed systems a hierarchical key-value store; it provides distributed configuration, synchronization, and a naming registry for large distributed systems. Apache Kafka doesn't work without Zookeeper: cluster management data, such as which brokers are up or down and which topics have been created, is synchronized across the Kafka brokers through Zookeeper.

Spin up Kafka >
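A minimal single-broker docker-compose.yml sketch, assuming the Confluent Zookeeper and Kafka images and the default ports (adjust versions and listeners to your setup):

version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.2.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:5.2.1
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      # Single broker, so the internal offsets topic can't be replicated further.
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

Then start everything with: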

docker-compose up

Consumer (Golang) >
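A minimal consumer sketch using confluent-kafka-go (the broker address, group id, and topic name are assumptions chosen to match the result shown below):

package main

import (
    "fmt"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
    // Connect to the local broker started by docker-compose.
    consumer, err := kafka.NewConsumer(&kafka.ConfigMap{
        "bootstrap.servers": "localhost:9092",
        "group.id":          "notification-consumer",
        "auto.offset.reset": "earliest",
    })
    if err != nil {
        panic(err)
    }
    defer consumer.Close()

    // Subscribe to the topic the producer writes to.
    if err := consumer.SubscribeTopics([]string{"users.notification"}, nil); err != nil {
        panic(err)
    }

    fmt.Println("Waiting for messages...")
    for {
        msg, err := consumer.ReadMessage(-1) // block until a message arrives
        if err != nil {
            fmt.Println("Consumer error:", err)
            continue
        }
        fmt.Println("Topic : ", *msg.TopicPartition.Topic)
        fmt.Println("Partition :", msg.TopicPartition.Partition)
        fmt.Println("Offset :", msg.TopicPartition.Offset)
        fmt.Println("Value :", string(msg.Value))
    }
}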

Producer (Kotlin) >
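A minimal producer sketch in Kotlin using the standard Kafka Java client (broker address, key, and value are assumptions; the opentracing-kafka-client wrapper from the resources is left out for brevity):

import java.util.Properties
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer

fun main() {
    val props = Properties().apply {
        put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
        put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer::class.java.name)
        put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer::class.java.name)
    }

    KafkaProducer<String, String>(props).use { producer ->
        // "hello" goes to the users.notification topic; the key decides the partition.
        val record = ProducerRecord("users.notification", "user-1", "hello")
        producer.send(record) { metadata, exception ->
            if (exception != null) exception.printStackTrace()
            else println("Delivered to ${metadata.topic()}[${metadata.partition()}] @ offset ${metadata.offset()}")
        }
        producer.flush()
    }
}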

Result >

Waiting for messages...
Topic :  users.notification
Partition : 0
Offset : 1
Value : hello

Resources:

1. github.com/confluentinc/confluent-kafka-go
2. compile group: 'io.opentracing.contrib', name: 'opentracing-kafka-client', version: '0.1.2'

Sakib Sami
Senior Software Engineer @ Twilio | Entrepreneur | Tech Ninja