
Kafka paper notes


Link to the paper

A very good and not too complicated read. The paper is outdated by now: I wasn’t aware that the initial version had no built-in message replication between brokers, but the paper came out in 2011, and Kafka has developed a lot over the years.

Notes

  • distributed messaging system
  • a stream of messages of a particular type is defined by a topic
    • to balance the load, a topic is divided into partitions
  • published messages are then stored at a set of servers called brokers
  • messages are just payloads of bytes; it’s up to the user to define how to serialize them
  • overall architecture is simple:
    • producers push the messages to the topic
    • consumers poll the topic for any unread messages
// Sample producer code:
producer = new Producer(…);
message = new Message("test message str".getBytes());
set = new MessageSet(message);
producer.send("topic1", set);

// Sample consumer code:
streams[] = Consumer.createMessageStreams("topic1", 1)
for (message : streams[0]) {
  bytes = message.payload();
  // do something with the bytes
}
  • storage is very simple: each partition of a topic corresponds to a logical log

    • a log is implemented as a set of segment files of approximately the same size (e.g. 1GB)
    • for better performance, messages are flushed to disk only after a configurable number of messages have accumulated or a certain amount of time has passed
    • consumers only see flushed messages
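The segmented log with offset addressing can be sketched roughly like this (a toy in-memory model with hypothetical names, not Kafka's actual code, which works on real files):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of a partition log: a list of fixed-capacity "segments".
// A message's id is simply its logical offset; nothing extra is stored per message.
class ToyLog {
    static final int SEGMENT_CAPACITY = 4; // real segments hold ~1GB of bytes

    private final List<List<byte[]>> segments = new ArrayList<>();
    private long flushedUpTo = 0; // consumers only see offsets below this

    ToyLog() { segments.add(new ArrayList<>()); }

    // Append returns the message's logical offset.
    long append(byte[] payload) {
        List<byte[]> last = segments.get(segments.size() - 1);
        if (last.size() == SEGMENT_CAPACITY) { // roll a new segment
            last = new ArrayList<>();
            segments.add(last);
        }
        last.add(payload);
        return size() - 1;
    }

    // Flush makes everything appended so far visible to consumers.
    void flush() { flushedUpTo = size(); }

    // Read by logical offset: locate the segment, then index within it.
    byte[] read(long offset) {
        if (offset >= flushedUpTo) return null; // not flushed yet, invisible
        int seg = (int) (offset / SEGMENT_CAPACITY);
        int idx = (int) (offset % SEGMENT_CAPACITY);
        return segments.get(seg).get(idx);
    }

    long size() {
        long n = 0;
        for (List<byte[]> s : segments) n += s.size();
        return n;
    }
}
```

In the real implementation the "segment index" step is a lookup over sorted segment start offsets on disk, but the idea is the same: the offset alone is enough to find the message.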
  • there is no explicit message id; each message is addressed by its logical offset in the log

  • consumers always consume sequentially; once an offset is committed, it is acknowledged that all messages up to that offset have been consumed

    • the committed offset is always the offset of the next message to consume, i.e. the first one that hasn’t been processed yet
  • in each pull request a consumer usually fetches multiple messages at once, up to a certain total size
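A pull request can be sketched like this (hypothetical shape, not Kafka's wire protocol): starting from the committed offset, take consecutive messages until the size budget runs out, and report the next offset to commit once they are processed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of a consumer pull: sequential reads from the committed offset,
// bounded by a maximum number of bytes per request.
class ToyFetch {
    static class FetchResult {
        final List<byte[]> messages = new ArrayList<>();
        long nextOffset; // the offset to commit after processing the batch
    }

    static FetchResult fetch(List<byte[]> log, long committedOffset, int maxBytes) {
        FetchResult r = new FetchResult();
        r.nextOffset = committedOffset;
        int used = 0;
        while (r.nextOffset < log.size()) {
            byte[] m = log.get((int) r.nextOffset);
            // Always return at least one message so an oversized message
            // can't stall the consumer forever.
            if (used + m.length > maxBytes && !r.messages.isEmpty()) break;
            r.messages.add(m);
            used += m.length;
            r.nextOffset++;
        }
        return r;
    }
}
```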

  • messages are never cached at the Kafka layer; it relies on the file system page cache and the Linux sendfile API to copy data from disk to the socket

    • Kafka counts on near real-time message processing: consumers lag producers only slightly, so thanks to write-through caching the data is typically still in the page cache when it’s read
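In Java the sendfile path is exposed as FileChannel.transferTo, which on Linux avoids copying the bytes through user space. A minimal sketch of serving a chunk of a segment file to a consumer connection (any WritableByteChannel; the helper name is made up):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of a zero-copy send: FileChannel.transferTo maps to sendfile(2)
// on Linux, so bytes go page cache -> socket without entering user space.
class ZeroCopySend {
    static long sendSegment(Path segmentFile, long offset, long count,
                            WritableByteChannel out) throws IOException {
        try (FileChannel ch = FileChannel.open(segmentFile, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) { // transferTo may send fewer bytes than asked
                long n = ch.transferTo(offset + sent, count - sent, out);
                if (n <= 0) break; // end of file or the target can't accept more
                sent += n;
            }
            return sent;
        }
    }
}
```

The loop matters: transferTo makes no guarantee about sending everything in one call, so callers retry until the requested count is transferred.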
  • Kafka has the concept of consumer groups, each consisting of one or more consumers that jointly consume a set of topics; each message is delivered to only one consumer within the group

  • at any given time, messages from one partition can be consumed by only a single consumer within a consumer group

    • a partition within a topic is the smallest unit of parallelism
  • the original design uses ZooKeeper for coordination, but this scaled poorly because each offset commit was a ZooKeeper write, which triggers the ZAB atomic broadcast protocol for quorum replication

  • the initial rebalance would cause a hard stop of all processing, to avoid split-brain decisions and two consumers consuming from the same partition

    • partitions were assigned in a deterministic way across consumers after a pause
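The deterministic assignment works because every consumer runs the same computation over the same sorted inputs, so they all agree on ownership without negotiating. A sketch of a range-style assignment (the names are mine; the paper describes the idea, not this exact code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of deterministic range assignment: sort partitions and consumer ids,
// split the partitions into equal ranges, and take the range at my own index.
class RangeAssign {
    static List<Integer> partitionsFor(List<Integer> partitions,
                                       List<String> consumerIds, String myId) {
        List<Integer> parts = new ArrayList<>(partitions);
        List<String> ids = new ArrayList<>(consumerIds);
        Collections.sort(parts); // identical ordering on every consumer
        Collections.sort(ids);
        int myIndex = ids.indexOf(myId);
        int per = (int) Math.ceil((double) parts.size() / ids.size());
        List<Integer> mine = new ArrayList<>();
        int end = Math.min(parts.size(), (myIndex + 1) * per);
        for (int i = myIndex * per; i < end; i++) {
            mine.add(parts.get(i));
        }
        return mine;
    }
}
```

Since the result depends only on the (agreed-upon) membership list and partition list, no consumer-to-consumer messages are needed; the pause exists so that everyone computes from the same membership snapshot.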
  • at-least-once delivery guarantee
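At-least-once means a consumer that crashes after processing but before committing its offset will re-read those messages on restart. The paper leaves deduplication to the application; a minimal sketch of one common approach, tracking the highest offset already applied (hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Application-side dedup for at-least-once delivery: because offsets are
// monotonically increasing within a partition, remembering the highest
// applied offset is enough to drop replayed duplicates.
class DedupConsumer {
    private long lastApplied = -1;          // highest offset already processed
    final List<String> applied = new ArrayList<>();

    // Apply a (offset, payload) pair exactly once, even if redelivered.
    void handle(long offset, String payload) {
        if (offset <= lastApplied) return;  // duplicate from a replay, skip
        applied.add(payload);
        lastApplied = offset;
    }
}
```

Note this only works per partition (offsets are only ordered within one partition), and lastApplied must be persisted atomically with the side effects for true exactly-once behavior.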