
Kafka paper notes


Link to the paper

A very good and not too complicated read. The paper is outdated by now: I wasn’t aware that the initial version had no built-in message replication between brokers, but the paper came out in 2011, and Kafka has developed a lot over the years.

Notes

  • distributed messaging system
  • a stream of messages of a particular type is defined by a topic
    • to balance the load, a topic is divided into partitions
  • published messages are then stored at a set of servers called brokers
  • messages are just payloads of bytes; it’s up to the user to define how to serialize them
  • overall architecture is simple:
    • producers push the messages to the topic
    • consumers poll the topic for any unread messages
// Sample producer code:
producer = new Producer(…);
message = new Message("test message str".getBytes());
set = new MessageSet(message);
producer.send("topic1", set);

// Sample consumer code:
streams[] = Consumer.createMessageStreams("topic1", 1)
for (message : streams[0]) {
  bytes = message.payload();
  // do something with the bytes
}
  • storage is very simple: each partition of a topic corresponds to a logical log

    • a log is implemented as a set of segment files of approximately the same size (e.g. 1GB)
    • for better performance, messages are flushed to disk only after a configurable number of messages have accumulated or a certain amount of time has passed
    • consumers only see flushed messages
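The segmented log with offset addressing can be sketched roughly like this (a toy in-memory model with hypothetical names, not Kafka's actual code, which works on real files):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of a partition log: a list of fixed-capacity "segments".
// A message's id is simply its logical offset; nothing extra is stored per message.
class ToyLog {
    static final int SEGMENT_CAPACITY = 4; // real segments hold ~1GB of bytes

    private final List<List<byte[]>> segments = new ArrayList<>();
    private long flushedUpTo = 0; // consumers only see offsets below this

    ToyLog() { segments.add(new ArrayList<>()); }

    // Append returns the message's logical offset.
    long append(byte[] payload) {
        List<byte[]> last = segments.get(segments.size() - 1);
        if (last.size() == SEGMENT_CAPACITY) { // roll a new segment
            last = new ArrayList<>();
            segments.add(last);
        }
        last.add(payload);
        return size() - 1;
    }

    // Flush makes everything appended so far visible to consumers.
    void flush() { flushedUpTo = size(); }

    // Read by logical offset: locate the segment, then index within it.
    byte[] read(long offset) {
        if (offset >= flushedUpTo) return null; // not flushed yet, invisible
        int seg = (int) (offset / SEGMENT_CAPACITY);
        int idx = (int) (offset % SEGMENT_CAPACITY);
        return segments.get(seg).get(idx);
    }

    long size() {
        long n = 0;
        for (List<byte[]> s : segments) n += s.size();
        return n;
    }
}
```

In the real implementation the "segment index" step is a lookup over sorted segment start offsets on disk, but the idea is the same: the offset alone is enough to find the message.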
  • there is no explicit message id; each message is addressed by its logical offset in the log

  • consumers always consume sequentially; once an offset is committed, it is acknowledged that all messages up to that offset have been consumed

    • the committed offset is always the offset of the next message to consume, i.e. the first one that hasn’t been processed yet
  • in each pull request a consumer usually fetches multiple messages at once, up to a certain total size
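A pull request can be sketched like this (hypothetical shape, not Kafka's wire protocol): starting from the committed offset, take consecutive messages until the size budget runs out, and report the next offset to commit once they are processed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of a consumer pull: sequential reads from the committed offset,
// bounded by a maximum number of bytes per request.
class ToyFetch {
    static class FetchResult {
        final List<byte[]> messages = new ArrayList<>();
        long nextOffset; // the offset to commit after processing the batch
    }

    static FetchResult fetch(List<byte[]> log, long committedOffset, int maxBytes) {
        FetchResult r = new FetchResult();
        r.nextOffset = committedOffset;
        int used = 0;
        while (r.nextOffset < log.size()) {
            byte[] m = log.get((int) r.nextOffset);
            // Always return at least one message so an oversized message
            // can't stall the consumer forever.
            if (used + m.length > maxBytes && !r.messages.isEmpty()) break;
            r.messages.add(m);
            used += m.length;
            r.nextOffset++;
        }
        return r;
    }
}
```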

  • messages are never cached at the Kafka layer; it relies on the file system page cache and the Linux sendfile API to copy data from disk to the socket

    • Kafka counts on near real-time message processing: consumers lag producers only slightly, so thanks to write-through caching the data is typically still in the page cache when it’s read
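In Java the sendfile path is exposed as FileChannel.transferTo, which on Linux avoids copying the bytes through user space. A minimal sketch of serving a chunk of a segment file to a consumer connection (any WritableByteChannel; the helper name is made up):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of a zero-copy send: FileChannel.transferTo maps to sendfile(2)
// on Linux, so bytes go page cache -> socket without entering user space.
class ZeroCopySend {
    static long sendSegment(Path segmentFile, long offset, long count,
                            WritableByteChannel out) throws IOException {
        try (FileChannel ch = FileChannel.open(segmentFile, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) { // transferTo may send fewer bytes than asked
                long n = ch.transferTo(offset + sent, count - sent, out);
                if (n <= 0) break; // end of file or the target can't accept more
                sent += n;
            }
            return sent;
        }
    }
}
```

The loop matters: transferTo makes no guarantee about sending everything in one call, so callers retry until the requested count is transferred.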
  • Kafka has the concept of consumer groups, each consisting of one or more consumers that jointly consume a set of topics; each message is delivered to only one consumer within the group

  • at any given time, messages from one partition can be consumed by only a single consumer within a consumer group

    • a partition within a topic is the smallest unit of parallelism
  • the original design uses ZooKeeper for coordination, but this scaled poorly because each offset commit was a ZooKeeper write, which triggers the ZAB atomic broadcast protocol for quorum replication

  • the initial rebalance would cause a hard stop of all processing, to avoid split-brain decisions and two consumers consuming from the same partition

    • partitions were assigned in a deterministic way across consumers after a pause
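The deterministic assignment works because every consumer runs the same computation over the same sorted inputs, so they all agree on ownership without negotiating. A sketch of a range-style assignment (the names are mine; the paper describes the idea, not this exact code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of deterministic range assignment: sort partitions and consumer ids,
// split the partitions into equal ranges, and take the range at my own index.
class RangeAssign {
    static List<Integer> partitionsFor(List<Integer> partitions,
                                       List<String> consumerIds, String myId) {
        List<Integer> parts = new ArrayList<>(partitions);
        List<String> ids = new ArrayList<>(consumerIds);
        Collections.sort(parts); // identical ordering on every consumer
        Collections.sort(ids);
        int myIndex = ids.indexOf(myId);
        int per = (int) Math.ceil((double) parts.size() / ids.size());
        List<Integer> mine = new ArrayList<>();
        int end = Math.min(parts.size(), (myIndex + 1) * per);
        for (int i = myIndex * per; i < end; i++) {
            mine.add(parts.get(i));
        }
        return mine;
    }
}
```

Since the result depends only on the (agreed-upon) membership list and partition list, no consumer-to-consumer messages are needed; the pause exists so that everyone computes from the same membership snapshot.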
  • at-least-once delivery guarantee
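At-least-once means a consumer that crashes after processing but before committing its offset will re-read those messages on restart. The paper leaves deduplication to the application; a minimal sketch of one common approach, tracking the highest offset already applied (hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Application-side dedup for at-least-once delivery: because offsets are
// monotonically increasing within a partition, remembering the highest
// applied offset is enough to drop replayed duplicates.
class DedupConsumer {
    private long lastApplied = -1;          // highest offset already processed
    final List<String> applied = new ArrayList<>();

    // Apply a (offset, payload) pair exactly once, even if redelivered.
    void handle(long offset, String payload) {
        if (offset <= lastApplied) return;  // duplicate from a replay, skip
        applied.add(payload);
        lastApplied = offset;
    }
}
```

Note this only works per partition (offsets are only ordered within one partition), and lastApplied must be persisted atomically with the side effects for true exactly-once behavior.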