The question
I would like to understand how the default Kafka Partitioner actually works.
The research
- Kafka provides an interface1
Partitioner
here which should be implemented by every partitioning method. - If not otherly defined, the
DefaultPartitioner
is used, as defined here. At the time of my research, this still was the default; now this one is deprecated. I have to find out what the replacement is. - The partitioning logic is quite eaesy:
- If a partition is defined then use it.
- If no key is given then return something called
StickyPartition
- this probably is the result of the deprecation. Find I have to check on that one. - If a partition is given then in the end, the method
partitionForKey
is called, as defined here. The code is:return Utils.toPositive(Utils.murmur2(serializedKey)) % numPartitions
.serializedKey
is just a Bytes object. Generation of that one is unique and reversible.numPartitions
is the number of partitions, unique for a topic.murmur2
is a Hashaing algorithm, used for Lookups.- So in the end we get a number in the Modulo
numPartitions
space. One can easily see that the operation is deterministic for every key.
The result
The topic itself does not play a role when computing the partition for a message, only the number of partitions. This also explains why Co-Partitioning of topics works: If we have the same key for two messages in separate topics then the only the number of partitions plays a role for finding the partition assignment. If this number is equal then also the assigned partition is equal.
Another thing to note is that partitioning happens at Consumer Level. So we have to trust the Consumer Code (which may also be the rdkafka
C++ implementation) to correctly implement the same method.
Next steps
Find out why the Partitioner is deprecated and what the new default Partitioner looks and works like. Helpful references for that may be KIP-480 or KIP-794 as they were mentioned next to the deprecation.
An interface is essentially the abstraction of a class. A specific class (in our case, a Partitioner) needs to implement the interface specification. ↩︎