This post is number 1 in the series Stream Processing Dev Setup. Find an overview of all posts in this series here.

Now that Apache Kafka provides its own Docker image, I can finally run my own Kafka server in Docker containers. I love that!

I have a development setup that I use quite often, so I would like to write it down here.

Motivation

Mainly at work, I use Apache Kafka quite a lot. For quick tests and similar tasks, I want a self-contained system: I run one command that brings up the whole thing, and it should just work™.

I made some decisions that I find reasonable for a development setup:

  • I only have one broker. The typical recommendation is to have at least three brokers for redundancy and high availability; I don’t need that in my dev setup. One broker is fine.
  • I also don’t care about one or more separate controllers. This might be important in a production setting, but is really not required for development.
  • I do not have data persistence. My idea is that I can docker compose up -d if I need a cluster to develop or test something, and then docker compose down again when I am done. I don’t want to persist topics or any other data.

Compose File Contents

Here’s the relevant part of my Compose file:

services:
  broker:
    image: apache/kafka:4.0.0
    container_name: broker
    ports:
      - 9092:9092
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT_HOST://0.0.0.0:9092,CONTROLLER://broker:9093,PLAINTEXT://broker:29092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://127.0.0.1:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT_HOST:PLAINTEXT,PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@broker:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_NUM_PARTITIONS: 3
  schema-registry:
    image: confluentinc/cp-schema-registry
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - broker
    ports:
      - 8081:8081
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: broker:29092
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081

This gives me the following:

  • A Kafka broker. This one is a combined broker and controller server, which is fully self-contained and does not need any external dependencies.
    Most of the configuration should be familiar to you if you know your way around Kafka, so I won’t go into too much detail about it. There are just a few Docker-specific caveats:
    • Listeners and advertised listeners: I have three listeners defined: PLAINTEXT_HOST, CONTROLLER and PLAINTEXT. The CONTROLLER listener is only there because the combined server needs one; it is not really important for development.
      • PLAINTEXT: This one listens on port 29092. It can be used for communication between containers that run in the same network as the Kafka container (typically, containers defined in the same Compose file). Inside those containers, use bootstrap.servers = broker:29092.
      • PLAINTEXT_HOST: If you are running applications on your host outside of Docker, you can just use bootstrap.servers = localhost:9092, and everything works as expected.
    • The default number of partitions is set to 3. Do with that what you want.
    • The various topic replication factors are set to 1. This is important because we only have one broker; a higher replication factor would cause problems.
  • A Schema Registry. This one does not do a lot.
    • You can communicate with it via its web server on http://localhost:8081, or on http://schema-registry:8081 from other containers.
    • It uses the broker for storage.
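
One thing the Compose file above does not handle: depends_on only waits for the broker container to start, not for Kafka to actually accept connections. If the Schema Registry races ahead, a healthcheck can fix that. Here is a sketch, assuming the Kafka CLI scripts live under /opt/kafka/bin in the apache/kafka image (worth double-checking for your image version):

```yaml
services:
  broker:
    # ...existing broker configuration from above...
    healthcheck:
      # succeeds once the broker answers API version requests on the host listener
      test: ["CMD", "/opt/kafka/bin/kafka-broker-api-versions.sh", "--bootstrap-server", "localhost:9092"]
      interval: 5s
      timeout: 10s
      retries: 10
  schema-registry:
    # ...existing schema-registry configuration from above...
    depends_on:
      broker:
        condition: service_healthy
```

With condition: service_healthy, Compose starts the Schema Registry only after the broker reports healthy, instead of merely after the container process launches.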

Caveats

  • As already discussed, this gives you a one-broker setup. For me, this is not a bug, it’s a feature.
  • Note that Confluent’s Schema Registry image is not a FOSS image; you are bound by Confluent’s Community License. I am working on an alternative, but for now, this is good enough for me.
  • Again, a feature that you may interpret as a bug: There is no data persistence! Run docker compose down and everything is gone. Run docker compose up afterwards, and you restart from scratch, without previous data. Update the container version, and you lose everything.
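
If you do want data to survive a docker compose down, a named volume would be one way to get it. A sketch; the KAFKA_LOG_DIRS value and the mount path are my own choices, pinned explicitly so the mount point is known, not something the image requires:

```yaml
services:
  broker:
    # ...existing broker configuration from above...
    environment:
      # pin the log directory so we know exactly what to mount
      KAFKA_LOG_DIRS: /var/lib/kafka/data
    volumes:
      - kafka-data:/var/lib/kafka/data

volumes:
  kafka-data:
```

With that in place, only docker compose down -v would wipe the data. For my use case, I deliberately leave this out.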

Augmentations

This setup is far from complete; there are loads of extensions that I would like to add. Here are some thoughts:

  • Based on this, I am thinking of migrating my production Kafka cluster at home to Docker containers, which would make dependency management much easier. If I do that, I will probably blog about it, even though I know this setup is not production-ready.
  • I have played around with a three-broker setup on one machine. This is also just for development and playing around, but it will probably become another blog post in the future.
  • There are loads of services that one can add. Off the top of my head, I am thinking about Kafka Connect, maybe Apache Flink or KSQL, some UI, and a few others. I will probably blog about these at a later time.
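
As a taste of such extensions, a UI can be dropped into the same Compose file. A sketch using provectuslabs/kafka-ui as an example — the image name and the KAFKA_CLUSTERS_0_* variables are from memory, so check them against the project’s documentation before relying on this:

```yaml
services:
  kafka-ui:
    image: provectuslabs/kafka-ui
    container_name: kafka-ui
    depends_on:
      - broker
    ports:
      - 8080:8080
    environment:
      KAFKA_CLUSTERS_0_NAME: dev
      # use the internal listener, since the UI runs inside the Compose network
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: broker:29092
```

The important detail is the bootstrap address: any service inside the Compose network talks to broker:29092, not localhost:9092.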

Still, this is a minimum viable setup for starting to develop against a Kafka cluster.

Conclusion

It’s nice that Apache now builds its own Docker image, and I am looking forward to properly containerizing my development setup. I am also looking forward to extending this blog post into a kind of series covering more services I use.