Kafka 101 Tutorial: Getting started with Confluent's Kafka

Theodore Curtil
data engineer @ Acosom GmbH

About the Series

This article is the first in a series about Apache Flink and Kafka. The goal of the series is to show, step by step, how to use Apache Kafka and Flink in production. The series is centered around a use case we had to solve at Acosom for one of our clients: each post will introduce one component of our solution, or the solution to a problem we faced during the implementation. The main themes are going to be:

  • Getting started with Apache Kafka, using Confluent's community images
  • Getting started with Apache Flink using the open source distribution only
  • How to integrate both services together to provide scalable, reliable stream processing solutions

Introduction

All the code and content of this post is available on Github: https://github.com/theodorecurtil/kafka_101.

As this is a technical post meant to help the reader get started with stream processing, I highly recommend cloning the GitHub repo and playing around with the code. The code in this article will also be a building block for the rest of the series, and we will add complexity and services as we progress. Please also note that the code provided in this series will always be deployment-oriented: each repo contains a docker-compose file to boot the infrastructure and the services up. This simulates the local development setup that the reader should use when developing software applications.

In this post, we show how to run a local Kafka cluster using Docker containers, and how to produce and consume events from Kafka using the CLI.

The Infrastructure

The cluster is set up using Confluent images. In particular, we set up 4 services:

  1. Zookeeper
  2. Kafka Server (the broker)
  3. Confluent Schema Registry (for use in a later article)
  4. Confluent Control Center (the UI to interact with the cluster)

Note that Kafka 3.4 introduces the capability to migrate a Kafka cluster from Zookeeper to KRaft mode. At the time of writing, Confluent has not yet released a Docker image with Kafka 3.4, so we still use Zookeeper in this tutorial. For a discussion of Zookeeper and KRaft, refer to this article.

Services will be powered up and orchestrated using docker-compose. Let us quickly review the configurations.

Zookeeper

As usual, we need to attach Zookeeper to our Kafka cluster. Zookeeper is responsible for storing metadata about the cluster (e.g. where partitions live, which replica is the leader, etc.). This "extra" service that always needs to be started alongside a Kafka cluster will soon be deprecated, as metadata management will be fully internalized in the Kafka cluster using the new Kafka Raft metadata mode, shortened to KRaft.

Confluent's implementation of Zookeeper provides a few configurations, available here.

In particular, we need to tell Zookeeper on which port to listen for connections from clients, in our case the Kafka broker. This is configured with the key ZOOKEEPER_CLIENT_PORT. Once this port is chosen, expose the corresponding port on the container. This configuration alone is enough to enable communication between the Kafka cluster and Zookeeper. The corresponding configuration, as used in our docker-compose file, is shown below.
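Here is a minimal sketch of what the Zookeeper service definition could look like; the image tag and the port value are assumptions, so refer to the docker-compose.yaml in the repo for the exact configuration.

```yaml
zookeeper:
  image: confluentinc/cp-zookeeper:7.3.0   # image tag is an assumption; check the repo
  container_name: zookeeper
  ports:
    - "2181:2181"                          # expose the client port
  environment:
    ZOOKEEPER_CLIENT_PORT: 2181            # port Zookeeper listens on for client (broker) connections
    ZOOKEEPER_TICK_TIME: 2000              # basic heartbeat interval, in milliseconds
```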

Kafka Server

We also need to configure a single Kafka broker with a minimum viable configuration: the port mappings and the networking settings (Zookeeper connection, advertised listeners, etc.). In addition, we set some basic logging and metrics configurations.

Details about the configuration can be found on the Confluent website, and all configuration options can be found here.
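To give an idea of what this looks like, here is a sketch of a single-broker service definition. The image tag, listener names and ports are assumptions; the docker-compose.yaml in the repo is the reference.

```yaml
broker:
  image: confluentinc/cp-kafka:7.3.0            # image tag is an assumption
  container_name: broker
  depends_on:
    - zookeeper
  ports:
    - "9092:9092"                               # listener reachable from the host
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181     # must match ZOOKEEPER_CLIENT_PORT
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1   # single broker, so internal topics cannot be replicated
```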

Start the Cluster

To start the cluster, first clone the repo and "cd" into it locally.
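Assuming git is installed, this could look like (the directory name follows from the repo name):

```bash
git clone https://github.com/theodorecurtil/kafka_101.git
cd kafka_101
```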

Make sure that the ports that will be mapped from localhost are not already in use, and that you do not have running containers with the same names as the ones defined in our docker-compose.yaml file (check the container_name configuration key).

To start the cluster, simply run the command
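With the standalone docker-compose binary, that is (the -d flag is an optional addition that runs the containers in the background):

```bash
docker-compose up -d
```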

Depending on the Docker version you have, the command might be
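With the Docker Compose v2 plugin, the same command reads:

```bash
docker compose up -d
```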

To check that all services are started, type the command
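For example:

```bash
docker-compose ps   # or: docker compose ps / docker ps
```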

The output should list the four services defined in the docker-compose file, all in the Up state.

You should now be able to access the control-center container, which is the Confluent UI for Kafka cluster management, on localhost:9021. Please refer to online resources for a guided tour of Confluent Control Center.

Produce and Consume Messages using the CLI

With Kafka, there are the notions of producers and consumers. Simply put, producers are client applications writing data to the cluster, and consumers are applications reading data from the cluster. Consumers ultimately do the work with the data they read (e.g. a Flink application would be a consumer).

Confluent provides CLI tools to produce and consume messages from the command line. In this section, we will see the following:

  1. Create a topic
  2. Write (produce) to the topic
  3. Read (consume) from the topic

To access the CLI tools, we need to enter the broker container.
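Assuming the broker container is named broker (check the container_name key in docker-compose.yaml), this could be:

```bash
docker exec -it broker bash
```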

Create a Topic

For this example, we will create a simple topic in the cluster with default configurations. We will create a topic named my-amazing-topic with a replication factor of 1 and a single partition. This means that messages will not be replicated (each message is stored on only one server) and the topic will not be partitioned (1 partition is the same as no partitioning); in other words, the topic will be sharded into a single log.

To create this topic, run the following command from within the broker container.
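A command along the following lines should do it; the bootstrap server address localhost:9092 is an assumption and must match the listeners configured for the broker:

```bash
kafka-topics --bootstrap-server localhost:9092 \
  --create \
  --topic my-amazing-topic \
  --partitions 1 \
  --replication-factor 1
```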

If the command succeeds, it will print a confirmation that the topic was created.

One can also check that the topic was successfully created by navigating to the Topics tab of the web UI, where the newly created topic should be listed with the *Healthy* status.

Topic created screenshot

Produce to the Topic

Now that we have a topic created with a default configuration, we can start producing records to it! Still from within the broker container, run the following command and send your messages.
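One way is the console producer shipped with the Confluent image, again assuming the broker listens on localhost:9092:

```bash
kafka-console-producer --bootstrap-server localhost:9092 --topic my-amazing-topic
# type one message per line, e.g. foo then bar, and exit with Ctrl-C
```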

This command will produce two messages to the topic `my-amazing-topic` without a key, with the string values foo and bar.

One can see that the messages were produced to the topic and are persisted in the topic by navigating to the Topics tab.

Messages produced screenshot

If you click on the Schema tab, you will notice that no schema is present. This means that the topic can contain records with different schemas, like plain strings or JSON strings. No schema is enforced, which is obviously not good practice in production; hence the need for the schema-registry container. But do not worry about it for now: we will address that point in the next blog post, where we will build a small producer application pushing Avro records to Kafka, with schema validation.


Consume from the Topic

The final step is to consume the messages we just produced from the topic. To do that, type the following command from within the container.
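As a sketch, again assuming the broker listens on localhost:9092:

```bash
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic my-amazing-topic \
  --from-beginning   # read the topic from the earliest offset
```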


Kill the Cluster

Once you have played enough with your Kafka cluster, you might want to bring it down. To do this, "cd" into the project repo again and bring the infrastructure down with docker-compose.
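For example (the directory name is an assumption based on the repo name):

```bash
cd kafka_101
docker-compose down   # or: docker compose down
```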

What is Coming Next?

In the next blog post, we will see how to produce Avro records to Kafka, starting from this vanilla Kafka infrastructure. Stay tuned! For more updates, follow me on Twitter :)
