Imagine you’ve got a Kafka topic and start a new consumer group that consumes it from the beginning. Now imagine this takes time. Lots of time. How do you know how many messages are left? Can you estimate how much longer it’ll take to catch up to the most recent state?

This was a task I had at hand. I didn’t find a tool that did what I wanted right out of the box, but luckily Kafka’s CLI tools let me come up with a single command line that solves the issue.

If you’re in a hurry, here is the command:

watch -n 1 "bin/kafka-consumer-groups.sh --bootstrap-server <HOST> --describe --group <CONSUMER_GROUP> | awk '{current+=\$4;total+=\$5}END{print (current/total)*100}'"

Setting up the environment

If you don’t plan to see this in action in your environment, or you already have one, then skip this whole section. Really.

What we’re going to do is extremely similar to Kafka’s quickstart.

Start by downloading the tool and entering its folder; then we’ll bring up a healthy Kafka cluster with one topic and one consumer group.
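If you don’t have a Kafka distribution locally yet, it boils down to grabbing and extracting a release. The snippet below uses a <VERSION> placeholder (in the same spirit as <HOST> above); take the actual link, version and Scala build from the Kafka downloads page:

# <VERSION> is a placeholder and the 2.13 Scala build is an assumption; adjust to the release you download
curl -O https://archive.apache.org/dist/kafka/<VERSION>/kafka_2.13-<VERSION>.tgz
tar -xzf kafka_2.13-<VERSION>.tgz
cd kafka_2.13-<VERSION>

With Kafka unpacked and the folder entered, bring up the cluster: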

# Start Zookeeper (do this in one window and let it keep running)
bin/zookeeper-server-start.sh config/zookeeper.properties

# Start Kafka (do this in another window and let it keep running)
bin/kafka-server-start.sh config/server.properties

# Create a topic with 5 partitions (do this in yet another window)
bin/kafka-topics.sh --create --topic quickstart-events --partitions 5 --bootstrap-server localhost:9092

# Create a consumer group
bin/kafka-console-consumer.sh --topic quickstart-events --group quickstart-group --from-beginning --bootstrap-server localhost:9092

# This last command will keep on running. Stop it. Don't wait.

What you’ve got now is a topic named quickstart-events with 5 partitions and a consumer group named quickstart-group:

$ bin/kafka-consumer-groups.sh --describe --group quickstart-group --bootstrap-server localhost:9092

Consumer group 'quickstart-group' has no active members.

GROUP            TOPIC             PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
quickstart-group quickstart-events 2          0               0               0               -               -               -
quickstart-group quickstart-events 1          0               0               0               -               -               -
quickstart-group quickstart-events 0          0               0               0               -               -               -
quickstart-group quickstart-events 4          0               0               0               -               -               -
quickstart-group quickstart-events 3          0               0               0               -               -               -

Now, we’ve got to send some messages:

# Send 1,000,000 messages 5 times and discard the output
# This may take a few seconds
repeat 5 {seq 1000000 | bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092 > /dev/null}

If this command fails on you, don’t worry: it’s due to repeat, which is a ZSH construct. It works for me, but there are replacements for other shells.
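If you’re on Bash (or any shell without repeat), a plain for loop does the same job:

# Same thing, shell-agnostic: 5 rounds of 1,000,000 messages each
for i in $(seq 5); do
  seq 1000000 | bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092 > /dev/null
done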

If you ever need more messages, just stop the consumer and add more (or do the whole dance but for another topic).

Let’s check our consumer group again:

$ bin/kafka-consumer-groups.sh --describe --group quickstart-group --bootstrap-server localhost:9092

Consumer group 'quickstart-group' has no active members.

GROUP            TOPIC             PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
quickstart-group quickstart-events 2          0               1069508         1069508         -               -               -
quickstart-group quickstart-events 1          0               1046611         1046611         -               -               -
quickstart-group quickstart-events 0          0               967972          967972          -               -               -
quickstart-group quickstart-events 4          0               975416          975416          -               -               -
quickstart-group quickstart-events 3          0               940493          940493          -               -               -

It’s OK if your offsets are different. What’s important here is that the consumer group is lagging 5,000,000 messages in total, and that many messages lets us actually watch the group’s progress. There are other ways to do it, but we’ll stick with this one.
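As a quick sanity check, you can already let awk do the adding up instead of eyeballing the five rows; assuming the column layout shown above (LAG is the sixth column), this prints the total lag:

bin/kafka-consumer-groups.sh --describe --group quickstart-group --bootstrap-server localhost:9092 | awk '{lag+=$6}END{print lag}'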

At the end, don’t forget to stop everything that’s still running and to clean up the leftovers.
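Stopping is a Ctrl-C in each of the running windows, or, if you prefer scripts, the distribution ships stop counterparts for the start commands used above:

# Stop Kafka first, then Zookeeper
bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh

And to wipe the data directories: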

rm -rf /tmp/kafka-logs /tmp/zookeeper

The command in action

One marvel of the bin/kafka-consumer-groups.sh --describe command is that it outputs a nicely formatted, space-separated table (head back to the last snippet in the previous section for an example).

One way to track progress is to run that command every few seconds (e.g., with watch -n 1), but that’s not much of an option unless you enjoy adding up the lag of every partition on each refresh. Otherwise, you’ll just be staring at changing numbers.

It’s time to make use of that nicely formatted table with AWK. We just have to pipe the table in and sum the fourth and fifth columns (the current and end offsets, i.e., consumed and total messages) to get a percentage. The header lines flow through awk as well, but their non-numeric fields count as 0, so they don’t affect the sums:

awk '{current+=$4;total+=$5}END{print (current/total)*100}'

Escape the dollar signs as \$4 and \$5 so they survive the double quotes around the watch command (otherwise your shell expands them before watch ever runs), and we get the command presented at the beginning:

watch -n 1 "bin/kafka-consumer-groups.sh --bootstrap-server <HOST> --describe --group <CONSUMER_GROUP> | awk '{current+=\$4;total+=\$5}END{print (current/total)*100}'"

If you did the previous section, then you can see it in action:

# Watch the progress of the consumer group in one window
watch -n 1 "bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group quickstart-group | awk '{current+=\$4;total+=\$5}END{print (current/total)*100}'"

# Start the consumer and discard the output (run in another window)
bin/kafka-console-consumer.sh --topic quickstart-events --group quickstart-group --from-beginning --bootstrap-server localhost:9092 > /dev/null

The output is a single percentage, which can be handy to plug into other commands.
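For example, here’s a minimal sketch of a script that waits for the group to (roughly) catch up before moving on. The 99.9 threshold and the 5-second interval are arbitrary choices of mine, not anything dictated by the Kafka tooling:

# Poll the consumer group until it has consumed ~100% of the log, then continue
while true; do
  progress=$(bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group quickstart-group \
    | awk '{current+=$4;total+=$5}END{print (current/total)*100}')
  echo "Progress: ${progress}%"
  # Shell arithmetic is integer-only, so let awk do the floating-point comparison
  awk -v p="$progress" 'BEGIN{exit !(p >= 99.9)}' && break
  sleep 5
done
echo "quickstart-group has caught up."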

I hope you enjoyed this little hack. More will come.