Kafka vs RabbitMQ: How I Actually Decide in Production

The most common question I get from backend devs is some version of this: “should I use Kafka or RabbitMQ?” And the honest answer is that most people are asking the wrong question.

The right question is simpler. What happens to the message after it’s consumed? If it can be discarded, use RabbitMQ. If it needs to exist, be revisited, or be reprocessed, use Kafka. That one distinction resolves most of the confusion.

But since the simple answer never explains anything on its own, let me show you how I think through this in practice, including what I’ve seen go wrong at scale.

Before We Had Good Tooling

It helps to understand what problem each tool was actually built to solve, because that shapes everything about how they behave.

RabbitMQ came out of a very practical need: how do we reliably distribute work across multiple processes? You have tasks coming in, you have workers waiting to process them, and you need a broker in the middle to coordinate. That’s it. That’s the core job.

Kafka came from a different problem, one LinkedIn was facing in the early 2010s: how do we move massive amounts of event data between systems without losing anything? Not tasks, events. Things that happened. Data that needed to flow to analytics, to search indexes, to fraud detection, all at the same time. A traditional message queue wasn’t the right fit for that. So they built a log.

That origin story matters. Kafka is a distributed log that happens to work like a messaging system. RabbitMQ is a message broker that happens to be very good at routing. They look similar from the outside (you put something in, something comes out), but the model underneath is completely different.

How RabbitMQ Actually Works

RabbitMQ uses a push-based model. You publish a message to an exchange, the exchange routes it to one or more queues based on bindings, and RabbitMQ pushes those messages to consumers. When a consumer processes the message and sends an ACK, the message is gone.

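@RabbitListener(queues = "payment.processing")
public void handlePayment(PaymentMessage message) {
    paymentService.process(message);
    // With Spring AMQP's default AUTO acknowledge mode, the container sends
    // the ACK once this method returns normally. After that the message is gone.
}

// Meanwhile, in your config:
@Bean
public Queue paymentQueue() {
    return new Queue("payment.processing", true); // durable
}

@Bean
public DirectExchange exchange() {
    return new DirectExchange("payments");
}

// Without a binding the exchange has nowhere to route to.
// The routing key here is an assumption; use whatever your publishers send.
@Bean
public Binding paymentBinding(Queue paymentQueue, DirectExchange exchange) {
    return BindingBuilder.bind(paymentQueue).to(exchange).with("payment.processing");
}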
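
To make the ACK behaviour concrete, here is a minimal in-memory sketch in plain Java. It is a toy model, not the RabbitMQ client API: the class name ToyWorkQueue and its methods are made up for illustration, and a handler returning true or false stands in for ACK and NACK-with-requeue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Toy in-memory model of a work queue with acknowledgements.
// Hypothetical names throughout; this is not the RabbitMQ client API.
class ToyWorkQueue {
    private final Deque<String> ready = new ArrayDeque<>();

    void publish(String message) {
        ready.addLast(message);
    }

    // Hands the head message to the handler. A true result stands in for an ACK
    // and removes the message for good; false stands in for a NACK with requeue.
    boolean deliverNext(Predicate<String> handler) {
        String message = ready.pollFirst();
        if (message == null) {
            return false; // nothing to deliver
        }
        boolean acked;
        try {
            acked = handler.test(message);
        } catch (RuntimeException e) {
            acked = false; // a crashing handler is treated like a NACK
        }
        if (!acked) {
            ready.addFirst(message); // put it back for another attempt
        }
        return acked;
    }

    int size() {
        return ready.size();
    }

    public static void main(String[] args) {
        ToyWorkQueue queue = new ToyWorkQueue();
        queue.publish("payment-1");
        queue.deliverNext(m -> false);    // failed attempt, message is requeued
        System.out.println(queue.size()); // prints 1
        queue.deliverNext(m -> true);     // ACK, message is gone
        System.out.println(queue.size()); // prints 0
    }
}
```

The point of the sketch is the asymmetry: a failed attempt leaves the message available, but once it is ACKed there is nothing left to go back to.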

The routing model is one of RabbitMQ’s strongest features. With direct, topic, fanout, and headers exchanges you can build very precise routing logic between services, something Kafka doesn’t offer natively. If you need a message to go to specific consumers based on a routing key, RabbitMQ handles that elegantly.
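
As a rough illustration of what a topic exchange does with a routing key, here is a small plain-Java sketch. It follows the AMQP topic rules (words separated by dots, * matches exactly one word, # matches zero or more words), but everything else is an assumption for the example: ToyTopicExchange, the queue names, and the routing keys are hypothetical, and nothing here talks to a real broker.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of topic-exchange routing. Hypothetical names; not the RabbitMQ API.
class ToyTopicExchange {
    private final List<String[]> bindings = new ArrayList<>(); // each entry is {pattern, queue}

    void bind(String queue, String pattern) {
        bindings.add(new String[] {pattern, queue});
    }

    // Returns every queue whose binding pattern matches the routing key.
    List<String> route(String routingKey) {
        List<String> queues = new ArrayList<>();
        for (String[] binding : bindings) {
            if (matches(binding[0], routingKey) && !queues.contains(binding[1])) {
                queues.add(binding[1]);
            }
        }
        return queues;
    }

    static boolean matches(String pattern, String routingKey) {
        return matches(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean matches(String[] p, int i, String[] k, int j) {
        if (i == p.length) {
            return j == k.length; // pattern used up: the key must be used up too
        }
        if (p[i].equals("#")) {
            // '#' may absorb zero or more words, so try every possible split
            for (int next = j; next <= k.length; next++) {
                if (matches(p, i + 1, k, next)) {
                    return true;
                }
            }
            return false;
        }
        if (j == k.length) {
            return false; // pattern still expects a word but the key has none left
        }
        return (p[i].equals("*") || p[i].equals(k[j])) && matches(p, i + 1, k, j + 1);
    }

    public static void main(String[] args) {
        ToyTopicExchange exchange = new ToyTopicExchange();
        exchange.bind("fraud", "payment.#");
        exchange.bind("emails", "payment.created");
        System.out.println(exchange.route("payment.created"));
        System.out.println(exchange.route("payment.card.declined"));
    }
}
```

In this sketch the first key reaches both queues and the second reaches only the fraud queue, which is the kind of selective delivery you would otherwise have to build yourself on top of Kafka topics.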

Latency is also worth calling out. RabbitMQ can deliver messages with sub-millisecond latency when queues are short and consumers keep up. If you’re building something where the time between publish and consume matters at that resolution, RabbitMQ is usually the better fit.

How Kafka Actually Works

Kafka uses a pull-based model. Producers write events to topics, which are partitioned and stored on disk. Consumers pull from those topics at their own pace, tracking their position with an offset. The key difference: the message is not deleted after consumption. It stays there, for however long you configure retention to be.

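@KafkaListener(topics = "fraud.events", groupId = "fraud-detector")
public void consume(ConsumerRecord<String, FraudEvent> record, Acknowledgment ack) {
    fraudService.analyze(record.value());
    // Commit the offset explicitly (needs ack-mode=manual, see the config below).
    // The record itself stays in the log until retention expires.
    ack.acknowledge();
}

# Kafka config that matters (application.properties, comments on their own lines):
# where to start when the group has no committed offset yet
spring.kafka.consumer.auto-offset-reset=earliest
# no background auto-commit by the Kafka client
spring.kafka.consumer.enable-auto-commit=false
# let the listener commit through Acknowledgment
spring.kafka.listener.ack-mode=manual
# wait for all in-sync replicas before a write counts as done
spring.kafka.producer.acks=all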

This changes the entire programming model. With Kafka, multiple consumer groups can read the same topic independently, each with their own offset. Your fraud detection service, your analytics pipeline, and your audit log can all consume the same payment event without any of them knowing the others exist.

And if something goes wrong (a bug in your consumer, say, or a bad deployment), you rewind the offset and reprocess. As long as you’re still inside the retention window, the data is still there.
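
Here is a minimal plain-Java sketch of that model: an append-only log, one committed offset per consumer group, and a rewind. It is a toy with a single partition and no retention limit, and the names (ToyEventLog, poll, seek) are made up for the example rather than taken from the Kafka client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy single-partition log with per-group offsets. Hypothetical names; not the Kafka API.
class ToyEventLog {
    private final List<String> records = new ArrayList<>();         // append-only log
    private final Map<String, Integer> committed = new HashMap<>(); // group -> next offset to read

    void append(String event) {
        records.add(event);
    }

    // Returns everything the group has not read yet and commits the new position.
    // Reading never removes anything from the log.
    List<String> poll(String groupId) {
        int from = committed.getOrDefault(groupId, 0); // unknown group starts at the beginning
        List<String> batch = new ArrayList<>(records.subList(from, records.size()));
        committed.put(groupId, records.size());
        return batch;
    }

    // Moves a group's position so that earlier records are read again.
    void seek(String groupId, int offset) {
        if (offset < 0 || offset > records.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        committed.put(groupId, offset);
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        ToyEventLog log = new ToyEventLog();
        log.append("payment-1");
        log.append("payment-2");
        System.out.println(log.poll("fraud-detector")); // both records
        System.out.println(log.poll("analytics"));      // same records, independent offset
        log.seek("fraud-detector", 0);                  // rewind after a bad deploy
        System.out.println(log.poll("fraud-detector")); // both records again
    }
}
```

Each group only ever moves its own pointer, which is why adding a new consumer later costs the existing ones nothing.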

The Comparison That Actually Matters

Dimension  | Kafka                                              | RabbitMQ
-----------+----------------------------------------------------+----------------------------------------------------
Model      | Log-based, pull                                    | Queue-based, push
Retention  | Persisted on disk, configurable retention          | Removed from the queue after ACK
Replay     | Yes, rewind the offset within retention            | No, transient by design
Consumers  | Multiple independent consumer groups               | Each message goes to one consumer per queue
Routing    | By partition key                                   | Exchanges and bindings, very flexible
Throughput | Up to millions of messages per second              | Up to hundreds of thousands of messages per second
Latency    | Higher because of batching                         | Sub-millisecond possible
Operations | More complex: partitions, offsets, consumer groups | Simpler to run and reason about
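
Routing by partition key deserves a quick illustration, because it is all the routing Kafka gives you: the key decides which partition a record lands in, and records with the same key stay in order relative to each other. The sketch below is a simplification. Kafka's default partitioner hashes the serialized key bytes with murmur2; String.hashCode() is used here only as a stand-in, and ToyPartitioner is a hypothetical name.

```java
// Toy key-to-partition mapping. Stand-in hash; not Kafka's actual partitioner.
class ToyPartitioner {
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result in [0, numPartitions) even for negative hashes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition for a fixed partition count.
        System.out.println(partitionFor("customer-42", 6));
        System.out.println(partitionFor("customer-42", 6));
    }
}
```

Compare that with the exchange-and-binding model: here the producer's choice of key is the only lever, and changing the partition count changes where keys land.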

How to Decide in Practice

I use a simple set of questions before committing to either tool. Work through them in order; the first one that fires is usually your answer.

Use Kafka When

  • Multiple services need to consume the same event independently.
  • You need replay: reprocessing historical data after bugs or deploys.
  • You’re building an audit log or event sourcing architecture.
  • Throughput is in the millions of messages per second.
  • You need a single source of truth for what happened in your system.

Use RabbitMQ When

  • You have discrete tasks that each need to be handled by exactly one worker.
  • Complex routing between queues is a requirement.
  • Low latency matters more than high throughput.
  • You want simpler operational overhead.
  • You’re distributing work across multiple workers.
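
If it helps to see the two checklists as an ordered set of checks, here is one way to write them down in plain Java. This is only a sketch of the lists above, not a decision engine: the enum names are my own labels for the bullets, and returning UNDECIDED when nothing fires is an assumption I added.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy encoding of the two checklists. All names are hypothetical labels for the bullets.
class BrokerChecklist {
    enum Need {
        INDEPENDENT_CONSUMERS, REPLAY, AUDIT_LOG_OR_EVENT_SOURCING, VERY_HIGH_THROUGHPUT,
        SINGLE_SOURCE_OF_TRUTH, DISCRETE_TASKS, COMPLEX_ROUTING, LOW_LATENCY,
        SIMPLE_OPERATIONS, WORK_DISTRIBUTION
    }

    enum Broker { KAFKA, RABBITMQ, UNDECIDED }

    // Checked in the order the bullets appear; the first one that fires wins.
    private static final Need[] KAFKA_SIGNALS = {
        Need.INDEPENDENT_CONSUMERS, Need.REPLAY, Need.AUDIT_LOG_OR_EVENT_SOURCING,
        Need.VERY_HIGH_THROUGHPUT, Need.SINGLE_SOURCE_OF_TRUTH
    };
    private static final Need[] RABBITMQ_SIGNALS = {
        Need.DISCRETE_TASKS, Need.COMPLEX_ROUTING, Need.LOW_LATENCY,
        Need.SIMPLE_OPERATIONS, Need.WORK_DISTRIBUTION
    };

    static Broker recommend(Set<Need> needs) {
        for (Need signal : KAFKA_SIGNALS) {
            if (needs.contains(signal)) {
                return Broker.KAFKA;
            }
        }
        for (Need signal : RABBITMQ_SIGNALS) {
            if (needs.contains(signal)) {
                return Broker.RABBITMQ;
            }
        }
        return Broker.UNDECIDED; // assumption: no signal means keep asking questions
    }

    public static void main(String[] args) {
        System.out.println(recommend(EnumSet.of(Need.REPLAY, Need.LOW_LATENCY))); // prints KAFKA
        System.out.println(recommend(EnumSet.of(Need.DISCRETE_TASKS)));           // prints RABBITMQ
    }
}
```

Notice that a replay requirement wins over a latency preference in this ordering. That matches how I actually weigh them: you can tune latency, but you cannot get back a message that was never kept.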

Common mistake: The most frequent error I see is teams using Kafka as a glorified task queue. You end up inheriting all the operational complexity (partition management, offsets, consumer groups, lag monitoring) without needing any of it. If your use case is “send this email” or “resize this image”, RabbitMQ does the job with a fraction of the infrastructure overhead.

Can You Use Both?

Yes, and in complex systems you often will. At a payment company, for example, you might use Kafka to stream all transaction events across the platform, with fraud detection, analytics, and audit all consuming the same topic, while RabbitMQ distributes the actual work of processing individual payments across a pool of workers.

They complement each other. Kafka handles the event backbone. RabbitMQ handles the task distribution. The mistake is trying to force one to do the job of the other.

The underlying principle is always the same one I come back to: understand the problem before you pick the tool. Kafka and RabbitMQ both solve real problems well. Choosing the wrong one doesn’t mean the system won’t work; it means you’ll spend the next six months fighting unnecessary complexity instead of building features.

And that’s the real cost of the wrong choice.