Message Queues
Message queues enable asynchronous communication between services, decoupling producers from consumers and providing resilience, scalability, and flexibility in distributed systems.
Why Message Queues?
Synchronous (Tight Coupling):
┌──────────┐ HTTP ┌──────────┐ HTTP ┌──────────┐
│ Service A│───────▶│ Service B│───────▶│ Service C│
└──────────┘ └──────────┘ └──────────┘
If B is down, A fails. A must wait for B+C.
Asynchronous (Loose Coupling):
┌──────────┐ ┌─────────┐ ┌──────────┐
│ Service A│───────▶│ Queue │───────▶│ Service B│
└──────────┘ └─────────┘ └──────────┘
A publishes and continues. B processes when ready.

| Benefit | Description |
|---|---|
| Decoupling | Producer doesn’t know about consumers |
| Async Processing | Don’t block on slow operations |
| Load Leveling | Absorb traffic spikes, process at own pace |
| Reliability | Messages persisted until processed |
| Scalability | Add more consumers to increase throughput |
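The loose coupling above can be sketched with an in-process stand-in for the broker (Python's `queue.Queue` plus a worker thread; a real system would use Kafka, RabbitMQ, or SQS in its place):

```python
import queue
import threading

# Minimal in-process sketch of the async pattern: the producer enqueues
# and moves on; the consumer drains at its own pace. queue.Queue stands
# in for a real broker here.
q = queue.Queue()
results = []

def consumer():
    while True:
        msg = q.get()
        if msg is None:        # sentinel: shut down
            break
        results.append(f"processed:{msg}")
        q.task_done()

t = threading.Thread(target=consumer)
t.start()

# Producer publishes and continues immediately; no waiting on the consumer
for i in range(3):
    q.put(f"order-{i}")

q.put(None)
t.join()
print(results)
```

If the consumer is slow or briefly down, messages simply wait in the queue instead of failing the producer.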
Core Concepts
Message Queue vs Message Broker vs Event Stream
Message Queue (Point-to-Point):
┌──────────┐ ┌───────┐ ┌──────────┐
│ Producer │────▶│ Queue │────▶│ Consumer │
└──────────┘ └───────┘ └──────────┘
Message removed after consumption
Message Broker (Pub/Sub):
┌──────────┐ ┌────────┐ ┌────────────┐
│ Publisher│────▶│ Topic │────▶│ Subscriber1│
└──────────┘ └────────┘ ├────────────┤
│ │ Subscriber2│
└────────▶└────────────┘
Message delivered to all subscribers
Event Stream (Log-based):
┌──────────┐ ┌─────────────────────────┐ ┌──────────┐
│ Producer │────▶│ Partition (append-only) │────▶│ Consumer │
└──────────┘ └─────────────────────────┘ └──────────┘
Messages retained, consumers track offset

Delivery Semantics
| Guarantee | Description | Use Case |
|---|---|---|
| At-most-once | Message may be lost, never duplicated | Metrics, logs (loss acceptable) |
| At-least-once | Message never lost, may be duplicated | Most applications (with idempotency) |
| Exactly-once | Message delivered exactly once | Financial transactions |
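A toy illustration of why at-least-once plus idempotency is usually enough: the broker may redeliver, but a consumer that dedupes on a message key applies each message's effect once (the list and set here are stand-ins for a real broker and dedup store):

```python
# At-least-once delivery means the consumer may see duplicates; pairing it
# with an idempotency check yields effectively-once *processing*.
deliveries = ["m1", "m2", "m1"]   # "m1" redelivered after a lost ack

processed = []
seen = set()   # stands in for a dedup store (Redis, DB unique key, ...)

for msg_id in deliveries:
    if msg_id in seen:
        continue          # duplicate: skip, same effect as processing once
    seen.add(msg_id)
    processed.append(msg_id)

print(processed)
```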
Message Ordering
| Ordering | How | Tradeoff |
|---|---|---|
| No ordering | Messages processed in any order | Maximum parallelism |
| Partition ordering | Ordered within partition/queue | Balance of order + scale |
| Global ordering | Single partition/queue | Limited throughput |
Apache Kafka
Distributed event streaming platform designed for high-throughput, fault-tolerant, real-time data pipelines.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Kafka Cluster │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Topic: orders │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Partition 0 │ │ Partition 1 │ │ Partition 2 │ │
│ │ [0,1,2,3,4] │ │ [0,1,2,3] │ │ [0,1,2] │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │
│ │Broker 1 │ │Broker 2 │ │Broker 3 │ │
│ │(Leader) │ │(Leader) │ │(Leader) │ │
│ │ │ │ │ │ │ │
│ │Replica │ │Replica │ │Replica │ │
│ │of P1,P2 │ │of P0,P2 │ │of P0,P1 │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Producers ──▶ Partitions (by key hash) ──▶ Consumer Groups

Key Concepts
| Concept | Description |
|---|---|
| Topic | Category/feed name for messages |
| Partition | Ordered, immutable sequence of messages |
| Offset | Unique ID for each message within partition |
| Broker | Kafka server that stores data |
| Producer | Publishes messages to topics |
| Consumer | Reads messages from topics |
| Consumer Group | Group of consumers sharing partition load |
Partitioning Strategy
# Default: hash(key) % num_partitions
partition = hash(order_id) % 3
# Same key always goes to same partition
# Guarantees ordering for that key

| Strategy | When to Use |
|---|---|
| Key-based | Need ordering per entity (user, order) |
| Round-robin | Maximum parallelism, no ordering needed |
| Custom | Special routing logic |
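The key-based strategy can be sketched as a stable hash modulo the partition count (Kafka's default partitioner actually uses murmur2; `crc32` here is just an illustrative stand-in for a stable hash):

```python
import zlib

# Key-based partitioning sketch: a stable hash of the key modulo the
# partition count. Same key -> same partition -> per-key ordering.
def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode()) % num_partitions

p1 = partition_for("order-42", 3)
p2 = partition_for("order-42", 3)
assert p1 == p2   # all events for order-42 land on one partition, in order
```

Note that changing `num_partitions` remaps keys, which is why adding partitions to a keyed Kafka topic breaks per-key ordering for in-flight data.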
Consumer Groups
Topic: orders (3 partitions)
Consumer Group A (3 consumers):
┌────┐ ┌────┐ ┌────┐
│ C1 │ │ C2 │ │ C3 │
└──┬─┘ └──┬─┘ └──┬─┘
│ │ │
P0 P1 P2 ← Each consumer gets 1 partition
Consumer Group A (2 consumers):
┌────┐ ┌────┐
│ C1 │ │ C2 │
└──┬─┘ └──┬─┘
│ │
P0,P1 P2 ← C1 handles 2 partitions
Consumer Group B (independent):
┌────┐
│ C1 │
└──┬─┘
│
P0,P1,P2 ← Gets all messages (separate group)

Kafka Guarantees
| Setting | Guarantee | Performance |
|---|---|---|
| acks=0 | Fire and forget | Fastest, may lose messages |
| acks=1 | Leader acknowledged | Balanced |
| acks=all | All in-sync replicas acknowledged | Slowest, safest |
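For reference, the safest row of the table corresponds roughly to producer settings like these (an illustrative fragment; `min.insync.replicas` is a broker/topic-level setting that pairs with `acks=all`):

```properties
# Producer settings for the durable end of the spectrum (illustrative)
acks=all                 # wait for all in-sync replicas to acknowledge
enable.idempotence=true  # broker dedupes producer retries
retries=2147483647       # retry transient failures indefinitely

# Topic/broker-side companion: with acks=all, writes fail fast unless
# at least this many replicas are in sync
min.insync.replicas=2
```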
When to Use Kafka
- High-throughput event streaming (100K+ msg/sec)
- Event sourcing and CQRS
- Log aggregation
- Real-time analytics pipelines
- Change data capture (CDC)
- Microservices communication
RabbitMQ
Traditional message broker implementing the AMQP protocol with flexible routing.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ RabbitMQ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│ │ Producer │────▶│ Exchange │────▶│ Queue │────▶Consumer │
│ └──────────┘ └──────────┘ └─────────┘ │
│ │ │
│ Routing Key │
│ + Binding │
└─────────────────────────────────────────────────────────────┘

Exchange Types
Direct Exchange:
Producer ──▶ Exchange ──routing_key="order.created"──▶ Queue A
──routing_key="order.shipped"──▶ Queue B
Fanout Exchange:
Producer ──▶ Exchange ──▶ Queue A
──▶ Queue B (all queues get all messages)
──▶ Queue C
Topic Exchange:
Producer ──▶ Exchange ──"order.*.us"────▶ Queue A (US orders)
──"order.created.*"──▶ Queue B (all created)
──"#"───────────────▶ Queue C (everything)
Headers Exchange:
Routes based on message headers instead of the routing key

| Exchange Type | Routing Logic | Use Case |
|---|---|---|
| Direct | Exact routing key match | Task queues |
| Fanout | Broadcast to all queues | Notifications |
| Topic | Pattern matching (*, #) | Flexible routing |
| Headers | Header attribute matching | Complex routing |
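The `*`/`#` semantics can be pinned down with a small matcher (a simplified re-implementation for illustration, not RabbitMQ's actual routing code): `*` matches exactly one dot-separated word, `#` matches zero or more.

```python
# Simplified re-implementation of AMQP topic-exchange matching:
# '*' matches exactly one dot-separated word, '#' matches zero or more.
def topic_match(pattern: str, routing_key: str) -> bool:
    return _match(pattern.split('.'), routing_key.split('.'))

def _match(pat, key):
    if not pat:
        return not key                  # pattern exhausted: key must be too
    if pat[0] == '#':                   # '#': try consuming 0..len(key) words
        return any(_match(pat[1:], key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    if pat[0] == '*' or pat[0] == key[0]:   # one word, or exact word match
        return _match(pat[1:], key[1:])
    return False

assert topic_match("order.*.us", "order.created.us")
assert topic_match("order.created.*", "order.created.us")
assert topic_match("#", "anything.at.all")
assert not topic_match("order.*", "order.created.us")   # '*' is one word only
```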
Message Acknowledgment
# Manual acknowledgment (safe): ack only after processing succeeds
def callback(ch, method, properties, body):
    process_message(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='tasks', on_message_callback=callback)

# Auto acknowledgment (risky - message lost if consumer crashes mid-processing)
channel.basic_consume(queue='tasks', on_message_callback=callback, auto_ack=True)

When to Use RabbitMQ
- Complex routing requirements
- Request/reply patterns (RPC)
- Priority queues
- Traditional task queues
- When you need message TTL
- Smaller scale (thousands msg/sec)
Amazon SQS
Fully managed message queue service from AWS.
SQS Types
Standard Queue:
┌──────────┐ ┌─────────────────┐ ┌──────────┐
│ Producer │────▶│ SQS Standard │────▶│ Consumer │
└──────────┘ │ - Best-effort │ └──────────┘
│ ordering │
│ - At-least-once │
│ - Unlimited TPS │
└─────────────────┘
FIFO Queue:
┌──────────┐ ┌─────────────────┐ ┌──────────┐
│ Producer │────▶│ SQS FIFO │────▶│ Consumer │
└──────────┘ │ - Strict order │ └──────────┘
│ - Exactly-once │
│ - 300 msg/sec │
└─────────────────┘

| Feature | Standard | FIFO |
|---|---|---|
| Throughput | Nearly unlimited | 300 msg/sec (3,000 with batching; higher in high-throughput mode) |
| Ordering | Best-effort | Guaranteed |
| Delivery | At-least-once | Exactly-once |
| Deduplication | None | 5-minute window |
Visibility Timeout
┌──────────┐ ┌──────────┐
│ Consumer │ 1. Receive msg │ SQS │
│ A │◀───────────────────│ │
└──────────┘ │ [msg] │
│ (hidden) │
Processing... └──────────┘
┌── If processed: Delete message
│
└── If timeout expires: Message visible again
(another consumer can process)

Dead Letter Queue (DLQ)
┌──────────┐ ┌────────┐ ┌──────────┐
│ Producer │────▶│ Queue │────▶│ Consumer │
└──────────┘ └───┬────┘ └──────────┘
│
After N failures
│
▼
┌──────────┐
│ DLQ │ ← Investigate failed messages
└──────────┘

When to Use SQS
- AWS-native applications
- Simple queue needs without complex routing
- Serverless architectures (Lambda integration)
- When you want managed infrastructure
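The visibility-timeout flow above can be simulated with a toy in-memory queue and a fake clock (a model of the mechanics only, not the real SQS API):

```python
# Toy model of SQS visibility timeout: receive hides a message until a
# deadline; if it isn't deleted by then, it becomes receivable again
# (so another consumer can retry it).
class ToyQueue:
    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.messages = {}       # msg_id -> (body, visible_at)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0)

    def receive(self, now):
        for msg_id, (body, visible_at) in self.messages.items():
            if now >= visible_at:
                # hide the message until the visibility timeout expires
                self.messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        self.messages.pop(msg_id, None)   # consumer finished: gone for good

q = ToyQueue(visibility_timeout=30)
q.send("m1", "charge card")

first = q.receive(now=0)         # consumer A takes the message
hidden = q.receive(now=10)       # still hidden: nobody else can see it
redelivered = q.receive(now=31)  # A never deleted it -> visible again
```

Setting the timeout too short causes duplicate processing; too long delays retries after a crash. Real consumers extend the timeout (heartbeat) for long-running work.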
Comparison: Kafka vs RabbitMQ vs SQS
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Event log | Message broker | Queue |
| Ordering | Per partition | Per queue | FIFO only |
| Retention | Configurable (days/forever) | Until consumed | 14 days max |
| Replay | Yes (offset reset) | No | No |
| Throughput | Very high (100K+/sec) | Medium (10-20K/sec) | High (nearly unlimited for Standard) |
| Routing | Topic/partition | Flexible exchanges | None |
| Scaling | Partitions | Queues | Automatic |
| Managed | Confluent/MSK | CloudAMQP | Native AWS |
| Complexity | High | Medium | Low |
Decision Matrix
| Use Case | Recommended |
|---|---|
| High-throughput event streaming | Kafka |
| Event sourcing / replay needed | Kafka |
| Complex routing logic | RabbitMQ |
| Request/reply (RPC) | RabbitMQ |
| AWS serverless | SQS |
| Simple task queue | SQS or RabbitMQ |
| Real-time analytics | Kafka |
| Log aggregation | Kafka |
Event-Driven Architecture
Event Types
| Type | Description | Example |
|---|---|---|
| Event Notification | Something happened, minimal data | OrderCreated { order_id } |
| Event-Carried State Transfer | Full state in event | OrderCreated { order_id, items, total, customer } |
| Domain Event | Business-meaningful occurrence | PaymentReceived, InventoryReserved |
Event Sourcing
Store all changes as a sequence of events, derive current state by replaying.
Event Store:
┌────────────────────────────────────────────────────────────┐
│ Order #123 │
├────────────────────────────────────────────────────────────┤
│ 1. OrderCreated { items: [...], total: 100 } │
│ 2. PaymentReceived { amount: 100 } │
│ 3. OrderShipped { tracking: "ABC123" } │
│ 4. OrderDelivered { timestamp: "2024-01-15" } │
└────────────────────────────────────────────────────────────┘
Current State = Replay(Event 1 → 2 → 3 → 4)

| Pros | Cons |
|---|---|
| Complete audit trail | Complexity |
| Replay/rebuild state | Event schema evolution |
| Temporal queries | Eventually consistent |
| Debug by replaying | Storage growth |
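Replaying an event log into current state is just a left fold over the events; a minimal sketch with hypothetical event payloads mirroring the diagram above:

```python
# Deriving current state by replaying the event log: each event is
# applied in order to an accumulator (a left fold over the stream).
events = [
    ("OrderCreated",    {"items": ["book"], "total": 100}),
    ("PaymentReceived", {"amount": 100}),
    ("OrderShipped",    {"tracking": "ABC123"}),
]

def apply(state, event):
    kind, data = event
    if kind == "OrderCreated":
        return {**state, "status": "created", **data}
    if kind == "PaymentReceived":
        return {**state, "status": "paid", "paid": data["amount"]}
    if kind == "OrderShipped":
        return {**state, "status": "shipped", "tracking": data["tracking"]}
    return state            # unknown events are ignored

state = {}
for e in events:
    state = apply(state, e)

print(state["status"])   # "shipped" - the latest event wins
```

Replaying the same log always produces the same state, which is what makes rebuilds, temporal queries, and debugging-by-replay possible.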
Saga Pattern
Manage distributed transactions across services using events.
Choreography (Event-driven):
┌─────────┐ OrderCreated ┌─────────┐ PaymentProcessed ┌─────────┐
│ Order │───────────────▶│ Payment │──────────────────▶│Inventory│
│ Service │ │ Service │ │ Service │
└─────────┘ └─────────┘ └─────────┘
▲ │
└─────────────────InventoryReserved──────────────────────┘
Orchestration (Central coordinator):
┌────────────────┐
│ Saga │
│ Orchestrator │
└───────┬────────┘
┌─────────────┼─────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Order │ │ Payment │ │Inventory│
└─────────┘   └─────────┘   └─────────┘

| Approach | Pros | Cons |
|---|---|---|
| Choreography | Loose coupling, simple | Hard to track, circular dependencies |
| Orchestration | Clear flow, easy to track | Central point of failure |
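A minimal orchestration sketch with compensations (hypothetical step names; real orchestrators also persist saga state so they can resume after a crash):

```python
# Orchestration-style saga: run each step in order; if one fails,
# run the compensations for already-completed steps in reverse.
def run_saga(steps):
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            for _, comp in reversed(completed):
                comp()            # undo in reverse order
            return False          # saga rolled back
    return True                   # saga committed

log = []

def fail_reserve():
    raise RuntimeError("out of stock")   # the inventory step fails

steps = [
    ("create_order",  lambda: log.append("order+"),   lambda: log.append("order-")),
    ("charge_card",   lambda: log.append("payment+"), lambda: log.append("payment-")),
    ("reserve_stock", fail_reserve,                   lambda: None),
]

ok = run_saga(steps)
print(ok, log)   # False ['order+', 'payment+', 'payment-', 'order-']
```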
Patterns and Best Practices
Idempotency
Ensure processing a message multiple times has the same effect as processing once.
def process_order(order_id, idempotency_key):
    # Check if already processed (note: check-then-set is racy under
    # concurrent consumers; prefer an atomic SET NX or a DB unique constraint)
    if redis.get(f"processed:{idempotency_key}"):
        return  # Already done
    # Process order
    create_order(order_id)
    # Mark as processed (expires after 24h)
    redis.set(f"processed:{idempotency_key}", "1", ex=86400)

Outbox Pattern
Ensure reliable message publishing with database transactions.
┌──────────────────────────────────────────────────┐
│ Transaction │
│ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ Update Order │ │ Insert into Outbox │ │
│ │ SET status = │ │ (order_id, event, │ │
│ │ 'confirmed' │ │ payload, status) │ │
│ └─────────────────┘ └─────────────────────┘ │
└──────────────────────────────────────────────────┘
│
Background job
│
▼
┌─────────────┐
│ Kafka/Queue │
└─────────────┘

Dead Letter Queue Handling

def process_with_dlq(message, max_retries=3):
    try:
        process(message)
    except Exception as e:
        retry_count = message.headers.get('retry_count', 0)
        if retry_count < max_retries:
            # Retry with exponential backoff
            message.headers['retry_count'] = retry_count + 1
            queue.publish(message, delay=2 ** retry_count)
        else:
            # Send to DLQ for investigation
            dlq.publish(message)
            alert_ops_team(message, e)

Backpressure
Handle scenarios where producers outpace consumers.
| Strategy | Description |
|---|---|
| Buffering | Queue absorbs burst, process later |
| Dropping | Discard messages when overwhelmed |
| Rate limiting | Slow down producer |
| Scaling | Add more consumers |
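The dropping strategy can be sketched with a bounded in-process buffer that sheds load when full (`queue.Queue` standing in for a broker or channel with a size limit):

```python
import queue

# "Dropping" backpressure: a bounded buffer absorbs a burst and sheds
# load once full instead of growing without bound.
buf = queue.Queue(maxsize=5)
accepted, dropped = 0, 0

for i in range(8):                 # burst of 8 into a buffer of 5
    try:
        buf.put_nowait(f"msg-{i}")
        accepted += 1
    except queue.Full:
        dropped += 1               # overwhelmed: discard (or rate-limit/scale)

print(accepted, dropped)   # 5 3
```

Swapping `put_nowait` for a blocking `put` turns the same buffer into the rate-limiting strategy: the producer stalls instead of messages being dropped.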
Interview Quick Reference
Common Questions
- “How would you ensure exactly-once processing?”
  - Idempotent consumers with deduplication keys
  - Transactional outbox pattern
  - Kafka transactions (for Kafka-to-Kafka)
- “Design a notification system”
  - Fanout pattern with topic/exchange
  - Priority queues for urgent notifications
  - DLQ for failed deliveries
- “How do you handle message ordering?”
  - Partition by entity key (Kafka)
  - Single queue per entity
  - Accept eventual consistency with versioning
- “What happens if a consumer crashes mid-processing?”
  - Messages re-delivered (at-least-once)
  - Visibility timeout (SQS)
  - Consumer group rebalance (Kafka)
Numbers to Know
| Metric | Value |
|---|---|
| Kafka throughput (single partition) | 10-50 MB/sec |
| Kafka throughput (cluster) | 100K+ msg/sec |
| Kafka retention default | 7 days |
| RabbitMQ throughput | 10-20K msg/sec |
| SQS Standard throughput | Unlimited |
| SQS FIFO throughput | 300 msg/sec (3,000 with batching) |
| SQS message retention | 4 days (max 14) |
| SQS visibility timeout default | 30 seconds |
Design Checklist
- Delivery guarantee needed? (at-least-once, exactly-once)
- Ordering requirements? (none, partition, global)
- Message retention and replay needed?
- Throughput requirements?
- Routing complexity?
- Consumer failure handling? (retry, DLQ)
- Idempotency strategy?
- Monitoring and alerting? (lag, failures)
- Schema evolution strategy?