Producer Mastery and Message Delivery
Master Kafka producers: batching, compression, partitioning, delivery semantics, and performance tuning. Learn exactly-once semantics and transaction patterns.
How Producers Actually Work
When you call producer.send(), here's what happens behind the scenes:
- Serialization - Your object → bytes
- Partitioning - Which partition should this go to?
- Compression - Optionally compress the message
- Batching - Hold in memory buffer with other messages
- Send - Network call to broker
The magic is in step 4. Producers don't send every message immediately - they batch multiple messages together for efficiency.
The Batching Mechanism
Think of the producer like an Uber driver:
batch.size = 16384 bytes (16 KB)
linger.ms = 10 msThe producer says: "I'll wait up to 10ms or until I have 16KB of messages, whichever comes first."
Small batch.size + low linger.ms = Low latency, lower throughput
Large batch.size + higher linger.ms = Higher latency, higher throughput
Partition Assignment: The Most Important Decision
When you send a message, Kafka needs to decide which partition it goes to:
1. No Key (Round-robin)
producer.send("my-topic", value="hello")
# Goes to random partitionUse when: Order doesn't matter, you want even distribution
2. With Key (Hash-based)
producer.send("my-topic", key="user-123", value="clicked button")
# All messages with key "user-123" go to same partitionUse when: You need ordering per entity (user, order, session)
3. Custom Partitioner
class TimeBasedPartitioner(Partitioner):
    def partition(self, key, all_partitions, available):
        hour = datetime.now().hour
        return hour % len(all_partitions)Use when: You have special routing logic
⚠️ Common Trap
Using a skewed key like user_id when you have power users. If one user generates 10M events and everyone else generates 100, you'll have one hot partition and 99 idle ones.
Better approach:
# Combine multiple attributes
key = f"{user_id}:{session_id}"
# Or use consistent hashing with bounds
key = hash(user_id) % 1000  # Limit key spaceReliability Guarantees: The acks Parameter
 This is where you choose your consistency level:
acks=0 - Fire and forget
 - Producer doesn't wait for any acknowledgment
- Fastest, but can lose data
- Use for: Metrics, logs you don't care about
acks=1 - Leader acknowledgment
 - Wait for leader to write to its log
- Can lose data if leader dies before replication
- Use for: Most use cases (good default)
acks=all - Full ISR acknowledgment
 - Wait for leader + all in-sync replicas
- Slowest, but safest
- Use for: Financial transactions, critical events
Idempotent Producers: Preventing Duplicates
Here's a scenario that keeps people up at night:
- Producer sends message
- Broker writes it successfully
- Network hiccups - producer doesn't get acknowledgment
- Producer retries
- Now you have a duplicate
Solution: Enable idempotence
enable.idempotence = trueHow it works: Producer assigns each message a sequence number. Broker rejects duplicates with same sequence number.
Cost: Slightly lower throughput, but worth it for most use cases.
Transactions: The Nuclear Option
For exactly-once semantics across multiple partitions:
producer = KafkaProducer(
    transactional_id='my-transaction-id',
    enable_idempotence=True,
    acks='all'
)
producer.init_transactions()
try:
    producer.begin_transaction()
    producer.send('topic-a', value='message-1')
    producer.send('topic-b', value='message-2')
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()When to use:
- Financial transactions
- Multi-step workflows that must be atomic
- When you're also consuming from Kafka (read-process-write)
When NOT to use:
- Simple logging or analytics
- When performance matters more than correctness
Producer Configuration Cheat Sheet
Throughput optimization
batch.size = 65536                    # 64 KB
linger.ms = 10
compression.type = lz4
buffer.memory = 67108864              # 64 MBReliability
acks = all
enable.idempotence = true
max.in.flight.requests.per.connection = 5
retries = 2147483647                  # Max retriesTimeouts
request.timeout.ms = 30000
delivery.timeout.ms = 120000Key Takeaways
- Batching is key to performance - tune batch.size and linger.ms
- Choose your partition key wisely - avoid hot partitions
- Use acks=all + min.insync.replicas=2 for durability
- Enable idempotence to prevent duplicates
- Use transactions sparingly - they add overhead
Next Steps
Ready to learn about consumers? Check out our next lesson on Consumer Groups and Concurrency Models where we'll learn how to read data from Kafka efficiently.