Storage, Retention, and Log Management
Master Kafka storage: segments, indexes, page cache, log retention strategies, compaction, and disk I/O optimization for production systems.
How Kafka Stores Data: The File System
Kafka doesn't use a database. It uses the file system directly. This is both its greatest strength and its main source of operational complexity.
Topic: user-events
├── Partition 0/
│   ├── 00000000000000000000.log    # Segment file
│   ├── 00000000000000000000.index  # Offset index
│   ├── 00000000000000000000.timeindex  # Time index
│   ├── 00000000000000368769.log   # Next segment, named by its base offset
│   └── 00000000000000368769.index
├── Partition 1/
│   └── ...
└── Partition 2/
    └── ...
Each partition is a directory. Each segment is a file, named after the offset of the first message it contains. Simple, but powerful.
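You can see this layout directly on a broker's data disk. A quick sketch, assuming the data directory is /data/kafka-logs (set by log.dirs) and a topic named user-events:
# Partition directories are named <topic>-<partition>
ls /data/kafka-logs/user-events-0
# 00000000000000000000.log  00000000000000000000.index  00000000000000000000.timeindex
# leader-epoch-checkpoint   partition.metadata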
Segment Files: The Building Blocks
Kafka doesn't write to one giant file. It writes to segments:
Segment 1: [msg-0] [msg-1] [msg-2] [msg-3] [msg-4] [msg-5]
Segment 2: [msg-6] [msg-7] [msg-8] [msg-9] [msg-10] [msg-11]
Segment 3: [msg-12] [msg-13] [msg-14] [msg-15] [msg-16] [msg-17]
Why segments?
- Parallelism: Consumers can read older segments while producers append to the active (newest) one
- Retention: Old segments can be deleted whole without affecting new ones
- Compaction: Segments are compacted independently
- Recovery: Faster crash recovery, since only the active segment needs to be rescanned (see the inspection sketch below)
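You can peek inside any segment with the kafka-dump-log.sh tool that ships with Kafka. A minimal sketch; the file path is assumed for illustration:
# Decode a segment and print each record's offset, timestamp, and payload
kafka-dump-log.sh \
  --files /data/kafka-logs/user-events-0/00000000000000000000.log \
  --print-data-log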
Index Files: The Speed Boosters
Kafka maintains two types of indexes for fast lookups:
1. Offset Index (.index)
# Maps offset → byte position in the segment file
# (sparse: one entry per index.interval.bytes, 4 KB by default)
Offset 0   → Position 0
Offset 100 → Position 1024
Offset 200 → Position 2048
Offset 300 → Position 3072
Use case: "Give me the message at offset 150" → binary-search the index to position 1024, then scan forward
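The same kafka-dump-log.sh tool can decode the offset index, letting you see the sparse offset → position entries for yourself (path assumed for illustration):
# Print the offset → file-position entries stored in the index
kafka-dump-log.sh \
  --files /data/kafka-logs/user-events-0/00000000000000000000.index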
2. Time Index (.timeindex)
# Maps timestamp → offset
Timestamp 1609459200000 → Offset 0
Timestamp 1609459260000 → Offset 100
Timestamp 1609459320000 → Offset 200
Use case: "Give me messages from 2 hours ago" → find the nearest offset via the time index, then use the offset index to locate the file position
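Timestamp lookups are exposed on the command line too. Recent Kafka distributions ship kafka-get-offsets.sh (older ones expose the same tool as kafka-run-class.sh kafka.tools.GetOffsetShell); a sketch, with the broker address assumed:
# Find the earliest offset whose timestamp is >= the given epoch-millis
kafka-get-offsets.sh --bootstrap-server localhost:9092 \
  --topic user-events --time 1609459200000
# Output is topic:partition:offset, one line per partition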
Page Cache: The Secret Sauce
This is why Kafka is so fast. In the steady state it serves reads from memory, not disk:
How Page Cache Works
1. A producer writes a message to the segment file
2. The OS keeps the file's pages in the page cache (RAM)
3. Consumers read from the page cache (not disk!)
4. The OS flushes dirty pages to disk in the background
Key insight: Kafka is essentially a RAM-based system with disk persistence.
⚠️ Page Cache Gotchas
- Other processes can evict your cache - don't co-locate other I/O-heavy apps on Kafka brokers
- The OS decides what to cache - you can't control it directly
- Cold starts are slow - the first read after a restart hits disk
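You can watch the page cache at work with standard Linux tools; a quick sketch:
# buff/cache shows how much RAM the OS is currently using as page cache
free -h
# bi/bo show actual disk traffic; cache-served reads don't appear here
vmstat 1 5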
Log Retention: Time vs Size
You can keep logs based on time, size, or both:
Time-based Retention
log.retention.hours = 168    # 7 days
log.retention.minutes = 60   # 1 hour
log.retention.ms = 3600000   # 1 hour (most precise)
# If more than one is set, the most precise unit wins: ms > minutes > hours
Size-based Retention
log.retention.bytes = 1073741824   # 1 GB per partition
log.segment.bytes = 1073741824     # 1 GB per segment
Combined (Recommended)
log.retention.hours = 168        # 7 days
log.retention.bytes = 1073741824  # 1 GB
# Delete when EITHER condition is met
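In practice you usually set retention per topic rather than broker-wide. A sketch using kafka-configs.sh, with the broker address and topic name assumed:
# Override retention for one topic: 7 days OR 1 GB per partition, whichever hits first
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name user-events \
  --alter --add-config retention.ms=604800000,retention.bytes=1073741824
# Verify the override took effect
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name user-events --describe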
Log Compaction: The State Machine
Sometimes you don't want to keep every message - just the latest state for each key:
# Before compaction
[user-123: {name: "John", age: 25}]
[user-123: {name: "John", age: 26}]  # Update
[user-123: {name: "John", age: 27}]  # Update
[user-456: {name: "Jane", age: 30}]
# After compaction
[user-123: {name: "John", age: 27}]  # Only latest
[user-456: {name: "Jane", age: 30}]
Use cases:
- User profiles
- Configuration settings
- Account balances
- Any key-value state
Enable Log Compaction
# Per topic (at creation)
kafka-topics.sh --create --topic user-profiles \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 3 \
  --config cleanup.policy=compact
# Global default
log.cleanup.policy = compact
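Compaction only processes closed (non-active) segments, so a few topic-level knobs control how aggressively it runs. A sketch of commonly tuned settings; the values are illustrative, not recommendations:
# min.cleanable.dirty.ratio: how dirty the log must be before compaction starts
# segment.ms: roll segments sooner so records become eligible for compaction faster
# delete.retention.ms: how long tombstones (null-value records) survive after compaction
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name user-profiles \
  --alter --add-config min.cleanable.dirty.ratio=0.1,segment.ms=3600000,delete.retention.ms=86400000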
Disk I/O Optimization
Kafka is I/O-bound. Optimize your disks:
1. Use SSDs
Why: 10-100x faster than HDDs for random I/O - and a broker hosting many partitions produces effectively random I/O
Cost: More expensive, but worth it for production
2. Separate Log and OS Disks
# Mount points
/var/log/kafka    # Broker application logs
/data/kafka-logs  # Kafka data
/tmp/kafka        # Temporary files
3. RAID Configuration
RAID 0 (Striping)
- Pros: Fastest write performance
- Cons: No fault tolerance
- Use: When you have replication
RAID 1 (Mirroring)
- Pros: Fault tolerant
- Cons: 50% storage efficiency
- Use: When you need local redundancy
4. OS-level Tuning
# Increase file descriptor limits (make permanent via /etc/security/limits.conf)
ulimit -n 65536
# Keep the page cache hot: avoid swapping, flush dirty pages early
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
echo 'vm.dirty_ratio = 15' >> /etc/sysctl.conf
echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.conf
# Disable swap entirely on dedicated Kafka hosts
swapoff -a
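Since sysctl edits only apply after a reload and ulimit is per-session, it's worth verifying that the settings actually took effect:
# Reload sysctl settings and confirm the values
sysctl -p
sysctl vm.swappiness vm.dirty_ratio vm.dirty_background_ratio
# Confirm the shell (and anything it launches) got the descriptor limit
ulimit -n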
Partition Count Planning
More partitions = more parallelism, but also more overhead:
Partition Limits
- Per broker: ~4,000 partitions is the usual rule of thumb (ZooKeeper-based clusters)
- Per cluster: ~200,000 partitions (ZooKeeper-based; KRaft raises this substantially)
- Per topic: No hard limit, but performance degrades
Partition Planning Formula
target_throughput = 100,000 msg/sec
consumer_throughput = 1,000 msg/sec per consumer
num_partitions = target_throughput / consumer_throughput = 100
# Also compute target / per-partition producer throughput and take the max of the two
# Add 20% buffer for growth
final_partitions = 100 * 1.2 = 120
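The same arithmetic as a runnable sketch, using plain bash integer math (the throughput numbers are the assumptions from above):
target=100000       # msg/sec the topic must sustain
per_consumer=1000   # msg/sec a single consumer can process
partitions=$(( target / per_consumer ))
final=$(( (partitions * 12 + 9) / 10 ))   # add ~20% headroom, rounded up
echo "plan for ${final} partitions"       # prints: plan for 120 partitions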
Storage Monitoring
Watch these metrics to avoid storage issues:
Disk Metrics
- Disk usage % - Alert at 80%
- Disk I/O wait - Alert at 20%
- Disk queue length - Alert at 10
Kafka Metrics
- Log size per partition - Track growth
- Segment count - Too many = performance hit
- Compaction lag - How far behind compaction is running (see the monitoring sketch below)
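A minimal monitoring sketch using standard tools plus Kafka's own kafka-log-dirs.sh (broker address, data path, and topic name assumed):
# Disk usage and per-device I/O stats (%util and await come from the sysstat package)
df -h /data/kafka-logs
iostat -x 5 2
# Per-partition log sizes as the broker reports them (JSON output)
kafka-log-dirs.sh --bootstrap-server localhost:9092 \
  --describe --topic-list user-events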
Production Configuration
Recommended Settings
# Retention
log.retention.hours = 168
log.retention.bytes = 1073741824
log.segment.bytes = 1073741824
log.roll.ms = 604800000                    # roll segments at least weekly
# Cleanup
log.cleanup.policy = delete                # the default; use compact for state topics
log.retention.check.interval.ms = 300000   # how often retention is enforced
# Performance
num.io.threads = 8
num.network.threads = 3
socket.send.buffer.bytes = 102400
socket.receive.buffer.bytes = 102400
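These are static server.properties entries; to see what a running broker actually resolved them to, kafka-configs.sh can dump effective configs. A sketch, assuming broker id 0:
# Print every effective config (including defaults) for broker 0
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 0 --describe --all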
Key Takeaways
- Kafka uses the file system, not a database - understand segments and indexes
- Page cache is your friend - keep data in RAM for speed
- Plan partition count carefully - more isn't always better
- Use SSDs for production - the performance gain is massive
- Monitor disk usage and I/O - storage issues kill performance
Next Steps
Ready to process data in real-time? Check out our next lesson on Kafka Streams and Real-Time Processing where we'll learn how to build streaming applications.