Storage, Retention, and Log Management
Master Kafka storage: segments, indexes, page cache, log retention strategies, compaction, and disk I/O optimization for production systems.
How Kafka Stores Data: The File System
Kafka doesn't use a database. It uses the file system directly. This is both its greatest strength and its main source of operational complexity.
Topic: user-events
├── Partition 0/
│   ├── 00000000000000000000.log    # Segment file
│   ├── 00000000000000000000.index  # Offset index
│   ├── 00000000000000000000.timeindex  # Time index
│   ├── 00000000000000368769.log   # Next segment, named by its base offset
│   └── 00000000000000368769.index
├── Partition 1/
│   └── ...
└── Partition 2/
    └── ...
Each partition is a directory. Each segment is a file, named after the offset of the first message it contains. Simple, but powerful.
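You can see this layout directly on a broker's data disk. A quick sketch, assuming the data directory is /data/kafka-logs (set by log.dirs) and a topic named user-events:
# Partition directories are named <topic>-<partition>
ls /data/kafka-logs/user-events-0
# 00000000000000000000.log  00000000000000000000.index  00000000000000000000.timeindex
# leader-epoch-checkpoint   partition.metadata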
Segment Files: The Building Blocks
Kafka doesn't write to one giant file. It writes to segments:
Segment 1: [msg-0] [msg-1] [msg-2] [msg-3] [msg-4] [msg-5]
Segment 2: [msg-6] [msg-7] [msg-8] [msg-9] [msg-10] [msg-11]
Segment 3: [msg-12] [msg-13] [msg-14] [msg-15] [msg-16] [msg-17]
Why segments?
- Parallelism: Consumers can read older segments while producers append to the active (newest) one
- Retention: Old segments can be deleted whole without affecting new ones
- Compaction: Segments are compacted independently
- Recovery: Faster crash recovery, since only the active segment needs to be rescanned (see the inspection sketch below)
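You can peek inside any segment with the kafka-dump-log.sh tool that ships with Kafka. A minimal sketch; the file path is assumed for illustration:
# Decode a segment and print each record's offset, timestamp, and payload
kafka-dump-log.sh \
  --files /data/kafka-logs/user-events-0/00000000000000000000.log \
  --print-data-log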
Index Files: The Speed Boosters
Kafka maintains two types of indexes for fast lookups:
1. Offset Index (.index)
# Maps offset → byte position in the segment file
# (sparse: one entry per index.interval.bytes, 4 KB by default)
Offset 0   → Position 0
Offset 100 → Position 1024
Offset 200 → Position 2048
Offset 300 → Position 3072
Use case: "Give me the message at offset 150" → binary-search the index to position 1024, then scan forward
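The same kafka-dump-log.sh tool can decode the offset index, letting you see the sparse offset → position entries for yourself (path assumed for illustration):
# Print the offset → file-position entries stored in the index
kafka-dump-log.sh \
  --files /data/kafka-logs/user-events-0/00000000000000000000.index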
2. Time Index (.timeindex)
# Maps timestamp → offset
Timestamp 1609459200000 → Offset 0
Timestamp 1609459260000 → Offset 100
Timestamp 1609459320000 → Offset 200
Use case: "Give me messages from 2 hours ago" → find the nearest offset via the time index, then use the offset index to locate the file position
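Timestamp lookups are exposed on the command line too. Recent Kafka distributions ship kafka-get-offsets.sh (older ones expose the same tool as kafka-run-class.sh kafka.tools.GetOffsetShell); a sketch, with the broker address assumed:
# Find the earliest offset whose timestamp is >= the given epoch-millis
kafka-get-offsets.sh --bootstrap-server localhost:9092 \
  --topic user-events --time 1609459200000
# Output is topic:partition:offset, one line per partition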
Page Cache: The Secret Sauce
This is why Kafka is so fast. In the steady state it serves reads from memory, not disk:
How Page Cache Works
1. A producer writes a message to the segment file
2. The OS keeps the file's pages in the page cache (RAM)
3. Consumers read from the page cache (not disk!)
4. The OS flushes dirty pages to disk in the background
Key insight: Kafka is essentially a RAM-based system with disk persistence.
⚠️ Page Cache Gotchas
- Other processes can evict your cache - don't co-locate other I/O-heavy apps on Kafka brokers
- The OS decides what to cache - you can't control it directly
- Cold starts are slow - the first read after a restart hits disk
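You can watch the page cache at work with standard Linux tools; a quick sketch:
# buff/cache shows how much RAM the OS is currently using as page cache
free -h
# bi/bo show actual disk traffic; cache-served reads don't appear here
vmstat 1 5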
Log Retention: Time vs Size
You can keep logs based on time, size, or both:
Time-based Retention
log.retention.hours = 168    # 7 days
log.retention.minutes = 60   # 1 hour
log.retention.ms = 3600000   # 1 hour (most precise)
# If more than one is set, the most precise unit wins: ms > minutes > hours
Size-based Retention
log.retention.bytes = 1073741824   # 1 GB per partition
log.segment.bytes = 1073741824     # 1 GB per segment
Combined (Recommended)
log.retention.hours = 168        # 7 days
log.retention.bytes = 1073741824  # 1 GB
# Delete when EITHER condition is met
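In practice you usually set retention per topic rather than broker-wide. A sketch using kafka-configs.sh, with the broker address and topic name assumed:
# Override retention for one topic: 7 days OR 1 GB per partition, whichever hits first
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name user-events \
  --alter --add-config retention.ms=604800000,retention.bytes=1073741824
# Verify the override took effect
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name user-events --describe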
Log Compaction: The State Machine
Sometimes you don't want to keep every message - just the latest state for each key:
# Before compaction
[user-123: {name: "John", age: 25}]
[user-123: {name: "John", age: 26}]  # Update
[user-123: {name: "John", age: 27}]  # Update
[user-456: {name: "Jane", age: 30}]
# After compaction
[user-123: {name: "John", age: 27}]  # Only latest
[user-456: {name: "Jane", age: 30}]
Use cases:
- User profiles
- Configuration settings
- Account balances
- Any key-value state
Enable Log Compaction
# Per topic (at creation)
kafka-topics.sh --create --topic user-profiles \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 3 \
  --config cleanup.policy=compact
# Global default
log.cleanup.policy = compact
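Compaction only processes closed (non-active) segments, so a few topic-level knobs control how aggressively it runs. A sketch of commonly tuned settings; the values are illustrative, not recommendations:
# min.cleanable.dirty.ratio: how dirty the log must be before compaction starts
# segment.ms: roll segments sooner so records become eligible for compaction faster
# delete.retention.ms: how long tombstones (null-value records) survive after compaction
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name user-profiles \
  --alter --add-config min.cleanable.dirty.ratio=0.1,segment.ms=3600000,delete.retention.ms=86400000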
Disk I/O Optimization
Kafka is I/O-bound. Optimize your disks:
1. Use SSDs
Why: 10-100x faster than HDDs for random I/O - and a broker hosting many partitions produces effectively random I/O
Cost: More expensive, but worth it for production
2. Separate Log and OS Disks
# Mount points
/var/log/kafka    # Broker application logs
/data/kafka-logs  # Kafka data
/tmp/kafka        # Temporary files
3. RAID Configuration
RAID 0 (Striping)
- Pros: Fastest write performance
- Cons: No fault tolerance
- Use: When you have replication
RAID 1 (Mirroring)
- Pros: Fault tolerant
- Cons: 50% storage efficiency
- Use: When you need local redundancy
4. OS-level Tuning
# Increase file descriptor limits (make permanent via /etc/security/limits.conf)
ulimit -n 65536
# Keep the page cache hot: avoid swapping, flush dirty pages early
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
echo 'vm.dirty_ratio = 15' >> /etc/sysctl.conf
echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.conf
# Disable swap entirely on dedicated Kafka hosts
swapoff -a
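Since sysctl edits only apply after a reload and ulimit is per-session, it's worth verifying that the settings actually took effect:
# Reload sysctl settings and confirm the values
sysctl -p
sysctl vm.swappiness vm.dirty_ratio vm.dirty_background_ratio
# Confirm the shell (and anything it launches) got the descriptor limit
ulimit -n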
Partition Count Planning
More partitions = more parallelism, but also more overhead:
Partition Limits
- Per broker: ~4,000 partitions is the usual rule of thumb (ZooKeeper-based clusters)
- Per cluster: ~200,000 partitions (ZooKeeper-based; KRaft raises this substantially)
- Per topic: No hard limit, but performance degrades
Partition Planning Formula
target_throughput = 100,000 msg/sec
consumer_throughput = 1,000 msg/sec per consumer
num_partitions = target_throughput / consumer_throughput = 100
# Also compute target / per-partition producer throughput and take the max of the two
# Add 20% buffer for growth
final_partitions = 100 * 1.2 = 120
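The same arithmetic as a runnable sketch, using plain bash integer math (the throughput numbers are the assumptions from above):
target=100000       # msg/sec the topic must sustain
per_consumer=1000   # msg/sec a single consumer can process
partitions=$(( target / per_consumer ))
final=$(( (partitions * 12 + 9) / 10 ))   # add ~20% headroom, rounded up
echo "plan for ${final} partitions"       # prints: plan for 120 partitions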
Storage Monitoring
Watch these metrics to avoid storage issues:
Disk Metrics
- Disk usage % - Alert at 80%
- Disk I/O wait - Alert at 20%
- Disk queue length - Alert at 10
Kafka Metrics
- Log size per partition - Track growth
- Segment count - Too many = performance hit
- Compaction lag - How far behind compaction is running (see the monitoring sketch below)
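A minimal monitoring sketch using standard tools plus Kafka's own kafka-log-dirs.sh (broker address, data path, and topic name assumed):
# Disk usage and per-device I/O stats (%util and await come from the sysstat package)
df -h /data/kafka-logs
iostat -x 5 2
# Per-partition log sizes as the broker reports them (JSON output)
kafka-log-dirs.sh --bootstrap-server localhost:9092 \
  --describe --topic-list user-events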
Production Configuration
Recommended Settings
# Retention
log.retention.hours = 168
log.retention.bytes = 1073741824
log.segment.bytes = 1073741824
log.roll.ms = 604800000                    # roll segments at least weekly
# Cleanup
log.cleanup.policy = delete                # the default; use compact for state topics
log.retention.check.interval.ms = 300000   # how often retention is enforced
# Performance
num.io.threads = 8
num.network.threads = 3
socket.send.buffer.bytes = 102400
socket.receive.buffer.bytes = 102400
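These are static server.properties entries; to see what a running broker actually resolved them to, kafka-configs.sh can dump effective configs. A sketch, assuming broker id 0:
# Print every effective config (including defaults) for broker 0
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 0 --describe --all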
Key Takeaways
- Kafka uses the file system, not a database - understand segments and indexes
- Page cache is your friend - keep data in RAM for speed
- Plan partition count carefully - more isn't always better
- Use SSDs for production - the performance gain is massive
- Monitor disk usage and I/O - storage issues kill performance
Next Steps
Ready to process data in real-time? Check out our next lesson on Kafka Streams and Real-Time Processing where we'll learn how to build streaming applications.