Compaction and Optimization

Learning Objectives

  • Master different compaction strategies (size-tiered, leveled, universal)
  • Implement Bloom filters for 10-100x speedup on negative lookups
  • Build block caching for hot data optimization
  • Learn compression techniques and trade-offs
  • Implement batched writes and resource throttling

Lesson 6.1: Compaction Strategies

Why Compaction Matters

As the database operates, SSTables accumulate at each level; every read must then consult more files, so latency and disk usage degrade over time.

Without compaction:

Level 0: sstable-0, sstable-1, sstable-2, ... sstable-100 (100 files!)
Level 1: sstable-l1-0, sstable-l1-1, ...
Level 2: ...

Problem:
- Read "user:500": Must check 100+ files in Level 0
- Each file access = disk seek (expensive)
- Read latency degrades over time
- Disk space grows unbounded

With compaction:

Before:
  Level 0: 100 small files (1-10MB each)
  Level 1: 10 medium files (50MB each)

After compaction:
  Level 0: 4 small files (5MB each)
  Level 1: 5 medium files (100MB each)

Benefits:
- Read latency: up to ~110 candidate files per lookup → ~9
- Disk space: space held by overwritten and deleted values is reclaimed
- Tombstone cleanup: delete markers are dropped once no older versions remain

Compaction Strategies: Size-Tiered

Size-Tiered Compaction (SimpleDB, Cassandra):

Trigger: When a level accumulates too many files (the threshold is configurable;
the timeline below uses 3 for brevity, production systems often use 10+)

Action:
  1. Select the files at level N (grouped by similar size)
  2. Merge them with a streaming k-way merge into a single file
  3. Write the merged file to level N+1
  4. Delete the source files

Timeline:
t0: Level 0: [f0(5MB), f1(5MB), f2(5MB)] → Trigger (3 files)
t1: Merge into temp file (15MB)
t2: Move to Level 1: [f1_0(15MB)] + existing files
t3: Level 0: [] (files deleted)
t4: New writes go to Level 0 again
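
To make the trigger-and-merge step concrete, here is a minimal sketch in Go. The SSTable type and the merge callback stand in for components built in earlier lessons, so their shapes here are assumptions rather than the exact API:

// Minimal sketch of the size-tiered trigger-and-merge step.
type SSTable struct {
    Path string
    Size int64
}

const maxFilesPerLevel = 3 // matches the timeline above; 10+ is more typical

// compactSizeTiered merges every file at level n into one file at level n+1
// once the file-count trigger fires.
func compactSizeTiered(levels [][]*SSTable, n int,
    merge func([]*SSTable) (*SSTable, error)) ([][]*SSTable, error) {

    if len(levels[n]) < maxFilesPerLevel {
        return levels, nil // below the trigger, nothing to do
    }
    merged, err := merge(levels[n]) // streaming k-way merge; newest value per key wins
    if err != nil {
        return levels, err
    }
    levels[n+1] = append(levels[n+1], merged) // promote the merged file
    levels[n] = nil                           // source files can now be deleted
    return levels, nil
}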

Advantages:

  • Simple to implement
  • Good for write-heavy workloads
  • Low CPU overhead during compaction

Disadvantages:

  • Write amplification: ~5-10x
  • Uneven file sizes at same level
  • Space overhead during merge

Leveled Compaction

Leveled Compaction (LevelDB, RocksDB):

Design:
- Level 0: Multiple files (overlap allowed)
- Level 1+: Single file per key range (NO overlap)

Trigger: When level size exceeds limit (e.g., 10MB)

Action:
  1. Pick file from level N
  2. Find overlapping files in level N+1
  3. Merge all into level N+1
  4. Delete source files

Timeline:
Level 0: [file0(4MB), file1(3MB), file2(2MB)] → Size: 9MB (below 10MB)
         [file0(4MB), file1(3MB), file2(2MB), file3(5MB)] → Size: 14MB (TRIGGER)

Select file0 from L0:
  - Key range: ["a", "g"]
  - Find overlapping in L1: [file_l1_0 ("a"-"m")]
  - Merge: file0 + file_l1_0 → file_l1_0'
  - Delete: file0, file_l1_0

Result:
Level 0: [file1(3MB), file2(2MB), file3(5MB)]
Level 1: [file_l1_0'(overlapping merged)]
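
The crucial step is selecting which files in level N+1 overlap the picked file's key range, because those must be rewritten together to preserve the non-overlap invariant. A small sketch of that selection, with illustrative field names rather than the exact metadata struct:

// fileMeta is a stand-in for the SSTable metadata kept per level.
type fileMeta struct {
    Path   string
    MinKey string
    MaxKey string
    SizeMB int
}

// overlapping returns the files in the next level whose key ranges intersect
// [minKey, maxKey]; these must be merged together with the picked file so
// that level N+1 stays free of overlapping key ranges.
func overlapping(next []fileMeta, minKey, maxKey string) []fileMeta {
    var out []fileMeta
    for _, f := range next {
        if f.MaxKey >= minKey && f.MinKey <= maxKey {
            out = append(out, f)
        }
    }
    return out
}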

Advantages:

  • Write amplification: ~10x (predictable)
  • Bounded file count per level
  • Space efficient
  • Consistent performance

Disadvantages:

  • Complex implementation
  • Higher CPU during compaction
  • Must maintain key range invariants

Compaction Trade-offs Comparison

Aspect               Size-Tiered   Leveled      Universal
Write Amplification  5-10x         ~10x         ~5x
Read Amplification   Medium        Low          Low-Medium
Space Overhead       High          Low          Medium
Implementation       Simple        Complex      Medium
Best For             Write-heavy   Read-heavy   Balanced

Lesson 6.2: Optimization Techniques

Bloom Filters for Faster Lookups

Problem: To check if key exists, must read SSTable (expensive disk I/O).

Without Bloom Filter:

Read "user:500":
  1. Check memtable: Not found
  2. Check SSTable 1: Read file (disk I/O) - Not found
  3. Check SSTable 2: Read file (disk I/O) - Not found
  4. Check SSTable 3: Read file (disk I/O) - Not found
  
  Result: 3 disk I/Os for missing key!

Solution: Bloom Filter

A probabilistic data structure that answers "is key in this file?" quickly:

  • No false negatives (if it says no, the key is definitely not there)
  • Few false positives (it may say yes when the key is actually absent)
  • Fast negative lookups
  • Small space (~10 bits per entry for a ~1% false-positive rate)

Bloom Filter:
- Bit array: [0, 0, 1, 0, 1, 1, 0, 1, ...]
- Hash functions: h1, h2, h3

Put "user:1":
  Set bits at positions: h1("user:1"), h2("user:1"), h3("user:1")

Check "user:1":
  All three positions set? YES, probably in set
  
Check "user:500":
  All three positions set? NO, definitely not in set!
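
Below is a minimal Bloom filter sketch in Go, deriving the k probe positions from a single FNV-64a hash via double hashing. The type and method names are illustrative, not a prescribed API:

import "hash/fnv"

// bloomFilter is a minimal sketch: m bits and k probe positions per key.
type bloomFilter struct {
    bits []uint64
    m    uint64 // number of bits
    k    uint64 // number of probes per key
}

func newBloomFilter(m, k uint64) *bloomFilter {
    return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two base hashes from one FNV-64a pass.
func (f *bloomFilter) hashes(key []byte) (uint64, uint64) {
    h := fnv.New64a()
    h.Write(key)
    sum := h.Sum64()
    return sum, (sum >> 33) | 1 // force a non-zero odd step for the probes
}

// Add sets the k probe bits for key.
func (f *bloomFilter) Add(key []byte) {
    h1, h2 := f.hashes(key)
    for i := uint64(0); i < f.k; i++ {
        pos := (h1 + i*h2) % f.m
        f.bits[pos/64] |= 1 << (pos % 64)
    }
}

// MayContain returns false only if key was definitely never added.
func (f *bloomFilter) MayContain(key []byte) bool {
    h1, h2 := f.hashes(key)
    for i := uint64(0); i < f.k; i++ {
        pos := (h1 + i*h2) % f.m
        if f.bits[pos/64]&(1<<(pos%64)) == 0 {
            return false
        }
    }
    return true
}

A common configuration is ~10 bits per key with about 7 probes, which gives roughly a 1% false-positive rate.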

Block Cache for Hot Data

Problem: Frequently accessed data requires disk reads.

import (
    "sync"
    "time"
)

// BlockCache caches SSTable blocks
type BlockCache struct {
    mu      sync.Mutex
    cache   map[string]*CacheEntry
    maxSize int64
    size    int64
}

type CacheEntry struct {
    block    *DataBlock
    lastUsed time.Time
}

// Get retrieves from cache and refreshes the entry's recency.
// A full lock is taken because Get mutates lastUsed.
func (bc *BlockCache) Get(key string) (*DataBlock, bool) {
    bc.mu.Lock()
    defer bc.mu.Unlock()

    entry, ok := bc.cache[key]
    if ok {
        entry.lastUsed = time.Now()
        return entry.block, true
    }
    return nil, false
}

// Put stores in cache, evicting LRU entries until the new block fits
func (bc *BlockCache) Put(key string, block *DataBlock) {
    bc.mu.Lock()
    defer bc.mu.Unlock()

    // If the key is already cached, subtract its old size before replacing it
    if old, ok := bc.cache[key]; ok {
        bc.size -= old.block.Size()
    }

    // Evict until there is room for the new block
    for len(bc.cache) > 0 && bc.size+block.Size() > bc.maxSize {
        bc.evictLRU()
    }

    bc.cache[key] = &CacheEntry{
        block:    block,
        lastUsed: time.Now(),
    }
    bc.size += block.Size()
}
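
Put relies on an evictLRU helper that is not shown above. A minimal version might look like the sketch below: a linear scan for the least-recently-used entry (a production cache would maintain a linked list instead), assuming the caller already holds bc.mu:

// evictLRU removes the single least-recently-used entry.
// Caller must hold bc.mu.
func (bc *BlockCache) evictLRU() {
    var oldestKey string
    var oldestTime time.Time
    for k, e := range bc.cache {
        if oldestKey == "" || e.lastUsed.Before(oldestTime) {
            oldestKey = k
            oldestTime = e.lastUsed
        }
    }
    if oldestKey != "" {
        bc.size -= bc.cache[oldestKey].block.Size()
        delete(bc.cache, oldestKey)
    }
}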

Compression Strategies

Problem: SSTables consume disk space.

import (
    "bytes"
    "compress/gzip"
    "io"

    "github.com/golang/snappy"
)

// CompressionCodec compresses/decompresses blocks
type CompressionCodec interface {
    Compress(data []byte) ([]byte, error)
    Decompress(data []byte) ([]byte, error)
}

// SnappyCodec uses Snappy (fast)
type SnappyCodec struct{}

func (sc SnappyCodec) Compress(data []byte) ([]byte, error) {
    return snappy.Encode(nil, data), nil
}

func (sc SnappyCodec) Decompress(data []byte) ([]byte, error) {
    return snappy.Decode(nil, data)
}

// GzipCodec uses Gzip (better ratio)
type GzipCodec struct {
    level int
}

func (gc GzipCodec) Compress(data []byte) ([]byte, error) {
    var buf bytes.Buffer
    w, err := gzip.NewWriterLevel(&buf, gc.level)
    if err != nil {
        return nil, err
    }
    if _, err := w.Write(data); err != nil {
        return nil, err
    }
    if err := w.Close(); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}

func (gc GzipCodec) Decompress(data []byte) ([]byte, error) {
    r, err := gzip.NewReader(bytes.NewReader(data))
    if err != nil {
        return nil, err
    }
    defer r.Close()
    return io.ReadAll(r)
}

Compression Trade-offs:

Snappy (Fast):
  • Compression ratio: 0.5-0.7
  • CPU overhead: Low
  • Best for: Recent data, frequent reads

Gzip (Better ratio):
  • Compression ratio: 0.3-0.5
  • CPU overhead: High
  • Best for: Older data, infrequent reads
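
One way to apply these trade-offs is to pick the codec per level: Snappy for the upper, hotter levels and Gzip for the cold bottom levels. The cutoff below is an illustrative choice, not a fixed rule:

// codecForLevel follows the trade-offs above: fast Snappy for hot upper
// levels, denser Gzip for cold lower levels.
func codecForLevel(level int) CompressionCodec {
    if level <= 2 {
        return SnappyCodec{}
    }
    return GzipCodec{level: gzip.BestCompression}
}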

Batched Writes

Problem: Writing entries one-by-one is slow due to lock contention.

// Batch groups multiple operations
type Batch struct {
    entries []*LogEntry
}

// Put adds a put operation to the batch
func (b *Batch) Put(key, value []byte) {
    b.entries = append(b.entries, &LogEntry{
        Op:    OpPut,
        Key:   key,
        Value: value,
    })
}

// Delete adds a delete (tombstone) operation to the batch
func (b *Batch) Delete(key []byte) {
    b.entries = append(b.entries, &LogEntry{
        Op:  OpDelete,
        Key: key,
    })
}

// Commit applies all operations atomically
func (b *Batch) Commit(db *DB) error {
    db.mu.Lock()
    defer db.mu.Unlock()

    // Single WAL write, done first so a crash cannot lose acknowledged updates
    if err := db.wal.WriteBatch(b.entries); err != nil {
        return err
    }

    // All memtable updates under one lock acquisition
    for _, entry := range b.entries {
        switch entry.Op {
        case OpPut:
            db.memtable.Put(entry.Key, entry.Value)
        case OpDelete:
            db.memtable.Delete(entry.Key)
        }
    }
    return nil
}

// Usage
batch := &Batch{}
batch.Put([]byte("k1"), []byte("v1"))
batch.Put([]byte("k2"), []byte("v2"))
batch.Put([]byte("k3"), []byte("v3"))
batch.Commit(db)  // All or nothing

Hands-on Lab: Implement Leveled Compaction with Bloom filters

Build a complete compaction engine with optimization techniques.

Lab Tasks:

  1. Implement LeveledCompactor with size-based triggers
  2. Build Bloom filters with a configurable false positive rate (see the sizing sketch after this list)
  3. Implement block cache with LRU eviction
  4. Add compression support (Snappy + Gzip)
  5. Implement batched writes
  6. Write comprehensive tests and benchmarks
  7. Measure performance improvements
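
For task 2, the filter size m and probe count k follow directly from the expected key count n and the target false-positive rate p, using the standard formulas m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2. A small helper (name and signature are illustrative):

import "math"

// bloomParams returns the bit count m and probe count k for n expected keys
// and a target false-positive rate p.
func bloomParams(n int, p float64) (m, k uint64) {
    mf := -float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)
    kf := mf / float64(n) * math.Ln2
    return uint64(math.Ceil(mf)), uint64(math.Max(1, math.Round(kf)))
}

For n = 1,000,000 keys and p = 0.01 this gives roughly 9.6 bits per key and 7 probes.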

Summary & Learning Outcomes

What You've Learned:

  • Three major compaction strategies and their trade-offs
  • Bloom filters for 10-100x speedup on negative lookups
  • Block caching for hot data optimization
  • Compression techniques and space/CPU trade-offs
  • Batched writes for reduced lock contention
  • Resource throttling and background processing

Key Metrics Achieved:

  • Compaction reduces file count by 90%
  • Bloom filters skip ~99% of negative lookups
  • Block cache achieves 10+ MB/s for hot data
  • Compression ratio: 0.3-0.5 (3-5GB from 10GB)
  • Batched writes reduce lock contention by 10x