Compaction and Optimization

Learning Objectives

  • Master different compaction strategies (size-tiered, leveled, universal)
  • Implement Bloom filters for 10-100x speedup on negative lookups
  • Build block caching for hot data optimization
  • Learn compression techniques and trade-offs
  • Implement batched writes and resource throttling

Lesson 6.1: Compaction Strategies

Why Compaction Matters

As the database operates, SSTables accumulate at each level; every read must then consult more files, so latency and disk usage degrade over time.

Without compaction:

Level 0: sstable-0, sstable-1, sstable-2, ... sstable-100 (100 files!)
Level 1: sstable-l1-0, sstable-l1-1, ...
Level 2: ...

Problem:
- Read "user:500": Must check 100+ files in Level 0
- Each file access = disk seek (expensive)
- Read latency degrades over time
- Disk space grows unbounded

With compaction:

Before:
  Level 0: 100 small files (1-10MB each)
  Level 1: 10 medium files (50MB each)

After compaction:
  Level 0: 4 small files (5MB each)
  Level 1: 5 medium files (100MB each)

Benefits:
- Read latency: up to ~110 candidate files per lookup → ~9
- Disk space: space held by overwritten and deleted values is reclaimed
- Tombstone cleanup: delete markers are dropped once no older versions remain

Compaction Strategies: Size-Tiered

Size-Tiered Compaction (SimpleDB, Cassandra):

Trigger: When a level accumulates too many files (the threshold is configurable;
the timeline below uses 3 for brevity, production systems often use 10+)

Action:
  1. Select the files at level N (grouped by similar size)
  2. Merge them with a streaming k-way merge into a single file
  3. Write the merged file to level N+1
  4. Delete the source files

Timeline:
t0: Level 0: [f0(5MB), f1(5MB), f2(5MB)] → Trigger (3 files)
t1: Merge into temp file (15MB)
t2: Move to Level 1: [f1_0(15MB)] + existing files
t3: Level 0: [] (files deleted)
t4: New writes go to Level 0 again
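
To make the trigger-and-merge step concrete, here is a minimal sketch in Go. The SSTable type and the merge callback stand in for components built in earlier lessons, so their shapes here are assumptions rather than the exact API:

// Minimal sketch of the size-tiered trigger-and-merge step.
type SSTable struct {
    Path string
    Size int64
}

const maxFilesPerLevel = 3 // matches the timeline above; 10+ is more typical

// compactSizeTiered merges every file at level n into one file at level n+1
// once the file-count trigger fires.
func compactSizeTiered(levels [][]*SSTable, n int,
    merge func([]*SSTable) (*SSTable, error)) ([][]*SSTable, error) {

    if len(levels[n]) < maxFilesPerLevel {
        return levels, nil // below the trigger, nothing to do
    }
    merged, err := merge(levels[n]) // streaming k-way merge; newest value per key wins
    if err != nil {
        return levels, err
    }
    levels[n+1] = append(levels[n+1], merged) // promote the merged file
    levels[n] = nil                           // source files can now be deleted
    return levels, nil
}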

Advantages:

  • Simple to implement
  • Good for write-heavy workloads
  • Low CPU overhead during compaction

Disadvantages:

  • Write amplification: ~5-10x
  • Uneven file sizes at same level
  • Space overhead during merge

Leveled Compaction

Leveled Compaction (LevelDB, RocksDB):

Design:
- Level 0: Multiple files (overlap allowed)
- Level 1+: Single file per key range (NO overlap)

Trigger: When level size exceeds limit (e.g., 10MB)

Action:
  1. Pick file from level N
  2. Find overlapping files in level N+1
  3. Merge all into level N+1
  4. Delete source files

Timeline:
Level 0: [file0(4MB), file1(3MB), file2(2MB)] → Size: 9MB (below 10MB)
         [file0(4MB), file1(3MB), file2(2MB), file3(5MB)] → Size: 14MB (TRIGGER)

Select file0 from L0:
  - Key range: ["a", "g"]
  - Find overlapping in L1: [file_l1_0 ("a"-"m")]
  - Merge: file0 + file_l1_0 → file_l1_0'
  - Delete: file0, file_l1_0

Result:
Level 0: [file1(3MB), file2(2MB), file3(5MB)]
Level 1: [file_l1_0'(overlapping merged)]
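
The crucial step is selecting which files in level N+1 overlap the picked file's key range, because those must be rewritten together to preserve the non-overlap invariant. A small sketch of that selection, with illustrative field names rather than the exact metadata struct:

// fileMeta is a stand-in for the SSTable metadata kept per level.
type fileMeta struct {
    Path   string
    MinKey string
    MaxKey string
    SizeMB int
}

// overlapping returns the files in the next level whose key ranges intersect
// [minKey, maxKey]; these must be merged together with the picked file so
// that level N+1 stays free of overlapping key ranges.
func overlapping(next []fileMeta, minKey, maxKey string) []fileMeta {
    var out []fileMeta
    for _, f := range next {
        if f.MaxKey >= minKey && f.MinKey <= maxKey {
            out = append(out, f)
        }
    }
    return out
}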

Advantages:

  • Write amplification: ~10x (predictable)
  • Bounded file count per level
  • Space efficient
  • Consistent performance

Disadvantages:

  • Complex implementation
  • Higher CPU during compaction
  • Must maintain key range invariants

Compaction Trade-offs Comparison

Aspect               Size-Tiered   Leveled      Universal
Write Amplification  5-10x         ~10x         ~5x
Read Amplification   Medium        Low          Low-Medium
Space Overhead       High          Low          Medium
Implementation       Simple        Complex      Medium
Best For             Write-heavy   Read-heavy   Balanced

Lesson 6.2: Optimization Techniques

Bloom Filters for Faster Lookups

Problem: To check if key exists, must read SSTable (expensive disk I/O).

Without Bloom Filter:

Read "user:500":
  1. Check memtable: Not found
  2. Check SSTable 1: Read file (disk I/O) - Not found
  3. Check SSTable 2: Read file (disk I/O) - Not found
  4. Check SSTable 3: Read file (disk I/O) - Not found
  
  Result: 3 disk I/Os for missing key!

Solution: Bloom Filter

A probabilistic data structure that answers "is key in this file?" quickly:

  • No false negatives (if it says no, the key is definitely not there)
  • Few false positives (it may say yes when the key is actually absent)
  • Fast negative lookups
  • Small space (~10 bits per entry for a ~1% false-positive rate)

Bloom Filter:
- Bit array: [0, 0, 1, 0, 1, 1, 0, 1, ...]
- Hash functions: h1, h2, h3

Put "user:1":
  Set bits at positions: h1("user:1"), h2("user:1"), h3("user:1")

Check "user:1":
  All three positions set? YES, probably in set
  
Check "user:500":
  All three positions set? NO, definitely not in set!
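
Below is a minimal Bloom filter sketch in Go, deriving the k probe positions from a single FNV-64a hash via double hashing. The type and method names are illustrative, not a prescribed API:

import "hash/fnv"

// bloomFilter is a minimal sketch: m bits and k probe positions per key.
type bloomFilter struct {
    bits []uint64
    m    uint64 // number of bits
    k    uint64 // number of probes per key
}

func newBloomFilter(m, k uint64) *bloomFilter {
    return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two base hashes from one FNV-64a pass.
func (f *bloomFilter) hashes(key []byte) (uint64, uint64) {
    h := fnv.New64a()
    h.Write(key)
    sum := h.Sum64()
    return sum, (sum >> 33) | 1 // force a non-zero odd step for the probes
}

// Add sets the k probe bits for key.
func (f *bloomFilter) Add(key []byte) {
    h1, h2 := f.hashes(key)
    for i := uint64(0); i < f.k; i++ {
        pos := (h1 + i*h2) % f.m
        f.bits[pos/64] |= 1 << (pos % 64)
    }
}

// MayContain returns false only if key was definitely never added.
func (f *bloomFilter) MayContain(key []byte) bool {
    h1, h2 := f.hashes(key)
    for i := uint64(0); i < f.k; i++ {
        pos := (h1 + i*h2) % f.m
        if f.bits[pos/64]&(1<<(pos%64)) == 0 {
            return false
        }
    }
    return true
}

A common configuration is ~10 bits per key with about 7 probes, which gives roughly a 1% false-positive rate.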

Block Cache for Hot Data

Problem: Frequently accessed data requires disk reads.

import (
    "sync"
    "time"
)

// BlockCache caches SSTable blocks
type BlockCache struct {
    mu      sync.Mutex
    cache   map[string]*CacheEntry
    maxSize int64
    size    int64
}

type CacheEntry struct {
    block    *DataBlock
    lastUsed time.Time
}

// Get retrieves from cache and refreshes the entry's recency.
// A full lock is taken because Get mutates lastUsed.
func (bc *BlockCache) Get(key string) (*DataBlock, bool) {
    bc.mu.Lock()
    defer bc.mu.Unlock()

    entry, ok := bc.cache[key]
    if ok {
        entry.lastUsed = time.Now()
        return entry.block, true
    }
    return nil, false
}

// Put stores in cache, evicting LRU entries until the new block fits
func (bc *BlockCache) Put(key string, block *DataBlock) {
    bc.mu.Lock()
    defer bc.mu.Unlock()

    // If the key is already cached, subtract its old size before replacing it
    if old, ok := bc.cache[key]; ok {
        bc.size -= old.block.Size()
    }

    // Evict until there is room for the new block
    for len(bc.cache) > 0 && bc.size+block.Size() > bc.maxSize {
        bc.evictLRU()
    }

    bc.cache[key] = &CacheEntry{
        block:    block,
        lastUsed: time.Now(),
    }
    bc.size += block.Size()
}
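
Put relies on an evictLRU helper that is not shown above. A minimal version might look like the sketch below: a linear scan for the least-recently-used entry (a production cache would maintain a linked list instead), assuming the caller already holds bc.mu:

// evictLRU removes the single least-recently-used entry.
// Caller must hold bc.mu.
func (bc *BlockCache) evictLRU() {
    var oldestKey string
    var oldestTime time.Time
    for k, e := range bc.cache {
        if oldestKey == "" || e.lastUsed.Before(oldestTime) {
            oldestKey = k
            oldestTime = e.lastUsed
        }
    }
    if oldestKey != "" {
        bc.size -= bc.cache[oldestKey].block.Size()
        delete(bc.cache, oldestKey)
    }
}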

Compression Strategies

Problem: SSTables consume disk space.

import (
    "bytes"
    "compress/gzip"
    "io"

    "github.com/golang/snappy"
)

// CompressionCodec compresses/decompresses blocks
type CompressionCodec interface {
    Compress(data []byte) ([]byte, error)
    Decompress(data []byte) ([]byte, error)
}

// SnappyCodec uses Snappy (fast)
type SnappyCodec struct{}

func (sc SnappyCodec) Compress(data []byte) ([]byte, error) {
    return snappy.Encode(nil, data), nil
}

func (sc SnappyCodec) Decompress(data []byte) ([]byte, error) {
    return snappy.Decode(nil, data)
}

// GzipCodec uses Gzip (better ratio)
type GzipCodec struct {
    level int
}

func (gc GzipCodec) Compress(data []byte) ([]byte, error) {
    var buf bytes.Buffer
    w, err := gzip.NewWriterLevel(&buf, gc.level)
    if err != nil {
        return nil, err
    }
    if _, err := w.Write(data); err != nil {
        return nil, err
    }
    if err := w.Close(); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}

func (gc GzipCodec) Decompress(data []byte) ([]byte, error) {
    r, err := gzip.NewReader(bytes.NewReader(data))
    if err != nil {
        return nil, err
    }
    defer r.Close()
    return io.ReadAll(r)
}

Compression Trade-offs:

Snappy (Fast):
  • Compression ratio: 0.5-0.7
  • CPU overhead: Low
  • Best for: Recent data, frequent reads

Gzip (Better ratio):
  • Compression ratio: 0.3-0.5
  • CPU overhead: High
  • Best for: Older data, infrequent reads
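
One way to apply these trade-offs is to pick the codec per level: Snappy for the upper, hotter levels and Gzip for the cold bottom levels. The cutoff below is an illustrative choice, not a fixed rule:

// codecForLevel follows the trade-offs above: fast Snappy for hot upper
// levels, denser Gzip for cold lower levels.
func codecForLevel(level int) CompressionCodec {
    if level <= 2 {
        return SnappyCodec{}
    }
    return GzipCodec{level: gzip.BestCompression}
}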

Batched Writes

Problem: Writing entries one-by-one is slow due to lock contention.

// Batch groups multiple operations
type Batch struct {
    entries []*LogEntry
}

// Put adds a put operation to the batch
func (b *Batch) Put(key, value []byte) {
    b.entries = append(b.entries, &LogEntry{
        Op:    OpPut,
        Key:   key,
        Value: value,
    })
}

// Delete adds a delete (tombstone) operation to the batch
func (b *Batch) Delete(key []byte) {
    b.entries = append(b.entries, &LogEntry{
        Op:  OpDelete,
        Key: key,
    })
}

// Commit applies all operations atomically
func (b *Batch) Commit(db *DB) error {
    db.mu.Lock()
    defer db.mu.Unlock()

    // Single WAL write, done first so a crash cannot lose acknowledged updates
    if err := db.wal.WriteBatch(b.entries); err != nil {
        return err
    }

    // All memtable updates under one lock acquisition
    for _, entry := range b.entries {
        switch entry.Op {
        case OpPut:
            db.memtable.Put(entry.Key, entry.Value)
        case OpDelete:
            db.memtable.Delete(entry.Key)
        }
    }
    return nil
}

// Usage
batch := &Batch{}
batch.Put([]byte("k1"), []byte("v1"))
batch.Put([]byte("k2"), []byte("v2"))
batch.Put([]byte("k3"), []byte("v3"))
batch.Commit(db)  // All or nothing

Hands-on Lab: Implement Leveled Compaction with Bloom filters

Build a complete compaction engine with optimization techniques.

Lab Tasks:

  1. Implement LeveledCompactor with size-based triggers
  2. Build Bloom filters with a configurable false positive rate (see the sizing sketch after this list)
  3. Implement block cache with LRU eviction
  4. Add compression support (Snappy + Gzip)
  5. Implement batched writes
  6. Write comprehensive tests and benchmarks
  7. Measure performance improvements
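
For task 2, the filter size m and probe count k follow directly from the expected key count n and the target false-positive rate p, using the standard formulas m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2. A small helper (name and signature are illustrative):

import "math"

// bloomParams returns the bit count m and probe count k for n expected keys
// and a target false-positive rate p.
func bloomParams(n int, p float64) (m, k uint64) {
    mf := -float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)
    kf := mf / float64(n) * math.Ln2
    return uint64(math.Ceil(mf)), uint64(math.Max(1, math.Round(kf)))
}

For n = 1,000,000 keys and p = 0.01 this gives roughly 9.6 bits per key and 7 probes.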

Summary & Learning Outcomes

What You've Learned:

  • Three major compaction strategies and their trade-offs
  • Bloom filters for 10-100x speedup on negative lookups
  • Block caching for hot data optimization
  • Compression techniques and space/CPU trade-offs
  • Batched writes for reduced lock contention
  • Resource throttling and background processing

Key Metrics Achieved:

  • Compaction reduces file count by 90%
  • Bloom filters skip ~99% of negative lookups
  • Block cache achieves 10+ MB/s for hot data
  • Compression ratio: 0.3-0.5 (3-5GB from 10GB)
  • Batched writes reduce lock contention by 10x