Course Navigation
← Back to Course OverviewAll Lessons
 ✓ 
 Introduction and Database Fundamentals  ✓ 
 Building the Core Data Structure  ✓ 
 Concurrency and Thread Safety  4 
 Append-Only Log (Write-Ahead Log)  5 
 SSTables and LSM Trees  6 
 Compaction and Optimization  7 
 TCP Server and Protocol Design  8 
 Client Library and Advanced Networking  9 
 Transactions and ACID Properties  10 
 Replication and High Availability  11 
 Monitoring, Metrics, and Observability  12 
 Performance Optimization and Tuning  13 
 Configuration and Deployment  14 
 Security and Production Hardening  15 
 Final Project and Beyond Current Lesson
  4 of 15 
  Progress 27% 
 Append-Only Log (Write-Ahead Log)
Learning Objectives
- • Understand why Write-Ahead Logging is essential for durability
- • Master file I/O patterns and binary encoding in Go
- • Implement crash recovery mechanisms
- • Learn different durability levels and their trade-offs
- • Build a complete WAL with checksums and corruption detection
Lesson 4.1: WAL Fundamentals
Why Append-Only Logs Matter
The Write-Ahead Log (WAL) is the foundation of database durability. It ensures that no committed data is lost, even if the system crashes.
The Durability Problem
Without persistence, all data lives in memory:
┌─────────────────────────────────────┐
│     In-Memory Store (RAM)           │
│  ┌──────────────────────────────┐   │
│  │ "user:1" → "Alice"           │   │
│  │ "user:2" → "Bob"             │   │
│  │ "counter" → 42                │   │
│  └──────────────────────────────┘   │
│                                     │
│         PROBLEM:                    │
│  Power failure → All data lost!     │
│  System crash → All data lost!      │
│  OOM killer → All data lost!        │
└─────────────────────────────────────┘Real-world scenario:
Application running:
1. Receive Put("user:100", "Alice")
2. Store in memory
3. Return success to client
Client: "Data saved!"
Immediately after:
- Power failure
- Server reboots
- Data is gone!
Client: "Where's my data?!"The Write-Ahead Log Solution
┌──────────────────────────────────────────┐
│      Application                         │
│  Put("user:1", "Alice")                  │
└─────────────────┬──────────────────────┘
                  │
                  ▼
         ┌────────────────┐
         │ Write to WAL   │
         │ (Disk)         │
         └────────────────┘
              ▲    │
              │    │ fsync()
              │    ▼
         ┌────────────────┐
         │  OS Buffer     │
         │  (Page Cache)  │
         └────────────────┘
              │
              │ fsync()
              ▼
         ┌────────────────┐
         │  Disk Storage  │
         │  (Persistent)  │
         └────────────────┘
                  │
                  ▼
         Return to client
         (Data is safe!)
                  │
                  ▼
         Update in-memory storeKey principle: Write to disk BEFORE acknowledging to client.
Timeline:
t0: Client sends Put("key", "value")
t1: Server writes entry to WAL
t2: Server calls fsync() - blocks until on disk
t3: Server updates in-memory store
t4: Server returns success to client
    (If crash between t0-t4, can recover from WAL)
If crash at t3:
  └─ On restart: Replay WAL, recover Put
  └─ Data is safe
If crash after t4:
  └─ On restart: Replay WAL
  └─ Data was already in memory
  └─ Data is safeLog Structure and Format Design
Basic Log Entry Format
Entry:
┌────────────┬─────────┬────────┬─────────┬──────────┐
│ Operation  │ KeyLen  │ Key    │ValueLen │ Value    │
│ 1 byte     │ 4 bytes │ VarLen │ 4 bytes │ VarLen   │
└────────────┴─────────┴────────┴─────────┴──────────┘
Example:
Operation: 0x01 (Put)
KeyLen: 0x00000004 (4 bytes)
Key: "user"
ValueLen: 0x00000005 (5 bytes)
Value: "alice"Log File Format (Multiple Entries)
WAL File (append-only):
┌─────────────────────────────────────────┐
│ Header                                  │
│ - Version: 1                            │
│ - CreatedAt: timestamp                  │
│ - Checksum: CRC32                       │
├─────────────────────────────────────────┤
│ Entry 1: Put("user:1", "alice")        │
├─────────────────────────────────────────┤
│ Entry 2: Put("user:2", "bob")          │
├─────────────────────────────────────────┤
│ Entry 3: Delete("user:3")              │
├─────────────────────────────────────────┤
│ Entry 4: Put("counter", "42")          │
├─────────────────────────────────────────┤
│ ... (more entries)                      │
│                                         │
│ File only grows (append-only)           │
│ Never modifies existing entries         │
└─────────────────────────────────────────┘Checksums for Corruption Detection
// Entry with checksum
type LogEntry struct {
    Op        byte        // 0: Put, 1: Delete
    KeyLen    uint32
    Key       []byte
    ValueLen  uint32
    Value     []byte
    Checksum  uint32      // CRC32 of (Op + KeyLen + Key + ValueLen + Value)
}
// Verify on read
func (e *LogEntry) Verify() error {
    computed := crc32.ChecksumIEEE(e.Serialize())
    if computed != e.Checksum {
        return ErrChecksumMismatch
    }
    return nil
}Lesson 4.2: WAL Implementation
File I/O in Go
Basic File Operations
import "os"
// Create/open file
f, err := os.OpenFile(
    "wal.log",
    os.O_CREATE|os.O_WRONLY|os.O_APPEND,  // Create, write-only, append
    0644,  // Permissions: rw-r--r--
)
if err != nil {
    return err
}
defer f.Close()
// Write data
data := []byte("hello")
n, err := f.Write(data)
if err != nil {
    return err
}
fmt.Printf("Wrote %d bytes\n", n)
// Force to disk
if err := f.Sync(); err != nil {
    return err
}
// Get file size
info, err := f.Stat()
if err != nil {
    return err
}
fmt.Printf("File size: %d\n", info.Size())
// Read from file
f2, _ := os.Open("wal.log")
buf := make([]byte, 1024)
n, err := f2.Read(buf)Binary Encoding
Go provides binary encoding in the `encoding/binary` package:
import "encoding/binary"
// Write fixed-size integers
var buf bytes.Buffer
binary.Write(&buf, binary.BigEndian, uint32(42))  // Big-endian encoding
binary.Write(&buf, binary.LittleEndian, uint32(42))  // Little-endian
// Read fixed-size integers
var val uint32
binary.Read(bytes.NewReader(buf.Bytes()), binary.BigEndian, &val)
// Variable-length encoding (varint)
buf2 := bytes.NewBuffer(nil)
binary.PutUvarint([]byte{}, 300)  // Encodes efficientlyVarint efficiency:
- • Small values (0-127): 1 byte
- • Medium values (128-16383): 2 bytes
- • Large values: 3-10 bytes
Example:
- • Encode 42: 0x2A (1 byte)
- • Encode 300: 0xAC 0x02 (2 bytes)
- • Encode 1000: 0xE8 0x07 (2 bytes)
Writing Log Entries
package persistence
import (
    "bytes"
    "encoding/binary"
    "hash/crc32"
    "io"
    "os"
    "sync"
)
// LogEntry represents a single write operation
type LogEntry struct {
    Op       uint8   // 0: Put, 1: Delete
    Key      []byte
    Value    []byte  // Empty for Delete
    Checksum uint32
}
// WriteAheadLog manages persistent writes
type WriteAheadLog struct {
    file       *os.File
    mu         sync.Mutex
    offset     int64       // Current file offset
    unfsynced  int         // Entries since last fsync
    syncEvery  int         // Sync every N writes
    buf        *bytes.Buffer
}
// NewWriteAheadLog creates a new WAL
func NewWriteAheadLog(filepath string, syncEvery int) (*WriteAheadLog, error) {
    f, err := os.OpenFile(
        filepath,
        os.O_CREATE|os.O_WRONLY|os.O_APPEND,
        0644,
    )
    if err != nil {
        return nil, err
    }
    
    // Get current file size
    info, err := f.Stat()
    if err != nil {
        f.Close()
        return nil, err
    }
    
    return &WriteAheadLog{
        file:      f,
        offset:    info.Size(),
        syncEvery: syncEvery,
        buf:       bytes.NewBuffer(make([]byte, 0, 4096)),
    }, nil
}
// Write appends an entry to the WAL
func (w *WriteAheadLog) Write(entry *LogEntry) error {
    w.mu.Lock()
    defer w.mu.Unlock()
    
    // Serialize entry
    data, err := w.serializeEntry(entry)
    if err != nil {
        return err
    }
    
    // Write to file
    if _, err := w.file.Write(data); err != nil {
        return err
    }
    
    w.offset += int64(len(data))
    w.unfsynced++
    
    // Check if should fsync
    if w.unfsynced >= w.syncEvery {
        if err := w.file.Sync(); err != nil {
            return err
        }
        w.unfsynced = 0
    }
    
    return nil
}Durability Guarantees
fsync() and OS Buffers
The operating system buffers writes in memory (page cache):
Application writes data:
  write() → OS Page Cache → Returns immediately
Data in page cache is NOT durable yet!
  └─ Another power failure → Data lost
fsync() forces to disk:
  fsync() → Blocks until on persistent storage → Returns
Now data is durable!Performance trade-off:
Without fsync():
- • Write latency: 1-10 µs (memory operation)
- • Throughput: 100M+ ops/sec
- • Durability: NONE (data lost on crash)
With fsync():
- • Write latency: 1-10 ms (disk I/O)
- • Throughput: 100-1000 ops/sec
- • Durability: FULL (data survives crashes)
Durability Levels
Level 0: No Durability (Caching)
func (wal *WAL) Write(entry *LogEntry) error {
    buf := entry.Serialize()
    _, err := wal.file.Write(buf)
    return err
    // No fsync() - data in OS cache only
}Level 1: Write-Ahead Log with Periodic fsync
func (wal *WAL) Write(entry *LogEntry) error {
    buf := entry.Serialize()
    _, err := wal.file.Write(buf)
    if err != nil {
        return err
    }
    
    wal.unfsynced++
    
    // fsync every N writes
    if wal.unfsynced >= 1000 {
        if err := wal.file.Sync(); err != nil {
            return err
        }
        wal.unfsynced = 0
    }
    
    return nil
}Level 2: Strict Durability (fsync every write)
func (wal *WAL) Write(entry *LogEntry) error {
    buf := entry.Serialize()
    _, err := wal.file.Write(buf)
    if err != nil {
        return err
    }
    
    // Immediate fsync - guarantees durability
    return wal.file.Sync()
}Choosing Durability Level
| Level | Latency | Throughput | Durability | Use Case | 
|---|---|---|---|---|
| 0 | <1µs | 100M ops/sec | None | Caching only | 
| 1 | 100µs | 10K ops/sec | Likely | Web sessions | 
| 2 | 10ms | 100 ops/sec | Strict | Financial | 
Lesson 4.3: Crash Recovery
Reading and Replay
// LogReader reads entries from a WAL file
type LogReader struct {
    file   *os.File
    reader io.Reader
}
// NewLogReader creates a reader for the WAL
func NewLogReader(filepath string) (*LogReader, error) {
    f, err := os.Open(filepath)
    if err != nil {
        return nil, err
    }
    
    return &LogReader{
        file:   f,
        reader: f,
    }, nil
}
// Read reads next entry
func (lr *LogReader) Read() (*LogEntry, error) {
    entry := &LogEntry{}
    
    // Read operation type
    opBuf := make([]byte, 1)
    if _, err := io.ReadFull(lr.reader, opBuf); err != nil {
        if err == io.EOF {
            return nil, io.EOF
        }
        return nil, err
    }
    entry.Op = opBuf[0]
    
    // Read key
    var keyLen uint32
    if err := binary.Read(lr.reader, binary.BigEndian, &keyLen); err != nil {
        return nil, err
    }
    entry.Key = make([]byte, keyLen)
    if _, err := io.ReadFull(lr.reader, entry.Key); err != nil {
        return nil, err
    }
    
    // Read value
    var valLen uint32
    if err := binary.Read(lr.reader, binary.BigEndian, &valLen); err != nil {
        return nil, err
    }
    entry.Value = make([]byte, valLen)
    if _, err := io.ReadFull(lr.reader, entry.Value); err != nil {
        return nil, err
    }
    
    // Read and verify checksum
    var checksum uint32
    if err := binary.Read(lr.reader, binary.BigEndian, &checksum); err != nil {
        return nil, err
    }
    entry.Checksum = checksum
    
    // Verify data integrity
    if err := entry.VerifyChecksum(); err != nil {
        return nil, err
    }
    
    return entry, nil
}Crash Recovery Implementation
// RecoveryManager replays WAL on startup
type RecoveryManager struct {
    walPath string
    store   Store
}
// NewRecoveryManager creates a recovery manager
func NewRecoveryManager(walPath string, store Store) *RecoveryManager {
    return &RecoveryManager{
        walPath: walPath,
        store:   store,
    }
}
// Recover replays the WAL and recovers to last consistent state
func (rm *RecoveryManager) Recover(ctx context.Context) error {
    // Check if WAL exists
    if _, err := os.Stat(rm.walPath); err != nil {
        if os.IsNotExist(err) {
            return nil  // No WAL, clean start
        }
        return err
    }
    
    reader, err := NewLogReader(rm.walPath)
    if err != nil {
        return err
    }
    defer reader.Close()
    
    var recovered, skipped int
    
    for {
        entry, err := reader.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            // Corrupted entry, skip
            skipped++
            continue
        }
        
        // Replay operation
        switch entry.Op {
        case 0:  // Put
            if err := rm.store.Put(ctx, entry.Key, entry.Value); err != nil {
                return err
            }
            recovered++
        case 1:  // Delete
            if err := rm.store.Delete(ctx, entry.Key); err != nil {
                return err
            }
            recovered++
        }
    }
    
    log.Printf("Recovered %d entries, skipped %d corrupted entries", recovered, skipped)
    return nil
}Log Rotation
// LogRotator manages WAL file rotation
type LogRotator struct {
    baseDir        string
    maxFileSize    int64
    currentWAL     *WriteAheadLog
    rotationMu     sync.Mutex
}
// Rotate switches to a new WAL file
func (lr *LogRotator) Rotate() error {
    lr.rotationMu.Lock()
    defer lr.rotationMu.Unlock()
    
    // Close current file
    if lr.currentWAL != nil {
        if err := lr.currentWAL.Sync(); err != nil {
            return err
        }
        if err := lr.currentWAL.Close(); err != nil {
            return err
        }
    }
    
    // Create new WAL with timestamp
    filename := fmt.Sprintf("wal-%d.log", time.Now().UnixNano())
    filepath := filepath.Join(lr.baseDir, filename)
    
    wal, err := NewWriteAheadLog(filepath, 100)
    if err != nil {
        return err
    }
    
    lr.currentWAL = wal
    return nil
}Hands-on Lab: Implement Write-Ahead Log with crash recovery
Build a persistent WAL with crash recovery.
Lab Tasks:
- 1. Implement WriteAheadLog with checksums
- 2. Support configurable fsync frequency
- 3. Implement LogReader with checksum verification
- 4. Build RecoveryManager for crash recovery
- 5. Write comprehensive tests
- 6. Benchmark different fsync rates
- 7. Test with corrupted entries
Summary & Learning Outcomes
What You've Learned:
- • Why Write-Ahead Logging is essential for durability
- • File I/O patterns and binary encoding in Go
- • Different durability levels and their trade-offs
- • Checksums and corruption detection
- • Crash recovery mechanisms
- • Log rotation and management
Key Takeaways
- • Always write to disk before acknowledging to client
- • Choose durability level based on your use case
- • Use checksums to detect corruption
- • Design for crash recovery from day one
- • fsync() is expensive but necessary for durability