Performance Optimization and Tuning

Learning Objectives

  • Identify performance bottlenecks using profiling tools
  • Optimize memory usage with object pooling and allocation reduction
  • Design comprehensive load testing strategies
  • Plan capacity and tune systems for different workloads
  • Master Go performance optimization techniques
  • Build production-ready performance monitoring

Lesson 12.1: Identifying Bottlenecks

Using CPU Profiler

$ go tool pprof cpu.prof

(pprof) top -cum
Showing top functions by cumulative time:
  40%  compaction
  25%  lock_acquire
  20%  serialization
  15%  memcpy

Interpretation

  • compaction uses 40% of CPU time → the biggest win; focus optimization there first
  • lock_acquire uses 25% of CPU time → check for lock contention and oversized critical sections
  • serialization uses 20% of CPU time → optimize data encoding or switch formats
  • memcpy uses 15% of CPU time → lowest priority; revisit only after the items above

Lock Contention

❌ BAD: Lock held too long

func (s *Store) Get(key string) interface{} {
    s.mu.Lock()
    defer s.mu.Unlock()
    
    val := s.data[key]               // Fast: ~1µs
    processed := expensiveWork(val)  // Slow: ~100µs, done while holding the lock
    return processed
}

✅ GOOD: Minimize critical section

func (s *Store) Get(key string) interface{} {
    s.mu.Lock()
    val := s.data[key]
    s.mu.Unlock()
    
    // Do expensive work OUTSIDE the lock
    return expensiveWork(val)
}
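For read-heavy stores, a complementary fix is sync.RWMutex, which lets many readers hold the lock at once while writers still get exclusive access. A minimal, self-contained sketch (this Store is illustrative, not the lesson's full implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// Store guards its map with sync.RWMutex so that concurrent
// readers share the lock instead of queueing behind each other.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewStore() *Store {
	return &Store{data: make(map[string]string)}
}

func (s *Store) Put(key, val string) {
	s.mu.Lock() // writers still need exclusive access
	defer s.mu.Unlock()
	s.data[key] = val
}

func (s *Store) Get(key string) (string, bool) {
	s.mu.RLock() // shared lock: many Gets can hold it at once
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

func main() {
	s := NewStore()
	s.Put("k", "v")
	v, ok := s.Get("k")
	fmt.Println(v, ok)
}
```

RWMutex only helps when reads heavily outnumber writes; under write-heavy load it behaves like a plain Mutex with extra bookkeeping.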

Lesson 12.2: Memory Optimization

Object Pooling

import "sync"

var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

// Get from pool
buf := bufferPool.Get().([]byte)
defer bufferPool.Put(buf)

// Use buffer
// ... work with buf ...

// Result: Reuse instead of allocate
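One subtlety: putting a plain []byte into a sync.Pool boxes the slice header into an interface{}, which itself allocates. Storing a pointer to the slice avoids that. A runnable sketch (the process function and its buffer use are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// Pooling *[]byte instead of []byte avoids an extra allocation
// when the value is boxed into an interface on Put.
var bufferPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// process copies data into a pooled buffer and returns the length.
func process(data []byte) int {
	bp := bufferPool.Get().(*[]byte)
	defer bufferPool.Put(bp)

	buf := (*bp)[:0] // reuse capacity, reset length
	buf = append(buf, data...)
	return len(buf)
}

func main() {
	fmt.Println(process([]byte("hello")))
}
```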

Reduce Allocations

❌ BAD

func formatKey(key []byte) string {
    return "prefix:" + string(key)  // Allocation every time
}

✅ GOOD

func formatKey(key []byte) []byte {
    const prefix = "prefix:"
    result := make([]byte, len(prefix)+len(key))
    n := copy(result, prefix)
    copy(result[n:], key)
    return result  // Single allocation
}
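The difference can be measured without a full benchmark run using testing.AllocsPerRun. In this sketch the key is longer than the runtime's small conversion buffer so the string version's intermediate conversion must allocate; exact counts for the string version can vary with compiler optimizations, so the output is printed rather than assumed:

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks keep the compiler from optimizing the calls away.
var (
	sinkS string
	sinkB []byte
)

func formatKeyString(key []byte) string {
	return "prefix:" + string(key) // converts, then concatenates
}

func formatKeyBytes(key []byte) []byte {
	const prefix = "prefix:"
	result := make([]byte, len(prefix)+len(key))
	n := copy(result, prefix)
	copy(result[n:], key)
	return result
}

func main() {
	key := make([]byte, 64)

	a1 := testing.AllocsPerRun(1000, func() { sinkS = formatKeyString(key) })
	a2 := testing.AllocsPerRun(1000, func() { sinkB = formatKeyBytes(key) })
	fmt.Printf("string version: %.0f allocs/op, byte version: %.0f allocs/op\n", a1, a2)
}
```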

Lesson 12.3: Load Testing

Benchmark in Go

func BenchmarkGet(b *testing.B) {
    store := NewStore()
    store.Put("key", []byte("value"))
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        store.Get("key")
    }
}

// Run: go test -bench=. -benchmem
// Output: 50000000    20 ns/op    0 B/op    0 allocs/op
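Benchmarks can also exercise concurrent access with b.RunParallel. The sketch below wraps it in testing.Benchmark so it runs as a plain program rather than under `go test`; the Store here is a minimal stand-in for the lesson's store:

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// Store is a minimal stand-in for the lesson's store.
type Store struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func NewStore() *Store { return &Store{data: make(map[string][]byte)} }

func (s *Store) Put(key string, val []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = val
}

func (s *Store) Get(key string) []byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.data[key]
}

func main() {
	store := NewStore()
	store.Put("key", []byte("value"))

	// testing.Benchmark runs a benchmark func outside `go test`.
	result := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() { // each worker goroutine pulls iterations from b.N
				store.Get("key")
			}
		})
	})
	fmt.Println(result)
}
```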

Load Test

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

type LoadTest struct {
    concurrency  int
    duration     time.Duration
    client       *Client
}

func (lt *LoadTest) Run() {
    deadline := time.Now().Add(lt.duration)
    var count, errors int64
    var wg sync.WaitGroup
    
    for i := 0; i < lt.concurrency; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for time.Now().Before(deadline) {
                _, err := lt.client.Get("key")
                atomic.AddInt64(&count, 1)  // shared counters need atomics
                if err != nil {
                    atomic.AddInt64(&errors, 1)
                }
            }
        }()
    }
    
    wg.Wait()  // wait for all workers before reporting
    fmt.Printf("Requests: %d, Errors: %d\n", count, errors)
}

Lesson 12.4: Capacity Planning

Headroom

Desired throughput:     10,000 ops/sec
Target utilization:     70%
Required capacity:      14,286 ops/sec

Why headroom?
- Traffic spikes
- Maintenance
- Graceful degradation
- SLA buffer
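The arithmetic above is just target throughput divided by target utilization. As a tiny helper (the requiredCapacity name is our own, not part of any library):

```go
package main

import "fmt"

// requiredCapacity returns the capacity needed to serve `target`
// ops/sec while staying at or below `utilization` (0 < utilization <= 1).
func requiredCapacity(target, utilization float64) float64 {
	return target / utilization
}

func main() {
	// 10,000 ops/sec at 70% utilization → 14,286 ops/sec of capacity.
	fmt.Printf("Required capacity: %.0f ops/sec\n", requiredCapacity(10000, 0.70))
}
```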

Tuning by Workload

type Config struct {
    CacheSize    int64
    MemtableSize int64
    ShardCount   int
    WalSyncFreq  int // sync WAL every N writes
}

// Read-heavy workload: large cache, many shards, infrequent WAL sync
var readHeavy = Config{
    CacheSize:   512 << 20, // 512MB
    ShardCount:  runtime.NumCPU() * 4,
    WalSyncFreq: 1000,
}

// Write-heavy workload: large memtable, moderate sharding, frequent WAL sync
var writeHeavy = Config{
    MemtableSize: 256 << 20, // 256MB
    ShardCount:   runtime.NumCPU() * 2,
    WalSyncFreq:  100,
}

Lab 12.1: Performance Testing

Objective

Build a comprehensive performance testing suite with profiling, load testing, and capacity planning tools.

Requirements

  • CPU and Memory Profiling: Identify bottlenecks with pprof
  • Load Testing: Multi-threaded stress testing framework
  • Memory Optimization: Object pooling and allocation reduction
  • Capacity Planning: Throughput and latency analysis
  • Benchmark Suite: Automated performance regression testing
  • Performance Monitoring: Real-time metrics during testing

Starter Code

type PerformanceTest struct {
    name   string
    store  Store
    result *TestResult
}

type TestResult struct {
    Operations    int64
    Errors        int64
    Throughput    float64
    AvgLatency    time.Duration
    P99Latency    time.Duration
}

func (pt *PerformanceTest) Run(concurrency int, duration time.Duration) *TestResult {
    // Run load test
    // Collect metrics
    // Return results
    return pt.result
}

// TODO: Implement profiling
func (pt *PerformanceTest) StartProfiling() {
    // Start CPU and memory profiling
}

// TODO: Implement load testing
func (pt *PerformanceTest) RunLoadTest(concurrency int, duration time.Duration) {
    // Multi-threaded load test
}

// TODO: Implement capacity planning
func (pt *PerformanceTest) AnalyzeCapacity() *CapacityReport {
    // Analyze throughput, latency, and capacity
    return &CapacityReport{}
}

Test Template

func TestPerformance(t *testing.T) {
    store := NewStore()
    pt := &PerformanceTest{
        name:  "GetOperation",
        store: store,
    }
    
    // Run performance test
    result := pt.Run(100, 30*time.Second)
    
    // Verify performance targets
    assert.True(t, result.Throughput > 10000, "Throughput too low")
    assert.True(t, result.AvgLatency < 1*time.Millisecond, "Latency too high")
    assert.True(t, result.P99Latency < 10*time.Millisecond, "P99 latency too high")
    assert.True(t, result.Errors == 0, "Should have no errors")
}

func BenchmarkGet(b *testing.B) {
    store := NewStore()
    store.Put("key", []byte("value"))
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        store.Get("key")
    }
}

func TestMemoryUsage(t *testing.T) {
    // Test memory allocation patterns
    // Verify object pooling is working
    // Check for memory leaks
}

Acceptance Criteria

  • ✅ CPU profiling identifies bottlenecks
  • ✅ Memory profiling shows optimization opportunities
  • ✅ Load testing supports 10,000+ ops/sec
  • ✅ Object pooling reduces allocations by 50%
  • ✅ Capacity planning provides accurate estimates
  • ✅ Benchmark suite catches performance regressions
  • ✅ Real-time monitoring during testing
  • ✅ > 90% code coverage
  • ✅ All tests pass

Summary: Week 12 Complete

By completing Week 12, you've learned and implemented:

1. Bottleneck Identification

  • CPU profiling with pprof
  • Memory profiling for leaks
  • Lock contention analysis
  • Hot path optimization

2. Memory Optimization

  • Object pooling with sync.Pool
  • Allocation reduction techniques
  • Memory reuse patterns
  • Garbage collection optimization

3. Load Testing

  • Go benchmarking framework
  • Multi-threaded stress testing
  • Throughput and latency measurement
  • Error rate monitoring

4. Capacity Planning

  • Headroom calculation
  • Workload-specific tuning
  • Resource utilization analysis
  • SLA compliance planning

Key Skills Mastered:

  • ✅ Profile and identify performance bottlenecks
  • ✅ Optimize memory usage and reduce allocations
  • ✅ Design comprehensive load testing strategies
  • ✅ Plan capacity for different workloads
  • ✅ Build performance regression testing
  • ✅ Production-ready performance optimization

Ready for Week 13?

Next week we'll focus on configuration management, deployment strategies, and production operations.

Continue to Week 13: Configuration and Deployment →