Performance Optimization and Tuning

Learning Objectives

  • Identify performance bottlenecks using profiling tools
  • Optimize memory usage with object pooling and allocation reduction
  • Design comprehensive load testing strategies
  • Plan capacity and tune systems for different workloads
  • Master Go performance optimization techniques
  • Build production-ready performance monitoring

Lesson 12.1: Identifying Bottlenecks

Using CPU Profiler

$ go tool pprof cpu.prof

(pprof) top -cum
Showing top functions by cumulative time:
  40%  compaction
  25%  lock_acquire
  20%  serialization
  15%  memcpy

Interpretation

  • compaction uses 40% of CPU time → the biggest win; focus optimization there first
  • lock_acquire uses 25% of CPU time → check for lock contention and oversized critical sections
  • serialization uses 20% of CPU time → optimize data encoding or switch formats
  • memcpy uses 15% of CPU time → lowest priority; revisit only after the items above

Lock Contention

❌ BAD: Lock held too long

func (s *Store) Get(key string) interface{} {
    s.mu.Lock()
    defer s.mu.Unlock()
    
    val := s.data[key]               // Fast: ~1µs
    processed := expensiveWork(val)  // Slow: ~100µs, done while holding the lock
    return processed
}

✅ GOOD: Minimize critical section

func (s *Store) Get(key string) interface{} {
    s.mu.Lock()
    val := s.data[key]
    s.mu.Unlock()
    
    // Do expensive work OUTSIDE the lock
    return expensiveWork(val)
}
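For read-heavy stores, a complementary fix is sync.RWMutex, which lets many readers hold the lock at once while writers still get exclusive access. A minimal, self-contained sketch (this Store is illustrative, not the lesson's full implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// Store guards its map with sync.RWMutex so that concurrent
// readers share the lock instead of queueing behind each other.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewStore() *Store {
	return &Store{data: make(map[string]string)}
}

func (s *Store) Put(key, val string) {
	s.mu.Lock() // writers still need exclusive access
	defer s.mu.Unlock()
	s.data[key] = val
}

func (s *Store) Get(key string) (string, bool) {
	s.mu.RLock() // shared lock: many Gets can hold it at once
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

func main() {
	s := NewStore()
	s.Put("k", "v")
	v, ok := s.Get("k")
	fmt.Println(v, ok)
}
```

RWMutex only helps when reads heavily outnumber writes; under write-heavy load it behaves like a plain Mutex with extra bookkeeping.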

Lesson 12.2: Memory Optimization

Object Pooling

import "sync"

var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

// Get from pool
buf := bufferPool.Get().([]byte)
defer bufferPool.Put(buf)

// Use buffer
// ... work with buf ...

// Result: Reuse instead of allocate
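One subtlety: putting a plain []byte into a sync.Pool boxes the slice header into an interface{}, which itself allocates. Storing a pointer to the slice avoids that. A runnable sketch (the process function and its buffer use are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// Pooling *[]byte instead of []byte avoids an extra allocation
// when the value is boxed into an interface on Put.
var bufferPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// process copies data into a pooled buffer and returns the length.
func process(data []byte) int {
	bp := bufferPool.Get().(*[]byte)
	defer bufferPool.Put(bp)

	buf := (*bp)[:0] // reuse capacity, reset length
	buf = append(buf, data...)
	return len(buf)
}

func main() {
	fmt.Println(process([]byte("hello")))
}
```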

Reduce Allocations

❌ BAD

func formatKey(key []byte) string {
    return "prefix:" + string(key)  // Allocation every time
}

✅ GOOD

func formatKey(key []byte) []byte {
    const prefix = "prefix:"
    result := make([]byte, len(prefix)+len(key))
    n := copy(result, prefix)
    copy(result[n:], key)
    return result  // Single allocation
}
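The difference can be measured without a full benchmark run using testing.AllocsPerRun. In this sketch the key is longer than the runtime's small conversion buffer so the string version's intermediate conversion must allocate; exact counts for the string version can vary with compiler optimizations, so the output is printed rather than assumed:

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks keep the compiler from optimizing the calls away.
var (
	sinkS string
	sinkB []byte
)

func formatKeyString(key []byte) string {
	return "prefix:" + string(key) // converts, then concatenates
}

func formatKeyBytes(key []byte) []byte {
	const prefix = "prefix:"
	result := make([]byte, len(prefix)+len(key))
	n := copy(result, prefix)
	copy(result[n:], key)
	return result
}

func main() {
	key := make([]byte, 64)

	a1 := testing.AllocsPerRun(1000, func() { sinkS = formatKeyString(key) })
	a2 := testing.AllocsPerRun(1000, func() { sinkB = formatKeyBytes(key) })
	fmt.Printf("string version: %.0f allocs/op, byte version: %.0f allocs/op\n", a1, a2)
}
```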

Lesson 12.3: Load Testing

Benchmark in Go

func BenchmarkGet(b *testing.B) {
    store := NewStore()
    store.Put("key", []byte("value"))
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        store.Get("key")
    }
}

// Run: go test -bench=. -benchmem
// Output: 50000000    20 ns/op    0 B/op    0 allocs/op
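Benchmarks can also exercise concurrent access with b.RunParallel. The sketch below wraps it in testing.Benchmark so it runs as a plain program rather than under `go test`; the Store here is a minimal stand-in for the lesson's store:

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// Store is a minimal stand-in for the lesson's store.
type Store struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func NewStore() *Store { return &Store{data: make(map[string][]byte)} }

func (s *Store) Put(key string, val []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = val
}

func (s *Store) Get(key string) []byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.data[key]
}

func main() {
	store := NewStore()
	store.Put("key", []byte("value"))

	// testing.Benchmark runs a benchmark func outside `go test`.
	result := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() { // each worker goroutine pulls iterations from b.N
				store.Get("key")
			}
		})
	})
	fmt.Println(result)
}
```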

Load Test

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

type LoadTest struct {
    concurrency  int
    duration     time.Duration
    client       *Client
}

func (lt *LoadTest) Run() {
    deadline := time.Now().Add(lt.duration)
    var count, errors int64
    var wg sync.WaitGroup
    
    for i := 0; i < lt.concurrency; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for time.Now().Before(deadline) {
                _, err := lt.client.Get("key")
                atomic.AddInt64(&count, 1)  // shared counters need atomics
                if err != nil {
                    atomic.AddInt64(&errors, 1)
                }
            }
        }()
    }
    
    wg.Wait()  // wait for all workers before reporting
    fmt.Printf("Requests: %d, Errors: %d\n", count, errors)
}

Lesson 12.4: Capacity Planning

Headroom

Desired throughput:     10,000 ops/sec
Target utilization:     70%
Required capacity:      14,286 ops/sec

Why headroom?
- Traffic spikes
- Maintenance
- Graceful degradation
- SLA buffer
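The arithmetic above is just target throughput divided by target utilization. As a tiny helper (the requiredCapacity name is our own, not part of any library):

```go
package main

import "fmt"

// requiredCapacity returns the capacity needed to serve `target`
// ops/sec while staying at or below `utilization` (0 < utilization <= 1).
func requiredCapacity(target, utilization float64) float64 {
	return target / utilization
}

func main() {
	// 10,000 ops/sec at 70% utilization → 14,286 ops/sec of capacity.
	fmt.Printf("Required capacity: %.0f ops/sec\n", requiredCapacity(10000, 0.70))
}
```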

Tuning by Workload

type Config struct {
    CacheSize    int64
    MemtableSize int64
    ShardCount   int
    WalSyncFreq  int // sync WAL every N writes
}

// Read-heavy workload: large cache, many shards, infrequent WAL sync
var readHeavy = Config{
    CacheSize:   512 << 20, // 512MB
    ShardCount:  runtime.NumCPU() * 4,
    WalSyncFreq: 1000,
}

// Write-heavy workload: large memtable, moderate sharding, frequent WAL sync
var writeHeavy = Config{
    MemtableSize: 256 << 20, // 256MB
    ShardCount:   runtime.NumCPU() * 2,
    WalSyncFreq:  100,
}

Lab 12.1: Performance Testing

Objective

Build a comprehensive performance testing suite with profiling, load testing, and capacity planning tools.

Requirements

  • CPU and Memory Profiling: Identify bottlenecks with pprof
  • Load Testing: Multi-threaded stress testing framework
  • Memory Optimization: Object pooling and allocation reduction
  • Capacity Planning: Throughput and latency analysis
  • Benchmark Suite: Automated performance regression testing
  • Performance Monitoring: Real-time metrics during testing

Starter Code

type PerformanceTest struct {
    name   string
    store  Store
    result *TestResult
}

type TestResult struct {
    Operations    int64
    Errors        int64
    Throughput    float64
    AvgLatency    time.Duration
    P99Latency    time.Duration
}

func (pt *PerformanceTest) Run(concurrency int, duration time.Duration) *TestResult {
    // Run load test
    // Collect metrics
    // Return results
    return pt.result
}

// TODO: Implement profiling
func (pt *PerformanceTest) StartProfiling() {
    // Start CPU and memory profiling
}

// TODO: Implement load testing
func (pt *PerformanceTest) RunLoadTest(concurrency int, duration time.Duration) {
    // Multi-threaded load test
}

// TODO: Implement capacity planning
func (pt *PerformanceTest) AnalyzeCapacity() *CapacityReport {
    // Analyze throughput, latency, and capacity
    return &CapacityReport{}
}

Test Template

func TestPerformance(t *testing.T) {
    store := NewStore()
    pt := &PerformanceTest{
        name:  "GetOperation",
        store: store,
    }
    
    // Run performance test
    result := pt.Run(100, 30*time.Second)
    
    // Verify performance targets
    assert.True(t, result.Throughput > 10000, "Throughput too low")
    assert.True(t, result.AvgLatency < 1*time.Millisecond, "Latency too high")
    assert.True(t, result.P99Latency < 10*time.Millisecond, "P99 latency too high")
    assert.True(t, result.Errors == 0, "Should have no errors")
}

func BenchmarkGet(b *testing.B) {
    store := NewStore()
    store.Put("key", []byte("value"))
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        store.Get("key")
    }
}

func TestMemoryUsage(t *testing.T) {
    // Test memory allocation patterns
    // Verify object pooling is working
    // Check for memory leaks
}

Acceptance Criteria

  • ✅ CPU profiling identifies bottlenecks
  • ✅ Memory profiling shows optimization opportunities
  • ✅ Load testing supports 10,000+ ops/sec
  • ✅ Object pooling reduces allocations by 50%
  • ✅ Capacity planning provides accurate estimates
  • ✅ Benchmark suite catches performance regressions
  • ✅ Real-time monitoring during testing
  • ✅ > 90% code coverage
  • ✅ All tests pass

Summary: Week 12 Complete

By completing Week 12, you've learned and implemented:

1. Bottleneck Identification

  • CPU profiling with pprof
  • Memory profiling for leaks
  • Lock contention analysis
  • Hot path optimization

2. Memory Optimization

  • Object pooling with sync.Pool
  • Allocation reduction techniques
  • Memory reuse patterns
  • Garbage collection optimization

3. Load Testing

  • Go benchmarking framework
  • Multi-threaded stress testing
  • Throughput and latency measurement
  • Error rate monitoring

4. Capacity Planning

  • Headroom calculation
  • Workload-specific tuning
  • Resource utilization analysis
  • SLA compliance planning

Key Skills Mastered:

  • ✅ Profile and identify performance bottlenecks
  • ✅ Optimize memory usage and reduce allocations
  • ✅ Design comprehensive load testing strategies
  • ✅ Plan capacity for different workloads
  • ✅ Build performance regression testing
  • ✅ Production-ready performance optimization

Ready for Week 13?

Next week we'll focus on configuration management, deployment strategies, and production operations.

Continue to Week 13: Configuration and Deployment →