Learning Objectives

Identify performance bottlenecks using profiling tools
Optimize memory usage with object pooling and allocation reduction
Design comprehensive load testing strategies
Plan capacity and tune systems for different workloads
Master Go performance optimization techniques
Build production-ready performance monitoring

Lesson 12.1: Identifying Bottlenecks

Using CPU Profiler

$ go tool pprof cpu.prof

(pprof) top -cum
Showing top functions by cumulative time:
  40%  compaction
  25%  lock_acquire
  20%  serialization
  15%  memcpy

Interpretation

compaction is using 40% of CPU time → Focus optimization there first
lock_acquire is using 25% of CPU time → Check for lock contention
serialization is using 20% of CPU time → Optimize data encoding
memcpy is using 15% of CPU time → Lowest priority; if it matters, reduce copying in the callers rather than tuning memcpy itself

Lock Contention

❌ BAD: Lock held too long

func (s *Store) Get(key string) interface{} {
    s.mu.Lock()
    defer s.mu.Unlock()
    
    val := s.data[key]          // Fast: 1µs
    processed := expensiveWork(val)  // Slow: 100µs
    return processed
}

✅ GOOD: Minimize critical section

func (s *Store) Get(key string) interface{} {
    s.mu.Lock()
    val := s.data[key]
    s.mu.Unlock()
    
    // Do expensive work OUTSIDE lock
    return expensiveWork(val)
}
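When reads far outnumber writes, a further option is sync.RWMutex, which lets readers hold the lock concurrently. The sketch below assumes a simplified Store with string values and a placeholder expensiveWork; both are illustrative assumptions, not the course's actual types.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Store is a hypothetical minimal version of the store used in this lesson.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewStore() *Store {
	return &Store{data: make(map[string]string)}
}

// Put takes the exclusive lock because it mutates the map.
func (s *Store) Put(key, val string) {
	s.mu.Lock()
	s.data[key] = val
	s.mu.Unlock()
}

// expensiveWork stands in for any slow processing that needs no shared state.
func expensiveWork(val string) string {
	return strings.ToUpper(val)
}

// Get holds only a read lock, and only for the map lookup.
func (s *Store) Get(key string) string {
	s.mu.RLock()
	val := s.data[key]
	s.mu.RUnlock()

	return expensiveWork(val) // runs outside the lock
}

func main() {
	s := NewStore()
	s.Put("key", "value")

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.Get("key") // readers do not block each other
		}()
	}
	wg.Wait()
	fmt.Println(s.Get("key"))
}
```

An RWMutex has more bookkeeping than a plain Mutex, so it tends to pay off only when the read share is high; profiling both variants is the safest way to decide.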

Lesson 12.2: Memory Optimization

Object Pooling

import "sync"

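// Pool pointers to slices: storing a bare []byte in an interface{}
// allocates a new slice header on every Put.
var bufferPool = sync.Pool{
    New: func() interface{} {
        buf := make([]byte, 4096)
        return &buf
    },
}

// Get from pool
bufp := bufferPool.Get().(*[]byte)
defer bufferPool.Put(bufp)
buf := *bufp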

// Use buffer
// ... work with buf ...

// Result: Reuse instead of allocate
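A common variant of the same idea pools *bytes.Buffer values. The sketch below is illustrative only: render and bufPool are hypothetical names. It highlights the one step that is easy to forget, which is resetting a pooled object before use.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render is a hypothetical function that needs a scratch buffer per call.
func render(name string, n int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from its last user
	defer bufPool.Put(buf)

	fmt.Fprintf(buf, "%s=%d", name, n)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(render("hits", 42))
	fmt.Println(render("misses", 7))
}
```

The pool may drop idle objects at any garbage collection, so it suits short-lived scratch space rather than anything that must persist.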

Reduce Allocations

❌ BAD

func formatKey(key []byte) string {
    return "prefix:" + string(key)  // Allocates a new string on every call
}

✅ GOOD

func formatKey(key []byte) []byte {
    const prefix = "prefix:"
    result := make([]byte, len(prefix)+len(key))
    copy(result, prefix)  // copy accepts a string source directly
    copy(result[len(prefix):], key)
    return result  // Single allocation, and the key stays a []byte
}
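To go from one allocation per call to none on a hot path, the caller can own the buffer. This sketch uses the append-to-destination style found in the standard library (for example strconv.AppendInt); appendKey is a hypothetical name, not a function from the course code.

```go
package main

import "fmt"

// appendKey appends "prefix:" and key to dst and returns the extended slice.
// It allocates only when dst lacks the capacity to hold the result.
func appendKey(dst, key []byte) []byte {
	dst = append(dst, "prefix:"...)
	return append(dst, key...)
}

func main() {
	scratch := make([]byte, 0, 64) // allocated once, reused for every key
	for _, k := range []string{"alpha", "beta"} {
		scratch = appendKey(scratch[:0], []byte(k))
		fmt.Printf("%s\n", scratch)
	}
}
```

The trade-off is that the returned slice is only valid until the next call that reuses scratch, so callers that need to keep a key must copy it.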

Lesson 12.3: Load Testing

Benchmark in Go

func BenchmarkGet(b *testing.B) {
    store := NewStore()
    store.Put("key", []byte("value"))
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        store.Get("key")
    }
}

// Run: go test -bench=. -benchmem
// Example output: BenchmarkGet-8   50000000   20 ns/op   0 B/op   0 allocs/op
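The benchmark above exercises Get from a single goroutine, which hides lock contention. The following sketch measures the same call under parallel load with b.RunParallel. The store type is a hypothetical stand-in, and testing.Benchmark is used only so the example runs as a standalone program; inside a _test.go file the function would be named BenchmarkGetParallel and run with go test -bench.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// store is a hypothetical stand-in for the lesson's Store.
type store struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func (s *store) Get(key string) []byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.data[key]
}

// benchmarkGetParallel spreads b.N iterations across GOMAXPROCS goroutines.
func benchmarkGetParallel(b *testing.B) {
	s := &store{data: map[string][]byte{"key": []byte("value")}}
	b.ReportAllocs()
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			s.Get("key")
		}
	})
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(benchmarkGetParallel)
	fmt.Println(res.String(), res.MemString())
}
```

Comparing the ns/op of the serial and parallel versions gives a rough indication of how much contention the lock adds.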

Load Test

type LoadTest struct {
    concurrency  int
    duration     time.Duration
    client       *Client
}

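// Requires "sync" and "sync/atomic" in addition to "fmt" and "time".
func (lt *LoadTest) Run() {
    deadline := time.Now().Add(lt.duration)
    var count, errors int64 // shared by all workers, so updated atomically
    var wg sync.WaitGroup

    for i := 0; i < lt.concurrency; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for time.Now().Before(deadline) {
                _, err := lt.client.Get("key")
                atomic.AddInt64(&count, 1)
                if err != nil {
                    atomic.AddInt64(&errors, 1)
                }
            }
        }()
    }

    // Wait for every worker; otherwise the totals print before any work is done
    wg.Wait()
    fmt.Printf("Requests: %d, Errors: %d\n", count, errors)
}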

Lesson 12.4: Capacity Planning

Headroom

Desired throughput:     10,000 ops/sec
Target utilization:     70%
Required capacity:      14,286 ops/sec

Why headroom?
- Traffic spikes
- Maintenance
- Graceful degradation
- SLA buffer
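The required capacity in the table is the desired throughput divided by the target utilization: 10,000 / 0.70 ≈ 14,286 ops/sec. A small sketch of that arithmetic follows; requiredCapacity is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math"
)

// requiredCapacity returns the capacity to provision so that the desired
// throughput consumes only targetUtilization (0 < u <= 1) of it.
// The result is rounded up because partial capacity cannot be provisioned.
func requiredCapacity(desiredOpsPerSec, targetUtilization float64) int {
	return int(math.Ceil(desiredOpsPerSec / targetUtilization))
}

func main() {
	fmt.Println(requiredCapacity(10000, 0.70)) // 14286
}
```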

Tuning by Workload

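// Requires: import "runtime"
type Config struct {
    CacheSize     int64
    MemtableSize  int64
    ShardCount    int
    WalSyncFreq   int
}

// Read-heavy workload
var readHeavy = Config{
    CacheSize:    512 << 20,             // Large: 512MB
    ShardCount:   runtime.NumCPU() * 4,  // High: cores × 4
    WalSyncFreq:  1000,                  // Low: every 1000 writes
}

// Write-heavy workload
var writeHeavy = Config{
    MemtableSize: 256 << 20,             // Large: 256MB
    ShardCount:   runtime.NumCPU() * 2,  // Medium: cores × 2
    WalSyncFreq:  100,                   // High: every 100 writes
}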

Lab 12.1: Performance Testing

Objective

Build a comprehensive performance testing suite with profiling, load testing, and capacity planning tools.

Requirements

CPU and Memory Profiling: Identify bottlenecks with pprof
Load Testing: Multi-threaded stress testing framework
Memory Optimization: Object pooling and allocation reduction
Capacity Planning: Throughput and latency analysis
Benchmark Suite: Automated performance regression testing
Performance Monitoring: Real-time metrics during testing

Starter Code

type PerformanceTest struct {
    name   string
    store  *Store
    result *TestResult
}

type TestResult struct {
    Operations    int64
    Errors        int64
    Throughput    float64
    AvgLatency    time.Duration
    P99Latency    time.Duration
}

func (pt *PerformanceTest) Run(concurrency int, duration time.Duration) *TestResult {
    // Run load test
    // Collect metrics
    // Return results
    return pt.result
}

// TODO: Implement profiling
func (pt *PerformanceTest) StartProfiling() {
    // Start CPU and memory profiling
}

// TODO: Implement load testing
func (pt *PerformanceTest) RunLoadTest(concurrency int, duration time.Duration) {
    // Multi-threaded load test
}

// TODO: Implement capacity planning
func (pt *PerformanceTest) AnalyzeCapacity() *CapacityReport {
    // Analyze throughput, latency, and capacity
    return &CapacityReport{}
}

Test Template

func TestPerformance(t *testing.T) {
    store := NewStore()
    pt := &PerformanceTest{
        name:  "GetOperation",
        store: store,
    }
    
    // Run performance test
    result := pt.Run(100, 30*time.Second)
    
    // Verify performance targets
    assert.True(t, result.Throughput > 10000, "Throughput too low")
    assert.True(t, result.AvgLatency < 1*time.Millisecond, "Latency too high")
    assert.True(t, result.P99Latency < 10*time.Millisecond, "P99 latency too high")
    assert.True(t, result.Errors == 0, "Should have no errors")
}

func BenchmarkGet(b *testing.B) {
    store := NewStore()
    store.Put("key", []byte("value"))
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        store.Get("key")
    }
}

func TestMemoryUsage(t *testing.T) {
    // Test memory allocation patterns
    // Verify object pooling is working
    // Check for memory leaks
}
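One possible starting point for TestMemoryUsage is testing.AllocsPerRun, which reports the average number of heap allocations per call. The sketch below compares a pooled and an unpooled code path; pooled, unpooled, and sink are hypothetical names, and the program form is used only so the example runs on its own. Inside a test, the final check would call t.Errorf instead of printing.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

var pool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 4096)
		return &b
	},
}

// sink is package-level so the compiler cannot optimize the allocation away.
var sink []byte

// unpooled allocates a fresh buffer on every call.
func unpooled() {
	sink = make([]byte, 4096)
}

// pooled borrows a buffer, touches it, and returns it for reuse.
func pooled() {
	bp := pool.Get().(*[]byte)
	(*bp)[0] = 1
	pool.Put(bp)
}

func main() {
	before := testing.AllocsPerRun(1000, unpooled)
	after := testing.AllocsPerRun(1000, pooled)
	fmt.Printf("allocs/op: unpooled=%.0f pooled=%.0f\n", before, after)

	// Mirrors the lab's target of at least a 50% reduction.
	if after > before/2 {
		fmt.Println("FAIL: pooling did not halve allocations")
	}
}
```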

Acceptance Criteria

✅ CPU profiling identifies bottlenecks
✅ Memory profiling shows optimization opportunities
✅ Load testing supports 10,000+ ops/sec
✅ Object pooling reduces allocations by 50%
✅ Capacity planning provides accurate estimates
✅ Benchmark suite catches performance regressions
✅ Real-time monitoring during testing
✅ > 90% code coverage
✅ All tests pass

Summary: Week 12 Complete

By completing Week 12, you’ve learned and implemented:

1. Bottleneck Identification

CPU profiling with pprof
Memory profiling for leaks
Lock contention analysis
Hot path optimization

2. Memory Optimization

Object pooling with sync.Pool
Allocation reduction techniques
Memory reuse patterns
Garbage collection optimization

3. Load Testing

Go benchmarking framework
Multi-threaded stress testing
Throughput and latency measurement
Error rate monitoring

4. Capacity Planning

Headroom calculation
Workload-specific tuning
Resource utilization analysis
SLA compliance planning

Key Skills Mastered:

✅ Profile and identify performance bottlenecks
✅ Optimize memory usage and reduce allocations
✅ Design comprehensive load testing strategies
✅ Plan capacity for different workloads
✅ Build performance regression testing
✅ Apply production-ready performance optimization

Ready for Week 13?

Next week we’ll focus on configuration management, deployment strategies, and production operations.

Continue to Week 13: Configuration and Deployment →
