Lesson 12
Performance tuning
Profiling, benchmarks, and real bottlenecks.
Learning Objectives
- Identify performance bottlenecks using profiling tools
- Optimize memory usage with object pooling and allocation reduction
- Design comprehensive load testing strategies
- Plan capacity and tune systems for different workloads
- Master Go performance optimization techniques
- Build production-ready performance monitoring
Lesson 12.1: Identifying Bottlenecks
Using CPU Profiler
$ go tool pprof cpu.prof
(pprof) top -cum
Showing top functions by cumulative time:
40% compaction
25% lock_acquire
20% serialization
15% memcpy
Interpretation
- compaction accounts for 40% of CPU time → focus optimization there first
- lock_acquire accounts for 25% of CPU time → check for lock contention
- serialization accounts for 20% of CPU time → optimize data encoding
- memcpy accounts for 15% of CPU time → lowest priority here; copying is driven by its callers, so revisit it only after the larger items
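The session above assumes a cpu.prof file already exists. One way to produce it is with the standard runtime/pprof package, as in the minimal sketch below. The busyWork function and the profileCPU helper are hypothetical names introduced for illustration; in a real system the profiled function would be your actual workload.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// busyWork is a hypothetical stand-in for the workload being profiled.
func busyWork(n int) uint64 {
	var sum uint64
	for i := 0; i < n; i++ {
		sum += uint64(i) * uint64(i) % 7
	}
	return sum
}

// profileCPU (hypothetical helper) runs work while recording a CPU profile to path.
func profileCPU(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	// Deferred calls run in reverse order, so the profile is stopped
	// (and flushed) before the file is closed.
	defer pprof.StopCPUProfile()
	work()
	return nil
}

func main() {
	err := profileCPU("cpu.prof", func() { busyWork(50_000_000) })
	if err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	fmt.Println("wrote cpu.prof; inspect it with: go tool pprof cpu.prof")
}
```

For benchmarks, `go test -bench=. -cpuprofile cpu.prof` writes the same kind of file without any code changes.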
Lock Contention
❌ BAD: Lock held too long
func (s *Store) Get(key string) interface{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	val := s.data[key]              // Fast: 1µs
	processed := expensiveWork(val) // Slow: 100µs
	return processed
}
✅ GOOD: Minimize critical section
func (s *Store) Get(key string) interface{} {
	s.mu.Lock()
	val := s.data[key]
	s.mu.Unlock()
	// Do expensive work OUTSIDE the lock. This is safe only if val is not
	// mutated by other goroutines after it has been read.
	return expensiveWork(val)
}
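If reads dominate, a further step beyond what the lesson shows is to swap the mutex for sync.RWMutex so that readers do not block each other. The sketch below is one possible shape under that assumption. The Store layout, the []byte values, and expensiveWork are all hypothetical stand-ins, not the lesson's actual types.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Store is a hypothetical map-backed store guarded by a read/write lock.
type Store struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func NewStore() *Store { return &Store{data: make(map[string][]byte)} }

// Put takes the exclusive lock because it modifies the map.
func (s *Store) Put(key string, val []byte) {
	s.mu.Lock()
	s.data[key] = val
	s.mu.Unlock()
}

// Get holds the shared read lock only for the map lookup and does the
// slow work after releasing it.
func (s *Store) Get(key string) []byte {
	s.mu.RLock()
	val := s.data[key]
	s.mu.RUnlock()
	return expensiveWork(val)
}

// expensiveWork is a hypothetical placeholder for slow post-processing.
// bytes.ToUpper returns a copy, so the stored slice is never modified.
func expensiveWork(v []byte) []byte { return bytes.ToUpper(v) }

func main() {
	s := NewStore()
	s.Put("key", []byte("value"))

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				_ = s.Get("key")
			}
		}()
	}
	wg.Wait()
	fmt.Printf("%s\n", s.Get("key"))
}
```

Whether RWMutex actually helps depends on the read/write ratio, so it is worth confirming with a benchmark before adopting it.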
Lesson 12.2: Memory Optimization
Object Pooling
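import "sync"
var bufferPool = sync.Pool{
	New: func() interface{} {
		return make([]byte, 4096)
	},
}
// process is a placeholder for any function that needs a scratch buffer.
func process() {
	// Get from pool, and return the buffer when done
	buf := bufferPool.Get().([]byte)
	defer bufferPool.Put(buf)
	// Use buffer
	// ... work with buf ...
}
// Result: buffers are reused instead of allocated on every call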
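One detail worth knowing about the pattern above: putting a plain []byte into a sync.Pool converts the slice header to an interface value, which itself allocates. A common workaround is to pool a pointer to the slice instead. The sketch below illustrates this; the checksum function is a hypothetical workload chosen only to give the buffer something to do.

```go
package main

import (
	"fmt"
	"sync"
)

// bufferPool stores *[]byte rather than []byte so that Get and Put move
// a single pointer and do not allocate a new interface value each time.
var bufferPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 4096)
		return &b
	},
}

// checksum is a hypothetical workload that needs scratch space. It copies
// at most len(buffer) bytes of data into a pooled buffer and sums them.
func checksum(data []byte) uint32 {
	bp := bufferPool.Get().(*[]byte)
	defer bufferPool.Put(bp)

	n := copy(*bp, data)
	var sum uint32
	for _, c := range (*bp)[:n] {
		sum += uint32(c)
	}
	return sum
}

func main() {
	fmt.Println(checksum([]byte("abc")))
}
```

Pooled buffers keep whatever the previous user wrote, so code should only read the portion it has just filled, as the `[:n]` slice does here.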
Reduce Allocations
❌ BAD
func formatKey(key []byte) string {
	return "prefix:" + string(key) // Allocates a new string on every call
}
✅ GOOD
func formatKey(key []byte) []byte {
	const prefix = "prefix:"
	result := make([]byte, len(prefix)+len(key))
	copy(result, prefix) // copy accepts a string source directly
	copy(result[len(prefix):], key)
	return result // Single allocation, and the key stays a []byte
}
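To go from one allocation per call toward none, a caller can supply the destination buffer and reuse it across calls. The sketch below shows that append-style variant and measures it with testing.AllocsPerRun from the standard library. The appendKey name and the 64-byte initial capacity are assumptions made for this example.

```go
package main

import (
	"fmt"
	"testing"
)

const prefix = "prefix:"

// appendKey (hypothetical) appends prefix+key to dst and returns the
// extended slice. It allocates only if dst lacks capacity.
func appendKey(dst, key []byte) []byte {
	dst = append(dst, prefix...)
	return append(dst, key...)
}

func main() {
	key := []byte("user42")
	buf := make([]byte, 0, 64) // reused across calls

	// AllocsPerRun reports the average number of heap allocations per call.
	allocs := testing.AllocsPerRun(1000, func() {
		buf = appendKey(buf[:0], key)
	})
	fmt.Printf("key=%s allocs/op=%v\n", buf, allocs)
}
```

The trade-off is that the returned slice aliases the caller's buffer, so it must be consumed or copied before the buffer is reused.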
Lesson 12.3: Load Testing
Benchmark in Go
func BenchmarkGet(b *testing.B) {
	store := NewStore()
	store.Put("key", []byte("value"))
	b.ResetTimer() // exclude setup from the measurement
	for i := 0; i < b.N; i++ {
		store.Get("key")
	}
}
// Run: go test -bench=. -benchmem
// Example output (illustrative numbers):
// BenchmarkGet-8   50000000   20 ns/op   0 B/op   0 allocs/op
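A single-goroutine benchmark will not reveal lock contention. The testing package's b.RunParallel runs the loop body from multiple goroutines, which is closer to the load a shared store sees. The sketch below uses a hypothetical RWMutex-backed Store and drives the benchmark through testing.Benchmark so it runs as an ordinary program. The runBenchmark helper is an illustrative name, not part of the lesson's code.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// Store is a hypothetical stand-in for the lesson's store.
type Store struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func NewStore() *Store { return &Store{data: make(map[string][]byte)} }

func (s *Store) Put(key string, val []byte) {
	s.mu.Lock()
	s.data[key] = val
	s.mu.Unlock()
}

func (s *Store) Get(key string) []byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.data[key]
}

// benchmarkGetParallel calls Get from several goroutines at once.
func benchmarkGetParallel(b *testing.B) {
	store := NewStore()
	store.Put("key", []byte("value"))
	b.ReportAllocs()
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			store.Get("key")
		}
	})
}

// runBenchmark (hypothetical helper) runs the benchmark without `go test`
// and returns the iteration count and nanoseconds per operation.
func runBenchmark() (int, int64) {
	res := testing.Benchmark(benchmarkGetParallel)
	return res.N, res.NsPerOp()
}

func main() {
	n, ns := runBenchmark()
	fmt.Printf("%d iterations, %d ns/op\n", n, ns)
}
```

In a normal test file the same body would be named BenchmarkGetParallel and run with `go test -bench=. -cpu 1,4,8` to compare scaling across core counts.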
Load Test
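import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)
type LoadTest struct {
	concurrency int
	duration    time.Duration
	client      *Client
}
func (lt *LoadTest) Run() {
	deadline := time.Now().Add(lt.duration)
	// Counters are shared by all workers, so they must be updated atomically.
	var count, errs atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < lt.concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for time.Now().Before(deadline) {
				_, err := lt.client.Get("key")
				count.Add(1)
				if err != nil {
					errs.Add(1)
				}
			}
		}()
	}
	// Wait for every worker; otherwise the totals print before any work is done.
	wg.Wait()
	fmt.Printf("Requests: %d, Errors: %d\n", count.Load(), errs.Load())
}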
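Request counts alone say nothing about tail latency. The sketch below extends the idea by timing each call and computing percentiles with the nearest-rank method. It is a minimal illustration: runLoad, percentile, and the sleeping op in main are hypothetical, and a real test would call the client instead.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"sync"
	"time"
)

// percentile returns the p-th percentile (0 < p <= 100) of an ascending
// slice using the nearest-rank method.
func percentile(sorted []time.Duration, p float64) time.Duration {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// runLoad calls op from `workers` goroutines until d has elapsed. It
// returns every observed latency in ascending order plus the error count.
func runLoad(workers int, d time.Duration, op func() error) ([]time.Duration, int) {
	deadline := time.Now().Add(d)
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		lat  []time.Duration
		errs int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Collect locally to keep the shared lock off the hot path.
			var local []time.Duration
			localErrs := 0
			for time.Now().Before(deadline) {
				start := time.Now()
				if err := op(); err != nil {
					localErrs++
				}
				local = append(local, time.Since(start))
			}
			mu.Lock()
			lat = append(lat, local...)
			errs += localErrs
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Slice(lat, func(i, j int) bool { return lat[i] < lat[j] })
	return lat, errs
}

func main() {
	// Hypothetical operation: a real test would call client.Get here.
	op := func() error { time.Sleep(100 * time.Microsecond); return nil }
	d := 200 * time.Millisecond
	lat, errs := runLoad(4, d, op)
	fmt.Printf("ops=%d errors=%d throughput=%.0f ops/sec p50=%v p99=%v\n",
		len(lat), errs, float64(len(lat))/d.Seconds(),
		percentile(lat, 50), percentile(lat, 99))
}
```

Storing every sample is fine for short runs; for long tests a histogram with fixed buckets keeps memory bounded.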
Lesson 12.4: Capacity Planning
Headroom
Desired throughput: 10,000 ops/sec
Target utilization: 70%
Required capacity: 10,000 / 0.70 ≈ 14,286 ops/sec
Why headroom?
- Traffic spikes
- Maintenance
- Graceful degradation
- SLA buffer
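The headroom figure above comes from dividing the desired throughput by the target utilization. A small helper makes that explicit and easy to rerun for other targets. The requiredCapacity name is an assumption for this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// requiredCapacity (hypothetical helper) returns the throughput to
// provision so that desiredOps lands at targetUtilization of capacity.
// The result is rounded up to a whole operation per second.
func requiredCapacity(desiredOps, targetUtilization float64) float64 {
	return math.Ceil(desiredOps / targetUtilization)
}

func main() {
	fmt.Printf("%.0f ops/sec\n", requiredCapacity(10000, 0.70)) // prints "14286 ops/sec"
}
```

Lowering the target utilization to 50% doubles the requirement to 20,000 ops/sec, which shows how quickly headroom drives cost.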
Tuning by Workload
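import "runtime"
type Config struct {
	CacheSize    int64 // bytes
	MemtableSize int64 // bytes
	ShardCount   int
	WalSyncFreq  int // sync the WAL every N writes
}
// Read-heavy workload
var readHeavy = Config{
	CacheSize:   512 << 20,            // Large: 512MB
	ShardCount:  runtime.NumCPU() * 4, // High: cores × 4
	WalSyncFreq: 1000,                 // Low: every 1000 writes
}
// Write-heavy workload
var writeHeavy = Config{
	MemtableSize: 256 << 20,            // Large: 256MB
	ShardCount:   runtime.NumCPU() * 2, // Medium: cores × 2
	WalSyncFreq:  100,                  // High: every 100 writes
}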
Lab 12.1: Performance Testing
Objective
Build a comprehensive performance testing suite with profiling, load testing, and capacity planning tools.
Requirements
- CPU and Memory Profiling: Identify bottlenecks with pprof
- Load Testing: Multi-threaded stress testing framework
- Memory Optimization: Object pooling and allocation reduction
- Capacity Planning: Throughput and latency analysis
- Benchmark Suite: Automated performance regression testing
- Performance Monitoring: Real-time metrics during testing
Starter Code
type PerformanceTest struct {
	name   string
	store  Store
	result *TestResult
}
type TestResult struct {
	Operations int64
	Errors     int64
	Throughput float64
	AvgLatency time.Duration
	P99Latency time.Duration
}
func (pt *PerformanceTest) Run(concurrency int, duration time.Duration) *TestResult {
	// Run load test
	// Collect metrics
	// Return results
	return pt.result
}
// TODO: Implement profiling
func (pt *PerformanceTest) StartProfiling() {
	// Start CPU and memory profiling
}
// TODO: Implement load testing
func (pt *PerformanceTest) RunLoadTest(concurrency int, duration time.Duration) {
	// Multi-threaded load test
}
// TODO: Implement capacity planning
func (pt *PerformanceTest) AnalyzeCapacity() *CapacityReport {
	// Analyze throughput, latency, and capacity
	return &CapacityReport{}
}
Test Template
func TestPerformance(t *testing.T) {
	store := NewStore()
	pt := &PerformanceTest{
		name:  "GetOperation",
		store: store,
	}
	// Run performance test
	result := pt.Run(100, 30*time.Second)
	// Verify performance targets
	assert.True(t, result.Throughput > 10000, "Throughput too low")
	assert.True(t, result.AvgLatency < 1*time.Millisecond, "Latency too high")
	assert.True(t, result.P99Latency < 10*time.Millisecond, "P99 latency too high")
	assert.True(t, result.Errors == 0, "Should have no errors")
}
func BenchmarkGet(b *testing.B) {
	store := NewStore()
	store.Put("key", []byte("value"))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		store.Get("key")
	}
}
func TestMemoryUsage(t *testing.T) {
	// Test memory allocation patterns
	// Verify object pooling is working
	// Check for memory leaks
}
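For the leak check in TestMemoryUsage, one possible approach is to compare the live heap before and after a batch of operations, forcing a garbage collection at both points. The sketch below is a rough illustration under that assumption. The liveHeap and heapGrowth helpers and the retained slice are hypothetical, and any threshold used in a real test needs tuning for the workload.

```go
package main

import (
	"fmt"
	"runtime"
)

// liveHeap forces a garbage collection and returns the bytes of heap
// objects still allocated afterwards.
func liveHeap() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// heapGrowth runs op n times and reports how many more bytes of heap
// survive a GC than before. Growth that scales with n suggests a leak.
func heapGrowth(n int, op func()) int64 {
	before := liveHeap()
	for i := 0; i < n; i++ {
		op()
	}
	return int64(liveHeap()) - int64(before)
}

// retained simulates a leak: a package-level slice keeps every buffer alive.
var retained [][]byte

func main() {
	clean := heapGrowth(1000, func() { _ = make([]byte, 1024) })
	leaky := heapGrowth(1000, func() {
		retained = append(retained, make([]byte, 1024))
	})
	fmt.Printf("clean growth: %d bytes, leaky growth: %d bytes\n", clean, leaky)
}
```

Heap numbers fluctuate between runs, so a test built on this should assert against a generous bound rather than an exact value.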
Acceptance Criteria
- ✅ CPU profiling identifies bottlenecks
- ✅ Memory profiling shows optimization opportunities
- ✅ Load testing supports 10,000+ ops/sec
- ✅ Object pooling reduces allocations by 50%
- ✅ Capacity planning provides accurate estimates
- ✅ Benchmark suite catches performance regressions
- ✅ Real-time monitoring during testing
- ✅ > 90% code coverage
- ✅ All tests pass
Summary: Week 12 Complete
By completing Week 12, you’ve learned and implemented:
1. Bottleneck Identification
- CPU profiling with pprof
- Memory profiling for leaks
- Lock contention analysis
- Hot path optimization
2. Memory Optimization
- Object pooling with sync.Pool
- Allocation reduction techniques
- Memory reuse patterns
- Garbage collection optimization
3. Load Testing
- Go benchmarking framework
- Multi-threaded stress testing
- Throughput and latency measurement
- Error rate monitoring
4. Capacity Planning
- Headroom calculation
- Workload-specific tuning
- Resource utilization analysis
- SLA compliance planning
Key Skills Mastered:
- ✅ Profile and identify performance bottlenecks
- ✅ Optimize memory usage and reduce allocations
- ✅ Design comprehensive load testing strategies
- ✅ Plan capacity for different workloads
- ✅ Build performance regression testing
- ✅ Production-ready performance optimization
Ready for Week 13?
Next week we’ll focus on configuration management, deployment strategies, and production operations.