Performance Optimization and Tuning
Learning Objectives
- Identify performance bottlenecks using profiling tools
- Optimize memory usage with object pooling and allocation reduction
- Design comprehensive load testing strategies
- Plan capacity and tune systems for different workloads
- Master Go performance optimization techniques
- Build production-ready performance monitoring
Lesson 12.1: Identifying Bottlenecks
Using CPU Profiler
$ go tool pprof cpu.prof
(pprof) top -cum
Showing top functions by cumulative time:
40% compaction
25% lock_acquire
20% serialization
15% memcpy
Interpretation
- compaction uses 40% of CPU time → focus optimization there first
- lock_acquire uses 25% of CPU time → check for lock contention
- serialization uses 20% of CPU time → optimize data encoding
- memcpy uses 15% of CPU time → lowest priority; revisit only after the items above
Lock Contention
❌ BAD: Lock held too long
func (s *Store) Get(key string) interface{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	val := s.data[key]              // Fast: ~1µs
	processed := expensiveWork(val) // Slow: ~100µs, all spent holding the lock
	return processed
}
✅ GOOD: Minimize critical section
func (s *Store) Get(key string) interface{} {
	s.mu.Lock()
	val := s.data[key]
	s.mu.Unlock()
	// Do expensive work OUTSIDE the lock
	return expensiveWork(val)
}
For read-heavy stores, sync.RWMutex goes further: RLock allows many concurrent readers while writers still take the lock exclusively.
Lesson 12.2: Memory Optimization
Object Pooling
import "sync"

var bufferPool = sync.Pool{
	New: func() interface{} {
		return make([]byte, 4096)
	},
}

// Get a buffer from the pool (and return it when done)
buf := bufferPool.Get().([]byte)
defer bufferPool.Put(buf)

// ... work with buf ...

// Result: buffers are reused instead of allocated on every call
Reduce Allocations
❌ BAD
func formatKey(key []byte) string {
	return "prefix:" + string(key) // allocates on every call
}
✅ GOOD
func formatKey(key []byte) []byte {
	result := make([]byte, 7+len(key))
	copy(result, "prefix:") // copy accepts a string directly: no conversion allocation
	copy(result[7:], key)
	return result // single allocation
}
Lesson 12.3: Load Testing
Benchmark in Go
func BenchmarkGet(b *testing.B) {
	store := NewStore()
	store.Put("key", []byte("value"))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		store.Get("key")
	}
}
// Run: go test -bench=. -benchmem
// Output: 50000000    20 ns/op    0 B/op    0 allocs/op
Load Test
type LoadTest struct {
	concurrency int
	duration    time.Duration
	client      *Client
}

func (lt *LoadTest) Run() {
	deadline := time.Now().Add(lt.duration)
	var count, errors int64 // updated atomically: plain ++ across goroutines is a data race
	var wg sync.WaitGroup
	for i := 0; i < lt.concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for time.Now().Before(deadline) {
				_, err := lt.client.Get("key")
				atomic.AddInt64(&count, 1)
				if err != nil {
					atomic.AddInt64(&errors, 1)
				}
			}
		}()
	}
	wg.Wait() // without this, the counters would be printed before the workers finish
	fmt.Printf("Requests: %d, Errors: %d\n", count, errors)
}
Lesson 12.4: Capacity Planning
Headroom
Desired throughput: 10,000 ops/sec
Target utilization: 70%
Required capacity: 14,286 ops/sec
Why headroom?
- Traffic spikes
- Maintenance
- Graceful degradation
- SLA buffer
Tuning by Workload
type Config struct {
	CacheSize    int64 // read cache size in bytes
	MemtableSize int64 // in-memory write buffer size in bytes
	ShardCount   int   // number of lock shards
	WalSyncFreq  int   // fsync the WAL every N writes
}

// Read-heavy workload: large cache, many shards, infrequent WAL syncs
readHeavy := Config{
	CacheSize:   512 << 20, // 512MB
	ShardCount:  numCPU * 4, // cores × 4
	WalSyncFreq: 1000,       // every 1000 writes
}

// Write-heavy workload: large memtable, more frequent WAL syncs
writeHeavy := Config{
	MemtableSize: 256 << 20, // 256MB
	ShardCount:   numCPU * 2, // cores × 2
	WalSyncFreq:  100,        // every 100 writes
}
Lab 12.1: Performance Testing
Objective
Build a comprehensive performance testing suite with profiling, load testing, and capacity planning tools.
Requirements
- CPU and Memory Profiling: Identify bottlenecks with pprof
- Load Testing: Multi-threaded stress testing framework
- Memory Optimization: Object pooling and allocation reduction
- Capacity Planning: Throughput and latency analysis
- Benchmark Suite: Automated performance regression testing
- Performance Monitoring: Real-time metrics during testing
Starter Code
type PerformanceTest struct {
	name   string
	store  Store
	result *TestResult
}

type TestResult struct {
	Operations int64
	Errors     int64
	Throughput float64
	AvgLatency time.Duration
	P99Latency time.Duration
}

func (pt *PerformanceTest) Run(concurrency int, duration time.Duration) *TestResult {
	// Run load test
	// Collect metrics
	// Return results
	return pt.result
}

// TODO: Implement profiling
func (pt *PerformanceTest) StartProfiling() {
	// Start CPU and memory profiling
}

// TODO: Implement load testing
func (pt *PerformanceTest) RunLoadTest(concurrency int, duration time.Duration) {
	// Multi-threaded load test
}

// TODO: Implement capacity planning
func (pt *PerformanceTest) AnalyzeCapacity() *CapacityReport {
	// Analyze throughput, latency, and capacity
	return &CapacityReport{}
}
Test Template
func TestPerformance(t *testing.T) {
	store := NewStore()
	pt := &PerformanceTest{
		name:  "GetOperation",
		store: store,
	}

	// Run performance test
	result := pt.Run(100, 30*time.Second)

	// Verify performance targets
	assert.True(t, result.Throughput > 10000, "Throughput too low")
	assert.True(t, result.AvgLatency < 1*time.Millisecond, "Latency too high")
	assert.True(t, result.P99Latency < 10*time.Millisecond, "P99 latency too high")
	assert.True(t, result.Errors == 0, "Should have no errors")
}
func BenchmarkGet(b *testing.B) {
	store := NewStore()
	store.Put("key", []byte("value"))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		store.Get("key")
	}
}
func TestMemoryUsage(t *testing.T) {
	// Test memory allocation patterns
	// Verify object pooling is working
	// Check for memory leaks
}
Acceptance Criteria
- ✅ CPU profiling identifies bottlenecks
- ✅ Memory profiling shows optimization opportunities
- ✅ Load testing supports 10,000+ ops/sec
- ✅ Object pooling reduces allocations by 50%
- ✅ Capacity planning provides accurate estimates
- ✅ Benchmark suite catches performance regressions
- ✅ Real-time monitoring during testing
- ✅ > 90% code coverage
- ✅ All tests pass
Summary: Week 12 Complete
By completing Week 12, you've learned and implemented:
1. Bottleneck Identification
- CPU profiling with pprof
- Memory profiling for leaks
- Lock contention analysis
- Hot path optimization
2. Memory Optimization
- Object pooling with sync.Pool
- Allocation reduction techniques
- Memory reuse patterns
- Garbage collection optimization
3. Load Testing
- Go benchmarking framework
- Multi-threaded stress testing
- Throughput and latency measurement
- Error rate monitoring
4. Capacity Planning
- Headroom calculation
- Workload-specific tuning
- Resource utilization analysis
- SLA compliance planning
Key Skills Mastered:
- ✅ Profile and identify performance bottlenecks
- ✅ Optimize memory usage and reduce allocations
- ✅ Design comprehensive load testing strategies
- ✅ Plan capacity for different workloads
- ✅ Build performance regression testing
- ✅ Production-ready performance optimization
Ready for Week 13?
Next week we'll focus on configuration management, deployment strategies, and production operations.
Continue to Week 13: Configuration and Deployment →