Building the Core Data Structure
Learning Objectives
- • Choose the right data structure for your key-value store
- • Implement HashMap with proper hash functions
- • Handle value serialization strategies
- • Build comprehensive testing frameworks
- • Master Go's testing and benchmarking tools
Lesson 2.1: HashMap Implementation
Choosing the Right Data Structure
Before we implement, let's understand the options and trade-offs for our core in-memory key-value store.
Option 1: Go's Native Map (map[string]interface{})
Characteristics:
- • Built-in hash table implementation
- • O(1) average case lookup
- • Not thread-safe (race conditions with concurrent access)
- • Keys are unordered (iteration order randomized)
- • Fast for basic use cases
When to use:
- • Single-threaded applications
- • Prototyping and testing
- • When simplicity matters more than concurrency
Option 2: Custom HashMap with Sharding
Characteristics:
- • Divide keys into N shards (sub-maps)
- • Each shard has its own lock
- • Multiple goroutines can operate on different shards concurrently
- • More complex but much better concurrency
- • Still unordered
Illustrative performance difference (8-core machine):
- • Single global lock: ~100K ops/sec (all 8 cores serialized behind one lock)
- • 16 shards: ~1.2M ops/sec (all 8 cores doing useful work concurrently)
When to use:
- • Multi-threaded applications
- • High concurrency, many readers/writers
- • Good balance of simplicity and performance
Option 3: Skip List
Characteristics:
- • Probabilistic data structure maintaining sorted order
- • O(log n) lookup, insert, delete
- • Enables range queries efficiently
- • Can be made lock-free with careful design
- • More memory overhead than hash tables (~33%)
When to use:
- • Need sorted iteration (range queries)
- • Building LSM tree memtable
- • Distributed systems (Bigtable, HBase use this); a node-level sketch follows below
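To make the trade-off concrete, here is a minimal sketch of a skip-list node and its randomized level function; it assumes nothing about any particular library, and the names and promotion probability are illustrative choices.
import "math/rand"

const (
    maxLevel = 16   // caps tower height; plenty for millions of keys
    p        = 0.25 // probability of promoting a node one level higher
)

// node holds a key/value pair plus one forward pointer per level.
// Level 0 is an ordinary sorted linked list; higher levels skip ahead,
// which is what gives O(log n) expected search, insert, and delete.
type node struct {
    key     string
    value   []byte
    forward []*node // forward[i] is the next node at level i
}

// randomLevel flips a biased coin to decide how tall a new node's tower
// is: each node carries ~1/(1-p) pointers on average (~1.33 for p=0.25),
// which is where the extra memory overhead comes from.
func randomLevel() int {
    level := 1
    for rand.Float64() < p && level < maxLevel {
        level++
    }
    return level
}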
Option 4: B-Tree or B+ Tree
Characteristics:
- • Balanced tree structure
- • O(log n) operations
- • High degree (many children per node)
- • Excellent disk I/O patterns
- • More complex implementation
When to use:
- • Disk-based storage (SSTables)
- • Very large datasets
- • Need to minimize disk seeks
Decision: Sharded HashMap for MVP
For Week 2 (MVP): We'll implement a sharded hash map because:
- ✓ Good balance of simplicity and performance
- ✓ Thread-safe for concurrent operations
- ✓ Sufficient for most use cases
- ✓ Foundation for learning concurrency patterns
Later (Week 5): We'll add sorted keys via an LSM tree when we add persistence.
Go's Native Map vs Custom Implementation
Go's Map Under the Hood
type hmap struct {
    count      int           // number of elements
    flags      uint8         // flags tracking rehashing state
    B          uint8         // log_2 of buckets array size
    noverflow  uint16        // overflow buckets count
    hash0      uint32        // seed hash
    buckets    unsafe.Pointer // bucket array
    oldbuckets unsafe.Pointer // previous bucket array (during resize)
    nevacuate  uintptr       // progress of old bucket evacuation
    extra      *mapextra     // overflow buckets, if any
}
type bmap struct {
    tophash [8]uint8                    // hash values for 8 entries
    keys    [8]keytype                  // keys
    values  [8]valuetype               // values
    pad     uintptr
    overflow *bmap                      // overflow bucket for collision chain
}
(Both structs are simplified views; in the runtime, bmap declares only tophash, and the keys, values, and overflow pointer are laid out after it by the compiler.)
Key Points:
- • Bucket size: 8 entries per bucket (cache line optimization)
- • Collision handling: overflow buckets linked as a chain
- • Growth: Doubles the bucket count when the average load exceeds 6.5 entries per bucket (6.5/8 ≈ 0.81); see the pre-sizing sketch after this list
- • Rehashing: Incremental during operations, not all at once
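Because growth and rehashing happen behind the scenes, the one lever you control is the size hint passed to make: if the final size is roughly known, pre-sizing avoids repeated doublings. A small illustrative sketch (uses fmt.Sprintf for keys):
// Starts tiny and doubles its bucket array several times while we
// insert; each doubling triggers incremental rehashing afterwards.
grown := make(map[string]int)

// Allocates enough buckets up front, so the inserts below never
// trigger a resize.
presized := make(map[string]int, 1_000_000)

for i := 0; i < 1_000_000; i++ {
    key := fmt.Sprintf("key-%d", i)
    grown[key] = i
    presized[key] = i
}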
Why We Can't Just Use map[string]interface{}
Problem: Not thread-safe
// Problem: Not thread-safe
m := make(map[string]interface{})
go func() {
    m["key1"] = "value1"  // goroutine 1 writes
}()
go func() {
    _ = m["key1"]         // goroutine 2 reads
}()
// Result: DATA RACE - the runtime typically aborts with
// "fatal error: concurrent map read and map write", or data is silently corrupted
Thread Safety Considerations
What Makes Maps Unsafe?
During a resize operation, the map is internally inconsistent. Two concurrent operations can see different states:
Goroutine A: Reading key="foo" during resize
    1. Compute hash(foo) = 0x123abc
    2. Look in old bucket[3]
    3. Find value = "bar"
    4. Return "bar"
Goroutine B: Writing key="baz" = "qux" during resize
    1. Needs exclusive access to the bucket array (the runtime may detect the concurrent write and abort)
    2. Modifies the bucket array, moving entries from old buckets to new ones
    3. Goroutine A may now read a half-migrated bucket and see an inconsistent state
Solution: Synchronization
Option 1: Coarse-grained lock (simple, slower)
type Map struct {
    mu    sync.RWMutex
    data  map[string]interface{}
}
func (m *Map) Get(key string) interface{} {
    m.mu.RLock()
    defer m.mu.RUnlock()
    return m.data[key]
}
Option 2: Fine-grained locking (complex, faster)
type ShardedMap struct {
    shards []*mapShard
}
type mapShard struct {
    mu    sync.RWMutex
    data  map[string]interface{}
}
Option 3: sync.Map (Go 1.9+, specific use case)
var m sync.Map
m.Store("key", "value")
val, _ := m.Load("key")
When to use each:
- • Coarse-grained: Simple, fine for modest concurrency (a complete sketch follows this list)
- • Sharded: High concurrency, many goroutines
- • sync.Map: keys written once and read many times (caches), or goroutines that work on disjoint key sets
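To round out the Get-only example above, here is a complete coarse-grained map as a minimal sketch (the type and method names are ours, not from any library); it is the baseline that the sharded design in Lesson 2.2 improves on.
import "sync"

// Map guards a plain Go map with a single RWMutex: readers share the
// lock, writers take it exclusively. Every operation contends on the
// same lock, which is exactly what sharding avoids.
type Map struct {
    mu   sync.RWMutex
    data map[string]interface{}
}

func NewMap() *Map {
    return &Map{data: make(map[string]interface{})}
}

func (m *Map) Get(key string) (interface{}, bool) {
    m.mu.RLock()
    defer m.mu.RUnlock()
    val, ok := m.data[key]
    return val, ok
}

func (m *Map) Put(key string, value interface{}) {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.data[key] = value
}

func (m *Map) Delete(key string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    delete(m.data, key)
}
Returning the extra bool from Get distinguishes a missing key from a stored nil, which the single-value Get shown earlier cannot do.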
Lesson 2.2: Basic Operations (Get, Put, Delete)
Architecture of Our Sharded HashMap
┌─────────────────────────────────┐
│   ShardedMap (thread-safe)      │
├─────────────────────────────────┤
│ shards [16]                     │
└──┬──────────┬────────┬─────────┘
   │          │        │
   ▼          ▼        ▼
┌────────┐ ┌────────┐ ┌────────┐
│Shard 0 │ │Shard 1 │...│Shard15│
├────────┤ ├────────┤ ├────────┤
│Lock    │ │Lock    │ │Lock    │
│HashMap │ │HashMap │ │HashMap │
└────────┘ └────────┘ └────────┘
Implementing Get Operation
// Full implementation with documentation
type ShardedMap struct {
    shards []*mapShard
}
type mapShard struct {
    mu   sync.RWMutex
    data map[string]interface{}
}
func NewShardedMap(shardCount int) *ShardedMap {
    shards := make([]*mapShard, shardCount)
    for i := 0; i < shardCount; i++ {
        shards[i] = &mapShard{
            data: make(map[string]interface{}),
        }
    }
    return &ShardedMap{shards: shards}
}
// Get retrieves a value by key
// Returns (value, exists)
// Time complexity: O(1) average case
// Thread-safe: Multiple readers allowed simultaneously
func (m *ShardedMap) Get(key string) (interface{}, bool) {
    // Step 1: Determine which shard owns this key
    shard := m.getShard(key)
    
    // Step 2: Acquire read lock (allows multiple concurrent readers)
    shard.mu.RLock()
    defer shard.mu.RUnlock()
    
    // Step 3: Look up value in shard's map
    val, exists := shard.data[key]
    return val, exists
}
// getShard returns the shard responsible for a given key
// Uses hash function to distribute keys uniformly
func (m *ShardedMap) getShard(key string) *mapShard {
    // Hash the key to get a number
    h := fnv.New64a()
    h.Write([]byte(key))
    
    // Map hash to shard index using modulo
    shardIndex := int(h.Sum64() % uint64(len(m.shards)))
    return m.shards[shardIndex]
}
Implementing Put Operation
// Put stores a value at the given key
// If key exists, overwrites the value
// Time complexity: O(1) average case
// Thread-safe: Exclusive access to shard during write
func (m *ShardedMap) Put(key string, value interface{}) error {
    // Input validation
    if key == "" {
        return errors.New("key cannot be empty")
    }
    if value == nil {
        return errors.New("value cannot be nil")
    }
    
    // Step 1: Get the appropriate shard
    shard := m.getShard(key)
    
    // Step 2: Acquire write lock (exclusive access)
    shard.mu.Lock()
    defer shard.mu.Unlock()
    
    // Step 3: Store in map
    shard.data[key] = value
    
    return nil
}
// Thread safety notes:
// - Lock is held for entire operation
// - Other goroutines cannot access this shard during Put
// - Readers on OTHER shards still proceed normally (parallelism)
Implementing Delete Operation
// Delete removes a key and its value
// Returns true if key existed, false otherwise
// Time complexity: O(1) average case
// Thread-safe: Exclusive access to shard during deletion
func (m *ShardedMap) Delete(key string) bool {
    shard := m.getShard(key)
    
    shard.mu.Lock()
    defer shard.mu.Unlock()
    
    _, existed := shard.data[key]
    delete(shard.data, key)
    return existed
}
Handling Key Collisions
Hash Collision: Two different keys hash to the same value
Example: Hash function: hash(key) = len(key) % 16
  hash("a") = 1
  hash("bcd") = 3
  hash("e") = 1  <- COLLISION with "a"!How Go's Map Handles Collisions:
- 1. Bucket array has 16 buckets (for example)
- 2. Each bucket stores up to 8 entries inline
- 3. Hash collision → same bucket
- 4. If bucket full, create overflow bucket (linked list)
Our Sharded Map:
- • Primary distribution: hash(key) % number of shards picks the shard (distinct keys sharing a shard is expected, not an error)
- • Collision handling within a shard: Go's map internal chaining via overflow buckets
- • A short usage example of the complete ShardedMap follows below
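Before moving on to serialization, here is how the ShardedMap defined above would be used end to end; the snippet assumes the type lives in the same package, and fmt and log are standard library.
import (
    "fmt"
    "log"
)

func main() {
    // 16 shards is a reasonable default for an 8-16 core machine.
    m := NewShardedMap(16)

    // Put and Get route to a shard via getShard(key).
    if err := m.Put("user:1", "alice"); err != nil {
        log.Fatal(err)
    }

    if val, ok := m.Get("user:1"); ok {
        fmt.Println("user:1 =", val) // user:1 = alice
    }

    // Delete reports whether the key existed.
    fmt.Println("deleted:", m.Delete("user:1")) // deleted: true
    _, ok := m.Get("user:1")
    fmt.Println("present after delete:", ok)    // present after delete: false
}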
Value Serialization Strategies
Our in-memory store holds interface{} values, which can be any Go type. But values must eventually cross the network and be written to disk, and that requires serialization.
Option 1: JSON Serialization
type JSONCodec struct{}
func (c JSONCodec) Encode(v interface{}) ([]byte, error) {
    return json.Marshal(v)
}
func (c JSONCodec) Decode(data []byte, v interface{}) error {
    return json.Unmarshal(data, v)
}
// Usage:
codec := JSONCodec{}
data, _ := codec.Encode("hello")  // => []byte(`"hello"`)
var s string
codec.Decode(data, &s)  // => s = "hello"
Pros:
- • Human-readable
- • Widely supported
Cons:
- • Verbose
- • Slow
- • Type information lost (JSON numbers decode through interface{} as float64); see the example below
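The "type information lost" point bites quickly in practice: JSON has a single number type, so integers round-trip through interface{} as float64. A small sketch using the JSONCodec above:
codec := JSONCodec{}

// Encode an int64...
data, _ := codec.Encode(int64(42)) // => []byte(`42`)

// ...but decoding into interface{} produces float64, because JSON
// numbers carry no integer/float distinction.
var v interface{}
codec.Decode(data, &v)
fmt.Printf("%T %v\n", v, v) // prints: float64 42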
Option 2: Protocol Buffers
An example .proto schema:
syntax = "proto3";
message Value {
    oneof data {
        string string_value = 1;
        int64 int_value = 2;
        bytes bytes_value = 3;
    }
}
Pros:
- • Compact
- • Fast
- • Schema versioning
Cons:
- • Requires code generation
- • Steeper learning curve
Option 3: Custom Binary Encoding
// Simple binary format for our use case
type BinaryCodec struct{}
// Format: [type_byte][int32 big-endian length, for variable-size types][data...]
// A matching Decode sketch appears after the pros/cons below.
func (c BinaryCodec) Encode(v interface{}) ([]byte, error) {
    buf := bytes.NewBuffer(nil)
    
    switch val := v.(type) {
    case string:
        buf.WriteByte(1)  // type: string
        binary.Write(buf, binary.BigEndian, int32(len(val)))
        buf.WriteString(val)
    case int64:
        buf.WriteByte(2)  // type: int64
        binary.Write(buf, binary.BigEndian, val)
    case []byte:
        buf.WriteByte(3)  // type: bytes
        binary.Write(buf, binary.BigEndian, int32(len(val)))
        buf.Write(val)
    default:
        return nil, errors.New("unsupported type")
    }
    
    return buf.Bytes(), nil
}
Pros:
- • Minimal overhead
- • Fast
- • Control over format
Cons:
- • Manual implementation
- • Limited type support
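Only Encode is shown above; a matching Decode reads the type byte back and dispatches on it. A minimal sketch under the same assumed format (imports bytes, encoding/binary, errors, and io); unlike JSONCodec.Decode, it returns the value directly, since the type byte already says what to construct.
// Decode reverses Encode: read the type byte, then the payload.
func (c BinaryCodec) Decode(data []byte) (interface{}, error) {
    if len(data) < 1 {
        return nil, errors.New("empty input")
    }
    buf := bytes.NewReader(data[1:])

    // readLen reads the int32 big-endian length prefix for variable-size types.
    readLen := func() (int, error) {
        var n int32
        if err := binary.Read(buf, binary.BigEndian, &n); err != nil {
            return 0, err
        }
        if n < 0 {
            return 0, errors.New("negative length")
        }
        return int(n), nil
    }

    switch data[0] {
    case 1: // string: [int32 length][bytes]
        n, err := readLen()
        if err != nil {
            return nil, err
        }
        b := make([]byte, n)
        if _, err := io.ReadFull(buf, b); err != nil {
            return nil, err
        }
        return string(b), nil
    case 2: // int64: 8 big-endian bytes
        var v int64
        if err := binary.Read(buf, binary.BigEndian, &v); err != nil {
            return nil, err
        }
        return v, nil
    case 3: // bytes: [int32 length][bytes]
        n, err := readLen()
        if err != nil {
            return nil, err
        }
        b := make([]byte, n)
        if _, err := io.ReadFull(buf, b); err != nil {
            return nil, err
        }
        return b, nil
    default:
        return nil, errors.New("unknown type byte")
    }
}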
Decision for our course: Start with storing byte slices ([]byte)
This simplifies serialization and forces explicit encoding decisions.
Lesson 2.3: Testing Strategy
Unit Testing Fundamentals
Test Structure in Go - Test functions must start with "Test"
// Filename: store_test.go (same directory as store.go)
// Test functions must start with "Test"
func TestStoreGet(t *testing.T) {
    // Arrange: Set up test fixtures
    store := NewShardedStore(4)
    store.Put(context.Background(), []byte("key1"), []byte("value1"))
    
    // Act: Execute the code under test
    result, err := store.Get(context.Background(), []byte("key1"))
    
    // Assert: Verify the result
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    
    if string(result) != "value1" {
        t.Errorf("got %q, want %q", string(result), "value1")
    }
}
Subtests for Organization
func TestStore(t *testing.T) {
    store := NewShardedStore(4)
    
    t.Run("Put and Get", func(t *testing.T) {
        store.Put(context.Background(), []byte("k"), []byte("v"))
        v, _ := store.Get(context.Background(), []byte("k"))
        if string(v) != "v" {
            t.Errorf("got %q, want %q", string(v), "v")
        }
    })
    
    t.Run("Get missing key", func(t *testing.T) {
        _, err := store.Get(context.Background(), []byte("missing"))
        if err != ErrKeyNotFound {
            t.Errorf("got %v, want ErrKeyNotFound", err)
        }
    })
    
    t.Run("Delete", func(t *testing.T) {
        store.Put(context.Background(), []byte("k2"), []byte("v2"))
        store.Delete(context.Background(), []byte("k2"))
        _, err := store.Get(context.Background(), []byte("k2"))
        if err != ErrKeyNotFound {
            t.Errorf("expected key to be deleted")
        }
    })
}
Table-Driven Tests
Table-driven tests reduce duplication and make it easy to add more test cases.
func TestStoreOperations(t *testing.T) {
    tests := []struct {
        name      string
        key       []byte
        value     []byte
        setupFn   func(*ShardedStore)
        wantValue []byte
        wantErr   bool
    }{
        {
            name:      "put and get string",
            key:       []byte("user:1"),
            value:     []byte("alice"),
            wantValue: []byte("alice"),
        },
        {
            name:      "put and get binary",
            key:       []byte("binary:1"),
            value:     []byte{0x00, 0x01, 0x02},
            wantValue: []byte{0x00, 0x01, 0x02},
        },
        {
            name:    "get nonexistent key",
            key:     []byte("missing"),
            wantErr: true,
        },
        {
            name:  "overwrite value",
            key:   []byte("counter"),
            value: []byte("2"),
            setupFn: func(s *ShardedStore) {
                s.Put(context.Background(), []byte("counter"), []byte("1"))
            },
            wantValue: []byte("2"),
        },
    }
    
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            store := NewShardedStore(4)
            if tt.setupFn != nil {
                tt.setupFn(store)
            }
            
            // Only Put when the case provides a value; the "get nonexistent
            // key" case must leave the store untouched.
            if tt.value != nil {
                store.Put(context.Background(), tt.key, tt.value)
            }
            got, err := store.Get(context.Background(), tt.key)
            
            if (err != nil) != tt.wantErr {
                t.Errorf("error = %v, wantErr %v", err, tt.wantErr)
            }
            
            if !tt.wantErr && !bytes.Equal(got, tt.wantValue) {
                t.Errorf("got %v, want %v", got, tt.wantValue)
            }
        })
    }
}
Benchmarking Basics
Benchmarks measure performance characteristics.
func BenchmarkGet(b *testing.B) {
    store := NewShardedStore(16)
    store.Put(context.Background(), []byte("key"), []byte("value"))
    
    b.ResetTimer()  // Don't count setup time
    
    for i := 0; i < b.N; i++ {
        store.Get(context.Background(), []byte("key"))
    }
}
// Run: go test -bench=. -benchmem
// Output: BenchmarkGet-8   5000000   256 ns/op   64 B/op   1 allocs/op
//                          ^iterations ^time/op   ^memory   ^allocations
Interpreting results:
- • 5000000 - iterations the framework ran to get a stable measurement
- • 256 ns/op - nanoseconds per operation
- • 64 B/op - bytes allocated per operation
- • 1 allocs/op - heap allocations per operation
- • To measure concurrent throughput, see the b.RunParallel sketch below
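Because the store is designed for concurrent use, it is also worth benchmarking it under parallel load; b.RunParallel runs the body from multiple goroutines (one per GOMAXPROCS by default). A sketch, with an arbitrary key distribution and shard count:
func BenchmarkGetParallel(b *testing.B) {
    store := NewShardedStore(16)

    // Preload keys so reads hit real data.
    for i := 0; i < 1000; i++ {
        key := []byte(fmt.Sprintf("key:%d", i))
        store.Put(context.Background(), key, []byte("value"))
    }

    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            key := []byte(fmt.Sprintf("key:%d", i%1000))
            store.Get(context.Background(), key)
            i++
        }
    })
}
// Run: go test -bench=Parallel -benchmem -cpu=1,4,8
// Comparing the -cpu columns shows how well sharding scales with cores.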
Property-Based Testing Introduction
Property-based testing generates random inputs and verifies invariants.
func TestStoreProperties(t *testing.T) {
    // Property: After Put, Get returns same value
    for trial := 0; trial < 100; trial++ {
        store := NewShardedStore(4)
        
        key := []byte(randomString(10))
        value := []byte(randomString(100))
        
        store.Put(context.Background(), key, value)
        got, _ := store.Get(context.Background(), key)
        
        if !bytes.Equal(got, value) {
            t.Errorf("trial %d: put-get invariant failed", trial)
        }
    }
}
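// Note: randomString used above is not a standard-library function; a
// minimal helper sketch (alphabet choice is arbitrary, uses math/rand):
func randomString(n int) string {
    const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    b := make([]byte, n)
    for i := range b {
        b[i] = alphabet[rand.Intn(len(alphabet))]
    }
    return string(b)
}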
func TestConcurrentSafety(t *testing.T) {
    // Property: Concurrent operations don't cause data corruption
    store := NewShardedStore(16)
    done := make(chan bool)
    
    // Writers
    for w := 0; w < 10; w++ {
        go func(writerID int) {
            for i := 0; i < 1000; i++ {
                key := fmt.Sprintf("key:%d:%d", writerID, i)
                store.Put(context.Background(), []byte(key), []byte("value"))
            }
            done <- true
        }(w)
    }
    
    // Readers
    for r := 0; r < 10; r++ {
        go func() {
            for i := 0; i < 1000; i++ {
                for w := 0; w < 10; w++ {
                    key := fmt.Sprintf("key:%d:%d", w, i)
                    store.Get(context.Background(), []byte(key))
                }
            }
            done <- true
        }()
    }
    
    // Wait for all goroutines
    for i := 0; i < 20; i++ {
        <-done
    }
    
    // If we get here without panic or corruption, test passes
    t.Log("concurrent safety property verified")
}
Hands-on Lab: Implement a Thread-Unsafe In-Memory Store
Build a basic key-value store without concurrency concerns.
Lab Tasks:
- 1. Implement Get method with proper validation
- 2. Implement Put method with value copying
- 3. Implement Delete method
- 4. Write comprehensive tests (basic operations, edge cases)
- 5. Add benchmarks for Get, Put, Delete operations
- 6. Test with binary data and various key types
- 7. Achieve >90% code coverage
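Go's built-in tooling covers the testing, benchmarking, and coverage tasks; for example:
# Run all tests with the race detector enabled
go test -race ./...
# Measure coverage and inspect it in the browser (task 7)
go test -coverprofile=cover.out ./...
go tool cover -html=cover.out
# Run the benchmarks from task 5
go test -bench=. -benchmem ./...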
Summary & Learning Outcomes
What You've Learned:
- • Data structure trade-offs and when to use each
- • Go's native map implementation details
- • Thread safety considerations and solutions
- • Value serialization strategies
- • Comprehensive testing techniques
- • Benchmarking and performance measurement
Key Takeaways
- • Choose the right data structure for your use case
- • Always consider thread safety in concurrent environments
- • Test-driven development leads to better code quality
- • Benchmark early and often to catch performance issues
- • Property-based testing helps find edge cases