Learning Objectives

Choose the right data structure for your key-value store
Implement HashMap with proper hash functions
Handle value serialization strategies
Build comprehensive testing frameworks
Master Go’s testing and benchmarking tools

Lesson 2.1: HashMap Implementation

Choosing the Right Data Structure

Before we implement, let’s understand the options and trade-offs for our core in-memory key-value store.

Option 1: Go’s Native Map (`map[string]interface{}`)

Characteristics:

Built-in hash table implementation
O(1) average case lookup
Not thread-safe (race conditions with concurrent access)
Keys are unordered (iteration order randomized)
Fast for basic use cases

When to use:

Single-threaded applications
Prototyping and testing
When simplicity matters more than concurrency

Option 2: Custom HashMap with Sharding

Characteristics:

Divide keys into N shards (sub-maps)
Each shard has its own lock
Multiple goroutines can operate on different shards concurrently
More complex but much better concurrency
Still unordered

Performance improvement:

Single lock: throughput = 100K ops/sec (8 cores bottlenecked)
16 shards: throughput = 1.2M ops/sec (8 cores utilized)

When to use:

Multi-threaded applications
High concurrency, many readers/writers
Good balance of simplicity and performance

Option 3: Skip List

Characteristics:

Probabilistic data structure maintaining sorted order
O(log n) lookup, insert, delete
Enables range queries efficiently
Can be made lock-free with careful design
More memory overhead than hash tables (~33%)

When to use:

Need sorted iteration (range queries)
Building LSM tree memtable
Distributed systems (Bigtable, HBase use this)

Option 4: B-Tree or B+ Tree

Characteristics:

Balanced tree structure
O(log n) operations
High degree (many children per node)
Excellent disk I/O patterns
More complex implementation

When to use:

Disk-based storage (SSTables)
Very large datasets
Need to minimize disk seeks

Decision: Sharded HashMap for MVP

For Week 2 (MVP): We’ll implement a sharded hash map because:

✓ Good balance of simplicity and performance
✓ Thread-safe for concurrent operations
✓ Sufficient for most use cases
✓ Foundation for learning concurrency patterns

Later (Week 5): We’ll add sorted keys via LSM tree when we add persistence.

Go’s Native Map vs Custom Implementation

Go’s Map Under the Hood

type hmap struct {
    count      int           // number of elements
    flags      uint8         // flags tracking rehashing state
    B          uint8         // log_2 of buckets array size
    noverflow  uint16        // overflow buckets count
    hash0      uint32        // seed hash
    buckets    unsafe.Pointer // bucket array
    oldbuckets unsafe.Pointer // previous bucket array (during resize)
    nevacuate  uintptr       // progress of old bucket evacuation
    extra      *mapextra     // overflow buckets, if any
}

type bmap struct {
    tophash [8]uint8                    // hash values for 8 entries
    keys    [8]keytype                  // keys
    values  [8]valuetype               // values
    pad     uintptr
    overflow *bmap                      // overflow bucket for collision chain
}

Key Points:

Bucket size: 8 entries per bucket (cache line optimization)
Collision handling: overflow buckets linked as a chain
Growth: Doubles bucket count when load factor > 6.5/8 = ~0.8
Rehashing: Incremental during operations, not all at once

Why We Can’t Just Use map[string]interface{}

Problem: Not thread-safe

// Problem: Not thread-safe
m := make(map[string]interface{})

go func() {
    m["key1"] = "value1"  // goroutine 1 writes
}()

go func() {
    _ = m["key1"]         // goroutine 2 reads
}()
// Result: RACE CONDITION - panics or corrupts data

Thread Safety Considerations

What Makes Maps Unsafe?

During a resize operation, the map is internally inconsistent. Two concurrent operations can see different states:

Goroutine A: Reading key="foo" during resize
    1. Compute hash(foo) = 0x123abc
    2. Look in old bucket[3]
    3. Find value = "bar"
    4. Return "bar"

Goroutine B: Writing key="baz" = "qux" during resize
    1. Acquire exclusive access (panic or data corruption)
    2. Modify bucket array
    3. Element A sees inconsistent state

Solution: Synchronization

Option 1: Coarse-grained lock (simple, slower)

type Map struct {
    mu    sync.RWMutex
    data  map[string]interface{}
}

func (m *Map) Get(key string) interface{} {
    m.mu.RLock()
    defer m.mu.RUnlock()
    return m.data[key]
}

Option 2: Fine-grained locking (complex, faster)

type ShardedMap struct {
    shards []*mapShard
}

type mapShard struct {
    mu    sync.RWMutex
    data  map[string]interface{}
}

Option 3: sync.Map (Go 1.9+, specific use case)

var m sync.Map

m.Store("key", "value")
val, _ := m.Load("key")

When to use each:

Coarse-grained: Simple, fine for modest concurrency
Sharded: High concurrency, many goroutines
sync.Map: Write-rarely, read-heavy workloads

Lesson 2.2: Basic Operations (Get, Put, Delete)

Architecture of Our Sharded HashMap

┌─────────────────────────────────┐
│   ShardedMap (thread-safe)      │
├─────────────────────────────────┤
│ shards [16]                     │
└──┬──────────┬────────┬─────────┘
   │          │        │
   ▼          ▼        ▼
┌────────┐ ┌────────┐ ┌────────┐
│Shard 0 │ │Shard 1 │...│Shard15│
├────────┤ ├────────┤ ├────────┤
│Lock    │ │Lock    │ │Lock    │
│HashMap │ │HashMap │ │HashMap │
└────────┘ └────────┘ └────────┘

Implementing Get Operation

// Full implementation with documentation
type ShardedMap struct {
    shards []*mapShard
}

type mapShard struct {
    mu   sync.RWMutex
    data map[string]interface{}
}

func NewShardedMap(shardCount int) *ShardedMap {
    shards := make([]*mapShard, shardCount)
    for i := 0; i < shardCount; i++ {
        shards[i] = &mapShard{
            data: make(map[string]interface{}),
        }
    }
    return &ShardedMap{shards: shards}
}

// Get retrieves a value by key
// Returns (value, exists)
// Time complexity: O(1) average case
// Thread-safe: Multiple readers allowed simultaneously
func (m *ShardedMap) Get(key string) (interface{}, bool) {
    // Step 1: Determine which shard owns this key
    shard := m.getShard(key)
    
    // Step 2: Acquire read lock (allows multiple concurrent readers)
    shard.mu.RLock()
    defer shard.mu.RUnlock()
    
    // Step 3: Look up value in shard's map
    val, exists := shard.data[key]
    return val, exists
}

// getShard returns the shard responsible for a given key
// Uses hash function to distribute keys uniformly
func (m *ShardedMap) getShard(key string) *mapShard {
    // Hash the key to get a number
    h := fnv.New64a()
    h.Write([]byte(key))
    
    // Map hash to shard index using modulo
    shardIndex := int(h.Sum64() % uint64(len(m.shards)))
    return m.shards[shardIndex]
}

Implementing Put Operation

// Put stores a value at the given key
// If key exists, overwrites the value
// Time complexity: O(1) average case
// Thread-safe: Exclusive access to shard during write
func (m *ShardedMap) Put(key string, value interface{}) error {
    // Input validation
    if key == "" {
        return errors.New("key cannot be empty")
    }
    if value == nil {
        return errors.New("value cannot be nil")
    }
    
    // Step 1: Get the appropriate shard
    shard := m.getShard(key)
    
    // Step 2: Acquire write lock (exclusive access)
    shard.mu.Lock()
    defer shard.mu.Unlock()
    
    // Step 3: Store in map
    shard.data[key] = value
    
    return nil
}

// Thread safety notes:
// - Lock is held for entire operation
// - Other goroutines cannot access this shard during Put
// - Readers on OTHER shards still proceed normally (parallelism)

Implementing Delete Operation

// Delete removes a key and its value
// Returns true if key existed, false otherwise
// Time complexity: O(1) average case
// Thread-safe: Exclusive access to shard during deletion
func (m *ShardedMap) Delete(key string) bool {
    shard := m.getShard(key)
    
    shard.mu.Lock()
    defer shard.mu.Unlock()
    
    _, existed := shard.data[key]
    delete(shard.data, key)
    return existed
}

Handling Key Collisions

Hash Collision: Two different keys hash to the same value

Example: Hash function: hash(key) = len(key) % 16
  hash("a") = 1
  hash("bcd") = 3
  hash("e") = 1  <- COLLISION with "a"!

How Go’s Map Handles Collisions:

1. Bucket array has 16 buckets (for example)
1. Each bucket stores up to 8 entries inline
1. Hash collision → same bucket
1. If bucket full, create overflow bucket (linked list)

Our Sharded Map:

Primary collision handling: Shard hash (% number of shards)
Secondary collision handling: Go’s map internal chaining

Value Serialization Strategies

Our in-memory store stores interface{} which can be any Go type. But what about networking and persistence?

Option 1: JSON Serialization

type JSONCodec struct{}

func (c JSONCodec) Encode(v interface{}) ([]byte, error) {
    return json.Marshal(v)
}

func (c JSONCodec) Decode(data []byte, v interface{}) error {
    return json.Unmarshal(data, v)
}

// Usage:
codec := JSONCodec{}
data, _ := codec.Encode("hello")  // => []byte(`"hello"`)
var s string
codec.Decode(data, &s)  // => s = "hello"

Pros:

Human-readable
Widely supported

Cons:

Verbose
Slow
Type information lost

Option 2: Protocol Buffers

protobuf
syntax = "proto3";

message Value {
    oneof data {
        string string_value = 1;
        int64 int_value = 2;
        bytes bytes_value = 3;
    }
}

Pros:

Compact
Fast
Schema versioning

Cons:

Requires code generation
Steeper learning curve

Option 3: Custom Binary Encoding

// Simple binary format for our use case
type BinaryCodec struct{}

// Format: [type_byte][length_varint][data...]
func (c BinaryCodec) Encode(v interface{}) ([]byte, error) {
    buf := bytes.NewBuffer(nil)
    
    switch val := v.(type) {
    case string:
        buf.WriteByte(1)  // type: string
        binary.Write(buf, binary.BigEndian, int32(len(val)))
        buf.WriteString(val)
    case int64:
        buf.WriteByte(2)  // type: int64
        binary.Write(buf, binary.BigEndian, val)
    case []byte:
        buf.WriteByte(3)  // type: bytes
        binary.Write(buf, binary.BigEndian, int32(len(val)))
        buf.Write(val)
    default:
        return nil, errors.New("unsupported type")
    }
    
    return buf.Bytes(), nil
}

Pros:

Minimal overhead
Fast
Control over format

Cons:

Manual implementation
Limited type support

Decision for our course: Start with storing byte slices ([]byte)

This simplifies serialization and forces explicit encoding decisions.

Lesson 2.3: Testing Strategy

Unit Testing Fundamentals

Test Structure in Go - Test functions must start with “Test”

// Filename: store_test.go (same directory as store.go)
// Test functions must start with "Test"

func TestStoreGet(t *testing.T) {
    // Arrange: Set up test fixtures
    store := NewShardedStore(4)
    store.Put(context.Background(), []byte("key1"), []byte("value1"))
    
    // Act: Execute the code under test
    result, err := store.Get(context.Background(), []byte("key1"))
    
    // Assert: Verify the result
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    
    if string(result) != "value1" {
        t.Errorf("got %q, want %q", string(result), "value1")
    }
}

Subtests for Organization

func TestStore(t *testing.T) {
    store := NewShardedStore(4)
    
    t.Run("Put and Get", func(t *testing.T) {
        store.Put(context.Background(), []byte("k"), []byte("v"))
        v, _ := store.Get(context.Background(), []byte("k"))
        if string(v) != "v" {
            t.Errorf("got %q, want %q", string(v), "v")
        }
    })
    
    t.Run("Get missing key", func(t *testing.T) {
        _, err := store.Get(context.Background(), []byte("missing"))
        if err != ErrKeyNotFound {
            t.Errorf("got %v, want ErrKeyNotFound", err)
        }
    })
    
    t.Run("Delete", func(t *testing.T) {
        store.Put(context.Background(), []byte("k2"), []byte("v2"))
        store.Delete(context.Background(), []byte("k2"))
        _, err := store.Get(context.Background(), []byte("k2"))
        if err != ErrKeyNotFound {
            t.Errorf("expected key to be deleted")
        }
    })
}

Table-Driven Tests

Table-driven tests reduce duplication and make it easy to add more test cases.

func TestStoreOperations(t *testing.T) {
    tests := []struct {
        name      string
        key       []byte
        value     []byte
        setupFn   func(*ShardedStore)
        wantValue []byte
        wantErr   bool
    }{
        {
            name:      "put and get string",
            key:       []byte("user:1"),
            value:     []byte("alice"),
            wantValue: []byte("alice"),
        },
        {
            name:      "put and get binary",
            key:       []byte("binary:1"),
            value:     []byte{0x00, 0x01, 0x02},
            wantValue: []byte{0x00, 0x01, 0x02},
        },
        {
            name:    "get nonexistent key",
            key:     []byte("missing"),
            wantErr: true,
        },
        {
            name:  "overwrite value",
            key:   []byte("counter"),
            value: []byte("2"),
            setupFn: func(s *ShardedStore) {
                s.Put(context.Background(), []byte("counter"), []byte("1"))
            },
            wantValue: []byte("2"),
        },
    }
    
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            store := NewShardedStore(4)
            if tt.setupFn != nil {
                tt.setupFn(store)
            }
            
            store.Put(context.Background(), tt.key, tt.value)
            got, err := store.Get(context.Background(), tt.key)
            
            if (err != nil) != tt.wantErr {
                t.Errorf("error = %v, wantErr %v", err, tt.wantErr)
            }
            
            if !tt.wantErr && !bytes.Equal(got, tt.wantValue) {
                t.Errorf("got %v, want %v", got, tt.wantValue)
            }
        })
    }
}

Benchmarking Basics

Benchmarks measure performance characteristics.

func BenchmarkGet(b *testing.B) {
    store := NewShardedStore(16)
    store.Put(context.Background(), []byte("key"), []byte("value"))
    
    b.ResetTimer()  // Don't count setup time
    
    for i := 0; i < b.N; i++ {
        store.Get(context.Background(), []byte("key"))
    }
}

// Run: go test -bench=. -benchmem
// Output: BenchmarkGet-8   5000000   256 ns/op   64 B/alloc
//                                     ^timing     ^memory

Interpreting results:

5000000 iterations
256 ns/op - nanoseconds per operation
64 B/alloc - bytes allocated per operation

Property-Based Testing Introduction

Property-based testing generates random inputs and verifies invariants.

func TestStoreProperties(t *testing.T) {
    // Property: After Put, Get returns same value
    for trial := 0; trial < 100; trial++ {
        store := NewShardedStore(4)
        
        key := []byte(randomString(10))
        value := []byte(randomString(100))
        
        store.Put(context.Background(), key, value)
        got, _ := store.Get(context.Background(), key)
        
        if !bytes.Equal(got, value) {
            t.Errorf("trial %d: put-get invariant failed", trial)
        }
    }
}

func TestConcurrentSafety(t *testing.T) {
    // Property: Concurrent operations don't cause data corruption
    store := NewShardedStore(16)
    done := make(chan bool)
    
    // Writers
    for w := 0; w < 10; w++ {
        go func(writerID int) {
            for i := 0; i < 1000; i++ {
                key := fmt.Sprintf("key:%d:%d", writerID, i)
                store.Put(context.Background(), []byte(key), []byte("value"))
            }
            done <- true
        }(w)
    }
    
    // Readers
    for r := 0; r < 10; r++ {
        go func() {
            for i := 0; i < 1000; i++ {
                for w := 0; w < 10; w++ {
                    key := fmt.Sprintf("key:%d:%d", w, i)
                    store.Get(context.Background(), []byte(key))
                }
            }
            done <- true
        }()
    }
    
    // Wait for all goroutines
    for i := 0; i < 20; i++ {
        <-done
    }
    
    // If we get here without panic or corruption, test passes
    t.Log("concurrent safety property verified")
}

Hands-on Lab: Implement thread-unsafe in-memory store

Build a basic key-value store without concurrency concerns.

Lab Tasks:

Implement Get method with proper validation
Implement Put method with value copying
Implement Delete method
Write comprehensive tests (basic operations, edge cases)
Add benchmarks for Get, Put, Delete operations
Test with binary data and various key types
Achieve >90% code coverage

Summary & Learning Outcomes

What You’ve Learned:

Data structure trade-offs and when to use each
Go’s native map implementation details
Thread safety considerations and solutions
Value serialization strategies
Comprehensive testing techniques
Benchmarking and performance measurement

Key Takeaways

Choose the right data structure for your use case
Always consider thread safety in concurrent environments
Test-driven development leads to better code quality
Benchmark early and often to catch performance issues
Property-based testing helps find edge cases

Core data structures

Course Navigation

Learning Objectives

Lesson 2.1: HashMap Implementation

Choosing the Right Data Structure

Option 1: Go’s Native Map (map[string]interface{})

Option 2: Custom HashMap with Sharding

Option 3: Skip List

Option 4: B-Tree or B+ Tree

Decision: Sharded HashMap for MVP

Go’s Native Map vs Custom Implementation

Go’s Map Under the Hood

Key Points:

Why We Can’t Just Use map[string]interface{}

Problem: Not thread-safe

Thread Safety Considerations

What Makes Maps Unsafe?

Solution: Synchronization

Option 1: Coarse-grained lock (simple, slower)

Option 2: Fine-grained locking (complex, faster)

Option 3: sync.Map (Go 1.9+, specific use case)

When to use each:

Lesson 2.2: Basic Operations (Get, Put, Delete)

Architecture of Our Sharded HashMap

Implementing Get Operation

Implementing Put Operation

Implementing Delete Operation

Handling Key Collisions

Hash Collision: Two different keys hash to the same value

Value Serialization Strategies

Option 1: JSON Serialization

Option 2: Protocol Buffers

Option 3: Custom Binary Encoding

Decision for our course: Start with storing byte slices ([]byte)

Lesson 2.3: Testing Strategy

Unit Testing Fundamentals

Subtests for Organization

Table-Driven Tests

Benchmarking Basics

Interpreting results:

Property-Based Testing Introduction

Hands-on Lab: Implement thread-unsafe in-memory store

Summary & Learning Outcomes

What You’ve Learned:

Key Takeaways

Option 1: Go’s Native Map (`map[string]interface{}`)