Learning Objectives

Understand what key-value databases are and their characteristics
Learn when to use key-value stores vs other database types
Analyze functional and non-functional requirements
Design a layered architecture for our database
Plan the project structure and milestones

Lesson 1.1: Understanding Key-Value Stores

What is a Key-Value Database?

A key-value database is the simplest form of NoSQL database that stores data as a collection of key-value pairs. Think of it like a massive, persistent hash map or dictionary that survives system restarts.

Core Concepts:

Key: A unique identifier (string, integer, or binary data)
Value: The data associated with the key (can be any data type)
Operations: Primarily GET, PUT, DELETE
Simplicity: No schema, no SQL, no complex queries

Real-world example:

Key: "user:1001"          Value: {"name": "John", "age": 30}
Key: "session:abc123"     Value: "active"
Key: "counter:pageviews"  Value: 15420

Characteristics of Key-Value Stores

Strengths:

High Performance: O(1) average lookup time
Horizontal Scalability: Easy to partition data across nodes
Flexibility: Schema-less design allows storing any data
Low Latency: Optimized for fast reads and writes
Simplicity: Easy to understand and implement

Limitations:

No Complex Queries: Can’t search by value or join data
No Relationships: No foreign keys or references
Limited Indexing: Only the key is indexed
Range Queries: Difficult without sorted keys
No Transactions: Limited ACID support in basic implementations

Use Cases and Trade-offs

Ideal Use Cases:

Session Storage Web application sessions with session ID as key

Why: Fast lookup, short-lived data, millions of concurrent sessions

Caching Layer Cache frequently accessed data

Why: Sub-millisecond access, reduces database load

Real-time Analytics Counters, metrics, leaderboards

Why: Atomic increment operations, extremely fast

When NOT to use Key-Value Stores:

Complex reporting and analytics
Data with many relationships
Full-text search requirements
Complex transactions spanning multiple keys
Data with intricate query patterns

Comparison with Other Database Types

Feature Key-Value Relational Document Graph

Query Complexity Simple (by key) Complex (SQL) Moderate (JSON) Graph queries

Performance Fastest Slower Fast Medium

Schema Flexibility None Rigid Flexible Flexible

Scalability Excellent Difficult Good Medium

Real-world Examples

Redis

In-memory key-value store with optional persistence
Supports advanced data types (lists, sets, sorted sets, streams)
Widely used for caching, sessions, real-time analytics
Performance: 100,000+ ops/sec per instance

etcd

Distributed key-value store based on Raft consensus
Strong consistency guarantees
Used for configuration management, service discovery
Built-in watch mechanism for key changes

Memcached

Simple in-memory cache
Very high performance for basic key-value operations
No persistence, data lost on restart
Used for caching in large-scale systems

LevelDB

Embedded key-value store (part of a larger application)
Sorted keys, efficient range queries
Durability through write-ahead logging
Written in C++, bindings available for many languages

Lesson 1.2: Core Requirements Analysis

Defining Functional Requirements

Functional requirements describe what the system does. For our key-value database:

Basic CRUD Operations

Create/Put Operation:

INPUT: key (string), value (any data)
OUTPUT: success/error
SIDE EFFECT: stores or updates value at key

Read/Get Operation:

INPUT: key (string)
OUTPUT: value (if exists), error (if not found)
SIDE EFFECT: none (read-only)

Delete Operation:

INPUT: key (string)
OUTPUT: success/error
SIDE EFFECT: removes key-value pair

Extended Operations

Scan/Range Query: Iterate over keys in sorted order, return all key-value pairs in a range

Batch Operations: Apply multiple operations atomically - all succeed or all fail

Exists/Contains: Check if a key exists without retrieving value

Defining Non-Functional Requirements

Non-functional requirements describe how well the system performs. These are critical for production systems.

Performance Requirements

Throughput:

Single-threaded: 100,000+ ops/sec for in-memory operations
Multi-threaded: 1,000,000+ ops/sec with 16 cores
Network operations: 50,000+ ops/sec over TCP

Latency (p99 - 99th percentile):

In-memory reads: < 1 millisecond
In-memory writes: < 5 milliseconds
Disk reads (cached): < 10 milliseconds
Disk writes (persisted): < 100 milliseconds

Durability Requirements

Write-Ahead Logging (WAL):

Every write must be logged to disk before returning success
Survives system crashes
No data loss on power failure

Durability Levels:

No durability: Fastest, data lost on crash (for caching only)
Write-ahead log: Moderate speed, no data loss
Synchronous writes: Slowest, highest durability guarantee

Consistency vs Availability Trade-off (CAP Theorem)

The CAP theorem states a distributed system can guarantee at most two of three properties:

Consistency All nodes see the same data

Availability System responds to requests

Partition Tolerance System works despite network splits

Lesson 1.3: Architecture Planning

High-level System Design

Our key-value database will have layered architecture:

┌─────────────────────────────────────────────┐
│         Client Applications                  │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Network Layer (TCP/Protocol Handling)     │
│  - Connection management                    │
│  - Request/Response serialization           │
│  - Pipelining support                       │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Query Processor & Executor                │
│  - Parse commands                           │
│  - Execute operations                       │
│  - Enforce consistency                      │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Transaction & Concurrency Manager        │
│  - MVCC implementation                      │
│  - Locking mechanism                        │
│  - Transaction coordination                 │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Storage Engine (Memory + Disk)            │
│  ┌──────────────┬──────────────────────────┐│
│  │  In-Memory   │  Persistence Layer       ││
│  │  - HashMap   │  - WAL (Write-Ahead Log) ││
│  │  - LSM Tree  │  - SSTables              ││
│  │  - Cache     │  - Compaction            ││
│  └──────────────┴──────────────────────────┘│
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Replication & Consensus (optional)       │
│  - Leader-follower sync                     │
│  - Raft consensus                           │
│  - Conflict resolution                      │
└─────────────────────────────────────────────┘

Component Identification

1. Storage Engine (Core)

Responsibility: Physically store and retrieve data

Memtable: In-memory sorted map for recent writes
SSTable: On-disk sorted key-value files
LSM Tree: Multi-level structure for write optimization
Cache: Hot data in memory for faster access
Bloom Filters: Probabilistic lookup structures

2. Persistence Layer

Responsibility: Ensure durability and crash recovery

Write-Ahead Log (WAL): Record all writes before applying
Checkpoint Mechanism: Periodic snapshots of state
Recovery Manager: Replay logs on startup
Compaction: Merge files and reclaim space

3. Concurrency Manager

Responsibility: Handle multiple concurrent operations safely

Lock Manager: Distribute locks, prevent deadlocks
MVCC Engine: Multiple versions for non-blocking reads
Transaction Coordinator: Manage transaction lifecycle
Conflict Detector: Identify concurrent modifications

Interface Design and API Contracts

Core Database Interface

// Store defines the key-value store interface
type Store interface {
    // Single key operations
    Get(ctx context.Context, key []byte) ([]byte, error)
    Put(ctx context.Context, key, value []byte) error
    Delete(ctx context.Context, key []byte) error
    
    // Batch operations
    Batch(ctx context.Context, ops []Operation) error
    
    // Iteration
    Scan(ctx context.Context, start, end []byte) (Iterator, error)
    
    // Lifecycle
    Close() error
    
    // Durability
    Flush() error
    Snapshot() (Snapshot, error)
}

// Iterator for range queries
type Iterator interface {
    Next() bool
    Key() []byte
    Value() []byte
    Err() error
    Close() error
}

Project Structure and Organization

kvdb/
├── cmd/
│   ├── server/           # Main server executable
│   │   └── main.go
│   └── cli/              # Command-line client
│       └── main.go
├── pkg/
│   ├── storage/          # Storage engine
│   │   ├── store.go
│   │   ├── memtable.go
│   │   ├── sstable.go
│   │   └── cache.go
│   ├── persistence/      # WAL & recovery
│   │   ├── wal.go
│   │   └── recovery.go
│   ├── network/          # Network layer
│   │   ├── server.go
│   │   ├── client.go
│   │   └── protocol.go
│   ├── concurrency/      # Concurrency control
│   │   ├── lock.go
│   │   ├── mvcc.go
│   │   └── transaction.go
│   ├── monitor/          # Observability
│   │   ├── metrics.go
│   │   └── logger.go
│   └── config/           # Configuration
│       └── config.go
├── internal/
│   └── testutil/         # Test utilities
├── tests/
│   ├── integration_test.go
│   └── benchmark_test.go
├── docs/
│   ├── architecture.md
│   ├── api.md
│   └── operational_guide.md
├── go.mod
├── go.sum
├── Makefile
└── README.md

Summary & Learning Outcomes

By completing this lesson, you should understand:

What key-value databases are and why they’re different from relational databases
Key characteristics - strengths and limitations
Real-world use cases and when NOT to use them
Functional requirements - CRUD operations and extended features
Non-functional requirements - performance, durability, availability targets
Architecture overview - layered design with clear components
Interface contracts - how components communicate
Project structure - organized codebase layout

Key Takeaways

Key-value stores excel at simple, fast access by exact key match
Non-functional requirements (performance, durability) are as important as functional requirements
Clean architecture with separation of concerns enables testing and evolution
Trade-offs are inevitable - document decisions and their rationale
Start simple, add features incrementally

Next Steps

Next Week: We’ll implement the core in-memory data structure and begin building our storage engine.

Preparation for Lab:

Review Go basics: maps, slices, goroutines, interfaces
Familiarize yourself with Go’s testing framework
Understand benchmarking concepts

Hands-on Lab: Set up project repository and design initial API interface

Create the basic project structure and define the core interfaces for our key-value database.

Lab Tasks:

Create the project directory structure as outlined above
Initialize Go module with proper dependencies
Define the core Store interface in pkg/storage/store.go
Create basic unit tests for the interface
Set up Makefile with common commands (build, test, benchmark)

Introduction & database fundamentals

Course Navigation