Course Navigation
← Back to Course OverviewAll Lessons
Introduction and Database Fundamentals
Learning Objectives
- • Understand what key-value databases are and their characteristics
- • Learn when to use key-value stores vs other database types
- • Analyze functional and non-functional requirements
- • Design a layered architecture for our database
- • Plan the project structure and milestones
Lesson 1.1: Understanding Key-Value Stores
What is a Key-Value Database?
A key-value database is the simplest form of NoSQL database that stores data as a collection of key-value pairs. Think of it like a massive, persistent hash map or dictionary that survives system restarts.
Core Concepts:
- Key: A unique identifier (string, integer, or binary data)
- Value: The data associated with the key (can be any data type)
- Operations: Primarily GET, PUT, DELETE
- Simplicity: No schema, no SQL, no complex queries
Real-world example:
Key: "user:1001"          Value: {"name": "John", "age": 30}
Key: "session:abc123"     Value: "active"
Key: "counter:pageviews"  Value: 15420Characteristics of Key-Value Stores
Strengths:
- • High Performance: O(1) average lookup time
- • Horizontal Scalability: Easy to partition data across nodes
- • Flexibility: Schema-less design allows storing any data
- • Low Latency: Optimized for fast reads and writes
- • Simplicity: Easy to understand and implement
Limitations:
- • No Complex Queries: Can't search by value or join data
- • No Relationships: No foreign keys or references
- • Limited Indexing: Only the key is indexed
- • Range Queries: Difficult without sorted keys
- • No Transactions: Limited ACID support in basic implementations
Use Cases and Trade-offs
Ideal Use Cases:
Session Storage
Web application sessions with session ID as key
Why: Fast lookup, short-lived data, millions of concurrent sessions
Caching Layer
Cache frequently accessed data
Why: Sub-millisecond access, reduces database load
Real-time Analytics
Counters, metrics, leaderboards
Why: Atomic increment operations, extremely fast
When NOT to use Key-Value Stores:
- • Complex reporting and analytics
- • Data with many relationships
- • Full-text search requirements
- • Complex transactions spanning multiple keys
- • Data with intricate query patterns
Comparison with Other Database Types
| Feature | Key-Value | Relational | Document | Graph | 
|---|---|---|---|---|
| Query Complexity | Simple (by key) | Complex (SQL) | Moderate (JSON) | Graph queries | 
| Performance | Fastest | Slower | Fast | Medium | 
| Schema Flexibility | None | Rigid | Flexible | Flexible | 
| Scalability | Excellent | Difficult | Good | Medium | 
Real-world Examples
Redis
- • In-memory key-value store with optional persistence
- • Supports advanced data types (lists, sets, sorted sets, streams)
- • Widely used for caching, sessions, real-time analytics
- • Performance: 100,000+ ops/sec per instance
etcd
- • Distributed key-value store based on Raft consensus
- • Strong consistency guarantees
- • Used for configuration management, service discovery
- • Built-in watch mechanism for key changes
Memcached
- • Simple in-memory cache
- • Very high performance for basic key-value operations
- • No persistence, data lost on restart
- • Used for caching in large-scale systems
LevelDB
- • Embedded key-value store (part of a larger application)
- • Sorted keys, efficient range queries
- • Durability through write-ahead logging
- • Written in C++, bindings available for many languages
Lesson 1.2: Core Requirements Analysis
Defining Functional Requirements
Functional requirements describe what the system does. For our key-value database:
Basic CRUD Operations
Create/Put Operation:
- INPUT: key (string), value (any data)
- OUTPUT: success/error
- SIDE EFFECT: stores or updates value at key
Read/Get Operation:
- INPUT: key (string)
- OUTPUT: value (if exists), error (if not found)
- SIDE EFFECT: none (read-only)
Delete Operation:
- INPUT: key (string)
- OUTPUT: success/error
- SIDE EFFECT: removes key-value pair
Extended Operations
Scan/Range Query:
Iterate over keys in sorted order, return all key-value pairs in a range
Batch Operations:
Apply multiple operations atomically - all succeed or all fail
Exists/Contains:
Check if a key exists without retrieving value
Defining Non-Functional Requirements
Non-functional requirements describe how well the system performs. These are critical for production systems.
Performance Requirements
Throughput:
- • Single-threaded: 100,000+ ops/sec for in-memory operations
- • Multi-threaded: 1,000,000+ ops/sec with 16 cores
- • Network operations: 50,000+ ops/sec over TCP
Latency (p99 - 99th percentile):
- • In-memory reads: < 1 millisecond
- • In-memory writes: < 5 milliseconds
- • Disk reads (cached): < 10 milliseconds
- • Disk writes (persisted): < 100 milliseconds
Durability Requirements
Write-Ahead Logging (WAL):
- • Every write must be logged to disk before returning success
- • Survives system crashes
- • No data loss on power failure
Durability Levels:
- • No durability: Fastest, data lost on crash (for caching only)
- • Write-ahead log: Moderate speed, no data loss
- • Synchronous writes: Slowest, highest durability guarantee
Consistency vs Availability Trade-off (CAP Theorem)
The CAP theorem states a distributed system can guarantee at most two of three properties:
Consistency
All nodes see the same data
Availability
System responds to requests
Partition Tolerance
System works despite network splits
Lesson 1.3: Architecture Planning
High-level System Design
Our key-value database will have layered architecture:
┌─────────────────────────────────────────────┐
│         Client Applications                  │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Network Layer (TCP/Protocol Handling)     │
│  - Connection management                    │
│  - Request/Response serialization           │
│  - Pipelining support                       │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Query Processor & Executor                │
│  - Parse commands                           │
│  - Execute operations                       │
│  - Enforce consistency                      │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Transaction & Concurrency Manager        │
│  - MVCC implementation                      │
│  - Locking mechanism                        │
│  - Transaction coordination                 │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Storage Engine (Memory + Disk)            │
│  ┌──────────────┬──────────────────────────┐│
│  │  In-Memory   │  Persistence Layer       ││
│  │  - HashMap   │  - WAL (Write-Ahead Log) ││
│  │  - LSM Tree  │  - SSTables              ││
│  │  - Cache     │  - Compaction            ││
│  └──────────────┴──────────────────────────┘│
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Replication & Consensus (optional)       │
│  - Leader-follower sync                     │
│  - Raft consensus                           │
│  - Conflict resolution                      │
└─────────────────────────────────────────────┘Component Identification
1. Storage Engine (Core)
Responsibility: Physically store and retrieve data
- • Memtable: In-memory sorted map for recent writes
- • SSTable: On-disk sorted key-value files
- • LSM Tree: Multi-level structure for write optimization
- • Cache: Hot data in memory for faster access
- • Bloom Filters: Probabilistic lookup structures
2. Persistence Layer
Responsibility: Ensure durability and crash recovery
- • Write-Ahead Log (WAL): Record all writes before applying
- • Checkpoint Mechanism: Periodic snapshots of state
- • Recovery Manager: Replay logs on startup
- • Compaction: Merge files and reclaim space
3. Concurrency Manager
Responsibility: Handle multiple concurrent operations safely
- • Lock Manager: Distribute locks, prevent deadlocks
- • MVCC Engine: Multiple versions for non-blocking reads
- • Transaction Coordinator: Manage transaction lifecycle
- • Conflict Detector: Identify concurrent modifications
Interface Design and API Contracts
Core Database Interface
// Store defines the key-value store interface
type Store interface {
    // Single key operations
    Get(ctx context.Context, key []byte) ([]byte, error)
    Put(ctx context.Context, key, value []byte) error
    Delete(ctx context.Context, key []byte) error
    
    // Batch operations
    Batch(ctx context.Context, ops []Operation) error
    
    // Iteration
    Scan(ctx context.Context, start, end []byte) (Iterator, error)
    
    // Lifecycle
    Close() error
    
    // Durability
    Flush() error
    Snapshot() (Snapshot, error)
}
// Iterator for range queries
type Iterator interface {
    Next() bool
    Key() []byte
    Value() []byte
    Err() error
    Close() error
}Project Structure and Organization
kvdb/
├── cmd/
│   ├── server/           # Main server executable
│   │   └── main.go
│   └── cli/              # Command-line client
│       └── main.go
├── pkg/
│   ├── storage/          # Storage engine
│   │   ├── store.go
│   │   ├── memtable.go
│   │   ├── sstable.go
│   │   └── cache.go
│   ├── persistence/      # WAL & recovery
│   │   ├── wal.go
│   │   └── recovery.go
│   ├── network/          # Network layer
│   │   ├── server.go
│   │   ├── client.go
│   │   └── protocol.go
│   ├── concurrency/      # Concurrency control
│   │   ├── lock.go
│   │   ├── mvcc.go
│   │   └── transaction.go
│   ├── monitor/          # Observability
│   │   ├── metrics.go
│   │   └── logger.go
│   └── config/           # Configuration
│       └── config.go
├── internal/
│   └── testutil/         # Test utilities
├── tests/
│   ├── integration_test.go
│   └── benchmark_test.go
├── docs/
│   ├── architecture.md
│   ├── api.md
│   └── operational_guide.md
├── go.mod
├── go.sum
├── Makefile
└── README.mdSummary & Learning Outcomes
By completing this lesson, you should understand:
- • What key-value databases are and why they're different from relational databases
- • Key characteristics - strengths and limitations
- • Real-world use cases and when NOT to use them
- • Functional requirements - CRUD operations and extended features
- • Non-functional requirements - performance, durability, availability targets
- • Architecture overview - layered design with clear components
- • Interface contracts - how components communicate
- • Project structure - organized codebase layout
Key Takeaways
- • Key-value stores excel at simple, fast access by exact key match
- • Non-functional requirements (performance, durability) are as important as functional requirements
- • Clean architecture with separation of concerns enables testing and evolution
- • Trade-offs are inevitable - document decisions and their rationale
- • Start simple, add features incrementally
Next Steps
Next Week: We'll implement the core in-memory data structure and begin building our storage engine.
Preparation for Lab:
- • Review Go basics: maps, slices, goroutines, interfaces
- • Familiarize yourself with Go's testing framework
- • Understand benchmarking concepts
Hands-on Lab: Set up project repository and design initial API interface
Create the basic project structure and define the core interfaces for our key-value database.
Lab Tasks:
- 1. Create the project directory structure as outlined above
- 2. Initialize Go module with proper dependencies
- 3. Define the core Store interface in pkg/storage/store.go
- 4. Create basic unit tests for the interface
- 5. Set up Makefile with common commands (build, test, benchmark)