Lesson 1
Introduction & database fundamentals
Architecture, trade-offs, and planning your engine.
Learning Objectives
-
Understand what key-value databases are and their characteristics
-
Learn when to use key-value stores vs other database types
-
Analyze functional and non-functional requirements
-
Design a layered architecture for our database
-
Plan the project structure and milestones
Lesson 1.1: Understanding Key-Value Stores
What is a Key-Value Database?
A key-value database is the simplest form of NoSQL database that stores data as a collection of key-value pairs. Think of it like a massive, persistent hash map or dictionary that survives system restarts.
Core Concepts:
-
Key: A unique identifier (string, integer, or binary data)
-
Value: The data associated with the key (can be any data type)
-
Operations: Primarily GET, PUT, DELETE
-
Simplicity: No schema, no SQL, no complex queries
Real-world example:
Key: "user:1001" Value: {"name": "John", "age": 30}
Key: "session:abc123" Value: "active"
Key: "counter:pageviews" Value: 15420
Characteristics of Key-Value Stores
Strengths:
-
High Performance: O(1) average lookup time
-
Horizontal Scalability: Easy to partition data across nodes
-
Flexibility: Schema-less design allows storing any data
-
Low Latency: Optimized for fast reads and writes
-
Simplicity: Easy to understand and implement
Limitations:
-
No Complex Queries: Can’t search by value or join data
-
No Relationships: No foreign keys or references
-
Limited Indexing: Only the key is indexed
-
Range Queries: Difficult without sorted keys
-
No Transactions: Limited ACID support in basic implementations
Use Cases and Trade-offs
Ideal Use Cases:
Session Storage Web application sessions with session ID as key
Why: Fast lookup, short-lived data, millions of concurrent sessions
Caching Layer Cache frequently accessed data
Why: Sub-millisecond access, reduces database load
Real-time Analytics Counters, metrics, leaderboards
Why: Atomic increment operations, extremely fast
When NOT to use Key-Value Stores:
-
Complex reporting and analytics
-
Data with many relationships
-
Full-text search requirements
-
Complex transactions spanning multiple keys
-
Data with intricate query patterns
Comparison with Other Database Types
Feature Key-Value Relational Document Graph
Query Complexity Simple (by key) Complex (SQL) Moderate (JSON) Graph queries
Performance Fastest Slower Fast Medium
Schema Flexibility None Rigid Flexible Flexible
Scalability Excellent Difficult Good Medium
Real-world Examples
Redis
-
In-memory key-value store with optional persistence
-
Supports advanced data types (lists, sets, sorted sets, streams)
-
Widely used for caching, sessions, real-time analytics
-
Performance: 100,000+ ops/sec per instance
etcd
-
Distributed key-value store based on Raft consensus
-
Strong consistency guarantees
-
Used for configuration management, service discovery
-
Built-in watch mechanism for key changes
Memcached
-
Simple in-memory cache
-
Very high performance for basic key-value operations
-
No persistence, data lost on restart
-
Used for caching in large-scale systems
LevelDB
-
Embedded key-value store (part of a larger application)
-
Sorted keys, efficient range queries
-
Durability through write-ahead logging
-
Written in C++, bindings available for many languages
Lesson 1.2: Core Requirements Analysis
Defining Functional Requirements
Functional requirements describe what the system does. For our key-value database:
Basic CRUD Operations
Create/Put Operation:
-
INPUT: key (string), value (any data)
-
OUTPUT: success/error
-
SIDE EFFECT: stores or updates value at key
Read/Get Operation:
-
INPUT: key (string)
-
OUTPUT: value (if exists), error (if not found)
-
SIDE EFFECT: none (read-only)
Delete Operation:
-
INPUT: key (string)
-
OUTPUT: success/error
-
SIDE EFFECT: removes key-value pair
Extended Operations
Scan/Range Query: Iterate over keys in sorted order, return all key-value pairs in a range
Batch Operations: Apply multiple operations atomically - all succeed or all fail
Exists/Contains: Check if a key exists without retrieving value
Defining Non-Functional Requirements
Non-functional requirements describe how well the system performs. These are critical for production systems.
Performance Requirements
Throughput:
-
Single-threaded: 100,000+ ops/sec for in-memory operations
-
Multi-threaded: 1,000,000+ ops/sec with 16 cores
-
Network operations: 50,000+ ops/sec over TCP
Latency (p99 - 99th percentile):
-
In-memory reads: < 1 millisecond
-
In-memory writes: < 5 milliseconds
-
Disk reads (cached): < 10 milliseconds
-
Disk writes (persisted): < 100 milliseconds
Durability Requirements
Write-Ahead Logging (WAL):
-
Every write must be logged to disk before returning success
-
Survives system crashes
-
No data loss on power failure
Durability Levels:
-
No durability: Fastest, data lost on crash (for caching only)
-
Write-ahead log: Moderate speed, no data loss
-
Synchronous writes: Slowest, highest durability guarantee
Consistency vs Availability Trade-off (CAP Theorem)
The CAP theorem states a distributed system can guarantee at most two of three properties:
Consistency All nodes see the same data
Availability System responds to requests
Partition Tolerance System works despite network splits
Lesson 1.3: Architecture Planning
High-level System Design
Our key-value database will have layered architecture:
┌─────────────────────────────────────────────┐
│ Client Applications │
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Network Layer (TCP/Protocol Handling) │
│ - Connection management │
│ - Request/Response serialization │
│ - Pipelining support │
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Query Processor & Executor │
│ - Parse commands │
│ - Execute operations │
│ - Enforce consistency │
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Transaction & Concurrency Manager │
│ - MVCC implementation │
│ - Locking mechanism │
│ - Transaction coordination │
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Storage Engine (Memory + Disk) │
│ ┌──────────────┬──────────────────────────┐│
│ │ In-Memory │ Persistence Layer ││
│ │ - HashMap │ - WAL (Write-Ahead Log) ││
│ │ - LSM Tree │ - SSTables ││
│ │ - Cache │ - Compaction ││
│ └──────────────┴──────────────────────────┘│
└────────────────┬────────────────────────────┘
│
┌────────────────▼────────────────────────────┐
│ Replication & Consensus (optional) │
│ - Leader-follower sync │
│ - Raft consensus │
│ - Conflict resolution │
└─────────────────────────────────────────────┘
Component Identification
1. Storage Engine (Core)
Responsibility: Physically store and retrieve data
-
Memtable: In-memory sorted map for recent writes
-
SSTable: On-disk sorted key-value files
-
LSM Tree: Multi-level structure for write optimization
-
Cache: Hot data in memory for faster access
-
Bloom Filters: Probabilistic lookup structures
2. Persistence Layer
Responsibility: Ensure durability and crash recovery
-
Write-Ahead Log (WAL): Record all writes before applying
-
Checkpoint Mechanism: Periodic snapshots of state
-
Recovery Manager: Replay logs on startup
-
Compaction: Merge files and reclaim space
3. Concurrency Manager
Responsibility: Handle multiple concurrent operations safely
-
Lock Manager: Distribute locks, prevent deadlocks
-
MVCC Engine: Multiple versions for non-blocking reads
-
Transaction Coordinator: Manage transaction lifecycle
-
Conflict Detector: Identify concurrent modifications
Interface Design and API Contracts
Core Database Interface
// Store defines the key-value store interface
type Store interface {
// Single key operations
Get(ctx context.Context, key []byte) ([]byte, error)
Put(ctx context.Context, key, value []byte) error
Delete(ctx context.Context, key []byte) error
// Batch operations
Batch(ctx context.Context, ops []Operation) error
// Iteration
Scan(ctx context.Context, start, end []byte) (Iterator, error)
// Lifecycle
Close() error
// Durability
Flush() error
Snapshot() (Snapshot, error)
}
// Iterator for range queries
type Iterator interface {
Next() bool
Key() []byte
Value() []byte
Err() error
Close() error
}
Project Structure and Organization
kvdb/
├── cmd/
│ ├── server/ # Main server executable
│ │ └── main.go
│ └── cli/ # Command-line client
│ └── main.go
├── pkg/
│ ├── storage/ # Storage engine
│ │ ├── store.go
│ │ ├── memtable.go
│ │ ├── sstable.go
│ │ └── cache.go
│ ├── persistence/ # WAL & recovery
│ │ ├── wal.go
│ │ └── recovery.go
│ ├── network/ # Network layer
│ │ ├── server.go
│ │ ├── client.go
│ │ └── protocol.go
│ ├── concurrency/ # Concurrency control
│ │ ├── lock.go
│ │ ├── mvcc.go
│ │ └── transaction.go
│ ├── monitor/ # Observability
│ │ ├── metrics.go
│ │ └── logger.go
│ └── config/ # Configuration
│ └── config.go
├── internal/
│ └── testutil/ # Test utilities
├── tests/
│ ├── integration_test.go
│ └── benchmark_test.go
├── docs/
│ ├── architecture.md
│ ├── api.md
│ └── operational_guide.md
├── go.mod
├── go.sum
├── Makefile
└── README.md
Summary & Learning Outcomes
By completing this lesson, you should understand:
-
What key-value databases are and why they’re different from relational databases
-
Key characteristics - strengths and limitations
-
Real-world use cases and when NOT to use them
-
Functional requirements - CRUD operations and extended features
-
Non-functional requirements - performance, durability, availability targets
-
Architecture overview - layered design with clear components
-
Interface contracts - how components communicate
-
Project structure - organized codebase layout
Key Takeaways
-
Key-value stores excel at simple, fast access by exact key match
-
Non-functional requirements (performance, durability) are as important as functional requirements
-
Clean architecture with separation of concerns enables testing and evolution
-
Trade-offs are inevitable - document decisions and their rationale
-
Start simple, add features incrementally
Next Steps
Next Week: We’ll implement the core in-memory data structure and begin building our storage engine.
Preparation for Lab:
-
Review Go basics: maps, slices, goroutines, interfaces
-
Familiarize yourself with Go’s testing framework
-
Understand benchmarking concepts
Hands-on Lab: Set up project repository and design initial API interface
Create the basic project structure and define the core interfaces for our key-value database.
Lab Tasks:
-
Create the project directory structure as outlined above
-
Initialize Go module with proper dependencies
-
Define the core Store interface in pkg/storage/store.go
-
Create basic unit tests for the interface
-
Set up Makefile with common commands (build, test, benchmark)