Introduction and Database Fundamentals

Learning Objectives

  • • Understand what key-value databases are and their characteristics
  • • Learn when to use key-value stores vs other database types
  • • Analyze functional and non-functional requirements
  • • Design a layered architecture for our database
  • • Plan the project structure and milestones

Lesson 1.1: Understanding Key-Value Stores

What is a Key-Value Database?

A key-value database is the simplest form of NoSQL database that stores data as a collection of key-value pairs. Think of it like a massive, persistent hash map or dictionary that survives system restarts.

Core Concepts:

  • Key: A unique identifier (string, integer, or binary data)
  • Value: The data associated with the key (can be any data type)
  • Operations: Primarily GET, PUT, DELETE
  • Simplicity: No schema, no SQL, no complex queries

Real-world example:

Key: "user:1001"          Value: {"name": "John", "age": 30}
Key: "session:abc123"     Value: "active"
Key: "counter:pageviews"  Value: 15420

Characteristics of Key-Value Stores

Strengths:

  • • High Performance: O(1) average lookup time
  • • Horizontal Scalability: Easy to partition data across nodes
  • • Flexibility: Schema-less design allows storing any data
  • • Low Latency: Optimized for fast reads and writes
  • • Simplicity: Easy to understand and implement

Limitations:

  • • No Complex Queries: Can't search by value or join data
  • • No Relationships: No foreign keys or references
  • • Limited Indexing: Only the key is indexed
  • • Range Queries: Difficult without sorted keys
  • • No Transactions: Limited ACID support in basic implementations

Use Cases and Trade-offs

Ideal Use Cases:

Session Storage

Web application sessions with session ID as key

Why: Fast lookup, short-lived data, millions of concurrent sessions

Caching Layer

Cache frequently accessed data

Why: Sub-millisecond access, reduces database load

Real-time Analytics

Counters, metrics, leaderboards

Why: Atomic increment operations, extremely fast

When NOT to use Key-Value Stores:

  • • Complex reporting and analytics
  • • Data with many relationships
  • • Full-text search requirements
  • • Complex transactions spanning multiple keys
  • • Data with intricate query patterns

Comparison with Other Database Types

Feature Key-Value Relational Document Graph
Query Complexity Simple (by key) Complex (SQL) Moderate (JSON) Graph queries
Performance Fastest Slower Fast Medium
Schema Flexibility None Rigid Flexible Flexible
Scalability Excellent Difficult Good Medium

Real-world Examples

Redis

  • • In-memory key-value store with optional persistence
  • • Supports advanced data types (lists, sets, sorted sets, streams)
  • • Widely used for caching, sessions, real-time analytics
  • • Performance: 100,000+ ops/sec per instance

etcd

  • • Distributed key-value store based on Raft consensus
  • • Strong consistency guarantees
  • • Used for configuration management, service discovery
  • • Built-in watch mechanism for key changes

Memcached

  • • Simple in-memory cache
  • • Very high performance for basic key-value operations
  • • No persistence, data lost on restart
  • • Used for caching in large-scale systems

LevelDB

  • • Embedded key-value store (part of a larger application)
  • • Sorted keys, efficient range queries
  • • Durability through write-ahead logging
  • • Written in C++, bindings available for many languages

Lesson 1.2: Core Requirements Analysis

Defining Functional Requirements

Functional requirements describe what the system does. For our key-value database:

Basic CRUD Operations

Create/Put Operation:
  • INPUT: key (string), value (any data)
  • OUTPUT: success/error
  • SIDE EFFECT: stores or updates value at key
Read/Get Operation:
  • INPUT: key (string)
  • OUTPUT: value (if exists), error (if not found)
  • SIDE EFFECT: none (read-only)
Delete Operation:
  • INPUT: key (string)
  • OUTPUT: success/error
  • SIDE EFFECT: removes key-value pair

Extended Operations

Scan/Range Query:

Iterate over keys in sorted order, return all key-value pairs in a range

Batch Operations:

Apply multiple operations atomically - all succeed or all fail

Exists/Contains:

Check if a key exists without retrieving value

Defining Non-Functional Requirements

Non-functional requirements describe how well the system performs. These are critical for production systems.

Performance Requirements

Throughput:
  • • Single-threaded: 100,000+ ops/sec for in-memory operations
  • • Multi-threaded: 1,000,000+ ops/sec with 16 cores
  • • Network operations: 50,000+ ops/sec over TCP
Latency (p99 - 99th percentile):
  • • In-memory reads: < 1 millisecond
  • • In-memory writes: < 5 milliseconds
  • • Disk reads (cached): < 10 milliseconds
  • • Disk writes (persisted): < 100 milliseconds

Durability Requirements

Write-Ahead Logging (WAL):
  • • Every write must be logged to disk before returning success
  • • Survives system crashes
  • • No data loss on power failure
Durability Levels:
  • • No durability: Fastest, data lost on crash (for caching only)
  • • Write-ahead log: Moderate speed, no data loss
  • • Synchronous writes: Slowest, highest durability guarantee

Consistency vs Availability Trade-off (CAP Theorem)

The CAP theorem states a distributed system can guarantee at most two of three properties:

Consistency

All nodes see the same data

Availability

System responds to requests

Partition Tolerance

System works despite network splits

Lesson 1.3: Architecture Planning

High-level System Design

Our key-value database will have layered architecture:

┌─────────────────────────────────────────────┐
│         Client Applications                  │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Network Layer (TCP/Protocol Handling)     │
│  - Connection management                    │
│  - Request/Response serialization           │
│  - Pipelining support                       │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Query Processor & Executor                │
│  - Parse commands                           │
│  - Execute operations                       │
│  - Enforce consistency                      │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Transaction & Concurrency Manager        │
│  - MVCC implementation                      │
│  - Locking mechanism                        │
│  - Transaction coordination                 │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Storage Engine (Memory + Disk)            │
│  ┌──────────────┬──────────────────────────┐│
│  │  In-Memory   │  Persistence Layer       ││
│  │  - HashMap   │  - WAL (Write-Ahead Log) ││
│  │  - LSM Tree  │  - SSTables              ││
│  │  - Cache     │  - Compaction            ││
│  └──────────────┴──────────────────────────┘│
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│    Replication & Consensus (optional)       │
│  - Leader-follower sync                     │
│  - Raft consensus                           │
│  - Conflict resolution                      │
└─────────────────────────────────────────────┘

Component Identification

1. Storage Engine (Core)

Responsibility: Physically store and retrieve data

  • • Memtable: In-memory sorted map for recent writes
  • • SSTable: On-disk sorted key-value files
  • • LSM Tree: Multi-level structure for write optimization
  • • Cache: Hot data in memory for faster access
  • • Bloom Filters: Probabilistic lookup structures

2. Persistence Layer

Responsibility: Ensure durability and crash recovery

  • • Write-Ahead Log (WAL): Record all writes before applying
  • • Checkpoint Mechanism: Periodic snapshots of state
  • • Recovery Manager: Replay logs on startup
  • • Compaction: Merge files and reclaim space

3. Concurrency Manager

Responsibility: Handle multiple concurrent operations safely

  • • Lock Manager: Distribute locks, prevent deadlocks
  • • MVCC Engine: Multiple versions for non-blocking reads
  • • Transaction Coordinator: Manage transaction lifecycle
  • • Conflict Detector: Identify concurrent modifications

Interface Design and API Contracts

Core Database Interface

// Store defines the key-value store interface
type Store interface {
    // Single key operations
    Get(ctx context.Context, key []byte) ([]byte, error)
    Put(ctx context.Context, key, value []byte) error
    Delete(ctx context.Context, key []byte) error
    
    // Batch operations
    Batch(ctx context.Context, ops []Operation) error
    
    // Iteration
    Scan(ctx context.Context, start, end []byte) (Iterator, error)
    
    // Lifecycle
    Close() error
    
    // Durability
    Flush() error
    Snapshot() (Snapshot, error)
}

// Iterator for range queries
type Iterator interface {
    Next() bool
    Key() []byte
    Value() []byte
    Err() error
    Close() error
}

Project Structure and Organization

kvdb/
├── cmd/
│   ├── server/           # Main server executable
│   │   └── main.go
│   └── cli/              # Command-line client
│       └── main.go
├── pkg/
│   ├── storage/          # Storage engine
│   │   ├── store.go
│   │   ├── memtable.go
│   │   ├── sstable.go
│   │   └── cache.go
│   ├── persistence/      # WAL & recovery
│   │   ├── wal.go
│   │   └── recovery.go
│   ├── network/          # Network layer
│   │   ├── server.go
│   │   ├── client.go
│   │   └── protocol.go
│   ├── concurrency/      # Concurrency control
│   │   ├── lock.go
│   │   ├── mvcc.go
│   │   └── transaction.go
│   ├── monitor/          # Observability
│   │   ├── metrics.go
│   │   └── logger.go
│   └── config/           # Configuration
│       └── config.go
├── internal/
│   └── testutil/         # Test utilities
├── tests/
│   ├── integration_test.go
│   └── benchmark_test.go
├── docs/
│   ├── architecture.md
│   ├── api.md
│   └── operational_guide.md
├── go.mod
├── go.sum
├── Makefile
└── README.md

Summary & Learning Outcomes

By completing this lesson, you should understand:

  • • What key-value databases are and why they're different from relational databases
  • • Key characteristics - strengths and limitations
  • • Real-world use cases and when NOT to use them
  • • Functional requirements - CRUD operations and extended features
  • • Non-functional requirements - performance, durability, availability targets
  • • Architecture overview - layered design with clear components
  • • Interface contracts - how components communicate
  • • Project structure - organized codebase layout

Key Takeaways

  • • Key-value stores excel at simple, fast access by exact key match
  • • Non-functional requirements (performance, durability) are as important as functional requirements
  • • Clean architecture with separation of concerns enables testing and evolution
  • • Trade-offs are inevitable - document decisions and their rationale
  • • Start simple, add features incrementally

Next Steps

Next Week: We'll implement the core in-memory data structure and begin building our storage engine.

Preparation for Lab:

  • • Review Go basics: maps, slices, goroutines, interfaces
  • • Familiarize yourself with Go's testing framework
  • • Understand benchmarking concepts

Hands-on Lab: Set up project repository and design initial API interface

Create the basic project structure and define the core interfaces for our key-value database.

Lab Tasks:

  • 1. Create the project directory structure as outlined above
  • 2. Initialize Go module with proper dependencies
  • 3. Define the core Store interface in pkg/storage/store.go
  • 4. Create basic unit tests for the interface
  • 5. Set up Makefile with common commands (build, test, benchmark)