Production-Grade · Written in Rust · Open Source

Distributed
Key-Value
Storage

A battle-hardened distributed KV store built from scratch in Rust — Raft consensus, custom LSM tree, consistent hashing, MVCC, and gRPC API. Six crates. Zero compromises.

# start a 3-node cluster
$ rustkvd server --node-id 1 --port 7001 --peers 7002,7003
$ rustkvd server --node-id 2 --port 7002 --peers 7001,7003
$ rustkvd server --node-id 3 --port 7003 --peers 7001,7002

# interact via CLI client
$ rustkvd client put "hello" "world"
$ rustkvd client get "hello"
6 Rust Crates
150 Virtual Nodes (hash ring)
3 Replication Factor
4 KB SSTable Blocks
100% Safe Rust

Systems engineering,
the Rust way

rustkvd is a production-ready distributed key-value store built entirely from scratch in Rust. It demonstrates every layer of modern distributed systems — from disk I/O and consensus to cluster membership and client APIs.

The project features a complete Raft consensus implementation for fault-tolerant leader election and log replication, a custom LSM-tree storage engine with WAL crash safety and background compaction, and a consistent hash ring for automatic key routing across nodes.

All communication happens over gRPC (via tonic), with a clean CLI client for both server management and key-value operations. The codebase is structured as a Cargo workspace with six focused crates, each with a single well-defined responsibility.

Rust 1.95 tokio tonic gRPC prost parking_lot dashmap
📦 rustkvd/Cargo.toml — workspace members
🔷
common
Shared types, error enums, metrics, tracing
💾
storage
WAL · MemTable · SSTable · BloomFilter · MVCC · Compaction
⚖️
raft
Leader election · Log replication · Snapshots
🔗
cluster
Consistent hash ring · Membership · Router
🖥️
server
gRPC server · Node runner · Request dispatch
📡
client
KVClient · CLI · Watch streaming

Three-node cluster
anatomy

Each node runs the full stack: gRPC server, Raft state machine, LSM storage engine, and cluster membership. The leader handles writes; followers serve reads locally.

gRPC Client Consistent Hash Router NODE 1 — LEADER port :7001 KVStore gRPC Service Raft Leader Raft Log LSM: WAL + MemTable SSTable L0 → L1 → L2 MVCC + BloomFilter NODE 2 — FOLLOWER port :7002 gRPC Service Raft Follower WAL + MemTable SSTables MVCC NODE 3 — FOLLOWER port :7003 KVStore gRPC Service Raft Follower Raft Log LSM: WAL + MemTable SSTable L0 → L1 → L2 MVCC + BloomFilter AppendEntries AppendEntries

See it in action

From cluster bootstrap to key-value operations, watch rustkvd handle everything in real time.

rustkvd — distributed kv store
# Clone and build git clone https://github.com/vignesh2027/rustkvd && cd rustkvd cargo build --release Compiling rustkvd workspace (6 crates)... Finished release [optimized] target(s)   # Start 3-node cluster ./target/release/server --node-id 1 --port 7001 --peers 7002,7003 & [INFO] Node 1 started, listening on :7001 [INFO] Raft: starting election (term=1) [INFO] Raft: elected as leader (term=1, votes=2/3)   # Write and read ./target/release/client put user:1001 '{"name":"Alice","role":"admin"}' ✓ OK [routed to node 2, replicated to 3 nodes, version=1]   ./target/release/client get user:1001 key: user:1001 value: {"name":"Alice","role":"admin"} [version=1, ttl=none, node=2]   # Watch for changes (streaming gRPC) ./target/release/client watch user:1001 Watching key "user:1001" — press Ctrl+C to stop [event] PUT user:1001 → {"name":"Alice","role":"superadmin"} (v2)  

Everything a production
system needs

Built from first principles — no shortcuts, no third-party consensus libraries. Every component is written in safe, idiomatic Rust.

⚖️

Raft Consensus

Full Raft implementation: leader election with randomized timeouts, AppendEntries log replication, InstallSnapshot for lagging followers, and linearizable reads.

💾

LSM Storage Engine

Write-Ahead Log with CRC32 checksums and fsync, MemTable backed by BTreeMap, SSTable files with 4KB block format, sparse index, and background compaction.

🔗

Consistent Hashing

SHA-256 consistent hash ring with 150 virtual nodes per physical node. Automatic key routing, replication factor 3, and seamless node join/leave.

🕒

MVCC

Multi-Version Concurrency Control with monotonic version counters and optional TTL per key. Snapshot reads and atomic version tracking without write locks.

📡

gRPC API

Complete Protocol Buffers service definition: Put, Get, Delete, Scan, and Watch (server-streaming). Generated via tonic-build with prost.

🌸

Bloom Filters

Per-SSTable bloom filters eliminate unnecessary disk reads for missing keys. Configurable false-positive rate with compact bit-array representation.

Built from scratch,
not a library

The Raft implementation covers every phase of the protocol — from leader election and log replication to snapshot transfer and cluster reconfiguration.

01

Leader Election

Randomized election timeouts (150–300ms) prevent split votes. RequestVote RPC with term checking and log completeness verification ensures only up-to-date nodes can become leader.

02

Log Replication

AppendEntries RPC replicates entries in parallel to all followers. Commit index advances once a quorum (⌊N/2⌋ + 1) acknowledges an entry. Pipelining maximizes throughput.

03

Snapshot Transfer

InstallSnapshot RPC catches up lagging followers without replaying thousands of log entries. Snapshots are chunked and streamed over gRPC with CRC verification.

Property Guarantee Mechanism Status
Election Safety At most one leader per term Majority vote + term monotonicity Implemented
Log Matching Identical entries at same (term, index) AppendEntries consistency check Implemented
Leader Completeness Leader has all committed entries Log completeness in RequestVote Implemented
State Machine Safety All nodes apply same entries in order Commit index, sequential apply Implemented
Linearizability Reads reflect latest committed writes ReadIndex protocol Planned

Custom LSM tree,
no compromises

The storage engine is a full Log-Structured Merge tree implementation: optimized for write throughput, with crash safety and O(log N) reads.

✍️ Write client PUT
📋 WAL CRC32 + fsync
🧠 MemTable BTreeMap
🗃️ L0 SSTable 4KB blocks
🗜️ Compaction L0→L1→L2

Write-Ahead Log (WAL)

  • CRC32 checksum per entry
  • fsync before ack to client
  • Replay on crash recovery
  • Put / Delete / Checkpoint entries

MemTable

  • In-memory BTreeMap for sorted iteration
  • Configurable size threshold (default 64MB)
  • Immutable flush to L0 SSTable
  • Concurrent reads via parking_lot RwLock

SSTable Format

  • 4KB data blocks with block index
  • Sparse in-memory index (every 16 entries)
  • Bloom filter per file (false-positive <1%)
  • Footer: magic bytes + index offset

Compaction

  • Tiered compaction: L0→L1→L2
  • Background tokio task, non-blocking
  • Merge-sort to eliminate duplicates
  • Tombstone GC after level promotion

MVCC

  • Monotonic AtomicU64 version per write
  • MVCCValue: value, version, deleted, ttl_secs
  • Snapshot isolation for scan operations
  • TTL expiry check on read path

Clean API,
expressive client

The Rust client library wraps the gRPC calls into an ergonomic async API. All operations are async/await native with tokio.

📄 examples/put.rs
use rustkvd_client::{KVClient};

async fn main() -> Result<()> {
    let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

    // Simple put
    client.put("greeting", "hello, world").await?;

    // Put with TTL (expires in 60 seconds)
    client.put_with_ttl("session:abc", "token_data", 60).await?;

    println!("✓ Keys written successfully");
    Ok(())
}
📄 examples/get.rs
use rustkvd_client::{KVClient};

async fn main() -> Result<()> {
    let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

    match client.get("greeting").await? {
        Some(value) => {
            println!("value: {}", String::from_utf8_lossy(&value));
        }
        None => println!("key not found"),
    }
    Ok(())
}
📄 examples/watch.rs
use rustkvd_client::{KVClient};
use futures::StreamExt;

async fn main() -> Result<()> {
    let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

    // Server-streaming watch — yields on every change
    let mut stream = client.watch("config:flags").await?;

    while let Some(event) = stream.next().await {
        let ev = event?;
        println!("[{}] {} → {:?}", ev.event_type, ev.key, ev.value);
    }
    Ok(())
}
📄 examples/scan.rs
use rustkvd_client::{KVClient};

async fn main() -> Result<()> {
    let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

    // Scan a key range with limit
    let pairs = client
        .scan("user:", "user:~", 100)
        .await?;

    for pair in pairs {
        println!("{} → {}",
            pair.key,
            String::from_utf8_lossy(&pair.value));
    }
    Ok(())
}
📄 examples/delete.rs
use rustkvd_client::{KVClient};

async fn main() -> Result<()> {
    let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

    // Delete a key (writes a tombstone into LSM)
    client.delete("session:abc").await?;

    println!("✓ Deleted");
    Ok(())
}

Everything you need
to build and deploy

Complete documentation covering setup, configuration, operations, and internals — from first clone to production cluster.

🚀

Getting Started

From zero to running cluster in minutes

  • Prerequisites: Rust 1.75+, protoc
  • Building from source with cargo
  • Single-node quickstart
  • Multi-node cluster setup
  • First put / get / delete
⚙️

Configuration

All server and client options documented

  • --node-id, --port, --peers flags
  • --data-dir storage path
  • --raft-heartbeat-ms interval
  • --compaction-threshold MB
  • Environment variable overrides
⚖️

Raft Internals

Deep dive into the consensus layer

  • State machine: Follower → Candidate → Leader
  • Term and vote management
  • Log entry lifecycle
  • Commit index and apply pipeline
  • Snapshot creation and transfer
💾

Storage Engine

LSM tree design and implementation

  • WAL format: header + CRC32 + payload
  • MemTable flush policy
  • SSTable binary format specification
  • Bloom filter sizing and FPR
  • Compaction scheduling and merging
🔗

Cluster Operations

Managing distributed cluster lifecycle

  • Adding a new node to the ring
  • Graceful node removal
  • Monitoring cluster health
  • Key rebalancing strategy
  • Network partition handling
🧪

Testing & Benchmarks

Running tests and measuring performance

  • cargo test --workspace (all unit tests)
  • Integration tests with tempfile
  • Chaos testing: node kill/restart
  • Throughput benchmark: put/get/scan
  • Latency percentiles under load

gRPC service
specification

All operations are defined in a single Protocol Buffers schema and served over HTTP/2 via tonic. The client library wraps these in async Rust methods.

PUT

KVStore.Put(PutRequest) → PutResponse

Write a key

Writes a key-value pair. The request is routed to the Raft leader, appended to the log, and committed once a quorum acknowledges. Returns version of the written entry.

FieldTypeDescription
keystringKey to write (UTF-8, any length)
valuebytesArbitrary byte payload
ttl_secsoptional uint64Expiry in seconds from now
GET

KVStore.Get(GetRequest) → GetResponse

Read a key

Reads the latest committed value for a key. Checks bloom filter first to avoid unnecessary disk reads. Returns found=false if the key does not exist or has expired.

FieldTypeDescription
keystringKey to look up
valuebytesReturned value (if found)
foundboolWhether the key exists
DEL

KVStore.Delete(DeleteRequest) → DeleteResponse

Remove a key

Writes a deletion tombstone into the LSM tree. The key becomes invisible to reads immediately after commit. Tombstones are garbage-collected during compaction.

FieldTypeDescription
keystringKey to delete
successboolTrue if key existed
SCAN

KVStore.Scan(ScanRequest) → ScanResponse

Range scan

Returns all key-value pairs in the range [start_key, end_key) up to the specified limit. Results are sorted lexicographically. Efficient due to SSTable sorted ordering.

FieldTypeDescription
start_keystringInclusive lower bound
end_keystringExclusive upper bound
limituint32Max results to return
STREAM

KVStore.Watch(WatchRequest) → stream WatchEvent

Real-time change stream

Opens a long-lived server-streaming RPC. The server pushes a WatchEvent for every Put or Delete that affects the watched key. Uses tokio::sync::broadcast internally.

FieldTypeDescription
keystringKey to watch
event_typestring"PUT" or "DELETE"
valuebytesNew value (for PUT events)

How rustkvd compares

Built for educational clarity and systems understanding, while shipping all the primitives real production systems rely on.

Feature rustkvd etcd Redis TiKV
Raft Consensus from scratch
LSM Storage custom B-tree in-memory RocksDB
Consistent Hashing SHA-256 ~ range slots range
MVCC
gRPC API RESP
Bloom Filters
Watch / Streaming ~ pub/sub
Written in Rust Go C
Open / Readable Code 6 crates ~ ~ complex

Up and running
in three steps

1

Clone & Build

# requires Rust 1.75+ and protoc
git clone https://github.com/
  vignesh2027/rustkvd
cd rustkvd
cargo build --release
2

Start Cluster

# node 1 (leader)
./target/release/server \
  --node-id 1 --port 7001 \
  --peers 7002,7003 &

# repeat for nodes 2 & 3
3

Write & Read

# write a key
./target/release/client \
  put "hello" "world"

# read it back
./target/release/client get "hello"