A battle-hardened distributed KV store built from scratch in Rust — Raft consensus, custom LSM tree, consistent hashing, MVCC, and gRPC API. Six crates. Zero compromises.
    # start a 3-node cluster
    $ rustkvd server --node-id 1 --port 7001 --peers 7002,7003
    $ rustkvd server --node-id 2 --port 7002 --peers 7001,7003
    $ rustkvd server --node-id 3 --port 7003 --peers 7001,7002

    # interact via CLI client
    $ rustkvd client put "hello" "world"
    $ rustkvd client get "hello"
rustkvd is a production-ready distributed key-value store built entirely from scratch in Rust. It demonstrates every layer of modern distributed systems — from disk I/O and consensus to cluster membership and client APIs.
The project features a complete Raft consensus implementation for fault-tolerant leader election and log replication, a custom LSM-tree storage engine with WAL crash safety and background compaction, and a consistent hash ring for automatic key routing across nodes.
All communication happens over gRPC (via tonic), with a clean CLI client for both server management and key-value operations. The codebase is structured as a Cargo workspace with six focused crates, each with a single well-defined responsibility.
Each node runs the full stack: gRPC server, Raft state machine, LSM storage engine, and cluster membership. The leader handles writes; followers serve reads from their local state, which can briefly lag the leader.
Built from first principles — no shortcuts, no third-party consensus libraries. Every component is written in safe, idiomatic Rust.
Full Raft implementation: leader election with randomized timeouts, AppendEntries log replication, and InstallSnapshot for lagging followers. Linearizable reads via the ReadIndex protocol are planned.
Write-Ahead Log with CRC32 checksums and fsync, MemTable backed by BTreeMap, SSTable files built from 4 KB blocks with a sparse index, and background compaction.
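To make the checksum idea concrete, here is a minimal, std-only sketch of a CRC32-protected log record. The record layout (`[crc32][key_len][val_len][key][value]`) and the names `encode_record` and `decode_record` are assumptions for illustration, not rustkvd's actual on-disk format.

```rust
/// Bitwise CRC-32 (IEEE polynomial, reflected form 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn read_u32_le(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Hypothetical layout: [crc32: 4][key_len: 4][val_len: 4][key][value].
/// The checksum covers everything after the first four bytes.
fn encode_record(key: &[u8], value: &[u8]) -> Vec<u8> {
    let mut payload = Vec::with_capacity(8 + key.len() + value.len());
    payload.extend_from_slice(&(key.len() as u32).to_le_bytes());
    payload.extend_from_slice(&(value.len() as u32).to_le_bytes());
    payload.extend_from_slice(key);
    payload.extend_from_slice(value);

    let mut record = Vec::with_capacity(4 + payload.len());
    record.extend_from_slice(&crc32(&payload).to_le_bytes());
    record.extend_from_slice(&payload);
    record
}

/// Returns None when the record is truncated or fails its checksum,
/// which is how recovery can detect a torn write at the log tail.
fn decode_record(record: &[u8]) -> Option<(Vec<u8>, Vec<u8>)> {
    if record.len() < 12 {
        return None;
    }
    let stored = read_u32_le(&record[0..4]);
    let payload = &record[4..];
    if crc32(payload) != stored {
        return None;
    }
    let key_len = read_u32_le(&payload[0..4]) as usize;
    let val_len = read_u32_le(&payload[4..8]) as usize;
    let body = &payload[8..];
    if body.len() != key_len.checked_add(val_len)? {
        return None;
    }
    Some((body[..key_len].to_vec(), body[key_len..].to_vec()))
}
```

In a real log the encoded bytes would be appended to a file and flushed with `File::sync_all` before the write is acknowledged; that step is omitted here.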
SHA-256 consistent hash ring with 150 virtual nodes per physical node. Automatic key routing, replication factor 3, and seamless node join/leave.
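The ring with virtual nodes can be sketched roughly as follows. This is an illustration under assumptions: `HashRing` and its methods are hypothetical names, and `DefaultHasher` stands in for SHA-256 so the example needs only the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

const VNODES_PER_NODE: u32 = 150;

/// Stand-in hash. The project uses SHA-256; DefaultHasher keeps this std-only.
fn ring_hash<T: Hash>(item: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    item.hash(&mut hasher);
    hasher.finish()
}

struct HashRing {
    /// Ring position -> physical node id.
    ring: BTreeMap<u64, u32>,
}

impl HashRing {
    fn new() -> Self {
        HashRing { ring: BTreeMap::new() }
    }

    /// Place one ring position per virtual node.
    fn add_node(&mut self, node_id: u32) {
        for vnode in 0..VNODES_PER_NODE {
            self.ring.insert(ring_hash(&(node_id, vnode)), node_id);
        }
    }

    fn remove_node(&mut self, node_id: u32) {
        self.ring.retain(|_, owner| *owner != node_id);
    }

    /// Walk clockwise from the key's position and collect up to `rf`
    /// distinct physical nodes; the first one is the primary owner.
    fn replicas(&self, key: &str, rf: usize) -> Vec<u32> {
        let start = ring_hash(&key);
        let mut owners = Vec::new();
        let clockwise = self.ring.range(start..).chain(self.ring.range(..start));
        for (_, &node) in clockwise {
            if owners.len() == rf {
                break;
            }
            if !owners.contains(&node) {
                owners.push(node);
            }
        }
        owners
    }
}
```

Because only the departed node's positions disappear from the map, removing a node shifts each affected key to the next owner on the ring and leaves all other placements untouched.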
Multi-Version Concurrency Control with monotonic version counters and optional TTL per key. Snapshot reads and atomic version tracking without write locks.
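A simplified picture of versioned entries with TTL is sketched below. `MvccStore`, `put`, and `get_at` are hypothetical names, the clock is passed in as `now` (seconds) to keep the example deterministic, and the sketch is single-threaded, so it does not show the lock-free version tracking mentioned above.

```rust
use std::collections::BTreeMap;

struct Versioned {
    value: Vec<u8>,
    /// Absolute expiry time in seconds, if a TTL was given.
    expires_at: Option<u64>,
}

struct MvccStore {
    next_version: u64,
    /// Keyed by (key, version) so all versions of a key sit next to each other.
    versions: BTreeMap<(String, u64), Versioned>,
}

impl MvccStore {
    fn new() -> Self {
        MvccStore { next_version: 1, versions: BTreeMap::new() }
    }

    /// Stores a new version and returns its monotonically increasing number.
    fn put(&mut self, key: &str, value: &[u8], ttl_secs: Option<u64>, now: u64) -> u64 {
        let version = self.next_version;
        self.next_version += 1;
        let expires_at = ttl_secs.map(|ttl| now + ttl);
        self.versions.insert(
            (key.to_string(), version),
            Versioned { value: value.to_vec(), expires_at },
        );
        version
    }

    /// Snapshot read: the newest version of `key` that is <= `snapshot`.
    /// An expired entry reads as absent rather than exposing an older version.
    fn get_at(&self, key: &str, snapshot: u64, now: u64) -> Option<&[u8]> {
        let lo = (key.to_string(), 0u64);
        let hi = (key.to_string(), snapshot);
        let (_, entry) = self.versions.range(lo..=hi).next_back()?;
        match entry.expires_at {
            Some(deadline) if deadline <= now => None,
            _ => Some(entry.value.as_slice()),
        }
    }
}
```

A reader holding snapshot version `v` keeps seeing the same data while later writes add newer versions beside it, which is why readers never need to block writers.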
Complete Protocol Buffers service definition: Put, Get, Delete, Scan, and Watch (server-streaming). Generated via tonic-build with prost.
Per-SSTable bloom filters eliminate unnecessary disk reads for missing keys. Configurable false-positive rate with compact bit-array representation.
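As a rough sketch of how such a filter can be sized and queried: the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2 give the bit count and hash count for n keys at false-positive rate p. The `BloomFilter` type below is hypothetical and uses `DefaultHasher` with double hashing; rustkvd's actual hash functions may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct BloomFilter {
    bits: Vec<u64>, // compact bit array, 64 bits per word
    num_bits: u64,
    num_hashes: u32,
}

impl BloomFilter {
    /// Sizes the filter for `expected_items` keys at the target rate.
    fn new(expected_items: usize, fp_rate: f64) -> Self {
        let n = expected_items.max(1) as f64;
        let p = fp_rate.max(1e-9).min(0.5); // keep ln(p) finite and negative
        let ln2 = std::f64::consts::LN_2;
        let num_bits = (-n * p.ln() / (ln2 * ln2)).ceil() as u64;
        let num_hashes = ((num_bits as f64 / n) * ln2).round().max(1.0) as u32;
        let words = ((num_bits + 63) / 64) as usize;
        BloomFilter { bits: vec![0; words], num_bits, num_hashes }
    }

    /// Two base hashes; the i-th probe is h1 + i * h2 (double hashing).
    fn hash_pair(key: &[u8]) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        key.hash(&mut a);
        let mut b = DefaultHasher::new();
        (key, 0xB10F_u16).hash(&mut b);
        (a.finish(), b.finish() | 1)
    }

    fn bit_index(&self, h: (u64, u64), i: u32) -> u64 {
        h.0.wrapping_add((i as u64).wrapping_mul(h.1)) % self.num_bits
    }

    fn insert(&mut self, key: &[u8]) {
        let h = Self::hash_pair(key);
        for i in 0..self.num_hashes {
            let bit = self.bit_index(h, i);
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// False means the key is definitely absent; true means "maybe present".
    fn may_contain(&self, key: &[u8]) -> bool {
        let h = Self::hash_pair(key);
        (0..self.num_hashes).all(|i| {
            let bit = self.bit_index(h, i);
            self.bits[(bit / 64) as usize] & (1u64 << (bit % 64)) != 0
        })
    }
}
```

A read path would call `may_contain` before opening an SSTable and skip the file entirely on `false`, since a bloom filter never reports a stored key as absent.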
The Raft implementation covers every phase of the protocol — from leader election and log replication to snapshot transfer and cluster reconfiguration.
Randomized election timeouts (150–300 ms) make split votes unlikely, and a fresh random timeout resolves any that do occur. RequestVote RPC with term checking and log completeness verification ensures only up-to-date nodes can become leader.
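The vote-granting rule can be sketched as below, following the rule in the Raft paper: reject stale terms, step forward on newer terms, and grant the vote only if the candidate's log is at least as up to date as the voter's. The struct and field names are assumptions for illustration, and timeout randomization is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct RequestVote {
    term: u64,
    candidate_id: u32,
    last_log_index: u64,
    last_log_term: u64,
}

#[derive(Debug)]
struct VoterState {
    current_term: u64,
    voted_for: Option<u32>,
    last_log_index: u64,
    last_log_term: u64,
}

impl VoterState {
    /// Returns (voter's current term, vote granted).
    fn handle_request_vote(&mut self, req: &RequestVote) -> (u64, bool) {
        // Term check: a candidate from an older term is always rejected.
        if req.term < self.current_term {
            return (self.current_term, false);
        }
        // A newer term resets our vote for that term.
        if req.term > self.current_term {
            self.current_term = req.term;
            self.voted_for = None;
        }
        // Log completeness: compare last term first, then last index.
        let log_ok = (req.last_log_term, req.last_log_index)
            >= (self.last_log_term, self.last_log_index);
        // At most one vote per term, but re-granting to the same candidate is fine.
        let can_vote = self.voted_for.map_or(true, |id| id == req.candidate_id);

        let granted = log_ok && can_vote;
        if granted {
            self.voted_for = Some(req.candidate_id);
        }
        (self.current_term, granted)
    }
}
```

A real node would persist `current_term` and `voted_for` to disk before replying, so that a restart cannot produce a second vote in the same term.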
AppendEntries RPC replicates entries in parallel to all followers. The commit index advances once a quorum (⌊N/2⌋ + 1 nodes, counting the leader) has stored an entry. Pipelining maximizes throughput.
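The quorum rule can be illustrated with a small helper that derives the commit index from each node's highest replicated index. The function names and the `term_at` callback are hypothetical; the current-term check reflects the Raft paper's rule that a leader only commits entries from its own term by counting replicas.

```rust
/// Majority size for a cluster of `n` nodes: floor(n / 2) + 1.
fn quorum_size(n: usize) -> usize {
    n / 2 + 1
}

/// `match_index` holds the highest replicated log index per node,
/// including the leader's own last index. The result is the highest
/// index that is present on at least a quorum of nodes.
fn quorum_match_index(match_index: &[u64]) -> u64 {
    if match_index.is_empty() {
        return 0;
    }
    let mut sorted = match_index.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending
    sorted[quorum_size(sorted.len()) - 1]
}

/// Advances the commit index only when the quorum-replicated entry
/// belongs to the leader's current term.
fn advance_commit_index(
    commit_index: u64,
    current_term: u64,
    match_index: &[u64],
    term_at: impl Fn(u64) -> u64,
) -> u64 {
    let candidate = quorum_match_index(match_index);
    if candidate > commit_index && term_at(candidate) == current_term {
        candidate
    } else {
        commit_index
    }
}
```

For example, with three nodes at indexes 10, 8, and 5, index 8 is stored on two of three nodes, so it is the highest index eligible for commit.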
InstallSnapshot RPC catches up lagging followers without replaying thousands of log entries. Snapshots are chunked and streamed over gRPC with CRC verification.
| Property | Guarantee | Mechanism | Status |
|---|---|---|---|
| Election Safety | At most one leader per term | Majority vote + term monotonicity | Implemented |
| Log Matching | Identical entries at same (term, index) | AppendEntries consistency check | Implemented |
| Leader Completeness | Leader has all committed entries | Log completeness in RequestVote | Implemented |
| State Machine Safety | All nodes apply same entries in order | Commit index, sequential apply | Implemented |
| Linearizability | Reads reflect latest committed writes | ReadIndex protocol | Planned |
The storage engine is a full Log-Structured Merge tree implementation: optimized for write throughput, with crash safety and O(log N) reads.
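The read path of an LSM tree can be sketched as follows: check the MemTable, then each sorted run from newest to oldest, and stop at the first hit. The `Lsm` and `Entry` types are assumptions for illustration, and in-memory sorted vectors stand in for on-disk SSTables (no blocks, sparse index, or bloom filters here).

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Value(Vec<u8>),
    Tombstone, // deletion marker
}

struct Lsm {
    /// Most recent writes, sorted in memory.
    memtable: BTreeMap<Vec<u8>, Entry>,
    /// Immutable sorted runs, newest first; each run is sorted by key.
    sstables: Vec<Vec<(Vec<u8>, Entry)>>,
}

impl Lsm {
    /// The first layer that knows the key wins, so newer data shadows older.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        if let Some(entry) = self.memtable.get(key) {
            return Self::visible(entry);
        }
        for table in &self.sstables {
            // Binary search within one sorted run: O(log N) per run.
            if let Ok(i) = table.binary_search_by(|(k, _)| k.as_slice().cmp(key)) {
                return Self::visible(&table[i].1);
            }
        }
        None
    }

    /// A tombstone hides every older value of the key.
    fn visible(entry: &Entry) -> Option<&[u8]> {
        match entry {
            Entry::Value(v) => Some(v.as_slice()),
            Entry::Tombstone => None,
        }
    }
}
```

Each lookup inside one run is logarithmic; the total cost also depends on how many runs exist, which is what compaction keeps small.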
The Rust client library wraps the gRPC calls into an ergonomic async API. All operations are async/await native with tokio.
    use rustkvd_client::KVClient;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

        // Simple put
        client.put("greeting", "hello, world").await?;

        // Put with TTL (expires in 60 seconds)
        client.put_with_ttl("session:abc", "token_data", 60).await?;

        println!("✓ Keys written successfully");
        Ok(())
    }
    use rustkvd_client::KVClient;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

        // `get` yields None when the key is missing or expired
        match client.get("greeting").await? {
            Some(value) => {
                println!("value: {}", String::from_utf8_lossy(&value));
            }
            None => println!("key not found"),
        }
        Ok(())
    }
    use futures::StreamExt;
    use rustkvd_client::KVClient;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

        // Server-streaming watch: yields on every change
        let mut stream = client.watch("config:flags").await?;
        while let Some(event) = stream.next().await {
            let ev = event?;
            println!("[{}] {} → {:?}", ev.event_type, ev.key, ev.value);
        }
        Ok(())
    }
    use rustkvd_client::KVClient;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

        // Scan the range ["user:", "user:~") and return at most 100 pairs
        let pairs = client.scan("user:", "user:~", 100).await?;
        for pair in pairs {
            println!("{} → {}", pair.key, String::from_utf8_lossy(&pair.value));
        }
        Ok(())
    }
    use rustkvd_client::KVClient;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let mut client = KVClient::connect("http://127.0.0.1:7001").await?;

        // Delete a key (writes a tombstone into the LSM tree)
        client.delete("session:abc").await?;

        println!("✓ Deleted");
        Ok(())
    }
Complete documentation covering setup, configuration, operations, and internals — from first clone to production cluster.
From zero to running cluster in minutes
All server and client options documented
Deep dive into the consensus layer
LSM tree design and implementation
Managing distributed cluster lifecycle
Running tests and measuring performance
All operations are defined in a single Protocol Buffers schema and served over HTTP/2 via tonic. The client library wraps these in async Rust methods.
Write a key
Writes a key-value pair. The request is routed to the Raft leader, appended to the log, and committed once a quorum acknowledges it. Returns the version of the written entry.
| Field | Type | Description |
|---|---|---|
| key | string | Key to write (UTF-8, any length) |
| value | bytes | Arbitrary byte payload |
| ttl_secs | optional uint64 | Expiry in seconds from now |
Read a key
Reads the latest committed value for a key. Checks bloom filter first to avoid unnecessary disk reads. Returns found=false if the key does not exist or has expired.
| Field | Type | Description |
|---|---|---|
| key | string | Key to look up |
| value | bytes | Returned value (if found) |
| found | bool | Whether the key exists |
Remove a key
Writes a deletion tombstone into the LSM tree. The key becomes invisible to reads immediately after commit. Tombstones are garbage-collected during compaction.
| Field | Type | Description |
|---|---|---|
| key | string | Key to delete |
| success | bool | True if key existed |
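Tombstone garbage collection during compaction can be sketched as a merge of two sorted runs in which the newer run shadows the older one. The `compact` function and `Entry` type are hypothetical, and the `drop_tombstones` flag models the assumption that tombstones are only safe to discard when no older run could still hold the deleted key.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Value(Vec<u8>),
    Tombstone,
}

/// Merges two runs that are each sorted by key with unique keys.
/// On equal keys the entry from `newer` wins.
fn compact(
    newer: &[(Vec<u8>, Entry)],
    older: &[(Vec<u8>, Entry)],
    drop_tombstones: bool,
) -> Vec<(Vec<u8>, Entry)> {
    let mut out = Vec::with_capacity(newer.len() + older.len());
    let (mut i, mut j) = (0, 0);
    while i < newer.len() || j < older.len() {
        // Decide which run holds the smallest remaining key.
        let order = match (newer.get(i), older.get(j)) {
            (Some(n), Some(o)) => n.0.cmp(&o.0),
            (Some(_), None) => Ordering::Less,
            _ => Ordering::Greater,
        };
        let winner = match order {
            Ordering::Less => {
                i += 1;
                &newer[i - 1]
            }
            Ordering::Greater => {
                j += 1;
                &older[j - 1]
            }
            Ordering::Equal => {
                // Same key in both runs: keep the newer entry, skip the older.
                i += 1;
                j += 1;
                &newer[i - 1]
            }
        };
        if !(drop_tombstones && winner.1 == Entry::Tombstone) {
            out.push(winner.clone());
        }
    }
    out
}
```

With `drop_tombstones` set, a deleted key and its tombstone both vanish from the output; with it unset, the tombstone survives so it can keep shadowing older runs.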
Range scan
Returns all key-value pairs in the range [start_key, end_key) up to the specified limit. Results are sorted lexicographically. Efficient due to SSTable sorted ordering.
| Field | Type | Description |
|---|---|---|
| start_key | string | Inclusive lower bound |
| end_key | string | Exclusive upper bound |
| limit | uint32 | Max results to return |
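The half-open range semantics can be illustrated against an in-memory sorted map. This is a sketch only: the free function `scan` is hypothetical, and a `BTreeMap` stands in for the merged view of MemTable and SSTables.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Returns up to `limit` pairs with start_key <= key < end_key,
/// in lexicographic (byte-wise) key order.
fn scan<'a>(
    store: &'a BTreeMap<String, Vec<u8>>,
    start_key: &str,
    end_key: &str,
    limit: usize,
) -> Vec<(&'a str, &'a [u8])> {
    // BTreeMap::range panics on an inverted range, so treat it as empty.
    if start_key >= end_key {
        return Vec::new();
    }
    store
        .range::<str, _>((Bound::Included(start_key), Bound::Excluded(end_key)))
        .take(limit)
        .map(|(k, v)| (k.as_str(), v.as_slice()))
        .collect()
}
```

A prefix scan such as `scan(&store, "user:", "user:~", 100)` works because `~` sorts after the ASCII letters and digits, so the range covers keys of the form `user:<alphanumeric>`.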
Real-time change stream
Opens a long-lived server-streaming RPC. The server pushes a WatchEvent for every Put or Delete that affects the watched key. Uses tokio::sync::broadcast internally.
| Field | Type | Description |
|---|---|---|
| key | string | Key to watch |
| event_type | string | "PUT" or "DELETE" |
| value | bytes | New value (for PUT events) |
Built for educational clarity and systems understanding, while shipping all the primitives real production systems rely on.
| Feature | rustkvd | etcd | Redis | TiKV |
|---|---|---|---|---|
| Raft Consensus | ✓ from scratch | ✓ | ✗ | ✓ |
| LSM Storage | ✓ custom | ✗ B-tree | ✗ in-memory | ✓ RocksDB |
| Key Partitioning | ✓ SHA-256 hash ring | ✗ none (full replication) | ✓ hash slots | ✓ range |
| MVCC | ✓ | ✓ | ✗ | ✓ |
| gRPC API | ✓ | ✓ | ✗ RESP | ✓ |
| Bloom Filters | ✓ | ✗ | ✗ | ✓ |
| Watch / Streaming | ✓ | ✓ | ~ pub/sub | ✓ |
| Written in Rust | ✓ | ✗ Go | ✗ C | ✓ |
| Open / Readable Code | ✓ 6 crates | ~ | ✓ | ~ complex |