System Design Building Block: K-V Store
What is a K-V store?
A key-value store, also referred to as a key-value database, is a type of NoSQL database.
Each key is unique, and the value can be any type of data.
Why do we need a K-V store?
The traditional database, i.e. the relational database, is relatively heavyweight and hard to scale, because we need to maintain relations across tables and preserve ACID properties.
In a scalable system, not all data needs a strict schema; some of it is free-form, and a K-V store is a good option in that scenario.
Components
In a distributed K-V store, we need at least the following components/requirements:
- data partition: a familiar concept; we split the data into smaller partitions and distribute them among multiple servers.
- consistent hashing is a good fit here.
- data replication
- with consistent hashing, after a key is mapped to a position on the ring, we walk clockwise from that position and choose the first N distinct servers to store the data copies.
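The partitioning and replication steps above can be sketched together on one ring. This is a minimal sketch; the server names, the MD5-based hash function, and N = 3 replicas are illustrative assumptions, not part of any specific store:

```python
import bisect
import hashlib

def ring_position(key: str) -> int:
    """Map a string to a position on the ring (0 .. 2^32 - 1); MD5 is an
    arbitrary choice here, any well-distributed hash works."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    """Toy consistent-hash ring with N-way replication."""

    def __init__(self, servers, replicas=3):
        self.replicas = replicas
        # each server occupies one position on the ring, sorted clockwise
        self.ring = sorted((ring_position(s), s) for s in servers)

    def nodes_for(self, key: str):
        """Walk clockwise from the key's position and pick the first
        `replicas` distinct servers to hold copies of the data."""
        positions = [pos for pos, _ in self.ring]
        i = bisect.bisect(positions, ring_position(key)) % len(self.ring)
        chosen = []
        while len(chosen) < min(self.replicas, len(self.ring)):
            server = self.ring[i][1]
            if server not in chosen:
                chosen.append(server)
            i = (i + 1) % len(self.ring)
        return chosen
```

Adding or removing one server only shifts keys in its neighborhood of the ring, which is why consistent hashing scales better than `hash(key) % num_servers`.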
- consistency
- quorum consensus can guarantee consistency: given N replicas, we choose a write quorum W and a read quorum R such that W + R > N. The set of servers acknowledging a write and the set answering a read must then overlap, so at least one server in every read has the latest data.
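The overlap argument can be checked by brute force over all possible write and read sets (the replica counts below are just examples):

```python
from itertools import combinations

def overlap_guaranteed(n: int, w: int, r: int) -> bool:
    """Check exhaustively that every write set of size W intersects
    every read set of size R, given N replicas."""
    servers = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(servers, w)
               for rs in combinations(servers, r))

# The quorum condition W + R > N is exactly when overlap is guaranteed:
assert overlap_guaranteed(3, 2, 2)      # 2 + 2 > 3: a read always sees the write
assert not overlap_guaranteed(3, 1, 2)  # 1 + 2 = 3: a read can miss the write
```

Tuning W and R trades latency for consistency: W = 1 gives fast writes, R = 1 gives fast reads, and W = R = (N + 1) // 2 is a common balanced choice.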
- inconsistency resolution
- to resolve inconsistency, we first need a technique to detect it.
- versioning and vector clocks are used to resolve inconsistency issues.
- a vector clock is a list of [server, version] pairs associated with a data item, enabling us to check whether one version precedes, succeeds, or conflicts with another.
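A minimal sketch of that comparison, assuming each clock is stored as a `{server: version}` dict (server names are hypothetical):

```python
def compare(vc_a: dict, vc_b: dict) -> str:
    """Compare two vector clocks. A version precedes another if every
    counter is <= the other's; if neither dominates, they conflict."""
    servers = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(s, 0) <= vc_b.get(s, 0) for s in servers)
    b_le_a = all(vc_b.get(s, 0) <= vc_a.get(s, 0) for s in servers)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "precedes"
    if b_le_a:
        return "succeeds"
    return "conflict"  # concurrent writes; the client must reconcile

assert compare({"s1": 1}, {"s1": 2}) == "precedes"
assert compare({"s1": 2, "s2": 1}, {"s1": 1, "s2": 2}) == "conflict"
```

A "conflict" result means neither version descends from the other, so the store cannot pick a winner automatically; resolution is typically pushed to the client.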
- failure handling
- as above, to handle failures we first need a technique to detect them.
- it is insufficient to believe a server is down just because other servers say so.
- a better solution is a decentralized failure-detection method such as the gossip protocol:
- each node maintains a node membership list
- each node periodically increments its heartbeat counter
- each node periodically sends heartbeats to a set of random nodes, which in turn propagate them to other sets of nodes
- once nodes receive heartbeats, the membership list is updated with the latest info
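The gossip steps above can be sketched roughly as follows; the node names, the fan-out of 2, and the timeout are all assumptions for illustration:

```python
import random
import time

class Node:
    """Minimal gossip sketch: each node keeps a membership list of
    (heartbeat counter, local time last seen) per member."""

    def __init__(self, name, peers):
        self.name = name
        self.peers = peers                    # other Node objects
        self.heartbeat = 0
        self.members = {name: (0, time.time())}

    def tick(self):
        """Periodically increment own heartbeat and gossip to random peers."""
        self.heartbeat += 1
        self.members[self.name] = (self.heartbeat, time.time())
        for peer in random.sample(self.peers, min(2, len(self.peers))):
            peer.receive(self.members)

    def receive(self, remote_members):
        """Merge a remote list: keep the entry with the higher heartbeat."""
        for name, (hb, _) in remote_members.items():
            local_hb, _ = self.members.get(name, (-1, 0))
            if hb > local_hb:
                self.members[name] = (hb, time.time())

    def suspected_down(self, timeout=5.0):
        """Members whose counter hasn't advanced within `timeout` are suspect."""
        now = time.time()
        return [n for n, (_, seen) in self.members.items()
                if now - seen > timeout and n != self.name]
```

Because a counter only counts as alive when it keeps increasing, a node is marked suspect by each observer independently; no single node's opinion is trusted on its own.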
- failures can be divided into temporary failures and permanent failures
- temporary failure: the node is down temporarily and will come back shortly
- in this case, we can use a technique called sloppy quorum: instead of picking the first W servers on the consistent-hash ring, we pick the first W healthy servers to write to; when the down server comes back, the data is transferred back to it (hinted handoff)
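A sketch of picking a sloppy-quorum write set; the ring order and server names are hypothetical:

```python
def sloppy_quorum_targets(ring_order, healthy, w):
    """Pick the first W *healthy* servers in clockwise ring order.
    Skipped (down) servers are recorded so their data can be handed
    back to them when they recover (hinted handoff)."""
    targets, hinted_for = [], []
    for server in ring_order:
        if len(targets) == w:
            break
        if server in healthy:
            targets.append(server)
        else:
            hinted_for.append(server)  # a substitute holds this server's write
    return targets, hinted_for

# Ring order s0..s4, s1 is down, W = 3:
targets, hinted = sloppy_quorum_targets(
    ["s0", "s1", "s2", "s3", "s4"], {"s0", "s2", "s3", "s4"}, 3)
assert targets == ["s0", "s2", "s3"]
assert hinted == ["s1"]
```

The trade-off: writes stay available during the outage, but a strict quorum guarantee is temporarily weakened, hence "sloppy".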
- permanent failure: the node is down permanently and cannot recover soon; in this case, we cannot assume the data on each replica is up to date
- we can adopt an anti-entropy protocol to keep replicas in sync: it involves comparing each piece of data and updating each replica to the newest version
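A naive anti-entropy pass might look like the sketch below, assuming each replica maps keys to (version, value) pairs. Real Dynamo-style stores compare Merkle trees first so they only transfer the keys that actually differ, instead of scanning everything:

```python
def anti_entropy(replica_a: dict, replica_b: dict) -> None:
    """Naive full-scan sync: for every key on either replica, copy the
    pair with the higher version number to the stale side."""
    for key in set(replica_a) | set(replica_b):
        va = replica_a.get(key, (0, None))   # (version, value); 0 = missing
        vb = replica_b.get(key, (0, None))
        newest = max(va, vb)                 # tuple compare: version decides
        replica_a[key] = newest
        replica_b[key] = newest
```

After a pass, both replicas agree on the newest version of every key; the cost is O(total keys), which is what the Merkle-tree optimization avoids.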
Integrations
- A client communicates with the K-V store through an API
- A coordinator is a node that acts as a proxy between the client and the K-V store
Write Path
- The coordinator knows which node should store the data and forwards the client's request to that node
- The write request is first persisted to a commit log file
- Then the data is saved in the memory cache
- When the memory cache is full, data is flushed to a Sorted String Table (SSTable) on disk
Read Path
- The coordinator knows which node holds the data and forwards the client's request to that node
- First check the memory cache to see if the data is there
- If the data is not found there, go to the SSTable on disk to fetch it
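The write and read paths above can be sketched together on a single node. Plain lists and dicts stand in for the real on-disk commit log and SSTables, and the flush threshold is an arbitrary illustrative value:

```python
class StorageNode:
    """Sketch of one node's write path (commit log -> memtable -> SSTable)
    and read path (memtable first, then newest SSTable)."""

    def __init__(self, flush_threshold=4):
        self.commit_log = []   # stand-in for an append-only commit log on disk
        self.memtable = {}     # in-memory cache of recent writes
        self.sstables = []     # newest-first list of flushed, sorted tables
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))       # 1. persist to commit log
        self.memtable[key] = value                 # 2. save in memory cache
        if len(self.memtable) >= self.flush_threshold:
            # 3. cache full: flush as a Sorted String Table (sorted by key)
            self.sstables.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        if key in self.memtable:                   # 1. check the memory cache
            return self.memtable[key]
        for table in self.sstables:                # 2. newest SSTable first
            if key in table:
                return table[key]
        return None                                # key not on this node
```

Checking SSTables newest-first matters: the same key can appear in several tables, and only the most recently flushed copy is current.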