System Design: Building Blocks - Database


I would say the database is the most important component of a system. If you design a system without thinking carefully about the database, the design is unlikely to succeed.

A proper database is essential for every business or organization.

It is impossible to cover everything about databases in one article.

What’s going to be covered

Types of database
Replication
Partition/Sharding

What is Database

A database is an organized collection of data that can be managed and accessed easily.

Types of Database

Based on the intended use case, the type of information stored, and the storage method, we can divide databases into the following two types.

Relational Database(SQL)

Data stored in a relational database follows a predefined structure (a schema).

Each instance is stored as a row, and the attributes of each instance are stored as columns.

When we talk about relational databases, the first thing that should come to mind is ACID: atomicity, consistency, isolation, and durability.

With ACID guarantees, we can assume that committed data in the database is in a valid state.

Atomic

A transaction is treated as a single atomic operation: it either succeeds completely or fails completely, with no grey area in between.

If any statement within the transaction fails, the whole transaction is rolled back.
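
The rollback behaviour can be sketched with Python's built-in sqlite3 module. The accounts table, the account names, and the helper functions below are made up for illustration. The transfer credits one account first and then debits the other, so a failed debit has to undo a credit that already ran.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit is undone too.
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on a fresh database returns False and leaves both balances unchanged, because the overdraft violates the CHECK constraint and the earlier credit to bob is rolled back with it.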

Consistent

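A transaction moves the database from one valid state to another, so every rule defined on the data (constraints, keys, and so on) still holds afterwards. As a result, users who read the same committed record see the same result.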

Isolation

When multiple transactions run at the same time, they shouldn’t affect each other: the final result should be the same as if the transactions had been executed sequentially.

Durability

Once a transaction has committed successfully, its changes are persisted, even if the system crashes afterwards.

Non-relational Database(NO-SQL)

These databases are used in applications that handle a large volume of semi-structured and unstructured data.

Because we don’t need to maintain a fixed schema for the stored data, the advantages of a NoSQL database include:

simple design

Without a fixed schema to maintain, there is less code to write, debug, and maintain.

scalability

NoSQL makes it easier to scale out, since all the data related to a specific entity (for example, one employee) is stored in one document, and that document can live on a single node.

availability

NoSQL databases support data replication to ensure high availability.

Comparison

	pros	cons
Relational Database	concurrency, integration	impedance mismatch
Non-relational Database	serialize/deserialize data data volume is big

Data Replication

Replication means keeping multiple copies of the same data on different physical nodes, to improve availability and performance.

The main difficulty with replication is keeping the copies in sync as the data changes over time.

Replication Model

In practice, two replication models are commonly adopted.

primary-secondary replication

In this model, only one node, referred to as the primary node, accepts write operations. It then syncs the changes to the other nodes, the secondary nodes.

This model is appropriate when the application is read-heavy, and there shouldn’t be too many secondary nodes, because the primary has to ship every change to each of them.

The problem with this model is consistency.

If we adopt synchronous replication, every write has to wait for the secondaries, so write latency suffers. If we adopt asynchronous replication, a secondary can briefly serve stale data, so consistency becomes a concern.

Another, more severe, problem is the single point of failure: if the primary node fails, the system can only serve read requests and cannot accept any write operations until a secondary is promoted to primary.
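
The trade-off between synchronous and asynchronous replication can be shown with a toy in-memory model. The Primary and Secondary classes and the flush method are hypothetical names for this sketch. A real database ships a replication log over the network rather than calling methods directly.

```python
class Secondary:
    """A read-only copy that applies changes shipped by the primary."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Primary:
    """The only node that accepts writes."""

    def __init__(self, secondaries, synchronous):
        self.data = {}
        self.secondaries = secondaries
        self.synchronous = synchronous
        self.pending = []  # changes not yet shipped (asynchronous mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Acknowledge only after every secondary has applied the change.
            for secondary in self.secondaries:
                secondary.apply(key, value)
        else:
            # Acknowledge immediately; secondaries catch up later.
            self.pending.append((key, value))

    def flush(self):
        """Ship queued changes (stands in for the background sync)."""
        for key, value in self.pending:
            for secondary in self.secondaries:
                secondary.apply(key, value)
        self.pending.clear()
```

In asynchronous mode, a read from a secondary between write and flush returns the old value (or None), which is exactly the stale-read window described above. In synchronous mode the window disappears, but each write does more work before it returns.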

leaderless replication

Leaderless replication removes the single write node of the primary-secondary model.

In this model, every node can accept write operations, so concurrent writes to the same data are inevitable. An effective way to deal with this is to use quorums.

So what is a quorum?

According to Wikipedia, a quorum is the minimum number of members of a deliberative assembly necessary to conduct the business of that group.

Let’s take an example. Assume we have 3 nodes, and every write must be performed by more than half of them (2 in our case). If every read also contacts more than half of the nodes, then at least one of the nodes that answers has the latest write, so the latest data can be returned.
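
The overlap argument can be checked by brute force. The sketch below assumes each replica stores a (version, value) pair and that a read returns the value with the highest version it sees. The function names are made up for illustration. With N nodes, W write acknowledgements, and R read responses, the guarantee holds whenever W + R > N.

```python
from itertools import combinations


def write(replicas, node_ids, version, value):
    """Store (version, value) on the nodes that acknowledged the write."""
    for i in node_ids:
        replicas[i] = (version, value)


def read(replicas, node_ids):
    """Return the value with the highest version among the contacted nodes."""
    newest = max((replicas[i] for i in node_ids), key=lambda pair: pair[0])
    return newest[1]


def quorum_reads_see_latest(n, w, r):
    """Try every write set of size w and read set of size r on n nodes."""
    for write_set in combinations(range(n), w):
        replicas = [(0, "old")] * n
        write(replicas, write_set, 1, "new")
        for read_set in combinations(range(n), r):
            if read(replicas, read_set) != "new":
                return False  # some read missed the latest write
    return True
```

For the 3-node example, quorum_reads_see_latest(3, 2, 2) is True, while quorum_reads_see_latest(3, 1, 1) is False because a read can land on a node the write never reached.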

Data Partition / Sharding

When the data volume no longer fits on a single node, we need to partition the data across multiple nodes, while still providing the properties of a single node to the application.

There are usually two ways to partition: vertical partitioning and horizontal partitioning.

vertical partition

In vertical partitioning, we divide one table into multiple tables, and each sub-table contains a subset of the columns of the original table.

A typical use case is to separate frequently accessed columns from rarely accessed ones, such as BLOB columns or very wide text fields.

The split needs care: columns that are strongly related should stay in one table, and columns with weaker connections can go to separate tables.
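
As a rough illustration, here is a vertical split using Python's built-in sqlite3 module. The table names, columns, and sample row are invented for this sketch. The small, frequently read columns live in one table and the wide columns in another, joined by the same primary key.

```python
import sqlite3


def make_db():
    """Create two tables that together hold one logical employee record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employee_core (
            id INTEGER PRIMARY KEY, name TEXT, salary INTEGER
        );
        CREATE TABLE employee_wide (
            id INTEGER PRIMARY KEY REFERENCES employee_core(id),
            photo BLOB, bio TEXT
        );
        """
    )
    conn.execute("INSERT INTO employee_core VALUES (1, 'Ada', 120)")
    conn.execute(
        "INSERT INTO employee_wide VALUES (1, ?, ?)",
        (b"\x89PNG", "a very long biography"),
    )
    conn.commit()
    return conn


def hot_lookup(conn, emp_id):
    """Frequent query: touches only the narrow table."""
    query = "SELECT name, salary FROM employee_core WHERE id = ?"
    return conn.execute(query, (emp_id,)).fetchone()


def full_record(conn, emp_id):
    """Rare query: joins both halves back into the original row."""
    query = (
        "SELECT c.name, c.salary, w.photo, w.bio "
        "FROM employee_core AS c JOIN employee_wide AS w ON w.id = c.id "
        "WHERE c.id = ?"
    )
    return conn.execute(query, (emp_id,)).fetchone()
```

The cost of the split is the join in full_record, which is why strongly related columns are better kept in the same table.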

horizontal partition

In horizontal partitioning, we divide the table row-wise.

There are two ways to perform a row-wise partition: range-based or hash-based.
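
A minimal sketch of both strategies follows. The boundary values and function names are assumptions chosen for the example, and MD5 is used only as a convenient stable hash, not for security.

```python
import bisect
import hashlib

# Hypothetical split points: ids below 1000 go to partition 0,
# ids from 1000 to 1999 to partition 1, everything else to partition 2.
RANGE_BOUNDARIES = [1000, 2000]


def range_partition(user_id):
    """Pick a partition by where the key falls among sorted boundaries."""
    return bisect.bisect_right(RANGE_BOUNDARIES, user_id)


def hash_partition(key, num_partitions):
    """Pick a partition from a stable hash of the key."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Range partitioning keeps neighbouring keys together, which suits range scans but can create hot spots. Hash partitioning spreads keys more evenly, at the cost of scattering neighbouring keys across partitions.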

One interesting hashing method is called consistent hashing.

In consistent hashing, we assign each server and each item in a distributed hash table a position on an abstract circle, called a ring. Each item belongs to the first server found clockwise from its position, so adding or removing a server only relocates the items next to it. This lets servers and objects scale without compromising the system’s overall performance.
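
A small ring can be sketched as follows. The HashRing class, the use of MD5 for positions, and the 100 virtual positions per server are choices made for this illustration, not the only way to build one.

```python
import bisect
import hashlib


def _position(key):
    """Map a string to a point on the ring (a 64-bit integer)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual positions per server, to even out load
        self._ring = []       # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key):
        """Return the first server clockwise from the key's position."""
        if not self._ring:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._ring, (_position(key), ""))
        return self._ring[i % len(self._ring)][1]  # wrap around the circle
```

When a server is added, every key either stays where it was or moves to the new server. No key moves between two existing servers, which is the property that makes rebalancing cheap.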

request routing

How does a client know which node to connect to when making a request?

This problem is known as service discovery.

The main challenge of service discovery is keeping the client, or the routing service, up to date with changes in how partitions are assigned to nodes.

ZooKeeper can solve this: each node registers itself with ZooKeeper and updates its information there, and ZooKeeper then notifies the routing tier about the change.
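
The notification flow can be sketched with an in-memory stand-in. The Coordinator class below is not ZooKeeper's real API. It only imitates the idea of a watch, where the routing tier registers a callback and is told whenever the partition assignment changes. All names are hypothetical.

```python
import zlib


class Coordinator:
    """In-memory stand-in for a coordination service (not ZooKeeper's API)."""

    def __init__(self):
        self._assignment = {}  # partition id -> node address
        self._watchers = []

    def watch(self, callback):
        """Register a callback and send it the current assignment."""
        self._watchers.append(callback)
        callback(dict(self._assignment))

    def assign(self, partition, node):
        """Record a change and notify every watcher."""
        self._assignment[partition] = node
        for callback in self._watchers:
            callback(dict(self._assignment))


class Router:
    """Routing tier that keeps a local copy of the partition map."""

    def __init__(self, coordinator, num_partitions):
        self.num_partitions = num_partitions
        self.table = {}
        coordinator.watch(self._on_change)

    def _on_change(self, assignment):
        self.table = assignment

    def partition_of(self, key):
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def node_for(self, key):
        return self.table[self.partition_of(key)]
```

Because the router reads from its local table, a lookup needs no extra network hop. The table is only as fresh as the last notification it received.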

Evaluation

Through partitioning and replication, we bring the following to the overall system:

reliability

By replicating the data across different sites, we increase the availability of the system, because more than one site can serve a request.

performance

By partitioning the data across different nodes, we reduce the load on each node, so performance improves.

Chengze Li
