
What is a Blob Store?

A blob store is a storage solution for unstructured data such as videos, photos, and binary executables.

Blob stands for Binary Large OBject: a collection of binary data stored as a single unit.

Why do we need a blob store?

Data-intensive applications require a storage solution that is easily scalable, reliable, and highly available, so that they can store large media files.

Design

Requirements

  • CRD: users should be able to create, read (retrieve), and delete blobs.

High level design

Components

  • clients: the users or programs that invoke any of the API functions
  • rate limiter: limits the number of requests from the same IP within a time window
  • load balancer: distributes incoming network traffic among a group of servers
  • front-end server: forwards the user’s request for adding or deleting data to the appropriate storage servers
  • data node: holds the actual blob data; blobs are split into small, fixed-size pieces called chunks
  • manager node: the core component that manages all data nodes; it stores information about storage paths
  • metadata storage: a distributed database used by the manager node to store all the metadata
    • blob metadata: records where each blob (and each of its chunks) is stored
  • monitoring service
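To make the manager node’s bookkeeping concrete, here is a minimal sketch of the metadata it might keep. The names (`Chunk`, `BlobMetadata`, `CHUNK_SIZE`) and the 4 MB chunk size are illustrative assumptions, not part of any real system:

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 4 * 1024 * 1024  # assumed fixed chunk size, e.g. 4 MB

@dataclass
class Chunk:
    chunk_id: str
    data_node: str                              # data node holding the primary copy
    replicas: list[str] = field(default_factory=list)  # data nodes holding replicas

@dataclass
class BlobMetadata:
    blob_id: str
    size: int
    chunks: list[Chunk] = field(default_factory=list)  # ordered: concatenating
                                                       # chunks reconstructs the blob
```

The key design point is that the blob-to-chunk mapping lives only in metadata storage; data nodes just store opaque chunks.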

Integration

write

  • The client generates an upload request. If the request passes the rate limiter, the load balancer forwards it to one of the front-end servers.
  • The front-end server then asks the manager node which data nodes it should contact to store the blob.
  • The manager node assigns the blob a unique ID, splits the large blob into smaller chunks, and assigns each chunk a data node where it will eventually be stored.
  • After determining the chunk-to-data-node mapping, the front-end server writes the chunks to the assigned data nodes.
  • Each chunk is also replicated for redundancy. This decision is made at the manager node too, so it also allocates the storage and data nodes for the replicas.
  • The manager node stores the blob metadata in the metadata storage.
  • After the blob is written, a fully qualified path of the blob is returned to the client.
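The chunking and assignment steps above can be sketched in a few lines. This is an illustrative toy (tiny 4-byte chunks, round-robin placement); a real manager node would use a larger chunk size and a smarter placement policy:

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split a blob into fixed-size chunks; the last chunk may be smaller."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def assign_chunks(chunks: list[bytes], data_nodes: list[str]) -> dict[str, str]:
    """Assign each chunk to a data node (round-robin, purely illustrative)."""
    return {f"chunk-{i}": data_nodes[i % len(data_nodes)]
            for i, _ in enumerate(chunks)}

blob = b"hello blob store"                         # 16 bytes
chunks = split_into_chunks(blob, chunk_size=4)     # 4 chunks of 4 bytes
mapping = assign_chunks(chunks, ["node-a", "node-b", "node-c"])
```

Note that concatenating the chunks in order reproduces the original blob, which is exactly what the read path relies on.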

read

  • When a read request reaches the front-end server, it asks the manager node for the blob’s metadata.
  • The manager node authorizes the request, then looks up the chunks for that blob in the metadata storage.
  • The manager node returns the chunks and their mapping to the client.
  • The client then reads the chunk data directly from the data nodes.
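The read path is just the write path in reverse: fetch each chunk from its data node and concatenate them in order. A minimal in-memory sketch (the `stores` dict stands in for real data nodes):

```python
# Hypothetical in-memory stand-in for data nodes holding chunks.
stores = {
    "node-a": {"c0": b"hello "},
    "node-b": {"c1": b"blob "},
    "node-c": {"c2": b"store"},
}

# Ordered (chunk_id, data_node) pairs, as returned by the manager node.
chunk_map = [("c0", "node-a"), ("c1", "node-b"), ("c2", "node-c")]

def read_blob(chunk_map, stores) -> bytes:
    """Fetch each chunk from its data node and concatenate them in order."""
    return b"".join(stores[node][cid] for cid, node in chunk_map)

blob = read_blob(chunk_map, stores)
```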

delete

Since a blob’s chunks are spread across different data nodes, deleting them from many nodes takes time.

So we don’t actually remove the blob from the blob store right away; we just mark it as “DELETED”.

A separate service called the garbage collector then cleans up the marked blobs and any resulting metadata inconsistencies offline.
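The soft-delete plus offline garbage collection idea can be sketched like this (in-memory dicts stand in for metadata storage and the data nodes; all names are illustrative):

```python
# Toy metadata store and chunk store.
blobs = {"b1": {"status": "ACTIVE", "chunks": ["c0", "c1"]}}
chunk_store = {"c0": b"data0", "c1": b"data1"}

def delete_blob(blob_id: str) -> None:
    # User-facing delete: just flip a flag, no chunk I/O on the critical path.
    blobs[blob_id]["status"] = "DELETED"

def garbage_collect() -> None:
    # Offline pass: reclaim the chunks of DELETED blobs, then drop their metadata.
    for blob_id in [b for b, m in blobs.items() if m["status"] == "DELETED"]:
        for cid in blobs[blob_id]["chunks"]:
            chunk_store.pop(cid, None)
        del blobs[blob_id]
```

The user sees a fast delete (one flag write), while the expensive multi-node cleanup happens asynchronously.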

Considerations

Billions of blobs are stored and read.

Scanning all data nodes to find the ones that contain a specific blob would be very slow.

The solution is to group data nodes and call each group a partition.

We can partition the blobs based on their complete path, which is the combination of account ID, container ID, and blob ID.
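One simple way to map a full path to a partition is to hash it. This is an assumed scheme for illustration; real systems may instead use range partitioning or consistent hashing to make rebalancing cheaper:

```python
import hashlib

def partition_for(account_id: str, container_id: str,
                  blob_id: str, num_partitions: int = 8) -> int:
    """Pick a partition by hashing the blob's full path (illustrative)."""
    path = f"{account_id}/{container_id}/{blob_id}"
    digest = hashlib.sha256(path.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because the hash is deterministic, any front-end server can compute which partition to ask without consulting a central index.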

Evaluation

As a storage solution, its most important property is availability.

availability

  • We add several layers of redundancy to increase availability:
    • We keep four replicas of each blob, and the placement of the replicas is carefully chosen as well.
    • We also add caches at several layers, e.g. at the front-end servers or at the clients.
  • We keep a backup of the manager node.

scalability

  • Partitioning, and splitting blobs into small fixed-size chunks, helps us scale.
  • Load is balanced across the data nodes.
    • At the same time, the manager node might become a bottleneck.

Consistency

We synchronously replicate the disk data blocks inside a storage cluster. This is done on the user’s critical path, which is what provides strong consistency.
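Synchronous replication on the critical path means a write only succeeds once every replica has acknowledged it. A minimal sketch, assuming in-memory `DataNode` stubs (real nodes would persist to disk and the writes would go out in parallel):

```python
class DataNode:
    """Minimal in-memory replica stub."""
    def __init__(self):
        self.store = {}

    def put(self, chunk_id: str, data: bytes) -> bool:
        self.store[chunk_id] = data
        return True  # acknowledge the write

def replicated_write(chunk_id: str, data: bytes, replicas: list[DataNode]) -> bool:
    """Block until every replica acks; only then report success to the user."""
    acks = [node.put(chunk_id, data) for node in replicas]  # sequential for clarity
    if not all(acks):
        raise IOError("a replica failed to acknowledge the write")
    return True
```

Because the client only sees success after all replicas hold the data, any subsequent read from any replica returns the latest value; the trade-off is higher write latency.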

Retrospect

  • The rate limiter -> load balancer -> front-end server -> real service chain is a common pattern.
  • Manager node -> metadata storage -> data node is also a classic design.
  • Delete operations can usually be performed offline (asynchronously) to improve performance.