
What is a Blob Store?

A blob store is a storage solution for unstructured data such as videos, photos, and binary executables.

Blob stands for Binary Large OBject: a collection of binary data stored as a single unit.

Why do we need a blob store?

Data-intensive applications require a storage solution that is easily scalable, reliable, and highly available, so that they can store large media files.

Design

Requirements

  • CRD: users should be able to create, read (retrieve), and delete blobs.

High level design

Components

  • clients: the users or programs that invoke any of the API functions
  • rate limiter: limits the number of requests from the same IP within a time window
  • load balancer: distributes incoming network traffic among a group of servers
  • front-end server: forwards the user’s request for adding or deleting data to the appropriate storage servers
  • data node: holds the actual blob data; blobs are split into small, fixed-size pieces called chunks
  • manager node: the core component that manages all data nodes; it stores information about storage paths
  • metadata storage: a distributed database used by the manager node to store all the metadata
    • blob metadata: records where each blob (and each of its chunks) is stored
  • monitoring service
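To make the manager node’s bookkeeping concrete, here is a minimal sketch of the metadata it might keep. The names (`Chunk`, `BlobMetadata`, `CHUNK_SIZE`) and the 4 MB chunk size are illustrative assumptions, not part of any real system:

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 4 * 1024 * 1024  # assumed fixed chunk size, e.g. 4 MB

@dataclass
class Chunk:
    chunk_id: str
    data_node: str                              # data node holding the primary copy
    replicas: list[str] = field(default_factory=list)  # data nodes holding replicas

@dataclass
class BlobMetadata:
    blob_id: str
    size: int
    chunks: list[Chunk] = field(default_factory=list)  # ordered: concatenating
                                                       # chunks reconstructs the blob
```

The key design point is that the blob-to-chunk mapping lives only in metadata storage; data nodes just store opaque chunks.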

Integration

write

  • The client generates an upload request. If the request passes the rate limiter, the load balancer forwards it to one of the front-end servers.
  • The front-end server then asks the manager node which data nodes it should contact to store the blob.
  • The manager node assigns the blob a unique ID, splits the large blob into smaller chunks, and assigns each chunk a data node where it will eventually be stored.
  • After determining the chunk-to-data-node mapping, the front-end server writes the chunks to the assigned data nodes.
  • Each chunk is also replicated for redundancy. This decision is made at the manager node too, so it also allocates the storage and data nodes for the replicas.
  • The manager node stores the blob metadata in the metadata storage.
  • After the blob is written, a fully qualified path of the blob is returned to the client.
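The chunking and assignment steps above can be sketched in a few lines. This is an illustrative toy (tiny 4-byte chunks, round-robin placement); a real manager node would use a larger chunk size and a smarter placement policy:

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split a blob into fixed-size chunks; the last chunk may be smaller."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def assign_chunks(chunks: list[bytes], data_nodes: list[str]) -> dict[str, str]:
    """Assign each chunk to a data node (round-robin, purely illustrative)."""
    return {f"chunk-{i}": data_nodes[i % len(data_nodes)]
            for i, _ in enumerate(chunks)}

blob = b"hello blob store"                         # 16 bytes
chunks = split_into_chunks(blob, chunk_size=4)     # 4 chunks of 4 bytes
mapping = assign_chunks(chunks, ["node-a", "node-b", "node-c"])
```

Note that concatenating the chunks in order reproduces the original blob, which is exactly what the read path relies on.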

read

  • When a read request reaches the front-end server, it asks the manager node for the blob’s metadata.
  • The manager node authorizes the request, then looks up the chunks for that blob in the metadata storage.
  • The manager node returns the chunks and their mapping to the client.
  • The client then reads the chunk data directly from the data nodes.
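The read path is just the write path in reverse: fetch each chunk from its data node and concatenate them in order. A minimal in-memory sketch (the `stores` dict stands in for real data nodes):

```python
# Hypothetical in-memory stand-in for data nodes holding chunks.
stores = {
    "node-a": {"c0": b"hello "},
    "node-b": {"c1": b"blob "},
    "node-c": {"c2": b"store"},
}

# Ordered (chunk_id, data_node) pairs, as returned by the manager node.
chunk_map = [("c0", "node-a"), ("c1", "node-b"), ("c2", "node-c")]

def read_blob(chunk_map, stores) -> bytes:
    """Fetch each chunk from its data node and concatenate them in order."""
    return b"".join(stores[node][cid] for cid, node in chunk_map)

blob = read_blob(chunk_map, stores)
```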

delete

Since a blob’s chunks are spread across different data nodes, deleting them from many nodes takes time.

So we don’t actually remove the blob from the blob store right away; we just mark it as “DELETED”.

A separate service called the garbage collector then cleans up the marked blobs and any resulting metadata inconsistencies offline.
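The soft-delete plus offline garbage collection idea can be sketched like this (in-memory dicts stand in for metadata storage and the data nodes; all names are illustrative):

```python
# Toy metadata store and chunk store.
blobs = {"b1": {"status": "ACTIVE", "chunks": ["c0", "c1"]}}
chunk_store = {"c0": b"data0", "c1": b"data1"}

def delete_blob(blob_id: str) -> None:
    # User-facing delete: just flip a flag, no chunk I/O on the critical path.
    blobs[blob_id]["status"] = "DELETED"

def garbage_collect() -> None:
    # Offline pass: reclaim the chunks of DELETED blobs, then drop their metadata.
    for blob_id in [b for b, m in blobs.items() if m["status"] == "DELETED"]:
        for cid in blobs[blob_id]["chunks"]:
            chunk_store.pop(cid, None)
        del blobs[blob_id]
```

The user sees a fast delete (one flag write), while the expensive multi-node cleanup happens asynchronously.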

Considerations

Billions of blobs are stored and read.

Scanning all data nodes to find the ones that contain a specific blob would be very slow.

The solution is to group data nodes and call each group a partition.

We can partition the blobs based on their complete path, which is the combination of account ID, container ID, and blob ID.
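One simple way to map a full path to a partition is to hash it. This is an assumed scheme for illustration; real systems may instead use range partitioning or consistent hashing to make rebalancing cheaper:

```python
import hashlib

def partition_for(account_id: str, container_id: str,
                  blob_id: str, num_partitions: int = 8) -> int:
    """Pick a partition by hashing the blob's full path (illustrative)."""
    path = f"{account_id}/{container_id}/{blob_id}"
    digest = hashlib.sha256(path.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because the hash is deterministic, any front-end server can compute which partition to ask without consulting a central index.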

Evaluation

As a storage solution, its most important property is availability.

availability

  • We add several layers of redundancy to increase availability:
    • We keep four replicas of each blob, and the placement of the replicas is carefully chosen as well.
    • We also add caches at several layers, e.g. at the front-end servers or at the clients.
  • We keep a backup of the manager node.

scalability

  • Partitioning, and splitting blobs into small fixed-size chunks, helps us scale.
  • Load is balanced across the data nodes.
    • At the same time, the manager node might become a bottleneck.

Consistency

We synchronously replicate the disk data blocks inside a storage cluster. This is done on the user’s critical path, which is what provides strong consistency.
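Synchronous replication on the critical path means a write only succeeds once every replica has acknowledged it. A minimal sketch, assuming in-memory `DataNode` stubs (real nodes would persist to disk and the writes would go out in parallel):

```python
class DataNode:
    """Minimal in-memory replica stub."""
    def __init__(self):
        self.store = {}

    def put(self, chunk_id: str, data: bytes) -> bool:
        self.store[chunk_id] = data
        return True  # acknowledge the write

def replicated_write(chunk_id: str, data: bytes, replicas: list[DataNode]) -> bool:
    """Block until every replica acks; only then report success to the user."""
    acks = [node.put(chunk_id, data) for node in replicas]  # sequential for clarity
    if not all(acks):
        raise IOError("a replica failed to acknowledge the write")
    return True
```

Because the client only sees success after all replicas hold the data, any subsequent read from any replica returns the latest value; the trade-off is higher write latency.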

Retrospect

  • The rate limiter -> load balancer -> front-end server -> real service chain is a common pattern.
  • Manager node -> metadata storage -> data node is also a classic design.
  • Delete operations can usually be performed offline (asynchronously) to improve performance.