
Everyone knows YouTube. It has a very rich feature set, and it is impossible to cover all of it in a single article.

This article discusses only a specific subset of YouTube's features.

Step1: Scope the problem

Let’s assume the problem scope is:

  • ability to upload videos
  • smooth video streaming
  • can leverage existing cloud services

Step2: High Level Design

CDN and blob storage are the cloud services we will leverage.

Within a limited time frame, choosing the right technology for the job is more important than explaining in detail how that technology works.

At the highest level, the system can be broken into two flows:

  • video uploading
  • video watching

Video Uploading

  • Metadata DB: stores the video metadata.
  • Original Storage: a blob storage system that stores the original video.
  • Transcoding Server: converts a video from one format into other formats, so that we can provide the best video stream for different devices and bandwidth capabilities.
  • CDN: caches the videos. When you watch a video, it is streamed from the CDN.
  • Completion Queue: a message queue that stores information about the transcoding results.
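
The upload flow through these components can be sketched roughly as follows. This is a minimal in-memory illustration, not a real implementation: the class `UploadPipeline`, its method names, and the "processing"/"ready" status values are hypothetical, and the step where a handler reads the completion queue and updates the metadata is an assumption about how the queue would be consumed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class UploadPipeline:
    """Toy in-memory stand-ins for the components listed above (hypothetical)."""
    metadata_db: dict = field(default_factory=dict)       # Metadata DB
    original_storage: dict = field(default_factory=dict)  # blob storage
    cdn: dict = field(default_factory=dict)               # CDN cache
    completion_queue: deque = field(default_factory=deque)

    def upload(self, video_id: str, title: str, data: bytes) -> None:
        # Store the original bytes and record the metadata as "processing".
        self.original_storage[video_id] = data
        self.metadata_db[video_id] = {"title": title, "status": "processing"}

    def transcode(self, video_id: str, formats: list) -> None:
        # Placeholder for real transcoding: tag the bytes with the target format.
        original = self.original_storage[video_id]
        for fmt in formats:
            self.cdn[(video_id, fmt)] = fmt.encode() + b":" + original
        # Announce the result on the completion queue.
        self.completion_queue.append({"video_id": video_id, "formats": formats})

    def handle_completions(self) -> None:
        # Assumed completion handler: mark videos as ready once transcoded.
        while self.completion_queue:
            event = self.completion_queue.popleft()
            entry = self.metadata_db[event["video_id"]]
            entry["status"] = "ready"
            entry["formats"] = event["formats"]


pipeline = UploadPipeline()
pipeline.upload("v1", "Cat video", b"raw")
pipeline.transcode("v1", ["360p", "720p"])
pipeline.handle_completions()
```

In a real system each dictionary would be a separate service, and the completion queue is what lets the transcoding servers and the metadata update run independently of each other.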

Step3: Detailed Design

Several areas of the high-level design deserve more discussion, for example:

Video Transcoding Details

Transcoding a video is computationally expensive and time-consuming.

To make this process easier for developers, we can implement it with a “Directed Acyclic Graph” (DAG) programming model.

For example, the original video can be split into “video”, “audio” and “metadata”. These three parts can be processed in parallel.
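
To make the DAG idea concrete, here is a small sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The task names (`split`, `video_encode`, `audio_encode`, `metadata_extract`, `merge`) and the helper `stages` are made up for illustration. Tasks that land in the same stage have no dependency on each other, so they could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key depends on the tasks in its value set.
dag = {
    "split": set(),
    "video_encode": {"split"},
    "audio_encode": {"split"},
    "metadata_extract": {"split"},
    "merge": {"video_encode", "audio_encode", "metadata_extract"},
}


def stages(graph):
    """Group tasks into stages; tasks within one stage are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        # get_ready() returns every task whose dependencies are all done.
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


for number, tasks in enumerate(stages(dag), start=1):
    print(f"stage {number}: {tasks}")
```

With this graph, the video, audio and metadata tasks end up in the same stage, after the split and before the merge.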

  1. pre-processor:

    1. split video: videos can be split into small chunks to increase parallelism
    2. DAG generation: generate DAG configuration files
  2. DAG scheduler:

    1. split the DAG into stages and put them in the task queue of the resource manager
  3. resource manager:

    responsible for allocating resources efficiently. It contains 3 main components:

    1. task queue: a priority queue that contains the tasks to be executed
    2. worker queue: a priority queue that contains the worker utilization info
    3. running queue: contains info about the currently running tasks
  4. task worker: runs the tasks as defined in the DAG configuration file

  5. encoded video: the final output of the transcoding service
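
The three queues of the resource manager can be sketched with `heapq` from the standard library. This is only an illustration under assumptions: the class `ResourceManager` and its methods are hypothetical, a lower number means a higher task priority, and the scheduling policy (give the most urgent task to the least-utilized worker) is one plausible choice rather than a prescribed one.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    task_queue: list = field(default_factory=list)     # heap of (priority, task)
    worker_queue: list = field(default_factory=list)   # heap of (utilization, worker)
    running_queue: dict = field(default_factory=dict)  # task -> worker

    def add_task(self, priority: int, task: str) -> None:
        heapq.heappush(self.task_queue, (priority, task))

    def add_worker(self, utilization: float, worker: str) -> None:
        heapq.heappush(self.worker_queue, (utilization, worker))

    def dispatch(self):
        """Assign the highest-priority task to the least-utilized worker."""
        if not self.task_queue or not self.worker_queue:
            return None
        _, task = heapq.heappop(self.task_queue)
        _, worker = heapq.heappop(self.worker_queue)
        self.running_queue[task] = worker
        return task, worker

    def finish(self, task: str, utilization: float = 0.0) -> None:
        """Remove a finished task and return its worker to the worker queue."""
        worker = self.running_queue.pop(task)
        self.add_worker(utilization, worker)


manager = ResourceManager()
manager.add_task(2, "encode_audio")
manager.add_task(1, "encode_video")
manager.add_worker(0.7, "worker-a")
manager.add_worker(0.2, "worker-b")
print(manager.dispatch())  # the most urgent task goes to the least-busy worker
```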

To make the system more loosely coupled, we can introduce a message queue (MQ).
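
As a rough illustration of that decoupling, the sketch below connects a producer stage and a consumer stage only through a queue, using `queue.Queue` and `threading` from the standard library as an in-process stand-in for a real message queue. The function `run_pipeline`, the chunk names, and the `None` sentinel are assumptions made for the example.

```python
import queue
import threading


def run_pipeline(chunks):
    """Pass chunks from a producer stage to a consumer stage through a queue."""
    mq = queue.Queue()
    encoded = []

    def producer():
        # The producer only knows about the queue, not about the consumer.
        for chunk in chunks:
            mq.put(chunk)
        mq.put(None)  # sentinel: no more messages

    def consumer():
        # The consumer pulls messages at its own pace until the sentinel arrives.
        while True:
            chunk = mq.get()
            if chunk is None:
                break
            encoded.append(f"encoded({chunk})")

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return encoded


print(run_pipeline(["chunk-1", "chunk-2", "chunk-3"]))
```

Because the stages share nothing but the queue, either side can be scaled or replaced without changing the other.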

Step4: Wrap Up

Some interesting topics are not covered above, for example:

  1. video copyright protection: we can leverage a DRM (Digital Rights Management) system to protect copyrights
  2. visual watermarking: we put an image overlay on top of the video
  3. streaming latency: to speed up the video watching experience, we can leverage a CDN to bring popular videos closer to users