System Design Case Study: Distributed Email System


Let’s design a large-scale distributed email service like Gmail or Outlook.

In 2020, Gmail had 1.8B active users and Outlook had 400M active users.

Step1: Scope the problem

Email services have changed significantly in complexity and scale.

Since an email system can have many features, let’s assume we will design:

send and receive emails
fetch all emails
search emails by subject / senders / body

Step2: High level design

Back-of-the-Envelope Estimation

1B users
QPS for sending email = 10 ^ 5
storage requirement: 1000 PB
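
As a rough check on these figures, the arithmetic can be sketched as follows. The value of 10 emails sent per user per day is an assumption made for illustration and is not stated in this article; the per-user storage is simply the 1000 PB total divided by the number of users.

```python
# Back-of-the-envelope arithmetic for the estimates above.
USERS = 1_000_000_000                 # 1B users (from the estimate above)
EMAILS_SENT_PER_USER_PER_DAY = 10     # ASSUMPTION for illustration only
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400, roughly 10^5

# Average send rate across the whole day (peak traffic would be higher).
send_qps = USERS * EMAILS_SENT_PER_USER_PER_DAY / SECONDS_PER_DAY

TOTAL_STORAGE_BYTES = 1000 * 10**15   # 1000 PB, using decimal petabytes
storage_per_user_gb = TOTAL_STORAGE_BYTES / USERS / 10**9

print(f"send QPS ~ {send_qps:,.0f}")
print(f"storage per user ~ {storage_per_user_gb:.0f} GB")
```

Under that assumption the send rate lands on the order of 10^5 requests per second, and 1000 PB spread over 1B users works out to about 1 GB per user.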

It’s clear that the system has to deal with a lot of data.

Email Knowledge 101

Historically, most mail servers have used mail protocols such as POP, IMAP and SMTP.

SMTP: Simple Mail Transfer Protocol

The standard protocol for sending emails from one mail server to another.
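
For a feel of what handing a message to an SMTP server looks like, here is a minimal sketch using Python’s standard `smtplib` and `email.message` modules. The addresses and the host `smtp.example.com` are placeholders, and the send call is left commented out because it needs a reachable server.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text email message with the standard library."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_smtp(msg: EmailMessage, host: str, port: int = 25) -> None:
    """Hand the message to an SMTP server; host and port are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


msg = build_message("user1@example.com", "user2@example.org", "Hello", "Hi from user 1")
# send_via_smtp(msg, "smtp.example.com")  # not run here: requires a real server
```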

POP: Post Office Protocol

The standard mail protocol for receiving and downloading emails from a remote mail server to a local email client.

Once emails are downloaded to your computer, they are typically deleted from the mail server.

IMAP: Internet Message Access Protocol

Also a standard mail protocol for receiving emails in a local email client. Unlike POP, IMAP keeps the emails on the mail server.

Traditional Mail Server

The process consists of the following steps:

User 1 logs in to the Outlook client, composes an email, and presses the send button.
The email is sent to the Outlook mail server. The Outlook client and the mail server communicate over SMTP.
The Outlook mail server queries DNS to find the address of the recipient’s SMTP server, in this case Gmail’s SMTP server.
The Outlook mail server sends the email to the Gmail mail server.
The Gmail server stores the email to make it available to user 2.
The Gmail client fetches new emails through the IMAP/POP server when user 2 logs in.

Distributed Mail Server

Let’s examine the email sending flow first.

Email Sending Flow

The load balancer makes sure that requests do not exceed the rate limit.
Web servers are responsible for:
1. validating the email
2. passing the email to the outgoing message queue
SMTP outgoing workers pull messages from the outgoing queue and make sure the emails are virus-free.
Outgoing emails are stored in the “Sent Folder”.
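
The sending flow above can be sketched in a single process as follows. This is only an illustration under simplifying assumptions: the in-memory queue stands in for a real message queue, `looks_clean` is a hypothetical placeholder for a virus scanner, the `delivered` list stands in for the actual SMTP hand-off, and rate limiting at the load balancer is left out.

```python
import queue
import re
from collections import defaultdict
from dataclasses import dataclass

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose check


@dataclass
class Email:
    sender: str
    recipient: str
    subject: str
    body: str


outgoing_queue = queue.Queue()     # stands in for the outgoing message queue
sent_folder = defaultdict(list)    # sender -> emails stored in the "Sent Folder"
delivered = []                     # stands in for the real SMTP hand-off


def web_server_accept(email: Email) -> bool:
    """Web server step: validate the email, then pass it to the queue."""
    if not (EMAIL_RE.match(email.sender) and EMAIL_RE.match(email.recipient)):
        return False
    outgoing_queue.put(email)
    return True


def looks_clean(email: Email) -> bool:
    """Hypothetical virus check; a real system would call a scanner."""
    return "EICAR" not in email.body


def smtp_outgoing_worker() -> None:
    """Worker step: drain the queue, check each email, send it, record it."""
    while not outgoing_queue.empty():
        email = outgoing_queue.get()
        if looks_clean(email):
            delivered.append(email)
            sent_folder[email.sender].append(email)
        outgoing_queue.task_done()


accepted = web_server_accept(Email("a@example.com", "b@example.org", "Hi", "Hello"))
rejected = web_server_accept(Email("not-an-address", "b@example.org", "Hi", "Hello"))
smtp_outgoing_worker()
```

Because the web server only validates and enqueues, it can respond quickly while the slower checks happen in the worker.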

Email Receiving Flow

An incoming email arrives at the SMTP load balancer.
The load balancer distributes traffic among the SMTP servers.
Emails are put in the incoming email queue.
Mail processing workers are responsible for time-consuming jobs such as validation.
Emails that pass validation are stored in storage.
When the receiver logs in to the email client, the client fetches the available emails from storage.
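
The receiving side can be sketched in the same spirit. Everything here is an illustrative assumption: `known_users` is a hypothetical account directory, the recipient lookup stands in for the real validation work, and a dictionary stands in for the storage layer.

```python
from collections import defaultdict, deque

known_users = {"user2@example.org"}   # hypothetical account directory
incoming_queue = deque()              # filled by the SMTP servers
mailboxes = defaultdict(list)         # recipient -> stored emails


def smtp_server_receive(recipient: str, subject: str, body: str) -> None:
    """SMTP server step: accept the email and enqueue it for processing."""
    incoming_queue.append({"to": recipient, "subject": subject, "body": body})


def mail_processing_worker() -> None:
    """Worker step: validate each queued email and store the ones that pass."""
    while incoming_queue:
        email = incoming_queue.popleft()
        if email["to"] in known_users:   # stand-in for real validation
            mailboxes[email["to"]].append(email)


def fetch_emails(user: str) -> list:
    """Client step: fetch the emails available for a user after login."""
    return list(mailboxes.get(user, []))


smtp_server_receive("user2@example.org", "Hi", "Hello")
smtp_server_receive("nobody@example.org", "Spam", "...")
mail_processing_worker()
inbox = fetch_emails("user2@example.org")
```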

Step3: Design deep dive

Metadata DB

Let’s examine the access patterns of email data:

Headers are usually small and frequently accessed.
Email bodies can range from small to large.
Emails owned by a user are accessible only to that user.
Data recency affects data usage: users usually read recent emails.

At a high level, an email service should support the following queries:

get all emails for a user
create/delete a specific email
fetch all read/unread emails
mark unread emails as read

Based on these trade-offs, we can choose a relational DB for this use case.
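
To make the queries above concrete, here is a minimal sketch using an in-memory SQLite database. The table name, column names, and index are assumptions chosen for illustration, not a schema from this article. Every query filters by `user_id`, which reflects the point that emails are accessible only to their owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emails (
        email_id    INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,
        subject     TEXT NOT NULL,
        is_read     INTEGER NOT NULL DEFAULT 0,
        received_at TEXT NOT NULL
    )
""")
# Recent emails are read most often, so index by user and recency.
conn.execute("CREATE INDEX idx_user_recent ON emails (user_id, received_at DESC)")

conn.executemany(
    "INSERT INTO emails (user_id, subject, received_at) VALUES (?, ?, ?)",
    [
        (1, "Welcome", "2024-01-01T09:00:00"),
        (1, "Invoice", "2024-01-02T09:00:00"),
        (2, "Hello", "2024-01-02T10:00:00"),
    ],
)


def all_emails(user_id: int) -> list:
    """Get all emails for a user, newest first."""
    return conn.execute(
        "SELECT email_id, subject FROM emails WHERE user_id = ? "
        "ORDER BY received_at DESC",
        (user_id,),
    ).fetchall()


def emails_by_read_state(user_id: int, is_read: bool) -> list:
    """Fetch all read or all unread emails for a user."""
    return conn.execute(
        "SELECT email_id, subject FROM emails WHERE user_id = ? AND is_read = ?",
        (user_id, int(is_read)),
    ).fetchall()


def mark_read(user_id: int, email_id: int) -> None:
    """Mark one of the user's emails as read."""
    conn.execute(
        "UPDATE emails SET is_read = 1 WHERE user_id = ? AND email_id = ?",
        (user_id, email_id),
    )


def delete_email(user_id: int, email_id: int) -> None:
    """Delete one of the user's emails."""
    conn.execute(
        "DELETE FROM emails WHERE user_id = ? AND email_id = ?",
        (user_id, email_id),
    )
```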

Consistency

A distributed DB that relies on replication for high availability must make a fundamental trade-off between consistency and availability.

We decide to trade availability in favor of consistency.

Search

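The search feature in an email system handles far more writes than reads, because every email that is sent or received has to be indexed, while a search runs only when a user asks for one.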

We can leverage Elasticsearch to build an inverted index and support the search features.
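
To illustrate the data structure behind this kind of search, here is a toy in-memory inverted index in plain Python. It is not how Elasticsearch is used or implemented; the function names and the simple tokenizer are assumptions for illustration. Each term maps to the set of email ids that contain it, and a multi-word query returns the emails that contain every term.

```python
import re
from collections import defaultdict

index = defaultdict(set)   # term -> ids of emails containing that term


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_email(email_id: int, subject: str, sender: str, body: str) -> None:
    """Write path: add every term of the email to the inverted index."""
    for term in tokenize(f"{subject} {sender} {body}"):
        index[term].add(email_id)


def search(query: str) -> set:
    """Read path: return ids of emails that contain all query terms."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index_email(1, "Quarterly report", "alice@example.com", "Numbers attached")
index_email(2, "Lunch", "bob@example.com", "Report to the cafe at noon")
```

Note that each indexed email touches many index entries, while a search only reads a few, which fits the write-heavy pattern described above.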

Step4: Wrap up

In this article, we started from the traditional email architecture, then evaluated how to scale it up for the sending flow and the receiving flow separately.

Then we took a deep dive into the choice of DB solution and the factors to consider in that process, and explored how to support search in the system.

Some takeaways:

When to introduce a message queue: when a component is time-consuming, e.g. the SMTP worker, put a message queue in front of it to improve system performance.
When evaluating a DB solution, some things to consider:
1. the frequently used queries
2. the data access patterns

Chengze Li