
Step 1: Scope the problem

The most important thing is to understand which type of chat system to design: one-on-one or group chat.

Other important questions to ask:

  1. What scale must the system support?
  2. Which content types are supported?
  3. How long should the chat history be stored?

Step 2: High-level design

Each client connects to the chat service, which supports all the features mentioned above.

For a chat system, choosing a proper network protocol is important.

On the sender side, HTTP is fine. The receiver side is trickier: an HTTP request is initiated by the client, so it is not trivial for the server to push messages to the client.

There are several techniques to simulate a server-initiated connection:

  1. polling
  2. long polling
  3. WebSocket

Polling

Polling is a technique in which the client periodically asks the server whether any messages are available.

It can be costly, depending on the polling frequency.
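As a rough illustration of that cost, the sketch below polls a hypothetical `fetch` callable a fixed number of times and counts how many requests came back empty. The name `poll`, its parameters, and the idea of passing the server call in as a function are assumptions made for this example only.

```python
import time
from typing import Callable, List, Tuple


def poll(fetch: Callable[[], List[str]], interval_s: float, rounds: int) -> Tuple[List[str], int]:
    """Ask the server for messages `rounds` times, sleeping between requests.

    `fetch` is a hypothetical stand-in for an HTTP request that returns
    any messages that are currently waiting (possibly none).
    """
    received: List[str] = []
    empty_responses = 0
    for _ in range(rounds):
        batch = fetch()
        if batch:
            received.extend(batch)
        else:
            # The request was made, but there was nothing to deliver.
            empty_responses += 1
        time.sleep(interval_s)
    return received, empty_responses
```

A shorter interval lowers delivery latency but increases the number of requests, many of which may return nothing when the user chats infrequently.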

Long polling

The client holds the connection open until new messages are actually available or a timeout threshold is reached.

With this method, the server has no good way to know whether the client has disconnected.

It is also inefficient: if a user doesn’t chat frequently, long polling still has to re-establish the connection after each timeout.
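The waiting behavior can be sketched with a blocking queue. This is a minimal model, assuming a per-user `inbox` queue on the server. The function name `long_poll` and the queue-based design are illustrative choices, not something the design above prescribes.

```python
import queue
from typing import Optional


def long_poll(inbox: "queue.Queue[str]", timeout_s: float) -> Optional[str]:
    """Block until a message arrives in `inbox` or `timeout_s` elapses.

    Returns the message, or None on timeout. After a None, the client
    would have to issue a new request, which is the reconnect cost
    described above.
    """
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return None
```

When a message is already waiting, the call returns immediately. When the inbox stays empty, the caller pays for the full timeout and then has to reconnect.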

WebSocket

A WebSocket connection is initiated by the client.

It is bidirectional and persistent. It starts its life as an HTTP connection and can then be “upgraded” to a WebSocket connection via a well-defined handshake.
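One concrete piece of that handshake, as specified in RFC 6455, is how the server answers the client's `Sec-WebSocket-Key` header: it appends a fixed GUID, hashes the result with SHA-1, and returns the Base64-encoded digest in `Sec-WebSocket-Accept`. The sketch below shows only that computation; the function name `websocket_accept` is chosen for this example.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key/accept pair taken from RFC 6455.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

In practice a WebSocket library performs this step. It is shown here only to make the “upgrade” less abstract.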

So in our design, we will use WebSocket on both the sender and receiver sides.

However, since WebSocket connections are persistent, efficient connection management is critical on the server side.

High Level Design

The chat service is stateful because each client maintains a persistent network connection to a chat server.

  • The chat service facilitates message sending and receiving.
  • The presence service manages online/offline status.
  • A K-V store is used to store the chat history. When an offline user comes online, she will see all her previous chat history.

Why do we choose a K-V store as the data layer?

It’s important to understand the read/write pattern of the chat history database.

  • only recent chats are accessed frequently
  • keyword search must be supported
  • the R/W ratio is about 1:1

The advantages of a K-V store:

  1. it allows horizontal scaling
  2. it offers low latency when accessing data
  3. indexes can be built easily
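To make the “recent chats” access pattern concrete, here is a minimal in-memory sketch that keys each message by `(channel_id, message_id)` and keeps rows sorted by message ID, so the newest messages are a cheap slice. The class `ChatHistoryStore`, its methods, and the key layout are assumptions for illustration. A real K-V store would expose its own API.

```python
import bisect
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (message_id, body)


class ChatHistoryStore:
    """Toy stand-in for a K-V store partitioned by channel (hypothetical)."""

    def __init__(self) -> None:
        self._rows_by_channel: Dict[str, List[Row]] = {}

    def put(self, channel_id: str, message_id: int, body: str) -> None:
        # Keep each channel's rows ordered by message_id.
        rows = self._rows_by_channel.setdefault(channel_id, [])
        bisect.insort(rows, (message_id, body))

    def recent(self, channel_id: str, limit: int) -> List[Row]:
        """Return up to `limit` newest rows, oldest first."""
        if limit <= 0:
            return []
        return self._rows_by_channel.get(channel_id, [])[-limit:]
```

Partitioning by channel is one way such a layout could scale horizontally, since each channel's rows can live on a single shard.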

Message ID

The message ID is an interesting topic, since it carries the responsibility of ensuring the order of messages.

We can implement this with a local ID incrementor by allocating each channel to a single machine, so that IDs only need to be ordered within a channel.
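A local incrementor of this kind might look like the sketch below, assuming every message for a given channel is handled by the same process. `LocalIdGenerator` and `next_id` are hypothetical names, and the lock is there only because a chat server would likely call this from several threads.

```python
import threading
from collections import defaultdict
from typing import DefaultDict


class LocalIdGenerator:
    """Per-channel counter; IDs are ordered within a channel, not globally."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_id: DefaultDict[str, int] = defaultdict(int)

    def next_id(self, channel_id: str) -> int:
        # Increment and read under the lock so two threads never get the same ID.
        with self._lock:
            self._last_id[channel_id] += 1
            return self._last_id[channel_id]
```

Because the counter lives in memory, this sketch loses its state on restart. A real deployment would need to persist or recover the last issued ID per channel.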

Step 3: Design deep dive

It’s interesting to understand the end-to-end flow of a chat system.

In this section, we will explore the one-on-one chat flow.

  1. User A sends a chat message to chat server 1.
  2. Chat server 1 obtains a message ID from the ID generator.
  3. Chat server 1 sends the message to the message sync queue.
  4. The message is stored in the K-V store.
  5. If user B is online, the message is forwarded to chat server 2, where B is connected.
  6. If user B is offline, we only store the message in the K-V store.
    1. When user B comes online, the device fetches the latest messages from the DB.
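The steps above can be traced in a small single-process simulation. Everything here is an assumption for illustration: the class `ChatSystem`, the in-memory queue and store, and the simplification that one object plays the role of both chat servers, the ID generator, and the K-V store.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[int, str, str, str]  # (message_id, sender, recipient, body)


class ChatSystem:
    def __init__(self) -> None:
        self._next_id = 0
        self.sync_queue: Deque[Message] = deque()
        self.kv_store: Dict[str, List[Message]] = defaultdict(list)
        # user -> messages pushed over that user's live connection
        self.online: Dict[str, List[Message]] = {}

    def connect(self, user: str) -> List[Message]:
        """User comes online and fetches stored messages (step 6.1)."""
        self.online[user] = []
        return list(self.kv_store.get(user, []))

    def disconnect(self, user: str) -> None:
        self.online.pop(user, None)

    def send(self, sender: str, recipient: str, body: str) -> int:
        self._next_id += 1                           # step 2: obtain a message ID
        message = (self._next_id, sender, recipient, body)
        self.sync_queue.append(message)              # step 3: enqueue for sync
        self._drain()
        return message[0]

    def _drain(self) -> None:
        while self.sync_queue:
            message = self.sync_queue.popleft()
            recipient = message[2]
            self.kv_store[recipient].append(message)     # step 4: persist
            if recipient in self.online:                 # step 5: forward if online
                self.online[recipient].append(message)
            # step 6: if offline, the stored copy is all that exists for now
```

For simplicity, `connect` returns every stored message for the user. A real client would more likely fetch only messages newer than the last ID it has seen.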

Online Presence

When user A logs in to our system and establishes the WebSocket connection, user A’s online status and active_ts are stored in the database.

We also introduce a heartbeat mechanism to determine whether a user has disconnected from our system.
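A heartbeat check can be sketched as follows, assuming the client sends a heartbeat periodically and the server treats a user as offline once no heartbeat has arrived within a timeout. `PresenceTracker`, `timeout_s`, and the explicit `now` argument (used instead of the system clock to keep the example deterministic) are choices made for this example.

```python
from typing import Dict


class PresenceTracker:
    """Tracks the last heartbeat time (the active_ts) for each user."""

    def __init__(self, timeout_s: float) -> None:
        self._timeout_s = timeout_s
        self._active_ts: Dict[str, float] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        # Record when this user was last known to be connected.
        self._active_ts[user_id] = now

    def is_online(self, user_id: str, now: float) -> bool:
        last = self._active_ts.get(user_id)
        return last is not None and now - last <= self._timeout_s
```

The timeout is usually set to a few heartbeat intervals, so that one delayed heartbeat does not flip a user to offline.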

Step 4: Wrap up

Several important points have been discussed:

  • We choose WebSocket as the communication protocol
  • We choose a K-V store as the data layer
  • We enable the message sync mechanism when a user comes online

Some topics that could be discussed further:

  • error handling
  • how to support media in messages
  • message retry mechanism, etc.