System Design Case Study: Chat System
Step1: Scope the problem
The most important thing is to understand which type of chat system to design: one-on-one or group chat.
Other important questions to ask:
- what scale must the system support?
- which content types are supported?
- how long should the chat history be stored?
Step2: High level design
Each client connects to the chat service, which supports all the features mentioned above.
For a chat system, choosing a proper network protocol is important.
On the sender side, HTTP is fine. The receiver side is trickier: HTTP requests are initiated by the client, so it is not trivial for the server to push messages to the client.
There are several techniques to simulate a server-initiated connection:
- polling
- long polling
- WebSocket
Polling
Polling is a technique in which the client periodically asks the server whether any messages are available.
It can be costly depending on the polling frequency, because most requests come back with no new messages.
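To make the cost concrete, here is a minimal sketch of a polling client. The `Mailbox` class and its `fetch` method are hypothetical in-memory stand-ins for a real HTTP endpoint, and the sleep between polls is left out so the example runs instantly. The loop counts how many requests came back empty.

```python
from collections import deque


class Mailbox:
    """Hypothetical in-memory stand-in for the server's per-user inbox."""

    def __init__(self):
        self._pending = deque()

    def deliver(self, message):
        self._pending.append(message)

    def fetch(self):
        """Return and clear all pending messages (models one HTTP request)."""
        messages = list(self._pending)
        self._pending.clear()
        return messages


def poll(mailbox, rounds, arrivals):
    """Poll `rounds` times; `arrivals` maps a round number to a new message.

    Returns (received_messages, empty_responses).
    """
    received, empty = [], 0
    for i in range(rounds):
        if i in arrivals:            # a message arrives just before this poll
            mailbox.deliver(arrivals[i])
        batch = mailbox.fetch()
        if batch:
            received.extend(batch)
        else:
            empty += 1               # wasted round trip
    return received, empty


received, empty = poll(Mailbox(), rounds=10, arrivals={3: "hi", 7: "bye"})
print(received, empty)   # ['hi', 'bye'] 8
```

In this run, 8 of the 10 requests carried no messages, which is the overhead the polling frequency controls.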
Long polling
The client holds the connection open until new messages are actually available or a timeout threshold is reached.
With this method, the server has no good way to know whether the client has disconnected.
It is also inefficient: if a user does not chat frequently, long polling still has to re-establish the connection after every timeout.
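The two outcomes of a long-poll request (a message arrives, or the timeout fires) can be sketched with a blocking queue. This is only an illustration: `long_poll` and `inbox` are hypothetical names, and a `queue.Queue` stands in for the held-open HTTP request.

```python
import queue
import threading


def long_poll(inbox, timeout_s):
    """Block until a message arrives or the timeout expires.

    Returns the message, or None on timeout (the client must then reconnect).
    """
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return None


inbox = queue.Queue()

# Case 1: a message arrives while the request is being held open.
threading.Timer(0.05, inbox.put, args=("hello",)).start()
first = long_poll(inbox, timeout_s=1.0)

# Case 2: nothing arrives, so the request times out and must be reissued.
second = long_poll(inbox, timeout_s=0.05)

print(first, second)   # hello None
```

The second case is the one that hurts for quiet users: every timeout costs a fresh request even though nothing was delivered.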
WebSocket
A WebSocket connection is initiated by the client.
It’s bi-directional and persistent. It starts its life as an HTTP connection and is then “upgraded” to a WebSocket connection via a well-defined handshake.
So in our design, we will use WebSocket for both the sender and the receiver side.
Because WebSocket connections are persistent, efficient connection management is critical on the server side.
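The “upgrade” handshake mentioned above is specified in RFC 6455: the client sends a `Sec-WebSocket-Key` header, and the server replies with `101 Switching Protocols` and a `Sec-WebSocket-Accept` value derived from that key. The sketch below only computes the server's reply; the function names are illustrative, and a real server would use a WebSocket library rather than build this by hand.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key):
    """Derive Sec-WebSocket-Accept: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key):
    """Build the HTTP response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )


# Sample key taken from RFC 6455, section 1.3.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))   # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this exchange the same TCP connection stays open and both sides can send frames at any time.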
High Level Design
The chat service is stateful because each client maintains a persistent network connection to a chat server.
- the chat service facilitates message sending/receiving
- the presence service manages online/offline status
- a K-V store is used to store the chat history. When an offline user comes online, she will see all her previous chat history.
Why do we choose a K-V store as the data layer?
It’s important to understand the read/write pattern of the chat history database.
- only recent chats are accessed frequently
- keyword search needs to be supported
- the R/W ratio is about 1:1
The advantages of a K-V store:
- it allows horizontal scaling
- low latency when accessing data
- indexes can be built easily
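One way to picture the access pattern above is a store keyed by (channel_id, message_id), so that "the most recent messages" and "everything after the last message I saw" are both simple range reads. The sketch below is an in-memory stand-in, not a real K-V store; `ChatHistoryStore` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict


class ChatHistoryStore:
    """In-memory stand-in for a K-V store keyed by (channel_id, message_id)."""

    def __init__(self):
        self._ids = defaultdict(list)   # channel_id -> sorted message IDs
        self._rows = {}                 # (channel_id, message_id) -> body

    def put(self, channel_id, message_id, body):
        key = (channel_id, message_id)
        if key not in self._rows:
            bisect.insort(self._ids[channel_id], message_id)
        self._rows[key] = body

    def latest(self, channel_id, limit):
        """Return up to `limit` most recent messages, oldest first."""
        if limit <= 0:
            return []
        ids = self._ids[channel_id][-limit:]
        return [(i, self._rows[(channel_id, i)]) for i in ids]

    def since(self, channel_id, last_seen_id):
        """Return all messages with an ID greater than `last_seen_id`."""
        ids = self._ids[channel_id]
        start = bisect.bisect_right(ids, last_seen_id)
        return [(i, self._rows[(channel_id, i)]) for i in ids[start:]]


store = ChatHistoryStore()
store.put("c1", 2, "b")
store.put("c1", 1, "a")
store.put("c1", 3, "c")
print(store.latest("c1", 2))   # [(2, 'b'), (3, 'c')]
print(store.since("c1", 1))    # [(2, 'b'), (3, 'c')]
```

Keeping IDs sorted per channel is what makes the "recent chats" read cheap; keyword search would need a separate index and is not covered here.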
Message ID
Message ID is an interesting topic, since it carries the responsibility of ensuring the order of messages.
We can implement this with a local ID incrementer, provided that each channel is allocated to a single machine, so that IDs only need to be ordered within that channel.
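A minimal sketch of such a local incrementer, assuming all messages of a channel are handled by the same process. `ChannelIdGenerator` is a hypothetical name, and the counters live in memory only, so a real system would also need to persist them.

```python
import threading
from collections import defaultdict


class ChannelIdGenerator:
    """Issues increasing IDs that are ordered within a single channel."""

    def __init__(self):
        self._last_id = defaultdict(int)   # channel_id -> last issued ID
        self._lock = threading.Lock()      # guard against concurrent senders

    def next_id(self, channel_id):
        with self._lock:
            self._last_id[channel_id] += 1
            return self._last_id[channel_id]


gen = ChannelIdGenerator()
ids = [gen.next_id("alice:bob") for _ in range(3)]
other = gen.next_id("alice:carol")
print(ids, other)   # [1, 2, 3] 1
```

IDs are unique and ordered per channel, not globally, which is enough to order the messages of one conversation.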
Step3: Design deep dive
It’s interesting to understand the end-to-end flow of a chat system.
In this section, we will explore the one-on-one chat flow.
- user A sends a chat message to chat server 1
- chat server 1 obtains a message ID from the ID generator
- chat server 1 sends the message to the message sync queue
- the message is stored in the K-V store
- if user B is online, the message is forwarded to chat server 2, where B is connected
- if user B is offline, the message is only stored in the K-V store
- when user B comes online, B’s device fetches the latest messages from the DB
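The steps above can be simulated in a few lines. Everything here is a simplified stand-in chosen for illustration: `ChatSystem` is a hypothetical class, the sync queue is a `deque`, the K-V store is a dict keyed by recipient, and "forwarding to chat server 2" is modeled by appending to a per-server list.

```python
import itertools
from collections import defaultdict, deque


class ChatSystem:
    def __init__(self):
        self._next_id = itertools.count(1)    # stand-in for the ID generator
        self.sync_queue = deque()             # message sync queue
        self.kv_store = defaultdict(list)     # recipient -> stored messages
        self.connections = {}                 # online user -> chat server name
        self.delivered = defaultdict(list)    # chat server name -> pushed messages

    def connect(self, user, server, last_seen_id=0):
        """User comes online on `server` and fetches messages not yet seen."""
        self.connections[user] = server
        return [m for m in self.kv_store[user] if m["id"] > last_seen_id]

    def disconnect(self, user):
        self.connections.pop(user, None)

    def send(self, sender, recipient, body):
        message = {"id": next(self._next_id), "from": sender,
                   "to": recipient, "body": body}
        self.sync_queue.append(message)
        self._drain()
        return message["id"]

    def _drain(self):
        while self.sync_queue:
            message = self.sync_queue.popleft()
            self.kv_store[message["to"]].append(message)   # always persisted
            server = self.connections.get(message["to"])
            if server is not None:                         # recipient is online
                self.delivered[server].append(message)


system = ChatSystem()
system.connect("A", "chat-server-1")
system.send("A", "B", "hello")                  # B is offline: stored only
missed = system.connect("B", "chat-server-2")   # B comes online and syncs
system.send("A", "B", "are you there?")         # B is online: stored and pushed
print([m["body"] for m in missed])
print([m["body"] for m in system.delivered["chat-server-2"]])
```

Both messages end up in the store, but only the second one is pushed through chat server 2; the first reaches B through the fetch on reconnect.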
Online Presence
When a user logs in to our system and establishes the WebSocket connection, the user’s online status and last-active timestamp (active_ts) are stored in the database.
We also introduce a heartbeat mechanism to determine whether a user has disconnected from our system.
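A heartbeat check can be as small as comparing the last-active timestamp with a timeout. The sketch below is an assumption about how that might look: `PresenceService` and `timeout_s` are hypothetical names, the 30-second value is arbitrary, and the current time is passed in explicitly so the behaviour is easy to follow.

```python
class PresenceService:
    """Tracks each user's last heartbeat and derives online/offline status."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._active_ts = {}   # user_id -> timestamp of the last heartbeat

    def heartbeat(self, user_id, now):
        self._active_ts[user_id] = now

    def is_online(self, user_id, now):
        last = self._active_ts.get(user_id)
        return last is not None and now - last <= self.timeout_s


presence = PresenceService(timeout_s=30)
presence.heartbeat("userA", now=100)
print(presence.is_online("userA", now=120))   # True
print(presence.is_online("userA", now=131))   # False
```

A user who stops sending heartbeats is treated as offline once the timeout passes, without the server needing an explicit disconnect event.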
Step4: Wrap Up
Several important points have been discussed:
- We choose WebSocket as the communication protocol
- We choose a K-V store as the data layer
- We enable the message sync mechanism when a user comes online
Some topics that could be discussed further:
- error handling
- how to support media in messages
- message retry mechanism, etc.