WhatsApp is a messaging service that allows users to send and receive encrypted messages and calls from their phones and computers.
Requirements -> Core Entities -> API or Interface -> Data Flow -> High-Level Design -> Deep Dives
Functional Requirements:
- Start group chats
- Send/Receive messages
- Send/Receive media
- Access messages after I’ve been offline
- Try to keep requirements short
Non-Functional Requirements:
- Messages delivered with low latency (< 500 ms)
- Guarantee delivery of messages
- Billions of users, high throughput
- Messages not stored unnecessarily
- Fault-tolerant
Core Entities:
- User
- Chat
- Messages
- Client/Device
Latency sensitive?
- yes: keep going
- no: simple polling
Frequent bi-directional communication?
- yes: keep going
- no: SSE
Peer-to-peer audio/video?
- yes: WebRTC
- no: WebSocket
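The decision tree above can be written out as a small function. This is only a sketch for study purposes; the function name `choose_transport` and its parameter names are made up here and do not come from any library.

```python
def choose_transport(latency_sensitive: bool,
                     bidirectional: bool,
                     peer_to_peer_media: bool) -> str:
    """Walk the three questions in order and return a transport name."""
    if not latency_sensitive:
        return "polling"      # simple polling is enough
    if not bidirectional:
        return "sse"          # server-to-client push only
    if peer_to_peer_media:
        return "webrtc"       # peer-to-peer audio/video
    return "websocket"        # frequent two-way messages through a server


# Text chat: latency sensitive, bi-directional, not peer-to-peer media.
print(choose_transport(True, True, False))
```

For the chat part of WhatsApp, all three questions lead to a WebSocket connection between client and chat server.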
Commands Sent:
- createChat
- sendMessage
- createAttachment
- modifyParticipants
Commands Received:
- newMessage
- chatUpdate
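One possible way to picture these commands on the wire is as JSON frames with a `type` field. The sketch below covers just one sent command and one received command; every field name (`chat_id`, `attachment_ids`, `timestamp`, and so on) is an assumption for illustration, since the notes only list the command names.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class SendMessage:
    """Client -> server command (field names are assumed)."""
    chat_id: str
    content: str
    attachment_ids: List[str] = field(default_factory=list)
    type: str = "sendMessage"


@dataclass
class NewMessage:
    """Server -> client command (field names are assumed)."""
    chat_id: str
    message_id: str
    creator_id: str
    content: str
    timestamp: int
    type: str = "newMessage"


def encode(command) -> str:
    """Serialize a command dataclass to a JSON text frame."""
    return json.dumps(asdict(command))


def decode_type(frame: str) -> str:
    """Read the command name so the receiver can dispatch on it."""
    return json.loads(frame)["type"]


frame = encode(SendMessage(chat_id="chat1", content="hello"))
print(decode_type(frame))
```

The same pattern would extend to createChat, createAttachment, modifyParticipants, and chatUpdate.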
Make sure you let the interviewer know that this is the final design.
High-Level Design
- Build simple system first
- Client will make connection to chat server (one node for now)
- Chat Server will connect to DynamoDB (database)
- Chat Table
- id
- name
- metadata
- Chat Participant
- chatId
- participantId
- Attachments
- data
- Messages
- id
- content
- creatorId
- timeStamp
- Inbox
- recipientId
- messageId
- Find all participants in a chat
- Find all chats in which I am a participant
- Hashtable mapping each client to its WebSocket connection, e.g. clientA:WS1
- S3 blob storage for media content, connected to the chat server; but relaying media through the chat server still uses a lot of its bandwidth
- Pre-signed URLs let the client upload to (and download from) S3 directly
- The client sends an ack when it receives a message; the server then deletes that entry from the Inbox
- When a client connects, deliver the messages waiting in its Inbox
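The single-node flow above (participant lookup, client-to-socket hashtable, Inbox, ack) can be sketched in memory. This is a toy model under stated assumptions: plain dicts stand in for the DynamoDB tables, a callback stands in for a WebSocket, and the class and method names (`ChatServer`, `connect`, `send_message`, `ack`) are invented for illustration.

```python
import itertools
import time
from collections import defaultdict
from typing import Callable, Dict, List, Set


class ChatServer:
    def __init__(self) -> None:
        self.participants: Dict[str, Set[str]] = defaultdict(set)  # chatId -> participantIds
        self.messages: Dict[str, dict] = {}                        # messageId -> message
        self.inbox: Dict[str, List[str]] = defaultdict(list)       # recipientId -> messageIds
        self.connections: Dict[str, Callable[[dict], None]] = {}   # clientId -> "socket"
        self._ids = itertools.count(1)

    def connect(self, client_id: str, send: Callable[[dict], None]) -> None:
        """Register the socket, then replay everything still in the Inbox."""
        self.connections[client_id] = send
        for message_id in list(self.inbox[client_id]):
            send(self.messages[message_id])

    def disconnect(self, client_id: str) -> None:
        self.connections.pop(client_id, None)

    def send_message(self, chat_id: str, creator_id: str, content: str) -> str:
        message_id = f"m{next(self._ids)}"
        message = {"id": message_id, "content": content,
                   "creatorId": creator_id,
                   "timeStamp": int(time.time() * 1000)}
        self.messages[message_id] = message
        for recipient in self.participants[chat_id] - {creator_id}:
            # Write to the Inbox first so an offline recipient loses nothing.
            self.inbox[recipient].append(message_id)
            send = self.connections.get(recipient)
            if send is not None:
                send(message)
        return message_id

    def ack(self, client_id: str, message_id: str) -> None:
        """The client confirmed receipt, so drop the Inbox entry."""
        if message_id in self.inbox[client_id]:
            self.inbox[client_id].remove(message_id)


server = ChatServer()
server.participants["chat1"] = {"alice", "bob"}
bob_received: List[dict] = []
server.send_message("chat1", "alice", "hello")  # bob is offline: Inbox only
server.connect("bob", bob_received.append)      # Inbox is replayed on connect
server.ack("bob", bob_received[0]["id"])        # ack clears the Inbox entry
```

Writing to the Inbox before attempting delivery is what backs the "guarantee delivery" requirement in this sketch: a message leaves the Inbox only after the recipient acks it.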
Deep Dives
- How to handle billions of users
- A regular load balancer is fine for web servers because they are stateless
- But with WebSockets, we need to keep the connection between client and server open, so we need a layer 4 load balancer that works at the TCP connection level. The load balancer picks the chat server with the fewest connections.
- We need chat servers to be able to talk to each other
- Use Kafka
- One topic per user
- When a client connects, its chat server subscribes to that user's topic
- But this is not a good solution because Kafka is not built for this many topics
- Use consistent hash ring
- Chat Registry connected to client
- ZooKeeper
- So that we know which chat server each user is on
- The client connects to the chat registry to find out which chat server to connect to. Use the information in ZooKeeper to find where user B is.
- Redis Pub/Sub
- Use Kafka
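The consistent hash ring option can be sketched as follows: each chat server owns many points on a ring, and a user is routed to the server that owns the first point at or after the user's hash. This is a minimal sketch, not how any particular registry implements it. The class name `HashRing`, the server names, and the use of SHA-256 with 100 virtual nodes per server are all assumptions; in the design above the server list would come from ZooKeeper rather than a Python list.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    def __init__(self, servers: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, server) pairs
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, server: str) -> None:
        """Place `vnodes` points for this server on the ring."""
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        """Drop all of this server's points; other points stay put."""
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def server_for(self, user_id: str) -> str:
        """Return the owner of the first point at or after the user's hash."""
        if not self._ring:
            raise LookupError("no chat servers registered")
        idx = bisect.bisect_right(self._ring, (self._hash(user_id), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


ring = HashRing(["chat-1", "chat-2", "chat-3"])
print(ring.server_for("userB"))  # the chat server that should hold userB's socket
```

The point of the ring is that removing or adding a chat server only moves the users whose points belonged to that server; everyone else keeps their existing connection.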