Database Selection
1. MessageDb
Requirements for the MessageDb database:
- Huge data volume (billions of messages)
- Recent chats are accessed most frequently
- Users should be able to randomly access chats, which is useful for features like
search, jumping to specific messages, viewing your mentions, etc.
- Read-to-write ratio is roughly 1:1 (for 1:1 chats)
Best-to-use: NoSQL (wide-column DB like Cassandra / HBase)
Why a wide-column DB (to be verified once Cassandra is better understood):
- High write throughput
  - Messaging systems are write-heavy → Cassandra handles massive writes efficiently
- Efficient time-based queries
  - Messages are usually fetched like: "Give me last N messages of a chat"
  - Wide-column DBs are great for range queries on sorted keys
- Horizontal scalability
  - Easy to shard data across nodes (based on user_id / chat_id)
- Partitioning support
  - Example:
    Partition Key = chat_id
    Clustering Key = timestamp (or message_id)
  - This ensures:
    - Messages of a chat are stored together
    - Already sorted → fast reads
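To make the partition key / clustering key idea concrete, here is a minimal in-memory sketch. It is not a Cassandra client; the class `MessageStore` and its methods are hypothetical names used only for illustration, and integer message IDs are assumed. Each chat_id maps to one partition whose rows stay sorted by message_id, so "last N messages" and range reads touch a single partition and need no sort at read time.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """Toy model of a wide-column table (illustration only, not a real driver)."""

    def __init__(self):
        # partition key (chat_id) -> rows kept sorted by clustering key (message_id)
        self._partitions = defaultdict(list)

    def insert(self, chat_id, message_id, text):
        # insort keeps the partition ordered by message_id at write time
        bisect.insort(self._partitions[chat_id], (message_id, text))

    def last_n(self, chat_id, n):
        # Newest-first slice of one partition; other chats are never scanned
        rows = self._partitions.get(chat_id, [])
        return rows[-n:][::-1] if n > 0 else []

    def between(self, chat_id, start_id, end_id):
        # Range scan on the clustering key, inclusive on both ends
        rows = self._partitions.get(chat_id, [])
        lo = bisect.bisect_left(rows, (start_id,))
        hi = bisect.bisect_left(rows, (end_id + 1,))
        return rows[lo:hi]
```

A "jump to message" feature would map to `between` around the target message_id, while the default chat view maps to `last_n`.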
messageId       bigInt
from_userId     bigInt
to_userId       bigInt
media_url       text
media_metadata  text
message         text
created_at      timestamp
status          sent | delivered | read
Note: Messages need to be sorted by time, but two messages could be created at the same instant. We can avoid such ties by allocating messageId in chronological order, so that an ID allocated at time t is always greater than any ID allocated at time t - 1. E.g. Snowflake IDs.
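A minimal sketch of a Snowflake-style generator follows. The bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) and the epoch value are assumptions taken from the commonly described Twitter layout, not something fixed by this design. As a simplification, when the sequence overflows within one millisecond the sketch borrows the next millisecond instead of waiting for it.

```python
import threading
import time

# Assumed layout: | 41 bits ms since epoch | 10 bits worker | 12 bits sequence |
EPOCH_MS = 1_288_834_974_657  # assumption: any fixed past instant can serve as epoch
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock  # injectable so the generator can be tested
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock went backwards: reuse the last timestamp to stay monotonic
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: borrow the next one
                    now = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp_part = (now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)
            return timestamp_part | (self._worker_id << SEQUENCE_BITS) | self._sequence
```

IDs from one generator are strictly increasing. Across workers, ordering is only guaranteed at millisecond granularity: an ID from millisecond t is always greater than any ID from millisecond t - 1, which matches the property described in the note.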
2. GroupsDB
- Simple queries:
  - Get all users of a group
  - Update group metadata
Best-to-use: NoSQL
{
"group_id" : "",
"group_members": [
"user_1",
"user_2",
...
],
"metadata":{
"title": "",
"description": "",
...
}
}
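Both GroupsDB queries are single-key operations on one document, which is why a simple NoSQL store is enough. The sketch below uses a plain dict as a stand-in for the document collection; the function names are hypothetical and no particular database driver is implied.

```python
groups = {}  # stand-in for a document collection keyed by group_id


def create_group(group_id, members, title="", description=""):
    # One document per group, shaped like the structure shown above
    groups[group_id] = {
        "group_id": group_id,
        "group_members": list(members),
        "metadata": {"title": title, "description": description},
    }


def get_group_members(group_id):
    # "Get all users of a group": a single lookup by key, no joins
    return list(groups[group_id]["group_members"])


def update_group_metadata(group_id, **changes):
    # "Update group metadata": modifies only the metadata sub-document
    groups[group_id]["metadata"].update(changes)
    return groups[group_id]["metadata"]
```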
3. Last-Seen
- Very high write volume: one write per connected user in every heartbeat period.
- Simple query
Best-to-use: NoSQL
{
"user_id" : "",
"last_seen": ""
}
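As a rough sketch of the access pattern, each heartbeat is a blind overwrite keyed by user_id, and the write rate grows with the number of connected users divided by the heartbeat period. The dict stands in for the key-value table, and the function names are hypothetical.

```python
last_seen = {}  # stand-in for a key-value table keyed by user_id


def record_heartbeat(user_id, now):
    # Upsert: overwrite the previous value, no read is needed first
    last_seen[user_id] = now


def get_last_seen(user_id):
    # The only read query: a point lookup by user_id
    return last_seen.get(user_id)


def estimated_writes_per_second(connected_users, heartbeat_seconds):
    # Every connected user produces one write per heartbeat period
    return connected_users / heartbeat_seconds
```

For example, with a hypothetical 1,000,000 connected users and a 5-second heartbeat, `estimated_writes_per_second(1_000_000, 5)` gives 200,000 writes per second.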
4. Ws-Connection-Info
- Persistence is not critical: if the data is lost, we simply establish a new WebSocket connection.
- Ultra-low-latency reads/writes.
Best-to-use: in-memory database like Redis
{
"user_id": "websocket_session_id"
}
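The sketch below illustrates why losing this data is tolerable: a missing entry is handled by re-establishing the connection and registering the new session. A dict stands in for the in-memory store, and `ConnectionRegistry`, `session_for`, and the `reconnect` callback are hypothetical names, not part of any Redis API.

```python
class ConnectionRegistry:
    """user_id -> websocket_session_id mapping (dict as a stand-in for Redis)."""

    def __init__(self):
        self._sessions = {}

    def register(self, user_id, session_id):
        self._sessions[user_id] = session_id

    def lookup(self, user_id):
        return self._sessions.get(user_id)

    def clear(self):
        # Simulates losing the in-memory data, e.g. after a restart
        self._sessions.clear()


def session_for(registry, user_id, reconnect):
    # On a miss, fall back to a fresh connection instead of failing
    session_id = registry.lookup(user_id)
    if session_id is None:
        session_id = reconnect(user_id)  # hypothetical reconnect hook
        registry.register(user_id, session_id)
    return session_id
```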