Spotify
- Clarifying Questions:
- Spotify:
- Songs/music
- Playlists
- Users
- Artists
- Podcasts
- Use cases:
- Finding and Playing Music
- High Level (Metrics)
- Numbers:
- Users: 1 billion
- Songs: 100 million
- MP3 audio file - 5MB
- Total audio: 500 TB
- 3x replication -> 1.5PB
- 100 B of metadata per song -> 10 GB for all songs
- 1 KB per user -> 1 TB for all users
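The estimates above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes); the variable names are illustrative and not part of the original notes.

```python
# Back-of-envelope storage estimate using the figures from the notes.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 TB = 10**12 bytes).
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

users = 1_000_000_000           # 1 billion users
songs = 100_000_000             # 100 million songs
audio_bytes_per_song = 5 * MB   # one MP3 file
replication_factor = 3
metadata_bytes_per_song = 100
metadata_bytes_per_user = 1 * KB

total_audio = songs * audio_bytes_per_song
replicated_audio = total_audio * replication_factor
song_metadata = songs * metadata_bytes_per_song
user_metadata = users * metadata_bytes_per_user

print(f"audio:            {total_audio / TB:.0f} TB")
print(f"audio (3x):       {replicated_audio / PB:.1f} PB")
print(f"song metadata:    {song_metadata / GB:.0f} GB")
print(f"user metadata:    {user_metadata / TB:.0f} TB")
```

The audio blobs dominate by several orders of magnitude, which is why the design keeps them in a separate store from the metadata.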
- Design
- Spotify App
- Load Balancer
- Spotify web-server (multiple)
- Database
- Song audio DB (AWS S3): audio files are immutable blob data. Scales linearly. Mostly read traffic.
- Metadata (users, songs, artists, …) DB (AWS RDS)
- Songs table:
- song_id
- song_url (sharing)
- artist
- genre
- link to album cover
- audio link
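One possible shape for the songs table is sketched below. It uses Python's built-in sqlite3 as a stand-in for a relational database such as RDS. The column types, the names `album_cover_url` and `audio_url`, and the sample row are assumptions for illustration only.

```python
import sqlite3

def create_songs_table(conn: sqlite3.Connection) -> None:
    """Create a songs table with the fields listed in the notes."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS songs (
            song_id         INTEGER PRIMARY KEY,  -- unique song identifier
            song_url        TEXT NOT NULL,        -- shareable link
            artist          TEXT NOT NULL,
            genre           TEXT,
            album_cover_url TEXT,                 -- link to album cover image
            audio_url       TEXT NOT NULL         -- pointer to the blob in the audio store
        )
        """
    )

# In-memory database, purely for demonstration.
conn = sqlite3.connect(":memory:")
create_songs_table(conn)
conn.execute(
    "INSERT INTO songs (song_url, artist, genre, album_cover_url, audio_url) "
    "VALUES (?, ?, ?, ?, ?)",
    (
        "https://example.com/song/1",          # hypothetical URLs
        "Example Artist",
        "pop",
        "https://example.com/covers/1.jpg",
        "s3://example-bucket/audio/1.mp3",
    ),
)
row = conn.execute(
    "SELECT artist, audio_url FROM songs WHERE song_id = ?", (1,)
).fetchone()
print(row)
```

The metadata row only stores a link to the audio; the file itself stays in the blob store.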
- Run through use case for finding music
- Use a CDN as a cache (AWS CloudFront)
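The CDN's role in the play path can be pictured as a read-through cache in front of the audio store. The classes below are hypothetical in-memory stand-ins, not the CloudFront or S3 APIs, and the sketch ignores eviction and expiry.

```python
class BlobStore:
    """Stand-in for the origin audio store (e.g. S3); counts origin reads."""

    def __init__(self, blobs: dict):
        self._blobs = dict(blobs)
        self.reads = 0

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self._blobs[key]


class CdnCache:
    """Read-through cache: serve from cache, fall back to the origin on a miss."""

    def __init__(self, origin: BlobStore):
        self._origin = origin
        self._cache = {}

    def get(self, key: str) -> bytes:
        if key not in self._cache:          # cache miss: fetch from origin once
            self._cache[key] = self._origin.get(key)
        return self._cache[key]             # cache hit on later requests


origin = BlobStore({"audio/1.mp3": b"fake-audio-bytes"})
cdn = CdnCache(origin)

first = cdn.get("audio/1.mp3")   # miss, goes to the origin
second = cdn.get("audio/1.mp3")  # hit, served from the cache
print(origin.reads)
```

Because the audio files are immutable, cached copies never go stale, which makes them a good fit for CDN caching.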
- Load balancing
- Make sure web servers are not overloaded (many incoming requests, limited network bandwidth)
- Consider multiple metrics (e.g. request count, network bandwidth) when balancing load
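A minimal sketch of balancing on more than one metric: each server gets a score that combines request utilization and bandwidth utilization, and the lowest score wins. The equal weights, server names, and capacity figures are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class ServerStats:
    name: str
    active_requests: int
    max_requests: int
    bandwidth_mbps: float
    max_bandwidth_mbps: float

def load_score(s: ServerStats,
               request_weight: float = 0.5,
               bandwidth_weight: float = 0.5) -> float:
    """Weighted utilization in [0, 1]; lower means more headroom."""
    request_util = s.active_requests / s.max_requests
    bandwidth_util = s.bandwidth_mbps / s.max_bandwidth_mbps
    return request_weight * request_util + bandwidth_weight * bandwidth_util

def pick_server(servers: list) -> ServerStats:
    """Choose the server with the lowest combined load score."""
    return min(servers, key=load_score)

servers = [
    ServerStats("web-1", 80, 100, 200.0, 1000.0),  # busy on requests
    ServerStats("web-2", 30, 100, 900.0, 1000.0),  # few requests, bandwidth nearly saturated
    ServerStats("web-3", 50, 100, 300.0, 1000.0),  # moderate on both
]
chosen = pick_server(servers)
print(chosen.name)
```

A balancer that looked only at request count would pick web-2 here, even though its network link is nearly full.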
- Replication
- Protects against data outages
- Place replicas close to users, e.g. BTS data close to Korea. Geo-aware data placement.
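One simple way to illustrate a geo-aware strategy is to route each user to the replica with the smallest great-circle distance. The replica names and coordinates below are hypothetical, and a real system would likely also weigh latency and load.

```python
import math

# Hypothetical replica locations as (latitude, longitude) in degrees.
REPLICAS = {
    "seoul": (37.57, 126.98),
    "frankfurt": (50.11, 8.68),
    "virginia": (38.90, -77.40),
}

def haversine_km(a, b) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def nearest_replica(user_location) -> str:
    """Return the name of the replica closest to the user."""
    return min(REPLICAS, key=lambda name: haversine_km(user_location, REPLICAS[name]))

busan_user = (35.18, 129.08)
paris_user = (48.86, 2.35)
print(nearest_replica(busan_user), nearest_replica(paris_user))
```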