Design Spotify: Music Streaming System Design Guide
Design audio streaming, recommendations, and playlist management for 500M+ users
Why Design Spotify?
Designing Spotify tests challenges that typical web applications don't have: continuous media streaming, recommendation algorithms, and offline-first mobile design. The question comes up increasingly often in interviews at Spotify, Apple, and Amazon Music, and in general system design rounds.
- Audio streaming: Adaptive bitrate, buffer management, gapless playback
- Recommendations: Discover Weekly, Daily Mix, Radio - personalized at scale
- Content delivery: 100M+ songs, served globally with low latency
- Offline mode: Download management, DRM, storage optimization
Step 1: Requirements
Functional Requirements
Core features:
- Search for songs, artists, albums, playlists
- Stream audio in real-time
- Create and manage playlists
- Follow artists and users
- Personalized recommendations (home feed, Discover Weekly)
Out of scope:
- Podcasts
- Social features (group sessions)
- Artist upload portal
- Ads system (free tier)
Non-Functional Requirements
Scale:
- 500M monthly active users
- 200M concurrent streams during peak
- 100M+ song catalog
- Average song: 3.5 minutes, ~10MB at 320kbps (about 8.4MB of audio, rounded up for easy math)
Performance:
- Song playback starts within 200ms of pressing play
- Gapless playback between songs
- Search results in < 300ms
- 99.9% availability
Key insight: This is a READ-heavy streaming system.
Audio files are immutable - perfect for aggressive caching.
Step 2: Capacity Estimation
Storage:
- 100M songs x 10MB average = 1 PB of audio files
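- Multiple quality levels (96, 160, 320 kbps): file size scales with bitrate, so the three tiers total roughly 1.8 PB; budget ~3 PB as a conservative upper bound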
- Metadata: 100M songs x 1KB = 100 GB
Bandwidth:
- 200M concurrent streams x 320kbps = 64 Tbps peak
- This is enormous - CDN is absolutely essential
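The storage and bandwidth figures above can be reproduced with a few lines of arithmetic. This is only a back-of-envelope sketch using the assumptions already stated in this guide; the constant and function names are made up for illustration.

```python
# Back-of-envelope capacity figures, using only the assumptions stated above.
SONGS = 100_000_000
AVG_SONG_BYTES = 10 * 10**6        # ~10 MB per song at 320 kbps
METADATA_BYTES_PER_SONG = 1_000    # ~1 KB of metadata per song
PEAK_STREAMS = 200_000_000
TOP_BITRATE_BPS = 320_000          # 320 kbps


def petabytes(n_bytes: int) -> float:
    """Convert bytes to decimal petabytes (10^15 bytes)."""
    return n_bytes / 10**15


audio_320_pb = petabytes(SONGS * AVG_SONG_BYTES)
metadata_gb = SONGS * METADATA_BYTES_PER_SONG / 10**9
peak_tbps = PEAK_STREAMS * TOP_BITRATE_BPS / 10**12

print(f"320 kbps audio: {audio_320_pb:.1f} PB")
print(f"metadata:       {metadata_gb:.0f} GB")
print(f"peak egress:    {peak_tbps:.0f} Tbps")
```

In an interview, stating the inputs explicitly like this makes it easy to adjust the result if the interviewer changes an assumption.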
Daily streams:
- 500M users x average 30 min/day = 250M hours of audio/day
- ~4B individual song plays per day
Step 3: High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│                         CLIENT APPS                         │
│                (iOS, Android, Web, Desktop)                 │
│        Local cache, playback engine, offline storage        │
└───────────────────────┬─────────────────────────────────────┘
                        │
            ┌───────────┼───────────┐
            │           │           │
            ▼           ▼           ▼
       ┌──────────┐ ┌────────┐ ┌──────────┐
       │   CDN    │ │  API   │ │  Search  │
       │ (audio)  │ │Gateway │ │ Service  │
       └──────────┘ └───┬────┘ └──────────┘
                        │
          ┌─────────────┼─────────────┐
          │             │             │
          ▼             ▼             ▼
     ┌──────────┐  ┌──────────┐  ┌──────────┐
     │ Catalog  │  │ Playlist │  │   Reco   │
     │ Service  │  │ Service  │  │  Engine  │
     └────┬─────┘  └────┬─────┘  └────┬─────┘
          │             │             │
          ▼             ▼             ▼
     ┌──────────┐  ┌──────────┐  ┌──────────┐
     │ Metadata │  │ Playlist │  │    ML    │
     │    DB    │  │    DB    │  │ Pipeline │
     └──────────┘  └──────────┘  └──────────┘
Step 4: Deep Dive - Audio Streaming
Audio streaming is NOT like video streaming (no visual seeking).
Key differences from video:
- Audio files are small (3-10MB vs 1-10GB for video)
- Gapless playback matters (no audible silence or buffering between songs)
- Bitrate adaptation is simpler (3 levels vs many for video)
Streaming flow:
1. Client requests song → API returns CDN URL + metadata
2. Client issues HTTP range requests to the CDN
3. CDN serves from edge cache (cache hit rate: 95%+)
4. Client buffers 10-30 seconds ahead
5. Adaptive bitrate: switch quality based on connection speed
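Step 5 of the flow above can be sketched as a small selection function. This is a minimal illustration, not Spotify's actual algorithm: the `safety` margin and the `low_buffer` threshold are hypothetical tuning knobs, and only the three bitrate tiers come from this guide.

```python
BITRATES_KBPS = (96, 160, 320)  # quality tiers named in this guide


def pick_bitrate(throughput_kbps: float, buffer_seconds: float,
                 safety: float = 0.8, low_buffer: float = 10.0) -> int:
    """Pick the highest tier that fits the measured throughput.

    `safety` and `low_buffer` are assumed values for illustration.
    """
    # When the buffer is nearly empty, favour continuity over quality.
    if buffer_seconds < low_buffer:
        return BITRATES_KBPS[0]
    # Only use a fraction of the measured throughput to absorb jitter.
    budget = throughput_kbps * safety
    candidates = [b for b in BITRATES_KBPS if b <= budget]
    return max(candidates) if candidates else BITRATES_KBPS[0]


print(pick_bitrate(throughput_kbps=1000, buffer_seconds=20))
print(pick_bitrate(throughput_kbps=250, buffer_seconds=20))
print(pick_bitrate(throughput_kbps=1000, buffer_seconds=3))
```

With only three tiers, a simple threshold rule like this is usually enough to discuss in an interview; video players typically need more elaborate logic.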
File format:
- Ogg Vorbis (Spotify's primary) at 96/160/320 kbps
- Files pre-encoded and stored at all quality levels
- Each song stored as ~10 second chunks for efficient seeking
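To see why fixed-length chunks make seeking cheap, here is a small sketch that maps a seek position to the chunk the client would need to fetch. The 10-second chunk length comes from the list above; the function names are hypothetical.

```python
import math

CHUNK_SECONDS = 10  # chunk length assumed from the guide


def chunk_count(duration_seconds: float) -> int:
    """Number of chunks needed to cover a song of the given length."""
    return math.ceil(duration_seconds / CHUNK_SECONDS)


def chunk_for_position(position_seconds: float, duration_seconds: float):
    """Return (chunk_index, offset_within_chunk) for a seek target."""
    if not 0 <= position_seconds < duration_seconds:
        raise ValueError("seek position is outside the song")
    index = int(position_seconds // CHUNK_SECONDS)
    return index, position_seconds - index * CHUNK_SECONDS


# Seeking to 1:35 in a 3.5-minute (210 s) song.
print(chunk_count(210))
print(chunk_for_position(95, 210))
```

The client only has to fetch the one chunk containing the target position and start decoding a few seconds in, rather than downloading everything before it.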
Gapless playback:
- While current song plays, pre-fetch next song's first chunk
- Start decoding next song before current ends
- Crossfade or gap removal at the audio decoder level
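The pre-fetch decision above can be sketched as a small piece of client-side state. This is an assumption-laden illustration: the class name, the `on_progress` callback, and the 15-second lead time are all hypothetical, and real players also handle shuffle, skips, and decoder warm-up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class GaplessQueue:
    tracks: List[str]
    prefetch_lead_s: float = 15.0   # assumed lead time, not from the guide
    index: int = 0                  # position of the currently playing track
    prefetched: Set[str] = field(default_factory=set)

    def on_progress(self, position_s: float, duration_s: float) -> Optional[str]:
        """Return the track whose first chunk should be fetched now, if any."""
        nxt = self.index + 1
        if nxt >= len(self.tracks):
            return None  # nothing queued after the current track
        track = self.tracks[nxt]
        remaining = duration_s - position_s
        if remaining <= self.prefetch_lead_s and track not in self.prefetched:
            self.prefetched.add(track)  # avoid requesting the same chunk twice
            return track
        return None


queue = GaplessQueue(["song_a", "song_b"])
print(queue.on_progress(100, 210))  # too early to prefetch
print(queue.on_progress(200, 210))  # within the lead window
```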
Client-side caching:
- LRU cache of recently played songs (configurable size)
- Downloaded songs for offline (encrypted with device key)
- Reduces CDN bandwidth by ~30%
Step 5: Deep Dive - Search
Search across 100M songs, 10M artists, billions of playlists.
Architecture:
- Elasticsearch cluster for full-text search
- Separate indexes: songs, artists, albums, playlists, users
- Custom ranking that combines text relevance + popularity
Search ranking signals:
1. Text match quality (exact > prefix > fuzzy)
2. Popularity (stream count, follower count)
3. Recency (newer releases boosted slightly)
4. Personalization (artists you listen to ranked higher)
5. Region (local artists boosted in their country)
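One simple way to combine these five signals is a weighted sum. The sketch below is illustrative only: the weights, the `Candidate` fields, and the 30-day recency window are invented for the example, and a production ranker would likely learn them from click data.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    match: str                    # "exact", "prefix" or "fuzzy"
    stream_count: int
    days_since_release: int
    user_listens_to_artist: bool
    artist_is_local: bool


# Hypothetical per-signal values: exact > prefix > fuzzy.
MATCH_SCORE = {"exact": 1.0, "prefix": 0.7, "fuzzy": 0.4}


def score(c: Candidate) -> float:
    # Log-scale popularity so mega-hits do not drown out everything else.
    popularity = math.log10(1 + c.stream_count) / 10
    recency = 1.0 if c.days_since_release <= 30 else 0.0
    return (3.0 * MATCH_SCORE[c.match]
            + 2.0 * popularity
            + 0.2 * recency
            + 1.0 * c.user_listens_to_artist
            + 0.5 * c.artist_is_local)


def rank(candidates):
    return sorted(candidates, key=score, reverse=True)


results = rank([
    Candidate("Once", "fuzzy", 1_000, 10, False, False),
    Candidate("One", "exact", 1_000_000_000, 9_000, False, False),
])
print([c.title for c in results])
```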
Typeahead / autocomplete:
- Trie-based index for prefix matching
- Updated every few hours from search logs
- Top 5-10 suggestions in < 50ms
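A trie that stores the top completions at every node keeps lookups proportional to the prefix length, which is how a sub-50ms budget becomes realistic. The sketch below assumes the index is rebuilt in batch from aggregated search logs, so each query is added once with its total count; the class names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []  # best (count, query) pairs under this prefix


class Typeahead:
    def __init__(self, k: int = 5):
        self.root = TrieNode()
        self.k = k

    def add(self, query: str, count: int) -> None:
        """Insert a query once, with its aggregated search count."""
        node = self.root
        for ch in query.lower():
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((count, query))
            # Keep only the k most popular completions at each node.
            node.top.sort(key=lambda t: (-t[0], t[1]))
            del node.top[self.k:]

    def suggest(self, prefix: str):
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for _, query in node.top]


index = Typeahead(k=2)
index.add("metallica", 900)
index.add("metronomy", 300)
index.add("method man", 500)
print(index.suggest("met"))
```

Precomputing the top-k list per node trades memory for latency: `suggest` never has to walk the subtree below the prefix.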
Handling misspellings:
- Levenshtein distance for fuzzy matching
- n-gram tokenization (break "metallica" into "met", "eta", "tal"...)
- Phonetic matching (Soundex/Metaphone) for name searches
Step 6: Recommendation Engine
Spotify's recommendations power Discover Weekly, Daily Mix, and Radio.
Three approaches combined:
1. Collaborative Filtering
"Users who liked X also liked Y"
- Build user-song matrix (500M users x 100M songs - sparse)
- Matrix factorization (ALS algorithm) to find latent features
- Find similar users, recommend their top songs
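The "find similar users, recommend their top songs" step can be illustrated on a toy dataset. Note that this sketch deliberately swaps in a simple user-based nearest-neighbour approach with cosine similarity over liked-song sets, not ALS matrix factorization, because it fits in a few lines; the data and function names are made up.

```python
import math
from collections import defaultdict


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two sets of liked songs."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user: str, likes: dict, k: int = 3):
    """Score unseen songs by the similarity of the users who like them."""
    mine = likes[user]
    scores = defaultdict(float)
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for song in theirs - mine:
            scores[song] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, s in ranked if s > 0][:k]


likes = {
    "ana": {"s1", "s2"},
    "ben": {"s1", "s2", "s3"},
    "cy": {"s9"},
}
print(recommend("ana", likes))
```

At the scale described above, comparing every pair of users is not feasible, which is one reason matrix factorization is used instead: it compresses each user and song into a short latent vector.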
2. Content-Based Filtering
"Songs that sound like what you play"
- Audio features: tempo, key, energy, danceability, valence
- Extracted via ML models from raw audio
- Recommend songs with similar audio fingerprints
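As a rough illustration of content-based matching, the sketch below treats each song as a vector of audio features and returns the nearest neighbours by Euclidean distance. The feature values and song IDs are invented, all features are assumed to be pre-normalized to the 0..1 range, and a real system would use many more dimensions plus an approximate nearest-neighbour index.

```python
import math

# Hypothetical feature vectors: (tempo, energy, danceability, valence), all 0..1.
catalog = {
    "upbeat_a": (0.80, 0.90, 0.80, 0.90),
    "upbeat_b": (0.75, 0.85, 0.90, 0.80),
    "ballad":   (0.20, 0.20, 0.10, 0.30),
}


def most_similar(seed: str, songs: dict, k: int = 2):
    """Return the k songs closest to `seed` in feature space."""
    seed_vec = songs[seed]
    distances = [
        (math.dist(seed_vec, vec), song_id)
        for song_id, vec in songs.items()
        if song_id != seed
    ]
    distances.sort()
    return [song_id for _, song_id in distances[:k]]


print(most_similar("upbeat_a", catalog, k=1))
```

Unlike collaborative filtering, this works for brand-new songs with no listening history, since it only needs the audio itself.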
3. Natural Language Processing
"What people say about this music"
- Crawl blogs, reviews, social media for music descriptions
- Build word vectors for each song/artist
- Match user taste profile to song descriptions
Discover Weekly pipeline:
1. Run collaborative filtering weekly (batch job)
2. Filter: remove songs user already knows
3. Filter: ensure genre diversity
4. Rank by predicted listen probability
5. Generate 30 songs per user
6. Cache results, serve Monday morning
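Steps 2 to 5 of the pipeline amount to a filter-and-rank pass over candidate songs. The following is a minimal sketch under assumed inputs: the candidate tuples, the `max_per_genre` cap used as a stand-in for "genre diversity", and the function name are all hypothetical.

```python
from collections import Counter


def build_playlist(candidates, known, size: int = 30, max_per_genre: int = 10):
    """candidates: iterable of (song_id, genre, predicted_listen_probability)."""
    picked = []
    per_genre = Counter()
    # Walk candidates from most to least likely to be listened to.
    for song_id, genre, _prob in sorted(candidates, key=lambda c: c[2], reverse=True):
        if song_id in known:
            continue  # filter: user already knows this song
        if per_genre[genre] >= max_per_genre:
            continue  # filter: keep any one genre from dominating
        picked.append(song_id)
        per_genre[genre] += 1
        if len(picked) == size:
            break
    return picked


candidates = [
    ("a", "rock", 0.9),
    ("b", "rock", 0.8),
    ("c", "jazz", 0.7),
    ("d", "rock", 0.6),
]
print(build_playlist(candidates, known={"a"}, size=2, max_per_genre=1))
```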
Infrastructure:
- Apache Spark for batch recommendation jobs
- Feature store (user listening history, song features)
- A/B testing framework for algorithm improvements
Step 7: Playlist Service
Playlists are surprisingly complex at scale:
- Billions of playlists
- Collaborative playlists (multiple editors)
- Ordering, deduplication, version history
Data model:
- Playlist metadata: id, name, owner, description, cover_image
- Playlist tracks: ordered list of (track_id, added_by, added_at)
- Store as ordered array in database
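The data model above maps naturally onto two record types. This is a sketch of the in-memory shape only, using the field names from the list; the `add` and `move` helpers are hypothetical, and how the ordered array is persisted depends on the database chosen.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class PlaylistTrack:
    track_id: str
    added_by: str
    added_at: datetime


@dataclass
class Playlist:
    id: str
    name: str
    owner: str
    description: str = ""
    cover_image: str = ""
    tracks: List[PlaylistTrack] = field(default_factory=list)

    def add(self, track_id: str, user: str, position: Optional[int] = None) -> None:
        """Append a track, or insert it at a given position."""
        entry = PlaylistTrack(track_id, user, datetime.now(timezone.utc))
        if position is None:
            self.tracks.append(entry)
        else:
            self.tracks.insert(position, entry)

    def move(self, src: int, dst: int) -> None:
        """Reorder: remove the track at src and reinsert it at dst."""
        self.tracks.insert(dst, self.tracks.pop(src))

    def track_ids(self) -> List[str]:
        return [t.track_id for t in self.tracks]


playlist = Playlist("p1", "Road trip", "ana")
playlist.add("t1", "ana")
playlist.add("t2", "ana")
playlist.add("t3", "ben", position=0)
print(playlist.track_ids())
```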
Collaborative playlists:
- Operational Transform or CRDT for concurrent edits
- In practice: last-write-wins is often sufficient, with simple conflict resolution when two edits collide
- Version history allows undo
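Last-write-wins with version history can be sketched in a few lines. This is a simplified illustration, not a CRDT or Operational Transform implementation: it assumes each write carries a timestamp and replaces the whole track list, and the class and method names are hypothetical.

```python
class VersionedPlaylist:
    def __init__(self, tracks=()):
        self.history = [tuple(tracks)]  # immutable snapshots, oldest first
        self.last_write_ts = 0.0

    @property
    def tracks(self):
        return list(self.history[-1])

    def write(self, tracks, ts: float) -> bool:
        """Last-write-wins: reject writes older than the latest accepted one."""
        if ts <= self.last_write_ts:
            return False
        self.history.append(tuple(tracks))
        self.last_write_ts = ts
        return True

    def undo(self) -> None:
        """Drop the newest snapshot, keeping at least the initial version."""
        if len(self.history) > 1:
            self.history.pop()


shared = VersionedPlaylist(["t1"])
print(shared.write(["t1", "t2"], ts=10.0))  # accepted
print(shared.write(["t1", "t3"], ts=5.0))   # stale write, rejected
print(shared.tracks)
```

The trade-off is visible in the second write: a whole-list last-write-wins scheme silently discards the losing edit, which is why per-operation approaches such as CRDTs are worth mentioning for heavily edited playlists.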
Popular playlists (millions of followers):
- Cache aggressively (read-heavy)
- Update propagation via pub/sub when playlist changes
- Eventual consistency is fine (30-second delay acceptable)
Key Takeaways for the Interview
- Audio ≠ video: Different streaming challenges - gapless playback, smaller files, simpler bitrate adaptation
- CDN is critical: 95%+ cache hit rate for audio. Pre-encode at all quality levels
- Three recommendation approaches: Collaborative filtering + content-based + NLP, combined
- Client caching saves bandwidth: Local LRU cache reduces CDN load by ~30%
- Search is multi-signal: Text relevance + popularity + personalization + region
Article Details
This guide is part of HireReady's interview prep library and is maintained to reflect current hiring practices.