Key takeaways
- Recommendation systems predict what you’ll like next — movies, songs, products, posts — based on your history, similar users, and content features.
- Three broad families: collaborative filtering (find similar users), content-based (recommend similar items), and hybrid/neural approaches that combine signals.
- Modern systems at Netflix, YouTube, TikTok, Spotify, and Amazon are deep-learning pipelines involving candidate generation, ranking, and diversification stages.
- The business stakes are enormous — recommendation quality directly drives engagement, revenue, and retention for platforms.
- Challenges include filter bubbles, cold-start users, optimization-for-engagement pitfalls, and regulatory scrutiny of algorithmic feeds.
Why recommendation matters
The catalog-to-user ratio keeps growing. Netflix carries tens of thousands of titles; Spotify has over 100 million tracks; TikTok hosts billions of videos. No user can browse this manually. Recommendation systems are how people discover new content — and how platforms capture attention. Netflix has publicly estimated that over 80% of what users watch is driven by its recommendation system, not active search.

Good recommendations improve user outcomes (finding content you love) and platform outcomes (sessions, subscriptions, purchases). Bad recommendations feel like noise. The gap between “usable” and “magical” is where companies compete. For the underlying machinery, see our machine learning primer.
Collaborative filtering
The classical approach: find users similar to you, then recommend what they liked but you haven’t seen. The Netflix Prize (2006–2009, $1M) popularized matrix factorization — decomposing the sparse user-item rating matrix into user and item embedding vectors whose dot product predicts the rating.
Collaborative filtering works without knowing anything about item content — it only needs user-item interaction data. But it struggles with cold-start: a brand-new item has no interactions, so it can’t be recommended; a brand-new user has no history.
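The dot-product formulation above can be sketched in a few lines. This is a toy illustration with made-up ratings, plain SGD over only the observed entries, and arbitrary hyperparameters — not the Netflix Prize method itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observed ratings: (user, item, rating) triples for 3 users, 4 items.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 3, 5.0)]

k = 2                                    # embedding dimension
U = rng.normal(scale=0.1, size=(3, k))   # user embedding vectors
V = rng.normal(scale=0.1, size=(4, k))   # item embedding vectors

lr, reg = 0.05, 0.01
for _ in range(200):                     # SGD over observed cells only
    for u, i, r in ratings:
        err = r - U[u] @ V[i]            # prediction is the dot product
        u_old = U[u].copy()
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * u_old - reg * V[i])

def predict(u, i):
    """Predicted rating for any (user, item), including unseen cells."""
    return float(U[u] @ V[i])
```

The payoff is the last line: once the factors are learned, the model scores user-item pairs that never appeared in the training data.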
Content-based recommendation
Recommend items similar to what you’ve liked, based on item features. A book recommendation system might use genre, author, length, reading level; a music system might use instrumentation, tempo, mood. The embeddings capturing these features can be hand-engineered or learned.
Content-based methods handle cold-start better than collaborative filtering — a new item with descriptive features can be recommended immediately. The downside is serendipity: you only see items similar to ones you already know, which narrows exploration.
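A minimal content-based recommender is cosine similarity between a user profile and item feature vectors. The titles and the four-feature scheme below are invented for illustration:

```python
import numpy as np

# Hypothetical item features: [is_scifi, is_romance, length_norm, level_norm]
items = {
    "dune":        np.array([1.0, 0.0, 0.9, 0.7]),
    "neuromancer": np.array([1.0, 0.0, 0.6, 0.8]),
    "pride":       np.array([0.0, 1.0, 0.5, 0.6]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# User profile: the mean feature vector of items they liked.
liked = ["dune"]
profile = np.mean([items[t] for t in liked], axis=0)

# Rank unseen items by similarity to the profile.
scores = {t: cosine(profile, v) for t, v in items.items() if t not in liked}
best = max(scores, key=scores.get)   # the most dune-like unseen item
```

Note the serendipity problem is visible here: the profile can only ever point toward regions of feature space the user has already visited.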
Hybrid and neural recommenders
Modern production systems combine signals in deep-learning architectures. Google’s 2016 YouTube paper described a deep-learning recommender with two stages: candidate generation (quickly narrow the billion-video corpus to a few hundred candidates) and ranking (precisely score each candidate for this user). The architecture has since become standard across major platforms.
Features include user embeddings (learned from history), item embeddings, context (time of day, device), content features (genres, tags, embeddings of text or images), and engagement signals (dwell time, replay rate, skip rate). The model learns to predict the engagement signal that matters — click, watch time, subscription — as a function of all the features. For background on vector representations, see our embeddings primer.
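Concretely, these heterogeneous signals are typically concatenated into one input vector for the model. A sketch with invented dimensions and random stand-in weights (a real system would learn both the embeddings and the weights):

```python
import numpy as np

rng = np.random.default_rng(1)

user_emb = rng.normal(size=8)            # learned from watch history
item_emb = rng.normal(size=8)            # learned item representation
context = np.array([1.0, 0.0, 0.0])      # e.g. one-hot [evening, mobile, tablet]
engagement = np.array([0.7, 0.2])        # e.g. [dwell fraction, skip rate]

# All signals become one feature vector for the engagement model.
x = np.concatenate([user_emb, item_emb, context, engagement])

w = rng.normal(size=x.shape[0])          # stand-in for learned weights
p_click = 1.0 / (1.0 + np.exp(-(w @ x)))  # predicted engagement probability
```

The choice of which probability to predict — click, long watch, subscription — is exactly the "engagement signal that matters" decision discussed above.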
The two-stage pattern
Candidate generation
Given a user, quickly find ~100-1000 items they might like out of millions. Uses approximate-nearest-neighbour search over item embeddings, collaborative filtering signals, recently interacted items, and heuristics. Must be fast — a few tens of milliseconds.
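The core retrieval operation is a top-k search over item embeddings. The sketch below does it exactly with a brute-force dot product on synthetic vectors; at real corpus sizes, approximate-nearest-neighbour indexes replace this scan while keeping the same query shape:

```python
import numpy as np

rng = np.random.default_rng(2)

n_items, d = 10_000, 32
item_embs = rng.normal(size=(n_items, d))
item_embs /= np.linalg.norm(item_embs, axis=1, keepdims=True)  # unit norm

user_emb = rng.normal(size=d)
user_emb /= np.linalg.norm(user_emb)

# Exact top-k by dot product; ANN indexes approximate this same query.
k = 200
scores = item_embs @ user_emb
candidates = np.argpartition(scores, -k)[-k:]   # unordered top-k item ids
```

`argpartition` avoids a full sort — candidate generation only needs the top-k set, not its internal order, since the ranking stage re-scores everything anyway.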
Ranking
Score each candidate precisely for this user in this context. Uses a heavier model — often a deep neural network or gradient-boosted tree — with many features. Can be slower per item because only hundreds of items are scored.
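As a sketch of why the ranker can afford to be heavier: it runs per candidate, not per catalog item. Here a tiny two-layer network with random weights stands in for a trained model:

```python
import numpy as np

rng = np.random.default_rng(3)

d = 16
user_emb = rng.normal(size=d)
candidates = {i: rng.normal(size=d) for i in range(300)}  # id -> embedding

# Stand-in for a learned ranker: one hidden layer, scalar output.
W1 = rng.normal(scale=0.1, size=(32, 2 * d))
W2 = rng.normal(scale=0.1, size=32)

def score(item_emb):
    x = np.concatenate([user_emb, item_emb])   # user + item features
    h = np.maximum(0.0, W1 @ x)                # ReLU hidden layer
    return float(W2 @ h)                       # engagement logit

ranked = sorted(candidates, key=lambda i: score(candidates[i]), reverse=True)
top10 = ranked[:10]
```

Running a model like this over 300 candidates is cheap; running it over millions of catalog items per request would not be — hence the two-stage split.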
Diversification and business rules
Post-ranking, apply rules: don’t show the same creator three times, ensure topic diversity, promote new creators, demote repeated content, adhere to content-safety rules. This is where platform policy (not just accuracy) shapes what you see.
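These rules are often simple greedy passes over the ranked list. A sketch of one such rule — the per-creator cap mentioned above — on hypothetical (item, creator) pairs:

```python
from collections import Counter

# Ranked (item_id, creator_id) pairs from the ranking stage (made-up data).
ranked = [(1, "a"), (2, "a"), (3, "a"), (4, "a"),
          (5, "b"), (6, "b"), (7, "c")]

def diversify(ranked, max_per_creator=2):
    """Walk the list in rank order, skipping items once a creator hits the cap."""
    seen = Counter()
    feed = []
    for item_id, creator in ranked:
        if seen[creator] < max_per_creator:
            feed.append(item_id)
            seen[creator] += 1
    return feed

feed = diversify(ranked)   # items 3 and 4 are dropped by the creator cap
```

Topic diversity, new-creator boosts, and safety demotions compose the same way: each rule reorders or filters the list the previous stage produced.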
TikTok’s innovation
TikTok’s For You feed became the gold-standard demonstration of pure recommendation-driven content consumption. Unlike platforms where you follow specific accounts, TikTok’s feed is algorithmically curated from everything on the platform. The result is extremely tight user-interest profiling based on micro-behaviours — pauses, replays, how long you watch before swiping — that provide vastly more signal than explicit follows or likes. Other platforms (Instagram Reels, YouTube Shorts, X) have adopted variants.
Common business metrics
Recommendation systems are usually optimized for engagement metrics — watch time, click-through rate, session length, return frequency. Each of these has known failure modes: watch-time optimization can push toward clickbait and outrage; CTR optimization can favour sensational but disappointing content. Thoughtful teams include satisfaction metrics (did the user return? did they skip fast? did they rate positively?) alongside raw engagement.
Long-term revenue is the deepest metric but hardest to optimize against directly. A/B tests on recommendation changes typically run for weeks to months, measuring effects that propagate through user behaviour. For deep-learning fundamentals, see our deep learning coverage.
Personalization vs. filter bubbles
Heavily personalized feeds can narrow perspectives — users see primarily content that matches their existing preferences. Whether this creates “filter bubbles” that harden political or cultural views is a contested question with mixed empirical evidence. Platforms respond variably — YouTube has adjusted its recommender to reduce borderline content amplification; X’s algorithm is more permissive; Meta has toggled between approaches.
Transparency reports and researcher access (DSA-mandated for very large platforms in the EU) are starting to produce more concrete data on how algorithmic recommendation shapes information diets.
Cold-start and exploration
New users and new items pose a recurring problem. Common solutions:
- Onboarding flows that collect explicit preferences (liked genres, artists).
- Content-based bootstrapping from item features.
- Multi-armed bandit algorithms that balance exploitation (recommend known-good) with exploration (try something new).
- Contextual bandits that use early signals to rapidly narrow uncertainty.
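The exploration/exploitation trade-off in the bandit bullets can be sketched with the simplest variant, epsilon-greedy, on invented click rates (contextual bandits extend this by conditioning the choice on user features):

```python
import random

random.seed(0)

# Hypothetical true click rates for three cold-start items.
true_ctr = {"item_a": 0.1, "item_b": 0.5, "item_c": 0.3}

counts = {i: 0 for i in true_ctr}
values = {i: 0.0 for i in true_ctr}   # running mean reward per item

def choose(eps=0.1):
    if random.random() < eps:                  # explore: try something new
        return random.choice(list(true_ctr))
    return max(values, key=values.get)         # exploit: best estimate so far

for _ in range(5000):
    arm = choose()
    reward = 1.0 if random.random() < true_ctr[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean

best = max(values, key=values.get)   # converges to the highest-CTR item
```

The 10% exploration rate is the knob: too low and the system never escapes an early wrong guess, too high and users see too much noise.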
Privacy considerations
Recommendation systems are data-hungry. Personalization requires detailed interaction history, which raises privacy concerns. Regulations — GDPR, CCPA, India’s DPDP Act — give users rights over their profile data. Techniques like differential privacy, federated recommendations, and privacy-preserving feature stores have seen research attention but limited production deployment.
What’s next
LLM-augmented recommendation is the growing frontier. Language models can capture richer content representations, generate personalized explanations (“recommended because you liked X”), and support natural-language queries (“show me something optimistic but not silly”). Early experiments suggest meaningful quality gains for some categories, though cost per query is higher than classical recommender architectures. The next five years will likely see hybrid stacks — fast classical retrieval plus LLM ranking or explanation layers.
Frequently asked questions
Why does the algorithm keep showing me things I am not interested in?
Because it is still exploring. Recommendation systems deliberately show some novel content to learn your preferences and avoid trapping you in a narrow slice of the catalog. The system also acts on many signals, some of which are noisy — a video you watched because a friend sent it may read as genuine interest. Explicit “not interested” feedback, when the platform offers it, usually helps the model update faster.
Is it true that the algorithm is secretly manipulating me?
It is optimizing for its objective function, which is usually engagement. Whether that counts as “manipulation” depends on perspective. The systems do not have intent; they have math. The design choices — what to optimize for, what guardrails to apply — are human choices made by platform teams. Transparency reports, research access, and regulatory scrutiny are slowly making those choices more visible.
How do I opt out of algorithmic recommendations?
Varies by platform. YouTube lets you turn off watch-history-based recommendations (at the cost of worse recommendations). TikTok has a “following” feed. X offers a chronological timeline option. LinkedIn lets you sort chronologically. Most platforms resist making non-personalized feeds the default because algorithmic feeds produce more engagement. The EU DSA requires very large platforms to offer users at least one non-personalized feed option.