Key takeaways
- Machine learning is the subfield of AI where systems learn patterns from data rather than following hand-coded rules.
- There are three main types: supervised learning (learn from labelled examples), unsupervised learning (find structure without labels), and reinforcement learning (learn by trial and error with reward signals).
- Most commercial AI you encounter — spam filters, recommendation systems, fraud detection, speech recognition — is supervised learning.
- Self-supervised learning, a variant of unsupervised learning, is how language models are pre-trained on raw text before fine-tuning.
- Reinforcement learning drives game-playing AI like AlphaGo and is a key component of modern chatbot training through RLHF.
What does “learning from data” mean?
Traditional software is written rule by rule. If a developer wants a program to identify email spam, they write rules: “if the message contains ‘urgent’ and ‘wire transfer’, flag it.” This works for small, well-understood problems. It fails for anything complex — the rules explode in number and still miss edge cases.

Machine learning flips the script. Instead of writing rules, you give the system many examples — say, ten million emails each labelled as spam or not — and a flexible mathematical model (often a neural network). The model finds the statistical patterns that distinguish the two classes. When a new email arrives, the model applies those learned patterns to classify it. As Tom Mitchell’s classic definition puts it: a program learns if its performance at a task improves with experience.
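The spam example above can be sketched in a few lines. This is a toy Naive Bayes classifier trained on a handful of made-up labelled emails — the words, labels, and counts are all invented for illustration, but the mechanism (count patterns per class, then score new inputs against those counts) is the real one:

```python
import math
from collections import Counter

# Toy labelled training data — every email and label here is made up.
train = [
    ("urgent wire transfer needed now", "spam"),
    ("claim your prize urgent reply", "spam"),
    ("wire money to unlock prize", "spam"),
    ("meeting notes attached see agenda", "ham"),
    ("lunch tomorrow at noon", "ham"),
    ("agenda for the morning meeting", "ham"),
]

def train_naive_bayes(examples):
    """Count word frequencies per class; those counts ARE the learned pattern."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Pick the class whose learned word statistics best explain the text."""
    vocab = len(set().union(*word_counts.values()))
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in text.split():
            # Laplace smoothing so an unseen word doesn't zero out a class
            score += math.log((word_counts[label][word] + 1) / (total + vocab))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(train)
print(classify("urgent prize wire", *model))       # spam
print(classify("see you at the meeting", *model))  # ham
```

No rule about “urgent” was ever written by hand; the association with spam was extracted from the examples.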
Supervised learning
Supervised learning is the most common flavour of machine learning and the workhorse of commercial AI. The “supervision” comes from labels — each training example is paired with the correct answer. The model’s job is to learn the mapping from input to output.
Classification vs. regression
Supervised learning splits into two problem types. Classification predicts a category — spam/not spam, cat/dog/horse, positive/negative sentiment. Regression predicts a continuous number — tomorrow’s temperature, a house’s market price, a patient’s blood pressure. The architecture of the model is similar; the difference is the output format and the loss function.
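The shared-architecture point can be made concrete. In this sketch (all numbers are made up), the same linear model serves both problem types; only the output transformation and the loss function differ:

```python
import math

# One model for both tasks: a linear function of the input.
def linear(x, w, b):
    return w * x + b

# Regression: use the raw output, score it with squared error.
def squared_error(y_pred, y_true):
    return (y_pred - y_true) ** 2

# Classification: squash the output to a probability with a sigmoid,
# score it with cross-entropy (log loss).
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def cross_entropy(p_pred, y_true):  # y_true is 0 or 1
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))

w, b = 2.0, -1.0
print(squared_error(linear(3.0, w, b), 4.5))         # regression loss
print(cross_entropy(sigmoid(linear(3.0, w, b)), 1))  # classification loss
```

Training then adjusts w and b to drive whichever loss you chose toward zero; that choice, not the model, is what makes the problem classification or regression.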
Where you see it
Email spam filters, credit-card fraud detection, medical image diagnosis, speech-to-text, autocomplete, face recognition on your phone, insurance claim triage — nearly every AI application visible in consumer and enterprise products is supervised learning. The limit is always the same: you need a lot of labelled data, and the labels have to be accurate. Many companies spend more on data labelling than on modelling.
Unsupervised learning
Unsupervised learning asks a different question: given a big pile of unlabelled data, can you find structure in it? There is no “correct answer” to match; the model has to extract patterns on its own.
Clustering
The classic unsupervised task is clustering — grouping similar items together. Customer-segmentation tools (grouping shoppers with similar purchase behaviour) are clustering. So are tools that group news articles into topics, or genes into functional families. Common algorithms: k-means, hierarchical clustering, DBSCAN.
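A minimal k-means sketch shows the idea — note there are no labels anywhere. The 2-D points below are made up to form two obvious groups:

```python
import random

# Made-up 2-D data: three points near (1, 1), three near (8, 8).
points = [(1, 1), (1.5, 2), (0.5, 1.2), (8, 8), (8.5, 7.5), (7.8, 8.2)]

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def mean(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def kmeans(points, k, iters=10, seed=0):
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans(points, k=2)
print(centers)  # two centres, one near each group
```

The algorithm discovers the two groups purely from the geometry of the data.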
Dimensionality reduction
When data has hundreds or thousands of features, it becomes hard to visualize or process. Dimensionality reduction techniques like principal component analysis (PCA) and t-SNE compress data into fewer dimensions while preserving structure. This underlies much data visualization and is a common preprocessing step.
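The core of PCA can be sketched without any library. This toy example (made-up 2-D points) finds the direction of maximum variance via power iteration on the covariance matrix, then projects the data onto it, reducing each 2-D point to a single number:

```python
import math

# Made-up correlated 2-D data.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
        (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1)]

# Centre the data.
mx = sum(x for x, _ in data) / len(data)
my = sum(y for _, y in data) / len(data)
centred = [(x - mx, y - my) for x, y in data]

# Entries of the 2x2 covariance matrix.
n = len(centred)
cxx = sum(x * x for x, _ in centred) / n
cyy = sum(y * y for _, y in centred) / n
cxy = sum(x * y for x, y in centred) / n

# Power iteration: repeatedly multiplying a vector by the covariance
# matrix converges to its top eigenvector — the first principal component.
v = (1.0, 0.0)
for _ in range(50):
    v = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
    norm = math.hypot(*v)
    v = (v[0] / norm, v[1] / norm)

# Project each point onto the component: two numbers become one,
# keeping most of the spread in the data.
projected = [x * v[0] + y * v[1] for x, y in centred]
```

Real PCA computes all components at once (typically via singular value decomposition), but this is the geometric idea: find the axes along which the data varies most, and keep only those.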
Self-supervised learning
A newer subclass, self-supervised learning, has become the dominant technique for training large language models. The idea is to turn an unlabelled dataset into a labelled one by hiding part of each example and asking the model to predict it. For text, mask some of the words — or simply hide the next word — and train the model to guess them. For images, hide a patch and ask the model to fill it in. This gives you billions of “free” training examples. It is how GPT, Claude, and Gemini are pre-trained. For more, see our deep learning primer.
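The label-manufacturing trick can be sketched directly. This toy example (the corpus and window size are made up; real pre-training works at vast scale on subword tokens) hides the centre word of each window and records (context, hidden word) pairs:

```python
# Raw, unlabelled text — the "corpus" here is invented for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

def make_examples(words, window=2):
    """Slide over the text; the hidden centre word becomes the label."""
    examples = []
    for i in range(window, len(words) - window):
        context = words[i - window:i] + words[i + 1:i + window + 1]
        examples.append((context, words[i]))  # (input, "free" label)
    return examples

pairs = make_examples(corpus)
print(pairs[0])  # (['the', 'cat', 'on', 'the'], 'sat')
```

No human labelled anything; the supervision signal was extracted from the structure of the text itself, which is why the approach scales to the entire web.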
Reinforcement learning
Reinforcement learning (RL) is the type of machine learning closest to how humans and animals learn through experience. An agent interacts with an environment, takes actions, and receives rewards. The goal is to learn a policy — a strategy for choosing actions — that maximizes cumulative reward over time.
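The agent–environment–reward loop can be sketched with tabular Q-learning on a made-up toy environment: a five-cell corridor where the agent starts at cell 0 and earns a reward of 1 for reaching cell 4. All hyperparameters here are illustrative:

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                 # step left or step right
alpha, gamma, epsilon = 0.5, 0.9, 0.2

# Q-table: the agent's current estimate of each (state, action) value.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng = random.Random(0)

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: usually exploit the best known action,
        # occasionally explore a random one.
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: nudge the estimate toward
        # observed reward + discounted best future value.
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned policy: the best action from each state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
```

After a couple of hundred trial-and-error episodes the policy is “move right from every cell” — learned entirely from reward signals, with no labelled examples of correct moves.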
When to use it
RL shines in settings where you can simulate many trial-and-error attempts. Game-playing is the canonical example — DeepMind’s AlphaGo defeated top Go professional Lee Sedol in 2016 using a combination of supervised learning from human games and reinforcement learning via self-play. Later variants (AlphaZero, MuZero) learned solely from self-play without any human data. Robotics, autonomous driving, and resource optimization (data center cooling, recommendation ranking) are other RL domains.
RLHF — reinforcement learning from human feedback
The most commercially important use of RL today is in training large language models. After the base model is pre-trained on text, a round of reinforcement learning from human feedback (RLHF) tunes it to be helpful and safe. Human labellers rank model outputs, a reward model is trained on those rankings, and the language model is fine-tuned to produce outputs that score high. ChatGPT’s distinctive “helpful-assistant” personality largely came from RLHF.
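The reward-model step can be sketched in miniature. Everything here is a toy assumption: each model output is summarized by a single invented “helpfulness” feature, human rankings give pairs of (preferred, rejected) feature values, and we fit a linear reward r(x) = w·x with the pairwise logistic (Bradley–Terry-style) loss that reward models commonly use:

```python
import math

# Made-up preference data: each pair is (feature of preferred output,
# feature of rejected output), as ranked by human labellers.
pairs = [(0.9, 0.2), (0.8, 0.4), (0.7, 0.1), (0.95, 0.5)]

w, lr = 0.0, 0.5
for step in range(200):
    for x_win, x_lose in pairs:
        # Model of the labeller: P(preferred wins) = sigmoid(r_win - r_lose)
        margin = w * x_win - w * x_lose
        p = 1 / (1 + math.exp(-margin))
        # Gradient ascent on the log-likelihood of the human preferences.
        w += lr * (1 - p) * (x_win - x_lose)

def reward(x):
    return w * x  # learned reward now ranks outputs the way humans did
```

In real RLHF the reward model is itself a large neural network over full text, and the final step fine-tunes the language model (e.g. with PPO) to produce outputs the reward model scores well; this sketch only shows how rankings become a trainable reward signal.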
Which type should you use?
In practice, the choice is dictated by what data you have. If you have labelled examples of the answer you want, use supervised learning. If you have piles of raw data and want to explore or pre-train, use unsupervised or self-supervised learning. If you can simulate an environment where the agent can take actions and you can define a reward, consider reinforcement learning — but be warned that RL is hard to get working reliably and requires far more engineering than supervised learning.
Modern AI systems typically use all three. A ChatGPT-style assistant is pre-trained with self-supervised learning on the web, fine-tuned with supervised learning on human-written conversations, and further refined with reinforcement learning from human feedback. For a broader look at the industry, see our AI industry coverage.
Frequently asked questions
Is machine learning the same as artificial intelligence?
Not exactly. Artificial intelligence is the broader goal of building machines that behave intelligently. Machine learning is one specific family of techniques for doing so: learning from data rather than following hand-coded rules. Today ML is the dominant approach within AI, which is why the two terms are often used interchangeably in casual conversation. Historically, AI has also included rule-based expert systems, logic programming, and search algorithms that do not “learn” in the ML sense.
Do I need a lot of data to use machine learning?
It depends on the problem and the model. Simple problems with classical models (linear regression, decision trees) can work with thousands of examples. Modern deep learning typically wants millions of examples to shine — though techniques like transfer learning let you start from a model pre-trained on a large dataset and fine-tune it with relatively little task-specific data. For most business problems, “more is better” holds, but diminishing returns set in.
Can machine learning make mistakes?
Yes, and routinely. A machine-learning model is a statistical pattern-matcher; it makes errors when the input is unusual, when the training data was biased or incomplete, when the environment changes after deployment, and when the task requires reasoning outside the patterns it learned. This is why production machine-learning systems need monitoring, human review for high-stakes decisions, and ongoing retraining as conditions change.