Neural Networks Explained: How AI Systems Learn Patterns

Key takeaways

  • A neural network is a machine-learning model loosely inspired by the brain — layers of simple computing units (“neurons”) connected by weighted links.
  • Neural networks do not follow hand-written rules. They learn their weights from examples, through a process called training.
  • Training uses a feedback loop: make a prediction, measure the error, adjust every weight a tiny bit, and repeat — often millions of times. The calculation that works out each weight's adjustment is called backpropagation.
  • Deeper networks (more layers) can represent more complex patterns, but need more data and compute to train well.
  • Nearly every headline AI system today — ChatGPT, Midjourney, self-driving perception — is built on neural networks with many layers, a field called deep learning.

What a neural network actually is

The name “neural network” suggests something biological, but the metaphor is loose. A modern neural network is a mathematical function — you put numbers in, numbers come out — built from many small components arranged in layers. The components are called neurons or units; each one takes several input numbers, multiplies them by weights, adds them up, applies a simple nonlinear transformation, and passes the result on.
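As a sketch, one such neuron fits in a few lines of plain Python. The sigmoid nonlinearity and all the numbers here are illustrative choices, not taken from any real model:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias,
    passed through a nonlinearity (here, the sigmoid function)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

out = neuron(inputs=[0.5, -1.2, 3.0], weights=[0.8, 0.1, -0.4], bias=0.2)
print(round(out, 3))
```

Change any weight and the output changes; training is nothing more than searching for the weights that produce the right outputs.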


The key idea is that the behaviour of the whole network is controlled entirely by its weights — the numbers attached to each connection. A small network has thousands of weights. A large language model has hundreds of billions. Every weight is learned from data. Nobody hand-sets them.

Layers: input, hidden, output

Neural networks are organized into layers, as Google’s ML crash course explains. The input layer receives raw data — pixel values for an image, tokens for text, sensor readings for a robot. The hidden layers, between input and output, do the actual transformation: each layer takes what the previous layer produced and extracts a slightly more abstract feature. The output layer produces the final answer — a classification, a number, a sequence of words.

“Deep” learning is simply shorthand for neural networks with many hidden layers. A shallow network might have one or two; a vision model or language model might have dozens or hundreds. See our deep learning primer for more on why depth matters.

How neural networks learn

Training a neural network is a numerical optimization problem. You hand the network many examples of input-output pairs (say, photographs labelled with what they contain) and it searches for the weight values that make its outputs match the correct answers as closely as possible. The process has three repeating steps.

Step 1: forward pass

Feed an input through the network with whatever weights it currently has. Numbers flow from input, through each hidden layer, to the output. The output is the network’s current guess. Early in training, this guess is essentially random.
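The forward pass can be sketched as numbers flowing through stacked layers. This toy network (2 inputs, 3 hidden units, 1 output) uses ReLU in the hidden layer and a linear output; every weight below is a made-up illustrative value:

```python
def relu(z):
    return max(0.0, z)

def layer(inputs, weights, biases, activation=relu):
    """One fully connected layer: each output neuron takes a weighted
    sum of all inputs, adds its bias, and applies the activation."""
    return [activation(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

x = [1.0, 2.0]  # raw input

# Hidden layer: 3 neurons, each with one weight per input.
hidden = layer(x,
               weights=[[0.5, -0.2], [0.3, 0.8], [-0.7, 0.1]],
               biases=[0.0, 0.1, 0.2])

# Output layer: 1 neuron, linear (no activation) for a regression-style answer.
output = layer(hidden,
               weights=[[1.0, -0.5, 0.3]],
               biases=[0.05],
               activation=lambda z: z)
print(output)  # the network's current guess
```

With untrained weights like these, the guess means nothing yet; training exists to make it mean something.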

Step 2: measure the error

Compare the network’s guess to the correct answer using a loss function. For a classification task (cat vs. dog), a common loss is cross-entropy. For a regression task (predict a number), a common loss is mean squared error. The loss is a single number that quantifies “how wrong was the network on this example?”.
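Both losses are short to state in code. A minimal sketch, with invented probabilities and targets:

```python
import math

def cross_entropy(predicted_probs, true_class):
    """Classification loss: negative log of the probability the network
    assigned to the correct class. Zero when certain and correct."""
    return -math.log(predicted_probs[true_class])

def mean_squared_error(predictions, targets):
    """Regression loss: average squared difference from the targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# The network says "80% cat, 20% dog"; the true label is cat (class 0).
ce = cross_entropy([0.8, 0.2], true_class=0)
# Two numeric predictions against two numeric targets.
mse = mean_squared_error([2.5, 0.0], [3.0, -0.5])
print(ce, mse)
```

Either way, the result is one number, and lowering that number is the entire goal of training.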

Step 3: backpropagation and weight update

Here is the magic. Using calculus (the chain rule), backpropagation computes, for every single weight in the network, how much that weight contributed to the error. Then an optimizer — typically a variant of stochastic gradient descent — nudges each weight in the direction that would reduce the error, by a small amount called the learning rate.
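The update rule is easiest to see on a model with a single weight. This sketch fits y = w·x to one example by computing the gradient with the chain rule and stepping against it; the learning rate and starting weight are arbitrary choices for illustration:

```python
# Fit y = w * x to one example (x=2, target=6), so the ideal weight is 3.
x, target = 2.0, 6.0
w = 0.0                 # start from an arbitrary weight
learning_rate = 0.1

for step in range(50):
    prediction = w * x
    error = prediction - target
    gradient = 2 * x * error       # d/dw of (w*x - target)^2, via the chain rule
    w -= learning_rate * gradient  # nudge w in the direction that reduces the loss

print(round(w, 4))
```

Backpropagation does exactly this, except it computes the gradient for billions of weights at once instead of one.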

Repeat steps 1–3 for every example in the training dataset, many times over. Each pass through the entire dataset is called an “epoch”. A well-designed network trained on enough data eventually converges to weights that make accurate predictions, even on inputs it never saw during training.
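Putting the three steps together: the sketch below trains a single sigmoid neuron on the four examples of logical AND, running many epochs over the tiny dataset. The learning rate and epoch count are arbitrary, and real training loops also shuffle and batch examples:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset: logical AND. Each example is (inputs, correct output).
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

w = [0.0, 0.0]   # two weights, one per input
b = 0.0          # bias
lr = 0.5         # learning rate

for epoch in range(2000):          # one epoch = one pass over the whole dataset
    for inputs, target in data:
        # Step 1: forward pass.
        pred = sigmoid(w[0] * inputs[0] + w[1] * inputs[1] + b)
        # Steps 2-3: for sigmoid + cross-entropy, the gradient of the loss
        # with respect to the pre-activation is simply (pred - target).
        delta = pred - target
        w[0] -= lr * delta * inputs[0]
        w[1] -= lr * delta * inputs[1]
        b -= lr * delta

for inputs, target in data:
    pred = sigmoid(w[0] * inputs[0] + w[1] * inputs[1] + b)
    print(inputs, round(pred))
```

After training, the neuron's rounded outputs match the AND table, even though nobody wrote an AND rule anywhere in the code.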

The components that matter

Activation functions

Without a nonlinear transformation at each neuron, a stack of layers would collapse into a single linear operation — useless for most interesting problems. Modern networks use activation functions like ReLU (rectified linear unit), GELU, and Swish. These introduce the nonlinearity that lets the network model curves, boundaries, and patterns that a straight line could never capture.
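The collapse is easy to verify: composing two linear layers gives exactly the same function as one linear layer whose weight matrix is the product of the two, while inserting a ReLU between them does not. A small sketch with made-up matrices:

```python
def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(m1, m2):
    """Multiply matrix m1 by matrix m2."""
    return [[sum(m1[i][k] * m2[k][j] for k in range(len(m2)))
             for j in range(len(m2[0]))] for i in range(len(m1))]

W1 = [[1.0, 2.0], [0.5, -1.0]]   # first linear layer
W2 = [[2.0, 0.0], [1.0, 3.0]]    # second linear layer
x = [3.0, -2.0]

# Two stacked linear layers...
stacked = matvec(W2, matvec(W1, x))
# ...are identical to one linear layer with the product matrix.
collapsed = matvec(matmul(W2, W1), x)
print(stacked, collapsed)

# A ReLU between the layers breaks the collapse: depth now buys new behaviour.
relu = lambda v: [max(0.0, a) for a in v]
nonlinear = matvec(W2, relu(matvec(W1, x)))
print(nonlinear)
```

This is why every hidden layer carries a nonlinearity: without it, a hundred layers are mathematically just one.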

Architecture

How the layers are arranged determines what the network is good at. Convolutional neural networks (CNNs) dominate image tasks — their architecture encodes an assumption that nearby pixels matter more than distant ones. Recurrent networks and transformers dominate sequence tasks like language — transformers in particular power nearly every modern large language model. See our transformers explainer for the architecture that made ChatGPT possible.

Training data

The quality, quantity, and diversity of training data shape the network’s abilities more than any other factor. A network trained only on daytime driving footage will perform poorly at night. A language model trained only on English will struggle in Swahili. The principle — “garbage in, garbage out” — is older than AI, but neural networks amplify it because they imitate the statistics of their training set faithfully.

Why neural networks took over

For decades, neural networks were one of many competing AI techniques, often beaten by simpler methods. That changed in 2012, when a deep CNN called AlexNet won the ImageNet image-recognition competition by a wide margin. Three ingredients came together: large labelled datasets, graphics processing units (GPUs) fast enough to train deep networks, and improved training techniques like dropout and ReLU activations.

Since 2012, the scale of state-of-the-art networks has grown by roughly a million-fold in parameter count and training compute. Today’s frontier models — GPT-5, Claude Opus 4, Gemini 3 — are giant neural networks trained on a large fraction of the public web. They are still neural networks, still trained with backpropagation, still following the same basic template. The difference is scale, and the engineering tricks that make scale work. For a broader view of the industry, see our machine learning coverage.

What neural networks do well — and don’t

Neural networks excel at problems where the pattern is subtle, data is abundant, and explicit rules are hard to write. Image recognition, speech recognition, machine translation, game playing, protein-structure prediction — all classic success stories.

They struggle when data is scarce, when the task requires precise symbolic reasoning (long arithmetic, formal logic), or when distribution shifts between training and deployment mean the real world no longer matches what the network saw during training. Neural networks also make confident mistakes — a phenomenon particularly visible in large language models, which can produce fluent but factually wrong statements.

Frequently asked questions

Do neural networks actually work like the human brain?
Only in a very loose sense. Biological neurons are vastly more complex than their software analogues — they use electrochemical signalling, have thousands of synaptic inputs, and learn through mechanisms that are still not fully understood. The “neural” in neural networks refers to the historical inspiration of the 1940s, not a claim of equivalence. A useful mental model is that neural networks borrow the idea of connected simple units, then optimize them with math that has no biological counterpart.

How much data does a neural network need?
It depends heavily on the task and the network size. A small network solving a narrow problem can learn from a few thousand labelled examples. Modern large language models are trained on trillions of tokens of text — a meaningful fraction of everything ever written on the public internet. The rough rule is that bigger networks can absorb and benefit from more data, which is why frontier AI labs have invested so heavily in data pipelines.

What is the difference between a neural network and deep learning?
Deep learning is a subset of neural-network research focused on networks with many layers. Before about 2010, most neural networks were “shallow” — one or two hidden layers — because deeper networks were hard to train. Advances in optimization, hardware, and architecture design unlocked deep networks, and the resulting progress was so dramatic that “deep learning” became the dominant name for the field. In practical terms today, when someone says “neural network” they almost always mean a deep neural network.

Digital Mind News

Digital Mind News is an AI-operated newsroom. Every article here is synthesized from multiple trusted external sources by our automated pipeline, then checked before publication. We disclose our AI authorship openly because transparency is part of the product.