AI Embeddings Explained: How Machines Represent Meaning

An embedding is a list of numbers — a dense vector — that represents the meaning of a word, sentence, image, or other item in a way machines can compare.
Items with similar meaning land close together in vector space; the distance between vectors is a proxy for semantic similarity.
The modern era began with Google’s word2vec in 2013, which famously showed that analogies like “king − man + woman ≈ queen” can be expressed as vector arithmetic.
Similarity is usually measured with cosine similarity, the cosine of the angle between two vectors, which ranges from −1 to 1.
Embeddings power semantic search, retrieval-augmented generation (RAG), recommendation systems, clustering, and deduplication; storing and querying them at scale is the job of vector databases.

What is an embedding?

An embedding is a dense numerical vector — typically hundreds or thousands of floating-point numbers — that encodes the meaning of an item so that semantically similar items have similar vectors. Instead of treating “dog” and “puppy” as unrelated symbols, an embedding places them close together in a high-dimensional space, letting software reason about meaning rather than exact characters.

The contrast is with older “one-hot” or keyword representations, where every word is an isolated token with no notion of relatedness — “cat” and “feline” would be as different as “cat” and “bulldozer.” Embeddings instead come from neural networks trained on large text corpora, so meaning is learned from context. A modern model such as OpenAI’s text-embedding-3-small outputs vectors of 1,536 dimensions by default, while many open-source sentence models produce 384- or 768-dimensional vectors. Embeddings are foundational to most modern natural language processing.
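To make the contrast concrete, here is a minimal sketch that compares one-hot vectors with dense vectors. The dense values are hypothetical numbers picked by hand for illustration, not output from any real model, and have only three dimensions to stay readable.

```python
import math

vocab = ["cat", "feline", "bulldozer"]

def one_hot(word):
    # One position per vocabulary word: 1.0 for the word itself, 0.0 elsewhere.
    return [1.0 if w == word else 0.0 for w in vocab]

# Hypothetical dense vectors, hand-picked for illustration only.
dense = {
    "cat": [0.9, 0.8, 0.1],
    "feline": [0.85, 0.75, 0.2],
    "bulldozer": [0.1, 0.0, 0.95],
}

# With one-hot vectors, every pair of distinct words is equally far apart.
one_hot_cat_feline = math.dist(one_hot("cat"), one_hot("feline"))
one_hot_cat_bulldozer = math.dist(one_hot("cat"), one_hot("bulldozer"))

# With dense vectors, distance can reflect relatedness.
dense_cat_feline = math.dist(dense["cat"], dense["feline"])
dense_cat_bulldozer = math.dist(dense["cat"], dense["bulldozer"])

print(one_hot_cat_feline, one_hot_cat_bulldozer)
print(dense_cat_feline, dense_cat_bulldozer)
```

In the one-hot case both distances are identical, so “cat” is no closer to “feline” than to “bulldozer”. In the dense case the first distance is much smaller than the second, which is the property a trained model is meant to learn from data.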

A short history: from word2vec to sentence models

Embeddings became practical in 2013 when a Google team led by Tomas Mikolov released word2vec, a shallow neural network that learned word vectors from raw text far faster than earlier methods. Its key insight — that a word’s meaning is defined by the company it keeps — turned the decades-old “distributional hypothesis” into an efficient, widely usable tool.

word2vec and the analogy trick

The word2vec paper showed that simple vector arithmetic captured relationships: subtract the vector for “man” from “king,” add “woman,” and the nearest vector, once the three input words themselves are set aside, is “queen.” Stanford’s GloVe (2014) achieved similar results by factorising word co-occurrence statistics. These were static embeddings, however: every occurrence of “bank” got the same vector whether it meant a riverbank or a financial institution.
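The arithmetic can be sketched with toy vectors. Everything below is an assumption made for illustration: the two-dimensional vectors are invented by hand (one axis loosely standing for “royalty”, the other for “gender”), and the helper name `analogy` is hypothetical. Real word2vec vectors have hundreds of dimensions and are learned from text. As is conventional for this kind of lookup, the three input words are excluded from the candidates.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical 2-D vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
vectors = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.05],
}

def analogy(a, b, c, vectors):
    """Return the word whose vector is nearest to a - b + c, skipping the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("king", "man", "woman", vectors)
print(result)
```

With these hand-made values the lookup returns “queen”, because subtracting “man” and adding “woman” flips the sign on the second axis while leaving the first untouched.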

Contextual and sentence embeddings

The 2018 arrival of transformer models like BERT produced contextual embeddings, where the same word gets different vectors depending on the surrounding sentence. Building on this, Sentence-BERT (Reimers and Gurevych, 2019) made it efficient to embed whole sentences, reporting in its paper that it reduced the time to find the most similar sentence pair “from 65 hours with BERT” to about five seconds while preserving accuracy. Today’s commercial and open models embed paragraphs, code, and images this way.

How similarity is measured

Embedding similarity is almost always measured with cosine similarity, which compares the angle between two vectors rather than their length. The score ranges from 1 (pointing the same direction, very similar) through 0 (unrelated, orthogonal) to −1 (opposite). Because it ignores magnitude, cosine similarity focuses purely on semantic direction, which is what matters for meaning.
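A minimal sketch of the calculation, using only the Python standard library: the dot product of the two vectors divided by the product of their lengths. The function name and the example vectors are illustrative choices, not taken from any particular library.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different length: the score is close to 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# Perpendicular vectors: the score is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

# Opposite direction: the score is close to -1.
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(same_direction, orthogonal, opposite)
```

The first pair shows the magnitude-independence described above: doubling a vector’s length does not change its score. The sketch assumes neither vector is all zeros, since that would cause a division by zero.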

Cosine, dot product, and Euclidean distance

The three common metrics are cosine similarity, the dot product, and Euclidean (straight-line) distance. When embeddings are normalised to unit length — a common practice — cosine similarity and dot product become equivalent, and ranking by cosine matches ranking by Euclidean distance. In practice, most semantic-search systems normalise vectors and use cosine or dot product because they are fast to compute across millions of items.
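The reason the rankings agree is a short identity: for unit-length vectors a and b, the squared Euclidean distance equals 2 − 2(a · b), so a larger dot product always means a smaller distance. The sketch below checks this numerically on a few arbitrary example vectors (chosen by hand, not real embeddings).

```python
import math

def normalise(v):
    # Scale a vector to unit length.
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

query = normalise([3.0, 1.0, 2.0])
docs = {
    "a": normalise([2.0, 1.0, 2.0]),
    "b": normalise([-1.0, 4.0, 0.5]),
    "c": normalise([0.5, 0.2, 3.0]),
}

# For unit vectors the dot product is the cosine similarity.
by_dot = sorted(docs, key=lambda name: dot(query, docs[name]), reverse=True)

# Euclidean distance: smaller means closer, so sort ascending.
by_euclid = sorted(docs, key=lambda name: math.dist(query, docs[name]))

# Check the identity  dist^2 = 2 - 2 * dot  for every document.
identity_holds = all(
    math.isclose(math.dist(query, d) ** 2, 2 - 2 * dot(query, d))
    for d in docs.values()
)

print(by_dot, by_euclid, identity_holds)
```

Both orderings come out the same, which is why a system that stores normalised vectors can pick whichever metric its index computes fastest.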

What embeddings power

Embeddings underpin nearly every system that needs to compare meaning at scale: semantic search, recommendation, retrieval-augmented generation, clustering, classification, and deduplication. The recipe is consistent — encode everything into vectors once, then answer queries by finding the nearest vectors. This “encode then retrieve” pattern is the workhorse of applied AI in the 2020s.

Semantic search and RAG

In semantic search, a query is embedded and compared against a corpus of pre-embedded documents, returning matches by meaning rather than keyword overlap, so a search for “how to fix a flat tyre” can surface a document titled “repairing a punctured wheel.” This same retrieval step is the backbone of retrieval-augmented generation, where relevant chunks are fetched and fed to a language model so it answers from grounded source material instead of memory alone.
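The retrieval step can be sketched in a few lines. This is an illustration under stated assumptions rather than a production design: the document and query vectors are hypothetical hand-written stand-ins for what an embedding model would return, the `search` helper is an invented name, and the final prompt string only shows roughly how retrieved text might be handed to a language model.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical pre-computed document embeddings (a real system would get
# these from an embedding model, once, at indexing time).
corpus = {
    "Repairing a punctured wheel": [0.9, 0.1, 0.2],
    "Baking sourdough bread at home": [0.05, 0.95, 0.1],
    "Choosing winter tyres for your car": [0.6, 0.1, 0.7],
}

def search(query_vec, corpus, k=2):
    """Return the k (score, title) pairs most similar to the query vector."""
    scored = ((cosine(query_vec, vec), title) for title, vec in corpus.items())
    return heapq.nlargest(k, scored)

query_text = "how to fix a flat tyre"
query_vec = [0.85, 0.05, 0.3]  # stand-in for the embedded query

hits = search(query_vec, corpus, k=2)

# RAG-style step: place the retrieved text in front of the question.
context = "\n".join(title for _, title in hits)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query_text}"
print(prompt)
```

With these made-up vectors the punctured-wheel document ranks first even though it shares no keywords with the query, and the bread document is left out of the context entirely.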

Recommendation and clustering

Recommendation systems embed users and items so that “users who liked this also liked that” becomes a nearest-neighbour lookup. Spotify, YouTube, and large e-commerce platforms publicly describe embedding-based retrieval as a core component of their recommenders. Embeddings also enable unsupervised clustering — grouping news articles by topic, or de-duplicating near-identical support tickets — by treating closeness in vector space as topical similarity.
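De-duplication by closeness can be sketched as a greedy pass: keep an item unless it is very similar to one already kept. The ticket vectors, the 0.95 threshold, and the `deduplicate` helper are all assumptions chosen for illustration. A real system would take vectors from an embedding model and tune the threshold on its own data.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical ticket embeddings, hand-written for illustration.
tickets = {
    "Cannot log in after password reset": [0.9, 0.1, 0.1],
    "Login fails following password reset": [0.88, 0.15, 0.08],
    "Invoice shows the wrong amount": [0.1, 0.9, 0.2],
}

def deduplicate(items, threshold=0.95):
    """Greedy pass: an item is a duplicate if it is close to a kept one."""
    kept = []        # (text, vector) pairs retained as representatives
    duplicates = {}  # duplicate text -> text of the representative it matched
    for text, vec in items.items():
        match = next((k for k, kv in kept if cosine(vec, kv) >= threshold), None)
        if match is None:
            kept.append((text, vec))
        else:
            duplicates[text] = match
    return [text for text, _ in kept], duplicates

unique, duplicates = deduplicate(tickets)
print(unique)
print(duplicates)
```

Here the two login tickets collapse into one representative while the invoice ticket stays separate. The greedy pass depends on input order and compares against every kept item, so it suits small batches. Larger collections typically rely on a nearest-neighbour index instead.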

Beyond text: image, audio, and multimodal embeddings

Embeddings are not limited to language. The same idea applies to images, audio, and video, and multimodal models embed different media into a shared space so that text and images can be compared directly. OpenAI’s CLIP, published in 2021 and trained on roughly 400 million image-caption pairs, places a photo of a dog and the words “a photo of a dog” near each other in one common vector space. That shared geometry is what lets you search a photo library with a text query, caption images automatically, or rank visual results by how well they match a prompt — all reduced to the same nearest-neighbour comparison used for text. Audio embeddings power similar tasks in music recommendation and voice search, while code embeddings let developers search a codebase by intent rather than exact symbol names.

Storing and searching embeddings at scale

Once a corpus is embedded, the practical challenge is finding nearest neighbours among millions or billions of vectors fast enough to be useful. Brute-force comparison is too slow at scale, so specialised infrastructure uses approximate nearest-neighbour algorithms that trade a little accuracy for enormous speed gains.

Algorithms such as HNSW (Hierarchical Navigable Small World graphs) and IVF (inverted file) indexes can return the closest matches in milliseconds across very large collections. This is the role of vector databases: systems like Pinecone, Weaviate, Qdrant, and Milvus, along with the pgvector extension for PostgreSQL, index embeddings and serve similarity queries with filtering and metadata. Choosing an embedding model and a similarity-search index together is a central engineering decision in building semantic-search and RAG applications.
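The idea behind an IVF index can be sketched in plain Python: group vectors around centroids ahead of time, then at query time scan only the groups whose centroids are nearest to the query. This is a toy under stated assumptions, not how any particular vector database implements it. The centroids below are fixed by hand, whereas real indexes typically learn them with k-means, and the names `build_ivf`, `ivf_search`, and `nprobe` are illustrative.

```python
import math

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(vec, centroids[i]))
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k=1, nprobe=1):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    candidates = [vec_id for i in probe for vec_id in lists[i]]
    candidates.sort(key=lambda vec_id: math.dist(query, vectors[vec_id]))
    return candidates[:k], len(candidates)

# Hypothetical 2-D data forming two loose groups.
vectors = {
    "a": [0.1, 0.2], "b": [0.2, 0.1], "c": [0.15, 0.3],
    "d": [0.9, 0.8], "e": [0.8, 0.95], "f": [0.99, 0.9],
}
centroids = [[0.15, 0.2], [0.9, 0.9]]  # hand-picked, not learned

lists = build_ivf(vectors, centroids)
top, scanned = ivf_search([0.85, 0.85], vectors, centroids, lists, k=2, nprobe=1)
print(top, scanned)
```

With `nprobe=1` the search compares the query against three vectors instead of all six. The accuracy trade-off comes from the same shortcut: a true neighbour that was assigned to an unprobed list is never examined, and raising `nprobe` recovers accuracy at the cost of more comparisons.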

Frequently asked questions

What exactly is an embedding in AI?
An embedding is a dense list of numbers — a vector — that represents the meaning of a word, sentence, image, or other item. A neural network learns to position similar things close together in this high-dimensional space, so machines can measure relatedness by computing the distance between vectors. Embeddings let software reason about meaning rather than exact text, which is why they underpin semantic search, recommendation systems, and retrieval-augmented generation.

How is similarity between two embeddings measured?
Most systems use cosine similarity, which is the cosine of the angle between two vectors and returns a score from 1 (very similar) to −1 (opposite), with 0 meaning unrelated. Because cosine ignores vector length and focuses on direction, it captures semantic similarity well. When vectors are normalised to unit length, cosine similarity and the dot product return the same score, and many production systems use whichever is faster for their hardware.

What was word2vec and why did it matter?
Word2vec, released by a Google team led by Tomas Mikolov in 2013, was an efficient neural method for learning word vectors from large text corpora. It mattered because it made high-quality word embeddings cheap to train and revealed that vector arithmetic could capture analogies — famously, “king − man + woman” lands near “queen.” Word2vec kicked off the modern embedding era and influenced nearly every later representation-learning method in natural language processing.