Key takeaways
- A hallucination is a confident-sounding model output that is factually wrong, unsupported, or fabricated.
- Hallucinations are not bugs in the conventional sense — they are a predictable consequence of how LLMs generate text by sampling from learned statistics.
- Two broad types: intrinsic (output contradicts the input or prior turn) and extrinsic (output invents facts the user cannot verify).
- The most effective mitigation is grounding — retrieval-augmented generation, tool use, and explicit citation.
- No current technique eliminates hallucinations entirely. Any production LLM workflow needs verification for high-stakes outputs.
Why LLMs hallucinate
A language model is trained to predict the most likely next token given a context. It has no fact-checker, no awareness of what it knows versus what it is guessing, and no built-in uncertainty signal. When asked about something it has seen many times in training, it produces accurate text. When asked about something rare or absent from training, it produces something statistically plausible — which often means confident-sounding prose built from the right shapes of words but the wrong facts.
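The mechanism can be caricatured in a few lines. This is a toy sketch, not a real LLM: the token table and the fictional "Freedonia" continuation are invented for illustration. The point is that sampling is plausibility-weighted, not truth-weighted — the model has no way to know one continuation is false.

```python
import random

# Hypothetical learned next-token distribution (invented for the demo).
# A real model has billions of parameters, but the failure mode is the
# same: it samples what is statistically plausible, with no truth check.
NEXT_TOKEN_PROBS = {
    ("The", "capital", "of"): {"France": 0.6, "Freedonia": 0.4},
}

def sample_next(context, rng=random.Random(0)):
    dist = NEXT_TOKEN_PROBS[context]
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

# "Freedonia" is fictional, but the sampler will emit it whenever the
# dice land that way -- and with equal confidence in its phrasing.
print(sample_next(("The", "capital", "of")))
```

Nothing in this loop distinguishes the true continuation from the fabricated one; that distinction simply is not represented.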

This is fundamentally different from how humans get things wrong. A human who does not know a fact usually says so, or signals uncertainty through hedging. An LLM has learned that confident, specific responses are rewarded in training data and through RLHF, so it defaults to confidence even when fabricating. For the underlying models, see our large language models primer.
Types of hallucinations
Intrinsic hallucinations
The output contradicts the input. You paste an article into the context and ask for a summary, and the summary misstates a number from the article. Or a chatbot contradicts what it said two turns ago. These are failures of fidelity to the context window.
Extrinsic hallucinations
The output invents facts that cannot be verified from the input. Classic examples: fabricated academic paper citations, non-existent URLs, invented legal cases, made-up biographical details, wrong statistics. The model is filling an output slot with something plausible-looking.
Reasoning hallucinations
The model walks through a chain of reasoning that looks correct but contains errors. Each individual step seems valid, but the conclusion is wrong because one of the steps depended on a fabricated premise.
Why they are hard to fix
Modern training objectives reward fluency and confidence. A model that constantly said “I don’t know” would be penalized during RLHF because users rated those responses lower. So models learn to produce something — anything — that resembles a good answer. The economic incentive and the technical incentive both point away from expressed uncertainty.
Even if we wanted to train models to say “I don’t know”, they have no reliable internal signal for when they should. Recent interpretability work has started extracting uncertainty estimates from model internals, but this is still research, not production.
Mitigations that actually help
Retrieval-augmented generation
The single biggest improvement for factual tasks is grounding the model in retrieved documents. If the model has the right facts in its context window, it is far less likely to invent alternatives. See our RAG explainer for the full pipeline. RAG does not eliminate hallucinations — the model can still ignore or misrepresent retrieved context — but it dramatically reduces them on factual questions.
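A minimal sketch of the grounding step, with deliberately naive pieces: `retrieve` uses keyword overlap where a real system would use embeddings and a vector store, and the corpus is a toy in-memory list. The shape of the prompt — sources first, answer-only-from-sources instruction, an explicit out — is the part that transfers.

```python
def retrieve(query, corpus, k=2):
    # Naive keyword-overlap scoring; real systems use embedding search.
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def grounded_prompt(query, corpus):
    docs = retrieve(query, corpus)
    context = "\n".join(f"[{i+1}] {d}" for i, d in enumerate(docs))
    return (
        "Answer using ONLY the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

corpus = [
    "The Eiffel Tower is 330 metres tall.",
    "Paris is the capital of France.",
]
print(grounded_prompt("How tall is the Eiffel Tower?", corpus))
```

With the right fact sitting in the context window, the model's cheapest continuation is to quote it rather than invent one.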
Tool use and search
Letting the model call a search engine, a calculator, or a domain-specific API shifts the factual workload from the model’s memory to the tool. The model then reports what the tool returned. Modern agents rely heavily on this pattern.
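One turn of that pattern, sketched with a stub tool registry (the `calculator` entry and the tool-call dict format are placeholders, not any specific vendor's API). The model proposes a structured call; the host executes it and feeds the observation back, so the factual workload is done by the tool.

```python
TOOLS = {
    # eval on an arithmetic string is for the demo only; a production
    # tool would use a proper expression parser or sandbox.
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}),
}

def run_agent_step(tool_call):
    """Execute one model-requested tool call and return the observation."""
    name, args = tool_call["name"], tool_call["args"]
    result = TOOLS[name](*args)
    # The observation goes back into the context; the model then
    # reports the tool's answer instead of recalling from memory.
    return {"tool": name, "result": result}

obs = run_agent_step({"name": "calculator", "args": ["17 * 243"]})
print(obs["result"])  # 4131 -- computed, not remembered
```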
Structured output and constraints
When you constrain the model’s output to a schema (JSON, SQL, a specific function signature), hallucinations have fewer places to hide. The model is forced to fill specific fields, and schema validation catches obvious failures.
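A stdlib-only sketch of that validation layer (a real system might use `jsonschema` or pydantic; the field names here are invented for the demo). Malformed JSON, missing fields, and wrong types all fail loudly instead of slipping through as fluent prose.

```python
import json

# Expected shape of the model's structured output (toy schema).
SCHEMA = {"title": str, "year": int, "authors": list}

def validate(raw):
    data = json.loads(raw)  # malformed JSON raises immediately
    errors = []
    for field, typ in SCHEMA.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], typ):
            errors.append(f"{field}: expected {typ.__name__}")
    return errors

good = '{"title": "Attention Is All You Need", "year": 2017, "authors": ["Vaswani"]}'
bad = '{"title": "Made Up Paper", "year": "twenty-seventeen", "authors": []}'
print(validate(good))  # []
print(validate(bad))   # ['year: expected int']
```

Validation cannot tell you the title is fabricated, but it shrinks the space where fabrication can hide to the individual field values.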
Self-verification
Generate an answer, then prompt the model to critique its own output, check for internal consistency, or flag claims that might be wrong. Multiple sampling followed by majority vote or a separate verification model can catch a surprising fraction of hallucinations.
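The multiple-sampling half of this is easy to sketch. Here `samples` stands in for the answers from repeated model calls on the same question; the agreement score doubles as a cheap uncertainty signal, since hallucinated answers tend to vary across samples while remembered facts repeat.

```python
from collections import Counter

def majority_vote(samples):
    """Return the most common answer and the fraction that agreed."""
    answer, count = Counter(samples).most_common(1)[0]
    agreement = count / len(samples)
    # Low agreement is itself a useful hallucination flag.
    return answer, agreement

# Stand-in for five independent samples of the same factual question.
samples = ["1912", "1912", "1915", "1912", "1912"]
ans, agree = majority_vote(samples)
print(ans, agree)  # 1912 0.8
```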
Citation requirements
Ask the model to cite its sources inline. Then verify the citations (many will be fake or wrong). This is the approach used by search-oriented AI products (Perplexity, You.com, and assistants with web search enabled).
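The verification step can be as simple as looking each cited identifier up in an authoritative index. This sketch uses a toy in-memory set; a real pipeline would query a DOI or arXiv resolver. The fake ID below is invented for the demo — existence lookup, not formatting, is what catches it.

```python
# Toy authoritative index; in production, query a DOI/arXiv resolver.
KNOWN_PAPERS = {"1706.03762", "2005.14165"}

def verify_citations(cited_ids):
    """Map each model-cited ID to whether it exists in the index."""
    return {cid: cid in KNOWN_PAPERS for cid in cited_ids}

# "2304.99999" is format-plausible but invented for this demo.
report = verify_citations(["1706.03762", "2304.99999"])
print(report)
```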
Lower temperature on factual tasks
Sampling at low temperature produces more deterministic, conservative outputs. For creative writing, high temperature adds diversity. For factual QA, low temperature reduces hallucinations modestly.
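Temperature is just a divisor on the logits before the softmax. In this sketch (the three logit values are invented), low temperature concentrates probability on the top token, while high temperature flattens the distribution and gives rare — often wrong — tokens more chances.

```python
import math

def softmax(logits, temperature):
    """Convert logits to probabilities at a given sampling temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # hypothetical scores for three candidate tokens
low = softmax(logits, 0.5)
high = softmax(logits, 2.0)
# The top token dominates at low temperature, less so at high.
print(round(low[0], 3), round(high[0], 3))
```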
Explicit “I don’t know” permission
Adding "If you do not know the answer, say 'I don't know' rather than guessing" to the prompt helps somewhat, but only moderately. Models trained to be helpful resist admitting ignorance even when instructed.
What doesn’t help much
- Telling the model to be careful. Vague instructions like “be accurate” have small effects.
- Larger models alone. Scale reduces the rate of hallucinations modestly but does not eliminate them. Frontier models in 2025 still hallucinate confidently on niche facts.
- Sycophantic agreement. When users correct the model, it often “confesses” it was wrong — sometimes falsely, because the user was wrong and the model was right.
- Prompt tricks alone. There is no magic incantation that stops hallucinations. Structural fixes (grounding, tools, verification) beat clever prompting.
Domain-specific hallucination risks
Legal
Fabricated case citations have caused real embarrassment. Lawyers in multiple jurisdictions have been sanctioned for filing briefs that cited non-existent cases generated by ChatGPT. Legal workflows need mandatory citation verification.
Medical
Invented drug interactions, dosages, or study references can be dangerous. Medical AI systems typically use heavy RAG over curated sources like UpToDate, PubMed, and FDA labels.
Code
LLMs routinely hallucinate functions or packages that do not exist, or call real functions with invented parameters. IDE integrations that execute or type-check code in the loop catch many of these.
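One cheap guard that works outside an IDE: before executing model-generated code, check that every imported module actually resolves. This sketch uses the stdlib `importlib.util.find_spec`; the fake module name is deliberately invented. Type-checking and running the tests catch the hallucinated parameters that this misses.

```python
import importlib.util

def missing_modules(module_names):
    """Return the top-level module names that cannot be resolved."""
    return [m for m in module_names if importlib.util.find_spec(m) is None]

# "totally_real_llm_utils" is a deliberately fake name for the demo --
# exactly the kind of plausible-sounding package models invent.
print(missing_modules(["json", "totally_real_llm_utils"]))
```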
Financial and numerical
Arithmetic is a weak spot. A model can confidently state a wrong multiplication or percentage. Using code execution or a calculator for quantitative work is far more reliable than having the model compute in its head.
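Routing the arithmetic to real computation can be done safely with a tiny expression evaluator over the `ast` module — a sketch, not a production calculator (it supports only the four binary operators listed). The model proposes the expression; the interpreter computes it exactly.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else raises.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr):
    """Safely evaluate a model-proposed arithmetic expression."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

print(calc("1234 * 5678"))  # 7006652, exact -- no "confident wrong product"
```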
The safety angle
From a safety perspective, hallucinations are a reliability problem, not an alignment problem in the narrow sense. A model might hallucinate perfectly innocuous content (wrong trivia) or dangerous content (wrong medical advice) depending on context. The distinction matters for deployment — high-stakes domains require verification layers even when the base model is well-aligned. For more, see our AI safety coverage.
Frequently asked questions
Why does the model make up URLs and citations so confidently?
Citations and URLs are highly structured — a valid URL looks like arxiv.org/abs/XXXX.XXXXX, a valid paper citation has a specific format. Models learn these structures during training. When asked for a citation the model does not actually remember, it generates something that matches the format — arxiv.org/abs/ followed by plausible-looking digits. The output looks legitimate because the template is correct, but the content is invented. This is why verification (clicking the URL, checking that the paper actually exists) is essential.
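This is also why a format check alone is not verification. The regex below matches post-2007 arXiv identifiers: it rejects malformed IDs, but a fabricated ID with the right shape sails through, so you still need an existence lookup.

```python
import re

# Post-2007 arXiv identifier shape: YYMM.NNNN or YYMM.NNNNN,
# optionally versioned (e.g. v2).
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")

print(bool(ARXIV_ID.match("1706.03762")))  # True: format-valid
print(bool(ARXIV_ID.match("9999.123")))    # False: malformed
# Caveat: an invented ID like "2304.99999" is also format-valid.
# Format checks filter garbage; only fetching the record confirms it.
```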
Are newer models hallucinating less?
Rates are trending down, particularly when RAG and tool use are in the loop. Frontier models like GPT-5, Claude Opus 4.7, and Gemini 3 hallucinate measurably less than GPT-3.5 did — but the rate is still meaningful on factual tasks, and the tail of confident-wrong answers remains. Research benchmarks like TruthfulQA and FreshQA track progress, but real-world hallucination rates depend heavily on the use case and how well the system is grounded.
Should I trust any LLM output for facts?
Trust with verification. LLMs are reliable for low-stakes or easily-verifiable claims, and useful as first drafts that a human checks. For high-stakes domains — medical, legal, financial, safety-critical — every factual claim needs verification against an authoritative source. Production systems handle this with retrieval, citation requirements, and human review layers.