Glossary

What is Hallucination (LLM)

Hallucination in a large language model (LLM) refers to the generation of output that is factually incorrect, fabricated, or unsupported by the provided context or by real-world facts, yet stated with apparent confidence. The term was formalised in the NLP literature by Maynez et al. (ACL 2020) for abstractive summarisation and has since become the standard term for any LLM output that is unfaithful to its source or to verifiable facts. Hallucination rates vary significantly by task and setup; retrieval-augmented generation (RAG) pipelines reduce, but do not eliminate, hallucination by grounding responses in retrieved passages.

edu-lopez-parada · Published 27 May 2026 · Updated 27 May 2026

Full definition

In natural language processing, hallucination describes any model output in which the generated text contains claims that are factually wrong, internally inconsistent, or not grounded in the provided source material, despite being expressed with the fluency and confidence of accurate information. The phenomenon arises because language models are trained to produce statistically plausible sequences of tokens, an objective that rewards plausibility rather than factual accuracy.

Two primary categories are distinguished in the research literature:

Intrinsic hallucination: the model contradicts information that was explicitly present in the input context (a retrieved document, a system prompt, or user-provided data).
Extrinsic hallucination: the model introduces information that the context neither supports nor contradicts; it is generated from parametric memory, which may be outdated, thinly covered by the training data, or simply wrong.

A third practical category, temporal hallucination, occurs when a model states a fact that was accurate at training time but has since become false (e.g., naming as CEO someone who has since left the role).
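The three categories can be read as a decision rule applied to a single claim. The Python sketch below is a toy illustration, not an established labelling procedure: it assumes that someone (a human annotator or an entailment model) has already judged whether the context supports, contradicts, or is silent on the claim, and whether the claim was true at training time and is true now. The names `ContextVerdict` and `label_claim` are hypothetical, and checking for intrinsic hallucination before temporal hallucination is an assumed precedence.

```python
from enum import Enum


class ContextVerdict(Enum):
    """How the input context relates to a generated claim (judged upstream)."""
    SUPPORTED = "supported"        # the context entails the claim
    CONTRADICTED = "contradicted"  # the context states the opposite
    SILENT = "silent"              # the context neither supports nor contradicts it


def label_claim(verdict: ContextVerdict, true_at_training_time: bool, true_now: bool) -> str:
    """Toy mapping from one claim to the categories in this glossary entry.

    Assumed precedence: contradicting the context is intrinsic regardless of
    world truth; the truth flags are only consulted for claims the context is
    silent on, i.e. claims drawn from parametric memory.
    """
    if verdict is ContextVerdict.SUPPORTED:
        return "grounded"
    if verdict is ContextVerdict.CONTRADICTED:
        return "intrinsic"
    if true_at_training_time and not true_now:
        return "temporal"  # accurate at training time, false today
    return "extrinsic"     # not supported by the context, generated from memory


# Hypothetical example: the context says nothing about who the CEO is, and the
# model names a CEO who was in post at training time but has since left.
print(label_claim(ContextVerdict.SILENT, true_at_training_time=True, true_now=False))
```

This prints `temporal`. A silent-context claim that happens to be true is still labelled extrinsic here, matching the definition of extrinsic hallucination as information the context does not support; whether such a claim is also a factual error is a separate question.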

Why it matters in 2026

For businesses that depend on accurate brand representation in AI-generated answers (a category that now includes most companies with a digital presence), hallucination poses direct reputational and commercial risks. An AI assistant may cite incorrect prices, describe non-existent services, or attribute quotes to individuals who never made them.

The Huang et al. survey (arXiv 2311.05232, updated 2024) found hallucination rates in open-domain question answering ranging from 3% to over 30%, depending on model size, domain specificity, and whether retrieval augmentation was used. For niche industries such as construction and home services, where training data is sparse, rates tend toward the higher end.

Mitigation is now a shared responsibility: AI providers reduce hallucination through reinforcement learning from human feedback (RLHF), retrieval grounding, and citation enforcement; content publishers reduce it by making accurate, structured content available to RAG pipelines.
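Providers do not generally publish how citation enforcement is implemented, so the following Python sketch is only a hypothetical illustration of the idea: a post-generation check that flags any sentence of an answer that fails to cite at least one retrieved passage by a valid `[n]` index. The `[n]` citation format, the naive sentence splitter, and the function name `uncited_sentences` are all assumptions.

```python
import re

# Assumed citation format: "[n]" with 1-based passage indices.
CITATION = re.compile(r"\[(\d+)\]")


def uncited_sentences(answer: str, num_passages: int) -> list[str]:
    """Return sentences with no citation, or citing a passage that does not exist."""
    # Naive splitter: breaks after ".", "!" or "?" followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    flagged = []
    for sentence in sentences:
        ids = [int(n) for n in CITATION.findall(sentence)]
        if not ids or any(i < 1 or i > num_passages for i in ids):
            flagged.append(sentence)
    return flagged


# Hypothetical answer generated from two retrieved passages.
answer = (
    "Acme Roofing repairs flat roofs [1]. "
    "Inspections cost €90 [3]. "
    "We also install solar panels."
)
print(uncited_sentences(answer, num_passages=2))
```

Here the second sentence cites a passage that was never retrieved and the third cites nothing, so both are flagged; a system built this way could reject or regenerate the answer. A valid citation is still no guarantee of faithfulness, since the cited passage may not actually support the sentence.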

How it works

Hallucination originates in the core mechanism of autoregressive language models. At each step, the model predicts the next token from a conditional probability distribution over its vocabulary, learned from its training corpus. When a query falls outside high-density regions of that corpus (obscure entities, recent events, highly specific numerical data), the model still generates tokens that are contextually plausible but factually disconnected from ground truth.
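The mechanism can be illustrated with a deliberately tiny, hypothetical model in Python. The two next-token distributions below are invented for illustration (a real LLM computes them with a neural network over a vocabulary of many thousands of tokens), and Acme Roofing is a made-up business. Greedy decoding picks the most probable continuation in both cases, but only in the first is that choice backed by a sharply peaked distribution; in the second, the model emits a specific name even though its distribution is nearly flat, which is the low-density situation described above.

```python
import math

# Invented next-token distributions; a real LLM derives these from its
# parameters for whatever prompt it receives, not from a lookup table.
NEXT_TOKEN_PROBS = {
    "The capital of France is": {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01},
    "The founder of Acme Roofing is": {"John": 0.34, "Maria": 0.33, "David": 0.33},
}


def greedy_next_token(prompt: str) -> str:
    """Return the most probable next token: the choice rewards plausibility, not truth."""
    dist = NEXT_TOKEN_PROBS[prompt]
    return max(dist, key=dist.get)


def entropy_bits(prompt: str) -> float:
    """Shannon entropy of the next-token distribution in bits (higher = less certain)."""
    return -sum(p * math.log2(p) for p in NEXT_TOKEN_PROBS[prompt].values() if p > 0)


for prompt in NEXT_TOKEN_PROBS:
    print(f"{prompt} {greedy_next_token(prompt)}  (entropy {entropy_bits(prompt):.2f} bits)")
```

Run as-is, it completes the first prompt with "Paris" at about 0.22 bits of entropy and the second with "John" at about 1.58 bits, close to the log2(3) ≈ 1.585 maximum for three options. High entropy is only a partial warning sign: a model can also be confidently wrong, for example when outdated facts dominate its training data, so low entropy does not guarantee accuracy.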

RAG pipelines reduce hallucination by prepending retrieved passages to the model's context window, so that generation is conditioned on explicit evidence rather than on parametric memory alone. However, hallucination persists when:

The retrieved passage itself contains errors.
The model misreads or misweights the passage during generation.
The retrieved content does not cover the specific claim the model needs to make, so it supplements from parametric memory.
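The retrieval step and the third failure mode can be sketched in Python. `build_prompt` shows one common way of prepending numbered passages to a question; its instruction wording is an assumption, not any particular provider's template. `unsupported_numbers` is a hypothetical, deliberately crude lexical check that flags numbers (prices, dates, quantities) in an answer that appear in none of the retrieved passages, a typical symptom of the model filling a coverage gap from parametric memory.

```python
import re

# Digit runs, optionally with "." or "," separators (e.g. 90, 1,250, 3.5).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def build_prompt(question: str, passages: list[str]) -> str:
    """Prepend numbered retrieved passages to the user's question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the passages below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def unsupported_numbers(answer: str, passages: list[str]) -> set[str]:
    """Return numbers in the answer that occur in none of the passages."""
    grounded = set(NUMBER.findall(" ".join(passages)))
    return set(NUMBER.findall(answer)) - grounded


# Hypothetical retrieved passages and model answer.
passages = [
    "Acme Roofing offers roof inspections from €90.",
    "Acme Roofing operates in Valencia.",
]
print(build_prompt("How much do Acme's services cost?", passages))
answer = "Inspections start at €90 and repairs start at €250."
print(unsupported_numbers(answer, passages))  # {'250'}
```

The repair price of €250 is flagged because no passage covers repairs. The check catches only this one symptom: it cannot detect an erroneous passage (the first case), and a misread passage (the second case) slips through whenever the wrong number happens to appear somewhere in the context. More robust checks use entailment (NLI) models or LLM-based graders; this sketch only makes the failure concrete.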

Difference from confabulation and error

Term	Domain	Core meaning	Intentionality
Hallucination	NLP / AI	Fluent output that departs from verifiable fact	None — statistical artefact
Confabulation	Neuroscience / psychology	False memory presented as true without intent to deceive	None — cognitive artefact
Error	General	Any incorrect output, including calculation mistakes	None — execution failure
Fabrication	General	Deliberately invented information	Intentional

LLM hallucination is closest to confabulation: neither the model nor its developers intend the output to be false.
