Made For Builders

DefinedTerm · Glossary

What is llms.txt?

llms.txt is a markdown-formatted text file placed at a website's root that gives large language models a structured index of the site's most important content. Jeremy Howard proposed it as a standard in September 2024, and BuiltWith counted more than 844,000 sites serving the file by November 2025. However, a ten-week Search Engine Land experiment detected no requests for the file from any of the four major AI engines it monitored (ChatGPT, Perplexity, Claude, and Gemini).

By edu-lopez-parada

Full definition

llms.txt is a markdown-formatted file placed at the root of a domain (e.g., https://example.com/llms.txt). It provides large language models with a curated index of the site's canonical content, organized into named sections with links to the corresponding markdown versions of each page.

The specification, proposed by Jeremy Howard in September 2024 on llmstxt.org, makes only one element mandatory: an H1 heading containing the name of the site or project. A short summary (written as a markdown blockquote), thematic H2 sections, resource links, and link descriptions are all optional but encouraged.
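To make the structure concrete, here is a minimal parsing sketch. It assumes the summary is written as a blockquote line starting with `>` and that resources are markdown list items under H2 headings; the function name `parse_llms_txt` and the sample content are illustrative and not part of the specification.

```python
import re

# A markdown list item holding a link, with an optional ": description" tail.
LINK_ITEM = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?:\s*:\s*(?P<note>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Split llms.txt text into a title, a summary, and H2 sections of links."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif line.startswith(">") and result["summary"] is None and current is None:
            result["summary"] = line.lstrip("> ").strip()
        elif current is not None:
            match = LINK_ITEM.match(line)
            if match:
                result["sections"][current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return result


# Hypothetical file content used only for illustration.
SAMPLE = """# Example Site

> Short description of what the site covers.

## Docs

- [Getting started](https://example.com/start.md): Installation and first steps
- [API reference](https://example.com/api.md)
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed["title"])             # Example Site
print(len(parsed["sections"]["Docs"]))  # 2
```

A missing title (`parsed["title"] is None`) is the simplest signal that a file does not follow the expected layout.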

Unlike robots.txt, llms.txt does not govern crawler access. Its purpose is editorial: it signals which content the site owner considers most authoritative for AI training or citation, in a format models can consume without rendering JavaScript.
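The distinction can be illustrated with Python's standard `urllib.robotparser`. In this sketch both files are hypothetical, as is the `ExampleBot` user agent: a URL can be recommended in llms.txt and still be off-limits to a compliant crawler, because only robots.txt carries access rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: this is what a compliant crawler consults for access.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

# Hypothetical llms.txt: it recommends content but contains no access rules.
LLMS_TXT = """# Example Site

> Short description of the site.

## Docs

- [Private notes](https://example.com/private/notes.md)
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/private/notes.md"
listed_in_llms_txt = url in LLMS_TXT                    # editorial signal only
allowed_by_robots = rules.can_fetch("ExampleBot", url)  # access decision

print(listed_in_llms_txt)  # True
print(allowed_by_robots)   # False
```

Listing a page in llms.txt therefore neither grants nor restricts access; a site that wants a page kept away from crawlers still has to say so in robots.txt.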

Why it matters in 2026

Adoption has been rapid, but evidence of impact is weak. BuiltWith reported more than 844,000 sites serving llms.txt by November 2025, a large footprint for a proposal little more than a year old. Yet a controlled ten-week experiment published by Search Engine Land, monitoring 50 sites, detected no HTTP requests to the llms.txt file from ChatGPT, Perplexity, Claude, or Gemini during the observation window.
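Site owners can run a similar check on their own traffic. The sketch below is not the methodology of the experiment; it is a minimal illustration that assumes combined-format access logs and a hand-picked list of user-agent substrings (`GPTBot`, `ClaudeBot`, `PerplexityBot`), which should be verified against each vendor's current documentation. The log lines are invented.

```python
import re
from collections import Counter

# Illustrative user-agent substrings; this list is an assumption, adjust as needed.
AI_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Request line, status, size, referrer, and user agent of a combined-format log entry.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(lines):
    """Count requests for /llms.txt per matching AI user-agent token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match["path"].split("?")[0] != "/llms.txt":
            continue
        for token in AI_AGENT_TOKENS:
            if token in match["agent"]:
                hits[token] += 1
    return hits


# Invented log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Nov/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Nov/2025:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Nov/2025:10:06:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = count_llms_txt_hits(SAMPLE_LOG)
print(dict(hits))  # {'GPTBot': 1}
```

An empty result over a long window is consistent with the experiment's finding, although user-agent strings can be spoofed or changed, so the count is only indicative.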

A large-scale Ahrefs study by Linehan and Guan (75,000 brands, December 2025) found that llms.txt presence correlated with AI visibility at just 0.127 — the weakest individual signal among the top eight factors studied. For reference, having an active YouTube channel correlated at 0.737.

The practical implication: implementing llms.txt carries negligible cost and may provide future benefit as AI crawlers evolve, but it should not substitute for content quality, structured data, or citation-readiness.

How it works

  1. Create a file named llms.txt at the root of the domain.
  2. Open with an H1 containing the site name, followed by a brief description paragraph.
  3. Add thematic sections (H2 headings) grouping related resources.
  4. List resources as markdown links, each pointing to a publicly accessible markdown version of the page.
  5. Optionally serve an extended llms-full.txt that includes the full content inline rather than links.
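The steps above can be sketched as a small generator. This is a minimal illustration, not a reference implementation: the function name `build_llms_txt`, the site data, and the choice to write the description as a blockquote are assumptions made for the example.

```python
def build_llms_txt(site_name, description, sections):
    """Render llms.txt text from a site name, a description, and sections.

    `sections` maps an H2 title to a list of (link title, URL, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {description}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data used only for illustration.
content = build_llms_txt(
    "Example Site",
    "Guides and reference material for the Example product.",
    {
        "Docs": [
            ("Getting started", "https://example.com/start.md", "Installation and first steps"),
            ("API reference", "https://example.com/api.md", ""),
        ],
    },
)
print(content)
```

The resulting string would then be saved as `llms.txt` in the directory the web server exposes as the domain root, so that it is reachable at `/llms.txt`.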

Each linked URL should resolve to a clean markdown response — not HTML — so that LLMs can ingest the content without a rendering layer.
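One way to spot-check this is a heuristic on the response's `Content-Type` header and body. The sketch below is an assumption-laden illustration: the accepted media types (`text/markdown`, `text/plain`) and the helper name `looks_like_markdown` are choices made for the example, not requirements stated by the specification.

```python
import re

# Media types treated as acceptable here (an assumption, not taken from the spec).
MARKDOWN_TYPES = ("text/markdown", "text/plain")


def looks_like_markdown(content_type: str, body: str) -> bool:
    """Heuristic: accept markdown or plain-text types whose body is not an HTML page."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in MARKDOWN_TYPES:
        return False
    # Reject bodies that open with an HTML doctype or an <html> tag.
    return re.match(r"\s*<(!doctype|html)\b", body, re.IGNORECASE) is None


print(looks_like_markdown("text/markdown; charset=utf-8", "# Title\n"))  # True
print(looks_like_markdown("text/html; charset=utf-8", "<html></html>"))  # False
print(looks_like_markdown("text/plain", "<!DOCTYPE html><html></html>"))  # False
```

In practice the two arguments could come from `urllib.request.urlopen(url)`, reading `response.headers.get("Content-Type", "")` and the decoded body for each link listed in the file.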

Difference from robots.txt and sitemap.xml

| File | Primary function | Who reads it | Status in 2026 |
|---|---|---|---|
| robots.txt | Controls crawler access to URLs | All web crawlers | Mature standard, widely honored but voluntary |
| sitemap.xml | URL index for search-engine indexing | Search-engine crawlers | Mature standard, broadly adopted |
| llms.txt | Curated content index for LLMs | LLMs (in theory) | Emerging standard, active crawling unconfirmed |

Related terms

AEO (Answer Engine Optimization), GEO (Generative Engine Optimization), AI Overviews.
