What is llms.txt
llms.txt is a plain-text file placed at a website's root that gives large language models a structured, markdown-formatted index of the site's most important content. Proposed by Jeremy Howard in September 2024 as an emerging standard, it had been adopted by over 844,000 sites by November 2025 according to BuiltWith. However, a ten-week Search Engine Land experiment found no detectable crawling of the file by any of the four major AI engines.
Full definition
llms.txt is a markdown-formatted file placed at the root of a domain (e.g., https://example.com/llms.txt). It provides large language models with a curated index of the site's canonical content, organized into named sections with links to the corresponding markdown versions of each page.
The specification, proposed by Jeremy Howard in September 2024 on llmstxt.org, defines one required element: an H1 heading containing the project or site name. Everything else is optional but encouraged: a short summary of the site (written as a blockquote), free-form descriptive paragraphs, and H2-delimited thematic sections listing resource links with optional descriptions.
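To make the structure concrete, the following is a minimal sketch of how a consumer might read such a file. The function name `parse_llms_txt` and the returned dictionary shape are illustrative assumptions, not part of the specification, and the parser only handles the simple layout described above (one H1, an optional summary, H2 sections of markdown links).

```python
import re

# Matches markdown links such as "[Title](https://example.com/page.md)".
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def parse_llms_txt(text):
    """Split llms.txt content into a title, a summary, and linked sections.

    Hypothetical helper for illustration; not defined by llmstxt.org.
    """
    title = None
    summary_lines = []
    sections = {}
    current = None  # name of the H2 section currently being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            # Inside a section: collect (link text, URL) pairs.
            sections[current].extend(LINK_RE.findall(line))
        elif title is not None:
            # Text between the H1 and the first H2 is treated as the summary.
            summary_lines.append(line.lstrip("> ").strip())
    return {
        "title": title,
        "summary": " ".join(summary_lines),
        "sections": sections,
    }
```

A file with no H1 yields `title` set to `None`, which a caller could treat as a sign that the file does not follow the expected layout.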
Unlike robots.txt, llms.txt does not govern crawler access. Its purpose is editorial: it signals which content the site owner considers most authoritative for AI training or citation, in a format models can consume without rendering JavaScript.
Why it matters in 2026
The adoption story is significant, but the impact data is more nuanced. BuiltWith reported more than 844,000 sites serving llms.txt by November 2025, making it one of the fastest-adopted web standards in recent years. Yet a controlled ten-week experiment published by Search Engine Land, monitoring 50 sites, detected no HTTP requests to the llms.txt file from ChatGPT, Perplexity, Claude, or Gemini during the observation window.
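Site owners can run a similar check against their own server logs. The sketch below is not the method used in the experiment; it assumes access-log lines in the common combined format and matches on a hypothetical, non-exhaustive list of user-agent substrings (`AI_AGENT_TOKENS`) that should be adjusted to the crawlers of interest.

```python
# Illustrative user-agent substrings; an assumption, not an authoritative list.
AI_AGENT_TOKENS = ("GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot")


def count_llms_txt_hits(log_lines, tokens=AI_AGENT_TOKENS):
    """Count GET requests for /llms.txt per user-agent token.

    Assumes each line contains the quoted request, e.g.
    "GET /llms.txt HTTP/1.1", followed later by the user-agent string.
    """
    counts = {token: 0 for token in tokens}
    for line in log_lines:
        # Skip anything that is not a direct request for the file.
        if '"GET /llms.txt ' not in line:
            continue
        for token in tokens:
            if token in line:
                counts[token] += 1
    return counts
```

A result of all zeros over a long enough window would match what the experiment reported, though it says nothing about crawlers that do not identify themselves.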
A large-scale Ahrefs study by Linehan and Guan (75,000 brands, December 2025) found that llms.txt presence correlated with AI visibility at just 0.127 — the weakest individual signal among the top eight factors studied. For reference, having an active YouTube channel correlated at 0.737.
The practical implication: implementing llms.txt carries negligible cost and may provide future benefit as AI crawlers evolve, but it should not substitute for content quality, structured data, or citation-readiness.
How it works
- Create a file named `llms.txt` at the root of the domain.
- Open with an H1 containing the site name, followed by a brief description paragraph.
- Add thematic sections (H2 headings) grouping related resources.
- List resources as markdown links, each pointing to a publicly accessible markdown version of the page.
- Optionally serve an extended `llms-full.txt` that includes the full content inline rather than links.
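The steps above can be sketched as a small generator. This is a minimal illustration, not an official tool: the function name `build_llms_txt`, its parameters, and the example URLs are assumptions, and the description is rendered as a blockquote, which is the form used on llmstxt.org.

```python
def build_llms_txt(site_name, description, sections):
    """Render llms.txt content from a site name, description, and sections.

    `sections` maps an H2 heading to a list of (title, url) pairs.
    Hypothetical helper for illustration only.
    """
    lines = [f"# {site_name}", "", f"> {description}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url in links:
            lines.append(f"- [{title}]({url})")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    content = build_llms_txt(
        "Example Site",
        "Docs and guides for Example.",
        {"Docs": [("Getting started", "https://example.com/start.md")]},
    )
    # Write the result where the web server exposes the domain root.
    with open("llms.txt", "w", encoding="utf-8") as handle:
        handle.write(content)
```

Generating the file from the same source that drives the site navigation keeps the index from drifting out of date as pages are added or removed.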
Each linked URL should resolve to a clean markdown response — not HTML — so that LLMs can ingest the content without a rendering layer.
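One way to verify this is to inspect the `Content-Type` header each linked URL returns. The sketch below uses only the Python standard library; the helper names are hypothetical, and accepting `text/plain` alongside `text/markdown` is an assumption, since many servers deliver `.md` files as plain text.

```python
from urllib.request import Request, urlopen


def is_markdown_content_type(content_type):
    """Return True if a Content-Type value indicates markdown or plain text."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in {"text/markdown", "text/plain"}


def serves_markdown(url, timeout=10):
    """Send a HEAD request to `url` and check the declared content type.

    Raises urllib.error.URLError (or HTTPError) if the request fails.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return is_markdown_content_type(response.headers.get("Content-Type", ""))
```

Some servers reject `HEAD` requests, in which case a regular `GET` would be needed instead. A header check also cannot confirm that the body is clean markdown, only that the server does not declare it as HTML.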
Difference from robots.txt and sitemap.xml
| File | Primary function | Who reads it | Status in 2026 |
|---|---|---|---|
| robots.txt | Controls crawler access to URLs | All web crawlers | Mature standard, universally respected |
| sitemap.xml | URL index for search-engine indexing | Search-engine crawlers | Mature standard, broadly adopted |
| llms.txt | Curated content index for LLMs | LLMs (in theory) | Emerging standard, active crawling unconfirmed |
Related terms
AEO (Answer Engine Optimization), GEO (Generative Engine Optimization), AI Overviews.