How AI Chooses Which Sources to Cite

Generative engines like ChatGPT, Perplexity, and Google AI Overviews retrieve a handful of web pages, then quote and cite the ones that best support a clear, well-structured answer. The peer-reviewed GEO paper (KDD 2024) found that adding citations, quotations, and statistics to a page raised its visibility in generative responses by up to roughly 40%. For contractors, getting cited means writing specific, self-contained, source-backed answers; using clear headings and FAQs; keeping content fresh; and earning authority through reviews, structured data, and a clean site.
A homeowner used to type "best plumber near me" into Google and scan a list of links. Increasingly, they ask ChatGPT, Perplexity, or Google's AI Overview a full question — "who should I call for a slab leak in Round Rock, and what does it usually cost?" — and read a synthesized answer that names a few businesses and cites a few web pages. The contractor who gets named and cited in that answer wins attention the others never see.
So the question for any local business is no longer only "how do I rank?" It is "how do I become one of the handful of sources a generative engine decides to quote?" That is a different game with its own emerging science. The foundational research, GEO: Generative Engine Optimization (Aggarwal et al., presented at KDD 2024), tested what actually moves the needle and found that adding citations, quotations, and statistics to a page raised its visibility in generative responses by up to roughly 40%.
This article explains how generative engines select and cite sources, what the research says makes a page citable, and the practical tactics a contractor can apply. It builds on two companion guides, "what is GEO for contractors" and "how to show up in ChatGPT and Perplexity", both part of the visibility pillar.
How Generative Engines Find and Cite Sources
Most consumer-facing generative engines that show citations work through retrieval-augmented generation. When a query needs current or factual information, the system does not invent an answer from memory alone. It runs a search, retrieves a small set of candidate web pages, and composes an answer grounded in those pages — citing the ones it actually drew from.
That two-stage process is the key to everything that follows.
| Stage | What happens | What it rewards |
|---|---|---|
| Retrieval | The engine searches and pulls a small candidate set of pages | Crawlability, relevance, authority — classic SEO |
| Selection | The model composes an answer and cites the pages best supporting it | Structure, specificity, source-backed claims |
To be cited, you must first survive retrieval (be findable and relevant), then win selection (be the clearest, most quotable support for the answer). Traditional SEO gets you into the room; GEO decides whether you get quoted.
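
The two stages can be sketched as a toy pipeline. This is a minimal illustration, not how any real engine is implemented: the `Page` fields, the scoring formulas, the example URLs, and the sample page text are all made up for the sketch, and real systems use far richer retrieval and language-model signals.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str          # hypothetical example URL
    text: str         # made-up page copy for illustration
    authority: float  # made-up 0..1 stand-in for reviews, links, site quality


def words(s: str) -> set[str]:
    """Lowercased word set, used for a crude relevance overlap."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(query: str, pages: list[Page], k: int = 3) -> list[Page]:
    """Stage 1 (retrieval): keep the k most relevant pages, weighted by authority."""
    q = words(query)

    def score(p: Page) -> float:
        return len(q & words(p.text)) * (0.5 + p.authority)

    relevant = [p for p in pages if score(p) > 0]
    return sorted(relevant, key=score, reverse=True)[:k]


def quotability(p: Page) -> int:
    """Stage 2 signal: count numbers, quotations, and attributions in the text."""
    numbers = len(re.findall(r"\d+", p.text))
    quotes = p.text.count('"') // 2
    attributions = len(re.findall(r"according to|source:", p.text.lower()))
    return numbers + quotes + attributions


def select(candidates: list[Page], max_citations: int = 2) -> list[Page]:
    """Stage 2 (selection): cite the candidates with the most quotable support."""
    ranked = sorted(candidates, key=quotability, reverse=True)
    return [p for p in ranked[:max_citations] if quotability(p) > 0]


pages = [
    Page("https://example.com/vague",
         "We are the best plumber in town for any slab leak.", 0.9),
    Page("https://example.com/specific",
         "Slab leak repair: according to our 2024 job records, "
         "detection takes 2 hours on average.", 0.6),
    Page("https://example.com/roofing",
         "Roof replacement financing options.", 0.8),
]

candidates = retrieve("slab leak repair cost", pages)
cited = select(candidates)
print([p.url for p in candidates])
print([p.url for p in cited])
```

In this sketch both plumbing pages survive retrieval, but only the page with numbers and an attribution is cited. The high-authority page with vague copy gets into the room and is never quoted, which is the distinction the table draws.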

What the GEO Research Found
The GEO paper is the most rigorous public study of this question. The authors built a benchmark of diverse queries, then tested specific content changes to see which ones increased a source's visibility inside generative answers. The standouts:
- Cite sources. Adding citations to credible references increased visibility.
- Add quotations. Incorporating relevant quotes from authoritative sources helped.
- Add statistics. Including concrete, relevant numbers helped.
Together, these methods improved source visibility by up to roughly 40% across queries. Notably, naive keyword stuffing did not help and could hurt — the same tactic that has been dead in SEO for years is also dead in GEO.
The throughline is credibility expressed as specificity. A page that says "we are the best plumber in town" gives a model nothing to quote. A page that says "slab leaks in this area are commonly caused by foundation movement; a typical detection-and-repair runs in a documented range; here is the process, with a cited reference" gives the model exactly the kind of specific, attributable material it prefers to surface.

What Makes a Page Citable: The Checklist
Translating the research and practical experience into a contractor's checklist:
1. Specific, self-contained answers
Write passages that fully answer one question on their own, so a model can lift them without needing the rest of the page for context. This is exactly what well-formed FAQ content does — a question, then a complete answer.
2. Clear structure
Descriptive headings, short paragraphs, and lists make the page parseable. A wall of undifferentiated text is hard for a model to segment and quote. Our service and city pages guide shows the structure that helps here.
3. Source-backed claims
Where you make a factual claim, attribute it. Citing a credible reference is both honest and, per the GEO research, a visibility advantage. Vague assertions are not quotable; sourced facts are.
4. Structured data
LocalBusiness, Service, and FAQPage schema make the meaning of your content explicit and easy to attribute. See schema for home services.
5. Freshness
Dated, regularly updated pages tend to be favored over stale ones. A "last updated" date and genuine refreshes signal current accuracy.
6. Authority signals
Reviews, consistent business information, and a clean, fast site all feed the authority that helps you survive retrieval. The same reviews and trust signals that convert humans also build the credibility engines look for.
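
For item 4 in the checklist, here is a minimal sketch of generating `LocalBusiness` and `FAQPage` JSON-LD with a short Python script. The business name, phone number, URL, and FAQ text are hypothetical placeholders, and `to_script_tag` is a helper invented for this example. The property names follow schema.org, but validate the output with a structured-data testing tool before publishing.

```python
import json

# All business details below are hypothetical placeholders.
business_schema = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Plumbing Co.",
    "url": "https://example.com",
    "telephone": "+1-555-0100",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Round Rock",
        "addressRegion": "TX",
    },
}

# Each FAQ is a (question, complete self-contained answer) pair.
faqs = [
    (
        "Who should I call for a slab leak in Round Rock?",
        "Placeholder answer: describe your detection and repair process here.",
    ),
]

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        }
        for question, answer in faqs
    ],
}


def to_script_tag(data: dict) -> str:
    """Wrap a schema dict in the script tag that carries JSON-LD in a page."""
    # Escape "</" so text inside the JSON cannot close the script tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(to_script_tag(business_schema))
print(to_script_tag(faq_schema))
```

Generating the markup from the same question-and-answer pairs that appear on the page keeps the schema and the visible FAQ from drifting apart.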
GEO Versus Traditional SEO
GEO does not replace SEO; it extends it. The retrieval stage runs on SEO fundamentals, so abandoning SEO to "do GEO" is a mistake.
| | Traditional SEO | Generative engine optimization (GEO) |
|---|---|---|
| Goal | Rank in a list of links | Be quoted and cited inside an answer |
| User action | Clicks a link | May read the answer without clicking |
| Key levers | Relevance, authority, crawlability | Structure, specificity, source-backed claims |
| Relationship | Foundation | Built on top of SEO |
The honest framing for a contractor: do the SEO work to be retrievable, then do the GEO work to be quotable. One without the other leaves results on the table.

The Crawler Mistake That Erases You
There is one error that quietly removes a business from generative answers entirely: blocking the AI crawlers in robots.txt. If the bots that feed a generative engine cannot access your site, you are not in the candidate pool, and you cannot be cited no matter how good the content is.
This matters because some businesses block these crawlers reflexively, treating them like scrapers. For a local contractor competing for visibility, that is self-sabotage — it trades a small, speculative downside for the certainty of being invisible to the fastest-growing discovery channel. Stay crawlable. Google documents how its AI features and crawling interact, and the principle generalizes: access is the price of citation.
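
A quick way to check for this mistake is to test your robots.txt against the crawler names you care about. The sketch below uses Python's standard `urllib.robotparser`. The `AI_CRAWLERS` list is an assumption: these user-agent tokens are publicly documented by their vendors, but which bot feeds which product changes over time, so confirm the current names in each vendor's documentation. The sample robots.txt is invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of user-agent tokens; verify against each vendor's current docs.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Googlebot"]


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the crawlers from AI_CRAWLERS that may not fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Invented robots.txt that blocks one AI crawler and allows everything else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(sample))
```

To check a live site, paste the contents of its robots.txt into the function, or use the parser's `set_url` and `read` methods to fetch the file directly.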
A Practical GEO Workflow for Contractors
- Stay crawlable. Do not block the crawlers that feed generative engines.
- Structure every key page with descriptive headings, short paragraphs, and a real FAQ.
- Answer specific questions completely, in self-contained passages a model can lift.
- Back claims with sources and numbers wherever you make a factual statement.
- Add schema (LocalBusiness, Service, FAQPage) and validate it.
- Keep content fresh, with genuine updates and an honest "last updated" date.
- Build authority through reviews, consistent business data, and a fast, clean site.
- Measure mentions: periodically ask the engines your customers' questions and see whether you are named and cited.
None of this is exotic. It is disciplined, specific, honest content built on solid SEO — which is also what serves human readers best. The pages that win citations are usually the pages that deserve them.
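
The last step in the workflow, measuring mentions, can be tracked with a simple tally. The sketch below assumes you paste each engine's answer and its cited URLs in by hand. `SavedAnswer`, `visibility_report`, the business name, and the sample answers are all hypothetical, and no engine API is called.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class SavedAnswer:
    engine: str    # where the answer came from, typed in by hand
    question: str  # the customer-style question you asked
    text: str      # the answer text, pasted in
    cited_urls: list[str] = field(default_factory=list)


def host(url: str) -> str:
    """Hostname of a URL, lowercased, without a leading 'www.'."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc


def visibility_report(answers: list[SavedAnswer], business_name: str, domain: str) -> dict:
    """Count answers that name the business and answers that cite its domain."""
    named = sum(business_name.lower() in a.text.lower() for a in answers)
    cited = sum(any(host(u) == domain for u in a.cited_urls) for a in answers)
    return {"answers": len(answers), "named": named, "cited": cited}


question = "Who should I call for a slab leak in Round Rock?"
answers = [
    SavedAnswer("Perplexity", question,
                "Example Plumbing Co. is one option for slab leaks.",
                ["https://www.example.com/slab-leaks"]),
    SavedAnswer("ChatGPT", question,
                "Several local plumbers handle slab leaks.",
                ["https://other.example.org/guide"]),
    SavedAnswer("Google AI Overview", question,
                "Homeowners often call Example Plumbing Co. for this."),
]

print(visibility_report(answers, "Example Plumbing Co.", "example.com"))
```

Running the same questions on a regular schedule and comparing the counts over time shows whether the other steps are moving you into the answers.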
Conclusion
Generative engines retrieve a few pages and quote the ones that best support a clear answer. The peer-reviewed GEO research shows that specificity, citations, quotations, and statistics — not keyword tricks — are what raise a page's odds of being that quoted source. For contractors, getting cited is less about gaming a new system and more about being genuinely the clearest, most source-backed answer to a customer's question.
Stay crawlable, structure your content, answer real questions completely, back your claims, keep things fresh, and build authority. To go deeper, read what is GEO for contractors, how to show up in ChatGPT and Perplexity, schema and structured data for home services, service and city pages for local SEO, the visibility pillar, and the full blog.
We answer before we start
Q/01 How does an AI like ChatGPT or Perplexity actually decide which sources to cite?
Most consumer-facing generative engines that cite sources use retrieval-augmented generation: when a query needs current or factual information, the system runs a search, retrieves a small set of candidate web pages, and then generates an answer grounded in those pages, citing the ones it drew from. So citation depends on two stages. First, retrieval: your page has to be findable and relevant enough to be pulled into the candidate set, which still rewards traditional SEO fundamentals. Second, selection: among the retrieved pages, the model favors those that most clearly and specifically support the answer it is composing. Clear, self-contained, source-backed passages are easier to quote and attribute.
Q/02 What does the GEO research say actually makes a page more likely to be cited?
The GEO paper (Aggarwal et al., presented at KDD 2024) tested content changes against generative engines and found that adding citations to credible sources, adding relevant quotations, and adding statistics were among the most effective methods, improving a source's visibility in generative responses by up to roughly 40% across queries. Keyword stuffing, by contrast, did not help and sometimes hurt. The practical reading for contractors is that pages which make specific, verifiable, well-attributed claims are easier for a generative engine to lift and cite than vague marketing copy.
Q/03 Is getting cited by AI different from ranking on Google?
It overlaps but is not identical. Traditional SEO optimizes for a ranked list of links a human clicks. Generative engine optimization (GEO) optimizes for being quoted and attributed inside a synthesized answer the user may never click past. The retrieval stage still leans on SEO fundamentals (crawlability, relevance, authority), so good SEO remains the foundation. The selection stage adds new emphasis on structure, specificity, and source-backed claims. You are no longer only competing for a click; you are competing to be the passage the model decides to quote.
Q/04 Do I need special markup or can I block AI crawlers and still get cited?
If you block the crawlers that feed a generative engine, you generally forfeit the chance to be cited by it. Many AI systems rely on web crawling, and access is controlled through robots.txt and crawler-specific directives. Blocking those bots is a common and costly mistake for local businesses, because it removes you from the candidate pool entirely. You do not need exotic markup to be cited, but structured data and clean HTML make your content easier to parse and attribute. The baseline is: stay crawlable, be well-structured, and make specific, sourced claims.
Q/05 How long does it take for new content to get picked up and cited by AI engines?
It varies by engine. Tools like Perplexity and ChatGPT's browsing mode that retrieve live web results can surface and cite new pages relatively quickly once they are crawlable and indexed. Features built into a search index, such as Google AI Overviews, depend on normal crawling and indexing timelines, so fresh content appears as the index updates. Either way, practical experience suggests freshness matters: regularly updated, dated, accurate pages tend to be favored over stale ones. There is no fixed timeline; keep content current and crawlable and the lag shrinks.
