How Generative AI Chooses Sources to Cite

Generative AI assistants such as ChatGPT, Perplexity and Google AI Overviews answer questions by retrieving sources and synthesising them, then citing a selection. They favour content that is clearly written, well-structured, demonstrably credible, and closely matched to the user's query. The peer-reviewed GEO study by Aggarwal and colleagues (arXiv:2311.09735, KDD 2024) tested which content changes improve visibility in generative engines and found that adding citations, quotations and statistics from credible sources measurably increased a page's prominence. For UK trades, the path to being cited is the same as the path to being genuinely trustworthy and clearly presented.
A growing share of UK homeowners no longer scroll a list of blue links to find a tradesperson. They ask a question and read an answer. "Who is the best heating engineer near me?" "How much does a rewire cost?" "Is this plumber reputable?" The answer comes from a generative AI assistant (ChatGPT, Perplexity, or Google's AI Overviews) that has read a handful of sources and synthesised a reply, often citing a few of them by name.
For a trades business, this raises an urgent question: when an AI assembles that answer, what makes it choose your page as a source? The reassuring news is that this is not a black box you cannot influence. There is now peer-reviewed research on the subject. The study GEO: Generative Engine Optimization by Pranjal Aggarwal and colleagues, published at KDD 2024, tested which content changes improve a source's visibility inside generative engine answers, and found concrete, repeatable patterns.
This guide explains, honestly and without hype, how generative AI selects and cites sources, what the research actually shows, and how a UK trades business can become the kind of source these systems reach for.
How a Generative Engine Builds an Answer
Most consumer AI assistants that cite sources work in two broad stages. Understanding them is the key to understanding citation.
- Retrieval. The system finds a set of candidate sources relevant to the query, typically by querying a search index. This stage inherits much of traditional search ranking: if your page is not findable and relevant, it is not even a candidate.
- Synthesis. A large language model reads the retrieved sources, composes an answer, and attributes parts of it to specific pages.
A page can only be cited if it survives both stages: it must be retrieved (findable and relevant) and then selected during synthesis (clear and credible enough for the model to draw a confident statement from it). This is why GEO and traditional SEO overlap so heavily, and why our guide to how to show up in ChatGPT and Perplexity and the primer what is GEO for trades both stress the same fundamentals.
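To make the two stages concrete, here is a deliberately simplified sketch in Python. It is not how any real assistant works: the pages, URLs, stopword list and word-overlap scoring are all invented for illustration, and production systems use search indexes and large language models rather than keyword counts. It only shows the shape of the process: a page must first be retrieved, and then one of its sentences must be the best match to be cited.

```python
import re

# Hypothetical pages for illustration only; the URLs and text are invented.
PAGES = {
    "https://example.co.uk/rewire-cost": (
        "A full rewire typically takes several days. "
        "The cost of a rewire depends on property size and access."
    ),
    "https://example.co.uk/electrical": (
        "We offer every electrical service, including rewire work. Call us today."
    ),
    "https://example.co.uk/about": "We are a family business and we love our customers.",
    "https://example.co.uk/boiler-service": "An annual boiler service checks safety and efficiency.",
}

# A tiny, assumed stopword list so common words do not count as matches.
STOPWORDS = {"a", "an", "the", "how", "much", "does", "is", "of", "and", "to", "we", "are", "our"}


def tokens(text):
    """Lower-case words in the text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(query, pages, top_k=2):
    """Stage 1: keep the pages that share the most words with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(body)), url) for url, body in pages.items()]
    scored = [(score, url) for score, url in scored if score > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored[:top_k]]


def synthesise(query, pages, candidates, max_citations=1):
    """Stage 2: pick the best-matching sentences and attribute each to its page."""
    q = tokens(query)
    scored = []
    for url in candidates:
        for sentence in re.split(r"(?<=[.!?])\s+", pages[url]):
            score = len(q & tokens(sentence))
            if score > 0:
                scored.append((score, sentence, url))
    scored.sort(key=lambda item: (-item[0], item[2], item[1]))
    return [(sentence, url) for _, sentence, url in scored[:max_citations]]


if __name__ == "__main__":
    question = "How much does a rewire cost?"
    shortlist = retrieve(question, PAGES)
    for sentence, url in synthesise(question, PAGES, shortlist):
        print(f"{sentence} [source: {url}]")
```

In this toy run, the generic services page is retrieved but not cited, because the dedicated page contains a self-contained sentence that answers the question more directly.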

What the GEO Research Actually Found
The value of the Aggarwal et al. paper is that it moves the conversation from speculation to measurement. The researchers built a benchmark of queries and tested a range of content modifications to see which improved a source's visibility within generative engine responses.
Their headline finding, stated carefully: content modifications that added credible citations, quotations from authoritative sources, and relevant statistics produced measurable improvements in visibility for many query types. In other words, the generative engines in the study rewarded content that demonstrated its credibility, not merely content stuffed with keywords.
This is a genuinely useful and honest signal for trades content:
- It suggests AI engines favour demonstrable trustworthiness over assertion.
- It rewards the same qualities a discerning human reader values: evidence, clarity, specificity.
- It does not reward fabrication. Inventing statistics to game the system would be both dishonest and, when checked, self-defeating, because credibility collapses on inspection.
A necessary caveat: the study measured visibility in a research benchmark. Real-world results vary by engine, query, and competition. Treat the findings as a well-evidenced direction of travel, not a guaranteed lever. This is exactly the kind of careful, sourced claim the research suggests AI engines themselves prefer.
What Makes a Trades Page Citable
Translating the research and the two-stage model into practice, a citable page tends to share these characteristics:
| Quality | Why it helps citation |
|---|---|
| Clearly answers a specific question | Matches the query precisely during retrieval and synthesis |
| Well-structured (headings, lists, tables) | Easy for a model to extract a clean, attributable statement |
| Genuinely credible claims, sourced | The model can draw a confident factual statement from it |
| Accurate, specific, and current | Reduces the risk of the model picking a competitor instead |
| Technically accessible to crawlers | A blocked page cannot be retrieved at all |
| Consistent business facts (NAP, services) | Reinforced by structured data; reduces ambiguity |
Notice how much of this is simply good content done well. The page that is genuinely the clearest, most accurate, most useful answer to a homeowner's question is also the page an AI is most likely to cite. That alignment is the whole opportunity.

Structure and Clarity: Helping the Synthesis Step
The synthesis stage rewards content a model can parse cleanly. Practical structural habits:
- Lead with a direct answer. State the answer to the page's core question early and plainly, before the supporting detail. This is the logic behind the answer capsule at the top of every article on this site.
- Use descriptive headings. Headings that name the question they answer help a model locate the relevant passage.
- Use lists and tables for discrete facts. Structured data points are easier to extract and attribute than the same information buried in dense prose.
- Keep claims self-contained. A statement that makes sense on its own is easier to quote accurately than one that depends on three paragraphs of prior context.
These are the same habits that make content readable for humans, which is the point: there is no trade-off between writing for people and writing for generative engines.
Credibility Signals AI Can See
Beyond clarity, the research points to credibility as a driver. For a trades business, credibility is expressed through several reinforcing channels:
- Genuine reviews and ratings, the subject of the science of online reviews and social proof and trust for trades.
- Accurate, consistent business information, made machine-readable through schema and structured data.
- Real expertise on the page, demonstrated through specific, first-hand knowledge rather than generic copy, the same principle behind service and location pages that avoid thinness.
- Sourced claims, where a factual statement links to a credible reference rather than floating unsupported.
These signals serve double duty: they build trust with the human reading the page and give the AI the credibility cues it appears to reward. They are central to the visibility pillar.
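As an illustration of making business facts machine-readable, the sketch below builds a schema.org JSON-LD block in Python. The business name, address, phone number and URL are placeholders, the function names are hypothetical, and the choice of properties is an assumption about what a typical trades page might include rather than a required set. Check the schema.org and Google documentation for the properties that suit your business.

```python
import json


def business_jsonld(name, telephone, url, street, locality, postcode,
                    area_served, business_type="LocalBusiness"):
    """Build a schema.org description of a business as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": business_type,  # e.g. "Plumber" or "Electrician" on schema.org
        "name": name,
        "telephone": telephone,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "postalCode": postcode,
            "addressCountry": "GB",
        },
        "areaServed": area_served,
    }


def as_script_tag(data):
    """Wrap the JSON-LD in the script tag that a page's HTML would carry."""
    return f'<script type="application/ld+json">\n{json.dumps(data, indent=2)}\n</script>'


# Placeholder details, invented for this example.
EXAMPLE = business_jsonld(
    name="Example Plumbing Ltd",
    telephone="+44 20 7946 0000",
    url="https://example.co.uk",
    street="1 Example Street",
    locality="London",
    postcode="AB1 2CD",
    area_served="Greater London",
    business_type="Plumber",
)

if __name__ == "__main__":
    print(as_script_tag(EXAMPLE))
```

Whatever values you use, they should match the name, address and phone number shown elsewhere on your site and listings, since consistency is the point.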

What You Can and Cannot Control
Honesty about limits matters here, because the AI-visibility space is full of overpromising.
What you can control:
- Whether your content is crawlable and accessible to the relevant AI crawlers.
- How clearly and credibly you present information.
- The accuracy and specificity of your claims.
- Your structured data and consistent business facts.
What you cannot control:
- The final decision of any given model on any given query, which depends on factors the providers do not fully disclose.
- The exact ranking and citation logic, which is proprietary and changes over time.
One concrete and important point: do not block AI crawlers wholesale in an attempt to protect content. Blocking the crawlers that feed these systems generally means you cannot be represented at all, which is the opposite of the goal. Google's own documentation describes the relevant controls, and the sensible default is to remain accessible while keeping your content accurate and clear.
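A quick way to see whether a robots.txt file shuts out a given crawler is to test it with Python's standard-library parser. This is a minimal sketch under stated assumptions: the robots.txt content and page URL are invented, the crawler names are examples that should be checked against each provider's current documentation, and the standard-library parser may not interpret every rule exactly as a particular crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. It blocks one AI crawler outright
# and keeps an admin area private from everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

# Example crawler names (assumptions; verify against provider documentation).
AGENTS = ["GPTBot", "PerplexityBot", "Googlebot"]


def crawler_access(robots_txt, page_url, user_agents):
    """Return {user_agent: allowed} for one page under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in user_agents}


if __name__ == "__main__":
    page = "https://example.co.uk/services/boiler-repair"
    for agent, allowed in crawler_access(SAMPLE_ROBOTS_TXT, page, AGENTS).items():
        print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

With the sample rules, the first crawler is blocked from the service page while the other two are allowed, which is the kind of accidental exclusion worth catching before it costs you visibility.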
GEO and SEO Are the Same Foundation
A frequent worry is that GEO requires abandoning everything you know about SEO. It does not. The two share a foundation, and the differences are refinements, not reversals.
| Dimension | Traditional SEO | Generative Engine Optimisation |
|---|---|---|
| Goal | Rank high in a list of links | Be included and cited in a synthesised answer |
| Core requirement | Relevant, crawlable, quality content | Relevant, crawlable, quality content |
| Structure | Helps users and crawlers | Helps the model extract and attribute |
| Credibility | Builds rankings and trust | Appears to drive citation selection |
| Risk of shortcuts | Penalised by spam systems | Collapses on inspection; not rewarded |
Google has indicated that its AI features build on its core ranking and quality systems, which means the helpful, reliable, people-first content guidance still applies. Keep doing solid SEO and content work, and layer GEO-aware practices (direct answers, clear structure, credible sourcing) on top. The comparison SEO vs Google Ads for tradespeople and the comparisons index put the wider channel choices in context.
A Practical Checklist for Trades
To make your pages the kind of source generative engines reach for:
- Answer one clear question per page, and answer it directly near the top.
- Structure for extraction: descriptive headings, lists, and tables for discrete facts.
- Make genuine, specific, sourced claims, never fabricated statistics.
- Show real credibility: genuine reviews, accreditations, and first-hand expertise.
- Add accurate structured data so your business facts are unambiguous.
- Stay crawlable. Do not block the AI crawlers you want to be cited by.
- Keep content current, because stale information is a reason to be passed over.
This is not a trick. It is the description of a genuinely excellent, trustworthy page, which is exactly what these systems are built to surface. For the sector-specific application, see the guides for plumbers and electricians, and browse the glossary for the underlying terms.
Conclusion
Generative AI is changing how UK homeowners find tradespeople, but it is not changing what makes a business worth recommending. The peer-reviewed GEO research, alongside what the providers themselves say, points to a consistent conclusion: AI engines cite sources that are relevant, clearly structured, demonstrably credible, and accessible.
That is not a loophole to exploit; it is a standard to meet. Write the clearest, most accurate, best-sourced answer to the questions your customers actually ask, present your genuine trust signals plainly, mark up your business facts, and stay accessible. Do that, and you become the page an AI reaches for, for the same reason a neighbour would recommend you: because you are genuinely the trustworthy answer. Explore the visibility pillar, the trades we serve, and the wider blog to take it further.
We answer before we start
Q/01 How does an AI assistant decide which sources to cite in its answer?
Most consumer AI assistants that cite sources, such as Perplexity and Google AI Overviews, work in two broad stages. First they retrieve a set of candidate sources relevant to the query, typically using a search index. Then a large language model synthesises an answer from those sources and attributes parts of it to specific pages. The sources most likely to be cited are those that are relevant to the exact query, clearly written, well-structured, and credible enough that the model can confidently draw a factual statement from them. The peer-reviewed GEO research found that content enriched with credible citations, quotations and statistics tended to gain visibility in generative engines, which suggests these systems reward demonstrable trustworthiness, not just keyword relevance.
Q/02 What is Generative Engine Optimisation (GEO)?
Generative Engine Optimisation, or GEO, is the practice of optimising content so that it is more likely to be surfaced and cited by generative AI engines such as ChatGPT, Perplexity and Google AI Overviews. The term was introduced in the academic paper "GEO: Generative Engine Optimization" by Pranjal Aggarwal and colleagues, published at KDD 2024. The paper frames generative engines as systems that synthesise answers from multiple retrieved sources, and tests which content modifications improve a source's visibility within those answers. GEO is the AI-era counterpart to traditional SEO: rather than optimising for a ranked list of links, you optimise for being included and cited in a synthesised answer. The two overlap heavily, because both reward clear, credible, relevant content.
Q/03 Does adding statistics and citations to my content help it get cited by AI?
The peer-reviewed GEO study found that it can. Aggarwal and colleagues tested a range of content modifications and reported that methods such as adding relevant citations, quotations from credible sources, and statistics produced measurable improvements in a source's visibility within generative engine responses for many query types. The intuition is that these signals make a page more demonstrably credible and more directly useful for a model trying to assemble a factual answer. For a trades business, this aligns with simply doing good content well: making genuine, accurate, well-sourced claims rather than vague marketing assertions. Note that the study measured visibility in a research setting; real-world results vary by engine, query, and competition, so treat it as a well-evidenced direction, not a guarantee.
Q/04 Can I control whether AI assistants use my content?
Partly. You control whether your content is crawlable and how clearly it presents information, which influences whether it can be retrieved and cited. You can also signal preferences to AI crawlers through your robots.txt file, and Google's documentation describes controls relevant to its AI features. What you cannot control is the final decision of any given model on any given query, which depends on factors the providers do not fully disclose. The practical stance is to make your content easy to access and easy to cite, keep it accurate, and avoid blocking the crawlers you want to reach you. Blocking AI crawlers outright generally reduces your chances of being represented at all.
Q/05 Is GEO different from traditional SEO?
They overlap heavily but optimise for different end states. Traditional SEO aims to rank a page highly in a list of links. GEO aims to have a page included and cited within a single synthesised answer generated by an AI engine. The good news for trades businesses is that the foundations are shared: clear writing, accurate information, a logical structure, demonstrable credibility, and technical accessibility all help with both. The GEO research suggests some AI-specific emphases, such as the value of citations and statistics, but it does not call for abandoning SEO fundamentals. The sensible approach is to keep doing solid SEO and content work, and layer GEO-aware practices on top, rather than treating them as competing strategies.
Q/06 Do AI assistants and Google AI Overviews use the same content signals as normal search?
Largely yes, with some additions. Google has stated that its AI features in Search build on its core ranking and quality systems, which means the same fundamentals, helpful, reliable, people-first content, technical accessibility, and demonstrated experience and trust, continue to apply. AI assistants that retrieve from a search index inherit much of that ranking logic before the synthesis step. On top of the shared fundamentals, the GEO research highlights AI-specific content qualities such as clear structure and credible citations. So the safest reading is that strong traditional content and SEO remain the foundation, and AI-aware refinements build on that foundation rather than replacing it.
