How AI Search Actually Works (and Why It Changes SEO)

AI search does not rank ten links. It retrieves candidate pages, judges which to trust, writes one answer, and cites a few sources. Here is how the machine works, stage by stage, and what each stage means for getting your business cited.

Ask ChatGPT, Perplexity, or Google a question today and you often don't get a list of links anymore. You get one written answer, with a few sources cited underneath.

Something happened in the gap between your question and that paragraph. Most explanations of it are either drowning in jargon or so hand-wavy they tell you nothing.

So here's the plain version. An AI search engine does four things, in order: it retrieves a set of candidate pages, ranks them by relevance and trust, writes one answer from the ones it believes, and cites the handful it actually leaned on.

Get those four stages straight and the whole game of getting cited stops being a mystery.

We optimize for these engines every day, on client sites and our own. So this is the model we actually work from, not a summary of someone else's diagram. GEO (generative engine optimization) is part of our services, and how these engines behave is the thing we watch for a living.

Illustration: a left-to-right pipeline in which gathered sources are ranked and resolved into a single answer.

The shift: from ten blue links to one answer

Classic search hands you a list and lets you choose. The engine's job ends at ranking. You read, compare, and click. The page is the product.

AI search does the reading for you. It pulls multiple sources, reasons over them, and hands back one composed answer with citations. The answer is the product, and your page is now an ingredient in someone else's paragraph instead of a destination.

That one change rewrites the whole objective.

You're not trying to occupy position three anymore. You're trying to be a source the engine retrieves, trusts, and quotes. And to do that on purpose, you need to know what happens at each stage of the pipeline.

The four stages of an AI search answer

  • Retrieval. The engine gathers candidate pages from search and its index.
  • Ranking. It scores those sources on relevance, trust, and extractability.
  • Synthesis. It composes a single answer from the passages it trusts most.
  • Citation. It links the sources it used: the footnotes the user actually sees.

The four stages of an AI answer

Every major AI search surface runs some version of the same loop. The details differ engine to engine, but the shape holds: retrieve, rank, synthesize, cite.

An AI search engine retrieves candidates, ranks them by trust, writes one answer, and cites the few it leaned on. Win all four, not one.
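
To make the loop concrete, here is a toy sketch in Python. It is our own illustration, not any engine's real implementation: the `Page` fields, the word-overlap relevance score, the `min_trust` cutoff, and the example URLs are all assumptions chosen to keep it short. What it shows is the shape of the pipeline, and that a page failing any one stage never reaches the citations.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    crawlable: bool  # assumption: can the engine's crawler fetch the page?
    trust: float     # assumption: 0..1 stand-in for authority signals


def _overlap(page, query):
    """Count query words that also appear in the page text (toy relevance)."""
    return len(set(query.lower().split()) & set(page.text.lower().split()))


def retrieve(pages, query):
    """Stage 1: keep reachable pages that share at least one word with the query."""
    return [p for p in pages if p.crawlable and _overlap(p, query) > 0]


def rank(candidates, query, min_trust=0.5):
    """Stage 2: drop low-trust pages, then order by relevance weighted by trust."""
    trusted = [p for p in candidates if p.trust >= min_trust]
    return sorted(trusted, key=lambda p: _overlap(p, query) * p.trust, reverse=True)


def synthesize(ranked, top_k=2):
    """Stage 3: compose an answer from the first sentence of the top pages."""
    used = ranked[:top_k]
    answer = " ".join(p.text.split(". ")[0].rstrip(".") + "." for p in used)
    return answer, used


def cite(used):
    """Stage 4: credit only the pages the answer actually drew on."""
    return [p.url for p in used]


def answer_query(pages, query):
    answer, used = synthesize(rank(retrieve(pages, query), query))
    return answer, cite(used)


# Hypothetical pages: one blocked to crawlers, one with low trust.
PAGES = [
    Page("https://a.example/guide",
         "AI search retrieves pages and writes one answer. More detail follows.",
         crawlable=True, trust=0.9),
    Page("https://b.example/post",
         "AI search is changing marketing.",
         crawlable=False, trust=0.9),
    Page("https://c.example/thin",
         "Some say AI search is magic.",
         crawlable=True, trust=0.2),
    Page("https://d.example/faq",
         "AI search engines cite a few sources. Read on.",
         crawlable=True, trust=0.7),
]

ANSWER, CITATIONS = answer_query(PAGES, "how does ai search work")
print(ANSWER)
print(CITATIONS)
```

In this run the blocked page drops out at retrieval and the low-trust page drops out at ranking, so only the two remaining pages are cited.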

Retrieval: how the engine finds candidate pages

Retrieval is where the engine builds a shortlist of pages that might answer the question. If you are not in the candidate pool, nothing else you do matters.

Content reaches an engine two ways, and they run on completely different clocks.

  • Live retrieval. Ask ChatGPT with search, Perplexity, or Google a current question and the engine issues its own searches, fetches a set of live pages, and reads them right then. This is the surface you can move in days. Google describes AI Overviews as running a "query fan-out," issuing multiple related searches and pulling more pages than a single query would (see Google's AI Overviews documentation at developers.google.com/search/docs/appearance/ai-features). Perplexity works the same way: it searches the web per question and reads the results before it answers.
  • Training data. Part of what a model "knows" is baked in from the text it trained on, frozen until the next training run. That's why ChatGPT can name businesses with search switched off. You influence this layer slowly, by being written about consistently over months, not by editing a page this week.

Here's the practical split: live retrieval is fast and editable, training data is slow and durable.

To get retrieved live, the engine's crawler has to be able to reach and read your page. OpenAI fetches pages for ChatGPT search with OAI-SearchBot and documents its crawlers at platform.openai.com/docs/bots.

Block that crawler in your robots.txt and you've removed yourself from the candidate pool with your own hands. Worth checking before anything else.
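
One way to run that check is with Python's standard-library `urllib.robotparser`. This is a minimal sketch: the robots.txt content and the `example.com` URL are made up for illustration, and `crawler_allowed` is our own helper name. Swap in your real robots.txt text and the user agents you care about.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks OpenAI's search crawler only.
EXAMPLE_ROBOTS = """\
User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.com/guide"
    for agent in ("OAI-SearchBot", "Googlebot"):
        status = "allowed" if crawler_allowed(EXAMPLE_ROBOTS, agent, page) else "blocked"
        print(f"{agent}: {status}")
```

With this example file, OAI-SearchBot is blocked while other crawlers are allowed, which is exactly the self-inflicted exclusion described above.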

Ranking: how the engine decides what to trust

Retrieval casts a wide net. Ranking is where the engine pulls it tight, scoring candidates on two things at once: does this page actually answer the question, and is this source credible enough to repeat?

Relevance is the easy half. Trust is the hard half, and it's decided before a single word gets quoted. An engine won't put its name behind a claim from a source it has no reason to believe.

That trust comes from the same signals that build authority anywhere on the web:

  • Who references you. Mentions and links from credible sites the engine already reads tell it you're a recognized voice, not some unknown page asserting things into the void.
  • How consistently the web ties you to the topic. Cover a subject with real depth and you read as a source on it. One thin page reads as a passing mention.
  • Whether your identity is clear and consistent. Same business name, same description, same core topics everywhere you appear. Mixed signals dilute the association the engine is trying to form.

Google's own guidance points the same way: its systems aim to surface content that demonstrates experience, expertise, authoritativeness, and trustworthiness (the E-E-A-T framework in Google's Search Quality Rater Guidelines).

Trust isn't a tactic you bolt on at the end. It's the filter that decides whether your relevant page is even allowed into the answer.

Synthesis: how the engine writes the answer

Synthesis is the stage that simply didn't exist in classic search. The engine takes the pages that survived ranking and composes a brand-new answer in its own words, stitching facts from several sources together at once.

This is why how your content is written matters as much as whether it's trusted. A language model is reading your page to pull out a usable claim. If the answer sits in one clear sentence under a clear heading, it's easy to lift. If it's smeared across three paragraphs of preamble, the model has to guess where the answer is, and it'll often just reach for a competitor who made it obvious instead.

If a model can lift one sentence from your page and have it stand on its own, you are extractable. If every sentence needs the three around it, you are not.

Extractability is a structural property, not a writing-quality opinion. Lead with the answer, match headings to the way people actually ask questions, keep paragraphs short, and make each fact stand on its own.

A page built that way feeds the synthesis stage clean material. A page that buries its answer makes the engine work for it, and engines on a token budget do not work hard for you.
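
A rough way to audit this on your own pages is to test the opening sentence of each section. The sketch below is a heuristic of our own, not how any engine scores content: the 30-word limit and the list of context-dependent opening words are assumptions you should tune.

```python
import re

# Assumption: openers like these usually lean on a previous sentence for meaning.
DANGLING_OPENERS = {"this", "that", "it", "these", "those", "they", "such"}


def first_sentence(body: str) -> str:
    """Return the text up to the first sentence-ending punctuation mark."""
    return re.split(r"(?<=[.!?])\s+", body.strip(), maxsplit=1)[0]


def is_extractable(body: str, max_words: int = 30) -> bool:
    """Heuristic: the opening sentence is short and does not depend on prior context."""
    words = first_sentence(body).split()
    if not words or len(words) > max_words:
        return False
    return words[0].lower().strip(",") not in DANGLING_OPENERS


if __name__ == "__main__":
    sections = {
        "What is retrieval?": "Retrieval is where the engine builds a shortlist "
                              "of pages that might answer the question. The rest follows.",
        "Why it matters": "This is why the earlier point matters. Read on.",
    }
    for heading, body in sections.items():
        print(heading, "->", "ok" if is_extractable(body) else "rewrite the opener")
```

A pass here does not prove a section will be quoted. It only flags openers that clearly cannot stand on their own.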

Citation: how and when the engine credits a source

Once the answer is written, the engine attaches citations to the sources it leaned on. A citation is the payoff: the link that sends a real person to your site, and the credit that builds your standing for the next answer.

Citations aren't handed out evenly. The engine tends to credit the sources that most directly supplied the claims in the answer, which loops right back to the earlier stages. A page that was retrieved, trusted, and easy to extract from is the page that gets named. A page that was merely present in the pool, or trusted but hard to quote, often gets read and then left uncredited.

And here's the part worth being blunt about: the cited page is not always the most authoritative one on the web.

It's the one that was reachable, credible enough, and clearest at the exact moment of the query. That gap is precisely what you optimize into.

Where the engines get their information

Two pipes feed every answer, and conflating them is the single most common mistake we see.

The first is live web retrieval, the fast pipe. It reflects pages as they exist today, so a page you improve this week can show up in answers this month. This is where structuring content for extraction and keeping crawlers unblocked pays back quickly.

The second is the training corpus, the slow pipe. It reflects what the web said about you over the months and years before the last training cutoff, and you can't edit it directly. You move it the way you build any reputation: by being written about, consistently, on your topic, across sites the engines learn from.

Most businesses need both pipes working.

Live retrieval wins the answer this quarter. The training corpus makes you the name a model defaults to a year out, even with search switched off. The order that actually works: fix the fast pipe first, then fund the slow one.

What this means for SEO

The foundations of SEO didn't stop mattering. They became the price of entry.

Crawlability, speed, clean HTML, schema, earned authority. That's exactly what gets you into the candidate pool and through the trust filter. So if your house is already in order for search, you're not starting from zero. The work carries over.

What's genuinely new is two jobs classic SEO never asked of you:

  • Structuring content for machine extraction. Ranking a page for a human reader and feeding a clean claim to a synthesis model are related, but they're not the same job. The second one rewards leading with the answer and writing standalone facts, which plenty of well-ranking content flat-out doesn't do.
  • Measuring AI visibility directly. Citations don't show up in a rank tracker. You have to ask the engines your buyers' real questions and log whether you're named, cited, or absent, because the same question can hand back different sources week to week.

Classic SEO ranks a page for a query. AI search rewards a source for being trustworthy and extractable enough to quote. The overlap is real, but the second job is new.

The deeper playbook for doing this work, stage by stage, is its own guide: our complete guide to generative engine optimization. The mechanism above is the reason that playbook is shaped the way it is.

How ChatGPT, Perplexity, and Google AI Overviews differ

They all run the same four-stage loop, but the emphasis shifts in ways worth knowing.

  • ChatGPT (with search). Blends two surfaces. With search off, it answers from training data, so presence there is a long-term reputation game. With search on, it retrieves live pages via OAI-SearchBot and cites them, behaving more like a fast, editable search engine. For the tactics specific to this one, see how to rank in ChatGPT.
  • Perplexity. The most search-like of the three. It's built around live retrieval, runs searches per question, and shows its citations right up front, which makes it the easiest place to watch the retrieve-rank-cite loop happen and the fastest to test against.
  • Google AI Overviews. Builds on Google's existing index and ranking, then adds a synthesis layer on top. Google has described the query fan-out behind it, where it fires off several related searches and pulls a broader set of pages than a single query would. Strong classic-search foundations carry the most weight here, because AI Overviews draws from the same systems you've already been optimizing for.

Here's the reassuring part. Because all three reward the same underlying things (retrievable, credible, extractable content), you're not optimizing three separate times. Do the work once, properly, and it compounds across every AI surface at the same time.

What to do about it

Now that you know how the machine works, the moves fall straight out of the stages:

  1. Make sure you can be retrieved. Confirm the AI crawlers you care about are actually allowed, your pages render content in clean HTML, and nothing important hides behind heavy JavaScript. This is the price of entry to the candidate pool.
  2. Earn the trust that gets you ranked. Cover your topic with real depth, keep your business identity consistent everywhere, and get referenced by credible sites. This is the slow, compounding layer competitors can't fake.
  3. Write for the synthesis stage. Lead each section with the answer, match headings to real questions, keep paragraphs short, and make every fact stand on its own. This is the highest-impact change most sites have never bothered to make.
  4. Measure what gets cited. Run your buyers' real questions through ChatGPT, Perplexity, and Google on a schedule, and track citations separately from mentions. You can't improve what you don't watch.

Run those four as a tick-box list against your own pages with our GEO checklist.
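
For step 4, the log itself can be very simple. Here is a minimal sketch of separating citations from mentions, assuming you paste in each answer and its source links by hand. It calls no engine API, and the `Observation` fields, the brand "Acme Plumbing", and the domain `acme.example` are hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Observation:
    engine: str        # a label such as "perplexity"; nothing is fetched here
    question: str
    answer_text: str   # the answer as copied from the engine
    cited_urls: list   # the source links shown with the answer


def classify(obs: Observation, brand: str, domain: str) -> str:
    """Return 'cited', 'mentioned', or 'absent' for one logged answer."""
    domain = domain.lower()
    for url in obs.cited_urls:
        host = urlparse(url).hostname or ""
        if host == domain or host.endswith("." + domain):
            return "cited"
    if brand.lower() in obs.answer_text.lower():
        return "mentioned"
    return "absent"


def summarize(observations, brand: str, domain: str) -> dict:
    """Count statuses per engine across a batch of logged answers."""
    counts = {}
    for obs in observations:
        counts.setdefault(obs.engine, Counter())[classify(obs, brand, domain)] += 1
    return counts


# Hypothetical log entries for one question across three engines.
LOG = [
    Observation("perplexity", "best plumber near me", "Several firms stand out.",
                ["https://www.acme.example/pricing", "https://other.example/list"]),
    Observation("chatgpt", "best plumber near me", "Acme Plumbing is often recommended.",
                ["https://other.example/list"]),
    Observation("google", "best plumber near me", "Compare quotes from local firms.",
                ["https://other.example/list"]),
]

if __name__ == "__main__":
    for engine, tally in summarize(LOG, "Acme Plumbing", "acme.example").items():
        print(engine, dict(tally))
```

Because the same question can return different sources week to week, re-running the same questions on a schedule and comparing the tallies over time tells you more than any single snapshot.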

We do this for clients and on our own properties, and this article is built to the exact spec it describes. If you found it inside an AI answer, well, that's the method working in real time.

If you'd rather have it done for you, that's what our services cover: we make your content the answer AI engines retrieve, trust, and cite, and we track it on your own queries. Book a consultation and we'll show you exactly where you stand today across all three engines.
