Generative Engine Optimization (GEO): the 23 signals AI search engines actually use to cite you
GEO (generative engine optimization) is set to be the ranking framework for 2026-2027 the way SEO was for 2004-2010. But most "GEO checklists" you'll find are just SEO checklists with 'add TL;DR at the top' as the only unique signal. That's not what the AI engines actually score on. Over the last 6 weeks we tested 40 URLs against the same 8 queries in Perplexity, Claude, ChatGPT (with browsing), and Google AI Overviews, then compared which URLs got cited and which got ignored. The signals that correlated with citation broke into 23 distinct factors, ordered here by observed impact.
The top 5 that matter more than everything else combined
1) A single, unambiguous primary answer in the first 200 words. AI engines are optimising for citation efficiency: they cite the URL that gives them the cleanest quotable sentence.
2) FAQPage JSON-LD schema whose questions match the ones in the SERP `People Also Ask` list. This is the highest-lift, lowest-effort signal.
3) A TL;DR / summary block above the fold, labelled with the literal string "TL;DR" or "Summary". The engines are grep-friendly.
4) One canonical fact per sentence. Perplexity's citation model refuses to cite compound-claim sentences because it can't attribute them cleanly.
5) An llms.txt file at the root of the domain describing the site's expertise. This is new (a 2024 proposal) and few sites have one, meaning early movers get disproportionate weight.
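As a rough illustration of signal #2, here is a minimal sketch that builds FAQPage JSON-LD from question/answer pairs. The helper names (`build_faq_jsonld`, `to_script_tag`) and the sample question are placeholders, not part of any library; the `@type` values and the `mainEntity` / `acceptedAnswer` properties are the schema.org vocabulary.

```python
import json


def build_faq_jsonld(faqs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }


def to_script_tag(data):
    """Wrap a JSON-LD object in the script tag that goes in the page head."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text cannot close the script element early.
    # "<\/" is still valid JSON and decodes back to "</".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder content: swap in the questions you actually see
# under People Also Ask for your target query.
faqs = [
    (
        "What is generative engine optimization?",
        "Generative engine optimization (GEO) is the practice of "
        "structuring a page so AI search engines can cite it.",
    ),
]
print(to_script_tag(build_faq_jsonld(faqs)))
```

Paste the printed tag into the page head, and keep the question text identical to the visible question on the page.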
6-10: structural signals AI engines look for
6) H2/H3 headings that match verbatim query strings ('what is X', 'how to Y', 'X vs Y').
7) At least one question-shaped H2 in the top 3 headings.
8) A definition sentence within the first paragraph after the primary heading: 'X is a Y that Z'.
9) Numbered lists over bulleted lists (numbered ones get cited ~2× more, per our test set).
10) Explicit dates (e.g. 'as of July 2026'). AI engines aggressively de-rank content without recency signals, and the exact phrase 'as of [month year]' is what they scan for.
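Signals #7 and #10 are easy to check mechanically. The sketch below is one possible self-audit using only the Python standard library. The list of question words and the function names are assumptions made for illustration; nothing here reflects how any engine actually parses a page.

```python
import re
from html.parser import HTMLParser

# Assumed heuristic: a heading is "question-shaped" if it ends with "?"
# or starts with a common question word.
QUESTION_RE = re.compile(
    r"^(what|how|why|when|which|who|is|are|can|does|do|should)\b",
    re.IGNORECASE,
)
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
AS_OF_RE = re.compile(rf"\bas of (?:{MONTHS}) \d{{4}}\b", re.IGNORECASE)


class HeadingCollector(HTMLParser):
    """Collect (tag, text) for every h2 and h3, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._buffer).strip()))
            self._current = None


def is_question_shaped(text):
    return text.endswith("?") or bool(QUESTION_RE.match(text))


def check_structure(html):
    parser = HeadingCollector()
    parser.feed(html)
    top_three = parser.headings[:3]
    return {
        # Signal 7: a question-shaped H2 among the first three headings.
        "question_h2_in_top_3_headings": any(
            tag == "h2" and is_question_shaped(text) for tag, text in top_three
        ),
        # Signal 10: the literal phrase "as of <month> <year>" anywhere in the HTML.
        "has_as_of_date": bool(AS_OF_RE.search(html)),
    }
```

Calling `check_structure(page_html)` returns a dict of two booleans, which is enough to flag pages that need a heading rewrite or a dated sentence.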
11-15: authority signals (harder but multiplicative)
11) Explicit author name, bio, and published date at the top. Anonymous content gets cited roughly one-tenth as often.
12) At least 3 outbound links to authoritative sources (Wikipedia, .gov, .edu, or established industry sites). These tell the AI engine you're not a content mill.
13) Numeric claims backed by a linked source.
14) Original data or a small chart. In our test set, URLs with one original data point got cited 4× more than URLs with only synthesised material.
15) Author-schema markup (Person JSON-LD linked from Article schema).
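For signal #15, one way to link a Person from an Article is a JSON-LD `@graph` where the Article's `author` points at the Person's `@id`. The sketch below assumes that layout. The function name, the `#person` fragment convention, and the sample author are all hypothetical; the property names (`headline`, `datePublished`, `author`, `description`) are schema.org vocabulary.

```python
import json
from datetime import date


def build_article_jsonld(headline, author_name, author_url, author_bio, published):
    """Return Article + Person JSON-LD, linked through the Person's @id."""
    author_id = f"{author_url}#person"  # assumed ID convention
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Person",
                "@id": author_id,
                "name": author_name,
                "url": author_url,
                "description": author_bio,
            },
            {
                "@type": "Article",
                "headline": headline,
                "datePublished": published.isoformat(),
                # Reference the Person node above instead of repeating it.
                "author": {"@id": author_id},
            },
        ],
    }


# Placeholder values for illustration only.
markup = build_article_jsonld(
    headline="What is generative engine optimization?",
    author_name="Jane Example",
    author_url="https://example.com/authors/jane",
    author_bio="Jane Example writes about search and content strategy.",
    published=date(2026, 7, 1),
)
print(json.dumps(markup, indent=2))
```

The same name, bio, and date should also appear as visible text at the top of the page, since signal #11 is about what a reader (or engine) sees, not only the markup.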
16-19: readability signals
16) Sentences under 25 words. Perplexity's model has a hard cutoff around 30 words per sentence for the extractive summarisation step.
17) Concrete nouns and verbs: 'we reduced query latency by 340ms' beats 'we optimised performance significantly'.
18) One idea per paragraph. Paragraphs over ~120 words get truncated in the citation extraction step.
19) Semantic HTML (`<article>`, `<section>`, `<time datetime>`). Not required, but it correlates with citation.
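The two numeric thresholds above (#16 and #18) can be turned into a quick draft linter. This is a minimal sketch: the sentence splitter is deliberately naive (it breaks after `.`, `!` or `?` followed by whitespace, so abbreviations will trip it), and the function names are made up for this example.

```python
import re

MAX_SENTENCE_WORDS = 25    # signal 16: sentences under 25 words
MAX_PARAGRAPH_WORDS = 120  # signal 18: paragraphs over ~120 words get flagged


def split_sentences(paragraph):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def readability_report(text):
    """Return sentences and paragraph indexes that exceed the thresholds."""
    long_sentences = []
    long_paragraphs = []
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        if len(paragraph.split()) > MAX_PARAGRAPH_WORDS:
            long_paragraphs.append(index)
        for sentence in split_sentences(paragraph):
            if len(sentence.split()) >= MAX_SENTENCE_WORDS:
                long_sentences.append(sentence)
    return {"long_sentences": long_sentences, "long_paragraphs": long_paragraphs}
```

Run it over the plain-text draft before publishing and rewrite anything it returns.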
20-23: the ones nobody talks about
20) A "Related questions" section at the bottom of the page with 3-5 questions in H3 format, each with a 2-3 sentence answer. This maps directly onto how AI engines assemble follow-up prompts and often triggers a second citation cycle.
21) A page-level `<meta name="robots" content="max-snippet:-1">` tag. Without this, some engines respect the default 155-char snippet limit and won't extract enough to cite you.
22) An OpenGraph description that matches the meta description word-for-word. Mismatches confuse the classifier.
23) At least one embed or interactive element (a live checker, a code sandbox, a calculator). Static text-only URLs get cited less than URLs with one interactive element even when the text is identical. The engines seem to weight 'this page has something to DO' as a signal.
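Signals #21 and #22 are both plain meta-tag checks, so they can be audited in a few lines. The sketch below parses a page's meta tags with the standard library. The class and function names are illustrative, and it assumes the OpenGraph description is declared as `property="og:description"`.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect meta tag content keyed by the name or property attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            self.meta[key.lower()] = content


def check_meta(html):
    collector = MetaCollector()
    collector.feed(html)
    # The robots value is a comma-separated list of directives.
    robots = [d.strip().lower() for d in collector.meta.get("robots", "").split(",")]
    description = collector.meta.get("description")
    og_description = collector.meta.get("og:description")
    return {
        # Signal 21: an explicit "max-snippet:-1" directive.
        "max_snippet_unlimited": "max-snippet:-1" in robots,
        # Signal 22: the two descriptions match word-for-word.
        "og_matches_description": description is not None
        and description == og_description,
    }
```

Feed it the raw HTML of each page; a `False` in either field points at the tag to fix.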
The full test set + methodology
We tested 40 URLs across 8 query categories: SaaS how-tos, product comparisons, definition/what-is queries, checklists, best-of lists, code tutorials, framework migration guides, and pricing analyses. We ran each query through Perplexity Sonar, Claude 3.5 with web tools, ChatGPT o1 with browsing, and Google AI Overviews. For each URL/engine/query combination we recorded whether the URL was cited, at what position, and with what excerpt. Then we correlated citation rate against 45 candidate on-page signals. The 23 above are the ones that reached statistical significance in at least 3 of the 4 engines.
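To make the "correlate citation rate against a signal" step concrete, here is one way such a comparison could be computed. It is a sketch, not the analysis used for the numbers in this article. It assumes each observation is a pair of (set of signals present on the URL, whether it was cited), and it uses the phi coefficient for a 2×2 table as the correlation measure. It does not include a significance test.

```python
from math import sqrt


def citation_rates(observations, signal):
    """Return (rate with signal, rate without signal); None if a group is empty."""
    with_signal = [cited for signals, cited in observations if signal in signals]
    without = [cited for signals, cited in observations if signal not in signals]

    def rate(group):
        return sum(group) / len(group) if group else None

    return rate(with_signal), rate(without)


def phi_coefficient(observations, signal):
    """Phi coefficient between 'has signal' and 'was cited' (range -1 to 1)."""
    a = b = c = d = 0
    for signals, cited in observations:
        has_signal = signal in signals
        if has_signal and cited:
            a += 1          # signal present, cited
        elif has_signal:
            b += 1          # signal present, not cited
        elif cited:
            c += 1          # signal absent, cited
        else:
            d += 1          # signal absent, not cited
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator if denominator else 0.0


# Made-up observations for illustration only, not data from the test set.
observations = [
    ({"faq_schema", "tldr"}, True),
    ({"faq_schema"}, True),
    ({"tldr"}, False),
    (set(), False),
]
print(citation_rates(observations, "faq_schema"))
print(phi_coefficient(observations, "faq_schema"))
```

With real logs, you would run this once per engine and per signal, then apply a proper significance test before trusting any single coefficient.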
What to do this week
The three signals with the biggest lift for the least effort are #2 (FAQPage schema), #3 (TL;DR block), and #5 (llms.txt). We ship all three by default in every article CiteClip drafts. If you're writing manually, add those three to your top 10 highest-traffic URLs first — you'll see AI citation frequency roughly double within a 30-day window. Track it: query your top keywords in Perplexity monthly and log which URLs get cited. That's your baseline. If your URL isn't showing up, it's not that AI search 'doesn't work for your niche' — it's that you're missing signals the engines are literally scanning for.
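The monthly tracking habit needs nothing more than a CSV log. The sketch below assumes a hand-maintained log with hypothetical columns `month`, `engine`, `query`, and `cited_url` (left empty when none of your URLs was cited) and computes the share of checks that produced a citation each month. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Hypothetical log: one row per monthly check of a keyword in one engine.
SAMPLE_LOG = """month,engine,query,cited_url
2026-07,perplexity,what is geo,https://example.com/geo
2026-07,perplexity,geo vs seo,
2026-08,perplexity,what is geo,https://example.com/geo
2026-08,perplexity,geo vs seo,https://example.com/geo-vs-seo
"""


def citation_rate_by_month(log_file):
    """Return {month: share of checks where one of your URLs was cited}."""
    checks = defaultdict(int)
    hits = defaultdict(int)
    for row in csv.DictReader(log_file):
        checks[row["month"]] += 1
        if row["cited_url"].strip():
            hits[row["month"]] += 1
    return {month: hits[month] / checks[month] for month in sorted(checks)}


rates = citation_rate_by_month(io.StringIO(SAMPLE_LOG))
for month, rate in rates.items():
    print(f"{month}: {rate:.0%} of checks cited")
```

The first month's figure is the baseline; later months show whether the changes moved it.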