Advanced Guide

What Did Ahrefs Get Wrong in Its 2026 Schema Study That Their Own Data Already Disproves?

By Digital Strategy Force

The 2026 Ahrefs schema markup study concluded JSON-LD doesn't lift AI citations after testing 1,885 pages over 30 days. The same study's initial 6 million URL analysis showed cited pages were three times more likely to carry schema. Both findings are correct.


Schema Markup's Real Mechanism Is Upstream, Not at Retrieval

Schema markup's primary mechanism is upstream. Knowledge graph entity registration, vector embedding precision, and rich-result eligibility all run during indexing, not during real-time page extraction. The 2026 Ahrefs schema study measured downstream citation counts on already-saturated pages over a 30-day window, then concluded schema does not work. That conclusion is incompatible with the study's own 6-million-URL correlation finding that cited pages carry JSON-LD at three times the rate of uncited pages.

The mechanism Ahrefs tested is not the mechanism schema actually uses. When a page publishes JSON-LD structured data, the markup is consumed by Google's crawler, validated against the Schema.org type system, and used to register or update the page's entity inside the Google Knowledge Graph. The entity record then propagates into embedding refresh cycles, AI training corpora, and rich-result eligibility queues. None of that happens during the real-time, user-facing retrieval that the searchVIU finding Ahrefs cites actually measured.

Six methodology errors explain why the Ahrefs study reached the opposite conclusion of its own data. Selection bias on already-saturated pages predetermined the null result. A 30-day measurement window cannot detect a 60-to-180-day propagation cycle. Pooling five distinct schema mechanisms into one effect estimate produces a meaningless average. The -4.6% AI Overviews finding reflects rich-result redirection, not schema failure. Measuring schema presence instead of schema depth tests the wrong variable. Per-URL aggregation cannot capture entity-level lift. Each error is detailed below.

The Selection Floor Error: Ahrefs Tested Only Pages Already Saturated With Citations

Every page in the 1,885 treated set and the 4,000 controls already carried 100 or more AI Overview citations before the study began. The inclusion criterion published in the study methodology filtered the entire 6-million-URL pool down to pages that were already in Google's consideration set, already crawled, already surfaced, already known to the AI systems being measured. The sample is the population least likely to show a marginal schema effect.

This is studying ceiling effects, not entry into the citation set. The question schema markup actually answers, the one that matters to every business not already cited, is whether implementing JSON-LD pulls a previously invisible page into the AI consideration pool. That question was excluded by construction. The DiD test ran on the population where schema's compounding work was already complete, then concluded the work was absent.

Academic research on difference-in-differences methodology confirms this exact failure mode. The 2024 arXiv paper on Difference-in-Differences with Sample Selection proves formally that DiD estimators are biased when the treated and control groups are selected on the outcome variable. Selecting on "100 or more AI Overview citations" is selecting on the outcome. The bias direction depends on whether the selection censors high-responders or low-responders, but the bias is non-zero by construction.

A correctly designed test would include pages with zero baseline citations, pages with low baseline citations, and pages with high baseline citations as separate cohorts, then measure the marginal effect of schema implementation within each. The Ahrefs methodology aggregated only the top cohort, then reported the aggregated result as if it described the population.
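A stratified design of the kind described above can be sketched in a few lines. All page counts and citation numbers below are invented purely to show the cohort arithmetic; nothing here is Ahrefs' data or code.

```python
# Sketch of a stratified DiD, with pages grouped by baseline citations instead
# of filtered down to the 100+ cohort. Numbers are hypothetical.
from statistics import mean

def did_estimate(treated, control):
    """Difference-in-differences: (treated post - pre) - (control post - pre)."""
    t_delta = mean(p["post"] - p["pre"] for p in treated)
    c_delta = mean(p["post"] - p["pre"] for p in control)
    return t_delta - c_delta

def stratify(pages):
    """Assign each page to a baseline-citation cohort rather than dropping it."""
    def cohort(p):
        if p["pre"] == 0:
            return "never-cited"
        return "low (1-99)" if p["pre"] < 100 else "high (100+)"
    cohorts = {}
    for p in pages:
        cohorts.setdefault(cohort(p), []).append(p)
    return cohorts

# Hypothetical pre/post AI citation counts per page
treated = [{"pre": 0, "post": 3}, {"pre": 40, "post": 55}, {"pre": 150, "post": 151}]
control = [{"pre": 0, "post": 0}, {"pre": 45, "post": 46}, {"pre": 160, "post": 162}]

t_cohorts, c_cohorts = stratify(treated), stratify(control)
for name in t_cohorts:
    print(name, did_estimate(t_cohorts[name], c_cohorts[name]))
```

On this toy data the high-baseline cohort shows roughly no effect while the entry cohorts show positive lift, which is exactly the pattern a 100-citation inclusion floor cannot observe.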

The Ahrefs Sample Selection Cascade
Initial URL pool analyzed
6,000,000
Pages with 100+ AI Overview citations (study population)
~30,000
Matched controls (3 per treated, already cited)
4,000
Treated pages (added JSON-LD Aug 2025 to Mar 2026)
1,885
Treated pages as fraction of initial pool
0.031%

Ahrefs' Own 6 Million URL Sample Refuted Their Own Conclusion

Before the DiD analysis ran, Ahrefs' own initial correlation analysis on the 6,000,000-URL pool found a clear pattern: AI-cited pages were almost three times more likely to carry JSON-LD than non-cited pages. The authors acknowledged this, then dismissed it as "correlation, not causation" before narrowing to the 1,885-page DiD subsample.

The dismissal is technically correct but rhetorically misleading. If schema markup were genuinely irrelevant to AI citations, no such correlation should appear at population scale. The two findings are not contradictory at the methodological level: schema correlates with citation at population scale (the 6M sample), but a within-cohort DiD on already-saturated pages cannot detect that correlation because the cohort is selected on the outcome. Both findings can be true simultaneously. The headline conclusion the study reported is the wrong one of the two to lead with.

The buried correlation is the more important finding. A three-times lift in JSON-LD presence among cited pages versus uncited pages, measured across six million URLs, is the largest population-level signal in the entire study. It deserved the headline. The DiD result deserved the footnote.

The Killshot — Ahrefs' Own 6 Million URL Correlation Data
AI-cited pages carrying JSON-LD
~67% (3x)
Non-cited pages carrying JSON-LD
~22% baseline
Ahrefs' direct quote from paragraph 14 of the study: "AI cited pages were almost three times more likely to have JSON-LD than non-cited pages." This is the buried finding that contradicts the headline conclusion. The 3x population-scale correlation is the largest signal in the entire dataset.
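The lift and odds ratio implied by those two shares are worth computing explicitly. The ~67% and ~22% figures come from the stat block above; the arithmetic is the only thing this sketch adds.

```python
# Back-of-envelope check on the population-scale correlation Ahrefs reported.
cited_with_jsonld = 0.67      # share of AI-cited pages carrying JSON-LD
uncited_with_jsonld = 0.22    # share of non-cited pages carrying JSON-LD

# Simple lift: how many times likelier a cited page is to carry JSON-LD
lift = cited_with_jsonld / uncited_with_jsonld

# Odds ratio: a stronger summary of the same 2x2 relationship
odds_ratio = (cited_with_jsonld / (1 - cited_with_jsonld)) / (
    uncited_with_jsonld / (1 - uncited_with_jsonld))

print(f"lift: {lift:.2f}x")            # matches the study's own "almost three times"
print(f"odds ratio: {odds_ratio:.2f}")
```

An odds ratio above 7 at a six-million-URL scale is not the kind of signal that a null marginal effect on saturated pages explains away.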

30 Days Is Too Short for Knowledge Graph Propagation

A 30-day pre-and-post measurement window cannot detect the schema-to-citation mechanism because the mechanism itself takes 60 to 180 days to complete. The full propagation chain runs through four sequential stages: crawler re-fetch, knowledge graph entity disambiguation, RAG embedding refresh, then AI training corpus integration. Each stage has its own latency, and only the first fits inside 30 days.

Google's Knowledge Graph documentation describes entity disambiguation as an ongoing process that resolves entity identity across multiple sources over time. The 2025 RAKG paper on document-level knowledge graph construction documents disambiguation latency in the 30-to-90-day band for production systems. Embedding refresh cycles run longer, with the GraphRAG survey describing 60-to-120-day refresh cadences for major RAG-augmented retrieval pipelines.

A schema implementation on day one cannot show measurable citation lift on day 30. The crawler may have re-fetched the page, but the knowledge graph entity update is mid-propagation, the embedding refresh is queued, and the AI training corpus that powers ChatGPT or AI Overviews has not yet incorporated the change. The Ahrefs window measures crawler latency, not citation mechanism.

Knowledge Graph Propagation Timeline — Why 30 Days Is Too Short
Stage 1: Crawler re-fetch and schema validation
7-30 days
Stage 2: Knowledge graph entity disambiguation
30-90 days
Stage 3: RAG embedding refresh and vector update
60-120 days
Stage 4: AI training corpus integration
120-180 days
Ahrefs measurement window
30 days
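How little of that chain a 30-day window can see is a one-liner per stage. The stage ranges are the ones cited above; the overlap arithmetic is all this sketch contributes.

```python
# How much of each propagation stage a 30-day window can observe.
stages = {
    "crawler re-fetch": (7, 30),
    "KG entity disambiguation": (30, 90),
    "RAG embedding refresh": (60, 120),
    "training corpus integration": (120, 180),
}
window = 30  # Ahrefs measurement window, in days

for name, (start, end) in stages.items():
    # Days of this stage that fall inside the window, as a share of the stage
    observable = max(0, min(window, end) - start)
    share = observable / (end - start)
    print(f"{name}: {share:.0%} of stage falls inside the window")
```

Only the crawler re-fetch stage is fully observable; every downstream stage sits entirely outside the window, which is why the study measures crawler latency rather than the citation mechanism.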

Pooling Schema Types Produces a Meaningless Average

The Ahrefs study coded schema as a single binary variable then aggregated across five distinct types: Article, FAQPage, Product, HowTo, and Organization. Each of those types feeds a different mechanism in Google's pipeline. Pooling them is methodologically equivalent to measuring "did taking a pill help" by mixing aspirin, antibiotics, and placebos into one dataset.

FAQPage schema feeds Google's rich-result extraction layer directly. Article schema declares entity authority for Knowledge Graph registration. Product schema feeds the product entity pool that powers AI shopping answers. HowTo schema feeds procedural rich results. Organization schema feeds entity disambiguation across the brand entity graph. Five mechanisms, five expected effect sizes, five separate populations. The headline coefficient that emerges from pooling them is a weighted average of five different things, none of which the average represents accurately.

A correctly designed test would estimate effects per schema type, allow each effect to differ in sign and magnitude, then report the disaggregated coefficients. The Ahrefs paper acknowledges in its limitations section that types were pooled. The acknowledgment does not rescue the headline conclusion: a pooled null effect is statistically indistinguishable from five offsetting effects, and the data the study collected cannot tell the two apart.

Five Schema Types, Five Distinct AI Citation Mechanisms
| Schema Type | Primary AI Mechanism | Citation Surface | Why Pooling Distorts |
|---|---|---|---|
| Article | Entity declaration for KG registration | AI Overviews citation list | Long-cycle KG effect (90-180 days) |
| FAQPage | Direct rich-result extraction | SERP FAQ accordion (now deprecating) | Redirects citation away from AIO surface |
| Product | Product entity registration | AI shopping answers, Product KG | Commercial intent only, not informational |
| HowTo | Procedural step extraction | Step-by-step rich result, AIO procedural | Surface-specific, narrow query population |
| Organization | Brand entity disambiguation | Knowledge Panel, all branded queries | Compounds across sibling pages, not just treated URL |
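The pooled-null-versus-offsetting-effects point is easy to see numerically. The per-type effect sizes below are invented purely for the arithmetic; they are not estimates from the study or anywhere else.

```python
# Illustration: a pooled coefficient near zero is indistinguishable from
# five non-zero per-type effects that offset one another. Values are invented.
effects = {
    "Article": +4.0,       # long-cycle KG lift
    "FAQPage": -5.0,       # rich-result redirection away from the AIO surface
    "Product": +3.0,
    "HowTo": +1.0,
    "Organization": -3.0,  # entity-level lift lands on sibling pages, not this URL
}

pooled = sum(effects.values()) / len(effects)
print(f"pooled effect: {pooled:+.1f}")  # near zero despite five non-zero mechanisms
```

A binary "schema yes/no" regressor can only ever recover the pooled number, which is why disaggregated per-type coefficients are the minimum requirement for a meaningful test.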

The -4.6% AI Overviews Decline Reflects Rich-Result Redirection, Not Schema Failure

The most cited finding from the Ahrefs study is a 4.6 percent decline in AI Overview citations on the treated sample. The decline reaches statistical significance at roughly one-in-2500 odds of occurring by chance, which is meaningful. What is missing from the headline is Google's documented behavior on the relationship between traditional rich results and AI Overview citations.

When a query is well-served by a traditional rich result, Google suppresses the AI Overview to avoid duplication of the same answer on the same SERP. Google's May 2024 AI Overviews launch announcement describes the AIO surface as one of several answer surfaces, not the only one. The subsequent AI Mode rollout post reiterated the layered-surface model.

Adding FAQPage, HowTo, or Product schema to a previously schema-light page is the most common pattern for what the Ahrefs study coded as "added JSON-LD." Each of those schema types is designed to earn a rich result. When they succeed, the page captures the rich-result surface and the AI Overview citation for that query is suppressed. The 4.6 percent decline is not citations lost, it is citations moved up the SERP to a higher-trust surface. A study that reports the AI Overview number alone without measuring rich-result lift is reporting half the citation movement.
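The half-reported-movement point can be sketched with hypothetical per-surface counts. The numbers below are invented; only the arithmetic of summing across surfaces is the point.

```python
# Citation movement across surfaces, not AI Overviews alone. Hypothetical counts.
surfaces_pre  = {"ai_overview": 100, "rich_result": 10, "knowledge_panel": 5}
surfaces_post = {"ai_overview": 95,  "rich_result": 22, "knowledge_panel": 6}

aio_delta = surfaces_post["ai_overview"] - surfaces_pre["ai_overview"]
total_delta = sum(surfaces_post.values()) - sum(surfaces_pre.values())

print(f"AIO delta: {aio_delta}")      # negative: the number a single-surface study reports
print(f"total delta: {total_delta}")  # positive: the movement it does not
```

A per-surface delta can be negative while total answer-surface presence rises, which is precisely the redirection pattern the -4.6% figure is consistent with.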

The Three Numbers Ahrefs Reported, and the One They Buried
  • AI Overviews: -4.6%, statistically significant at roughly 1-in-2500 odds, but it reflects rich-result redirection, not schema failure
  • AI Mode: statistically insignificant; Ahrefs reported the direction, but the sample cannot distinguish it from zero
  • ChatGPT: statistically insignificant; direction positive but indistinguishable from noise at this sample size
  • The buried number: AI-cited pages carry JSON-LD at 3x the rate of non-cited pages across 6,000,000 URLs, the headline finding the study did not report

Measuring Schema Presence Instead of Schema Depth Tests the Wrong Variable

The Ahrefs study coded schema implementation as binary: did the page transition from no JSON-LD to some JSON-LD between August 2025 and March 2026? That binary coding throws away the variable that actually correlates with AI citation lift, which is schema depth. A 20-line Article block declaring only `headline`, `datePublished`, and `author` is treated as identical to a 200-line entity graph with cross-page `@id` references, `sameAs` Wikipedia and Wikidata anchors, populated `mentions[]`, `citation[]`, `hasPart`, and `about[]` arrays.

The two implementations feed completely different mechanisms. The W3C JSON-LD 1.1 specification defines `@id` as the mechanism for declaring a stable global identifier that allows multiple JSON-LD documents to reference the same entity. A page with a binary Article schema declares nothing the Knowledge Graph can connect to other entities. A page with cross-referenced `@id` and `sameAs` anchors connects directly to the canonical entity in Wikidata, propagates into the Knowledge Graph entity resolution layer, and earns the entity-alignment signal that AI engines use during embedding construction.

Measuring schema presence is like measuring "does owning a hammer help home renovation" without measuring hammer quality, user skill, or the presence of other tools. The Ahrefs design captures whether the hammer was acquired. It does not capture whether the hammer was used correctly. The two variables are correlated but not identical, and the latter is the one that explains outcomes.
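To make the depth distinction concrete, here is a sketch of what a Tier 3 style Article block looks like, as opposed to the three-field Tier 1 version described above. Every URL, name, and date is a placeholder; in a real implementation the `sameAs` anchors would point at the page's actual subject in Wikipedia or Wikidata.

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "@id": "https://example.com/guide#article",
  "headline": "Example headline",
  "datePublished": "2026-01-15",
  "author": { "@id": "https://example.com/#org" },
  "about": [
    {
      "@type": "Thing",
      "name": "Schema.org markup",
      "sameAs": "https://en.wikipedia.org/wiki/Schema.org"
    }
  ],
  "mentions": [
    { "@type": "Organization", "@id": "https://example.com/#lab", "name": "Example Lab" }
  ],
  "hasPart": [
    { "@id": "https://example.com/guide#faq" }
  ]
}
```

The `@id` references are what let a crawler stitch this block to the Organization and FAQPage blocks elsewhere on the domain; a study coding both this and a bare `headline`/`author` block as "has JSON-LD" collapses the variable that matters.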

Schema Depth Distribution — The Variable Ahrefs Did Not Measure
Tier 1: Binary presence (basic Article or FAQPage block)
~95%
Tier 2: Cross-referenced @id with basic sameAs
~4%
Tier 3: Full entity graph with mentions, citation, hasPart, about
~1%
Depth tiers are DSF qualitative inference based on observed corpus distribution across W3C JSON-LD and Schema.org type usage. The Ahrefs study coded schema as binary and did not measure depth.
Framework: Digital Strategy Force (based on W3C JSON-LD 1.1 Recommendation + Google Knowledge Graph documentation)

The Schema Test Validity Framework — What a Real Test Should Measure

The Schema Test Validity Framework is a six-component audit measuring Selection Floor, Outcome Layer, Schema Depth, Time Horizon, Confounder Isolation, and Entity-Level Aggregation. These are the six dimensions that determine whether a schema impact study tests the actual citation mechanism. Any test that fails on three or more components produces a coefficient that does not describe the population it claims to represent.

The framework matters because schema is not a single mechanism with a single effect. It compounds across entity registration, embedding refresh, rich-result eligibility, and brand-level Knowledge Graph signal. A per-URL test on saturated pages with binary coding over 30 days captures none of these. The Ahrefs design failed all six components by construction. The reason matters more than the failure: a study that selects on outcome, measures the wrong layer, codes the wrong depth, runs for the wrong duration, cannot isolate confounders, and aggregates at the wrong level cannot detect an effect of any size on any mechanism.

A schema impact study that excludes never-cited pages, pools five mechanisms, measures for 30 days, codes only presence, and aggregates per URL has tested the conditions under which schema cannot possibly show an effect, and then concluded the effect is absent.

— Digital Strategy Force, Search Intelligence Division

A correctly designed replication would solve all six failures simultaneously. Sample stratification across baseline citation cohorts, including never-cited pages, restores the Selection Floor. Multi-surface outcome coding across AI Overview, rich result, Knowledge Panel, and Knowledge Graph entity presence corrects the Outcome Layer.

Schema-depth scoring on a Tier 1/2/3 ladder captures the actual variable that drives effects. Measurement windows of 180 days minimum allow the propagation chain to complete. Instrumental variables or randomized assignment isolate confounders. Per-entity aggregation captures brand-level lift. The Ahrefs paper acknowledges most of these limitations in its methodology section. The acknowledgment is not absolution.
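A Tier 1/2/3 depth scorer of the kind such a replication would need can be sketched in a few lines. The tier thresholds are an assumption modeled on the ladder described earlier, not a published standard, and `schema_depth_tier` is a hypothetical helper name.

```python
# Hypothetical Tier 1/2/3 depth scorer for a parsed JSON-LD node (a dict).
def schema_depth_tier(node: dict) -> int:
    has_id = "@id" in node
    has_same_as = bool(node.get("sameAs"))
    graph_keys = ("mentions", "citation", "hasPart", "about")
    has_graph = any(node.get(k) for k in graph_keys)

    if has_id and has_same_as and has_graph:
        return 3  # full entity graph
    if has_id or has_same_as:
        return 2  # cross-referenced identifiers, no graph arrays
    return 1      # binary presence only

basic = {"@type": "Article", "headline": "x"}
rich = {
    "@type": "Article",
    "@id": "https://example.com/#a",
    "sameAs": ["https://en.wikipedia.org/wiki/Example"],
    "mentions": [{"@id": "https://example.com/#b"}],
}
print(schema_depth_tier(basic), schema_depth_tier(rich))
```

Scoring treated pages on a ladder like this, instead of a 0/1 flag, is what turns "did they add JSON-LD" into the variable the correlation data suggests actually drives citations.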

The strategic implication for any business reading the Ahrefs headline is that the study does not say what its conclusion suggests. It does not say schema markup fails to lift AI citations. It says that, in the narrow conditions of the test, the marginal effect on already-saturated pages was not detectable inside a 30-day window. Those conditions are the ones least likely to show an effect by construction. The same data, recoded against the validity framework, would tell a different story. Continued investment in schema depth, cross-page structured data orchestration, and entity-graph completion remains the dominant strategy for businesses pursuing Answer Engine Optimization in 2026.

The Schema Test Validity Framework — Ahrefs Score vs Required Standard
| Component | What It Audits | Ahrefs Score | Required Standard |
|---|---|---|---|
| Selection Floor | Does the sample include pages not already cited? | Fail | Stratify across baseline-citation cohorts, including never-cited pages |
| Outcome Layer | Citation count, quality, rich-result, or entity-level? | Fail | Multi-surface coding: AI Overview, rich result, Knowledge Panel, KG entity |
| Schema Depth | Binary presence vs cross-referenced @id vs full graph | Fail | Tier 1/2/3 depth scoring |
| Time Horizon | Window matches 60-180 day KG propagation cycle? | Fail | 180-day minimum measurement window |
| Confounder Isolation | Can the study separate schema from concurrent changes? | Fail | Instrumental variables or randomized assignment |
| Entity-Level Aggregation | Per-URL vs per-domain vs per-brand entity? | Fail | Per-entity aggregation capturing brand-level lift |
Framework: Digital Strategy Force · Ahrefs methodology source: Ahrefs schema AI citations study (May 11, 2026)

Confounders the Ahrefs Study Could Not Isolate

The methodology section of the Ahrefs study acknowledges that pages adding JSON-LD often change other things simultaneously. Internal link structures shift. Content gets refreshed. Backlinks accumulate. Page speed improves. Brand-signal investments compound. The study cannot separate schema from these co-occurring changes, which is the textbook failure mode that causal inference methodology papers identify as the central threat to difference-in-differences validity.

Acknowledging the confound does not eliminate it. The headline coefficient still aggregates the effect of schema with the effect of every other concurrent change. The summary table below lists the confounders Ahrefs identified along with the measurement status of each in their study.

Confounders Ahrefs Acknowledged But Could Not Isolate
| Confounder | Ahrefs Acknowledgment | Measurement Status in the Study |
|---|---|---|
| Internal link changes | Yes (limitations section) | Not measured, not controlled |
| Content refresh and quality changes | Yes (limitations section) | Not measured, not controlled |
| Backlink acquisition | Implied, not direct | Not measured, not controlled |
| Technical performance fixes | Yes (limitations section) | Not measured, not controlled |
| Brand signal investments | Not addressed | Not measured, not controlled |

FAQ — Ahrefs 2026 Schema Study

Practical questions about what the May 11, 2026 Ahrefs schema study actually measured, what it cannot tell businesses about their own schema strategy, and how to read the headline against the underlying data. Each answer reflects Digital Strategy Force's reading of the published methodology plus the broader research on knowledge graph propagation and causal inference design.

What did the 2026 Ahrefs schema markup study actually measure?

The study tracked 1,885 pages that added JSON-LD between August 2025 and March 2026 against 4,000 matched controls, then measured AI Overview, AI Mode, and ChatGPT citation changes over a 30-day window. Every page in the sample already had 100 or more AI Overview citations before treatment, which means the study measured marginal effects on saturated pages, not entry into the citation set.

Why does Ahrefs' own 6 million URL correlation contradict their conclusion?

The initial 6 million URL analysis found cited pages were three times more likely to carry JSON-LD than non-cited pages. If schema were genuinely irrelevant, no such correlation should exist at population scale. The DiD on already-saturated pages measures only the marginal effect where schema's compounding work is already complete. Both findings can be true at the same time, with the population-scale correlation being the more important of the two.

Is the -4.6% AI Overviews decline evidence that schema markup hurts citations?

No. When a page earns a traditional rich result through FAQPage, HowTo, or Product schema, Google's documented behavior suppresses AI Overview citation for that query to avoid duplication. The -4.6% likely reflects citations moving to richer SERP surfaces, not visibility loss. Higher-trust surface attribution is the actual outcome. Reporting the AI Overviews number alone without measuring rich-result lift captures half the citation movement.

How long does schema markup actually take to lift AI citations?

Knowledge Graph entity disambiguation runs 30 to 90 days. RAG embedding refresh cycles run 60 to 120 days. AI training corpus updates run 120 to 180 days. The full schema-to-citation propagation chain typically completes in 90 to 180 days. A 30-day pre-and-post measurement window captures crawler re-fetch latency, not the actual citation mechanism. Measurement windows shorter than 90 days will miss the propagation effect.

Does schema markup help my business get cited by ChatGPT and Gemini?

Yes, when the implementation has depth. Cross-page @id references, sameAs Wikipedia and Wikidata anchors, populated mentions and citation arrays, and hasPart declarations all feed the entity graph these AI systems consult during indexing. Binary schema presence, which is what the Ahrefs study measured, captures none of those mechanisms. Schema depth correlates with citation lift; schema presence alone often does not.

Should I stop investing in schema markup based on the Ahrefs study?

No. The study measured the wrong variable on the wrong sample over the wrong time horizon. The 6 million URL correlation data Ahrefs published in the same study shows the schema-citation relationship exists at population scale. Schema value compounds across Knowledge Graph entity registration, rich-result eligibility, and entity-level brand authority. A 30-day per-URL difference-in-differences design cannot capture any of those mechanisms.

What did the Ahrefs study actually prove?

It proved that adding binary schema presence to pages already saturated with 100 or more AI Overview citations does not move per-URL citation counts in the 30 days following implementation. That is a narrow, likely correct finding. It does not prove schema markup is irrelevant to AI citations. The conclusion Ahrefs drew is unsupported by their own data, particularly the 6 million URL initial correlation that they buried in paragraph 14 of the paper.

Next Steps — Ahrefs 2026 Schema Study

The Ahrefs study is a useful artifact, not a verdict. It tests a narrow question on a narrow sample, reports a narrow finding, and the broader correlation data the same paper published complicates the headline. Five actions to deploy schema correctly for AI citation lift in 2026:

  • Audit your schema depth, not your schema presence. Replace the binary "do we have JSON-LD?" check with the six-dimension Schema Test Validity Framework: Selection Floor, Outcome Layer, Schema Depth, Time Horizon, Confounder Isolation, Entity-Level Aggregation
  • Implement cross-page @id references and sameAs Wikidata anchors first. These are the two highest-leverage schema upgrades: they connect your site to Google's Knowledge Graph and to Wikidata, which feeds every major AI system's entity disambiguation layer
  • Measure schema impact on a 180-day window at minimum. Knowledge graph propagation, RAG embedding refresh, and AI training corpus update cycles complete on 90-to-180-day timelines; shorter windows will miss the actual mechanism
  • Track citation quality and entity visibility, not just citation count. A page cited as an authoritative source counts for more than a page cited as background context, and a brand entity surfacing across many queries matters more than any single URL's citation count
  • Build per-entity reporting, not per-URL. Schema benefits sibling pages and the parent brand entity in ways per-URL methodology fundamentally cannot measure. Per-domain dashboards capture the compound effect
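The per-entity versus per-URL distinction in the last point can be sketched with hypothetical URLs and counts; the aggregation step is the whole idea.

```python
# Per-entity aggregation of AI citations that per-URL reporting misses.
# URLs, brand names, and counts are all hypothetical.
citations = [
    ("https://brand.example/guide-a", "BrandCo", 3),
    ("https://brand.example/guide-b", "BrandCo", 0),  # schema'd page, no direct lift
    ("https://brand.example/about",   "BrandCo", 7),  # sibling page lifted instead
    ("https://other.example/post",    "OtherCo", 2),
]

per_entity: dict[str, int] = {}
for url, entity, count in citations:
    per_entity[entity] = per_entity.get(entity, 0) + count

print(per_entity)
```

A per-URL view of guide-b reads as a schema failure; the per-entity view shows the lift landing on a sibling page under the same brand entity, which is the compound effect a DiD on single URLs cannot attribute.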

Want a Schema Test Validity audit on your own implementation, scored against all six dimensions? Explore Digital Strategy Force's Answer Engine Optimization (AEO) services for a complete schema depth diagnostic, knowledge graph entity registration, and entity-level citation measurement built on the framework the Ahrefs study should have used.
