A content audit evaluates every indexed URL against search performance data — not just word count or publish date. The goal is to surface pages that are dragging down domain authority and pages that, with targeted improvements, can capture significantly more organic traffic and AI-generated referrals.
The audit starts with a full site crawl using Screaming Frog to inventory all indexed URLs. Each URL is then matched against Google Search Console data (impressions, clicks, average position) and Google Analytics 4 (sessions, engagement rate, conversions) to establish a performance baseline. Ahrefs adds backlink count and referring domain data per URL.
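As a rough illustration of the matching step, the sketch below joins a crawl export to a Search Console export by URL. The column names (`url`, `word_count`, `impressions`, `clicks`) and the sample rows are assumptions made for the example; real exports from Screaming Frog and GSC use their own headers and would need to be mapped first.

```python
import csv
import io

# Hypothetical exports; real column names depend on each tool's export settings.
CRAWL_CSV = """url,status_code,word_count
https://example.com/a,200,1200
https://example.com/b,200,250
"""
GSC_CSV = """url,impressions,clicks,avg_position
https://example.com/a,5400,210,8.2
"""

def load_by_url(text):
    """Index CSV rows by their 'url' column."""
    return {row["url"]: row for row in csv.DictReader(io.StringIO(text))}

def build_baseline(crawl_text, gsc_text):
    """Left-join crawl rows with GSC rows; URLs missing from GSC get zeros."""
    gsc = load_by_url(gsc_text)
    baseline = []
    for url, page in load_by_url(crawl_text).items():
        metrics = gsc.get(url, {})
        baseline.append({
            "url": url,
            "word_count": int(page["word_count"]),
            "impressions": int(metrics.get("impressions", 0)),
            "clicks": int(metrics.get("clicks", 0)),
        })
    return baseline

baseline = build_baseline(CRAWL_CSV, GSC_CSV)
```

Keeping the crawl as the left side of the join means crawled pages with no search data still appear in the baseline, which is what makes zero-impression pages visible later in the audit.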
Every page receives an action recommendation — keep, update, consolidate, redirect, or remove — based on traffic potential, content quality, and cannibalization risk. For clients optimizing for AI search, each page also receives a GEO citation readiness score based on structural factors known to increase LLM citation probability.
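One way to picture the recommendation step is as a small rule set. The function below is a minimal sketch only: the function name, the rule order, and every threshold (300 words, 1% click-through rate) are illustrative assumptions, and in practice the recommendation also reflects manual review of content quality.

```python
def recommend_action(impressions, clicks, word_count, cannibalized, referring_domains):
    """Return one of: keep, update, consolidate, redirect, remove.

    All thresholds are illustrative assumptions, not fixed audit values.
    """
    if cannibalized:
        # Competing pages are merged before anything else is decided.
        return "consolidate"
    if impressions == 0:
        # No search visibility: redirect if backlinks exist, otherwise remove.
        return "redirect" if referring_domains > 0 else "remove"
    if word_count < 300 or clicks / impressions < 0.01:
        # Visible but short or under-clicked: a candidate for improvement.
        return "update"
    return "keep"
```

For example, `recommend_action(0, 0, 500, False, 3)` returns `"redirect"`, because the page has no impressions but does have referring domains worth preserving.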
A comprehensive content audit analyzes four distinct performance dimensions, each requiring a different remediation approach.
Content gap analysis compares your current content inventory against competitor topical coverage and search demand data from Ahrefs or SEMrush. It identifies keyword clusters where competitors rank but you have no page; these are direct traffic opportunities. It also flags internal gaps where pillar pages exist but supporting cluster articles are missing.
Cannibalization analysis cross-references Google Search Console query data against page URLs to identify cases where two or more pages compete for the same primary keyword. Cannibalization dilutes ranking signals. Fixing it, via consolidation or a 301 redirect, concentrates authority on the best-performing URL and typically produces measurable ranking gains within 4–8 weeks.
Thin content detection flags pages with insufficient depth relative to competing pages ranking for the same query. Thin content is assessed by word count thresholds, content uniqueness, search intent match, and GSC impressions relative to index status. Pages that are indexed but receive near-zero impressions are primary candidates for expansion or consolidation.
Freshness analysis assesses when each page was last substantively updated versus competitors' update frequency. For time-sensitive queries, Google weights recency, and AI systems that retrieve web content tend to favor recent sources. Pages on competitive topics that have not been updated in 6+ months are flagged for a freshness refresh, which means adding new data, statistics, examples, or expanded sections, not cosmetic date changes.
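The six-month freshness flag can be sketched as a simple date comparison. The 180-day cutoff is an assumed stand-in for "6+ months", and the `competitive` argument is a hypothetical label that the audit would assign per topic.

```python
from datetime import date

STALE_AFTER_DAYS = 180  # assumed approximation of "6+ months"

def is_stale(last_updated, today, competitive=True):
    """Flag competitive-topic pages whose last substantive update is too old."""
    age_in_days = (today - last_updated).days
    return competitive and age_in_days >= STALE_AFTER_DAYS
```

The `last_updated` value should be the date of the last substantive change, not the CMS timestamp, since a cosmetic date edit would otherwise reset the flag.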
AI language models are trained on web data with cutoff dates — but they also weight recent, high-quality sources for retrieval-augmented generation (RAG). Pages that consistently receive substantive updates are more likely to appear in AI Overviews, ChatGPT answers, and Perplexity citations.
The recommended content freshness cadence is a substantive update every 30–90 days for core pages.
Important: Changing a publish date without adding new information does not improve freshness scores — Google and AI crawlers assess content change depth, not timestamp alone. Cosmetic date updates can trigger trust penalties if quality raters or AI models compare the content against the claimed update date.
In addition to standard SEO metrics, the content audit evaluates each page against GEO (Generative Engine Optimization) citation criteria. Pages are scored on:
| Citation Readiness Factor | What Is Evaluated | AI Impact |
|---|---|---|
| Answer Capsule presence | Does the page open with a 40–60 word direct definition or answer? | High — cited in AI overviews for definitional queries |
| BLUF H2 structure | Does each H2 section open with a standalone answer sentence? | High — 44% of LLM citations from first 30% of page |
| FAQPage schema | Is structured FAQ markup implemented and validated? | Medium-high — FAQ content appears directly in AI answers |
| Verifiable stats with sources | Are statistics cited with source attribution in-text? | Medium — AI models prefer citable, sourced claims |
| Internal cluster links | Does the page link to and from its topical cluster? | Medium — topical authority concentration increases SoM |
| Content freshness | Was the page substantively updated in the last 90 days? | Medium — AI models weight recent, authoritative sources |
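
A per-page score could be derived from the table above roughly as follows. This is a sketch under stated assumptions: the numeric weights (3 for High, 2 for Medium-high, 1 for Medium) and the check names are invented for the example, and the answer capsule check only measures length, not whether the opening actually answers the query.

```python
# Assumed weights: High = 3, Medium-high = 2, Medium = 1.
WEIGHTS = {
    "answer_capsule": 3,
    "bluf_h2": 3,
    "faq_schema": 2,
    "sourced_stats": 1,
    "cluster_links": 1,
    "fresh_90_days": 1,
}

def has_answer_capsule(opening_paragraph):
    """True when the opening paragraph is 40 to 60 words long (length only)."""
    return 40 <= len(opening_paragraph.split()) <= 60

def readiness_score(checks):
    """Weighted share of passed checks, scaled to 0-100."""
    earned = sum(weight for name, weight in WEIGHTS.items() if checks.get(name))
    return round(100 * earned / sum(WEIGHTS.values()))
```

A page that passes only the answer capsule and FAQ schema checks earns 5 of 11 weight points, so `readiness_score({"answer_capsule": True, "faq_schema": True})` returns `45`.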
The audit deliverable is a structured Google Sheets workbook with one row per indexed URL, covering all dimensions needed to prioritize improvements by traffic impact.
A content audit is most valuable for websites that have published content over 12+ months and are experiencing ranking stagnation, organic traffic decline, or inconsistent performance across pages. Common scenarios:
At Jumpfactor, a B2B SaaS and professional services SEO agency, content audits were a standard component of every new client engagement. Auditing sites with 200–800+ pages required systematic approaches to cannibalization identification and action recommendation prioritization — work that directly informed content consolidation strategies that produced measurable ranking improvements.
At Ampry, content auditing identified thin content across product and landing page variants that had accumulated over multiple development cycles. Consolidation recommendations reduced indexed thin pages by over 40% while concentrating authority on the highest-converting URLs.
Content freshness cadences were established for Business Training Team's pillar-based content program, with structured quarterly reviews ensuring that high-value pages maintained competitive freshness scores against industry benchmarks.
A content audit is a systematic review of every page on a website to assess SEO performance, content quality, and search visibility. It identifies which pages rank, which have thin or duplicate content, where keyword cannibalization exists, and which pages need to be updated, consolidated, or removed. A modern content audit also evaluates AI citation readiness — whether pages are structured to appear in ChatGPT, Gemini, or Perplexity answers.
A full content audit is typically performed every 6–12 months. High-volume sites with 500+ pages may benefit from quarterly rolling audits by section. Individual page freshness should be maintained on a 30–90 day cycle for substantive updates, since freshness matters both to Google for time-sensitive queries and to AI systems that favor recent sources. Cosmetic updates (changing a date without adding new information) do not improve rankings and can trigger trust penalties.
The deliverable is a structured Google Sheets workbook with one row per indexed URL. Each URL has: GSC impressions and clicks (trailing 3 months), organic traffic from GA4, word count, last modified date, content type classification, cannibalization flags, content gap opportunities, action recommendation (keep, update, consolidate, redirect, or remove), and estimated traffic impact. For GEO clients, it also includes an AI citation readiness score per page.
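To make the row layout concrete, here is a minimal sketch of one workbook row as a data structure, written out as CSV that could be imported into Google Sheets. The field names are assumptions derived from the columns listed above, not the actual workbook headers.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class AuditRow:
    """One row per indexed URL; field names are illustrative."""
    url: str
    gsc_impressions_3mo: int
    gsc_clicks_3mo: int
    ga4_organic_sessions: int
    word_count: int
    last_modified: str
    content_type: str
    cannibalization_flag: bool
    content_gap_notes: str
    action: str  # keep, update, consolidate, redirect, or remove
    estimated_traffic_impact: str

def to_csv(rows):
    """Serialize audit rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AuditRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

A GEO engagement would add one more column for the citation readiness score.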
Keyword cannibalization occurs when two or more pages on the same site target the same primary keyword, causing Google to split ranking signals between them instead of consolidating authority on one page. A content audit identifies cannibalization by cross-referencing GSC search queries with page URLs — when the same query drives impressions to multiple pages, those pages are candidates for consolidation or redirect.
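The cross-referencing step can be sketched as a grouping operation over query and page pairs. The row shape (`query`, `page`, `impressions`) and the `min_impressions` noise floor are assumptions for illustration; a GSC export with both query and page dimensions would need to be mapped to this shape.

```python
from collections import defaultdict

def find_cannibalization(gsc_rows, min_impressions=10):
    """Return queries served by 2+ pages, with pages ordered by impressions."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in gsc_rows:
        # Sum impressions in case the export has several rows per pair.
        totals[row["query"]][row["page"]] += row["impressions"]

    flagged = {}
    for query, pages in totals.items():
        # Ignore pages with only trace impressions for this query.
        competing = {p: n for p, n in pages.items() if n >= min_impressions}
        if len(competing) > 1:
            flagged[query] = sorted(competing, key=competing.get, reverse=True)
    return flagged
```

The first page in each list is the strongest performer for that query, which makes it the natural consolidation target; the remaining pages still need a manual look, because two pages can share a query while serving different intents.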
Thin content refers to pages with insufficient depth to satisfy user search intent: typically pages under 300 words, with no original analysis, duplicated from other sources, or auto-generated. In a content audit, thin content is identified through word count thresholds, GSC performance data (low impressions despite indexation), and manual sampling. These pages are flagged for expansion, consolidation with related content, or removal with a 301 redirect.
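The automated part of that screening might look like the sketch below. The 300-word threshold comes from the definition above; the `max_impressions` cutoff and the page dictionary keys are assumptions, and the output is a shortlist for manual sampling rather than a final verdict.

```python
def thin_content_reasons(page, min_words=300, max_impressions=10):
    """List the reasons a page looks thin; an empty list means no flag."""
    reasons = []
    if page["word_count"] < min_words:
        reasons.append("below word count threshold")
    if page["indexed"] and page["impressions"] <= max_impressions:
        reasons.append("indexed but near-zero impressions")
    return reasons
```

Returning reasons instead of a single yes/no makes it easier to separate short pages that still perform from long pages that get no visibility.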
Content freshness cadence is a scheduled review cycle where existing pages receive substantive updates (new data, expanded sections, additional examples, or updated statistics) rather than cosmetic edits. AI language models weight recent sources, and Google's Search Quality Rater Guidelines (QRG) treat freshness as a quality signal for time-sensitive queries. The recommended cycle is every 30–90 days for core pages, with "last updated" timestamps visible in page content and schema markup.
AI models (ChatGPT, Gemini, Perplexity) preferentially cite pages that: (1) answer the target question in the first 40–60 words, (2) use structured headings with direct answers below each H2, (3) include FAQPage schema, (4) cite verifiable statistics with sources, and (5) maintain consistent topical authority within a content cluster. The audit evaluates each existing page against these criteria and produces a prioritized list of pages to restructure for AI citation eligibility.
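As one example of how a criterion can be checked automatically, the sketch below looks for FAQPage structured data in a page's HTML using only the Python standard library. It is a presence check under simplifying assumptions: it handles top-level JSON-LD objects and arrays, does not follow `@graph` nesting, and does not validate the markup.

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

def has_faq_schema(html):
    """True if any top-level JSON-LD item declares "@type": "FAQPage"."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD rather than fail the audit
        items = data if isinstance(data, list) else [data]
        if any(isinstance(i, dict) and i.get("@type") == "FAQPage" for i in items):
            return True
    return False
```

A page that passes this check would still need to go through a structured data validator before the criterion counts as met.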
A comprehensive content audit uses: Screaming Frog SEO Spider (full site crawl and URL inventory), Google Search Console (impressions, clicks, average position per URL), Google Analytics 4 (organic traffic, engagement rate per page), Ahrefs or SEMrush (backlink count per URL, referring domains, organic keyword rankings), and Surfer SEO or Clearscope (content depth scoring vs. top-ranking competitors). Results are compiled in Google Sheets or Looker Studio for client review.
Most sites have more content than they need — and the wrong pages competing for the same keywords. A content audit identifies exactly where to focus before investing in new production.