CB Audit (internal AEO/SEO tool) · Day 27 · May 27, 2026 · 10 min read

Building CB Audit — an agentic AEO + SEO report engine on Claude + OpenAI

Shipped CB Audit: a local, two-pipeline audit engine that drops a URL, crawls the site, generates buyer-intent queries with Claude, probes Claude + OpenAI, scores brand presence with a Claude-as-judge rubric, and renders branded HTML/PDF reports — plus a parallel classical-SEO pipeline. One web UI, one shared queue, ~$1–2 and 3–6 minutes per audit.

By Andy D — Founder, Creative Brain Inc. — Brampton, Ontario

CB Audit landing page with the headline 'Audit your site for search engines or AI assistants' and two cards: Answer Engine Optimization (AEO) and Search Engine Optimization (SEO). — The chooser at `/`. One tool, two questions: "Why isn't AI mentioning us?" (AEO) and "Why aren't we ranking?" (SEO). Both pipelines share the queue, history, and progress stream.

TL;DR

Shipped CB Audit — a local Node.js engine that turns a single URL into a branded AEO and SEO report, internal + client variants, HTML + PDF + CSV.
The AEO pipeline crawls the site, has Claude generate 20 brand-neutral buyer-intent queries, probes both Claude (claude-opus-4-7) and OpenAI (gpt-4o), then uses Claude as a judge to score every response on mention, position, citation, sentiment, framing, and competitor share-of-voice (0–100 composite).
The SEO pipeline runs a multi-page BFS crawl and scores seven weighted categories — technical, on-page, schema, Core Web Vitals (PageSpeed), links, LLM-judged content, and an optional competitor compare.
A single Express + SSE server runs audits one-at-a-time off a file-backed queue, streams live progress, and survives restarts (in-flight jobs reconcile to failed on boot).
Real numbers from the first runs: creativebrain.ca scored SEO 64/100; sprint.creativebrain.ca scored AEO 0 across 31 probes while Toptal/Bubble/Webflow ate the share-of-voice. The zero is the point — it's the baseline the cbinc-www work is now moving.

Wiring: pure Node, ESM, file-backed job store (data/jobs.json). Two idempotent migrations run on boot. No database. Keys via .env (ANTHROPIC_API_KEY, OPENAI_API_KEY, optional PAGESPEED_API_KEY).

Why we shipped this

The May 26 cbinc-www sprint log opens with "the May 5 audit gave us SEO 64 / AEO 1." Those numbers didn't come from Ahrefs — they came from this tool. CB Audit is the instrument we use to decide what to ship on our own site and, increasingly, in client pitches.

The motivating problem: AEO is a discipline you can't improve without measuring, and nothing off-the-shelf measures it. Classical SEO tools tell you about backlinks and Core Web Vitals. None of them answer the question a prospect actually triggers when they open ChatGPT or Claude and type "best 14-day MVP development services for startups" — does our brand even get named, and if so, are we the leader or the afterthought? The only honest way to measure that is to ask the real models the real questions and grade the answers.

So the AEO pipeline is deliberately empirical: it doesn't infer presence from on-page signals, it probes the live LLMs and scores what comes back. Buyer-intent queries are generated brand-neutral on purpose — if we seeded the brand name into the query, we'd measure nothing but the model's ability to echo. We want the organic mention rate.

The SEO pipeline rides along in the same tool for one practical reason: client pitches. Dropping one URL and getting both "here's how you rank" and "here's how you show up in AI" — as two branded PDFs — is the deliverable. Building them as two pipelines behind one queue meant the web UI, history, progress streaming, and report plumbing got written once.

Files added or modified

AEO pipeline (`src/`)

src/crawl.js — single-page fetch + Cheerio extraction: title, meta, headings, JSON-LD schema inventory, FAQ detection, content sample, brand guess.
src/generate-queries.js — Claude generates 20 buyer-intent queries across 5 categories (category, comparison, problem-solution, brand, longtail).
src/probe-llms.js — fans every query out to Claude + OpenAI with a shared system prompt, p-limit concurrency, per-call error capture.
src/score.js — Claude-as-judge scoring against the rubric, plus aggregate() for overall score, share-of-voice, per-provider and per-category breakdowns.
src/recommend.js, src/report.js, src/run-pipeline.js — tiered recommendations, HTML/PDF render, AEO orchestrator.

SEO pipeline (`src/seo/`)

crawl.js — multi-page crawler (robots.txt, sitemap, BFS to N pages).
audit-technical.js, audit-onpage.js, audit-schema.js, audit-performance.js (PageSpeed Insights), audit-links.js, audit-content.js (LLM keyword + content-quality), audit-competitors.js.
score.js — weighted composite. recommend.js, remediate.js (paste-ready titles/meta/JSON-LD), report.js, run-pipeline.js.

Shared, server, and UI

src/dispatcher.js — kind-aware dispatcher; maps kind: "aeo" | "seo" to its pipeline and step list.
src/shared/http.js, src/shared/pdf.js (Puppeteer launcher), src/shared/csv.js.
server.js — Express + SSE, file-backed job store, single-job queue with pump(), report/CSV endpoints.
web/app.js, web/index.html, web/styles.css — vanilla SPA chooser + forms + history + live results.
run-audit.js, run-seo-audit.js — CLI entry points.

Prompts, templates, migrations

prompts/query-generation.md, prompts/scoring-rubric.md, prompts/recommendations.md + prompts/seo/*.
templates/report.html, templates/report-client.html + templates/seo/*.
scripts/migrate-add-kind.mjs, scripts/migrate-aeo-folders.mjs — idempotent, run on boot, back up data/jobs.json before mutating.

Design rationale

Two pipelines, one UI, one queue

AEO and SEO answer different questions, so they're different pipelines — but they share everything operational. The server doesn't know or care which pipeline a job runs; src/dispatcher.js takes the job's kind and returns the right orchestrator and step list. The web UI is a single chooser that branches into two forms, and both feed the same POST /api/audits.

The AEO entry. Competitors are optional — left blank, Claude infers 3–5 likely ones from the crawl context so share-of-voice still has something to measure against.

SEO audit form headed 'How does your site perform in search engines?' with fields for site URL, optional competitor URLs, and a pages-to-crawl slider set to 20. — The SEO entry. Same shell, different inputs: full competitor URLs (not names) and a pages-to-crawl budget that caps the BFS crawl.

The rejected alternative was two separate apps. That would have doubled the queue, the SSE plumbing, the history table, and the report endpoints. Routing on a kind discriminator kept all of that single-sourced.

Claude plays three roles, and they're kept separate

Claude isn't called once — it's called as a query strategist (generate buyer-intent queries), as a respondent (one of the two probed models), and as a judge (score every response against the rubric). Each role has its own prompt file in prompts/ so the judge's rubric can evolve without touching the strategist's instructions. Treating "the model that answers" and "the model that grades" as distinct steps is what makes the AEO score defensible — the judge sees only the query, the response, the target brand, and the competitor list, and returns structured JSON.

The scoring rubric is the heart of it: every probe gets graded on mentioned + position (first / early / middle / late), cited, sentiment, framing (leader / alternative / afterthought / competitor_to_others), competitors_mentioned, and a 0–100 overall_score.

AEO results page for sprint.creativebrain.ca showing 31 probes across Claude and OpenAI, an overall AEO score of 0, 0% mention rate, 0% citation rate, 0% leader framing, and per-provider cards for Claude (16 probes) and OpenAI (15 probes). — The headline metrics, split by provider. `aggregate()` in `src/score.js` rolls the per-probe JSON up into overall score, mention/citation rates, framing distribution, and the Claude-vs-OpenAI breakdown.

AEO share-of-voice chart titled 'Who AI names first' showing competitor mention counts — Toptal 16, Bubble 10, Webflow 9, Glide 8, Adalo 8, FlutterFlow 8, Stripe 8, Softr 7 — above the embedded full HTML deliverable. — Share-of-voice is the slide that lands in pitches: when AI answers buyer questions in our category, *these* are the names it reaches for. Zero of them are us — yet.

SEO score is a weighted composite, not a vibe

The SEO pipeline produces a 0–100 score, but the number is only useful because it decomposes. Each category carries an explicit weight (technical 20% / on-page 20% / schema 10% / performance 15% / links 10% / content 20% / competitors 5%), so a single headline score always traces back to which category dragged it down. On creativebrain.ca that was Content (34/100) and Performance (50/100) pulling a 64 overall, while Schema sat at 100.

SEO results page for creativebrain.ca showing an overall score of 64, 20 pages crawled, 15 issues found, 14 recommendations, and a category breakdown grid: Technical 78, On-Page 60, Schema 100, Performance 50, Links 93, Content 34. — The category breakdown makes the composite honest — you can see exactly where the 64 comes from and which weight each category carries.

The full branded SEO deliverable headed 'How Creative Brain Inc. performs in search engines' with an executive summary stating the SEO score is 64/100 and a four-stat summary block. — The full deliverable — the same data rendered as the client-facing report (HTML here, identical layout exported to PDF via Puppeteer).

File-backed queue over a job library

The server runs one audit at a time off an in-memory queue, persisting job state to data/jobs.json. No Redis, no BullMQ, no SQLite. For a single-operator local tool that runs a handful of audits a day, a JSON file plus a pump() function that starts the next job when the active one finishes is the right amount of machinery. The trade-off — no parallelism across audits — is acceptable because each audit is rate-limited against the model APIs anyway.

Gotchas we hit (so the next log doesn't repeat them)

LLMs fence their JSON even when told not to. Both generate-queries.js and score.js ask for raw JSON with "no markdown fences," and the models wrap it in a ```json block anyway often enough to break JSON.parse. Fix: strip fences before parsing — text.replace(/```json|```/g, '').trim() — rather than trusting the instruction.
One flaky provider call shouldn't kill a 40-call run. probeClaude/probeOpenAI never throw; they return { error } objects, and aggregate() filters to valid results only. Carry-forward: in fan-out probe steps, capture per-call errors as data and exclude them at aggregation — don't let Promise.all reject the whole batch.
A server restart used to strand jobs in running. On boot, server.js reconciles any job still marked running or queued to failed with error: "interrupted by server restart". Fix: treat the file-backed store as untrusted on startup and reconcile in-flight state before accepting new work.
Legacy rows had no kind. The AEO+SEO split added a kind discriminator that older jobs.json rows lacked. migrate-add-kind.mjs tags them kind: "aeo" and migrate-aeo-folders.mjs relocates legacy files into data/aeo/ + reports/aeo/, rewriting the path fields so old reports still resolve. Both are idempotent and write a one-time backup. Carry-forward: ship the migration with the schema change and run it on boot, not as a manual step.
Core Web Vitals is optional by design. The performance audit calls Google PageSpeed Insights, which needs PAGESPEED_API_KEY. Without it the step is skipped gracefully rather than failing the run — so an SEO audit still completes on a fresh checkout with only the two LLM keys set.

Verifying the work

cd cb-aeo-audit
npm install
cp .env.example .env          # fill ANTHROPIC_API_KEY, OPENAI_API_KEY (PAGESPEED_API_KEY optional)

# Web UI — choose AEO or SEO at http://localhost:3000
npm run web

# Or from the CLI:
npm run audit:aeo -- https://your-brand.com --competitors=Acme,Globex --queries=20
npm run audit:seo -- https://your-site.com --pages=20

Module smoke tests (each runs one step against a hard-coded URL):

npm run test-crawl      # src/crawl.js
npm run test-queries    # Claude query generation
npm run test-probe      # Claude + OpenAI probe
npm run test-score      # Claude-as-judge

Then open History and confirm completed jobs render with their score, an Open HTML report, and a downloadable PDF — for both kinds.

The shared history view — both pipelines write to one job table, filterable by kind. This is the quickest end-to-end check that the queue, scoring, and report endpoints all wired up correctly.

What's next

Production re-audits, not localhost. Re-score creativebrain.ca and sprint.creativebrain.ca against the live deploy so HSTS, canonical, and trailing-slash artifacts drop out of the report — and so the AEO 0 baseline gets a real "after" number once the cbinc-www citable-surface work lands.
More probe surfaces (deferred Phase 4). Add Perplexity and Gemini probes, scrape Google AI Overviews, and extract citation sources — right now share-of-voice is Claude + OpenAI only.
Delta tracking. Persist each audit's aggregate so re-runs show movement over time per query and per competitor, instead of standalone snapshots.
White-label PDF mode for client-delivered reports, and a combined #/domain/:host view that overlays a domain's AEO and SEO results on one page.
Scheduled re-audits + email digests — blocked on deciding where this runs (it's local-only today; a hosted scheduler is a separate build).