GitHub Actions SEO: gate PRs on broken links and schema
GitHub Actions SEO checks for blog PRs: four automated jobs that catch broken links, bad canonicals, invalid JSON-LD, and image-driven Core Web Vitals failures.

Code review is good at catching logic bugs. SEO bugs are different: a broken canonical does not throw a build error, a dead external link does not fail a type check, and a malformed JSON-LD block does not appear in a diff in any way that signals a problem. They ship quietly. You find out weeks later from Search Console.
The fix is a GitHub Actions SEO workflow that gates every blog PR automatically. Four jobs check broken links, meta and canonical correctness, JSON-LD validity, and a Lighthouse performance budget. The merge button stays red until all four pass.
This is the workflow, job by job.
A code reviewer checking a blog post looks at the prose: is the structure right, does the intro land, are the claims defensible? Nobody in that review is clicking every external link, validating the canonical, or running the new hero image through a performance budget. Those checks are not part of the review process. CI makes them automatic.
External links rot. A link that resolved when the author found the source may have moved, been renamed, or started returning a 404 by the time the post ships. Nobody in editorial review clicks every citation in a 2,000-word post. A CI job does.
The canonical tag tells Google which URL to credit when the same or similar content appears at multiple addresses. In a Next.js App Router site, pages generate their canonical via generateMetadata. The common failure mode is a page that inherits a canonical from a parent layout instead of setting its own, producing a post whose canonical points at /blog/ rather than /blog/your-post-slug/.
The page renders without error, silently sending its ranking signal to the wrong URL.
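One way to avoid the inherited canonical is for each post route to compute its own. The following is a minimal sketch, assuming a route file at app/blog/[slug]/page.js and a placeholder SITE_URL; the helper name canonicalFor is hypothetical, and in a real page file generateMetadata would be exported:

```javascript
// Placeholder domain (assumption); replace with the site's real origin.
const SITE_URL = 'https://yoursite.com';

// Hypothetical helper: build the self-referencing canonical for a post slug.
// A trailing slash is assumed, matching the URLs used elsewhere in this post.
function canonicalFor(slug) {
  return `${SITE_URL}/blog/${slug}/`;
}

// In app/blog/[slug]/page.js this would be `export async function generateMetadata`.
// `await params` works whether params arrives as a plain object or as a Promise.
async function generateMetadata({ params }) {
  const { slug } = await params;
  return {
    alternates: { canonical: canonicalFor(slug) },
  };
}
```

Because the canonical is derived from the route's own slug, a post cannot silently fall back to whatever a parent layout declares.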
Nestlé measured that pages appearing as rich results in Google Search have an 82% higher click-through rate than non-rich-result pages, a figure cited in Google's structured data documentation. A Milestone Internet study of 4.5 million queries measured 58 clicks per 100 queries for rich results against 41 for standard results. A single malformed property in the JSON-LD block, a date string in the wrong format, or a missing required field silently disqualifies the page from rich-result consideration. The structured data is rendered in the HTML; it just does not validate.
Lighthouse runs around 8 automated SEO audits per page, and none of them validate JSON-LD content. A separate validation step closes that gap.
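To make the failure modes concrete, here is a rough local sketch of the kind of check a validator performs. It is not a substitute for the Schema Markup Validator: the REQUIRED list and the date pattern are illustrative house rules (assumptions), not Google's official requirements, and the function names are hypothetical.

```javascript
// Pull the body of every <script type="application/ld+json"> out of an HTML string.
function extractJsonLd(html) {
  const pattern = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi;
  return [...html.matchAll(pattern)].map((m) => m[1]);
}

// Illustrative house rules (assumption), not an official required-field list.
const REQUIRED = ['@type', 'headline', 'datePublished'];
// Accepts YYYY-MM-DD with an optional time and timezone suffix.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$/;

// Return a list of human-readable problems; an empty list means the block passed.
function checkBlock(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return [`invalid JSON: ${err.message}`];
  }
  if (data === null || typeof data !== 'object') {
    return ['JSON-LD block is not an object'];
  }
  const problems = REQUIRED.filter((key) => !(key in data)).map((key) => `missing ${key}`);
  if ('datePublished' in data && !ISO_DATE.test(String(data.datePublished))) {
    problems.push(`datePublished is not ISO 8601: ${data.datePublished}`);
  }
  return problems;
}
```

A block with a date such as 03/01/2025 parses as valid JSON and renders in the page, yet checkBlock reports it, which is the "rendered but does not validate" case described above.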
Google's Core Web Vitals thresholds are LCP under 2.5 seconds, CLS under 0.1, and INP under 200 milliseconds. Roughly half of all tracked origins pass all three, per 2025 Web Almanac data, with desktop (56%) outperforming mobile (48%).
A PR that adds a 3MB PNG where a 200KB WebP should be can push LCP over threshold, but the build succeeds and the post looks fine locally. The regression only surfaces in Search Console weeks later.
All four jobs live in .github/workflows/blog-seo.yml. The workflow triggers on pull requests that change files in content/blog/ or the workflow file itself, so it only runs when content or the checks change:
name: Blog SEO checks
on:
  pull_request:
    paths:
      - 'content/blog/**'
      - '.github/workflows/blog-seo.yml'
lychee-action wraps lychee, a link checker written in Rust. The lychee project benchmarks it at 576 links in about 60 seconds on the analysis-tools-dev/static-analysis repository; throughput varies by repo size and link distribution, but most blogs with a few dozen posts complete in well under two minutes. It reads Markdown files directly and does not require a running server, so it can complete before any build step.
jobs:
  broken-links:
    name: Broken links
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v7
      - name: Check links
        uses: lycheeverse/lychee-action@v2
        with:
          args: --verbose --no-progress 'content/blog/**/*.md'
          fail: true
          jobSummary: true
fail: true exits with a non-zero code on any broken link, which fails the job. jobSummary: true writes the full report to the GitHub Actions job summary, accessible from the PR's check status.
Add a .lycheeignore at the repo root for URLs to exclude, one regex per line:
# Localhost references in code blocks
http://localhost
# Web archive links
https://web.archive.org
There is no off-the-shelf action for meta-tag validation on a Next.js App Router site, so this job builds the site and runs a short Node script against the HTML output. The script checks each page for a <meta name="description">, a <link rel="canonical"> that matches the page's own URL, and basic Open Graph tags.
After validating, the job uploads the build as an artifact. The JSON-LD and Lighthouse jobs download it instead of rebuilding, so all three validate the same output and CI time does not multiply with each additional check:
  meta-tags:
    name: Meta and canonical tags
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v7
      - uses: actions/setup-node@v6
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - name: Cache Next.js build
        uses: actions/cache@v6
        with:
          path: .next/cache
          key: ${{ runner.os }}-nextjs-${{ hashFiles('**/package-lock.json') }}
      - name: Build
        run: npx next build
        env:
          NODE_ENV: production
      - name: Check meta and canonical tags
        run: node scripts/check-meta.mjs
      - name: Upload build artifact
        uses: actions/upload-artifact@v7
        with:
          name: next-build
          # .next/ is a dot-directory; upload-artifact has excluded hidden
          # files by default since v4.4, so opt in explicitly.
          include-hidden-files: true
          path: |
            .next/
            public/
          retention-days: 1
Setting process.exitCode = 1 instead of calling process.exit(1) immediately lets the script report every failure across all pages in a single run rather than stopping at the first hit. Create scripts/check-meta.mjs in your repo:
// scripts/check-meta.mjs
import { readdir, readFile } from 'node:fs/promises';
import { join, resolve } from 'node:path';

const SITE_URL = process.env.SITE_URL ?? 'https://yoursite.com';
const BLOG_DIR = resolve('.next/server/app/blog');

// Recursively collect every page.html under the blog build directory.
async function walk(dir) {
  const entries = await readdir(dir, { withFileTypes: true });
  const files = [];
  for (const entry of entries) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...await walk(full));
    } else if (entry.name === 'page.html') {
      files.push(full);
    }
  }
  return files;
}

// Run all three checks on one page; never short-circuit, so every failure is logged.
async function checkPage(htmlPath) {
  const slug = htmlPath.replace(BLOG_DIR + '/', '').replace('/page.html', '');
  const html = await readFile(htmlPath, 'utf8');
  const expectedUrl = `${SITE_URL}/blog/${slug}/`;
  let ok = true;

  // Each tag is matched in both attribute orders (name/content and content/name).
  const description =
    html.match(/<meta[^>]+name="description"[^>]+content="([^"]+)"/i)?.[1] ??
    html.match(/<meta[^>]+content="([^"]+)"[^>]+name="description"/i)?.[1] ??
    null;
  if (!description) {
    console.error(`[FAIL] Missing meta description: /blog/${slug}/`);
    process.exitCode = 1;
    ok = false;
  }

  const canonical =
    html.match(/<link[^>]+rel="canonical"[^>]+href="([^"]+)"/i)?.[1] ??
    html.match(/<link[^>]+href="([^"]+)"[^>]+rel="canonical"/i)?.[1] ??
    null;
  if (!canonical || canonical !== expectedUrl) {
    console.error(`[FAIL] Canonical mismatch: /blog/${slug}/`);
    console.error(`  Expected: ${expectedUrl}`);
    console.error(`  Found: ${canonical ?? 'missing'}`);
    process.exitCode = 1;
    ok = false;
  }

  const ogTitle =
    html.match(/<meta[^>]+property="og:title"[^>]+content="([^"]+)"/i)?.[1] ??
    html.match(/<meta[^>]+content="([^"]+)"[^>]+property="og:title"/i)?.[1] ??
    null;
  if (!ogTitle) {
    console.error(`[FAIL] Missing og:title: /blog/${slug}/`);
    process.exitCode = 1;
    ok = false;
  }

  if (ok) console.log(`[OK] /blog/${slug}/`);
}

// A missing build directory is treated the same as an empty one.
const files = await walk(BLOG_DIR).catch(() => []);
if (files.length === 0) {
  console.error('[FAIL] No HTML found in .next/server/app/blog - run next build first');
  process.exitCode = 1;
} else {
  await Promise.all(files.map(checkPage));
}
walk recurses the App Router build directory and collects every page.html file. Next.js 15 App Router writes pre-rendered pages to .next/server/app/blog/<slug>/page.html, so the slug is extracted directly from the path. checkPage reads each file, runs all three checks without short-circuiting, and logs every failure before the process exits. Set SITE_URL via the environment (or hardcode your domain) to match the canonical your generateMetadata produces.
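The two pieces of checkPage most likely to need adjusting are the canonical lookup and the slug extraction. A small sketch that isolates them so they can be tried against sample input; the function names findCanonical and slugFromPath are hypothetical, and the sample paths are placeholders:

```javascript
// Same two-pass pattern as check-meta.mjs: rel-before-href first, then href-before-rel.
// Limitation: only double-quoted attributes are matched.
function findCanonical(html) {
  return (
    html.match(/<link[^>]+rel="canonical"[^>]+href="([^"]+)"/i)?.[1] ??
    html.match(/<link[^>]+href="([^"]+)"[^>]+rel="canonical"/i)?.[1] ??
    null
  );
}

// Mirror of the slug extraction in checkPage for a file under the blog build directory.
function slugFromPath(htmlPath, blogDir) {
  return htmlPath.replace(blogDir + '/', '').replace('/page.html', '');
}
```

Running these against a hand-written snippet before wiring the script into CI is a quick way to confirm the patterns match the markup a particular build actually emits.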
Schemar (johnnyreilly/schemar) wraps the Schema Markup Validator. It accepts a list of URLs, checks the JSON-LD on each against Schema.org's rules, and returns pass/fail results. Combine it with marocchino/sticky-pull-request-comment to keep the validation output as a single updating comment on the PR rather than a new comment on every push.
This job downloads the build artifact from the meta-tags job rather than rebuilding from scratch. The needs: meta-tags dependency controls ordering; the artifact carries the actual output.
The job also needs the slug of the post being reviewed. Rather than hardcoding it, a get-slug step extracts the filename from the git diff - the slug is just the new .md filename in content/blog/ with its extension stripped:
  json-ld:
    name: JSON-LD validation
    runs-on: ubuntu-latest
    needs: meta-tags
    steps:
      - uses: actions/checkout@v7
      - uses: actions/setup-node@v6
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - name: Download build artifact
        uses: actions/download-artifact@v8
        with:
          name: next-build
      - name: Get new post slug
        id: slug
        run: |
          git fetch origin ${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}
          SLUG=$(git diff --name-only origin/${{ github.base_ref }}..HEAD \
            -- 'content/blog/' | grep '\.md$' | head -1 \
            | sed 's|content/blog/||; s|\.md$||')
          echo "slug=${SLUG}" >> "$GITHUB_OUTPUT"
      - name: Start preview server
        run: npx next start &
      - name: Wait for server
        run: npx wait-on http://localhost:3000
      - name: Validate JSON-LD
        id: schemar
        uses: johnnyreilly/schemar@v0.1.1
        with:
          urls: "http://localhost:3000/blog/${{ steps.slug.outputs.slug }}/"
      - name: Format results as markdown
        id: format
        if: always()
        uses: actions/github-script@v9
        with:
          script: |
            const results = ${{ steps.schemar.outputs.results }};
            const lines = results.map((r) =>
              `${r.processedValidationResult.success ? '🟢' : '🔴'} ${r.url}: ${r.processedValidationResult.resultText}`
            );
            core.setOutput('comment', ['### JSON-LD validation', ...lines].join('\n'));
      - name: Post results as sticky PR comment
        uses: marocchino/sticky-pull-request-comment@v3
        with:
          header: json-ld-validation
          message: ${{ steps.format.outputs.comment }}
The fetch line passes an explicit refspec, ${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}, instead of a bare git fetch origin main. actions/checkout@v7 defaults to a shallow, single-ref clone (on pull_request events, the PR's merge commit), so a bare fetch only populates FETCH_HEAD and leaves no local origin/main ref for the diff to compare against. The explicit refspec creates that ref directly.
That is still not enough on its own. actions/checkout@v7's default depth-1 clone fetches a single commit (the PR's merge commit on pull_request events) with its history cut off, so origin/main and HEAD have no common ancestor that git can find locally. A three-dot diff (origin/main...HEAD), which compares against the merge base, fails with fatal: no merge base in that state. The two-dot form above (origin/main..HEAD) compares the two tips directly and does not need one, so it works regardless of the checkout's fetch depth.
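The grep, head, and sed pipeline in the get-slug step can be hard to reason about at a glance. A sketch of the same transformation in JavaScript, taking the text output of git diff --name-only as input; the function name slugFromDiff is hypothetical and the file names are placeholders:

```javascript
// Given the output of `git diff --name-only`, return the first changed Markdown
// file under content/blog/ as a slug, or null when no such file changed.
function slugFromDiff(diffOutput, contentDir = 'content/blog/') {
  const file = diffOutput
    .split('\n')
    .map((line) => line.trim())
    .find((line) => line.startsWith(contentDir) && line.endsWith('.md'));
  if (!file) return null;
  // Strip the directory prefix and the .md extension.
  return file.slice(contentDir.length, file.length - '.md'.length);
}
```

The null case corresponds to a PR that touches only the workflow file. In the shell version that situation yields an empty slug, so the checked URL becomes /blog//; a guard that skips the downstream steps when the slug is empty may be worth adding.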
The header param on the sticky comment means each new push overwrites the previous result in place. The PR timeline stays clean.
Schemar's results output is Result[], a JSON array, not pre-formatted markdown: each entry carries a url and a processedValidationResult object with success and resultText fields, confirmed in schemar's action.yml. Passing that array straight to message posts raw JSON on the PR. The actions/github-script step in between maps each result to a one-line pass/fail row before it reaches the sticky comment, which is the same shape johnnyreilly's own writeup of the action uses for its PR comments.
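The mapping inside the github-script step can be pulled into a plain function and tried locally. A minimal sketch, assuming the result shape described above (url plus processedValidationResult with success and resultText); the function name formatResults and the sample values in the usage note are made up for illustration:

```javascript
// Turn a Schemar-style results array into the markdown body for the sticky comment.
function formatResults(results) {
  const lines = results.map((r) => {
    const { success, resultText } = r.processedValidationResult;
    return `${success ? '🟢' : '🔴'} ${r.url}: ${resultText}`;
  });
  return ['### JSON-LD validation', ...lines].join('\n');
}
```

Calling formatResults with one passing entry for http://localhost:3000/blog/a/ and resultText 'ok' produces a heading line followed by a single green row for that URL.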
treosh/lighthouse-ci-action runs Lighthouse CI against a locally served build and fails the job when any assertion falls below threshold.
Like the JSON-LD job, this downloads the artifact rather than running another build. It also uses the same get-slug step to discover the post URL from the diff, then generates .lighthouserc.json on the fly so no file needs manual editing per PR:
  lighthouse:
    name: Lighthouse budget
    runs-on: ubuntu-latest
    needs: meta-tags
    steps:
      - uses: actions/checkout@v7
      - uses: actions/setup-node@v6
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - name: Download build artifact
        uses: actions/download-artifact@v8
        with:
          name: next-build
      - name: Get new post slug
        id: slug
        run: |
          git fetch origin ${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}
          SLUG=$(git diff --name-only origin/${{ github.base_ref }}..HEAD \
            -- 'content/blog/' | grep '\.md$' | head -1 \
            | sed 's|content/blog/||; s|\.md$||')
          echo "slug=${SLUG}" >> "$GITHUB_OUTPUT"
      - name: Generate .lighthouserc.json
        run: |
          cat > .lighthouserc.json << EOF
          {
            "ci": {
              "collect": {
                "url": ["http://localhost:3000/blog/${{ steps.slug.outputs.slug }}/"],
                "startServerCommand": "npx next start",
                "startServerReadyPattern": "started server"
              },
              "assert": {
                "assertions": {
                  "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
                  "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }],
                  "total-blocking-time": ["warn", { "maxNumericValue": 300 }]
                }
              }
            }
          }
          EOF
      - name: Run Lighthouse CI
        uses: treosh/lighthouse-ci-action@v12
        with:
          uploadArtifacts: true
          temporaryPublicStorage: true
          configPath: .lighthouserc.json
The Generate .lighthouserc.json step uses a heredoc where ${{ steps.slug.outputs.slug }} is substituted by the Actions runner before the shell executes, so the generated file contains the literal slug, not a variable reference. LCP under 2500ms and CLS under 0.1 are Google's passing thresholds. Using "error" rather than "warn" is what causes the job to fail. Total blocking time is the closest lab-measurable proxy for INP; "warn" surfaces problems without blocking the merge on what is an approximation of a field metric. Tighten or relax as the site's performance baseline becomes clearer.
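If the heredoc becomes awkward to maintain, the same config can be built as an object and serialized. A sketch of that alternative, assuming it would run in a Node step before Lighthouse CI; the function name buildLighthouseConfig and its baseUrl parameter are hypothetical, while the keys and thresholds are copied from the heredoc above:

```javascript
// Build the Lighthouse CI config object for one post URL.
function buildLighthouseConfig(slug, baseUrl = 'http://localhost:3000') {
  return {
    ci: {
      collect: {
        url: [`${baseUrl}/blog/${slug}/`],
        startServerCommand: 'npx next start',
        startServerReadyPattern: 'started server',
      },
      assert: {
        assertions: {
          // "error" fails the job; "warn" only reports.
          'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
          'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
          'total-blocking-time': ['warn', { maxNumericValue: 300 }],
        },
      },
    },
  };
}

// Serialized form, ready to be written to .lighthouserc.json.
const json = JSON.stringify(buildLighthouseConfig('new-post-slug'), null, 2);
```

Building the object in code keeps the thresholds in one reviewable place and rules out JSON syntax slips such as a trailing comma in the heredoc.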
The four jobs above produce check runs on every PR. By default, GitHub does not prevent merging when a check fails. One configuration step makes them binding.
Go to repository Settings, then Branches. Add a branch protection rule for the branch content merges into, typically main. Under "Require status checks to pass before merging", add all four job names:
Broken links
Meta and canonical tags
JSON-LD validation
Lighthouse budget
With these set as required, the merge button stays disabled until all four pass. A single failure keeps the PR locked regardless of approvals.
Without this step, the entire setup is advisory: the checks run and report, but nothing actually blocks the merge. This is the step most workflow tutorials omit.
The Schemar job's sticky comment puts JSON-LD results directly on the PR, so nobody has to navigate to the Actions run page. For the other three jobs, the detailed report is reachable from each check's status link: the job summary for lychee (via jobSummary: true) and the job log for the rest, including the meta-tag script's console output.
Make the meta-tag script output specific enough to act on immediately:
[FAIL] Missing meta description: /blog/new-post-slug/
[FAIL] Canonical mismatch: /blog/new-post-slug/
  Expected: https://yoursite.com/blog/new-post-slug/
  Found: https://yoursite.com/blog/
Three habits prevent the workflow from becoming a source of noise.
Pin action versions (@v2, @v7, @v12). Moving tags like @latest break without warning when upstream ships a breaking change. Check release pages when onboarding a new action; marocchino/sticky-pull-request-comment is at v3, for example.
Cache the build. actions/cache persists .next/cache between workflow runs, and the artifact upload carries the final output to the JSON-LD and Lighthouse jobs: one build per PR, three jobs consuming it.
Keep .lycheeignore current. As the blog grows, more code-block URLs and archived-page references need exclusion. A stale file generates false failures that train the team to dismiss CI output; update it when adding an exclusion-worthy URL.
A green run confirms that every link in content/blog/ resolves, that each built post has a meta description, a self-referencing canonical, and an og:title, that the new post's JSON-LD validates, and that its LCP and CLS are within budget.
What it does not confirm: whether the facts are correct, whether the post answers the question it sets up, or whether the prose is worth reading. CI has no opinion about those things.
This is the same division that makes automated content creation work without creating editorial risk: automate every check that has a clear pass/fail definition, leave judgment to people with context.
When a PR reaches human review with all four checks green, the reviewer does not need to wonder whether the canonical is pointing at itself or whether the link to the case study still resolves. CI answered those questions. The reviewer can focus on what CI cannot check: accuracy, voice, and whether the post actually serves the reader.
Combined with an earlier fact-checking step, this gives three layers: Lyra verifies claims and links before opening the PR, CI gates the technical surface, and human review handles editorial judgment. All three pass before the post ships.
For teams using a PR-based AI blog writer where an agent produces the first draft, the CI gate is especially useful. The agent drafts fast, the checks run in parallel, and the reviewer sees a PR already validated on both the technical and factual axes. Internal linking automation also benefits directly: the broken-link job confirms that any new cross-links added to a post actually resolve before they ship.
A blog post and a code change share the same vulnerability: technical bugs that pass human review because humans are not built to run validators in their heads. These four GitHub Actions jobs run the validators automatically, so a green merge means what it should.
Step by step
Add lychee-action for broken link checking
Create .github/workflows/blog-seo.yml. Add a job that runs lycheeverse/lychee-action on your content/blog/ directory before next build. Create a .lycheeignore file for URLs that should be excluded, such as localhost references in code blocks.
Parse built HTML for meta and canonical issues
Add a job that runs next build, checks each blog page for a meta description, self-referencing canonical, and Open Graph tags, then uploads the .next/ build as an artifact for downstream jobs to download instead of rebuilding.
Lint JSON-LD with Schemar
Download the build artifact from the meta-tags job, serve it locally, and pass the new blog post URL to the Schemar GitHub Action (johnnyreilly/schemar@v0.1.1). Format its results array into markdown with actions/github-script, then post it as a sticky PR comment via marocchino/sticky-pull-request-comment.
Assert Lighthouse Core Web Vitals with treosh/lighthouse-ci-action
Download the build artifact, serve it locally, and point treosh/lighthouse-ci-action at the new post URL via .lighthouserc.json. Set budget assertions for LCP under 2500ms and CLS under 0.1 as errors.
Require all four jobs in branch protection
Go to repository Settings, then Branches. Add a branch protection rule for your main branch. Under Require status checks to pass before merging, add all four job names. Without this step the workflow runs but nothing actually blocks the merge.
FAQ
Yes. A four-job workflow can check for broken links before the build, validate meta and canonical tags in built HTML, lint JSON-LD structured data, and assert Lighthouse Core Web Vitals thresholds, all before the merge button turns green.
lychee-action is fast and purpose-built: it is written in Rust, can check hundreds of links in under a minute (the lychee project benchmarks 576 links in about 60 seconds on the analysis-tools-dev/static-analysis repository), scans Markdown and HTML files, supports a .lycheeignore file for patterns to skip, and writes results to GitHub Job Summaries.
Schemar (johnnyreilly/schemar) wraps the Schema Markup Validator. Pass it the new blog post URL from a locally-served build and it posts pass/fail results as a sticky PR comment via marocchino/sticky-pull-request-comment, so the result stays visible as the PR evolves.
Google's Core Web Vitals targets are LCP under 2.5 seconds, CLS under 0.1, and INP under 200 milliseconds. Lighthouse cannot measure INP directly in lab conditions, so assert total-blocking-time as the proxy instead (300ms is a reasonable warning threshold). treosh/lighthouse-ci-action supports LHCI assertions that fail the build when any threshold is exceeded.