GitHub Actions SEO: gate PRs on broken links and schema

GitHub Actions SEO checks for blog PRs: four automated jobs that catch broken links, bad canonicals, invalid JSON-LD, and image-driven Core Web Vitals failures.

By Mitrasish, Co-founder · Jul 1, 2026 · 14 min read

Code review is good at catching logic bugs. SEO bugs are different: a broken canonical does not throw a build error, a dead external link does not fail a type check, and a malformed JSON-LD block does not appear in a diff in any way that signals a problem. They ship quietly. You find out weeks later from Search Console.

The fix is a GitHub Actions SEO workflow that gates every blog PR automatically. Four jobs check broken links, meta and canonical correctness, JSON-LD validity, and a Lighthouse performance budget. The merge button stays red until all four pass.

This is the workflow, job by job.

What a blog PR can ship that code review misses

A code reviewer checking a blog post looks at the prose: is the structure right, does the intro land, are the claims defensible? Nobody in that review is clicking every external link, validating the canonical, or running the new hero image through a performance budget. Those checks are not part of the review process. CI makes them automatic.

Broken external links nobody clicked during editorial review

External links rot. A link that resolved when the author found the source may have moved, been renamed, or started returning a 404 by the time the post ships. Nobody in editorial review clicks every citation in a 2,000-word post. A CI job does.

A missing or self-conflicting canonical that splits your ranking signal

The canonical tag tells Google which URL to credit when the same or similar content appears at multiple addresses. In a Next.js App Router site, pages generate their canonical via generateMetadata. The common failure mode is a page that inherits a canonical from a parent layout instead of setting its own, producing a post whose canonical points at /blog/ rather than /blog/your-post-slug/.

The page renders without error, silently sending its ranking signal to the wrong URL.
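
One way to avoid the inherited-canonical failure is for each post page to return its own canonical from generateMetadata. The following is a minimal sketch, assuming a dynamic route at app/blog/[slug]/page.js and a production origin of https://yoursite.com. Both are placeholders, and canonicalFor is a hypothetical helper, not part of Next.js:

```javascript
// Placeholder origin; replace with your production URL.
const SITE_URL = 'https://yoursite.com';

// Hypothetical helper: builds the self-referencing canonical for one post.
function canonicalFor(slug) {
  return `${SITE_URL}/blog/${slug}/`;
}

// In app/blog/[slug]/page.js this function would be exported.
// `params` is awaited so the sketch works whether the framework passes
// a plain object or a promise.
async function generateMetadata({ params }) {
  const { slug } = await params;
  return {
    alternates: { canonical: canonicalFor(slug) },
  };
}
```

Because the page sets alternates.canonical itself, a canonical defined in a parent layout is not the value that reaches the post's HTML.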

Malformed JSON-LD that silently forfeits rich-result eligibility

Nestlé measured that pages appearing as rich results in Google Search have an 82% higher click-through rate than non-rich-result pages, a figure cited in Google's structured data documentation. A Milestone Internet study of 4.5 million queries measured 58 clicks per 100 queries for rich results against 41 for standard results. A single malformed property in the JSON-LD block, a date string in the wrong format, or a missing required field silently disqualifies the page from rich-result consideration. The structured data is rendered in the HTML; it just does not validate.

Lighthouse runs around 8 automated SEO audits per page, and none of them validate JSON-LD content. A separate validation step closes that gap.
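
Before reaching for a full validator, a local smoke test can already catch the failure modes named above: unparseable JSON, a missing field, or a date in the wrong format. This sketch is not a substitute for the Schema Markup Validator. The REQUIRED_FIELDS list and the checkJsonLd name are assumptions for illustration and should be adapted to the schema type you publish:

```javascript
// Assumed field list for illustration; adjust to your schema type.
const REQUIRED_FIELDS = ['@type', 'headline', 'datePublished'];
// Accepts a date (2026-07-01) or a date-time with optional seconds and offset.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$/;

// Returns a list of human-readable problems; an empty list means the
// basic checks passed.
function checkJsonLd(html) {
  const problems = [];
  const blocks = [...html.matchAll(
    /<script[^>]+type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi
  )];
  if (blocks.length === 0) problems.push('no JSON-LD block found');

  for (const [, raw] of blocks) {
    let data;
    try {
      data = JSON.parse(raw);
    } catch (err) {
      problems.push(`invalid JSON: ${err.message}`);
      continue;
    }
    if (typeof data !== 'object' || data === null) {
      problems.push('JSON-LD block is not an object');
      continue;
    }
    for (const field of REQUIRED_FIELDS) {
      if (!(field in data)) problems.push(`missing field: ${field}`);
    }
    if (data.datePublished && !ISO_DATE.test(data.datePublished)) {
      problems.push(`datePublished is not ISO 8601: ${data.datePublished}`);
    }
  }
  return problems;
}
```

A check like this could run inside the meta-tag script against the same built HTML, leaving the Schema.org rules themselves to the dedicated validation job.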

A new hero image that blows your Lighthouse budget

Google's Core Web Vitals thresholds are LCP under 2.5 seconds, CLS under 0.1, and INP under 200 milliseconds. Roughly half of all tracked origins pass all three, per 2025 Web Almanac data, with desktop (56%) outperforming mobile (48%).

A PR that adds a 3MB PNG where a 200KB WebP should be can push LCP over threshold, but the build succeeds and the post looks fine locally. The regression only surfaces in Search Console weeks later.

The GitHub Actions SEO workflow: four checks, one file

All four jobs live in .github/workflows/blog-seo.yml. The workflow triggers on pull requests that change files in content/blog/ or the workflow file itself, so it only runs when blog content or the checks change:

yaml

name: Blog SEO checks

on:
  pull_request:
    paths:
      - 'content/blog/**'
      - '.github/workflows/blog-seo.yml'

Job 1: broken links - lychee-action scans Markdown files before the build

lychee-action wraps lychee, a link checker written in Rust. The lychee project benchmarks it at 576 links in about 60 seconds on the analysis-tools-dev/static-analysis repository; throughput varies by repo size and link distribution, but most blogs with a few dozen posts complete in well under two minutes. It reads Markdown files directly and does not require a running server, so it can complete before any build step.

yaml

jobs:
  broken-links:
    name: Broken links
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v7

      - name: Check links
        uses: lycheeverse/lychee-action@v2
        with:
          args: --verbose --no-progress 'content/blog/**/*.md'
          fail: true
          jobSummary: true

fail: true exits with a non-zero code on any broken link, which fails the job. jobSummary: true writes the full report to the GitHub Actions job summary, accessible from the PR's check status.

Add a .lycheeignore at the repo root for URLs to exclude, one regex per line:

code

# Localhost references in code blocks
http://localhost
# Web archive links
https://web.archive.org
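
Since each line is a regular expression rather than a literal URL, a short pattern excludes every URL that contains a match. The sketch below illustrates that matching idea only; it is not lychee's implementation (lychee is written in Rust), and parseIgnoreFile and isIgnored are hypothetical names. It treats lines starting with # as comments, as the example file above does:

```javascript
// Turns ignore-file text into a list of regular expressions,
// skipping blank lines and # comments.
function parseIgnoreFile(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'))
    .map((pattern) => new RegExp(pattern));
}

// A URL is excluded if any pattern matches anywhere in it.
function isIgnored(url, patterns) {
  return patterns.some((re) => re.test(url));
}

const patterns = parseIgnoreFile(`
# Localhost references in code blocks
http://localhost
# Web archive links
https://web.archive.org
`);

console.log(isIgnored('http://localhost:3000/blog/', patterns)); // true
console.log(isIgnored('https://example.com/', patterns));        // false
```

Unescaped dots match any character, so anchor a pattern with ^ and escape its dots when it needs to be tight.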

Job 2: meta, canonical, and OG tags - parse built HTML after next build

There is no off-the-shelf action for meta-tag validation on a Next.js App Router site, so this job builds the site and runs a short Node script against the HTML output. The script checks each page for a <meta name="description">, a <link rel="canonical"> that matches the page's own URL, and basic Open Graph tags.

After validating, the job uploads the build as an artifact. The JSON-LD and Lighthouse jobs download it instead of rebuilding, so all three validate the same output and CI time does not multiply with each additional check:

yaml

  meta-tags:
    name: Meta and canonical tags
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v7

      - uses: actions/setup-node@v6
        with:
          node-version: '20'
          cache: 'npm'

      - run: npm ci

      - name: Cache Next.js build
        uses: actions/cache@v6
        with:
          path: .next/cache
          key: ${{ runner.os }}-nextjs-${{ hashFiles('**/package-lock.json') }}

      - name: Build
        run: npx next build
        env:
          NODE_ENV: production

      - name: Check meta and canonical tags
        run: node scripts/check-meta.mjs

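      - name: Upload build artifact
        uses: actions/upload-artifact@v7
        with:
          name: next-build
          # .next/ is a dot-directory; upload-artifact skips hidden paths
          # unless include-hidden-files is enabled
          include-hidden-files: true
          path: |
            .next/
            public/
          retention-days: 1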

Create scripts/check-meta.mjs in your repo. It sets process.exitCode = 1 instead of calling process.exit(1), which lets the script report every failure across all pages in a single run rather than stopping at the first one:

javascript

// scripts/check-meta.mjs
import { readdir, readFile } from 'node:fs/promises';
import { join, resolve } from 'node:path';

const SITE_URL = process.env.SITE_URL ?? 'https://yoursite.com';
const BLOG_DIR = resolve('.next/server/app/blog');

async function walk(dir) {
  const entries = await readdir(dir, { withFileTypes: true });
  const files = [];
  for (const entry of entries) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...await walk(full));
    } else if (entry.name === 'page.html') {
      files.push(full);
    }
  }
  return files;
}

async function checkPage(htmlPath) {
  const slug = htmlPath.replace(BLOG_DIR + '/', '').replace('/page.html', '');
  const html = await readFile(htmlPath, 'utf8');
  const expectedUrl = `${SITE_URL}/blog/${slug}/`;
  let ok = true;

  const description =
    html.match(/<meta[^>]+name="description"[^>]+content="([^"]+)"/i)?.[1] ??
    html.match(/<meta[^>]+content="([^"]+)"[^>]+name="description"/i)?.[1] ??
    null;

  if (!description) {
    console.error(`[FAIL] Missing meta description: /blog/${slug}/`);
    process.exitCode = 1;
    ok = false;
  }

  const canonical =
    html.match(/<link[^>]+rel="canonical"[^>]+href="([^"]+)"/i)?.[1] ??
    html.match(/<link[^>]+href="([^"]+)"[^>]+rel="canonical"/i)?.[1] ??
    null;

  if (!canonical || canonical !== expectedUrl) {
    console.error(`[FAIL] Canonical mismatch: /blog/${slug}/`);
    console.error(`  Expected: ${expectedUrl}`);
    console.error(`  Found:    ${canonical ?? 'missing'}`);
    process.exitCode = 1;
    ok = false;
  }

  const ogTitle =
    html.match(/<meta[^>]+property="og:title"[^>]+content="([^"]+)"/i)?.[1] ??
    html.match(/<meta[^>]+content="([^"]+)"[^>]+property="og:title"/i)?.[1] ??
    null;

  if (!ogTitle) {
    console.error(`[FAIL] Missing og:title: /blog/${slug}/`);
    process.exitCode = 1;
    ok = false;
  }

  if (ok) console.log(`[OK]   /blog/${slug}/`);
}

const files = await walk(BLOG_DIR).catch(() => []);

if (files.length === 0) {
  console.error('[FAIL] No HTML found in .next/server/app/blog - run next build first');
  process.exitCode = 1;
} else {
  await Promise.all(files.map(checkPage));
}

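walk recurses the App Router build directory and collects every page.html file. The script assumes pre-rendered pages land at .next/server/app/blog/<slug>/page.html, so the slug is extracted directly from the path. The layout under .next/server/app is an internal detail that can differ by Next.js version and configuration (some builds emit <slug>.html instead), so inspect your own build output and adjust walk and the slug extraction to match. checkPage reads each file, runs all three checks without short-circuiting, and logs every failure before the process exits. Set SITE_URL via the environment (or hardcode your domain) so the expected URL, including its trailing slash, matches the canonical your generateMetadata produces.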
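
The path-to-slug step can be pulled out into a pure function, which makes it easy to test and to adapt if your build writes files differently. This is a sketch under the assumption that a post's HTML ends up either at <slug>/page.html or at <slug>.html under the blog build directory; slugFromHtmlPath is a hypothetical helper, not part of the script above:

```javascript
// Derives the post slug from a built HTML path, or returns null when the
// file is not an HTML file under the blog build directory.
function slugFromHtmlPath(blogDir, htmlPath) {
  // Normalize Windows separators and trailing slashes so comparisons are stable.
  const normalize = (p) => p.replace(/\\/g, '/').replace(/\/+$/, '');
  const base = normalize(blogDir);
  const file = normalize(htmlPath);
  if (!file.startsWith(base + '/') || !file.endsWith('.html')) return null;

  const relative = file.slice(base.length + 1);
  if (relative === 'page.html') return ''; // the blog index itself
  return relative.replace(/\/page\.html$/, '').replace(/\.html$/, '');
}

const dir = '.next/server/app/blog';
console.log(slugFromHtmlPath(dir, `${dir}/my-post/page.html`)); // my-post
console.log(slugFromHtmlPath(dir, `${dir}/my-post.html`));      // my-post
```

Returning null for unrelated files lets the caller skip them instead of reporting a misleading canonical mismatch.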

Job 3: JSON-LD linting - schemar posts pass/fail results as a sticky PR comment

Schemar (johnnyreilly/schemar) wraps the Schema Markup Validator. It accepts a list of URLs, checks the JSON-LD on each against Schema.org's rules, and returns pass/fail results. Combine it with marocchino/sticky-pull-request-comment to keep the validation output as a single updating comment on the PR rather than a new comment on every push.

This job downloads the build artifact from the meta-tags job rather than rebuilding from scratch. The needs: meta-tags dependency controls ordering; the artifact carries the actual output.

The job also needs the slug of the post being reviewed. Rather than hardcoding it, a get-slug step extracts it from the git diff. The slug is the first changed .md filename under content/blog/, with the directory prefix and extension stripped:

yaml

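  json-ld:
    name: JSON-LD validation
    runs-on: ubuntu-latest
    needs: meta-tags
    # The sticky comment step needs write access to pull requests;
    # contents: read keeps checkout working once permissions are set explicitly
    permissions:
      contents: read
      pull-requests: write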
    steps:
      - uses: actions/checkout@v7

      - uses: actions/setup-node@v6
        with:
          node-version: '20'
          cache: 'npm'

      - run: npm ci

      - name: Download build artifact
        uses: actions/download-artifact@v8
        with:
          name: next-build

      - name: Get new post slug
        id: slug
        run: |
          git fetch origin ${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}
          SLUG=$(git diff --name-only origin/${{ github.base_ref }}..HEAD \
            -- 'content/blog/' | grep '\.md$' | head -1 \
            | sed 's|content/blog/||; s|\.md$||')
          echo "slug=${SLUG}" >> $GITHUB_OUTPUT

      - name: Start preview server
        run: npx next start &

      - name: Wait for server
        run: npx wait-on http://localhost:3000

      - name: Validate JSON-LD
        id: schemar
        uses: johnnyreilly/schemar@v0.1.1
        with:
          urls: "http://localhost:3000/blog/${{ steps.slug.outputs.slug }}/"

      - name: Format results as markdown
        id: format
        if: always()
        uses: actions/github-script@v9
        env:
          # Pass the output through env instead of inlining it into the script,
          # so an empty value cannot break the script's syntax
          RESULTS: ${{ steps.schemar.outputs.results }}
        with:
          script: |
            const results = JSON.parse(process.env.RESULTS || '[]');
            const lines = results.map((r) =>
              `${r.processedValidationResult.success ? '🟢' : '🔴'} ${r.url}: ${r.processedValidationResult.resultText}`
            );
            core.setOutput('comment', ['### JSON-LD validation', ...lines].join('\n'));

      - name: Post results as sticky PR comment
        if: always()
        uses: marocchino/sticky-pull-request-comment@v3
        with:
          header: json-ld-validation
          message: ${{ steps.format.outputs.comment }}

The fetch line writes an explicit refspec, origin/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}, instead of a bare git fetch origin main. actions/checkout@v7 defaults to a shallow, single-branch clone of the PR head, so a bare fetch only populates FETCH_HEAD and leaves no local origin/main ref for the diff to compare against. The explicit refspec creates that ref directly.

It still is not enough on its own. actions/checkout@v7's default depth-1 clone fetches a single commit (on pull_request events, the PR's merge commit), with no shared history to main in the local repository, so origin/main and HEAD have no common ancestor that git can find locally. A three-dot diff (origin/main...HEAD), which compares against the merge base, fails with fatal: no merge base in that state. The two-dot form above (origin/main..HEAD) compares the two tips directly and does not need one, so it works regardless of the checkout's fetch depth.
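
The shell pipeline keeps only the first changed file (head -1). If a PR can touch several posts, or none, the same filtering can be written as a small function that returns every slug and lets the caller decide what to do with an empty list. This is a sketch; slugsFromChangedFiles is a hypothetical helper that takes the text printed by git diff --name-only:

```javascript
// Maps `git diff --name-only` output to post slugs.
// contentDir defaults to the directory used throughout this workflow.
function slugsFromChangedFiles(diffOutput, contentDir = 'content/blog/') {
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((path) => path.startsWith(contentDir) && path.endsWith('.md'))
    .map((path) => path.slice(contentDir.length, -'.md'.length));
}

const diff = 'content/blog/new-post.md\nREADME.md\ncontent/blog/older-post.md\n';
console.log(slugsFromChangedFiles(diff)); // [ 'new-post', 'older-post' ]
```

An empty result is a useful signal in its own right: the job can exit early instead of requesting /blog// from the preview server.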

The header param on the sticky comment means each new push overwrites the previous result in place. The PR timeline stays clean.

Schemar's results output is Result[], a JSON array, not pre-formatted markdown: each entry carries a url and a processedValidationResult object with success and resultText fields, confirmed in schemar's action.yml. Passing that array straight to message posts raw JSON on the PR. The actions/github-script step in between maps each result to a one-line pass/fail row before it reaches the sticky comment, which is the same shape johnnyreilly's own writeup of the action uses for its PR comments.
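
The mapping is easier to test when it lives in a plain function. The sketch below assumes the result shape described above (url plus processedValidationResult with success and resultText); formatResults is a hypothetical name, and the fallback message for an empty array is an addition for illustration:

```javascript
// Turns a Result[] array into the markdown body of the PR comment.
function formatResults(results) {
  const heading = '### JSON-LD validation';
  if (!Array.isArray(results) || results.length === 0) {
    return `${heading}\nNo validation results were produced.`;
  }
  const lines = results.map((r) => {
    const outcome = r.processedValidationResult ?? {};
    const icon = outcome.success ? '🟢' : '🔴';
    return `${icon} ${r.url}: ${outcome.resultText ?? 'no result text'}`;
  });
  return [heading, ...lines].join('\n');
}

console.log(formatResults([
  {
    url: 'http://localhost:3000/blog/new-post-slug/',
    processedValidationResult: { success: true, resultText: 'ok' },
  },
]));
```

Handling the empty case explicitly means the sticky comment still says something useful when the validation step fails before producing output.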

Job 4: Lighthouse budget - serve the build locally, assert on LCP, CLS, and INP

treosh/lighthouse-ci-action runs Lighthouse CI against a locally served build and fails the job when any error-level assertion is not met.

Like the JSON-LD job, this downloads the artifact rather than running another build. It also uses the same get-slug step to discover the post URL from the diff, then generates .lighthouserc.json on the fly so no file needs manual editing per PR:

yaml

  lighthouse:
    name: Lighthouse budget
    runs-on: ubuntu-latest
    needs: meta-tags
    steps:
      - uses: actions/checkout@v7

      - uses: actions/setup-node@v6
        with:
          node-version: '20'
          cache: 'npm'

      - run: npm ci

      - name: Download build artifact
        uses: actions/download-artifact@v8
        with:
          name: next-build

      - name: Get new post slug
        id: slug
        run: |
          git fetch origin ${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}
          SLUG=$(git diff --name-only origin/${{ github.base_ref }}..HEAD \
            -- 'content/blog/' | grep '\.md$' | head -1 \
            | sed 's|content/blog/||; s|\.md$||')
          echo "slug=${SLUG}" >> $GITHUB_OUTPUT

      - name: Generate .lighthouserc.json
        run: |
          cat > .lighthouserc.json << EOF
          {
            "ci": {
              "collect": {
                "url": ["http://localhost:3000/blog/${{ steps.slug.outputs.slug }}/"],
                "startServerCommand": "npx next start",
                "startServerReadyPattern": "started server"
              },
              "assert": {
                "assertions": {
                  "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
                  "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }],
                  "total-blocking-time": ["warn", { "maxNumericValue": 300 }]
                }
              }
            }
          }
          EOF

      - name: Run Lighthouse CI
        uses: treosh/lighthouse-ci-action@v12
        with:
          uploadArtifacts: true
          temporaryPublicStorage: true
          configPath: .lighthouserc.json

The Generate .lighthouserc.json step uses a heredoc in which the Actions runner substitutes ${{ steps.slug.outputs.slug }} before the shell executes, so the generated file contains the literal slug, not a variable reference. startServerReadyPattern has to match a line that next start actually prints; if it never matches, Lighthouse CI waits out its ready timeout before collecting, so check the pattern against your Next.js version's startup output. LCP at or under 2500ms and CLS at or under 0.1 are Google's passing thresholds. Using "error" rather than "warn" is what causes the job to fail. Total blocking time is the closest lab-measurable proxy for INP; "warn" surfaces problems without blocking the merge on what is an approximation of a field metric. Tighten or relax the values as the site's performance baseline becomes clearer.
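
An alternative to the heredoc is to build the same configuration as an object and serialize it, which avoids hand-written JSON and makes an empty slug an explicit error. This is a sketch: buildLighthouseConfig is a hypothetical helper, and the values mirror the workflow above:

```javascript
// Builds the Lighthouse CI config for one post URL.
function buildLighthouseConfig(slug, origin = 'http://localhost:3000') {
  if (!slug) throw new Error('No post slug found; nothing to audit');
  return {
    ci: {
      collect: {
        url: [`${origin}/blog/${slug}/`],
        startServerCommand: 'npx next start',
        startServerReadyPattern: 'started server',
      },
      assert: {
        assertions: {
          'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
          'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
          'total-blocking-time': ['warn', { maxNumericValue: 300 }],
        },
      },
    },
  };
}

// Print the JSON; a workflow step could redirect this into .lighthouserc.json.
console.log(JSON.stringify(buildLighthouseConfig('new-post-slug'), null, 2));
```

Throwing on a missing slug fails the step with a clear message rather than letting Lighthouse audit the wrong URL.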

Wiring it into the PR so the merge button stays red

The four jobs above produce check runs on every PR. By default, GitHub does not prevent merging when a check fails. One configuration step makes them binding.

Required status checks in branch protection - the one setting that makes everything above binding

Go to repository Settings, then Branches. Add a branch protection rule for the branch content merges into, typically main. Under "Require status checks to pass before merging", add all four job names:

Broken links
Meta and canonical tags
JSON-LD validation
Lighthouse budget

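With these set as required, the merge button stays disabled until all four pass. A single failure keeps the PR locked regardless of approvals. One caveat: GitHub leaves required checks pending when their workflow is skipped by a paths filter, so a PR that touches nothing under content/blog/ would be blocked as well. If non-content PRs target the same branch, remove the paths filter and have each job exit early when no post changed.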

Without this step, the entire setup is advisory: the checks run and report, but nothing actually blocks the merge. This is the step most workflow tutorials omit.

Surfacing failures inline with sticky PR comments

The Schemar job's sticky comment puts JSON-LD results directly on the PR, with no need to open the Actions run page. For the other three jobs, the detailed report is one click away from the check status link: lychee writes to the GitHub job summary (via jobSummary: true), while the meta-tag script and Lighthouse CI report in the job log.

Make the meta-tag script output specific enough to act on immediately:

code

[FAIL] Missing meta description: /blog/new-post-slug/
[FAIL] Canonical mismatch: /blog/new-post-slug/
  Expected: https://yoursite.com/blog/new-post-slug/
  Found:    https://yoursite.com/blog/
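
To surface those failures as annotations on the check run rather than only in the log, the script can print GitHub Actions ::error workflow commands. This is a minimal sketch; toErrorAnnotation is a hypothetical helper, and the escaping follows the percent-encoding that workflow commands expect for newlines and separators:

```javascript
// Formats a failure as an `::error` workflow command. Newlines are escaped
// as %0A so a multi-line message stays inside a single command.
function toErrorAnnotation(title, message) {
  const escapeData = (s) =>
    s.replace(/%/g, '%25').replace(/\r/g, '%0D').replace(/\n/g, '%0A');
  const escapeProperty = (s) =>
    escapeData(s).replace(/:/g, '%3A').replace(/,/g, '%2C');
  return `::error title=${escapeProperty(title)}::${escapeData(message)}`;
}

console.log(toErrorAnnotation(
  'Canonical mismatch',
  '/blog/new-post-slug/\nExpected: https://yoursite.com/blog/new-post-slug/'
));
```

Printing this line alongside the existing [FAIL] output keeps the log readable while the annotation appears in the PR's checks view.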

Keeping it zero-maintenance: .lycheeignore, pinned action versions, and caching the build

Three habits prevent the workflow from becoming a source of noise.

Pin action versions to major version tags (@v2, @v7, @v12). Floating refs such as a default branch break without warning when upstream ships a breaking change. A major tag still moves within its major version, so pin to a full commit SHA where immutability matters. Check release pages when onboarding a new action; marocchino/sticky-pull-request-comment is at v3, for example.
Share the build output. The cache step in the meta-tags job preserves .next/cache between workflow runs, and the artifact upload carries the final output to the JSON-LD and Lighthouse jobs - one build per PR, three jobs consuming it.
Keep .lycheeignore current. As the blog grows, more code-block URLs and archived-page references need exclusion. A stale file generates false failures that train the team to dismiss CI output; update it when adding an exclusion-worthy URL.

Where the green check ends and editorial judgment begins

What four passing jobs actually confirm - and what they cannot

A green run confirms:

No link in the Markdown files under content/blog/ returns a 4xx or 5xx response
Every generated blog page has a meta description, a self-referencing canonical, and an og:title tag
The structured data on the new post validates against Schema.org
The new post clears the LCP and CLS thresholds under lab conditions, with total blocking time reported as a warning

What it does not confirm: whether the facts are correct, whether the post answers the question it sets up, or whether the prose is worth reading. CI has no opinion about those things.

This is the same division that makes automated content creation work without creating editorial risk: automate every check that has a clear pass/fail definition, leave judgment to people with context.

The split that works: CI owns technical correctness, humans own voice and facts

When a PR reaches human review with all four checks green, the reviewer does not need to wonder whether the canonical is pointing at itself or whether the link to the case study still resolves. CI answered those questions. The reviewer can focus on what CI cannot check: accuracy, voice, and whether the post actually serves the reader.

Combined with an earlier fact-checking step, this gives three layers: Lyra verifies claims and links before opening the PR, CI gates the technical surface, and human review handles editorial judgment. All three pass before the post ships.

For teams using a PR-based AI blog writer where an agent produces the first draft, the CI gate is especially useful. The agent drafts fast, the checks run in parallel, and the reviewer sees a PR already validated on both the technical and factual axes. Internal linking automation also benefits directly: the broken-link job confirms that any new cross-links added to a post actually resolve before they ship.

A blog post and a code change share the same vulnerability: technical bugs that pass human review because humans are not built to run validators in their heads. These four GitHub Actions jobs run the validators automatically, so a green merge means what it should.

Step by step

The short version

01
Add lychee-action for broken link checking
Create .github/workflows/blog-seo.yml. Add a job that runs lycheeverse/lychee-action on the Markdown files in content/blog/ before next build. Create a .lycheeignore file for URLs that should be excluded, such as localhost references in code blocks.
02
Parse built HTML for meta and canonical issues
Add a job that runs next build, checks each blog page for a meta description, self-referencing canonical, and Open Graph tags, then uploads the .next/ build as an artifact for downstream jobs to download instead of rebuilding.
03
Lint JSON-LD with Schemar
Download the build artifact from the meta-tags job, serve it locally, and pass the new blog post URL to the Schemar GitHub Action (johnnyreilly/schemar@v0.1.1). Format its results array into markdown with actions/github-script, then post it as a sticky PR comment via marocchino/sticky-pull-request-comment.
04
Assert Lighthouse Core Web Vitals with treosh/lighthouse-ci-action
Download the build artifact, serve it locally, and point treosh/lighthouse-ci-action at the new post URL via .lighthouserc.json. Set budget assertions for LCP under 2500ms and CLS under 0.1 as errors.
05
Require all four jobs in branch protection
Go to repository Settings, then Branches. Add a branch protection rule for your main branch. Under Require status checks to pass before merging, add all four job names. Without this step the workflow runs but nothing actually blocks the merge.

FAQ

Frequently asked

Can GitHub Actions catch SEO bugs in blog pull requests?

Yes. A four-job workflow can check for broken links before the build, validate meta and canonical tags in built HTML, lint JSON-LD structured data, and assert Lighthouse Core Web Vitals thresholds, all before the merge button turns green.

What is the best GitHub Action for checking broken links in Markdown?

lychee-action is fast and purpose-built: it is written in Rust, can check hundreds of links in under a minute (the lychee project benchmarks 576 links in about 60 seconds on the analysis-tools-dev/static-analysis repository), scans Markdown and HTML files, supports a .lycheeignore file for patterns to skip, and writes results to GitHub Job Summaries.

How do you validate JSON-LD structured data in a GitHub Actions workflow?

Schemar (johnnyreilly/schemar) wraps the Schema Markup Validator. Pass it the new blog post URL from a locally served build, format its results array into markdown, and post that as a sticky PR comment via marocchino/sticky-pull-request-comment, so the result stays visible as the PR evolves.

What Lighthouse thresholds should I set in CI for a blog?

Google's Core Web Vitals targets are LCP under 2.5 seconds, CLS under 0.1, and INP under 200 milliseconds. Lighthouse cannot measure INP directly in lab conditions, so assert total-blocking-time as the proxy instead (300ms is a reasonable warning threshold). treosh/lighthouse-ci-action supports LHCI assertions that fail the build when any threshold is exceeded.

