Engineering

Programmatic SEO in Next.js and Astro: a developer's guide

Generate SEO pages from data in Next.js (generateStaticParams) or Astro (Content Layer), gated by code so the set doesn't trip Google's spam filter.

By Mitrasish, Co-founder · Jul 2, 2026 · 14 min read
The programmatic SEO playbook tells you the rule: one unique, verifiable data point per page, or the set doesn't survive. It doesn't tell you where that check lives in your code. This is the part after the strategy meeting: the actual generateStaticParams function, the actual Astro loader, the actual sitemap chunking, with the guardrails written as code instead of as a policy someone has to remember to follow.

If you haven't read the strategy layer yet, start with programmatic SEO for SaaS for the data-to-template model and the 2026 scaled-content-abuse crackdown for why the unique-value bar exists at all. This post assumes you already believe both and want to know what to type into app/ or src/pages/.

Why the strategy playbook isn't the same as the build

A strategy doc says "gate thin pages before they publish." Code either does that or it doesn't. The gap between the two is where most programmatic sets actually fail: the team agreed on the unique-value rule in a meeting, then the route-generation function ships every row in the dataset regardless, because nobody wired the rule into the function that decides what gets a URL.

The fix isn't a separate QA pass after the pages exist. It's making the completeness check part of the same function that returns your route list, so a page that fails the test never gets built in the first place. That's the shape of every code example below: the gate is in the data layer, not bolted on after.

The data model: one dataset, one template, one page per row

Every programmatic set reduces to the same three pieces: a dataset, a template, and a mapping from row to route. Get the dataset schema right and the framework-specific code is mechanical. Get it wrong and no amount of generateStaticParams tuning saves you.

A minimal row needs a stable slug and at least one field that exists because you produced it, not because you templated it:

ts
type Integration = {
  slug: string;
  name: string;
  category: string;
  setupSteps: string[];
  screenshotUrl: string;
  lastVerifiedAt: string; // the field that makes this page worth indexing
};

lastVerifiedAt (or a real screenshot, a live price, a computed comparison) is the field the strip-the-wrapper test checks for. If a row is missing it, the page is a template with nothing behind it, and the code below should never generate a route for it.
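One way to make that rule auditable is to have the check report why a row was dropped, so the build log shows what is missing instead of silently shrinking the route list. The sketch below is a minimal illustration, not a prescribed implementation: the helper names missingValue and partitionRows are hypothetical, and the three-step minimum and the date-parse check are assumptions you would tune to your own dataset.

```typescript
type Integration = {
  slug: string;
  name: string;
  category: string;
  setupSteps: string[];
  screenshotUrl: string;
  lastVerifiedAt: string;
};

// Hypothetical helper: lists the reasons a row fails the unique-value test.
// An empty array means the row is publishable.
function missingValue(row: Integration): string[] {
  const reasons: string[] = [];
  if (row.setupSteps.length < 3) reasons.push("fewer than 3 setup steps");
  if (!row.screenshotUrl) reasons.push("no screenshot");
  // Date.parse returns NaN for an empty or malformed date string.
  if (Number.isNaN(Date.parse(row.lastVerifiedAt))) {
    reasons.push("no valid lastVerifiedAt");
  }
  return reasons;
}

// Splits the dataset into rows that may get a route and rows that may not.
function partitionRows(rows: Integration[]) {
  const publishable: Integration[] = [];
  const rejected: { slug: string; reasons: string[] }[] = [];
  for (const row of rows) {
    const reasons = missingValue(row);
    if (reasons.length === 0) publishable.push(row);
    else rejected.push({ slug: row.slug, reasons });
  }
  return { publishable, rejected };
}
```

Only the publishable half should ever feed a route generator; the rejected half is a to-do list for whoever owns the dataset.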

Generating pages in Next.js: generateStaticParams, dynamicParams, and partial prerendering

generateStaticParams is the App Router function for statically generating multiple versions of a dynamic route from an external data source at build time. It explicitly replaces getStaticPaths from the Pages Router, and it's the whole engine of programmatic SEO in Next.js: one function, one dataset query, one page per row.

tsx
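// app/integrations/[slug]/page.tsx
import { notFound } from "next/navigation";
import { getIntegrations, getIntegration } from "@/lib/integrations";
// Assumed locations: adjust to wherever the type and component live in your repo.
import type { Integration } from "@/lib/integrations";
import { IntegrationPage } from "@/components/IntegrationPage";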

function hasUniqueValue(integration: Integration) {
  return (
    integration.setupSteps.length >= 3 &&
    Boolean(integration.screenshotUrl) &&
    Boolean(integration.lastVerifiedAt)
  );
}

export async function generateStaticParams() {
  const integrations = await getIntegrations();
  return integrations.filter(hasUniqueValue).map((i) => ({ slug: i.slug }));
}

export const dynamicParams = false;

export default async function Page({
  params,
}: {
  params: Promise<{ slug: string }>;
}) {
  const { slug } = await params;
  const integration = await getIntegration(slug);
  if (!integration) notFound();
  return <IntegrationPage integration={integration} />;
}

The gate is hasUniqueValue, run inside generateStaticParams itself. A row that fails it never becomes a route, which is a stronger guarantee than a noindex tag on a page that still technically exists and still gets crawled.

Prerender everything at build time vs. a subset now and the rest on demand

You don't have to choose between "build every page now" and "build nothing until someone visits it." Next.js explicitly supports a middle option: return a subset from generateStaticParams, leave dynamicParams at its default true, and the rest render (and cache) the first time they're requested.

tsx
export async function generateStaticParams() {
  const integrations = await getIntegrations();
  // Prerender the 500 integrations with real search demand now.
  // dynamicParams stays true, so the long tail renders on first visit.
  return integrations
    .filter(hasUniqueValue)
    .slice(0, 500)
    .map((i) => ({ slug: i.slug }));
}

This matters for build time on a set in the thousands: you don't wait for every row to prerender before shipping, and rows with lower expected traffic still get built the moment someone (or a crawler) actually asks for them. One caveat worth knowing before you rely on it: during ISR revalidation, generateStaticParams is not called again, so a dataset refresh that adds new rows needs a fresh build, or needs dynamicParams left true so the new rows can still render on first visit between builds.
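Note that .slice(0, 500) only selects the high-demand rows if the dataset already arrives sorted by demand. A minimal sketch of making that ordering explicit, assuming a hypothetical monthlySearches field on each row (the field name and the helper pickPrerenderSet are illustrative, not part of Next.js):

```typescript
// monthlySearches is an assumed field; use whatever demand signal you have.
type DemandRow = { slug: string; monthlySearches: number };

// Returns params for the `limit` rows with the highest demand.
// Copies the array first so the caller's dataset order is left untouched.
function pickPrerenderSet(
  rows: DemandRow[],
  limit: number,
): { slug: string }[] {
  return [...rows]
    .sort((a, b) => b.monthlySearches - a.monthlySearches)
    .slice(0, limit)
    .map((row) => ({ slug: row.slug }));
}
```

Inside generateStaticParams this would run on the rows that already passed hasUniqueValue, in place of the bare slice.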

dynamicParams = false as a hard gate against thin, unplanned pages

dynamicParams = false is the enforcement mechanism, not a performance setting. With it set, only the paths generateStaticParams returned are served; any other value 404s instead of silently rendering. Without it, anyone who guesses or scrapes a plausible slug gets a live, indexable page that never passed your unique-value filter, because Next.js will happily render and cache it on demand.

For a fully gated set, that means dynamicParams = false plus a route list built exclusively from rows that already passed hasUniqueValue. The dataset query becomes the actual definition of what's live on the site, enforced by the framework, not by a checklist someone has to remember to run.
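If you do leave dynamicParams at true for the long tail, the gate has to move into the page itself, because on-demand renders never pass through generateStaticParams. A minimal sketch of that decision as a pure function, assuming the same completeness rule as hasUniqueValue; the name decideRender and the RenderDecision type are hypothetical:

```typescript
type GatedIntegration = {
  slug: string;
  setupSteps: string[];
  screenshotUrl: string;
  lastVerifiedAt: string;
};

function hasUniqueValue(integration: GatedIntegration): boolean {
  return (
    integration.setupSteps.length >= 3 &&
    Boolean(integration.screenshotUrl) &&
    Boolean(integration.lastVerifiedAt)
  );
}

type RenderDecision =
  | { kind: "render"; integration: GatedIntegration }
  | { kind: "not-found" };

// Unknown slugs and thin rows are treated the same way: no page.
function decideRender(
  integration: GatedIntegration | null | undefined,
): RenderDecision {
  if (!integration || !hasUniqueValue(integration)) {
    return { kind: "not-found" };
  }
  return { kind: "render", integration };
}
```

In the page component, a "not-found" result would map to a notFound() call before anything renders, so a guessed slug for an incomplete row gets a 404 instead of a cached thin page.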

Generating pages in Astro: Content Layer loaders, getStaticPaths, and Server Islands

Astro's model is the same shape with different names. The Content Layer API decouples a collection from where its data lives: one collection can be local Markdown, another can call a REST API, another can read a database, all queried through the same type-safe getCollection() / getEntry() calls. getStaticPaths() then reads that collection and returns one route per entry, same as generateStaticParams.

ts
// src/content.config.ts
import { defineCollection, z } from "astro:content";
import { integrationsLoader } from "./loaders/integrations";

const integrations = defineCollection({
  loader: integrationsLoader({ apiUrl: import.meta.env.INTEGRATIONS_API_URL }),
  schema: z.object({
    name: z.string(),
    category: z.string(),
    setupSteps: z.array(z.string()),
    screenshotUrl: z.string().url(),
    lastVerifiedAt: z.string(),
  }),
});

export const collections = { integrations };

astro
---
// src/pages/integrations/[slug].astro
import { getCollection } from "astro:content";

export async function getStaticPaths() {
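  // Same three conditions as hasUniqueValue in the Next.js example.
  const integrations = await getCollection(
    "integrations",
    (entry) =>
      entry.data.setupSteps.length >= 3 &&
      Boolean(entry.data.screenshotUrl) &&
      Boolean(entry.data.lastVerifiedAt)
  );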
  return integrations.map((entry) => ({
    params: { slug: entry.id },
    props: { integration: entry },
  }));
}

const { integration } = Astro.props;
---

The filter callback passed to getCollection() is the Astro equivalent of hasUniqueValue: the unique-value test runs before getStaticPaths() ever returns a route for that row.

Custom loaders for API- or database-backed collections (not just Markdown)

The built-in glob() and file() loaders cover local Markdown and JSON. A programmatic set backed by a database or an API needs a custom loader, which is a plain object with a name and a load() function that populates a store:

ts
// src/loaders/integrations.ts
import type { Loader } from "astro/loaders";

export function integrationsLoader({ apiUrl }: { apiUrl: string }): Loader {
  return {
    name: "integrations-loader",
    load: async ({ store, meta, parseData, logger }) => {
      const lastSync = meta.get("lastSync");
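      // Encode the timestamp and fail the build loudly on a bad response.
      const res = await fetch(`${apiUrl}?since=${encodeURIComponent(lastSync ?? "")}`);
      if (!res.ok) throw new Error(`integrations sync failed: ${res.status}`);
      const rows = await res.json();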

      logger.info(`syncing ${rows.length} integrations`);
      for (const row of rows) {
        const data = await parseData({ id: row.slug, data: row });
        store.set({ id: row.slug, data });
      }
      meta.set("lastSync", new Date().toISOString());
    },
  };
}

meta is a key-value store scoped to the collection, meant for sync tokens and last-modified times, so a loader can make conditional requests instead of re-fetching the whole dataset on every build. That matters directly at programmatic scale: Astro 5's Content Layer claims up to 5x faster Markdown builds, 2x faster MDX builds, and 25 to 50% lower memory use than the legacy content collections API, a difference you'll feel once a set gets into the thousands of pages.
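An incremental sync has one trap worth sketching: a since-based request only returns rows that changed, so a row deleted upstream lingers wherever the store persists between builds, unless the API reports deletions. The sketch below simulates the idea with plain Maps standing in for the loader's store and meta. Those stand-ins, the deleted tombstone flag, and the helper names are all assumptions for illustration, not Astro's actual types or your API's contract.

```typescript
// A changed row, or a tombstone when `deleted` is true (assumed API shape).
type SyncRow = { slug: string; name: string; deleted?: boolean };

// Plain-Map stand-ins for the loader's store and meta objects.
type SyncStore = Map<string, SyncRow>;
type SyncMeta = Map<string, string>;

// Builds the conditional request URL; omits `since` on the first sync.
function buildSyncUrl(apiUrl: string, lastSync: string | undefined): string {
  if (!lastSync) return apiUrl;
  return `${apiUrl}?since=${encodeURIComponent(lastSync)}`;
}

// Applies a delta: upserts changed rows, removes tombstoned ones,
// then records the sync time for the next conditional request.
function applyDelta(
  store: SyncStore,
  meta: SyncMeta,
  rows: SyncRow[],
  syncedAt: string,
): void {
  for (const row of rows) {
    if (row.deleted) store.delete(row.slug);
    else store.set(row.slug, row);
  }
  meta.set("lastSync", syncedAt);
}
```

If your API cannot report deletions, the safer fallback is a periodic full refetch that clears the store first.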

One constraint worth internalizing either way: the Content Layer's data store is populated only at build time. A deployed static site cannot mutate it, the same as generateStaticParams's fully-static path in Next.js. A dataset refresh needs a rebuild, full stop, unless you're using Astro's on-demand rendering for that route.

When a Server Island belongs on an otherwise static page

Most fields on a programmatic page (name, category, setup steps, the verified screenshot) are exactly the kind of thing a static Content Layer collection should render: stable, cacheable, safe to prerender. A live status check or a per-visitor detail is not, and forcing the whole page to render on demand just for that one field throws away the performance win of static generation.

Astro's Server Islands solve that split. A component marked server:defer renders on the server after the static shell has already shipped, with fallback content in the meantime:

astro
---
// src/pages/integrations/[slug].astro
import LiveStatus from "../../components/LiveStatus.astro";

// Same prop passed from getStaticPaths() in this file.
const { integration } = Astro.props;
---
<article>
  <h1>{integration.data.name} integration</h1>
  <p>Verified {integration.data.lastVerifiedAt}</p>

  <LiveStatus slug={integration.id} server:defer>
    <p slot="fallback">Checking live status...</p>
  </LiveStatus>
</article>

The page around it, the part built from your Content Layer collection, stays fully static and fast. Only the island that genuinely needs fresh data pays the on-demand cost, which is the right trade for a set where 99% of the content per page doesn't change between builds.

The unique-value test, enforced in code, not just in principle

Google's spam policy defines scaled content abuse as pages "generated for the primary purpose of manipulating search rankings and not helping users," and is explicit that this applies "no matter how it's created," by automation, humans, or a combination of the two. The policy doesn't care whether your unique-value rule exists in a Notion doc. It cares whether the page that got indexed actually has something in it.

Which means the filter has to run in the same function that decides what gets a route, both in generateStaticParams and in getStaticPaths, the same hasUniqueValue check shown above in both frameworks. A row that fails it should not get a URL at all if you can help it; if you can't fully exclude it (say, a row that's incomplete today but will be complete next sync), noindex it explicitly instead of leaving it to chance:

tsx
import type { Metadata } from "next";

export async function generateMetadata({
  params,
}: {
  params: Promise<{ slug: string }>;
}): Promise<Metadata> {
  const { slug } = await params;
  const integration = await getIntegration(slug);
  const isThin = !integration || !hasUniqueValue(integration);

  return {
    title: integration ? `${integration.name} integration` : "Integration",
    robots: { index: !isThin, follow: true },
  };
}

generateMetadata is the code-level mechanism Next.js provides for setting robots: { index: false } per route. Astro's equivalent is a conditional <meta name="robots"> tag in the page's <head>, with the condition computed in the frontmatter block from the same hasUniqueValue check. Either way, a thin row that's still under construction gets a working page nobody indexes, instead of adding to a pile of pages Search Console will eventually mark "Crawled - currently not indexed", its status for a page Google crawled and chose not to index, possibly permanently.
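On the Astro side, the same decision can live in a small helper the frontmatter calls. This is a sketch under assumptions: pageMeta is a hypothetical name, and the completeness rule simply mirrors hasUniqueValue.

```typescript
type MetaIntegration = {
  name: string;
  setupSteps: string[];
  screenshotUrl: string;
  lastVerifiedAt: string;
};

// Mirrors the generateMetadata logic: missing or thin rows are noindexed
// but still followed, so links on the page keep passing through.
function pageMeta(integration: MetaIntegration | undefined): {
  title: string;
  robots: string;
} {
  const isThin =
    !integration ||
    integration.setupSteps.length < 3 ||
    !integration.screenshotUrl ||
    !integration.lastVerifiedAt;
  return {
    title: integration ? `${integration.name} integration` : "Integration",
    robots: isThin ? "noindex, follow" : "index, follow",
  };
}
```

The frontmatter would call pageMeta(integration.data) and the template would render the returned robots string as the content attribute of the robots meta tag.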

Sitemap and internal linking generated from the same dataset

The sitemap and the route list have to come from the same query, or they drift. If your sitemap generator runs a different filter than generateStaticParams, you'll either submit URLs that 404 or omit pages that are actually live. Query the dataset once, apply hasUniqueValue once, and feed both the route generator and the sitemap from that result.
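One way to make drift structurally impossible is to derive both outputs from a single filtered list. A minimal sketch, assuming a hypothetical buildPublishedSet helper and an /integrations/ URL prefix:

```typescript
type PublishedIntegration = {
  slug: string;
  setupSteps: string[];
  screenshotUrl: string;
  lastVerifiedAt: string;
};

function hasUniqueValue(integration: PublishedIntegration): boolean {
  return (
    integration.setupSteps.length >= 3 &&
    Boolean(integration.screenshotUrl) &&
    Boolean(integration.lastVerifiedAt)
  );
}

// Filters once, then derives route params and sitemap entries from the
// same array, so the two lists always contain the same slugs.
function buildPublishedSet(rows: PublishedIntegration[], baseUrl: string) {
  const live = rows.filter(hasUniqueValue);
  return {
    params: live.map((i) => ({ slug: i.slug })),
    sitemap: live.map((i) => ({
      url: `${baseUrl}/integrations/${i.slug}`,
      lastModified: i.lastVerifiedAt,
    })),
  };
}
```

The route generator would return the params half and the sitemap file the sitemap half, both from the same call signature, so a change to the filter cannot reach one without the other.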

app/sitemap.ts, generateSitemaps, and the 50,000-URL sitemap limit

A single XML sitemap file is capped at 50,000 URLs by the sitemap protocol Google enforces. Next.js's own generateSitemaps documentation codes directly against that number, chunking a product table into ranges of 50,000 rows per generated file, each served at a URL like /product/sitemap/[id].xml:

ts
// app/integrations/sitemap.ts
import type { MetadataRoute } from "next";
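// Assumes hasUniqueValue has been moved into the shared module so the
// page and the sitemap import the same function.
import {
  getIntegrationsCount,
  getIntegrations,
  hasUniqueValue,
} from "@/lib/integrations";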

export async function generateSitemaps() {
  const count = await getIntegrationsCount();
  return Array.from({ length: Math.ceil(count / 50000) }, (_, id) => ({ id }));
}

export default async function sitemap({
  id,
}: {
  id: number;
}): Promise<MetadataRoute.Sitemap> {
  const start = id * 50000;
  const rows = await getIntegrations({ offset: start, limit: 50000 });
  return rows
    .filter(hasUniqueValue)
    .map((i) => ({
      url: `https://example.com/integrations/${i.slug}`,
      lastModified: i.lastVerifiedAt,
    }));
}

If your set genuinely grows past 50,000 URLs, know the next ceiling too: a sitemap index file can reference up to 50,000 individual sitemaps, and a single Search Console property can submit up to 500 sitemap index files. Almost no programmatic set needs to think about that second number, but it's the actual wall if yours does.
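Chunking by raw row offsets and filtering inside each chunk stays under the limit, but it can produce sparse or even empty sitemap files when many rows fail the filter. An alternative is to filter first and chunk the publishable list. The sketch below shows just the arithmetic; the helper names are hypothetical and the limit is a parameter so it can be exercised with small numbers.

```typescript
const SITEMAP_URL_LIMIT = 50_000;

// One id per sitemap file needed for `publishableCount` URLs.
function sitemapIds(
  publishableCount: number,
  limit: number = SITEMAP_URL_LIMIT,
): { id: number }[] {
  const files = Math.ceil(publishableCount / limit);
  return Array.from({ length: files }, (_, id) => ({ id }));
}

// The slice of already-filtered rows that belongs in sitemap file `id`.
function sitemapChunk<T>(
  publishable: T[],
  id: number,
  limit: number = SITEMAP_URL_LIMIT,
): T[] {
  return publishable.slice(id * limit, (id + 1) * limit);
}
```

For example, five publishable rows with a limit of two yield three files, the last holding a single URL. The trade-off is that the count query has to apply the same filter as the page query.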

Astro's official @astrojs/sitemap integration generates the sitemap automatically from your routes, including dynamic ones built via getStaticPaths(), and chunks past its default entryLimit of 45,000 URLs into a sitemap-index.xml plus numbered sitemap-0.xml files:

js
// astro.config.mjs
import { defineConfig } from "astro/config";
import sitemap from "@astrojs/sitemap";

export default defineConfig({
  site: "https://example.com",
  integrations: [
    sitemap({
      filter: (page) => !page.includes("/integrations/draft-"),
    }),
  ],
});

The filter option is where the same unique-value logic belongs again, so a page you've excluded from getStaticPaths (or noindexed) doesn't still ride along in the sitemap. For the internal-linking half of this, build the hub-and-spoke links (a category page linking to every integration in it, each integration linking back to its category and to two or three siblings) from the same collection query, not a hand-maintained list. Internal linking automation covers the anchor-diversity and orphan-prevention rules that apply once you're generating links at this scale, which matter more, not less, once a template is producing them for you.
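The hub-and-spoke structure can be computed from the same filtered list. A minimal sketch, assuming each entry exposes a slug and a category; buildLinkPlan and the ring-based sibling choice are illustrative, not a prescribed algorithm:

```typescript
type LinkEntry = { slug: string; category: string };

type LinkPlan = {
  hubs: Map<string, string[]>; // category -> every slug in it
  siblings: Map<string, string[]>; // slug -> sibling slugs to link to
};

function buildLinkPlan(entries: LinkEntry[], siblingCount = 3): LinkPlan {
  // Group slugs by category: each group is the link list for one hub page.
  const hubs = new Map<string, string[]>();
  for (const entry of entries) {
    const list = hubs.get(entry.category) ?? [];
    list.push(entry.slug);
    hubs.set(entry.category, list);
  }

  // Treat each category as a ring and link every page to the next few
  // pages after it, so every page in a category with two or more entries
  // receives inbound sibling links.
  const siblings = new Map<string, string[]>();
  hubs.forEach((slugs) => {
    slugs.forEach((slug, i) => {
      const picks: string[] = [];
      for (
        let step = 1;
        step < slugs.length && picks.length < siblingCount;
        step++
      ) {
        const target = slugs[(i + step) % slugs.length];
        if (target !== undefined) picks.push(target);
      }
      siblings.set(slug, picks);
    });
  });

  return { hubs, siblings };
}
```

A category with a single entry gets no sibling links here, which is a useful signal in itself: a hub with one spoke is probably not worth a hub page.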

Guardrails that keep the set out of the scaled-content-abuse bucket

Everything above composes into a small set of rules, each enforceable in code:

| Guardrail | Where it lives |
| --- | --- |
| Unique-value test | Inside generateStaticParams / getStaticPaths, filtering the route list itself |
| noindex for thin-but-not-excludable rows | generateMetadata's robots field, or an equivalent conditional meta tag in Astro |
| Sitemap matches the live route set | Both built from the same dataset query and filter, never a separate list |
| 50,000-URL sitemap ceiling | generateSitemaps chunking in Next.js, entryLimit/chunks in @astrojs/sitemap |
| No dead or unplanned routes | dynamicParams = false in Next.js; a filtered getStaticPaths() in Astro |

None of these are exotic. They're the ordinary levers both frameworks already expose for dynamic routing, pointed at the specific failure mode Google's policy targets. The same failure mode has a quieter cost even when Google never penalizes it directly: a poorly gated programmatic set competing against your own stronger editorial pages for the same terms is keyword cannibalization at scale, one template accidentally burying the guide you actually wanted ranking.

Pair the code gates with automated checks that run on every pull request: a broken-link scan, a canonical and meta check on the built HTML, JSON-LD validation. GitHub Actions SEO checks covers the four-job version of that workflow, which catches the class of bug that doesn't fail a type check but still ships quietly, exactly the risk profile of a batch of generated pages.

Where a human review gate plugs into this pipeline

Code gates decide what's structurally allowed to exist. They don't decide whether the content on a page is actually correct, and Google's scaled-content-abuse policy doesn't distinguish between "generated a thin page" and "generated a page with a wrong number on it." Both fail the same "helping users" test the policy is built around. A generation pipeline still needs a step that checks the facts and a person who can reject a batch before it merges.

That's a fact-check pass, not a vibe check: pull every claim and link a generated batch produces and confirm each one against a live source before publish. How AI content fact-checking works covers the mechanics of running that as a hard gate rather than a confidence score the generator assigns itself.

If your blog already lives in a Git repo, the natural place for that review is the pull request the batch opens, the same review surface you already use for code. That's the model Lyra runs on: she writes in your repo's existing voice, fact-checks every claim and link before anything ships, and opens a pull request you review and merge yourself, nothing auto-publishes. For a programmatic set specifically, the same discipline applies at the batch level: score it, fact-check it, and require a human "yes" before the pages go live. If you want to talk through what that review gate would look like on your dataset, talk to the founder or join the waitlist.

Programmatic SEO in Next.js or Astro survives on the same rule either way: gate the page in the function that generates it, not after. Lyra applies that same discipline to every post she writes, fact-checked and scored before the pull request ever reaches you.

Step by step

The short version

1. Model the dataset before the template

   Define the row shape first: every entity needs a stable slug and at least one field that only exists because you produced it (a live number, a screenshot, a verified date). If that field is missing, the page has no reason to exist.

2. Wire the dataset into generateStaticParams or a Content Layer loader

   In Next.js, generateStaticParams reads the dataset and returns one params object per row. In Astro, a Content Layer loader populates a collection and getStaticPaths() reads it. Both replace hand-written route files with one template plus a query.

3. Filter the route list on the unique-value test before it ships

   Run the same completeness check that decides indexability inside the function that returns your route list, so an incomplete row never gets a URL in the first place, rather than getting one and merely ranking poorly.

4. Gate unplanned URLs with dynamicParams = false

   Set dynamicParams = false so only the rows you explicitly generated exist as pages. Anything else 404s. This turns your dataset query into the actual, enforced list of what's live, not a suggestion.

5. Generate the sitemap from the same query, chunked at the platform limit

   Build app/sitemap.ts (with generateSitemaps for large sets) or configure @astrojs/sitemap from the identical dataset query used to render the pages, so the sitemap and the live route list can never drift apart.

6. Route every generated batch through a human review gate

   Score and review a new batch of pages before merging, the same way you'd review a code change. Automated generation is fine; automated publishing with nobody able to reject a bad batch is how a set ends up as the low-value content Google's policy targets.

FAQ

Frequently asked

How do you generate SEO pages from data in Next.js?

Write a generateStaticParams function that reads your dataset (an API, a database, a CMS) and returns one params object per row. Next.js statically renders one page per object at build time. Set dynamicParams = false in the same route segment so any slug not in that list 404s instead of rendering on demand.

What does dynamicParams = false actually do?

It restricts a dynamic route to only the paths generateStaticParams returned. Without it, Next.js will render and cache any other value for that segment the first time someone requests it, which means an unplanned or unreviewed page can go live purely because a URL got typed or guessed. With it, generateStaticParams becomes the single gate for what exists.

Does Astro support programmatic SEO the same way Next.js does?

Yes, with different building blocks. Astro's Content Layer API lets a collection load from Markdown, an API, or a database through a loader, and getStaticPaths() in a dynamic route reads that collection to generate one page per entry, the same one-dataset-one-template-one-page model as generateStaticParams.

How do you keep a programmatic page set out of Google's scaled content abuse policy?

Enforce the unique-value test in the code that generates the route list, not as a design guideline: inside generateStaticParams or getStaticPaths, filter out any row missing the field that makes the page worth indexing, and noindex the rows you can't exclude outright. Pair that with a human review gate on the batch, since Google's policy explicitly targets low-value pages "no matter how it's created."

What is the sitemap URL limit for a large programmatic page set?

50,000 URLs per sitemap file, per the sitemap protocol Google enforces. Next.js handles this with a generateSitemaps function that chunks output into files served at /product/sitemap/[id].xml; Astro's @astrojs/sitemap integration does it automatically past its entryLimit (45,000 by default) or via a custom chunks function.

Tags: Programmatic SEO Next.js · Programmatic SEO Astro · generateStaticParams · Astro Content Layer · Generate SEO Pages From Data · Technical SEO