Claude API cost per blog post: the real token math
Claude API cost per blog post, broken down into research, draft, review, and iteration tokens, at 2026 Anthropic pricing, to budget the BYOK model.

A blog post written through a research-draft-review-iteration pipeline on Claude Sonnet 5 costs somewhere between $0.60 and $1.10 in raw API tokens at current Anthropic pricing, before prompt caching brings it down. That number will not show up in any comparison of Jasper, Surfer, or Byword's subscription tiers, because none of those tools bill by the token; they bill by the seat or the document quota. If you bring your own Anthropic key, the bill you actually see is metered, itemized, and nothing like a flat $49-$999/mo line item. This is the math behind that bill, stage by stage.
A single post is not one API call. It is a pipeline of stages, each one a separate conversation with the model, each one spending input and output tokens at different rates depending on what it is doing. Understanding the cost means breaking the post apart the way the pipeline actually runs it, not treating it as one lump "write me a blog post" prompt.
Research. The model searches for the topic, reads competing pages, and pulls source material into context. Web search is billed at $10 per 1,000 searches (one cent each) on top of standard token pricing, and every fetched page adds input tokens: Anthropic's own numbers put an average 10 kB web page at about 2,500 tokens and a 100 kB documentation page at about 25,000 tokens. Six or seven fetched sources for one post add up fast (at roughly 2,500 tokens per average page, that is already 15,000 or more input tokens), and none of it produces output text yet.
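The research-stage arithmetic can be sketched in a few lines. This is a rough estimate, not a billing formula: the function name is hypothetical, and the linear 250-tokens-per-kB scaling is an assumption extrapolated from the two page sizes quoted above.

```python
SEARCH_FEE_PER_QUERY = 10 / 1000   # $10 per 1,000 searches
TOKENS_PER_KB = 250                # assumption: ~2,500 tokens per 10 kB page, scaled linearly


def research_fetch_estimate(searches: int, page_sizes_kb: list[float]) -> tuple[float, int]:
    """Return (search fee in dollars, input tokens added by fetched pages)."""
    search_fee = searches * SEARCH_FEE_PER_QUERY
    fetched_tokens = round(sum(page_sizes_kb) * TOKENS_PER_KB)
    return search_fee, fetched_tokens


# Hypothetical post: 6 searches, five ordinary pages and one large docs page.
fee, tokens = research_fetch_estimate(6, [10, 10, 10, 10, 10, 100])
print(f"search fee ${fee:.2f}, fetched input tokens {tokens:,}")
```

One large documentation page dominates the estimate here, which is why source count alone is a poor predictor of research cost.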
Draft. The model reads the target repo's house style, its existing posts (to match voice), and the research notes, then writes the post. This is where output tokens dominate: the 2026 industry-average blog post runs about 1,427 words, up from 1,236 in 2023, and the SEO-recommended range sits at 1,500-2,000 words, in line with Backlinko's finding that top-ranking pages average 1,447 words. That word count becomes tokens, plus markdown formatting, frontmatter, and any tool calls used to write the file.
Review. A second, independent pass reads the draft, re-fetches every external link and every cited source to confirm it says what the draft claims, and checks pricing or statistics against a live page if one is configured. This stage looks a lot like research in token shape (fetch-heavy input, light output) but starts from a bigger base, because it has to load the whole draft first. We cover the mechanics of this pass in how AI content fact-checking actually works.
Iteration. The draft goes back to the writer with specific line comments to fix. Each round reloads the draft, the repo context, and the review notes, then writes a smaller, targeted output. This is the stage with the most variance, because how many rounds a post needs is the one thing in this list that is not fixed by post length or source count.
The naive way to estimate cost is (input tokens x input price) plus (output tokens x output price). That undercounts what actually lands on the invoice in three specific ways.
First, an agentic pipeline is multi-turn, not one call. Each turn in a tool-use loop resends the accumulated conversation, so a stage that runs eight or ten turns does not pay for the final input once; it pays for a growing context on every turn until that stage ends. This is the single biggest reason a "four-stage" post looks cheap on paper and costs more in practice: it is really dozens of API calls, not four.
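A minimal sketch of that growth, assuming no prompt caching and a fixed number of tokens added per turn (both simplifications; the function and the numbers in the example are hypothetical):

```python
def cumulative_input_tokens(base_context: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens billed when every turn resends the whole conversation so far."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # this turn pays for everything accumulated so far
        context += added_per_turn   # tool results and replies grow the next turn's input
    return total


# Hypothetical stage: 4,000-token starting context, 1,500 tokens added per turn, 10 turns.
billed = cumulative_input_tokens(4_000, 1_500, 10)
final_turn_input = 4_000 + 1_500 * 9
print(f"billed input: {billed:,} tokens; final turn alone: {final_turn_input:,} tokens")
```

In this example the stage is billed for 107,500 input tokens even though the last request only carries 17,500, roughly six times what a single-call estimate would suggest.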
Second, tools carry their own token overhead before they do anything. A tool-use system prompt adds roughly 350-475 tokens per Sonnet 5 request depending on tool choice mode, the bash tool adds 245 input tokens per call, and the text editor tool adds 700. None of that is "the post," and all of it bills at standard input rates.
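To get a feel for the size of that overhead, here is a rough sketch using the figures quoted above. It assumes every request carries the tool-use system prompt plus each enabled tool, which may overstate or understate a real run; the function name and the 40-request example are hypothetical.

```python
INPUT_PRICE_PER_TOKEN = 2.00 / 1_000_000  # Sonnet 5 input rate quoted in this post

# Overhead figures quoted above, in input tokens.
TOOL_SYSTEM_PROMPT_TOKENS = 350   # low end of the 350-475 range
BASH_TOOL_TOKENS = 245
TEXT_EDITOR_TOOL_TOKENS = 700


def tool_overhead(requests: int, use_bash: bool = True, use_editor: bool = True) -> tuple[int, float]:
    """Return (overhead tokens, overhead dollars) across `requests` API calls.

    Assumption: every request pays the system prompt plus each enabled tool.
    """
    per_request = TOOL_SYSTEM_PROMPT_TOKENS
    if use_bash:
        per_request += BASH_TOOL_TOKENS
    if use_editor:
        per_request += TEXT_EDITOR_TOOL_TOKENS
    tokens = requests * per_request
    return tokens, tokens * INPUT_PRICE_PER_TOKEN


# Hypothetical run: 40 requests across all four stages.
tokens, dollars = tool_overhead(40)
print(f"{tokens:,} overhead tokens, about ${dollars:.2f}")
```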
Third, the newer tokenizer used by Sonnet 5 and by Opus 4.7 and later produces about 30% more tokens for the same text than the tokenizer used by Sonnet 4.6 and earlier models, according to Anthropic's pricing documentation. A lower per-token rate does not automatically mean a lower per-post cost if the same English text now counts as more tokens.
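The tokenizer effect is easy to check with a ratio. The sketch below treats the 30% figure as uniform across all text, which is a simplification, and the prices in the example are hypothetical placeholders rather than any model's published rate.

```python
def effective_cost_ratio(old_price: float, new_price: float, token_inflation: float = 0.30) -> float:
    """Cost of the same text on the new tokenizer, relative to the old one.

    A result below 1.0 means the same text is cheaper on the new model.
    """
    return (new_price * (1 + token_inflation)) / old_price


# Hypothetical prices: old model at $3 per million tokens, new model at $2.
ratio = effective_cost_ratio(old_price=3.0, new_price=2.0)
print(f"same text costs {ratio:.0%} of what it did before")
```

Under these assumptions a one-third price cut shrinks to about a 13% saving per post, and any new rate above roughly 77% of the old one (1 / 1.3) would make the same text more expensive.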
Here is an illustrative run for one roughly 1,700-word post through Lyra's own four-stage pipeline, using Sonnet 5's current published rates: $2 per million input tokens, $10 per million output tokens, and $10 per 1,000 web searches. These are representative totals aggregated across every turn in each stage, not a single call, since that is what a real agentic run looks like.
| Stage | Approx. input tokens | Approx. output tokens | Tool fees | Stage cost |
|---|---|---|---|---|
| Research | 60,000 | 4,000 | ~6 searches ($0.06) | ~$0.22 |
| Draft | 90,000 | 8,000 | none | ~$0.26 |
| Review | 70,000 | 3,000 | none (web fetch has no fee) | ~$0.17 |
| Iteration (1 round) | 50,000 | 5,000 | none | ~$0.15 |
| Total | 270,000 | 20,000 | $0.06 | ~$0.80 |
Two iteration rounds instead of one pushes the total to roughly $0.95. Three rounds, which is closer to what a first-draft-on-a-hard-topic post can need, lands closer to $1.10. None of these numbers include prompt caching, which is the first lever worth pulling, covered below. For scale, this sits well inside what Anthropic itself reports for agentic Claude usage: across enterprise Claude Code deployments the average cost is about $13 per developer per active day and $150-250 per month, with 90% of users spending under $30 on an active day. A single blog post pipeline run is a small fraction of what one working session already costs.
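The table and the per-round totals can be reproduced with a short script. The token counts and rates are the ones quoted in this post; the function names are illustrative, and the script assumes every extra iteration round costs the same as the first.

```python
INPUT_PRICE = 2.00 / 1_000_000    # $ per input token (Sonnet 5 rate quoted above)
OUTPUT_PRICE = 10.00 / 1_000_000  # $ per output token
SEARCH_FEE = 10.00 / 1_000        # $ per web search

# (input tokens, output tokens, web searches) per stage, from the table above.
STAGES = {
    "research": (60_000, 4_000, 6),
    "draft": (90_000, 8_000, 0),
    "review": (70_000, 3_000, 0),
    "iteration": (50_000, 5_000, 0),
}


def stage_cost(input_tokens: int, output_tokens: int, searches: int = 0) -> float:
    """Dollar cost of one stage: input tokens, output tokens, and search fees."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE + searches * SEARCH_FEE


def post_cost(iteration_rounds: int = 1) -> float:
    """Total cost of one post, repeating the iteration stage `iteration_rounds` times."""
    total = 0.0
    for name, usage in STAGES.items():
        repeats = iteration_rounds if name == "iteration" else 1
        total += repeats * stage_cost(*usage)
    return total


for rounds in (1, 2, 3):
    print(f"{rounds} iteration round(s): ${post_cost(rounds):.2f}")
```

Swapping in your own token counts from the Anthropic console turns this from an illustration into a budget check.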
A subscription content tool sells you a tier: a fixed number of documents or seats per month for a fixed price, regardless of how many tokens the underlying model actually burns to produce them. A BYOK tool does the opposite. You connect your own Anthropic key, the tool never marks up the usage, and Anthropic bills you directly at the published per-token rate, itemized in your own console. Publish four posts in a month and you pay for four posts' worth of tokens. Publish none and you pay nothing. There is no quota to burn through or waste, because there is no quota, just usage. We laid out the reasoning behind building an AI blog writer for developers around this model rather than a bundled one.
None of the bulk or optimization tools price this way, because none of them are BYOK. Byword's plans start at $99/month, scaling up through higher tiers as article volume increases; we break the full tier table down, sourced, in our Byword vs Jasper comparison. Jasper's Pro plan runs $69/month billed monthly or $59/month billed annually for a single seat, with a custom-priced Business tier above it. Surfer SEO spans five tiers from Discovery at $49/month up to Enterprise at $999/month, priced by document and workspace limits rather than tokens; we cover that full range in Surfer SEO alternatives for teams whose blog lives in Git. (Figures are current as of this post's date and can change; check each vendor's pricing page before you buy.)
That flat price bundles more than model inference. It covers the SaaS product itself: the editor, the CMS integrations, keyword and SERP tooling, support, and the vendor's own margin on top of whatever the model actually costs them to run. "When that $15 drops to $2, the same $24 looks less like a product price and more like a markup," Ritesh Shrivastav wrote about AI products more broadly, describing exactly this dynamic: as underlying inference costs fall, a flat subscription price that was set when tokens were expensive increasingly looks like a wrapper tax rather than a fair reflection of what the product costs to run today.
Neither model is free of trade-offs. A metered BYOK bill is auditable down to the token: you can open your Anthropic console and see exactly what a given post cost, broken out by input, output, and cache. A flat subscription is predictable and simple to budget, one number, once a month, but you cannot separate what portion of that $99 or $299 is model spend versus product margin, and the price does not shrink when the underlying model gets cheaper. Predictability is worth something. So is knowing exactly what you are paying for.
A blog pipeline resends a lot of the same context on every turn: the house-style guide, the repo's CLAUDE.md, the tool definitions, the growing draft. Prompt caching is built for exactly that pattern. Anthropic's pricing sets a 5-minute cache write at 1.25x the base input price and a 1-hour cache write at 2x, while a cache read (a hit) costs 0.1x base input at either duration. That means a 5-minute cache pays for itself after a single subsequent read (1.25x + 0.1x = 1.35x, against 2x for sending the same context twice uncached), and a 1-hour cache pays for itself after two reads (2.2x against 3x). A four-stage pipeline that touches the same house-style and repo context repeatedly across dozens of turns is close to the ideal case for caching, because most of those repeated reads move from full input price down to a tenth of it.
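The break-even points follow directly from those multipliers. A small sketch, working in units of the base input price and ignoring cache expiry (it assumes every read arrives while the cache is still alive); the function names are illustrative:

```python
def cached_cost(reads: int, write_multiplier: float, read_multiplier: float = 0.1) -> float:
    """Cost of one cache write followed by `reads` cache hits, in units of base input price."""
    return write_multiplier + reads * read_multiplier


def uncached_cost(reads: int) -> float:
    """Cost of sending the same context once plus `reads` more times at full price."""
    return 1.0 + reads


def break_even_reads(write_multiplier: float, read_multiplier: float = 0.1) -> int:
    """Smallest number of reads at which caching is strictly cheaper than not caching."""
    reads = 0
    while cached_cost(reads, write_multiplier, read_multiplier) >= uncached_cost(reads):
        reads += 1
    return reads


print("5-minute cache breaks even after", break_even_reads(1.25), "read(s)")
print("1-hour cache breaks even after", break_even_reads(2.0), "read(s)")
```

Past the break-even point the gap widens quickly: after 30 reads, a 5-minute cache has cost 4.25x base input against 31x without it.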
The Batch API cuts both input and output token prices in half for asynchronous processing, which sounds like the obvious lever until you look at what "asynchronous" means here: submit a batch of requests, get results back later, no live back-and-forth. A blog pipeline that opens a pull request, waits for a human to look at it, and iterates on live review comments is the opposite of that shape. It needs a response now, not queued behind a batch job that might not clear for hours. The discount is real, and it fits bulk offline jobs well. It does not fit a pipeline whose whole point is a tight, interactive loop between the model and a human reviewer.
Research, draft, and review scale with post length and source count, which do not move much post to post. Iteration count is different: it depends entirely on how clean the first draft is and how much the review pass finds wrong with it. One clean round of fixes barely moves the total. Three or four rounds, chasing down claims that keep failing verification or links that keep resolving to the wrong page, can double or triple the iteration stage's share of the bill on its own. The lever here is not a pricing setting, it is draft quality: a writer that gets voice, sourcing, and linking right the first time spends less on the rounds that clean up after it.
Multiply the worked example above by a realistic monthly cadence and the gap to a flat subscription becomes obvious fast. Eight posts a month at roughly $0.80-$1.10 each lands in the $6-$9 range in raw Claude spend, before caching brings it down further, against Byword's $99/month floor, Jasper's $69/month per seat, or Surfer's $49/month starting tier. That is not a claim that BYOK tools always come out cheaper. It is a claim that the two bills measure completely different things: one is metered model usage you can read off an invoice, the other is a flat price for a bundled product where the model is one line item among many you cannot see. If you already pay for an Anthropic key, that distinction is the one worth doing the math on before comparing sticker prices.
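The monthly figure is simple multiplication, sketched here so you can plug in your own cadence. It covers raw Claude token spend only, uses the per-post range from the worked example, and the function name is illustrative.

```python
def monthly_token_spend(posts_per_month: int,
                        low_per_post: float = 0.80,
                        high_per_post: float = 1.10) -> tuple[float, float]:
    """Return the (low, high) raw token spend in dollars for a month of posts."""
    return posts_per_month * low_per_post, posts_per_month * high_per_post


low, high = monthly_token_spend(8)
print(f"8 posts/month: ${low:.2f}-${high:.2f} in raw token spend")
```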
Lyra runs on your own Anthropic key, encrypted at rest and never marked up, so the token math above is the actual bill, not a bundled subscription tier.
FAQ
For a roughly 1,700-word post run through a four-stage pipeline (research, draft, review, and one round of iteration) on Claude Sonnet 5 at current pricing, the raw token cost lands at roughly $0.80, inside an overall $0.60-$1.10 range before prompt caching. It scales with post length, how many sources get fetched, and how many review-and-fix rounds the draft needs, so a post that needs three iteration rounds instead of one can roughly double the iteration stage's share of the bill.
Yes. Sonnet 5 is priced at $2 per million input tokens and $10 per million output tokens through August 31, 2026, versus $5 input and $25 output for Opus 4.8. For a drafting and review workload, Sonnet 5 costs 40% of what Opus 4.8 does per token on both sides of the ledger, which is why it is the default model for a blog-writing pipeline rather than a coding-heavy agent workload.
Meaningfully, if the pipeline reuses the same system prompt, house-style guide, and repo context across turns. A cache read costs a tenth of the base input price, so a 5-minute cache write (1.25x base) pays for itself after a single read, and a 1-hour cache write (2x base) pays for itself after two reads. A blog pipeline that resends the same CLAUDE.md and voice guide on every turn of every stage is exactly the repeated-context pattern caching is built for.
A BYOK tool bills you at Anthropic's per-token rate with no markup, so your cost moves with your usage and shows up itemized on your Anthropic invoice. A subscription tool charges a flat monthly tier, such as Surfer's $49-$999/mo or Byword's $99-$999/mo, that bundles the model cost with the SaaS product (CMS integrations, keyword tooling, support) into one number you cannot separate. BYOK is metered and auditable; a subscription is fixed and opaque about what share of it is model spend.