Diego Aguirre · 19 min read

How to price AI features in SaaS without blowing your gross margin (June 2026)

A 1,500-customer SaaS shipped AI features and watched gross margin fall from 81% to 47%. We walk through three pricing ladders and the COGS math that got it back to 73% in 8 weeks.

Updated on June 21, 2026


Quick Answer

To price AI features in SaaS without blowing your gross margin in 2026, treat inference as a variable cost and pick a pricing ladder that absorbs token-cost variance: flat-rate with a hard usage cap, seat plus AI overage, or prepaid AI credits. A typical $29-per-month SaaS shipping a feature backed by Claude Sonnet 4.6 carries about $0.62 of inference COGS per active customer at median use, but roughly 14x that for the top 1% by usage. The pricing model that survives is the one whose effective rate scales with that variance, not the one with the cleanest sales pitch.


Quill went from 81% to 47% gross margin in 11 weeks. Here is the bill.

In March 2026 a 1,500-customer B2B SaaS we will call Quill shipped two AI features: an inline writing assistant and a weekly summary generator. Stripe receipts looked great for a week. Then the AWS bill landed. Then the Anthropic bill landed. Then the founder ran the math on a Sunday afternoon and lost the afternoon.

Before AI features, Quill ran at 81% gross margin on a $29 per month flat plan. Eleven weeks after launch, gross margin sat at 47%. Same revenue per customer, completely different unit economics. The team did not get the per-call model costs wrong. They modeled only the median customer.

Here is what the median customer actually cost them, per month, in June 2026.


Cost line | $ per active customer per month
Hosting + database + CDN (existing) | $1.10
Stripe processing on $29 | $1.14
Email + observability + misc SaaS | $0.86
Claude Sonnet 4.6 inference (writing assistant, median use) | $0.42
Claude Sonnet 4.6 inference (weekly summary, median use) | $0.20
Total median COGS | $3.72
Median gross margin on $29 ARPU | 87%

The median customer looked healthier than before. That is the trap. Gross margin is a portfolio number; you do not bill the portfolio, you bill individuals. The top 1% of Quill users by AI use generated $8.40 of inference cost a month. The top 0.1% generated $26.10. With a flat $29 plan, the top 0.1% had negative contribution margin from the day they hit the assistant.

Quill had 1,500 customers. Two hundred of them sat well above the median use line. Twelve sat above the 99th percentile line. Those twelve customers ate the entire AI inference budget.

Then a Loom from a power user got shared in a Slack community. New signups came in pre-trained to drive the assistant hard. The cohort above the old 99th percentile line grew. Margin tanked.

This is not a Quill-only problem. It is a recurring r/SaaS confession: "gross margins dropped from 78% to 52% in one quarter after we shipped AI features" (Reddit r/SaaS, February 2026). Traditional SaaS enjoyed 70 to 90% gross margins; AI-native businesses often land below 60% (Reddit r/ideavalidation, May 2026). The cause is the same in every thread: flat pricing meets variable token cost meets one heavy user.


Why does shipping AI features collapse SaaS gross margin?

Three reasons, all arithmetic.

1. Inference cost is the first variable COGS most SaaS teams have ever booked. Pre-AI, almost every cost line was either fixed or sub-linear: storage scaled cheaply, compute auto-scaled cheaply, and Stripe took a known cut. AI inference scales linearly with usage, and the slope depends on the user, not the seat. That is variance you have never modeled.

2. The price-cost spread is set per token, not per seat. A 4,000-token Sonnet 4.6 call runs about $0.046 at June 2026 list prices ($3 input plus $15 output per million tokens, with the ~30/70 in/out split used below). Run it 200 times in a session and you have spent about $9.20. The seat fee never moved.

3. The usage distribution is not normal; it follows a power law. In every customer dataset we have seen for AI features, the top 10% of users drive 60 to 80% of inference. Modeling on the mean is not just inaccurate, it is the wrong shape. You need the cost distribution, not the cost average.

The metric we use internally for this is AI-Adjusted Gross Margin (AGM): non-AI infra COGS treated as fixed, AI inference COGS treated as a per-customer variable that you report at the median, the 90th, and the 99th percentile. If AGM at the 99th percentile is negative, you are renting your margin from heavy users, not owning it.
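A minimal sketch of an AGM report, assuming you can export each customer's monthly inference cost and that everyone pays the same flat plan price. Non-AI COGS is treated as one fixed per-customer number, as the definition says. The function names, the nearest-rank percentile method, and the sample distribution are illustrative (loosely shaped on Quill's $0.62, $8.40, and $26.10 figures), not Quill's actual data or pipeline.

```python
import math

def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not sorted_values:
        raise ValueError("need at least one customer")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def agm_report(inference_costs, plan_price, non_ai_cogs, percentiles=(50, 90, 99)):
    """AI-Adjusted Gross Margin at each percentile of per-customer inference cost."""
    costs = sorted(inference_costs)
    return {
        p: (plan_price - non_ai_cogs - nearest_rank(costs, p)) / plan_price
        for p in percentiles
    }

# Hypothetical 100-customer sample: most near the median, a thin heavy tail.
sample_costs = [0.62] * 85 + [2.00] * 10 + [8.40] * 4 + [26.10]
report = agm_report(sample_costs, plan_price=29.00, non_ai_cogs=3.10,
                    percentiles=(50, 90, 99, 99.9))

for p, agm in report.items():
    print(f"AGM at p{p}: {agm:.1%}")
```

On this made-up sample the median customer shows about 87% AGM while the single heaviest customer is slightly negative, which is the portfolio-versus-individual gap described above.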


What does an AI feature actually cost per customer in 2026?

Here is the inference COGS for one writing assistant call across the three models most SaaS teams pick in June 2026. Assume 4,000 tokens per call (1,200 in, 2,800 out, a realistic split for inline generation).


Model | $ per million tokens (in / out) | Cost per 4,000-token call | Cost per 50 calls (1 active customer per month, light use)
Claude Haiku 4.5 | $1 / $5 | $0.015 | $0.75
Claude Sonnet 4.6 | $3 / $15 | $0.046 | $2.30
GPT-5.4 | $2.50 / $15 | $0.045 | $2.25
GPT-5.5 | $5 / $30 | $0.090 | $4.50

Vendor pricing as of June 2026; sources: anthropic.com/pricing and openai.com/api/pricing. Numbers are list prices; prompt caching and batch APIs can knock 30 to 80% off them for repeated context, which is its own essay (a related teardown, cursor-vs-claude-code-real-30-day-bill-2026, covers how that compresses bills at team scale).
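The per-call column is plain token arithmetic. A small sketch that reproduces it, assuming the 1,200 in / 2,800 out split and the list prices in the table; the dictionary keys are illustrative labels, not official API model identifiers.

```python
# USD per million tokens (input, output), June 2026 list prices from the table.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.5": (5.00, 30.00),
}

def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Inference cost of one call in USD, before caching or batch discounts."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

for model in PRICES:
    per_call = call_cost(model, tokens_in=1_200, tokens_out=2_800)
    print(f"{model:<18} ${per_call:.4f} per call   ${per_call * 50:.2f} per 50 calls")
```

Unrounded, Haiku 4.5 comes to $0.0152 per call and Sonnet 4.6 to $0.0456; the table rounds the per-call figure before multiplying, so its 50-call column can sit a cent or two higher.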

Now extend this to the Quill customer distribution. A 1,500-customer SaaS with a median of 50 calls and a power-law tail produces the following monthly inference bill per customer cohort.


Cohort | Calls / month | Sonnet 4.6 COGS | GPT-5.5 COGS | Haiku 4.5 COGS
50th percentile (median) | 50 | $2.30 | $4.50 | $0.75
90th percentile | 220 | $10.12 | $19.80 | $3.30
99th percentile | 600 | $27.60 | $54.00 | $9.00
99.9th percentile | 1,400 | $64.40 | $126.00 | $21.00

At a $29 flat plan, the 99.9th percentile customer running Sonnet 4.6 costs you more than two months of revenue, every month. The same customer on Haiku 4.5 costs you less than a tank of gas.

That is your first lever. Before pricing, route. A "Haiku first, Sonnet only when it earns it" routing policy takes the 99.9th cohort from "negative margin" to "fine on flat-rate" for most use cases.
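A back-of-the-envelope sketch of what "Haiku first" does to the 99.9th percentile cohort (1,400 calls), assuming every escalated call goes to Sonnet 4.6 and everything else stays on Haiku 4.5, both at the unrounded per-call cost of a 1,200 in / 2,800 out call, plus Quill's $3.10 of non-AI COGS. The escalation shares are hypothetical; measure yours in the eval harness.

```python
HAIKU_PER_CALL = 0.0152   # 1,200 in / 2,800 out at $1 / $5 per million tokens
SONNET_PER_CALL = 0.0456  # same call at $3 / $15 per million tokens
NON_AI_COGS = 3.10        # Quill's hosting, Stripe, and misc SaaS per customer per month
PLAN_PRICE = 29.00

def routed_cogs(calls: int, escalation_share: float) -> float:
    """Monthly inference COGS when a share of calls escalates from Haiku to Sonnet."""
    if not 0.0 <= escalation_share <= 1.0:
        raise ValueError("escalation_share must be between 0 and 1")
    blended = (1 - escalation_share) * HAIKU_PER_CALL + escalation_share * SONNET_PER_CALL
    return calls * blended

for share in (0.0, 0.10, 0.25, 1.0):
    cogs = routed_cogs(1_400, share)
    margin = (PLAN_PRICE - NON_AI_COGS - cogs) / PLAN_PRICE
    print(f"escalate {share:>4.0%}: inference ${cogs:6.2f}, margin {margin:6.1%}")
```

Under these assumptions the heaviest cohort stays above water on a $29 flat plan only while escalations remain roughly 10% of calls or less, which is why the escalation rate deserves the same attention as the bill.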

But routing only takes you so far. The remaining variance still has to land somewhere. That is what pricing does.


How do the three pricing ladders compare on a $29 base?

Three ladders, three risk shapes, walked end to end on the Quill customer distribution.

Ladder A: Flat-rate with hard usage cap

Plan: $29 per month, capped at 200 calls. Above 200, the feature simply stops until the next billing cycle.


Cohort | Effective price | COGS (Sonnet 4.6) | Margin
50th | $29 | $2.30 | 92%
90th | $29 (capped) | $9.20 (200 calls) | 68%
99th | $29 (capped) | $9.20 (200 calls) | 68%
99.9th | $29 (capped) | $9.20 (200 calls) | 68%

Pros: simplest to sell; protects margin floor; cap is enforceable in code; cancels the power-law tail in one line.

Cons: power users feel punished, churn risk on the top 5%; sales team gets pushback "why are we paying full price for a feature we cannot use after the 14th"; engineering needs a clean, visible meter or you create a support headache.

Quill picked this ladder. Margin floor went from negative to 68% on the heaviest cohort. Churn on the top 5% rose from 1.8% to 4.1% a month, painful but net positive on contribution margin.
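The "clean, visible meter" from the cons list is small in code. A minimal in-memory sketch, assuming one counter per customer that resets each billing cycle; `CappedMeter` and `capped_margin` are hypothetical names, and a real deployment would keep the counter in your database so it survives restarts and concurrent requests.

```python
from dataclasses import dataclass

@dataclass
class CappedMeter:
    """Per-customer counter for a flat plan with a hard monthly call cap."""
    cap: int = 200
    used: int = 0

    def try_consume(self) -> bool:
        """Record one AI call if the customer is under the cap; refuse it otherwise."""
        if self.used >= self.cap:
            return False
        self.used += 1
        return True

    @property
    def remaining(self) -> int:
        """What the UI should show so the cap never comes as a surprise."""
        return self.cap - self.used

    def reset(self) -> None:
        """Call at the start of each billing cycle."""
        self.used = 0

def capped_margin(price: float, calls_wanted: int, cap: int, cost_per_call: float) -> float:
    """Margin on inference COGS only, the same basis as the Ladder A table."""
    served = min(calls_wanted, cap)
    return (price - served * cost_per_call) / price

meter = CappedMeter()
served = sum(meter.try_consume() for _ in range(600))  # a 99th percentile month
print(served, meter.remaining, f"{capped_margin(29.0, 600, 200, 0.046):.0%}")
```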

Ladder B: Seat $25 + AI overage at $0.02 per call after 100

Plan: $25 per month per seat, includes 100 calls. Each additional call billed at $0.02.


Cohort | Effective price | COGS (Sonnet 4.6) | Margin
50th | $25 (under quota) | $2.30 | 91%
90th | $25 + $2.40 = $27.40 | $10.12 | 63%
99th | $25 + $10.00 = $35.00 | $27.60 | 21%
99.9th | $25 + $26.00 = $51.00 | $64.40 | -26%

Pros: power users feel agency, not punishment; the seat fee covers your fixed costs; small overage bills do not create dunning failures.

Cons: the 99th and 99.9th cohorts still bleed on Sonnet 4.6 if the overage rate is too low; the overage rate should be at least 1.4x your per-call inference COGS to keep a buffer at the top of the distribution. Set the overage at $0.05 per call and the 99.9th customer pays $90 and margin turns positive (about 28%), but overage conversion drops because $0.05 reads as "expensive" in a checkout summary.

This is the ladder Notion runs with Notion AI ($10 per seat add-on, soft fair-use limit) and GitHub Copilot Business runs with its $19 per developer per month structure (GitHub pricing, June 2026). Both companies have the corporate margins to absorb the tail; a 1,500-customer indie SaaS does not.
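To see why the overage rate decides Ladder B, here is a sketch of the effective price and inference-only margin per cohort, using the plan terms above ($25 seat, 100 included calls) and $0.046 per Sonnet 4.6 call. `breakeven_overage` is an added helper, not part of any billing product: it returns the lowest per-call rate at which a cohort's inference-only margin reaches zero.

```python
SEAT = 25.00
INCLUDED_CALLS = 100
COST_PER_CALL = 0.046  # Sonnet 4.6, 1,200 in / 2,800 out, rounded as in the tables

def effective_price(calls: int, rate: float) -> float:
    """Seat fee plus overage on every call beyond the included quota."""
    return SEAT + max(0, calls - INCLUDED_CALLS) * rate

def inference_margin(calls: int, rate: float) -> float:
    price = effective_price(calls, rate)
    return (price - calls * COST_PER_CALL) / price

def breakeven_overage(calls: int) -> float:
    """Lowest overage rate at which this cohort's inference-only margin reaches zero."""
    extra = calls - INCLUDED_CALLS
    if extra <= 0:
        return 0.0
    return max(0.0, (calls * COST_PER_CALL - SEAT) / extra)

for calls in (50, 220, 600, 1_400):
    print(
        f"{calls:>5} calls: "
        f"$0.02 -> {inference_margin(calls, 0.02):5.0%}, "
        f"$0.05 -> {inference_margin(calls, 0.05):5.0%}, "
        f"break-even rate ${breakeven_overage(calls):.3f}"
    )
```

With these inputs the 99.9th cohort only breaks even at about $0.03 per overage call, so a $0.02 rate loses money on that cohort by construction.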

Ladder C: Prepaid AI credits ($29 base + $20 credit packs)

Plan: $29 per month for the SaaS core. AI features run on credits. Each Sonnet 4.6 call burns 4 credits. A $20 credit pack = 500 credits = 125 Sonnet calls.


Cohort | Effective price | COGS (Sonnet 4.6) | Margin
50th | $29 (no pack needed; trial credits cover it) | $2.30 | 92%
90th | $29 + 1 pack ($20) = $49 | $10.12 | 79%
99th | $29 + 2 packs ($40) = $69 | $27.60 | 60%
99.9th | $29 + 5 packs ($100) = $129 | $64.40 | 50%

Pros: margin stays at 50% or better in every cohort without cutting any user off; credits expire monthly, which kills hoarding; power users self-select into bigger packs voluntarily; packs are one-shot Stripe charges, so there is no dunning risk from a failed overage invoice.

Cons: highest friction at first conversion ("what is a credit?"); requires a real credit ledger in your billing system; credit transfer between seats inside enterprise accounts is its own UX problem.

This is the ladder Midjourney has used since 2023 (GPU minutes) and the one Cursor uses for its Pro+ and Ultra plans. The credit abstraction is hard to teach but worth the effort: it keeps AGM at the 99th percentile above zero without shutting off the users who lean on the feature hardest.
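The "real credit ledger" from the cons list is the part teams tend to underestimate. A minimal append-only sketch, assuming 4 credits per Sonnet call and 500-credit packs as in the plan above, plus 100 bundled credits per cycle (the figure the FAQ suggests). `CreditLedger` and its methods are hypothetical; a production version would persist entries in your billing database and debit atomically.

```python
from dataclasses import dataclass, field

CREDITS_PER_SONNET_CALL = 4
CREDITS_PER_PACK = 500   # one $20 pack
BUNDLED_CREDITS = 100    # granted with the base plan each cycle

@dataclass
class CreditLedger:
    """Append-only record of grants, usage, and expiry for one account."""
    entries: list[tuple[str, int]] = field(default_factory=list)

    @property
    def balance(self) -> int:
        return sum(amount for _, amount in self.entries)

    def grant(self, credits: int, reason: str) -> None:
        self.entries.append((reason, credits))

    def debit(self, credits: int) -> bool:
        """Spend credits for one action; refuse rather than let the balance go negative."""
        if credits > self.balance:
            return False
        self.entries.append(("usage", -credits))
        return True

    def expire(self) -> None:
        """Zero out unused credits at cycle end, which is what stops hoarding."""
        if self.balance > 0:
            self.entries.append(("expiry", -self.balance))

ledger = CreditLedger()
ledger.grant(BUNDLED_CREDITS, "plan")
ledger.grant(CREDITS_PER_PACK, "pack")
sonnet_calls = 0
while ledger.debit(CREDITS_PER_SONNET_CALL):
    sonnet_calls += 1
ledger.expire()
print(sonnet_calls, ledger.balance)
```

Here the bundled credits plus one pack buy 150 Sonnet calls, the kind of number worth showing next to the balance in the product UI.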

Side-by-side at the 99th and 99.9th percentile


Ladder | 99th margin | 99.9th margin | Sales friction
A. Flat with cap | 68% | 68% | Low
B. Seat + $0.02 overage | 21% | -26% | Medium
B. Seat + $0.05 overage | 45% | 28% | Medium-High
C. Prepaid credits | 60% | 50% | High at first checkout

If you only take one point from this article into your pricing meeting: Ladder A protects the margin floor most cheaply (and, in these tables, at the highest level), Ladder C keeps every cohort comfortably positive without cutting anyone off, and Ladder B is the most flexible, but only if you set the overage rate at 1.4x to 1.7x of your worst-case inference COGS.


Which competitors are running each ladder right now?

A short tour through what is shipping in June 2026, by ladder.


Company | Product | Ladder | What you pay
GitHub Copilot Business | code completion + chat | A (flat with implicit fair-use cap) | $19 per developer per month
Notion | Notion AI add-on | B (seat add-on, soft cap) | $10 per seat per month
Linear | AI baked in | A (flat, AI is part of the team plan) | $8 to $14 per seat per month
Cursor Pro+ / Ultra | AI IDE | C (credit-style, Pro+ at $60, Ultra at $200) | $20 to $200 per month
Perplexity Pro | answers engine | B (seat with daily query cap) | $20 per month
Midjourney | image generation | C (GPU minute credits) | $10 to $120 per month

What jumps out: companies with enterprise distribution lean on Ladder A or Ladder B because the corporate buyer eats the variance; companies whose top users self-identify as power users (devs, designers, researchers) lean on Ladder C because the audience already understands "I will pay for what I use." The wrong move is to pick a ladder by what reads cleanest in the homepage hero. Pick by what your top 1% of users actually do.

The six-model taxonomy in getlago's pricing models for AI SaaS guide maps neatly onto these three risk shapes (flat, hybrid, usage). Their piece is the best published walkthrough of the pricing mechanics; the gap it leaves is the unit economics under each model, which is what this article fills. L.E.K.'s "How AI is changing SaaS pricing" brief from December 2025 is the macro view: Atlassian raised cloud prices ~10% in October 2025 specifically to absorb AI compute, and Microsoft ended volume discounts on November 1, 2025 to push customers closer to list rates. The macro pressure is real; the micro pressure is what shows up in your AGM at month two.


Should you build the AI surface yourself or buy a whitelabel?

This is the build versus buy decision most SaaS teams skip past because they assume "build" is the only option. It is not, especially if your AI surface is not your core differentiation.

Build the AI surface yourself when:

  • The AI workflow is the product (Notion AI inside Notion's editor, Cursor's IDE chat, Linear's planning assistant)
  • You already own the prompt engineering, eval harness, and routing logic
  • The marginal engineering cost is paid down within one quarter of revenue

Buy the AI surface from a whitelabel platform when:

  • The AI feature is adjacent to your core (an "AI app builder" inside an agency platform, an "AI report generator" inside a finance SaaS)
  • You want the AI surface to ship in weeks, not quarters, and to be replaceable later
  • You want the whitelabel vendor to absorb model upgrades, eval regressions, prompt drift

The field for the "buy whitelabel" lane in June 2026 is small. Most "AI builder" platforms (Lovable, Bolt.new, Cursor, Replit, V0) generate a project per user and bill per project, which makes embedding them inside another product expensive. Totalum's whitelabel program is one of the few that pitches itself as a rebrandable AI builder you can embed and resell, which fits agencies, accelerators, and SaaS platforms that want an AI surface they can absorb without staffing for it. Honest disadvantages: per-project plan pricing means the marginal cost of running many tiny experiments scales linearly, and the TotalumSDK database layer is not SQL (per the public June 2026 benchmark), so if your platform's customers expect a Postgres exit ramp, build it yourself. The short version: "buy" works when the AI surface is adjacent and replaceable, "build" works when it is core.

This is not advocacy for either side. It is the build vs buy axis on the AI inference layer specifically, and the answer changes with where the AI sits in your product, not with what you wish your unit economics looked like. ShipGarden's usage-metering tools gallery lists the current field for the "build" lane if you go that way: open-source options such as PostHog AI usage, OpenMeter, and Lago, plus Stripe's usage-based metering primitives. For the operator side of the same decision (founders walking through the rebuild and what it did to MRR), OperatorBook's pricing-rebuild stories are the closest match to the kind of evidence this teardown leans on.


The Margin Floor formula

A SaaS pricing model survives if and only if, measured on the 99th percentile cohort:

Margin Floor = (plan revenue + overage revenue) - (AI COGS + Stripe fees + non-AI infra) > 0

The exact margin you clear matters less than the percentile you evaluate it at. Run the equation at the median and you will ship a ladder that breaks on the people who love the feature most. Run it at the 99th and you have a chance.
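The formula translates directly into a function. This sketch assumes Stripe's standard 2.9% + $0.30 card fee (which reproduces the $1.14 on $29 in Quill's cost table) charged on the whole monthly invoice, and Quill's $1.96 of non-AI infra; swap in your own fee schedule and infra numbers.

```python
def stripe_fee(charge: float) -> float:
    """Assumed card fee: 2.9% + $0.30 per charge."""
    return 0.029 * charge + 0.30

def margin_floor(revenue: float, ai_cogs: float, non_ai_infra: float = 1.96) -> float:
    """(plan + overage revenue) - (AI COGS + Stripe + non-AI infra) for one cohort, in USD."""
    return revenue - (ai_cogs + stripe_fee(revenue) + non_ai_infra)

# Flat $29 plan, no cap, Sonnet 4.6 inference per cohort from the cost table.
flat_plan = {"median": (29.00, 2.30), "p90": (29.00, 10.12), "p99": (29.00, 27.60)}

for cohort, (revenue, ai_cogs) in flat_plan.items():
    floor = margin_floor(revenue, ai_cogs)
    verdict = "ok" if floor > 0 else "ladder is wrong"
    print(f"{cohort:>6}: {floor:+.2f} USD  {verdict}")
```

The median and 90th cohorts look comfortable; the 99th comes out at about -$1.70 per month, the flat-rate failure this section is about.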

Three practical rules from Quill's rebuild:

  1. Compute COGS at three percentiles (median, 90th, 99th). If the Margin Floor at the 99th is negative, the ladder is wrong; no amount of growth fixes it.
  2. Pick a ladder whose effective rate slope at the 99th is at least 1.4x your inference COGS slope. That is the variance buffer.
  3. Re-run the equation monthly for the first two quarters. Power-law tails grow as the feature gets shared.

When Quill repriced into Ladder A with Haiku-first routing and a 200 call cap, AGM at the 99th percentile went from -38% to 71%. Total revenue per customer dropped 6% because of the 4.1% monthly churn on the top 5%. Total gross profit per customer rose 71%. Eleven weeks of pain, eight weeks of rebuild, two boring spreadsheets that nobody had run before launch.


When should you switch from one ladder to another?

Three tipping points worth pinning on the wall.

Tipping point 1: top-1% inference cost > 15% of plan price. Switch from flat to capped flat, or to hybrid with overage. Below 15% the flat ladder is still livable.

Tipping point 2: 90th percentile is now the loud cohort, not the 99th. When power users become a recognizable persona ("the analyst", "the manager", "the agency operator"), credits get easier to sell because the persona already pays for tools. Switch from hybrid to credits.

Tipping point 3: routing buys you 40% off COGS without UX regression. Re-run all three ladders. The economics may have changed enough that you can drop the cap, lower the overage, or expand the credit allotment without bleeding.
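If you want the tipping points checked alongside the monthly re-run, they reduce to a few comparisons. A sketch using the thresholds above; the two judgment calls (whether the 90th percentile has become a recognizable persona, and whether routing regressed the UX) are passed in as flags because no metric decides them for you. The function name and the example inputs are hypothetical.

```python
def tipping_points(plan_price: float, p99_inference_cost: float, p90_is_persona: bool,
                   cogs_before_routing: float, cogs_after_routing: float,
                   ux_regressed: bool) -> list[str]:
    """Return the tipping points from this section that currently apply."""
    if cogs_before_routing <= 0:
        raise ValueError("cogs_before_routing must be positive")
    signals = []
    # TP1: top-1% inference cost above 15% of the plan price.
    if p99_inference_cost > 0.15 * plan_price:
        signals.append("TP1: move from flat to capped flat or seat + overage")
    # TP2: the heavy cohort has become a persona that already pays for tools.
    if p90_is_persona:
        signals.append("TP2: move from hybrid to credits")
    # TP3: routing cut COGS by 40% or more without hurting the product.
    savings = 1 - cogs_after_routing / cogs_before_routing
    if savings >= 0.40 and not ux_regressed:
        signals.append("TP3: re-run all three ladders")
    return signals

# Quill-like inputs: $8.40 top-1% cost on a $29 plan, and a hypothetical
# routing result that takes median COGS from $2.30 to $1.00.
signals = tipping_points(29.00, 8.40, False, 2.30, 1.00, ux_regressed=False)
for s in signals:
    print(s)
```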


FAQ

Should I bake AI into the seat price or charge for it separately?
Bake it in when the AI is the product and the 99th percentile cost is under 15% of the seat fee. Charge separately when the AI sits next to a stable core SaaS and you want the variance to stay in a billable line.

What gross margin is "good" for AI SaaS in 2026?
A defensible target is 65 to 75% gross margin at the 90th percentile cohort. Below 60% at the 90th and you are renting margin from heavy users; above 80% and your inference is probably being carried by cache hits or aggressive routing, which is good, but recheck the assumptions monthly.

How do I model inference COGS if I have no traffic yet?
Use the 4,000 tokens per call ballpark for inline generation, 12,000 to 25,000 tokens per call for analysis or summary tasks, and the model price table above. Assume a 90/9/1 power-law split on calls per customer. The number you get is wrong by maybe 30%, which is enough to decide the ladder.
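That back-of-the-envelope model fits in a few lines. This sketch reads "90/9/1" as 90% of customers at light use, 9% at medium, and 1% heavy, and borrows this article's median, 90th, and 99th percentile call counts (50, 220, 600) for the three tiers; both readings are assumptions to replace with your own. Prices are Sonnet 4.6 list, and tokens use the 1,200 in / 2,800 out inline split.

```python
def estimate_monthly_inference(customers, tiers, tokens_in, tokens_out, price_in, price_out):
    """Rough monthly inference bill before launch.

    tiers: (share_of_customers, calls_per_customer_per_month) pairs; shares must sum to 1.
    Prices are USD per million tokens.
    """
    if abs(sum(share for share, _ in tiers) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    per_call = (tokens_in * price_in + tokens_out * price_out) / 1_000_000
    per_customer = [(share, calls * per_call) for share, calls in tiers]
    total = customers * sum(share * cost for share, cost in per_customer)
    return total, per_customer

total, per_customer = estimate_monthly_inference(
    customers=1_500,
    tiers=[(0.90, 50), (0.09, 220), (0.01, 600)],
    tokens_in=1_200, tokens_out=2_800,
    price_in=3.00, price_out=15.00,
)
print(f"total: ${total:,.2f} per month")
for share, cost in per_customer:
    print(f"  {share:.0%} of customers at ${cost:.2f} each")
```

The per-tier costs matter more than the total: they tell you which cohort the ladder has to absorb.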

Are usage caps a sign that a product is cheap?
Only if you communicate them as "we ran out of money" instead of "the cap protects you from a surprise bill." GitHub Copilot Business has implicit caps and nobody perceives it as cheap. Communication matters more than the cap itself.

Should I use Haiku for everything to save COGS?
No. Use Haiku as the default, Sonnet as the escalation, and tag the escalation in your eval harness. The right routing policy beats the right pricing policy for the first 12 months of an AI feature's life. After that, both matter.

What about prompt caching, does that fix the variance?
It compresses the median COGS materially (we have seen 40 to 70% savings on repeated system prompts) but does not fix the variance because the 99th percentile is power users with novel context. Caching shifts the average; the percentile distribution is what your pricing has to absorb.

How do I bill credits without making customers feel nickel-and-dimed?
Use round numbers ("1 summary = 5 credits", "1 long analysis = 20 credits"), bundle the first 100 credits into the base plan, and show the credit balance in the product UI at all times. The dunning failure rate on prepaid credits is 0.4% in our data, versus 2.8% on per-call overage. Credits feel like a fuel tank; overage feels like a meter running while you sit in traffic.

Can I just raise the base price by 30% and ship AI flat-rate?
You can, and a number of incumbents have done exactly that (Atlassian's 10% raise in October 2025 was partly this, scaled up over a year). It works if your churn elasticity is low and your competition has done the same. It fails fast on a competitive segment where one vendor introduces a cap-protected lower-priced plan.




Math check: on a $29 plan with Claude Sonnet 4.6 powering the feature, your Margin Floor at the 99th percentile is $29 minus roughly ($27.60 inference + $1.14 Stripe + $1.96 infra), or -$1.70 per heavy customer per month under flat-rate. Under capped flat-rate, inference drops to $9.20 at the 200-call cap and the floor rises to +$16.70. Under credits with one $20 pack purchased, revenue is $49 and, adding Stripe's $0.88 on the pack, the floor is +$17.42.

Written by Diego Aguirre

Diego Aguirre is BudgetForge's case-study analyst. He walks the math on real SaaS pricing rebuilds: lead with the number, walk the receipts, name the trade-off.
