
May 22, 2026

GitHub Copilot Pricing Change Reveals the 5-Position AI Pricing Spectrum


TL;DR. Across four decades of software pricing transitions — perpetual license to subscription, on-prem to SaaS, seat-based to consumption — the architectural decision has always been the same: which unit the buyer pays against. AI is the latest transition, not a new question. The GitHub Copilot pricing change announced April 27 — moving Copilot to token-based AI Credits on June 1, 2026 — positions GitHub at the infrastructure extreme of a five-vendor pricing spectrum. Sierra AI sits at the opposite extreme with pure outcome pricing. Google and Microsoft move counter-direction, bundling AI into base seat pricing. Salesforce runs three concurrent pricing models on Agentforce across multiple credit constructs. The pattern isn’t “AI vendors moving away from flat fees.” It’s AI pricing fragmenting across a spectrum, with every major vendor staking a different position on who carries the cost-curve risk.


The five-position AI pricing spectrum

The five positions on the consumption-risk transfer spectrum

AI pricing is fragmenting across five distinct positions, organized by which meter family the vendor selects and how much consumption-cost responsibility each meter family shifts away from the customer. Position 1 leaves the cost curve with the customer; Position 5 absorbs cost variance into a fixed seat price. The middle positions distribute risk in different ways.

Every pricing architecture sits on a value metric — the singular, countable unit the buyer is paying for. SPP introduced the term in the 1980s as the upstream-most decision in pricing, also called the licensing metric: what specific thing the vendor charges against, in language precise enough to invoice on and stable enough for the buyer to forecast against. The term has been re-used in later pricing writing, but the SPP coinage predates those later uses by roughly three decades.

Per-user, per-resolved-conversation, per-closed-deal, per-named-account, per-TB-stored, per-CPU-hour — these are all value metrics. They share three load-bearing properties: singular (one thing counted, not a basket), customer-meaningful (each unit maps to something the buyer recognizes as value received), and contractually stable (the definition doesn’t shift at vendor discretion inside a contract term).

The five positions below differ partly by which unit serves as the value metric — tokens at Pos 1, credits at Pos 2, invocations at Pos 3, verified outcomes at Pos 4, seats at Pos 5 — and partly by whether the unit is genuinely functioning as a value metric at all. Position 2 is where that last question lives, and we’ll return to it in detail below.

Naming a value metric is the easy part. Pricing-strategy decks pick one, put it on a slide, and treat the decision as made. The implementation downstream is where the choice actually matters: how the unit gets defined in code, how it’s metered, how it converts to dollars, how the rate sheet expands as new actions are added, what reserved rights the vendor retains over the meter itself.

These are the parts that determine whether the architecture holds together six quarters later or quietly degrades into something the buyer can’t forecast and the vendor can’t defend at renewal. Some implementations stay faithful to the value-metric idea on the deck. Others have garbled it past recognition.

This piece is written from the operator’s view rather than the consulting view. The consequences live downstream of the slide: on the customer’s invoice and in the vendor’s renewal economics.

Position 1: Passthrough. The meter mirrors the vendor’s own input cost at published rates. Tokens are today’s typical Position 1 unit (GitHub Copilot’s June 2026 change), but the architecture is unit-agnostic. It applies equally to API calls, GB-hours, vector queries, anything where the buyer absorbs vendor input variance against a transparent rate sheet. Customer carries consumption cost directly. When foundation-model rates move, the customer’s bill moves with them.

Position 2: Surrogate. Here the value-metric idea gets stretched. The meter is a vendor-designed billing unit (typically a credit, sometimes a point or compute unit) with vendor-controlled conversion ratios to underlying consumption. The credit isn’t itself a value metric; it’s a surrogate unit that stands in for one.

Credits are today’s typical Position 2 unit (Atlassian Rovo, Salesforce Agentforce, HubSpot Breeze, Snowflake credits), but the architecture is unit-agnostic. Composite compute units (Databricks DBUs, Cloudera CCUs) are the same architectural shape — a normalized vendor-designed unit that aggregates compute capability and bills against a $/unit rate — even though they’re not denominated in “credits.” “Points,” “tokens-with-multipliers,” and “effort units” all sit at the same position.

Customer still bears volume variance through overages, but the vendor abstracts foundation-model cost shifts behind the surrogate. When token rates change, the surrogate price stays stable until the vendor decides to re-rate the conversion table.

The surrogate is the meter; the conversion table from credit to action is where the price decision actually lives. Whether that table is published, stable, and contractually bounded separates a well-behaved surrogate from one that has walked away from the value-metric discipline. The credit-currency section below works through that distinction in detail.

Position 3: Invocation-based. Older per-AI-handled-conversation pricing models (Zendesk’s pre-August-2024 packaging). Customer pays per invocation whether or not the conversation ends in a resolution; vendor carries the cost of failed model calls that never register as a billable invocation. Zendesk itself moved away from this model in August 2024 toward verified-outcome metering. Whether other Pos-3 vendors have followed is not visible from public pricing pages.

Position 4: Outcome-based. Payment only on verified completion. Customer pays for results, not attempts. “Verified” varies in strictness inside Pos 4.

Sierra AI’s resolution criteria are bounded to active customer-signal confirmation (the strongest standard). Customers pay only when the AI agent successfully resolves an interaction, saves a cancellation, or completes an upsell — billable events triggered by the customer’s own action (purchase completion, retention acceptance, problem-solved confirmation) rather than by inactivity timeouts or vendor-side logic. No charge on attempts, partial resolutions, or conversations escalated to human agents.

Zendesk AI (August 2024 outcome-based packaging) runs two different definitions depending on the pricing edition. Essential uses a passive 72-hour inactivity window plus feedback heuristics. Advanced AI agents apply LLM verification that the request was satisfactorily resolved without human intervention, on a default 2-hour messaging window (configurable up to 72h) or 72h for email.

SPP’s LevelSetter fees are event-triggered and verified on closed deals (the strongest standard, alongside Sierra). The verification trigger is the customer’s own contract signature in their CRM — a legally-binding artifact in the customer’s own system of record, which puts the verification mechanism on par with Sierra’s customer-side confirmation. Each closed deal generates a pricing event sized to its TCV, at rates that band downward as the customer’s annual TCV grows. $0 charged on lost quotes. SPP applies its own Margin-Calibrated Discounting to its own pricing.

HubSpot Breeze advertises outcome-flavored display rates (per resolved conversation, per qualified lead) but meters in a unified HubSpot Credit pool. Architecturally Pos 2, not Pos 4. See the credit-currency section below.

Position 5: Bundled. Google Workspace plus Gemini (bundled into Business and Enterprise plans through 2024–2026 policy actions), Microsoft 365 Copilot consolidation into the M365 product stack (December 2025 – April 2026). AI becomes a seat-price lift with no separate consumption meter on top. The customer’s bill is invariant to AI usage, until governance kicks in. Microsoft uses license-based gating (free Copilot Chat versus the paid Microsoft 365 Copilot license, with the paid license increasingly required for Copilot features in M365 apps at large organizations); Google’s Gemini-in-Workspace has fair-use guardrails on heavy features (image generation, NotebookLM source counts). So Pos 5 isn’t pure “vendor absorbs all variance.” It’s “vendor pre-models expected usage and uses license editions and feature-level caps as governance, with feature-access restriction reverting to the customer beyond the threshold.”

FIG. 01 — The AI pricing layer-by-position matrix. Five-position consumption-risk transfer spectrum across six stack layers, with bound-integrity badges distinguishing Snowflake (bounded) from HubSpot Breeze and Salesforce Agentforce (unbounded).

The position decides what shows up on the invoice and what the renewal conversation looks like. Position 1 customers watch their meter move with foundation-model rates. Position 4 customers watch theirs move with verified outcomes. The same nominal product can look like very different deals depending on which position the architecture sits at, and the contract economics diverge further across each renewal cycle.

What actually shifts as you move along the spectrum

The clean reading (“Position 1 = customer absorbs all risk, Position 5 = vendor absorbs all risk”) is true in general but misleading at the dollar level. Real-world pricing at each position has implicit governance that reshapes risk allocation:

  • Pos 1 → Pos 2 (token → credit): the meaningful shift is the cost-curve transfer. Foundation-model rate changes hit Pos 1 customers directly but get absorbed into the vendor’s credit-conversion math at Pos 2. That’s a real, dollar-meaningful structural shift — not just a packaging change.
  • Pos 2 → Pos 3 (credit → invocation): vendor begins absorbing failed-call costs. Failed AI invocations that don’t bill become vendor cost rather than customer overage, putting pressure on per-invocation margin unless the vendor prices the rate to compensate.
  • Pos 3 → Pos 4 (invocation → outcome): vendor absorbs execution variance. Only verified completions bill. Failed invocations, retries, and partial work are vendor cost. This is the position where the value metric carries the most vendor-side risk: if the customer mix fails more often than it succeeds, the wrong value-metric choice can swamp the vendor’s economics. Implementation discipline — verification standard, scope bounding, customer-fit screening at acquisition — is what protects the vendor from the variance they’ve now agreed to absorb.
  • Pos 4 → Pos 5 (outcome → bundle): vendor nominally absorbs volume variance, but governance mechanisms (Microsoft’s license-based gating, Google’s fair-use limits on heavy features, Atlassian’s overage triggers in its hybrid model) reintroduce feature-access friction into the customer relationship. Pure unconstrained-usage bundling is rare in production AI pricing today.

The spectrum is the shape. Caps, floors, conversion-rate revisions at renewal, and service-level governance are how vendors manage their actual exposure inside any chosen position.

What sits at the extremes

Sierra AI and GitHub Copilot occupy opposite ends of the spectrum.

Sierra operates a “paying for a job well done” model — Bret Taylor’s public framing — that puts the vendor at maximum risk. Customers pay only when the AI agent successfully resolves an interaction, saves a cancellation, or completes an upsell. The vendor absorbs model costs, infrastructure costs, and execution risk.

TechCrunch reported that Sierra reached $100M ARR in 21 months, with growth reported to $150M by February 2026. TechCrunch’s coverage of the $950M raise reports a post-money valuation above $15B, in a round led by Tiger Global and GV. The figures are press-reported, not independently audited.

GitHub’s June 2026 model puts the customer at maximum risk on the transparent variance axis. Tokens are metered at the foundation model’s published per-token rates and then converted into AI Credits at GitHub’s published conversion rate — so customer bills move with whatever happens to foundation model pricing. When OpenAI or Anthropic adjusts per-token rates, GitHub Copilot customers absorb the change directly.

But Position 1’s customer risk is at least legible. The meter mirrors a published vendor rate card. The worst-case bill is forecastable against ceiling rates × consumption volume. Unbounded Position 2 surrogates can produce higher variance against a meter the buyer cannot see. The vendor’s published right to re-rate the conversion table is uncapped.
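The forecastability point can be made concrete with a short sketch. The rate card below is hypothetical (the model names and per-million-token prices are illustrative assumptions, not any vendor's published rates); it only shows that a published ceiling rate multiplied by a volume estimate yields a hard upper bound on the bill.

```python
from decimal import Decimal

# Hypothetical rate card: USD per one million tokens, per model.
# These numbers are illustrative assumptions, not published vendor rates.
RATE_CARD_USD_PER_MTOK = {
    "model-small": Decimal("1.00"),
    "model-large": Decimal("15.00"),
}

def worst_case_bill(monthly_tokens: int) -> Decimal:
    """Upper bound: assume every token is billed at the most expensive published rate."""
    ceiling_rate = max(RATE_CARD_USD_PER_MTOK.values())
    return ceiling_rate * Decimal(monthly_tokens) / Decimal(1_000_000)

# 40 million tokens in a month, all priced at the ceiling rate.
print(worst_case_bill(40_000_000))
```

No equivalent calculation exists for an unbounded surrogate, because the ceiling on the conversion table is not published.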

Enterprise Order Forms may bound this discretion bilaterally (frozen multipliers, capped acceleration clauses, MFN protections on credit-to-action rates), but the default contractual position is that the vendor can re-rate at will. What enterprise buyers extract above that default is procurement-confidential.

The two positions trade transparent variance for opaque variance, not strictly low-risk for high-risk. “Maximum customer risk” depends on whether you mean exposure to foundation-cost shifts (worse at Pos 1) or exposure to vendor discretion on the meter itself (worse at unbounded Pos 2, before any bilateral protections a buyer may have negotiated).

The extremes define the spectrum boundaries. Everything between them is a bet on flow rate. Each vendor is picking an injection point on the risk spectrum where they believe new acquisitions and renewals will compound most aggressively. Move risk toward the vendor and acquisition friction drops; per-unit margin holds or expands when the vendor prices for the value of risk absorbed, and compresses when they treat it as a discount lever. Move it toward the customer and margins hold by default, but every renewal becomes a variance negotiation. The “right” position isn’t the one with the strongest per-deal economics; it’s the one that maximizes the flow rate of contracts in and renewals forward.

L3 wrappers vs. first-party API plugins

The L3 row holds two architecturally different products that look similar on the surface.

Multi-model wrappers (GitHub Copilot, Cursor, Replit) sit between the developer and several foundation model vendors, routing requests across Anthropic, OpenAI, Google, and others.

First-party API plugins (Anthropic’s Claude Code in VS Code, OpenAI’s Codex CLI) are L1 in L3 clothing. The IDE surface is the only L3 component. Billing flows directly to the foundation vendor via the developer’s API key. Model choice is limited to that vendor’s family.

The wrappers share a layer but split on transparency. Replit hides the meter most aggressively with “Effort-based” credits priced in dollars. Cursor publishes a $-denominated API pool and runs two modes inside it: Auto/Composer at vendor rates and specific-model selection at API passthrough. GitHub Copilot’s 2026 move exposes the meter completely: tokens consumed times a published per-model multiplier against a credit allotment denominated in dollars. Same layer, three different decisions about how much of the underlying L1 economics to surface to the buyer.

This matters for buyers and for vendors. A buyer comparing Copilot and Claude Code is not comparing two products at the same layer. The Copilot buyer is paying for a routing service plus a meter; the Claude Code buyer is paying Anthropic only for API usage, and the VS Code plugin itself is free. A vendor entering this segment has to decide whether to compete as a wrapper (route, abstract, mark up the meter) or as a first-party plugin (charge for the surface, pass the model cost through, or, like Anthropic, give the surface away and charge only for the underlying API). Wrappers carry vendor-side consumption risk on the foundation cost line: when token rates move, the wrapper either absorbs the margin compression itself or passes it to customers and creates renewal friction. First-party plugins skip that risk entirely: the customer pays the foundation vendor directly, so there’s no margin layer to compress.

Credit-currency engineering and the surrogate-unit problem

Return to the value-metric anchor from the position framework above. A value metric is singular, customer-meaningful, and contractually stable: a per-user count, a per-resolved-conversation count, a per-closed-deal count. Each unit corresponds to something the buyer can recognize as value received. The whole pricing architecture downstream (licensing model, packaging, pricing model) rests on the unit definition holding still through the contract term.

The credit-pool layer at Position 2 is where vendors stretch this discipline. Cursor’s $-API-pool, Replit’s Effort credits, HubSpot Credits, Atlassian Rovo credits, Salesforce’s Flex Credits, and Snowflake Credits all do the same job. They sit between underlying consumption and the buyer’s bill, denominated in a vendor-designed unit. The marketing presents the credit as if it were a value metric: “one credit per X.” The architecture underneath is different. A credit isn’t a singular countable thing the buyer recognizes as value. It’s a surrogate unit that folds multiple consumption types into one billing currency, with the conversion ratio from credit to specific action set by the vendor.

The diagnostic is mechanical. A value metric needs no rate sheet beyond “$X per unit.” If you need a published action-to-credit rate sheet to explain what a credit costs per action, the unit isn’t denominating value. It’s denominating accounting. HubSpot’s rate sheet, for example, prices resolved conversations, prospecting recommendations, and dataset use at different per-action credit costs that scale with action complexity and (in the case of datasets) row count. The rate sheet is the translation layer the vendor controls. The conversion table is where the price actually moves; the headline credit price is not the question.

Surrogate units come in two architectural variants. They look the same in marketing and behave very differently in procurement.

Unbounded surrogates retain vendor discretion to re-rate the conversion table. The vendor can add new metered usage types, shift volume thresholds, accelerate consumption on specific feature classes, or split charges across parallel pools.

Bounded surrogates publish their conversion table, hold it structurally stable, and commit to improving the credit’s underlying value rather than re-rating its cost.

The architectural position on the risk-allocation spectrum (Position 2 in this matrix) is the same in both. The bound integrity is what makes one forecastable and the other not.

Bound integrity itself lives at three layers. The architectural layer is built into the meter design: Snowflake’s 10% Cloud Services cap, the published doubling-per-warehouse-size structure. The contractual-default layer is what the published terms say: the right-to-modify clauses HubSpot and Salesforce publish. The bilaterally negotiated layer is the set of Order Form addendums that sophisticated enterprise buyers may extract, invisible from outside the relationship.

The unbounded/bounded distinction in this article describes contractual default plus architectural design. Bilateral protections at the Order Form layer could neutralize an unbounded-by-default meter in specific enterprise contracts.

HubSpot Breeze is one live example of the unbounded variant. Its June 2025 unification folded Breeze Intelligence Credits into a single HubSpot Credit pool. The published migration table runs non-uniform conversion ratios across plan-volume bands, set at vendor discretion. Credits expire monthly without rollover. Auto-upgrade-on-exhaustion is the default overage mode. Capacity-pack pricing sits at parity with pay-as-you-go: no commitment discount.

The products and services catalog publishes per-action credit rates and adds two disclosures worth reading carefully. Credits “may also be subject to other applicable service charges” (telephony, SMS), meaning the credit isn’t a complete proxy for the bill. Underlying communication infrastructure passes through separately on top. And “some automated, bulk, and high-volume features consume credits at a faster rate, which may accelerate overall credit usage.” Read most literally, that’s a consumption-velocity warning. Automation fires many actions per unit time, so total credit burn accelerates even when the per-action rate is unchanged.

The opacity is in what qualifies as “automated, bulk, and high-volume.” The disclosure doesn’t define the category, so the buyer can’t pre-classify their workflows or forecast against a specific rate. Combined with vendor-controlled credit-to-action conversion ratios and undefined feature classifications, the buyer is left forecasting against vendor discretion on the meter itself. The marketing-headline outcome rates (per resolved conversation, per qualified lead) are math, not published outcome prices — credits-per-action multiplied by the PAYG credit rate. The credit is the meter; the conversion table is what can change.
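The arithmetic behind a headline outcome rate can be sketched in a few lines. The conversion table and the per-credit price below are hypothetical values chosen for illustration, not figures from HubSpot's catalog; the sketch shows how a re-rated table moves the dollar price per action while the headline credit price stays fixed.

```python
from decimal import Decimal

# Hypothetical conversion table (credits per action) and PAYG credit price.
# None of these figures come from a vendor rate sheet.
CREDITS_PER_ACTION = {"resolved_conversation": 100, "qualified_lead": 250}
USD_PER_CREDIT = Decimal("0.01")

def effective_price(action: str, table: dict[str, int] = CREDITS_PER_ACTION) -> Decimal:
    """Dollar price per action = credits per action x dollar price per credit."""
    return table[action] * USD_PER_CREDIT

# A vendor re-rate changes the table; the per-credit price is untouched.
rerated = {**CREDITS_PER_ACTION, "resolved_conversation": 130}

before = effective_price("resolved_conversation")
after = effective_price("resolved_conversation", rerated)
print(before, after)
```

In this illustration the buyer's per-resolution price rises by 30% without any change to the number printed on the credit price list.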

Salesforce Agentforce is the same architecture pushed to maximum complexity. Its published Flex Credits rate card meters on two simultaneous axes: action class times prompt complexity. A Data 360 sub-rate card spans roughly 83,000× between the cheapest operation (a basic query) and the most expensive (a real-time pipeline), with four volume-tiered multipliers that reset monthly. Two workflows that look comparable from outside can land at radically different total credit cost based on their underlying operation mix — buyers can’t forecast the bill from a workflow description alone.

A separate Conversations rate card establishes another pool. Salesforce’s own help documentation lists three primary usage-type categories (Flex Credits, Conversations, Einstein Requests), with Data Services Credits referenced as additionally consumed. Einstein Requests adds a token-chunking layer below the credit layer: prompts metered in 2,000-token chunks rounded up. A 6,500-token interaction is billed as four prompts. A separate Voice Minutes supplement assigns “certain customers” to a 60-credit-per-minute meter instead of the 30-credit-per-Voice-Action meter. Identical workloads can land on different rates based on Order Form assignment.
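The chunk rounding described above reduces to a ceiling division. This is a minimal sketch of that arithmetic only; the function name is hypothetical, and the treatment of a zero-token call (zero prompts) is an assumption rather than documented behavior.

```python
CHUNK_SIZE_TOKENS = 2_000  # chunk size stated in the text above

def billable_prompts(tokens: int) -> int:
    """Billable prompts when usage is metered in fixed-size chunks, rounded up."""
    # Integer ceiling division: any partial chunk counts as a full prompt.
    return (tokens + CHUNK_SIZE_TOKENS - 1) // CHUNK_SIZE_TOKENS

print(billable_prompts(6_500))  # 4
```

The rounding penalty is largest just past a chunk boundary: a 2,001-token interaction bills as two prompts, the same as a 4,000-token one.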

Each pool is independently re-rateable under the contract clause “usage types, tiers, and associated multipliers may be updated from time to time.” The metering logic itself is conditional. Whether any given AI call is metered depends on user profile, permission, execution context, feature identity, and org-level flags. Salesforce’s own documentation states that “a type of AI usage or a use case cannot be guaranteed as unmetered by a high-level description alone.” This is the surrogate architecture stated as a value proposition: “different AI services bill at varying rates against a unified credit pool.” The opacity is the feature.

Snowflake is the closest thing to a bounded surrogate in production. Its credit consumption table publishes credits-per-hour by warehouse size (1, 2, 4, 8, doubling per T-shirt size, with the structural model unchanged since launch). Edition-based on-demand pricing runs at three published per-credit rates across Standard / Enterprise / Business Critical. Cloud Services usage is billed only on the portion that exceeds 10% of daily warehouse compute usage (anything below that threshold is free). That’s a meaningful margin protection that doesn’t exist in the unbounded examples. Capacity Agreements deliver real per-credit discounts and long-term price guarantees, unlike HubSpot’s commitment-equals-PAYG pattern.
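Because the structure is published, a buyer can model it directly. The sketch below works in credits only and assumes the 10% figure operates as a daily allowance, with cloud services consumption billed only above 10% of that day's warehouse credits; the size labels and function name are illustrative, and per-credit dollar rates are deliberately left out.

```python
from decimal import Decimal

# Credits per hour by warehouse size, following the doubling pattern described above.
SIZES = ["XS", "S", "M", "L", "XL"]
CREDITS_PER_HOUR = {size: 2 ** i for i, size in enumerate(SIZES)}

FREE_CLOUD_SERVICES_SHARE = Decimal("0.10")  # share of daily compute credits

def billed_cloud_services(compute_credits: Decimal, cloud_services_credits: Decimal) -> Decimal:
    """Cloud services credits billed for one day: only the part above the allowance."""
    free_allowance = FREE_CLOUD_SERVICES_SHARE * compute_credits
    return max(Decimal(0), cloud_services_credits - free_allowance)

# A Large warehouse running 10 hours in a day, with 10 cloud services credits used.
daily_compute = Decimal(CREDITS_PER_HOUR["L"] * 10)
print(billed_cloud_services(daily_compute, Decimal(10)))
```

Every input to this calculation is visible to the buyer ahead of time, which is the property the unbounded examples lack.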

The disciplinary signature is observable rather than promised. The credit-to-warehouse-hour table has held its doubling-per-T-shirt-size shape across multiple years of published rate-sheet revisions. Edition-based per-credit pricing has been stable for years. Snowflake ships continuous platform optimizations (query-planner improvements, auto-clustering refinements, storage compression updates) that reduce the credits required to run the same workload. That’s a vendor-direction commitment to improving the unit’s underlying value rather than re-rating its cost. None of those properties exist in HubSpot’s catalog or Salesforce’s rate cards. That posture is the opposite of Salesforce’s explicit reservation (“usage types, tiers, and associated multipliers may be updated from time to time”) or HubSpot’s quiet vagueness on what counts as a “high-volume feature.”

Snowflake’s bound integrity still has visible cracks worth naming. Multi-edition pricing couples packaging access to an edition-based price multiplier. Snowpark-optimized warehouses use a separate (higher) credit scale. AI Credits are a separate construct with ACV-banded pricing. Serverless features have feature-specific multipliers. VPS pricing is “Talk to sales.” So Snowflake is partially bounded: bounded on the core compute meter most buyers think about, conditionally bounded at the edges.

The contrast establishes the dimension. Position on the consumption-risk-transfer spectrum is one axis; bound integrity is a second.

A vendor can sit at Position 2 (surrogate) with a published, stable, capped conversion table (Snowflake on its core meter). Or with a right-to-modify clause and parallel pools the buyer can’t predict (HubSpot, Salesforce). The risk-allocation choice is the same; the forecastability difference is enormous.

Bounded surrogates can be modeled by customers. Unbounded surrogates cannot, because at least one input (the vendor’s discretion to re-rate the conversion table) has no published bound. Whether a specific enterprise contract neutralizes that discretion through Order Form addendums is the procurement-confidential layer. The contractual default is that the vendor retains the discretion.

Customers without bilateral protection handle the unbounded variant by widening error bars on forecast and absorbing variance at renewal. That works for the vendor short-term and erodes renewal economics over time.

The conversion table is what the vendor retains discretion over by default. The architectural question for a software company designing this is whether that discretion shows up as a margin win or as a renewal headwind. Vendors that publish a stable, capped conversion table buy customer forecastability, and the trust that compounds across the contract lifecycle. Vendors that don’t keep the optionality but accept the renewal friction that follows. The decision is the vendor’s; the consequences are downstream.

The GitHub Copilot pricing change in detail

The June 1, 2026 change in plain terms

GitHub’s announcement is structurally simple and operationally significant. The new billing unit is GitHub AI Credits, GitHub’s own branded term, defined in Copilot’s usage-based billing documentation as “the billing unit for Copilot usage in Copilot Business and Copilot Enterprise.” Token consumption (input, output, and cached tokens) gets priced per the model used, then converted into AI Credits at the published rate. Premium-request-unit metering disappears for Business and Enterprise.

GitHub’s framing acknowledged that “a quick chat question and a multi-hour autonomous coding session can cost the user the same amount.” The token-based change addresses that disconnect.

Plan structure as of the June 1, 2026 change:

  • Copilot Pro and Copilot Pro+ (individual): both frozen to new sign-ups. Existing subscribers retained; individual plans follow a different billing structure than the AI Credits model.
  • Copilot Business and Copilot Enterprise: each ships with an included monthly AI Credit allowance calibrated so the credit-equivalent in dollars equals the seat price. Same dollar-for-dollar match at both tiers, with promotional allowances above that floor through the initial transition window.

The dollar-for-dollar equivalence between seat price and included credit allowance masks the economic shift. A seat with a credit-equivalent allowance equal to its price isn’t the same product as a seat with unmetered chat. Customer value inside that bill becomes a function of token consumption against vendor-set per-token rates that flow through the AI Credit conversion.
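A rough sketch of what that shift means for a single seat follows. It assumes the included allowance in dollars equals the seat price, as described above, and that usage beyond the allowance is billed at face value; the overage rule, the seat price, and the usage figures are all illustrative assumptions rather than GitHub's published terms.

```python
from decimal import Decimal

def monthly_seat_bill(seat_price: Decimal, usage_usd: Decimal) -> Decimal:
    """Seat price plus any credit usage above the included allowance.

    Assumes allowance == seat price and face-value overage (both hypothetical here).
    """
    included_allowance = seat_price
    overage = max(Decimal(0), usage_usd - included_allowance)
    return seat_price + overage

light_user = monthly_seat_bill(Decimal("20"), Decimal("12"))  # usage under the allowance
heavy_user = monthly_seat_bill(Decimal("20"), Decimal("31"))  # usage over the allowance
print(light_user, heavy_user)
```

Under these assumptions the seat price behaves as a floor rather than a ceiling: light users pay it regardless, and heavy users pay whatever their token consumption converts to.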

What the double-meter does to enterprise budgets

Copilot code review carries a structural twist starting June 1. Every code review run on a private repository bills on two meters simultaneously: AI Credits under the new usage-based model, and GitHub Actions minutes drawn from existing plan entitlements, with overage billed at standard Actions rates.

The double-meter lets GitHub capture value that scales with repository size and review duration. Linux runner rates scale by core count per the Actions pricing calculator; AI Credit rates track model API rates directly. Different units, entitlements, and overage curves.
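The two meters can be sketched as a single cost function to show why one review needs two separate estimates. Every figure passed in below is hypothetical (the AI Credit charge, the runner minutes, the remaining entitlement, and the per-minute rate are not taken from GitHub's pricing), and the function name is illustrative.

```python
from decimal import Decimal

def code_review_cost(ai_credit_usd: Decimal, actions_minutes: int,
                     remaining_free_minutes: int, usd_per_minute: Decimal) -> Decimal:
    """Cost of one review: AI Credit charge plus Actions minutes beyond the entitlement."""
    overage_minutes = max(0, actions_minutes - remaining_free_minutes)
    return ai_credit_usd + overage_minutes * usd_per_minute

# Same review, run with and without entitlement minutes left in the month.
with_entitlement = code_review_cost(Decimal("0.40"), 6, 10, Decimal("0.01"))
entitlement_spent = code_review_cost(Decimal("0.40"), 6, 2, Decimal("0.01"))
print(with_entitlement, entitlement_spent)
```

The same review costs different amounts depending on how much of the Actions entitlement other workloads have already consumed, which is why the estimate has to be made per repository and then aggregated.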

Multi-meter pricing returns on-premise complexity to SaaS. Organizations with multiple private repos face one estimation problem per repo, plus portfolio aggregation. The forecasting load sits on the customer’s finance function, a constituency that doesn’t have a playbook for multi-meter AI pricing and brings that frustration to renewal.

The pattern itself isn’t new. Pre-AI SaaS already ran users+storage, or users+storage+CPU, when no single value metric captured the workload cleanly — each additional meter trying to catch the workloads that didn’t fit the primary one, and each meter published with its own rate. AI repeats the pattern. Tokens are the new base unit under the hood, the way CPU-hours were before. Where AI departs from the historical pattern is the surrogate credit layer that rides on top of tokens at Pos 2 — collapsing the multi-dimensional reality into a single billing unit, with the underlying meters hidden inside a vendor-controlled conversion table. The forecasting challenge is the same one buyers have wrestled with for decades. The visibility into where the variance lives has gotten worse.

The architectural alternative GitHub didn’t take

The architectural alternative would have been productivity-metric pricing. Keep per-seat licenses, redefine seat entitlements, price against pull requests merged or time-to-completion metrics customer groups can verify. This approach represents a value metric aligned with developer productivity outcomes.

From the outside, the constraints on a move like that are familiar as a class of decision, even if we can’t see inside GitHub’s specific situation. Productivity-metric redesigns typically take nine to eighteen months: define the metric, build the telemetry, align sales and customer-success on the new commitment shape, reprice contracts at renewal. That timeline isn’t always available when foundation-model costs are moving quickly. Vendors in that position tend to ship what their existing billing infrastructure can support now rather than what would be architecturally cleanest over a multi-year horizon.

What’s observable from outside is a meter customers can’t forecast against without modeling the vendor’s input-cost trajectory. The downstream consequences of meters like this (renewal resistance, field-defense load on every deal) are well-established as a class. Whether GitHub modeled those tradeoffs and accepted them, or was operating under constraints that ruled out the alternatives, is something only GitHub knows.

Credit-based metering — Atlassian and Salesforce

Atlassian Rovo — bundled credits with metered overage

Atlassian Rovo launched in February 2026 with bundled credits plus overage metering. Customers get Rovo access through their Cloud subscription with a monthly credit allowance included. Complex queries or high-volume usage consume credits, with overages billed separately. The structure runs two credit currencies within one product: standard Rovo credits for search and insights, plus Rovo Dev credits for code assistance.

Early enterprise customer experience with Rovo’s bundle-plus-overage structure has surfaced consumption-spike exposure as a renegotiation pressure point at renewal. SPP documented the pattern in Atlassian credits vs HubSpot resolutions.

The architectural concern is that the vendor retains unilateral discretion over the credit conversion rate. Policy commitments (Atlassian’s stated 90-day notice before activating overage billing) are not the same as contract terms. At renewal, the pressure shows up as credit-cap and price-protection clauses being negotiated into the contract before activation.

Atlassian’s credits are designed units that mask underlying compute behind stable per-action mapping. Bundling the credit allowance into the base subscription doesn’t change that it’s a consumption metric: overages still push variability to the buyer. The distinction from Position 5 vendors is exactly this: Atlassian absorbs baseline usage into seat prices but exposes customers to consumption volatility through credits.
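
As a rough illustration of the bundle-plus-overage shape, here is a minimal Python sketch. It assumes a single credit pool (the text notes Rovo actually runs two credit currencies), and every seat price, allowance, overage rate, and cap below is a hypothetical figure, not Atlassian’s pricing. The optional cap stands in for the kind of credit-cap clause buyers negotiate at renewal.

```python
from typing import Optional


def monthly_bill_cents(
    seats: int,
    seat_price_cents: int,
    included_credits_per_seat: int,
    credits_used: int,
    overage_rate_cents: int,
    overage_cap_cents: Optional[int] = None,
) -> int:
    """Seat subscription plus metered overage, in integer cents."""
    included = seats * included_credits_per_seat
    overage_credits = max(0, credits_used - included)
    overage = overage_credits * overage_rate_cents
    if overage_cap_cents is not None:
        # Hypothetical negotiated cap: overage charges stop at the cap.
        overage = min(overage, overage_cap_cents)
    return seats * seat_price_cents + overage


# Hypothetical account: 100 seats, 50 included credits per seat.
baseline = monthly_bill_cents(100, 2000, 50, 4000, 2)
spike = monthly_bill_cents(100, 2000, 50, 9000, 2)
capped = monthly_bill_cents(100, 2000, 50, 9000, 2, overage_cap_cents=5000)
print(baseline, spike, capped)
```

In the baseline month the bill is the seat price alone. In the spike month the overage lands on the buyer unless a cap is written into the contract, which is the distinction from a Position 5 bundle.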

Salesforce Agentforce: multiple credit constructs running concurrently

Salesforce Agentforce operates the most architecturally complex AI pricing surface among major vendors. Three documented credit-bearing constructs run concurrently (Flex Credits, Conversations, Einstein Requests), with Data Services Credits additionally consumed by certain features. Salesforce’s GenAI Usage and Billing guidance layers three concurrent pricing models on top: Consumption-Based, Hybrid, and Business-Metrics-Based. The mechanics were covered in the credit-currency section above, drawing on Salesforce’s pricing pages and the rate cards cited earlier.

The architecture distributes AI consumption across multiple concurrent meters that don’t substitute for each other. Different customer workloads pull against different meters: high-volume customer service workloads consume against the Conversations pool, variable agentic workloads consume against Flex Credits, embedded LLM call patterns consume against Einstein Requests with the 2,000-token chunking applied. Large enterprise customers carry the full multi-meter load against negotiated Order Form entitlements that determine which budget envelope absorbs which workload class. Among the major AI-pricing vendors we’ve examined, Agentforce sits at the top end for concurrent-meter count on a single product surface — though credit systems built on older on-premise software (Oracle, IBM, mainframe-era enterprise licensing) have run comparably complex multi-meter structures for years, so the complexity itself isn’t unprecedented in software pricing.
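
A small Python sketch can make the non-substituting meters concrete. The routing table follows the workload-to-meter pairing described above, but the event shapes and the rounding rule are assumptions: we assume one Einstein Request is counted per started 2,000-token chunk, which is a simplified reading and not Salesforce’s published billing logic.

```python
import math

EINSTEIN_CHUNK_TOKENS = 2_000  # chunk size named in the text


def einstein_requests(tokens_in_call: int) -> int:
    """Assumed rule: one request per started 2,000-token chunk."""
    return max(1, math.ceil(tokens_in_call / EINSTEIN_CHUNK_TOKENS))


# Workload class -> meter, following the pairing described in the text.
METER_FOR_WORKLOAD = {
    "customer_service": "conversations",
    "agentic": "flex_credits",
    "embedded_llm": "einstein_requests",
}


def meter_usage(events):
    """Total usage per meter; usage on one meter never offsets another."""
    totals = {meter: 0 for meter in METER_FOR_WORKLOAD.values()}
    for workload, units in events:
        meter = METER_FOR_WORKLOAD[workload]
        if meter == "einstein_requests":
            units = einstein_requests(units)  # units are tokens for this class
        totals[meter] += units
    return totals


# Hypothetical events: (workload class, units).
events = [
    ("customer_service", 1),   # one conversation
    ("agentic", 40),           # 40 Flex Credits
    ("embedded_llm", 4500),    # a 4,500-token call
    ("embedded_llm", 2000),    # a 2,000-token call
]
print(meter_usage(events))
```

Under these assumptions the buyer has to forecast three separate totals, and headroom in one pool does nothing for a shortfall in another.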

What both reveal about credit-based pricing under uncertainty

Both vendors ship multiple credit constructs concurrently. Atlassian runs a bundled-with-overage model on baseline plus spike consumption. Salesforce runs three concurrent credit constructs that distribute consumption across meters which don’t substitute for each other. Whether each vendor views multi-construct as the destination state or as a transitional shape while behavior and renewal patterns play out isn’t externally observable. What is observable: customers carry the multi-meter load either way.

Credits are designed units that look more stable than tokens because the per-action mapping is fixed. But they still scale with consumption, so customer variability still rides the meter, and it surfaces in vendor renewal conversations as predictability pushback. The distinction from Position 5 vendors is that Google and Microsoft absorb cost-curve risk into seat prices, while Position 2 vendors keep it on the meter for the customer to manage.

The outcome-pricing extreme: Sierra and Zendesk

Sierra’s “paying for a job well done” model

Sierra’s success validates a thesis broader than “outcome pricing works for AI.” AI is a capability, not a pricing category.

The consequential decision for any software business is what unit the buyer pays against, where on the consumption-risk-transfer spectrum the vendor places that unit, and how the metric is implemented in code, contract, and meter. Sierra picked “verified resolution” as the value metric, anchored it at the vendor-absorbs-execution-risk end of the spectrum, and invested in the implementation work (active customer-signal confirmation, verifiability infrastructure, bounded scope) that makes the metric contractually defensible.

That combination is what’s scaling, not “outcome pricing” in isolation. The AI is the capability that makes the outcome possible at scale; the value-metric architecture is what turns the capability into a business model.

Three flavors of outcome alignment at Position 4

Position 4 isn’t a single shape. Three distinct flavors of outcome alignment are running in production today:

  • Sierra: per resolved interaction. Flat fee on each verified outcome. Vendor gets the same fee whether the saved customer is a $10/month account or a $1M/year enterprise. Outcome alignment is binary (resolved or not).
  • Zendesk Advanced: per LLM-verified resolution. Flat-per-event like Sierra, but verification anchors to Zendesk’s own LLM judgment that the customer’s request was satisfactorily resolved without human-agent intervention — a vendor-side standard, not a customer-side one. Inactivity-window and feedback-signal heuristics apply at the Essential edition; LLM verification is the Advanced-edition standard.
  • LevelSetter: verified on closed deals, with pricing events at TCV-banded rates. Each closed deal generates a pricing event sized to its TCV, with rates banding down as the customer’s annual TCV grows. Verification anchors to the customer’s contract signature in their CRM, matching Sierra’s customer-side confirmation standard. SPP applies its own Margin-Calibrated Discounting to its own pricing. Fees scale with the magnitude of the customer’s outcome rather than running flat per-event, with $0 charged on lost quotes.

The strictness of outcome alignment varies along two axes: how verified the outcome is, and whether fee magnitude scales with outcome magnitude. The purest forms tie both — verified outcomes AND fees proportional to the value created.
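
The second axis, whether the fee scales with outcome magnitude, can be sketched in a few lines of Python. The band floors and rates below are hypothetical placeholders chosen for round arithmetic; they are not LevelSetter’s schedule or Sierra’s fee. The sketch only contrasts a flat per-event fee with a fee sized to the deal and banded down as annual TCV grows.

```python
# Hypothetical bands: (annual TCV floor in dollars, rate in basis points).
TCV_BANDS = [(0, 200), (1_000_000, 150), (5_000_000, 100)]


def flat_fee(resolved: bool, fee_cents: int) -> int:
    """Flat per-event outcome fee: same amount for every verified outcome."""
    return fee_cents if resolved else 0


def banded_fee(deal_tcv: int, annual_tcv_to_date: int, won: bool) -> int:
    """Fee sized to the deal's TCV; the rate steps down as annual TCV grows."""
    if not won:
        return 0  # nothing is charged on a lost quote
    rate_bps = 0
    for floor, bps in TCV_BANDS:
        if annual_tcv_to_date >= floor:
            rate_bps = bps
    return deal_tcv * rate_bps // 10_000


print(banded_fee(200_000, 0, True))
print(banded_fee(200_000, 2_000_000, True))
print(banded_fee(200_000, 2_000_000, False))
```

With these placeholder bands a $200,000 deal generates a $4,000 event for a new customer and a $3,000 event once the customer has passed $1M in annual TCV, while the flat fee never moves with deal size.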

Zendesk’s per-resolution packaging — outcome with a floor

Zendesk operates modified outcome pricing. Per-resolution rates are published, with committed-volume pricing below pay-as-you-go, and additional volume-tier discounts that step the per-resolution rate down at defined resolution thresholds. Per Zendesk’s announcement.

Zendesk launched this outcome-based approach in August 2024, the first in the CX industry to price AI agents on verified results. Verification depends on the pricing edition and channel.

Essential and Legacy AI agents use a 72-hour inactivity window combined with relevance checks and customer feedback signals: a “no-reopen plus feedback proxy” model. Advanced AI agents use LLM verification that “the customer’s request was actually satisfactorily resolved without human-agent intervention.” The default window is 2 hours for messaging (configurable up to 72 hours) and 72 hours for email. Escalation, agent intervention, negative feedback, or a customer requesting live chat all invalidate the resolution.
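
Read as a decision rule, the verification logic above might look like the following Python sketch. This is our simplified reading, not Zendesk’s implementation: the `Conversation` fields, the event names, and the way the window and the LLM judgment combine are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Events the text says invalidate a resolution (names are our own labels).
INVALIDATING_EVENTS = {
    "escalation",
    "agent_intervention",
    "negative_feedback",
    "live_chat_request",
}

# Default windows in hours, as described in the text.
DEFAULT_WINDOW_HOURS = {"messaging": 2, "email": 72}


@dataclass
class Conversation:
    channel: str
    hours_quiet: float              # hours since the customer's last message
    events: set = field(default_factory=set)
    llm_verified: bool = True       # stand-in for the LLM judgment


def counts_as_resolution(conv: Conversation, window_hours=None) -> bool:
    """Assumed rule: no invalidating event, window elapsed, LLM agrees."""
    window = DEFAULT_WINDOW_HOURS[conv.channel] if window_hours is None else window_hours
    if conv.events & INVALIDATING_EVENTS:
        return False
    if conv.hours_quiet < window:
        return False
    return conv.llm_verified


print(counts_as_resolution(Conversation("messaging", 3.0)))
print(counts_as_resolution(Conversation("messaging", 3.0, {"escalation"})))
```

The point of writing it out is that every branch is a vendor-side test. Nothing in the rule waits for the customer to confirm the outcome.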

Plans include a baseline of 5–15 resolutions per agent per month. Committed packs (100+ resolutions) are priced below pay-as-you-go overage. Annual cap of 10,000 across all plans. Per Zendesk’s resolution definition documentation.

Allotments expire at the end of each billing cycle. Zendesk explicitly states resolutions “do not roll over to the next billing period.” Customers configure exhaustion behavior between two modes: soft overage that keeps AI agents running at pay-as-you-go rates, or a hard stop that pauses AI handling and routes traffic back to live agents. Per Zendesk’s resolution management documentation.
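
The two exhaustion modes trade a cost surprise for a service-level surprise, which a short Python sketch can show. The allotment size, the pay-as-you-go rate, and the mode labels `soft_overage` and `hard_stop` are hypothetical names and figures of ours, not Zendesk’s configuration keys or prices.

```python
def settle_cycle(attempted: int, allotment: int, mode: str, payg_rate_cents: int) -> dict:
    """Settle one billing cycle; unused allotment expires rather than rolling over."""
    within = min(attempted, allotment)
    excess = attempted - within
    result = {
        "handled_by_ai": within,
        "routed_to_humans": 0,
        "overage_cents": 0,
        "expired_unused": allotment - within,  # no rollover to the next cycle
    }
    if mode == "soft_overage":
        # AI agents keep running; the excess bills at the pay-as-you-go rate.
        result["handled_by_ai"] = attempted
        result["overage_cents"] = excess * payg_rate_cents
    elif mode == "hard_stop":
        # AI handling pauses; the excess goes back to live agents.
        result["routed_to_humans"] = excess
    else:
        raise ValueError(f"unknown mode: {mode}")
    return result


soft = settle_cycle(130, 100, "soft_overage", 150)
hard = settle_cycle(130, 100, "hard_stop", 150)
quiet = settle_cycle(60, 100, "hard_stop", 150)
print(soft, hard, quiet, sep="\n")
```

In the soft mode the 30 extra resolutions show up as overage on the invoice. In the hard mode they show up as 30 conversations in the human queue. In a quiet month the 40 unused resolutions simply expire.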

HubSpot Breeze sits adjacent to Pos 4 in the marketing (per-resolved-conversation and per-qualified-lead display rates) but the underlying meter is a unified HubSpot Credit pool consumed by specific actions. Breeze’s architecture is Pos 2 (credit-based) with outcome-flavored display rates layered on top. Its verification standard sits below both Sierra’s strict customer-bounded outcomes and Zendesk Advanced’s LLM-verified resolution because the credit pool, not the outcome, is the underlying meter. Atlassian credits vs HubSpot resolutions walks through the architectural distinction between consumption-based and outcome-based metering in detail.

The counter-model — Google and Microsoft

Google and Microsoft moved in the opposite direction from GitHub and the credit-based vendors: they absorbed AI cost into seat pricing without consumption metering on top. Customers pay the same regardless of how much AI they use.

Google Workspace plus Gemini — bundling into base

Google has been moving Gemini AI from a separate Workspace add-on into the seat-price layer through a series of bundling and pricing-policy actions since 2024. Public reporting has covered both the bundling of Gemini features into Business and Enterprise plans and the associated seat-price increases that accompanied the consolidation. Per Google Workspace’s product announcements.

The architectural direction is consistent: AI becomes a seat-price decision rather than a separate consumption meter. Google absorbs consumption volatility in exchange for higher seat prices applied across the customer base.
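
To see who carries the variance under each shape, consider a minimal Python sketch comparing a bundled seat uplift with a metered rate for the same account. All figures (seat count, uplift, metered rate, vendor unit cost, usage levels) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def bundled(usage_units: int, seats: int, uplift_cents: int, unit_cost_cents: int):
    """Seat uplift: revenue is fixed, so the vendor's margin absorbs usage swings."""
    revenue = seats * uplift_cents
    return revenue, revenue - usage_units * unit_cost_cents


def metered(usage_units: int, rate_cents: int, unit_cost_cents: int):
    """Consumption meter: revenue and the customer's bill both move with usage."""
    revenue = usage_units * rate_cents
    return revenue, revenue - usage_units * unit_cost_cents


SEATS, UPLIFT, RATE, UNIT_COST = 100, 400, 3, 2  # hypothetical, in cents

for label, usage in [("light", 5_000), ("heavy", 30_000)]:
    print(label, "bundled:", bundled(usage, SEATS, UPLIFT, UNIT_COST),
          "metered:", metered(usage, RATE, UNIT_COST))
```

With these numbers the customer’s bundled bill is identical in the light and heavy months, while the vendor’s margin swings from positive to negative. Under the meter the vendor’s margin per unit is stable and the customer’s bill grows sixfold.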

Microsoft 365 Copilot — two repricings toward consolidation into the M365 stack

Microsoft executed two Copilot repricings between December 2025 and April 2026. Both moved toward absorbing Copilot back into the M365 productivity stack rather than running it as a separately-priced add-on.

April 15, 2026 brought the significant change. Users at organizations with more than 2,000 Microsoft 365 users lose Copilot Chat in Word, Excel, PowerPoint, and OneNote unless they hold a paid Microsoft 365 Copilot license.

Per Microsoft’s announcement and Computerworld’s reporting, which called the move a “mystifying backtrack.”

Microsoft’s approach moves opposite to GitHub’s passthrough model. Instead of pushing cost volatility to customers, Microsoft consolidates it internally through forced paid-license adoption.


Where each vendor sits on the AI stack, and why it predicts their pricing position

The six-layer AI pricing-defensibility framework

The framework names six layers organized by pricing defensibility. Each layer’s defensibility logic flows from L1: raw model API cost where the vendor’s rate is independently verifiable against published foundation-model rates. The further from L1, the harder consumption pricing is for the vendor to defend in field sales and at renewal, because the customer can no longer reconcile the meter to any externally observable cost or outcome.

L6 is L5 with specialization as the defensibility lever. Bounded vertical agents (Sierra, Zendesk Advanced, LevelSetter) have outcome verifiability that horizontal platforms structurally don’t. The row is broken out for visual clarity; the underlying axis is scope, not architectural distance.

The pattern predates AI by a decade. Cloudera’s CCU descends from the Hortonworks HCU that preceded the 2018 Cloudera–Hortonworks merger. Databricks introduced the DBU around 2017. Snowflake’s credit-based metering has been the defining example in cloud data warehousing for years. Multiple independent vendors across data platforms and lakehouses have converged on the same structure. L2 platforms reach for designed consumption units because their underlying products are heterogeneous (compute, storage, data, model inference), not because AI changed the economics. The AI pricing pattern at each layer is the latest expression of a much older layer-pattern.

The clearest test of the L2 pattern is where it breaks. BigQuery on-demand prices per TiB scanned (pure passthrough at published rates, no credit abstraction). Google chose to price the dominant resource (data scanned) directly rather than wrap it in a designed unit. The same architectural layer (cloud data warehouse), the same product category as Snowflake, the opposite pricing position. That contrast reveals what credits actually buy: cost-curve management. Snowflake adjusts credit-to-compute conversions internally when underlying compute rates shift; BigQuery on-demand passes the underlying rate through to the customer. L2 isn’t uniform. Vendors choose whether to absorb the cost-curve or expose it, and BigQuery is the name-brand L2 vendor that chose exposure.

| Layer | Stack layer | What sits there | Examples | Typical AI pricing |
| --- | --- | --- | --- | --- |
| L1 | Consumption wire | Foundation model APIs | OpenAI, Anthropic, Gemini API | Per-token at published rates |
| L2 | Platform (PaaS) | Designed consumption units for heterogeneous compute / storage / data | Snowflake, Databricks, Cloudera, Pinecone, LangChain Cloud; anomaly: BigQuery on-demand | Mostly consumption units (Snowflake credits, DBU, CCU, per-vector, per-second). BigQuery on-demand prices per TiB scanned: the L2 vendor that didn’t reach for credits. |
| L3 | Developer tools | Dev-facing AI products | GitHub Copilot, Cursor, Replit | Seat + $-denominated pool abstraction is the L3 wrapper norm (Cursor’s API pool with vendor + passthrough modes; Replit’s Core plan with Effort-based credits). GitHub Copilot’s 2026 token×multiplier exposure is the same-layer anomaly: more transparent meter, same wrapper architecture. First-party API plugins (Claude Code) are L1-in-L3-clothing: single-vendor, free IDE plugin, direct API billing. |
| L4 | Productivity platform | Collaboration suites with embedded AI | Google Workspace + Gemini, Microsoft 365 Copilot, Atlassian Cloud + Rovo | Bundle (Google, Microsoft) or bundle + credit overage (Atlassian) |
| L5 | Workflow application | Business application SaaS with AI agents | Salesforce Agentforce, HubSpot Breeze, ServiceNow AI, Workday AI | Credit-based metering, often multi-product credit families. HubSpot’s June 2025 unification of Breeze Intelligence Credits into HubSpot Credits (with vendor-set, non-uniform conversion ratios across plan-volume bands) is the defining example of credit-currency engineering at this layer. |
| L6 | Specialized workflow (L5 narrowed) | Bounded vertical agents with defensible outcome verifiability | Sierra (CX), Zendesk Advanced (CX), LevelSetter (SPP, pricing strategy) | Per-resolution, TCV-banded pricing events, or pure outcome |

The diagonal pattern — and three anomalies it reveals

Vendor layer position broadly predicts spectrum position. L1–L3 cluster at Position 1 (customer absorbs all variance — token passthrough); L6 clusters at Positions 3–4 (shared-to-vendor-absorbed variance — action and outcome). The middle layers (L4–L5) sit in the interior — that’s where the risk-allocation choices vary most.

  1. L4 splits three ways. Atlassian, Google, and Microsoft all sit at the same productivity-platform layer, yet they made three different pricing bets across two positions: Atlassian (Position 2, credits with overage), Google (Position 5, pure bundle), and Microsoft (Position 5, bundle with consolidation into the M365 stack). Same layer, three different bets: the most informative interior finding.
  2. L2 splits between credit abstraction and passthrough. Snowflake, Databricks, and Cloudera wrap heterogeneous compute behind designed consumption units. BigQuery on-demand chose the opposite path — pure passthrough at published per-TiB-scanned rates, no credit layer at all. Same architectural layer, same product category as Snowflake, opposite pricing position. The credit abstraction is a vendor choice at L2, not an architectural inevitability — and BigQuery is the name-brand L2 vendor that opted out.
  3. GitHub straddles layers. Copilot’s chat-and-completions surface sits close to the consumption wire (effectively L2). Its agentic features (cloud agent, Spaces, code review) sit further up (L3-L4). GitHub priced all of it at L1/L2-style token rates — the layer mismatch shows most acutely on the application-layer features.

Multi-metric history predicts how vendors implement their position

Layer position predicts WHERE a vendor lands on the spectrum. Pre-existing pricing pattern predicts HOW they implement that position.

Multi-metric and consumption-unit history predisposes vendors to Position 2 (credits). Atlassian, Salesforce, and Snowflake maintained parallel meters before AI (storage + users + compute + transactions). Databricks (DBU) and Cloudera (CCU, descended from the Hortonworks HCU) already ran designed consumption units pre-AI.

The operational inertia is real — layering one more meter when AI arrives is a near-zero shift on top of an existing consumption operation. The cognitive inertia is bigger. Multi-meter vendors pursue accuracy, and accuracy breeds complexity. When AI gets added, the same mindset applies: meter it precisely too. Complexity compounds with each new product rather than diminishing. The pre-AI metric DNA carries through to AI pricing; the default to capture every dimension carries through with it.

Single-metric seat-based pricing predisposes vendors to Position 5 (bundle). Google Workspace and Microsoft 365 sold seats. Absorbing AI cost into the seat price was a smaller operational shift than adding a meter their billing systems and customer-success motion hadn’t run before.

The next-to-move vendors will likely sort along this same axis. Watch for multi-metric incumbents adding AI credits, and single-metric incumbents absorbing AI into seat lifts.

What software companies architecting AI pricing should take from this

Three takeaways carry the weight of the analysis above.

Choose the position deliberately. Position 1 (passthrough) exposes customers directly to your input-cost trajectory; that exposure shows up as resistance at renewal. Position 2 (surrogate) gives the vendor margin control over the conversion table, but trades short-term flexibility against long-term customer trust. Position 4 (outcome) unlocks the cleanest renewal conversation — and the strongest per-deal margin posture — if the verification infrastructure is built and the outcome is priced for the value of risk absorbed, not as a cost-plus markup. Outcome pricing is where the perceived value is concentrated, and where vendors are most defensible commercially. Done well, margins hold or expand. Done as a discount lever, they compress. The vendor controls which version they ship. Position 5 (bundled) is operationally simplest but caps monetization of heavy users, with cap-as-governance reintroducing the variance you tried to eliminate.

Implementation determines whether the position holds. Two Position-2 vendors with the same headline credit-pool architecture can present completely different renewal experiences — one bounded (published, stable, contractually capped) and forecastable, one unbounded (vendor discretion on the meter, opaque categorizations) and a source of recurring contract conflict. Sierra’s outcome model works because Sierra built verifiability infrastructure most vendors don’t have. Snowflake’s credit system works because the conversion table is published and structurally stable.

Token pricing above the infrastructure layer is the trap. Token-based pricing inherits the structural flaws of credit-based AI pricing when applied above the infrastructure layer. For GitHub’s chat features (infrastructure), token pricing aligns with consumption economics. For code review, Spaces, and cloud agent (application layer), token pricing pushes customers to absorb cost variance against a meter they can’t reconcile to value delivered. What the vendor often doesn’t see is that high bill variability changes customer behavior. Buyers throttle their own usage to manage forecasting risk, and that throttle is invisible from the vendor side: reduced consumption reads as “demand we already captured” when it’s actually demand the meter design suppressed. Value-based pricing’s job is to unconstrain usage by making the price legible against value delivered. When that’s right, vendor and customer both reach healthy, sustainable value extraction, and the meter stops being the thing standing between the buyer and the product.
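
One way to put a number on “high bill variability” is the coefficient of variation of monthly bills. The Python sketch below uses invented monthly token volumes, an invented per-million-token rate, and an invented flat seat bill; it is a measuring stick for the argument, not data from any customer.

```python
from statistics import mean, pstdev


def coefficient_of_variation(bills):
    """Standard deviation of the bills divided by their mean (0 = perfectly flat)."""
    return pstdev(bills) / mean(bills)


# Hypothetical six months of usage, in millions of tokens.
monthly_tokens_m = [40, 55, 38, 120, 60, 45]
RATE_PER_M = 10.0   # hypothetical dollars per million tokens
SEAT_BILL = 600.0   # hypothetical flat monthly seat bill

token_bills = [tokens * RATE_PER_M for tokens in monthly_tokens_m]
seat_bills = [SEAT_BILL] * len(monthly_tokens_m)

print(round(coefficient_of_variation(token_bills), 2))
print(coefficient_of_variation(seat_bills))
```

A single spike month is enough to push the token-metered bills to a coefficient of variation of roughly 0.47, against zero for the flat bill. That gap is the forecasting risk a buyer can only manage by throttling usage.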


The value metric you choose, where on the spectrum you place it, and how you implement it across code, contract, and meter is the most consequential pricing decision a software business makes. That hasn’t changed across four decades of software pricing transitions — perpetual license to subscription, on-prem to SaaS, single-product to platform, seat-based to consumption, and now AI. The capability shifts; the architectural decision underneath it doesn’t.

If you’re navigating an AI pricing transition and want pattern recognition from comparable shifts that came before, our approach draws on a four-decade engagement library of those transitions. Book a working session to apply the framework to your specific position.


Framework v1.0, May 2026, Software Pricing Partners.
