April 7, 2026

Why Willingness-to-Pay Surveys Fail B2B Software Companies

Author

Chris Mele

TL;DR WTP surveys overstate what buyers will pay (more so for complex products), can’t capture value buyers haven’t experienced, and produce static snapshots that are stale within a quarter. AI pricing compounds every one of these problems. The alternative, continuous monetization, measures real demand through transaction data, not hypothetical responses.

Table Of Contents

Why Willingness-to-Pay Surveys Fail B2B Software Companies
What “willingness to pay” actually measures
The overestimation problem: surveys inflate WTP by up to 2x
“But we use conjoint, not price-sensitivity surveys”
Why software makes the problem worse
AI compounds the problem
Most B2B buyers don’t decide on price
The endowment effect: buyers undervalue what they don’t yet have
Surveys don’t account for salespeople’s willingness to discount
B2B software pricing in practice
Why a snapshot can never replace a system
The alternative: continuous monetization
The argument in summary
FAQs

Most pricing consultancies start a software pricing engagement the same way: survey your customers, run a price-sensitivity survey or conjoint analysis, and use the results to set your price. It’s clean, it’s quantitative, and it’s been the default playbook for decades.

There’s one problem. The research shows it doesn’t work the way people think it does.

That’s not because the methods are obscure. Price-sensitivity surveys and conjoint are the standard tools in pricing research and have been for decades. The problem is more fundamental: they were designed for a world of physical goods and rational price evaluators, and B2B software is neither. The number a WTP survey produces is shaped by biases that the survey itself cannot detect.

I’ve argued before that 90% of value-based pricing and selling in B2B software is a hoax: widely used methods borrowed from B2C markets don’t work in B2B, where products and usage are more complicated, making value harder to compare and estimate. Below is the academic evidence behind that claim, and what we’ve seen work instead after 40 years of pricing B2B software.

What “willingness to pay” actually measures

Willingness-to-pay is the maximum price a buyer would accept before walking away from a purchase. In theory, if you know every customer’s WTP, you can set prices that maximize revenue, charging each segment as close to their ceiling as possible.

The challenge is measuring it. There are three broad approaches:

Hypothetical survey methods ask buyers to state what they’d pay. The most common are:

Price Sensitivity Meter (a price-sensitivity survey), four questions that identify prices the respondent considers “too cheap,” “cheap,” “expensive,” and “too expensive.” Plot the cumulative responses and the intersections define an acceptable price range and an optimal pricing point.
Contingent Valuation (CV), a direct question: “What is the maximum you would pay for this product?”
Conjoint Analysis, an indirect method where respondents rank product configurations with different features and prices. Statistical modeling infers the implied WTP for each attribute.
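To make the Price Sensitivity Meter arithmetic concrete, here is a minimal Python sketch of the "plot the cumulative responses and read the intersections" step. The five respondents and their answers are invented for illustration, and the choice of which curves to intersect follows one common convention; practitioners vary on this.

```python
# Hypothetical answers from five respondents (USD), one list per question.
too_cheap = [8, 10, 12, 15, 20]
cheap = [10, 13, 15, 20, 25]
expensive = [12, 16, 19, 26, 33]
too_expensive = [14, 18, 22, 30, 40]

def share_at_or_above(thresholds, price):
    """Share of respondents whose threshold is at or above this price."""
    return sum(t >= price for t in thresholds) / len(thresholds)

def share_at_or_below(thresholds, price):
    """Share of respondents whose threshold is at or below this price."""
    return sum(t <= price for t in thresholds) / len(thresholds)

def first_crossing(falling, rising, grid):
    """First grid price where the rising curve meets or passes the falling one."""
    for price in grid:
        if share_at_or_below(rising, price) >= share_at_or_above(falling, price):
            return price
    return None

grid = range(1, 51)  # candidate prices, $1 steps (an assumption)
optimal = first_crossing(too_cheap, too_expensive, grid)
range_low = first_crossing(too_cheap, expensive, grid)
range_high = first_crossing(cheap, too_expensive, grid)
print(optimal, range_low, range_high)
```

Note how much rides on a handful of thresholds: every output is read straight off stated answers, with no purchase anywhere in the calculation.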

Incentive-aligned methods put real money on the table:

Incentive-aligned auction mechanisms, in which respondents state their maximum price and then a random price is drawn. If the random price is at or below their stated WTP, they must actually buy at that price. If it’s above, they can’t buy. Because over-bidding or under-bidding can’t help them, respondents have an incentive to reveal their true WTP. For familiar, frequently priced consumer goods with clear reference points, these are the most accurate methods we have. The accuracy falls off for complex, unfamiliar products that buyers struggle to value (the B2B software case), and even at their best they price one person, not the group that buys.
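Why can’t over-bidding or under-bidding help? A short Python sketch of the random-price rule described above makes it visible. The price grid and the buyer’s true value are assumptions chosen for illustration.

```python
def expected_surplus(true_value, stated_wtp, prices=range(1, 101)):
    """Average surplus when one price is drawn uniformly from `prices`.

    The respondent buys at the drawn price only if it is at or below
    the WTP they stated; otherwise no sale happens and surplus is zero.
    """
    total = sum(true_value - p for p in prices if p <= stated_wtp)
    return total / len(prices)

true_value = 60  # hypothetical: what the product is really worth to this buyer
for stated in (40, 60, 80):
    print(stated, expected_surplus(true_value, stated))
```

Understating (40) forfeits purchases that would have been worth making, and overstating (80) forces purchases at prices above the buyer’s value. Stating the true value is never beaten by any other bid on this grid.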

Revealed preference methods observe what buyers actually do: market transaction data, A/B price testing, auction results.

The first category (hypothetical methods) is what the vast majority of pricing consultancies use, and it’s where the measurement breaks down.

To be clear about where the tools earn their reputation: for a familiar consumer product with abundant reference points (a coffee maker, a streaming plan, a pair of running shoes), these methods work well. The buyer has bought the category before, knows the going rate, and one person decides. Two things go wrong in B2B software, and each is enough on its own.

The first is that the decision is collective. Every individual elicitation method, survey or incentive-aligned auction alike, measures one person’s willingness to pay, and almost no B2B purchase is one person’s decision. A mid-market founder decides with operations and finance. An enterprise committee evaluates a seven-figure platform. The number you want belongs to the group, and you only ever measured individuals.

To get from individual answers to a group price you stack three bets, and you check none of them. First, that each person stated their true value. Second, that you can recombine those values into the group’s value. Third, that the rule you used to combine them, sum or average, is how the group actually decides. Three guesses stacked up, presented as one number.

The middle bet is the one almost nobody examines. Averaging only works when everyone is reading the same number with noise. Two people on a buying committee are not noisy reads of one price. They are two different prices, held for two different reasons. Average two different things and you invent a number nobody holds.

A peer-reviewed field experiment on joint household purchases showed exactly this. Individual stated willingness to pay diverged sharply from the couple’s actual joint decision, and the partner with the stronger preference was over-ruled by the one who controlled the money. The joint choice was not an average of the two values. It was the outcome of a bargaining game, settled by who held the budget, not by the arithmetic of who wanted it more.

That study is about a household, not a software deal, and the numbers don’t transfer. The structure does: the moment you combine individual answers, you are smuggling in a model of how the group decides. Sum them, average them, or let the decider take all, and each rule yields a different willingness to pay from the same responses. Nothing in the survey data tells you which rule the group will use.
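A small Python sketch shows how far the combining rule moves the answer. The committee roles, the stated figures, and the choice of budget holder are all hypothetical.

```python
# Hypothetical stated WTP (annual, USD) from three members of one buying committee.
stated_wtp = {"founder": 30000, "operations": 42000, "finance": 24000}
budget_holder = "finance"  # assumption: the person who actually signs

def aggregate(stated, decider):
    """Apply three different combining rules to the same survey responses."""
    values = list(stated.values())
    return {
        "sum": sum(values),
        "average": sum(values) / len(values),
        "decider_takes_all": stated[decider],
    }

group_wtp = aggregate(stated_wtp, budget_holder)
for rule, value in group_wtp.items():
    print(rule, value)
```

Same three answers, three different "group" prices. The data cannot say which rule applies; that choice is a model you bring to it.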

The second problem is that B2B software gives the buyer almost nothing to anchor on. The methods that work for the coffee maker work because the category is familiar and frequently priced. B2B software is the opposite at every size, from a small team’s first purchase to an enterprise rollout: it is often novel, bought rarely or once, high-stakes against the budget, and has no established price for the specific value it creates. Peer-reviewed measurement work finds the overestimation problem is markedly worse for exactly these complex, hard-to-assess products than for familiar ones. We unpack why the software case is the hardest of all below.

Collective buying and missing reference points are independent. Either one breaks the survey on its own; B2B software has both. So even a flawless auction misses the buying unit, and even a single decider would be guessing without a reference price. The most reliable signal of what a buying unit will pay is the one place willingness to pay is revealed under real stakes: the win rates by price and the discount patterns of the deals you actually closed.
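As a rough illustration of what reading that signal can look like, here is a minimal Python sketch that computes win rate by quoted-price band and the median discount on won deals. The deal records, field names, and band width are invented; a real analysis would also control for segment, deal size, and sales cycle.

```python
from collections import defaultdict
from statistics import median

# Hypothetical closed-deal records: quoted annual price, discount given, outcome.
deals = [
    {"quoted": 18000, "discount_pct": 0, "won": True},
    {"quoted": 22000, "discount_pct": 10, "won": True},
    {"quoted": 24000, "discount_pct": 5, "won": False},
    {"quoted": 27000, "discount_pct": 15, "won": True},
    {"quoted": 31000, "discount_pct": 20, "won": True},
    {"quoted": 36000, "discount_pct": 10, "won": False},
    {"quoted": 38000, "discount_pct": 25, "won": False},
    {"quoted": 41000, "discount_pct": 30, "won": True},
]

def win_rate_by_band(deals, band_width=10000):
    """Map each price band (keyed by its lower bound) to its win rate."""
    tally = defaultdict(lambda: [0, 0])  # band -> [wins, total]
    for d in deals:
        band = d["quoted"] // band_width * band_width
        tally[band][0] += d["won"]
        tally[band][1] += 1
    return {band: wins / total for band, (wins, total) in sorted(tally.items())}

def median_discount_on_wins(deals):
    """Median discount percentage across deals that actually closed."""
    return median(d["discount_pct"] for d in deals if d["won"])

rates = win_rate_by_band(deals)
print(rates)
print(median_discount_on_wins(deals))
```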

The overestimation problem: surveys inflate WTP by up to 2x

Across the broader research, the overstatement averages about a fifth and runs larger for complex, high-consideration products. The sharpest single result comes from a controlled low-cost test, where stated WTP came out at nearly twice what people actually paid. The first empirical comparison of price-sensitivity surveys against an incentive-aligned method asked consumers to value the same product using three methods (contingent valuation, an incentive-aligned mechanism, and a price-sensitivity survey) in a controlled field experiment where respondents had to back their stated price with real money.

The results were stark.

When researchers simply ask people what they would pay, the answers scatter all over the map and bear little relation to what those same people actually pay when it is time to buy. People are guessing. The answers only tighten and start tracking real purchases when the research makes them put actual money on the line. That distinction is the whole point: the accurate version is not a sharper survey, it is people buying. A survey never has money at stake, so it never earns that discipline. This is why asking inflates the number.

Hypothetical methods produced WTP estimates nearly twice as high as what people actually paid when real money was on the table.

Why? The researchers attribute it to hypothetical bias: respondents in a survey face no consequences for overstating what they’d pay. When there’s no transaction at the end, the cognitive cost of saying “$100” instead of “$60” is zero. The number they give you isn’t what they’d pay. It’s what they can imagine paying in a low-stakes thought experiment.

The author’s own conclusion: the price-sensitivity survey “yields biased results because of its hypothetical nature and its focus on minimum customer resistance.” One study did find that the survey’s intersection-based optimal price happened to land near an incentive-aligned benchmark, but that was for 36-cent chocolates, with the author attributing the result to two biases coincidentally cancelling and calling for validation on “more expensive and industrial products,” a validation that has never materialized.

The volatility problem runs deeper than bias direction. Even a statistician working to improve the price-sensitivity survey method documented acceptable price ranges spanning multiples of the base price from a small respondent sample, with the “optimal price” bouncing between $9.50 and $12.90 across bootstrapped samples. A separate head-to-head comparison found that price-sensitivity surveys failed to detect product differentiation effects that choice experiments captured, and that 23% of survey responses contained logical contradictions, where respondents’ “too expensive” threshold was lower than their “expensive” threshold. The errors aren’t even consistently directional: some studies find overestimation, others find underestimation of 2–5% versus choice experiments. The price-sensitivity survey doesn’t reliably err in one direction; it unreliably errs in every direction.

Controlled research with real B2B software buyers points the same way: buyers don’t carry a single willingness-to-pay number, they carry a range. The acceptable band for the product was wide, and how buyers responded shifted sharply depending on where the price landed inside it. A survey that hands you one “optimal price” is collapsing that whole band of buyer psychology into a single figure.

“But we use conjoint, not price-sensitivity surveys”

This is the most common objection from sophisticated pricing consultancies. They’ll concede that the price-sensitivity survey is a blunt instrument, then argue that their choice-based conjoint analysis is fundamentally different: that by forcing respondents to make trade-offs between realistic product configurations rather than stating a price directly, conjoint sidesteps the hypothetical bias problem.

It doesn’t. Here’s why.

Conjoint is still hypothetical. The standard WTP method classification places discrete-choice analysis (the dominant form of conjoint used in pricing research today) in the Indirect / Hypothetical quadrant.

	Stated Preference (Hypothetical)	Stated Preference (Incentive-Aligned)	Revealed Preference
Direct	Price-sensitivity surveys, Contingent Valuation	Incentive-aligned mechanisms	Market data analysis
Indirect	Conjoint / Discrete-Choice	—	Experiments, Auctions

Source: WTP measurement method classification from peer-reviewed pricing methodology research.

Price-sensitivity surveys and conjoint (the two most common methods in B2B software pricing) both sit in the hypothetical column. Not incentive-aligned. Not revealed preference. Hypothetical. The respondent is choosing between product configurations on a screen, not signing a purchase order. The cognitive environment is identical to any other survey: no money changes hands, no procurement committee reviews the decision, no implementation risk is evaluated. The trade-off is more realistic than “what’s the maximum you’d pay?” but the fundamental problem remains: the respondent faces zero consequences for their stated choices.

Conjoint triggers strategic behavior in B2B. Research on pharmaceutical pricing methods found that direct questioning leads to “bargaining behaviour” where respondents “systematically understate willingness to pay” to influence final pricing. The researchers note that even discrete choice methods, while having a “high degree of realism,” require “substantial expertise” and “substantial time and budgets,” and that they “know of no studies that have validated” the price-sensitivity survey approach, a limitation they extend to all stated-preference methods. In B2B software, where your respondents are often trained procurement professionals who know exactly why you’re asking, strategic distortion is the norm, not the exception. Game-theoretic research proves this formally: buyers who know their responses inform pricing will deliberately choose suboptimal configurations to avoid revealing high willingness-to-pay.

Conjoint can’t capture value the buyer hasn’t experienced. A conjoint study asks respondents to trade off features and prices for a product they’re evaluating. But in B2B software, the most important value drivers are often things the buyer can’t evaluate in a survey: workflow improvements they haven’t seen, integrations they haven’t tested, efficiency gains that only emerge after months of organizational adoption. A conjoint study captures the buyer’s perception of value at the moment of the survey, anchored to their current experience with their current tools. It can’t measure willingness-to-pay for value that hasn’t been delivered yet.

Conjoint produces a snapshot, not a system. Even if a conjoint study perfectly captured every respondent’s true preferences at the moment of the study, those preferences change. The product ships new features quarterly. A competitor drops their price. The buyer’s business context shifts. The conjoint study that cost $150K and took three months to execute is calibrated to a market that no longer exists by the time the recommendations are implemented.

Consultancies that use conjoint claim in-market validation of their price recommendations, but this data is proprietary and self-reported. No independent peer-reviewed study has compared conjoint-derived pricing outcomes to continuous demand measurement outcomes for B2B software.

Some consultancies go further, using Bayesian hierarchical models (HB conjoint) that estimate individual-level preferences rather than population averages. The statistical modeling is more sophisticated, but the input data is identical: hypothetical choices made by respondents who face no consequences. A more precise model of hypothetical behavior is still a model of hypothetical behavior. The only validation study for HB conjoint in pricing used simulated data with known ground truth, not real purchase outcomes. When the researchers tested whether the model could recover what simulated buyers “actually” valued, it could. Whether it can do the same with real B2B software buyers who have organizational constraints, procurement processes, and strategic incentives to misrepresent their preferences has never been demonstrated.

The new pitch: AI-moderated surveys

A wave of platforms now sells acceleration: an AI moderator probes free-text answers in real time, and a conjoint or price-sensitivity study that once took weeks to field runs in hours. That compresses survey administration, and administration was never the problem. The respondent still faces no consequences, is still rarely the economic buyer, and still cannot price value they have not experienced in production. Some of the vendors selling these platforms concede elsewhere in their own catalogs that conjoint does not work well in enterprise software. Accelerating an instrument that measures the wrong thing delivers the wrong answer sooner.

The newest version of the pitch: simulate the respondents

This version replaces the human respondents with an AI model: simulate a thousand buyers, re-run the panel every month, and the staleness and cost objections disappear. Benchmarks against matched human studies say otherwise. The simulated estimates sometimes landed in the right range, but they often missed by two to three times or pointed the wrong direction, and nothing in the output tells you which estimates are the bad ones without running the human study anyway. The simulated panels also collapsed real differences between buyer segments into a single population average. The same study re-run on a different model version produced estimates up to three times apart. A cheaper way to re-run a miscalibrated instrument is not a fix. It produces a fresher wrong number.

This isn’t an argument that conjoint is useless. It’s a more sophisticated instrument than price-sensitivity surveys, and it produces richer data about feature-price trade-offs. But it shares the same foundational weakness as every stated-preference method: it asks people to predict their own behavior in a consequence-free environment, captures a static snapshot, and presents the result as the answer to a question that the market itself should be answering continuously.

Why software makes the problem worse

Physical goods have a natural price anchor: material costs. A buyer evaluating a manufactured product has at least a rough sense of what the inputs cost (steel, labor, components). Even if they don’t know the exact figure, the physical nature of the product creates a reference frame.

Software has no such anchor.

Research on information goods pricing established that software exhibits fundamentally different economics: near-zero marginal reproduction costs and high fixed development costs. A customer buying enterprise software cannot reason backward from “what this costs to produce” to “what it should cost me”, because the answer to the first question is effectively zero per unit, and the answer to the second question depends entirely on the value it creates for their specific business.

This asymmetry is structural, not incidental. Recent research on digital goods pricing found that across 87 data-driven businesses, sellers consistently guide valuation while buyers cannot optimize purchasing decisions, because the information needed to assess value is controlled by the seller. A WTP survey asks the party with the least information to provide the most consequential number.

As I wrote in Why Continuous Monetization Is So Vital: “When a consumer decides to buy a new refrigerator, they have a pretty complete idea of its value. While certain features and designs can affect that value, the buyer can make an easy price comparison because of the product’s functional parity with other offerings, generally speaking, an icemaker is an icemaker. That’s not the case in software, where customer perceptions of value can miss the subtle distinctions that create huge differences between seemingly similar products.”

There is minimal functional parity in software. An email capability is not an email capability. So when a price-sensitivity survey asks a software buyer “at what price would you consider this product too expensive?”, the buyer has no anchor. They’re not estimating, they’re guessing. And that guess is shaped by whatever reference points happen to be in their head: the last software they bought, a competitor’s listed price, a number their CFO mentioned in a budget meeting.

In B2B specifically, value perception is strongly affected by organizational software experiences. Differential value is often concentrated in innovations prospective buyers haven’t yet experienced, innovations that might enable currently unimaginable operational improvements. Buyers can’t tell you in a survey what they’d be willing to pay for something they haven’t yet seen working in their environment.

And even when buyers can evaluate what’s in front of them, the method still breaks. The complexity problem has a measurable signature. When researchers used a price-sensitivity survey to price athletic footwear (a consumer product with just five value dimensions), the method produced an internal contradiction for the most feature-rich product: the “optimal price” fell below the price floor where buyers would question quality. The model told you to price lower than the point where your own customers would distrust the product. If a price-sensitivity survey breaks down for a $150 running shoe, the odds of it producing coherent results for enterprise software (with dozens of value dimensions, multi-stakeholder evaluation, and no physical reference frame) are vanishingly small.

The WTP number you get from the survey is a composite of arbitrary anchors, filtered through hypothetical bias. It’s precise-looking data built on sand.

AI compounds the problem

If customers couldn’t estimate what they’d pay for software before AI, they certainly can’t now.

Consider what happens when a B2B software company wraps generative AI into its product. A single user action (asking a question, generating a report, running an analysis) might trigger three to five model calls behind the scenes: one to decompose the problem, one to run the inference, one to verify the answer, one to summarize the result. The user sees one click. The vendor’s infrastructure sees a chain of API calls, each with variable cost depending on the model, prompt length, and orchestration logic. Ask the customer what they’d be willing to pay for that feature, and they’re estimating a price for a process they can’t see, powered by costs they can’t comprehend, at a volume they can’t predict.
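To see how one click turns into a variable cost, here is a back-of-envelope Python sketch of a four-call chain like the one just described. Every token count and per-token price below is a hypothetical placeholder, not a quote from any provider.

```python
# All token counts and prices below are hypothetical placeholders.
PRICE_PER_1K_INPUT = 0.003   # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015  # USD per 1,000 output tokens (assumed)

# (step, input tokens, output tokens) for one user action
chain = [
    ("decompose", 800, 200),
    ("infer", 3000, 900),
    ("verify", 1500, 100),
    ("summarize", 1200, 300),
]

def call_cost(input_tokens, output_tokens):
    """Cost of a single model call at the assumed per-token prices."""
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT)

def action_cost(chain):
    """Total cost of every call triggered by one user action."""
    return sum(call_cost(i, o) for _, i, o in chain)

per_click = action_cost(chain)
monthly = per_click * 5000  # assumed 5,000 clicks per customer per month
print(round(per_click, 4), round(monthly, 2))
```

Change the prompt length, the model, or the number of verification passes and the per-click figure moves, which is exactly what a buyer answering a survey cannot see.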

But the opacity isn’t just on the buyer’s side. Many vendors don’t know what their AI costs are per customer. Inference costs are comingled with compute, storage, and networking on aggregate infrastructure bills. A vendor might know their AWS bill increased 40% after launching an AI feature, but they can’t attribute how much of that is Customer A’s usage versus Customer B’s. They’re pricing a product whose cost-to-serve they can’t isolate.

And both sides are pricing against a moving target. Model costs drop 50% every six to twelve months. New models change the cost-per-quality curve. Orchestration patterns evolve as engineering teams optimize. A WTP survey conducted today is calibrated against a cost structure that will be obsolete by the time the recommended pricing is implemented.
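The halving claim is easy to turn into arithmetic. This Python sketch assumes a smooth exponential decline, which is a simplification; real price changes arrive in steps.

```python
def cost_after(initial_cost, months_elapsed, halving_months):
    """Unit cost after `months_elapsed`, if cost halves every `halving_months`."""
    return initial_cost * 0.5 ** (months_elapsed / halving_months)

# A unit cost of 1.00 today, one year later, at both ends of the stated range.
fast = cost_after(1.00, 12, 6)   # halving every six months
slow = cost_after(1.00, 12, 12)  # halving every twelve months
print(fast, slow)
```

Under those assumptions, the cost basis behind a price set today is somewhere between a quarter and a half of its starting value a year later.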

The bundling trap. Facing this uncertainty, many B2B software companies made a rational short-term decision: they included AI in existing packages without charging separately, reducing sales friction and driving adoption. This solved the go-to-market problem. But it created a monetization trap: customers now expect AI as included, and separating it later triggers the endowment effect described below. They’d be “losing” something they already have.

The token passthrough problem. Other vendors went the opposite direction, passing AI costs directly to customers as tokens, credits, or consumption charges. This might seem like transparent pricing, but it fundamentally undermines value capture. When you price at the token level, you’re no longer selling “we help you optimize your pricing strategy.” You’re selling “our tokens cost $0.003 versus Google’s $0.002.” You’ve commoditized your own product by framing it at the infrastructure layer instead of the business outcome layer. As I’ve written about GenAI pricing challenges, application-layer software companies should sell baked cakes, not itemize the ingredients and baking time. That is not a case against usage-based pricing: a variable consumption unit can be exactly right, as long as it tracks what the buyer reads as their own value rather than your internal cost components. The further you sit from the raw compute and model layers, where those cost units genuinely are the value, the more your price should follow what the customer gets, not what it cost you to make.

Enterprise buyers confirm this. When researchers asked healthcare decision-makers (sophisticated B2B buyers managing multi-million dollar technology budgets) about pricing diagnostic AI, 76% rejected models based on technical usage metrics like tokens or API calls, calling them “economically and operationally misaligned” with how they plan and budget. They preferred hybrid models with predictable base fees and variable components tied to business outcomes, not technical consumption. If buyers can’t even conceptualize the unit of measurement, a WTP survey for that unit is meaningless.

Credits make it worse. Some vendors added yet another abstraction, converting dollars to tokens to credits to “AI units.” Each layer moves the pricing further from anything a buyer can reason about. A WTP survey asking “what would you pay for 500 AI credits per month?” is measuring a fiction filtered through four layers of abstraction. The buyer doesn’t know what a credit represents, how many credits their workflow consumes, or how that consumption will change as the vendor’s AI evolves.

The buyer can’t state WTP because they don’t know their usage. The vendor can’t cost-plus price because they don’t know their per-customer costs. Both numbers are changing quarterly. And the tools most consultancies use to measure willingness-to-pay (surveys that produce a single number at a single point in time) are pricing against a reality that doesn’t hold still long enough to be measured.

That is not an AI-era novelty. Consumer plan-choice research found buyers systematically overestimate their own peak usage, and the overestimate is what pushes them into bigger plans than they need. In one program, roughly two-thirds of the buyers who chose the larger plan would have paid less on the simpler one they passed over. The magnitudes are consumer-side; the estimation failure is not. AI did not create that problem. It multiplied the units the buyer now has to guess in.

Continuous monetization isn’t just better here, it’s the only approach that can adapt as both the costs and the value change in real time.

Most B2B buyers don’t decide on price

Even if WTP surveys produced accurate numbers, there’s a more fundamental problem: for most B2B buyers, price isn’t the primary decision factor.

And there’s a quieter problem hiding underneath it: you are usually measuring the wrong people. Say you survey two of three directors, and they answer carefully and truthfully. The trouble is the owner makes the call, and you never asked the owner. You didn’t undersample. You measured the wrong people, precisely. Peer-reviewed organizational-buying research is blunt about this: job titles are poor proxies for who actually holds influence in a purchase.

More responses from non-deciders don’t fix that. They just make you confidently wrong. A bigger sample of people who don’t control the decision tightens the number around a person who was never going to sign.

We have watched this happen. A company that came to us had set its prices off a willingness-to-pay survey, launched, and shortly thereafter its new-customer growth rate had fallen by roughly a third. The response was to do more of the same: a larger sample, the survey re-run, the price points re-cut. Refining the method was easier than disowning it, since the people who had championed the survey were the ones who would have had to admit it was wrong. Growth fell by about another third. The bigger sample did not correct the original mistake. It produced a more confident version of it, a tighter number around a price the market kept walking away from. The timing made it worse, coming just after a major financing round, when a growth stumble is most visible and least easily forgiven. The cost was not only the growth. A pricing call that misses twice after a raise is the kind of decision that, at the executive level, tends to follow the people who made it.

A controlled study of B2B software buyers found the lowest price won the least interest, not the most. Shown a bargain price, buyers turned skeptical, and fewer asked to talk to sales. Interest peaked at a higher, credible price, then fell again when the price climbed too far. The lesson is uncomfortable for anyone tempted to compete on price: in B2B, a price that looks too cheap doesn’t read as a deal. It reads as risk, and it pushes away the buyers you most want.

This wasn’t a lab full of casual consumers. These were real B2B software buyers with budgets and procurement authority, and they still weren’t optimizing for the lowest number. They were reading the price as a signal of whether the product was worth their time. A price that looked too good raised as much doubt as one that looked too high.

A WTP survey treats every respondent as a price optimizer. It assumes the number they give you is the number that determines whether they buy. The B2B SaaS evidence shows the opposite: even among professional software buyers, the relationship between price and purchase intent is non-linear, and the “right” price is the one that falls within a cognitive acceptance range, not the lowest one the buyer will tolerate.

The endowment effect: buyers undervalue what they don’t yet have

There’s a third bias working against WTP accuracy, one that pulls in the opposite direction from hypothetical bias.

Research on new product adoption documented the endowment effect in purchasing decisions. Across multiple experiments, people demanded approximately 3x more compensation to give up a product they already possessed than they were willing to pay to acquire the same product.

In the most cited experiment, sellers consistently demanded multiples of what buyers were willing to pay for identical goods.

For B2B software, this means:

Buyers systematically undervalue new software relative to its actual worth to their business
They simultaneously overvalue whatever they’re currently using, even if it’s clearly inferior
WTP surveys capture this deflated number and present it as the ceiling

The net effect: hypothetical bias inflates the WTP number upward, while the endowment effect deflates the true valuation downward. A WTP survey gives you an inflated estimate of an already-deflated valuation. The error doesn’t cancel, it compounds the confusion.

Surveys don’t account for salespeople’s willingness to discount

Here’s where the academic evidence meets the reality of B2B software sales.

A software company can conduct survey after survey on their prospects’ willingness to pay. But as I’ve written in Forbes, these surveys don’t take into account salespeople’s willingness to discount. It’s a false assumption that the customer primarily drives willingness to pay. Salespeople have significant sway over how much money customers are willing to put forward, and software executives often underestimate that sway.

The academic research confirms this at scale. A study of thousands of enterprise software deals found that salespeople gave away 4.3 percentage points in excess discounts (translating to 6.6% of total vendor revenue), primarily to manipulate deal timing for their commission benefit. Seventy-four percent of deals closed on the last day of the quarter, and deals closing late in the quarter averaged 35-37% discounts versus 30% mid-quarter.

This isn’t a few rogue reps. It’s structural. Non-linear quarterly commission plans create massive compensation differences for identical deals depending on timing. The same deal can earn a salesperson an order of magnitude more in one quarter than another. The rational response is exactly what the data shows: offer deeper discounts to pull deals into the quarter where they help your comp plan the most.

The buyer-behavior research reinforces this. When the same buyers saw a price well below the credible range, they pushed back rather than leaned in, generating more objections than support and fewer requests to talk to sales. A bargain price didn’t raise purchase intent; it lowered it. If discounting triggers the same rejection as overpricing, and salespeople are structurally incentivized to discount, then the WTP survey’s recommended price is being undermined by the very sales process it’s supposed to inform.

We see this play out constantly. A prospect might be willing to pay $30,000 a year for workflow automation software, until they learn that the salesperson they’re working with tends to give out discounts of up to 20%. Suddenly, the prospect won’t want to pay $30,000. And that discount doesn’t disappear come renewal time. It becomes the new baseline from which the renewal negotiation starts, and that customer will likely argue for another discount on top of what they’ve already been given.
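To make the renewal-anchoring effect concrete, here is a minimal sketch. The $30,000 list price and 20% initial discount come from the example above; the 10% further discount at each renewal and the function name are illustrative assumptions, not observed figures.

```python
def renewal_price_path(list_price, initial_discount, renewal_discount, renewals):
    """Price paid each term when every renewal starts from the last price paid."""
    prices = [list_price * (1 - initial_discount)]
    for _ in range(renewals):
        # The previous discounted price, not list price, is the new anchor.
        prices.append(prices[-1] * (1 - renewal_discount))
    return [round(p, 2) for p in prices]


# 20% off at signing (from the example), then a hypothetical 10% off at each renewal.
path = renewal_price_path(30_000, 0.20, 0.10, renewals=2)
print(path)
```

Under these assumptions the customer who was once willing to pay $30,000 is paying well under two thirds of that by the second renewal, without the product having lost any value.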

The compounding failure looks like this:

A WTP survey produces an inflated number (hypothetical bias)
The consultancy recommends a price based on that inflated number
The sales team discounts from the already-questionable ceiling, driven by comp structures
The discounted price becomes the new anchor for renewals
The customer’s perception of fair value has now been permanently depressed

Each step introduces error in the same direction, downward from true value. The resulting price has only an accidental relationship to the software’s actual worth.

B2B software pricing in practice

The B2B pricing research is damning enough. But the practitioner evidence is worse.

When Philip Ideson of the Art of Procurement podcast and I discussed this, he mentioned seeing price differences of as much as 10x for the same software product sold to different buyers. I’ve personally seen even more. Our own transaction datasets across B2B software companies reveal the full spectrum: the same product or bundle of products priced anywhere from a 100% discount (given away free) to surcharges of several multiples above list price. When the same software is sold to similar buyers at prices that vary by orders of magnitude, no WTP survey can fix what’s broken, because the survey assumes a world of rational price-setting that doesn’t exist.

Consider the scenario from my Forbes article on What Software Companies Get Wrong About Pricing: Tanya and Tessa are trying to purchase the same software solution for their companies. Their companies are the same size, in different industries, with nearly identical use cases. After they each undergo the sales process with a different salesperson and sign on the dotted line, they meet for coffee, and Tanya learns that Tessa paid half the price that she did.

When customers feel cheated like this, they warn other prospective buyers. The company gains a reputation as one that requires wheeling and dealing just to get a fair price. And a WTP survey conducted in this environment is measuring the aftermath of dysfunction, not some fundamental truth about customer valuations.

I’ve lived this myself. When I co-founded a software company, our initial willingness-to-pay research pointed to a $2,500 price point. We launched there. Over time, as we understood how customers actually used the product and what outcomes it drove, the price moved to $100,000, and eventually to $500,000. The WTP study didn’t just underestimate demand. It undercut our value by orders of magnitude, because the buyers we surveyed couldn’t articulate the value of something they hadn’t yet experienced at scale. No survey redesign would have fixed that; the insight only came from observing real usage and real purchasing behavior over time.

The academic research focuses on overestimation because that’s what controlled experiments on low-cost products reveal. The overstatement is not uniform. It runs larger for complex, high-consideration products than for simple, low-cost ones, and enterprise software is squarely in the high-consideration category. A survey-derived ceiling for B2B software is therefore more likely to sit above what buyers will actually pay than below it, and by a wider margin than the consumer average implies. The bias runs in one direction for the products that matter here: surveys read high, and they read highest where the purchase is complex.

This pattern repeats across the B2B software companies we work with. Executives consistently tell us the same thing: their customers genuinely don’t know what they’d pay. The tools most consultancies use to answer that question are measuring a fiction that gets further distorted by every sales interaction.

Why a snapshot can never replace a system

Even if you could fix every bias in WTP measurement (eliminate hypothetical inflation, correct for the endowment effect, account for buyer type, discipline every salesperson) you’d still have a fundamental problem: a survey produces a snapshot. It tells you what buyers said they’d pay at one moment in time, for one configuration of your product, in one competitive context.

Software doesn’t work that way. Your product changes every sprint. Your market shifts every quarter.

There is a myth about pricing that software companies often let themselves believe: that once a customer purchases their software, the ongoing monthly price entitles the customer to all the new features and value going forward. The problem is that improvements and additions the company makes to its software can drastically alter the cost-value calculus. Over time, the amount the customer pays drifts out of sync with the value the software delivers.

A WTP survey taken in January is stale by March. The consultancy that delivered it has moved on to their next engagement. And you’re making pricing decisions based on data that was biased when it was collected and is now outdated on top of it.

Google’s internal research makes this point directly. Their data documents that Google runs approximately 10,000 pricing and product experiments per year, with about 1,000 running concurrently. The reason: observational data alone “cannot establish causality for pricing decisions, leading to incorrect demand curve estimation.” External factors create false correlations between price and demand that no survey can untangle.

Without the data to support it, pricing decisions can only come from narrative, assumptions, and anecdotes, a much more precarious and risky foundation.

The alternative: continuous monetization

The research (and what we’ve observed across hundreds of billions of dollars in B2B software transactions since 1982) supports a fundamentally different approach. One that replaces the “measure WTP, set price, move on” model with a system of ongoing demand measurement, structured experimentation, and iterative price improvement.

We call this continuous monetization. Top-performing software companies set off on a constant hunt to get paid fairly for their intellectual property. They know that the never-ending development of a software product means that, over time, some elements will over-deliver value while others may be underused. So they persistently monitor the value drivers and adapt their pricing.

Some firms now pair an initial conjoint study with ongoing price optimization. But if the optimization framework will override the study’s recommendations within a quarter (and it usually does, as the product and market evolve) the question is whether the initial study justified its cost and the months it delayed action.

Here’s how it works and why the evidence supports each component.

Measure demand response, not stated preference

The core shift: instead of asking customers what they’d pay, observe what they actually do when pricing changes.

We can’t rely on what customers say to estimate software value. We need to understand what they do, including purchases, usage patterns, upgrade behavior, and churn signals.

We recommend performing a series of controlled incremental price changes to understand and push the boundaries of willingness to pay for customer groups with similar usage and derived value characteristics. This is an empirical, reliable, risk-mitigating method of conducting demand elasticity analysis, firmly rooted in how customers behave, not just what they say.

This method also helps you harmonize pricing with the rate of new value creation from your product roadmap. In B2B software, especially if subscription-based, customer value perception contains a futures element. Customers expect a stream of increasing value. So another way to think about this: you’re taking pricing validation steps in a journey that always keeps you on the safe side of the razor’s edge of being paid fairly for your software’s value.
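As a rough illustration of what one controlled incremental price change yields, the sketch below computes a midpoint (arc) price elasticity from two observed points. The cohort numbers are hypothetical, and arc elasticity is one standard textbook measure; it is an assumption here, not necessarily the calculation used in any particular engagement.

```python
def arc_elasticity(p0, q0, p1, q1):
    """Midpoint (arc) price elasticity of demand between two observed points."""
    if p0 == p1:
        raise ValueError("prices must differ to measure a demand response")
    pct_change_q = (q1 - q0) / ((q0 + q1) / 2)
    pct_change_p = (p1 - p0) / ((p0 + p1) / 2)
    return pct_change_q / pct_change_p


# Hypothetical cohort: price raised from 10,000 to 10,500,
# closed deals per period move from 40 to 39.
e = arc_elasticity(10_000, 40, 10_500, 39)
inelastic = abs(e) < 1  # demand fell proportionally less than price rose
print(round(e, 2), inelastic)
```

An elasticity between -1 and 0 for a cohort suggests there is still headroom at that step; a value below -1 suggests the increase met real resistance. With small deal counts a single reading is noisy, so it is better read as one step in a series than as a verdict.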

Research on price experimentation algorithms confirms the approach can achieve 96.9–99.1% of optimal revenue across different demand distributions. The key insight: structured experimentation that incorporates economic theory dramatically outperforms both naive price testing and static survey-based pricing. Recent theoretical work reinforces this: researchers proved mathematically that subscription usage data alone can identify willingness to pay without requiring price experiments, because variation in how customers use a product at a fixed price reveals their underlying valuations. The data you need is already flowing through your systems.

Software companies that know precisely how their customers behave (what they buy, the features they use and don’t use, the amount they pay, how quickly and often they upgrade, downgrade, or churn) are able to analyze pricing opportunities and risks. And adapt accordingly. The closer to real-time they have that information, the more quickly they can adapt and the sooner they can benefit from the value they are providing.

We often see how this “truth on the ground” data surprises operators, challenging or outright invalidating assumptions that helped build the original pricing model.

Start below value, iterate upward

Most pricing consultancies deliver a “right price” and their engagement ends. Continuous monetization starts differently: set an initial price based on competitive positioning and value hypothesis, then systematically increase it while measuring demand response.

The research on price compression documents a pattern where B2B companies systematically underprice their most valuable offerings and overprice their least differentiated ones, compressing the price range toward the middle. It is quite common for software executives to underestimate the variety and amounts of discounts that flow through their company’s book of business. They believe they know what is happening, until a scatter chart plotting every deal by discounts and other attributes tells them a very different story.
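The scatter chart mentioned above starts from a simple per-deal calculation. The following sketch summarizes discount dispersion per product; the deal records and field layout are hypothetical, and a negative discount denotes a surcharge above list price.

```python
from collections import defaultdict
from statistics import median

# Hypothetical deal records: (product, list_price, net_price)
deals = [
    ("core", 100_000, 100_000),  # sold at list
    ("core", 100_000, 65_000),   # 35% discount
    ("core", 100_000, 0),        # given away free
    ("addon", 20_000, 30_000),   # surcharge above list
]


def discount_summary(records):
    """Min, median, and max discount fraction per product."""
    by_product = defaultdict(list)
    for product, list_price, net_price in records:
        by_product[product].append(1 - net_price / list_price)
    return {
        product: {"min": min(d), "median": median(d), "max": max(d)}
        for product, d in by_product.items()
    }


summary = discount_summary(deals)
print(summary)
```

Even this small summary tends to be more revealing than an average discount figure, because the spread between minimum and maximum is where the dysfunction shows up.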

The fix isn’t a single repricing event. It’s a process of expanding the range outward: raising prices on high-value products where demand proves inelastic, and restructuring low-value products where prices meet resistance.

The buyer-behavior research supports this directly. The acceptable price band was wide, and buyers responded positively across the whole of it, not just at the bottom. Inside that band the company held real pricing power. A company that starts at the low end and iterates upward, measuring how demand actually responds at each step, captures that headroom without ever fielding a survey question. And as the product improves, the top of the acceptable band moves up with it.

Investors reward this approach. Research on SaaS transitions shows that stock prices increase an average of 2.2% when SaaS is offered alongside perpetual licensing, but companies that force-convert existing products to SaaS-only see a 3.5% value decrease. The market wants to see pricing that evolves with the product, not pricing that gets set once and locked in.

Let versioning reveal WTP through behavior

Versioning research provides the theoretical foundation: pricing structures that let buyers self-select based on their own assessment of value reveal more about willingness-to-pay than any survey. The pricing structure itself becomes the WTP measurement instrument.

Good-better-best tiering is one version of this, but it’s far from the only one. The research validates multiple packaging architectures depending on how customers derive value:

Tiered plans (good-better-best): work when customer segments align cleanly with feature tiers
Platform with optional modules: work when customers have diverse needs and value different capabilities. Research on multi-component pricing models shows these provide “superior outcomes” for B2B services by aligning price with actual usage patterns
Customized bundling (“pick N modules from the catalog”): research shows 13-21% profit improvement over forcing everyone into the same bundle, specifically when customer preferences vary across the product portfolio
Usage-based or hybrid models: work when value scales with consumption and customers prefer paying for what they use

The specific architecture matters less than the principle: let buyers reveal their willingness-to-pay through their purchasing choices (which configuration they pick, which modules they adopt, when they upgrade) rather than asking them to guess it in a survey.

This is fundamentally more accurate than stated-preference research for three reasons:

Real consequences. When a buyer chooses a configuration, they’re making a decision with their company’s money. There’s no hypothetical bias because the transaction is real.
Continuous signal. Every renewal, every expansion, every add-on, every upgrade decision is a new data point. You’re not relying on a single snapshot; you’re building a longitudinal dataset of revealed preferences.
Segment-level precision. Packaging structures naturally separate buyer types. Price-sensitive buyers self-select to basic configurations. Value-driven buyers choose comprehensive packages. The structure does the segmentation that a survey tries to do artificially.

But the right packaging architecture (and the right boundaries within it) must be discovered through market feedback, not set once by a consultancy.

Structure the price around how customers decide

The buyer-behavior research shows this directly. Buyers didn’t just weigh the amount; they weighed what the price signaled. The same product priced cut-rate drew suspicion (“what’s wrong with it?”), while a credible price read as a serious tool that invited engagement. The price structure shaped the buyer’s frame, and that frame decided whether they took the next step.

The structure keeps working after the signature. In one consumer subscription market, customers who switched to capped plans used about fifteen percent more than the price change alone predicted. The study ruled out self-selection: living inside the structure produced the extra usage, not heavy users sorting into the plan. The finding is consumer-side, but the mechanism is structural. The architecture does not just frame the purchase decision. It changes what the customer does with the product, which changes the value there is to price.

For B2B software, how you package and present pricing tiers has more impact on perceived value than the dollar amounts themselves. What we call licensing metrics (per seat vs. per usage, monthly vs. annual, bundled vs. modular) shape buyer behavior more than any specific number on a survey. Research on information goods bundling confirms this: bundling software modules reduces variance in customer valuations, making pricing more predictable and revenue more capturable than selling components individually.

The implication: a pricing page that clearly communicates what’s included at each tier, how pricing scales, and why may outperform one that’s been “optimized” by a WTP study, because the structure itself communicates value that no survey question can capture.

Build the pricing feedback loop into operations

Continuous monetization isn’t a one-time project. It’s an operational capability.

Always-on demand sensing. The right infrastructure captures conversion rates, upgrade/downgrade flows, churn by tier, and expansion revenue by cohort continuously, not when someone remembers to pull a report. AI-augmented pattern recognition can surface pricing signals that human analysis would miss: a subtle shift in tier mix that precedes churn, a packaging configuration that consistently accelerates deal velocity, or a discount pattern that correlates with lower lifetime value. Pricing data is constantly changing. Good data is not a snapshot of past history. It lives within your customers and your organization, and as it changes, so too must your pricing model.

Structured experiments. Test pricing changes on specific segments before rolling out broadly. Which segment carries the test, how large a change it can bear, and how long the read takes are design calls that depend on deal velocity and cohort structure, not defaults to copy. When the system is always collecting and analyzing deal data, the experiment isn’t a special event; it’s the normal operating rhythm.

Value-aligned iteration. Every major feature release is a pricing event. If you ship a capability that materially changes the value proposition for a segment, the price should reflect it. Not through a new WTP survey, through a structured increase with demand measurement. An always-on system detects when the value-price gap widens and flags the opportunity before it’s left on the table for quarters.

Longitudinal win/loss intelligence. For enterprise sales, the richest WTP data comes from the deals themselves, not as one-time post-mortems, but as a continuously enriched dataset. Not “what price did the customer say they wanted” but “at what price did they actually sign, what did they push back on, and what did they not push back on.” When AI is analyzing every quoting iteration, negotiation response, and contract term across your entire book of business over months and years, the pattern recognition compounds. What starts as 60 deals becomes a longitudinal intelligence layer that gets sharper with every transaction.
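As one small example of the always-on demand sensing described above, the sketch below computes churn rate by tier from account records. The record shape and tier names are hypothetical; a production system would read these from billing or CRM data rather than an in-memory list.

```python
from collections import Counter


def churn_rate_by_tier(accounts):
    """Share of accounts in each tier that churned during the period."""
    total, churned = Counter(), Counter()
    for account in accounts:
        total[account["tier"]] += 1
        churned[account["tier"]] += int(account["churned"])
    return {tier: churned[tier] / total[tier] for tier in total}


# Hypothetical period snapshot
accounts = [
    {"tier": "basic", "churned": True},
    {"tier": "basic", "churned": False},
    {"tier": "basic", "churned": False},
    {"tier": "basic", "churned": True},
    {"tier": "premium", "churned": False},
    {"tier": "premium", "churned": False},
]

rates = churn_rate_by_tier(accounts)
print(rates)
```

Run on every period rather than on request, the same calculation becomes a time series, which is what lets a shift in tier-level churn be noticed before it shows up in revenue.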

Why this works at enterprise scale

The most common objection to continuous monetization: “We close 60 enterprise deals a year. You can’t do statistically significant pricing experiments with 60 data points.”

This misunderstands what the data actually is.

Depth beats breadth. The issue with survey-based pricing was never sample size; it was data quality. Every response in a 2,000-person conjoint study is hypothetical. Every data point in a 60-deal transaction dataset is real. Depth of observation isn’t a concession from quantitative rigor; it’s an upgrade. Qualitative research methodology has established for decades that smaller samples with deep observational data produce more accurate behavioral findings than large-scale surveys with shallow, hypothetical responses. Anthropologists draw valid conclusions from 15-30 deep observations. The same principle applies to enterprise pricing: 60 deals where real money changed hands, real procurement teams pushed back, and real contracts were negotiated contain more signal about willingness-to-pay than 2,000 conjoint responses where nobody signed a check.

Over four decades of working exclusively with B2B software companies, we’ve found that controlled packaging and pricing iterations converge on accurate solutions rapidly, typically within two to three adjustment cycles. The accuracy comes not from the volume of data but from the quality: each iteration is tested against real buyer behavior with real budget consequences, and the feedback loop is measured in weeks, not the months a survey-based engagement requires.

It’s not 60 data points; it’s thousands. Each enterprise deal isn’t one observation. It’s a sequence of pricing decisions. Through our LevelSetter platform, we capture every quoting iteration, packaging configuration, and negotiation response via API, continuously, across the entire book of business. AI analysis runs against this growing dataset, identifying patterns that no quarterly review could surface. A single deal may generate 15-20 observed pricing interactions before close: the initial proposal, the counter, the packaging adjustment, the procurement pushback, the revised quote, the final terms.

At 15-20 interactions each, sixty active deals produce roughly 900 to 1,200 real pricing responses, each made with budget authority, procurement oversight, and actual purchase intent. No survey produces data of this quality at any sample size, because no survey respondent faces the consequence of actually spending the money.

We see where deals break, not just whether they close. Continuous monetization doesn’t just measure outcomes. It maps the full negotiation arc. We see which packaging configuration opened the conversation, which pricing adjustment unlocked budget approval, and which counter-proposal killed the deal. This isn’t win/loss at the deal level; it’s win/loss at each decision point within the deal. Pattern-matching across these sequences reveals pricing dynamics that no point-in-time survey can detect.

Did the deal stall at the first quote? Packaging problem. Did it survive three rounds but fail at procurement? Price structure problem. Did a packaging change in round two unlock the budget? That’s a signal about what the buyer actually values, and it’s a signal you can only see in real transaction data.
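The three questions above can be read as simple rules over a deal’s negotiation arc. The sketch below encodes them directly; the field names, stage labels, and output strings are hypothetical, and the rules are only the heuristics stated here, not a description of how any platform classifies deals.

```python
def diagnose(deal):
    """Return rule-of-thumb readings for one deal's negotiation arc."""
    signals = []
    lost = not deal["won"]
    if lost and deal["quote_rounds"] <= 1:
        # Stalled at the first quote.
        signals.append("packaging problem")
    if lost and deal["quote_rounds"] >= 3 and deal["lost_at"] == "procurement":
        # Survived three rounds, then failed at procurement.
        signals.append("price structure problem")
    if deal["won"] and deal["packaging_changed_round"] is not None:
        # A packaging change preceded the win: a candidate signal of what the buyer values.
        signals.append("packaging change preceded win")
    return signals


deals = [
    {"won": False, "quote_rounds": 1, "lost_at": "quote", "packaging_changed_round": None},
    {"won": False, "quote_rounds": 3, "lost_at": "procurement", "packaging_changed_round": None},
    {"won": True, "quote_rounds": 4, "lost_at": None, "packaging_changed_round": 2},
]

readings = [diagnose(d) for d in deals]
print(readings)
```

A single reading proves little. The value comes from counting how often each pattern recurs across the book of business.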

The data compounds; surveys depreciate. A conjoint study is a depreciating asset: accurate (at best) on the day it’s delivered, stale within a quarter, irrelevant within a year. An always-on system built on transaction data is an appreciating asset. Every deal, win or loss, enriches the dataset. AI-driven pattern recognition gets more precise with every negotiation, every renewal, every expansion. The longer the system runs, the sharper its insights become: the opposite of a survey that begins aging the moment it’s completed.

This is the fundamental difference. Our approach is grounded in how deals are actually forged: the proposals, counteroffers, packaging iterations, and contract terms that produce real revenue. Survey-based methods are grounded in hypothetical worlds where respondents face no consequences for their answers.

The argument in summary

The traditional pricing consultancy model runs a WTP survey, delivers a “right price,” and exits. The evidence says this approach has five structural problems:

The survey is biased. Hypothetical methods overstate WTP, the more so for complex products. The buyer faces no consequences for overstating.
The buyer doesn’t know the answer. Software has no cost anchor. Buyers can’t tell you in a survey what they’d be willing to pay for innovations they haven’t yet experienced.
The question is wrong for most buyers. Peer-reviewed B2B SaaS research shows the lowest price generates as much buyer resistance as the highest. Buyers aren’t optimizing for price; they’re evaluating whether the price signals appropriate value.
The result is static. A snapshot of stated preferences in Q1 has no mechanism to adapt to product changes, competitive shifts, or segment evolution in Q2-Q4.
The implementation erodes it further. Sales teams discount from the already-questionable ceiling (costing up to 6.6% of revenue through comp-structure-driven gaming) and those discounts compound into permanent baseline reductions at renewal.

Continuous monetization addresses each of these by replacing hypothetical measurement with behavioral observation, one-time surveys with ongoing demand sensing, static recommendations with iterative improvement, and consultant-delivered pricing with an operational capability that compounds over time.

The research doesn’t say “never run a survey.” Price-sensitivity surveys and conjoint have a place as exploratory inputs, directional data to inform an initial hypothesis. But they are a starting point, not an answer. And a pricing consultancy that treats them as the answer is selling certainty built on a foundation the research itself has demonstrated is unreliable.

Product management can use continuous monetization to better understand what customers value and will pay for, as opposed to what they tell a survey they’d be willing to pay, which is proven to be error-prone and essentially meaningless.

Everything described above (the biases in stated-preference research, the need for always-on demand sensing, the value of capturing every negotiation iteration, the compounding intelligence that gets sharper with every deal) pointed to a gap in the market. There was no platform built specifically for B2B software companies to operationalize continuous monetization.

So we built one. LevelSetter is our AI-augmented pricing platform for B2B software. It connects via API to your quoting and deal management systems, captures the full arc of every pricing interaction, and applies pattern recognition across your entire book of business, longitudinally, not as a one-time study.

It doesn’t replace pricing judgment. It replaces the fiction that a survey can tell you what your customers will pay. Instead of hypothetical willingness-to-pay data, you get observed willingness-to-pay behavior, from real deals, with real money, analyzed continuously.

If the arguments in this article resonate, LevelSetter is where they become operational.

For the current operating framework on this discipline, see continuous monetization as a B2B software pricing discipline.

If your pricing still leans on survey-stated willingness to pay, the fix starts with seeing what your own deals already prove. Describe what you are seeing and a pricing expert will reply.

FAQs

What is willingness to pay (WTP) in software pricing?

Willingness to pay is the maximum price a buyer would accept before walking away from a purchase. In B2B software, it’s commonly measured through survey-based price sensitivity methods or conjoint analysis, but research shows these hypothetical methods overestimate actual WTP by up to 2x because respondents face no consequences for their answers.

Why don’t WTP surveys work for B2B software?

Three fundamental problems: (1) Software has no cost anchor: unlike physical goods, buyers can’t reason from material costs to a fair price. (2) Most B2B buyers don’t decide primarily on price: peer-reviewed research shows the lowest price generates as much resistance as the highest. (3) Surveys produce a static snapshot that’s outdated within a quarter as the product and market evolve.

Is conjoint analysis better than survey-based price sensitivity methods for software pricing?

Conjoint is more sophisticated: it infers WTP from trade-off choices rather than direct questions. But it’s still classified as an Indirect/Hypothetical method. Respondents still face no real consequences, B2B procurement professionals can still game their responses, and the study still produces a point-in-time snapshot. No independent peer-reviewed study has validated conjoint-derived pricing outcomes against actual B2B software market performance.

How does AI make software pricing harder to measure?

AI amplifies every WTP measurement problem. Buyers can’t estimate usage because a single action may trigger multiple hidden model calls. Vendors can’t isolate per-customer AI costs from aggregate infrastructure bills. Token and credit-based pricing adds layers of abstraction that disconnect price from business value. And model costs change so rapidly that any survey-based price is calibrated against a cost structure that will be obsolete within months.

What is continuous monetization?

Continuous monetization replaces one-time pricing studies with an ongoing system of demand measurement, structured experimentation, and iterative price improvement. Instead of asking customers what they’d pay, it observes what they actually do, through real deal negotiations, packaging iterations, and win/loss patterns. The approach uses AI-augmented pattern recognition across transaction data to surface pricing insights that compound with every deal.

How can B2B software companies measure willingness to pay without surveys?

Through controlled incremental price changes across customer groups with similar usage and value characteristics, measuring actual demand response rather than hypothetical stated preferences. Versioning and packaging structures let buyers self-select into tiers, revealing WTP through purchasing behavior. Deal-level win/loss analysis (tracking every quoting iteration and negotiation response) produces a demand curve built from real transactions, not survey responses.

Do WTP surveys have any role in B2B software pricing?

Even if you accept the argument that surveys provide directional data, you’re still better off skipping them and going straight to measuring real demand. Iterating on pricing with actual buyers who face real consequences (through deal negotiations, packaging tests, and win/loss analysis) produces a demand curve that is both more accurate and immediately actionable. A survey delays that learning by months and anchors your team to a number the research has shown is unreliable. The time and budget spent on a WTP study is better invested in building the continuous measurement system you’ll need regardless.

What is the Price Sensitivity Meter?

The Price Sensitivity Meter is a survey method that asks four price questions: at what price is a product too cheap, cheap, expensive, and too expensive? Plotting the cumulative responses produces intersections that define an “acceptable price range” and an “optimal pricing point.” It was designed for consumer goods in the 1970s and remains widely used in pricing consulting. The core problem for B2B software: the method assumes respondents can articulate meaningful price thresholds for products with no cost anchor and no functional parity, assumptions that break down for enterprise software.

How much do WTP surveys overestimate willingness to pay?

Controlled research comparing survey-based price sensitivity methods against an incentive-aligned benchmark (where respondents had to back stated prices with real money) found hypothetical methods produced WTP estimates nearly twice as high as what people actually paid. The gap was statistically decisive. Additional research found survey-based price sensitivity methods produced price ranges spanning 264% of the base price from a small respondent sample, with the “optimal price” varying by more than $3 across bootstrapped samples. The overestimation is structural, not occasional.

How reliable is pricing data from survey-based price sensitivity methods?

Less reliable than most practitioners assume. One peer-reviewed study found 23% of responses to survey-based price sensitivity methods were internally inconsistent: respondents gave “too expensive” thresholds lower than their “expensive” thresholds, a logical contradiction. A separate study found the method failed to detect product differentiation effects that choice experiments captured from the same respondents. The errors are also bidirectional: some studies find these methods overestimate willingness to pay, others find they underestimate it by 2–5%. The method doesn’t reliably err in one direction; it unreliably errs in every direction.
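
The consistency problem is easy to check for in raw response data. The sketch below flags any response whose four thresholds are not in ascending order, assuming each response is stored as four numbers. The `PriceResponse` type and the sample data are hypothetical and do not reproduce the 23% figure above.

```python
from typing import NamedTuple, Sequence


class PriceResponse(NamedTuple):
    """One respondent's four price thresholds (hypothetical schema)."""
    too_cheap: float
    cheap: float
    expensive: float
    too_expensive: float


def is_consistent(r: PriceResponse) -> bool:
    """A coherent response has thresholds in non-decreasing order."""
    return r.too_cheap <= r.cheap <= r.expensive <= r.too_expensive


def inconsistent_share(responses: Sequence[PriceResponse]) -> float:
    """Fraction of responses that violate the expected ordering."""
    return sum(not is_consistent(r) for r in responses) / len(responses)


# Illustrative data: the last respondent says "too expensive" (70)
# is lower than "expensive" (80), which is a logical contradiction.
responses = [
    PriceResponse(20, 40, 60, 80),
    PriceResponse(10, 30, 50, 90),
    PriceResponse(25, 45, 65, 85),
    PriceResponse(30, 50, 80, 70),
]

print(inconsistent_share(responses))
```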

What is hypothetical bias in pricing research?

Hypothetical bias is the systematic tendency for survey respondents to overstate what they would pay when no actual transaction follows. In a consequence-free survey, the cognitive cost of saying “$100” instead of “$60” is zero: the respondent is imagining a purchase, not making one. This affects every stated-preference method including survey-based price sensitivity methods, contingent valuation, and conjoint analysis. The bias is not a minor calibration issue; controlled studies show it can double the measured willingness to pay relative to incentive-aligned benchmarks where real money changes hands.
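
One widely used incentive-aligned benchmark is the Becker-DeGroot-Marschak (BDM) mechanism: the respondent states a maximum price, a price is drawn at random, and the respondent must buy at the drawn price if it is at or below the stated maximum. The sketch below shows why overstating stops being free under that rule. It assumes equally likely price draws on a simple grid and a respondent whose true value is $65; both are illustrative assumptions.

```python
from typing import Sequence


def expected_surplus(true_value: float, bid: float, price_draws: Sequence[float]) -> float:
    """Average surplus across equally likely price draws.

    The respondent buys at the drawn price only when it is at or below
    the stated bid; otherwise no purchase happens and surplus is zero.
    """
    total = sum(true_value - price for price in price_draws if price <= bid)
    return total / len(price_draws)


price_draws = list(range(10, 101, 10))  # assumed draws: $10, $20, ..., $100
true_value = 65                         # assumed true willingness to pay

# Expected surplus for every candidate bid from $0 to $120 in $5 steps.
surplus_by_bid = {
    bid: expected_surplus(true_value, bid, price_draws)
    for bid in range(0, 121, 5)
}

truthful = surplus_by_bid[true_value]
overstated = surplus_by_bid[100]
print(truthful, overstated)
```

In this setup no bid beats the truthful one, and stating $100 forces purchases at prices above $65, which lowers expected surplus. A hypothetical survey has no such penalty.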

How does the endowment effect affect software pricing?

The endowment effect means people demand roughly three times more to give up something they already have than they would pay to acquire it. In software pricing, this creates a double distortion: buyers systematically undervalue new software relative to its actual worth while overvaluing whatever they currently use, even when it’s clearly inferior. A WTP survey captures this already-deflated valuation, then hypothetical bias inflates the number upward. The result is an inflated estimate of an already-deflated value: the errors compound the confusion rather than cancelling out.

How do sales discounts undermine WTP-based pricing?

A study of 2,938 enterprise software deals averaging $850K each found salespeople gave away 4.3 percentage points in excess discounts (translating to 6.6% of total vendor revenue) primarily to pull deal timing for commission benefit. Seventy-four percent of deals closed on the last day of the quarter, with late-quarter discounts running 35–37% versus 30% mid-quarter. The compounding failure: a WTP survey produces an inflated price, the consultancy recommends based on it, the sales team discounts from that already-questionable ceiling, and the discounted price becomes the new anchor for renewals. Each step erodes value in the same direction.

What is the difference between stated and revealed preference pricing?

Stated preference methods (survey-based price sensitivity methods, conjoint, contingent valuation) ask buyers to predict what they would pay in a hypothetical scenario: no money changes hands and no procurement process constrains the answer. Revealed preference methods observe what buyers actually do: transaction data, A/B price testing, auction outcomes, deal-level win/loss patterns. The distinction matters because every stated-preference method is susceptible to hypothetical bias and strategic behavior, while revealed preference data reflects real decisions with real consequences. Continuous monetization is a revealed-preference approach applied to B2B software pricing.

This article is supported by over four decades of SPP’s direct experience with B2B software pricing and peer-reviewed research on pricing methodology, hypothetical bias, and buyer behavior, including controlled comparisons of stated-preference methods against incentive-aligned benchmarks, studies of response consistency in price sensitivity measurement, and empirical evidence on buyer cognition in enterprise software contexts. We update this analysis as new research is published. For related reading, see Why 90% of B2B Value-Based Pricing Is a Hoax, GenAI Pricing Challenges, and Why Continuous Monetization Is So Vital. Last updated April 2026.

Terms in this article

Conjoint Analysis: An indirect research method where respondents choose between product configurations with different attributes and prices. Choices are assumed to reveal preferences, allowing statistical estimation of willingness-to-pay for each attribute. The dominant form used in pricing research today is discrete-choice conjoint. The method was developed in consumer-goods market research for buyers who can hold and compare physical products with concrete reference points (price, brand, package size). SPP does not use conjoint analysis in any form. Discrete-choice conjoint is an indirect, hypothetical method — the respondent never pays, so the "realism" is in task design, not consequences. Hypothetical bias is well-documented in the peer-reviewed pricing-research literature, and peer-reviewed field experiments have also documented buyers deliberately selecting suboptimal configurations to avoid revealing high willingness-to-pay. The limits compound in B2B software: stable parameter estimates require hundreds to thousands of respondents, while B2B software buyer panels are typically tens to low hundreds; B2B purchases are multi-stakeholder consensus events, not individual choices on a card; software attributes are intangible until experienced, so stated preference for unfamiliar abstract attributes is unstable; actual prices are negotiated, with no slot in the conjoint task for the negotiation overlay; and conjoint doesn't model salesperson willingness to discount — buyers approach the choice task knowing the displayed price won't be the price they actually pay, which skews their preferences toward higher-priced configurations in ways the model treats as random noise rather than as the systematic anticipation it actually is. SPP works from transaction data, won/lost patterns, customer-group analysis, and structured commercial dialogue — direct/observed methods rather than indirect/hypothetical ones.
Continuous Monetization: SPP's term (coined 2014) for pricing as an ongoing operating discipline iterated on the product's own cadence, not a project repeated every few years. In our work: Under episodic pricing, today's 70% discount becomes tomorrow's 92%; we see this in transaction data across decades of engagements. Continuous monetization is the discipline that keeps net price aligned.
Contract Term: The length of time of the contract, which may include multiple subscription terms.
Customer Groups: Clusters of customers who derive value from a product in similar ways, regardless of company size or industry vertical. Distinct from traditional customer segments (small/medium/large, by vertical) which don't predict how customers actually use the product.
Discounting: The practice of reducing list price during a deal. Discounting splits into two buckets that often get conflated in practice. Structured discounting is incentives programmed into the pricing model itself: published volume schedules, edition step-up incentives, multi-year commitment discounts, non-profit/edu adjustments, partner-channel margins, and any other formal rule the rep applies without per-deal approval. Unstructured discounting is discretionary concessions granted on a per-deal basis, typically gated by approval thresholds held by sales leadership. The two are designed to work together: structured discounting handles the predictable cases the pricing architecture has accounted for, and unstructured discounting addresses the genuinely novel exceptions. In B2B software, this breaks down when reps use unstructured discretion to paper over missing structured rules. If the company has never formalized a non-profit discount but reps keep closing non-profit accounts, each rep accounts for the gap differently — one gives 15%, another gives 30%, another packages around it. The result reads as inconsistent rep behavior, but the root cause is architectural absence: a published rule would have produced uniform handling. Every quarter the unstructured layer grows to compensate for what the structured layer never built, the list-to-net spread widens, list-price increases get absorbed in rep discretion, and the pricing architecture exists only on paper. SPP's preferred alternative is Margin-Calibrated Discounting — engineering the structured layer (the smooth net-price function) against profitability targets so reps face fewer cases where they must improvise. The discipline question isn't "how do we eliminate discounting?" — it's "which discounts belong in the published architecture, and which genuinely need to be discretionary?"
Good-Better-Best: Classic 3-tier packaging model where access to more features/capabilities is available as you move up in tiers.
Hypothetical Bias: The systematic tendency for survey respondents to overstate what they would pay for a product when no real purchase is required; in other words, the gap between what people say they'll pay and what they actually pay when real money is on the table. Peer-reviewed pricing-research findings have measured this directly: hypothetical methods overstate WTP relative to incentive-aligned methods, where respondents must back stated prices with real money, by about a fifth on average across consumer studies and more for complex products, with single low-cost experiments finding gaps as large as a factor of two. Respondents in a survey face no consequences for overstating: the cognitive cost of saying "$100" instead of "$60" is zero. The number they give you isn't what they'd pay; it's what they can imagine paying in a low-stakes thought experiment. The B2B software gap is amplified by salesperson willingness to discount: buyers approach negotiations expecting discounting, procurement teams come prepared with other customers' net prices and discount terms, and reps trade margin for closeable deals, so the realized price routinely lands below the survey ceiling regardless of what stated WTP measured. SPP does not use survey-based WTP to set prices. We measure real demand through transaction data: what buyers actually do, not what they say they'd do.
Incentive-Aligned Methods: Research methods where respondents face real financial consequences for their stated preferences. The gold standard is the Becker-DeGroot-Marschak (BDM) mechanism, where respondents state their maximum price and then must actually purchase at a randomly drawn price if it's at or below their stated WTP. Incentive-aligned methods produce accurate WTP estimates because over-bidding or under-bidding can't help the respondent — peer-reviewed pricing-research findings consistently show BDM-derived WTP clusters tightly around the true value, while hypothetical methods overstate it, by about a fifth on average and more for complex products, with much wider variance. The problem: BDM is impractical in most B2B commercial settings. You can't ask an enterprise buyer to commit to purchasing software at a random price. SPP relies on transaction data instead — the market itself is the incentive-aligned mechanism. Every closed deal is a revealed-preference data point.
Packaging Model: The decision about how all forms of intellectual property — services, software features, and insights — are grouped into a given product offering. The packaging model covers the composition of editions (what capabilities are included at each level), the design of add-ons and modules (what sits outside the editions and can be purchased separately), capability allocation (which capabilities move between editions as buyers step up — what the industry commonly calls "feature gating"), and the service and insight components (implementation, enablement, support, advisory work, data products, benchmarks, intelligence) that accompany the software. The packaging model is one of the three strategic decisions in SPP's pricing architecture (licensing model + packaging model + pricing model). In the license agreement, packaging decisions are typically codified in the services-and-scope section — which features, modules, services, and insights are included in the edition the customer has licensed.
Pricing Model: The rulebook that computes a rational net price on every invoice: list prices, discount governance, volume breaks, renewal mechanics. NOT the metric the price attaches to (that lives in the licensing model decision). In our work: Most "broken pricing model" calls turn out to be broken pricing architecture; when licensing, packaging, and pricing decisions don't compose, chaotic discounting fills the gap deal by deal. We see this in nearly every diagnostic engagement.
Revealed Preference: Observing what buyers actually do — market transaction data and auction results — rather than asking what they say they'd do. The opposite of hypothetical/stated preference methods. (A/B price testing is also a revealed-preference method but is not recommended in B2B, where buyers compare notes and visible price variation invites gaming.)
Subscription: A payment wrapper where the customer pays a recurring amount — monthly, quarterly, or annually — for ongoing access. Subscription is a manner of payment, not a pricing model. It wraps around whatever metric was chosen in the licensing model. The metric is the decision; the subscription is how the invoice arrives.
Tiered Pricing: A pricing technique originating in manufacturing, where the per-unit price changes in steps based on quantity purchased, structured as a series of volume tranches (e.g., $10/user for 1–50, $8/user for 51–200). The "tiers" here are price bands on the same product, not different packages with different capabilities. Tranches can be defined in units or dollars. In software, tiered pricing is structurally suboptimal and a source of revenue leakage: the step changes at tranche boundaries create cliff-edge negotiation behavior (customers gaming the volume threshold, reps round-tripping through the next tier just to access the next discount), and the price held flat between thresholds means the vendor leaves margin on the table at every commitment that isn't exactly at a tier boundary. SPP's preferred alternative is a smooth pricing surface tuned against margin targets — the technique we call Margin-Calibrated Discounting.
Usage-Based Pricing: A pricing approach where the customer pays based on consumption of a defined metric. The term is an umbrella covering fundamentally different approaches — from per-API-call billing to committed usage agreements to outcome-based pricing — which makes it nearly useless as a descriptor without specifying the metric.
Value-based Pricing: An emergent phenomenon, not a technique you apply. Value-based pricing cannot be commanded into existence — it emerges naturally when the pricing architecture is right: the licensing model captures the right metric, the offering structure reflects how different customer groups use the product, and price setting is related to value-in-use. When all three layers are aligned and the sales culture supports them, customers pay prices that reflect the value they receive. Most implementations skip the first two layers and jump straight to price setting, which is just cost-plus with a narrative. Charging the most each customer will bear isn't value-based pricing — it's situational pricing dressed up with a better name.
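
The cliff-edge behavior described in the Tiered Pricing entry can be shown with a few lines of arithmetic. This sketch uses the example tranches from that entry ($10/user for 1–50, $8/user for 51–200) and assumes the tranche price applies to every unit purchased rather than only to units within the band. That reading, along with the function names, is an assumption made for illustration.

```python
# (upper bound of tranche in users, price per user) from the glossary example.
TRANCHES = [(50, 10.0), (200, 8.0)]


def tiered_total(users: int) -> float:
    """Total price when the tranche's unit price applies to all users
    (assumed all-units interpretation of the example)."""
    if users < 1:
        raise ValueError("users must be at least 1")
    for upper_bound, unit_price in TRANCHES:
        if users <= upper_bound:
            return users * unit_price
    raise ValueError("quantity exceeds the published tranches")


def cheapest_quantity_covering(need: int) -> int:
    """Smallest-cost purchase quantity that covers `need` users."""
    max_users = TRANCHES[-1][0]
    return min(range(need, max_users + 1), key=tiered_total)


print(tiered_total(50))                # 50 users at $10
print(tiered_total(51))                # 51 users at $8
print(cheapest_quantity_covering(45))  # what a price-aware buyer orders
```

Under this reading, 51 users cost less than 50, so a buyer who needs 45 seats is better off ordering 51. That is the threshold gaming the entry refers to.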
