April 7, 2026

Why Willingness-to-Pay Surveys Fail B2B Software Companies

TL;DR: WTP surveys overstate what buyers will pay (more so for complex products), can’t capture value buyers haven’t experienced, and produce static snapshots that are stale within a quarter. AI pricing compounds every one of these problems. The alternative, continuous monetization, measures real demand through transaction data, not hypothetical responses.


Most pricing consultancies start a software pricing engagement the same way: survey your customers, run a price-sensitivity survey or a conjoint analysis, and use the results to set your price. It’s clean, it’s quantitative, and it’s been the default playbook for decades.

There’s one problem. The research shows it doesn’t work the way people think it does.

Not because the methods are obscure: survey-based price sensitivity methods and conjoint are the standard tools in pricing research and have been for decades. The problem is more fundamental: they were designed for a world of physical goods and rational price evaluators, and B2B software is neither. The number a WTP survey produces is shaped by biases that the survey itself cannot detect.

I’ve argued before that 90% of value-based pricing and selling in B2B software is a hoax: widely used methods borrowed from B2C markets don’t work in B2B, where products and usage are more complicated, making value harder to compare and estimate. Below is the academic evidence behind that claim, and what we’ve seen work instead after 40 years of pricing B2B software.

What “willingness to pay” actually measures

Willingness-to-pay is the maximum price a buyer would accept before walking away from a purchase. In theory, if you know every customer’s WTP, you can set prices that maximize revenue, charging each segment as close to their ceiling as possible.

The challenge is measuring it. There are three broad approaches:

Hypothetical survey methods ask buyers to state what they’d pay. The most common are:

  • Price Sensitivity Meter (a survey-based price sensitivity method): four questions that identify the prices the respondent considers “too cheap,” “cheap,” “expensive,” and “too expensive.” Plot the cumulative responses, and the intersections define an acceptable price range and an optimal price point.
  • Contingent Valuation (CV): a direct question, “What is the maximum you would pay for this product?”
  • Conjoint Analysis: an indirect method where respondents rank product configurations with different features and prices. Statistical modeling infers the implied WTP for each attribute.
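
To make the intersection idea concrete, here is a minimal sketch of how one crossing point could be read off the cumulative curves. The responses, the price grid, and the helper names are hypothetical. Treating the crossing of the “too cheap” and “too expensive” curves as the optimal price point is an assumption about the convention, and practitioners define the intersections in more than one way.

```python
# Hypothetical answers from five respondents (one number per respondent).
too_cheap = [20, 30, 40, 50, 60]       # "at this price I would doubt the quality"
too_expensive = [40, 50, 60, 70, 80]   # "at this price I would not consider it"


def share_at_or_above(thresholds, price):
    """Share of respondents whose threshold is at or above `price` (a falling curve)."""
    return sum(t >= price for t in thresholds) / len(thresholds)


def share_at_or_below(thresholds, price):
    """Share of respondents whose threshold is at or below `price` (a rising curve)."""
    return sum(t <= price for t in thresholds) / len(thresholds)


def first_crossing(prices, falling_thresholds, rising_thresholds):
    """Return the first grid price where the rising curve meets or passes the falling one."""
    for price in prices:
        falling = share_at_or_above(falling_thresholds, price)
        rising = share_at_or_below(rising_thresholds, price)
        if rising >= falling:
            return price
    return None  # the curves never cross on this grid


price_grid = range(10, 121, 5)  # hypothetical grid of candidate prices
optimal_price = first_crossing(price_grid, too_cheap, too_expensive)
print(optimal_price)
```

Note that the whole calculation runs on stated thresholds. Nothing in it checks whether a respondent would actually buy at the crossing price.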

Incentive-aligned methods put real money on the table:

  • Incentive-aligned auction mechanisms: respondents state their maximum price, then a random price is drawn. If the random price is at or below their stated WTP, they must actually buy at that price. If it’s above, they can’t buy. Because over-bidding or under-bidding can’t help them, respondents have an incentive to reveal their true WTP. For familiar, frequently priced consumer goods with clear reference points, these are the most accurate methods we have. The accuracy falls off for complex, unfamiliar products that buyers struggle to value (the B2B software case), and even at their best they price one person, not the group that buys.
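
The claim that misreporting can’t help the respondent can be checked with a small calculation. The sketch below assumes a hypothetical respondent whose true value is 60 and a random price drawn uniformly from the integers 0 to 100; both numbers and the function name are illustrative only.

```python
def expected_surplus(true_value, stated_wtp, price_grid):
    """Average surplus when every price in `price_grid` is equally likely to be drawn.

    The respondent must buy whenever the drawn price is at or below the stated WTP,
    and cannot buy otherwise.
    """
    total = 0.0
    for price in price_grid:
        if price <= stated_wtp:
            total += true_value - price  # negative when forced to buy above true value
    return total / len(price_grid)


price_grid = list(range(0, 101))  # hypothetical: prices 0..100, equally likely
true_value = 60                   # hypothetical true willingness to pay

truthful = expected_surplus(true_value, 60, price_grid)
overstated = expected_surplus(true_value, 80, price_grid)   # risks buying at 61..80
understated = expected_surplus(true_value, 40, price_grid)  # misses good deals at 41..59

print(truthful, overstated, understated)
```

Overstating forces purchases at prices above the true value, and understating forfeits purchases that would have been worthwhile, so the truthful answer does at least as well as any other. A hypothetical survey has no such penalty built in.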

Revealed preference methods observe what buyers actually do: market transaction data, A/B price testing, auction results.

The distinction matters because the first category (hypothetical methods) is what the vast majority of pricing consultancies use, and it’s where the measurement breaks down.

To be clear about where the tools earn their reputation: for a familiar consumer product with abundant reference points (a coffee maker, a streaming plan, a pair of running shoes), these methods work well. The buyer has bought the category before, knows the going rate, and one person decides. Two things go wrong in B2B software, and each is enough on its own.

The first is that the decision is collective. Every individual elicitation method, survey or incentive-aligned auction alike, measures one person’s willingness to pay, and almost no B2B purchase is one person’s decision. A mid-market founder decides with operations and finance. An enterprise committee evaluates a seven-figure platform. The number you want belongs to the group, and you only ever measured individuals.

To get from individual answers to a group price you stack three bets, and you check none of them. First, that each person stated their true value. Second, that you can recombine those values into the group’s value. Third, that the rule you used to combine them, sum or average, is how the group actually decides. Three guesses stacked up, presented as one number.

The middle bet is the one almost nobody examines. Averaging only works when everyone is reading the same number with noise. Two people on a buying committee are not noisy reads of one price. They are two different prices, held for two different reasons. Average two different things and you invent a number nobody holds.

A peer-reviewed field experiment on joint household purchases showed exactly this. Individual stated willingness to pay diverged sharply from the couple’s actual joint decision, and the partner with the stronger preference was over-ruled by the one who controlled the money. The joint choice was not an average of the two values. It was the outcome of a bargaining game, settled by who held the budget, not by the arithmetic of who wanted it more.

That study is about a household, not a software deal, and the numbers don’t transfer. The structure does: the moment you combine individual answers, you are smuggling in a model of how the group decides. Sum them, average them, or let the decider take all, and each rule yields a different willingness to pay from the same responses. Nothing in the survey data tells you which rule the group will use.
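
A few lines of arithmetic show how much the choice of rule matters. The committee members, their stated values, and the budget holder below are all hypothetical.

```python
from statistics import mean

# Hypothetical stated WTP (annual, in dollars) from three members of one buying group.
committee = {"founder": 40_000, "operations": 25_000, "finance": 10_000}
budget_holder = "finance"  # hypothetical: the person who controls the money

group_wtp = {
    "sum": sum(committee.values()),       # treats each value as a separate budget
    "average": mean(committee.values()),  # treats answers as noisy reads of one number
    "decider_takes_all": committee[budget_holder],
}

for rule, value in group_wtp.items():
    print(f"{rule}: {value}")
```

The same three answers produce 75,000, 25,000, or 10,000 depending on the rule, and the survey itself gives no way to choose among them.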

The second problem is that B2B software gives the buyer almost nothing to anchor on. The methods that work for the coffee maker work because the category is familiar and frequently priced. B2B software is the opposite at every size, from a small team’s first purchase to an enterprise rollout: it is often novel, bought rarely or once, high-stakes against the budget, and has no established price for the specific value it creates. Peer-reviewed measurement work finds the overestimation problem is markedly worse for exactly these complex, hard-to-assess products than for familiar ones. We unpack why the software case is the hardest of all below.

Collective buying and missing reference points are independent. Either one breaks the survey on its own; B2B software has both. So even a flawless auction misses the buying unit, and even a single decider would be guessing without a reference price. The most reliable signal of what a buying unit will pay is the one place willingness to pay is revealed under real stakes: the win rates by price and the discount patterns of the deals you actually closed.
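
As a rough illustration of what reading win rates by price can look like, here is a minimal sketch over a hypothetical list of deal records. The prices, outcomes, and function name are invented for the example; a real analysis would also need to control for segment, deal size, and discounting.

```python
from collections import defaultdict

# Hypothetical deal records: (quoted annual price, whether the deal was won).
deals = [
    (20_000, True), (20_000, True), (20_000, False),
    (30_000, True), (30_000, True), (30_000, True), (30_000, False),
    (40_000, True), (40_000, False), (40_000, False), (40_000, False),
]


def win_rate_by_price(records):
    """Group deals by quoted price and return the share won at each price."""
    counts = defaultdict(lambda: [0, 0])  # price -> [wins, total]
    for price, won in records:
        counts[price][0] += int(won)
        counts[price][1] += 1
    return {price: wins / total for price, (wins, total) in sorted(counts.items())}


rates = win_rate_by_price(deals)
for price, rate in rates.items():
    print(f"{price}: {rate:.0%}")
```

Every number in the output comes from a decision a buying unit actually made with money at stake, which is the property the survey methods lack.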

The overestimation problem: surveys inflate WTP by up to 2x

Across the broader research, the overstatement averages about a fifth and runs larger for complex, high-consideration products; the sharpest single result comes from a controlled low-cost test, where hypothetical estimates came in at nearly twice the real-money figure. The first empirical comparison of price-sensitivity surveys against an incentive-aligned method asked consumers to value the same product using three methods (contingent valuation, an incentive-aligned method, and a price-sensitivity survey) in a controlled field experiment where respondents had to back their stated price with real money.

The results were stark.

When researchers simply ask people what they would pay, the answers scatter all over the map and bear little relation to what those same people actually pay when it is time to buy. People are guessing. The answers only tighten and start tracking real purchases when the research makes them put actual money on the line. That distinction is the whole point: the accurate version is not a sharper survey, it is people buying. A survey never has money at stake, so it never earns that discipline. This is why asking inflates the number.

Hypothetical methods produced WTP estimates nearly twice as high as what people actually paid when real money was on the table.

Why? The researchers attribute it to hypothetical bias: respondents in a survey face no consequences for overstating what they’d pay. When there’s no transaction at the end, the cognitive cost of saying “$100” instead of “$60” is zero. The number they give you isn’t what they’d pay. It’s what they can imagine paying in a low-stakes thought experiment.

The author’s own conclusion: the price-sensitivity survey “yields biased results because of its hypothetical nature and its focus on minimum customer resistance.” One study did find that the survey’s intersection-based optimal price happened to land near an incentive-aligned benchmark, but that was for 36-cent chocolates, with the author attributing the result to two biases coincidentally cancelling and calling for validation on “more expensive and industrial products” that has never materialized.

The volatility problem runs deeper than bias direction. Even a statistician working to improve the price-sensitivity survey method documented acceptable price ranges spanning multiples of the base price from a small respondent sample, with the “optimal price” bouncing between $9.50 and $12.90 across bootstrapped samples. A separate head-to-head comparison found that price-sensitivity surveys failed to detect product differentiation effects that choice experiments captured, and that 23% of price-sensitivity survey responses contained logical contradictions, where respondents’ “too expensive” threshold was lower than their “expensive” threshold. The errors aren’t even consistently directional: some studies find overestimation, others find underestimation of 2–5% versus choice experiments. The price-sensitivity survey doesn’t reliably err in one direction; it unreliably errs in every direction.
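
The logical-contradiction check is simple enough to run on any survey export. The sketch below uses four hypothetical responses and a hypothetical field layout; it only tests that each respondent’s four thresholds are in non-decreasing order.

```python
# Hypothetical responses: the four price thresholds each respondent gave.
responses = [
    {"too_cheap": 5, "cheap": 8, "expensive": 12, "too_expensive": 15},
    {"too_cheap": 6, "cheap": 9, "expensive": 14, "too_expensive": 11},  # contradiction
    {"too_cheap": 4, "cheap": 7, "expensive": 10, "too_expensive": 13},
    {"too_cheap": 7, "cheap": 10, "expensive": 13, "too_expensive": 18},
]


def is_consistent(response):
    """True when the thresholds rise from 'too cheap' through 'too expensive'."""
    return (
        response["too_cheap"]
        <= response["cheap"]
        <= response["expensive"]
        <= response["too_expensive"]
    )


inconsistent = [r for r in responses if not is_consistent(r)]
contradiction_share = len(inconsistent) / len(responses)
print(f"{contradiction_share:.0%} of responses contradict themselves")
```

A high share here is a warning that respondents are guessing rather than reporting a stable valuation.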

Controlled research with real B2B software buyers points the same way: buyers don’t carry a single willingness-to-pay number, they carry a range. The acceptable band for the product was wide, and how buyers responded shifted sharply depending on where the price landed inside it. A survey that hands you one “optimal price” is collapsing that whole band of buyer psychology into a single figure.

“But we use conjoint, not price-sensitivity surveys”

This is the most common objection from sophisticated pricing consultancies. They’ll concede that the price-sensitivity survey is a blunt instrument, then argue that their choice-based conjoint analysis is fundamentally different: that by forcing respondents to make trade-offs between realistic product configurations rather than stating a price directly, conjoint sidesteps the hypothetical bias problem.

It doesn’t. Here’s why.

Conjoint is still hypothetical. The standard WTP method classification places discrete-choice analysis (the dominant form of conjoint used in pricing research today) in the Indirect / Hypothetical quadrant.

         | Stated Preference (Hypothetical) | Stated Preference (Incentive-Aligned) | Revealed Preference
Direct   | Survey-based price sensitivity methods, Contingent Valuation | Incentive-aligned auction mechanism | Market data analysis
Indirect | Conjoint / Discrete-Choice | | Experiments, Auctions

Source: WTP measurement method classification from peer-reviewed pricing methodology research.

Survey-based price sensitivity methods and conjoint (the two most common methods in B2B software pricing) both sit in the hypothetical column. Not incentive-aligned. Not revealed preference. Hypothetical. The respondent is choosing between product configurations on a screen, not signing a purchase order. The cognitive environment is identical to any other survey: no money changes hands, no procurement committee reviews the decision, no implementation risk is evaluated. The trade-off is more realistic than “what’s the maximum you’d pay?” but the fundamental problem remains: the respondent faces zero consequences for their stated choices.

Conjoint triggers strategic behavior in B2B. Research on pharmaceutical pricing methods found that direct questioning leads to “bargaining behaviour” where respondents “systematically understate willingness to pay” to influence final pricing. The researchers note that even discrete choice methods, while having a “high degree of realism,” require “substantial expertise” and “substantial time and budgets” and that they “know of no studies that have validated the survey-based price sensitivity methods approach”, a limitation they extend to all stated-preference methods. In B2B software, where your respondents are often trained procurement professionals who know exactly why you’re asking, strategic distortion is the norm, not the exception. Game-theoretic research proves this formally: buyers who know their responses inform pricing will deliberately choose suboptimal configurations to avoid revealing high willingness-to-pay.

Conjoint can’t capture value the buyer hasn’t experienced. A conjoint study asks respondents to trade off features and prices for a product they’re evaluating. But in B2B software, the most important value drivers are often things the buyer can’t evaluate in a survey: workflow improvements they haven’t seen, integrations they haven’t tested, efficiency gains that only emerge after months of organizational adoption. A conjoint study captures the buyer’s perception of value at the moment of the survey, anchored to their current experience with their current tools. It can’t measure willingness-to-pay for value that hasn’t been delivered yet.

Conjoint produces a snapshot, not a system. Even if a conjoint study perfectly captured every respondent’s true preferences at the moment of the study, those preferences change. The product ships new features quarterly. A competitor drops their price. The buyer’s business context shifts. The conjoint study that cost $150K and took three months to execute is calibrated to a market that no longer exists by the time the recommendations are implemented.

Consultancies that use conjoint claim in-market validation of their price recommendations, but this data is proprietary and self-reported. No independent peer-reviewed study has compared conjoint-derived pricing outcomes to continuous demand measurement outcomes for B2B software.

Some consultancies go further, using Bayesian hierarchical models (HB conjoint) that estimate individual-level preferences rather than population averages. The statistical modeling is more sophisticated, but the input data is identical: hypothetical choices made by respondents who face no consequences. A more precise model of hypothetical behavior is still a model of hypothetical behavior. The only validation study for HB conjoint in pricing used simulated data with known ground truth, not real purchase outcomes. When the researchers tested whether the model could recover what simulated buyers “actually” valued, it could. Whether it can do the same with real B2B software buyers who have organizational constraints, procurement processes, and strategic incentives to misrepresent their preferences has never been demonstrated.

This isn’t an argument that conjoint is useless. It’s a more sophisticated instrument than price-sensitivity surveys, and it produces richer data about feature-price trade-offs. But it shares the same foundational weakness as every stated-preference method: it asks people to predict their own behavior in a consequence-free environment, captures a static snapshot, and presents the result as the answer to a question that the market itself should be answering continuously.

Why software makes the problem worse

Physical goods have a natural price anchor: material costs. A buyer evaluating a manufactured product has at least a rough sense of what the inputs cost: steel, labor, components. Even if they don’t know the exact figure, the physical nature of the product creates a reference frame.

Software has no such anchor.

Research on information goods pricing established that software exhibits fundamentally different economics: near-zero marginal reproduction costs and high fixed development costs. A customer buying enterprise software cannot reason backward from “what this costs to produce” to “what it should cost me”, because the answer to the first question is effectively zero per unit, and the answer to the second question depends entirely on the value it creates for their specific business.

This asymmetry is structural, not incidental. Recent research on digital goods pricing found that across 87 data-driven businesses, sellers consistently guide valuation while buyers cannot optimize purchasing decisions, because the information needed to assess value is controlled by the seller. A WTP survey asks the party with the least information to provide the most consequential number.

As I wrote in Why Continuous Monetization Is So Vital: “When a consumer decides to buy a new refrigerator, they have a pretty complete idea of its value. While certain features and designs can affect that value, the buyer can make an easy price comparison because of the product’s functional parity with other offerings, generally speaking, an icemaker is an icemaker. That’s not the case in software, where customer perceptions of value can miss the subtle distinctions that create huge differences between seemingly similar products.”

There is minimal functional parity in software. An email capability is not an email capability. So when a price-sensitivity survey asks a software buyer “at what price would you consider this product too expensive?”, the buyer has no anchor. They’re not estimating; they’re guessing. And that guess is shaped by whatever reference points happen to be in their head: the last software they bought, a competitor’s listed price, a number their CFO mentioned in a budget meeting.

In B2B specifically, value perception is strongly affected by organizational software experiences. Differential value is often concentrated in innovations prospective buyers haven’t yet experienced, innovations that might enable currently unimaginable operational improvements. Buyers can’t tell you in a survey what they’d be willing to pay for something they haven’t yet seen working in their environment.

And even when buyers can evaluate what’s in front of them, the method still breaks. The complexity problem has a measurable signature. When researchers used a price-sensitivity survey to price athletic footwear (a consumer product with just five value dimensions), the method produced an internal contradiction for the most feature-rich product: the “optimal price” fell below the price floor where buyers would question quality. The model told you to price lower than the point where your own customers would distrust the product. If a price-sensitivity survey breaks down for a $150 running shoe, the odds of it producing coherent results for enterprise software (with dozens of value dimensions, multi-stakeholder evaluation, and no physical reference frame) are vanishingly small.

The WTP number you get from the survey is a composite of arbitrary anchors, filtered through hypothetical bias. It’s precise-looking data built on sand.

AI compounds the problem

If customers couldn’t estimate what they’d pay for software before AI, they certainly can’t now.

Consider what happens when a B2B software company wraps generative AI into its product. A single user action (asking a question, generating a report, running an analysis) might trigger three to five model calls behind the scenes: one to decompose the problem, one to run the inference, one to verify the answer, one to summarize the result. The user sees one click. The vendor’s infrastructure sees a chain of API calls, each with variable cost depending on the model, prompt length, and orchestration logic. Ask the customer what they’d be willing to pay for that feature, and they’re estimating a price for a process they can’t see, powered by costs they can’t comprehend, at a volume they can’t predict.
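
The gap between what the user sees and what the vendor pays for can be sketched in a few lines. Every token count and per-token price below is a hypothetical placeholder, not a quote from any provider, and the four-step chain simply mirrors the example in the paragraph above.

```python
# Hypothetical per-1,000-token prices; real prices vary by model and change often.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015

# Hypothetical chain behind one user click: (step, input tokens, output tokens).
calls = [
    ("decompose", 800, 200),
    ("infer", 3000, 1000),
    ("verify", 1500, 100),
    ("summarize", 1200, 300),
]


def call_cost(input_tokens, output_tokens):
    """Cost of a single model call under the hypothetical prices above."""
    return (
        input_tokens / 1000 * PRICE_PER_1K_INPUT
        + output_tokens / 1000 * PRICE_PER_1K_OUTPUT
    )


cost_per_click = sum(call_cost(i, o) for _, i, o in calls)
print(f"one click, {len(calls)} model calls, cost {cost_per_click:.4f}")
```

Change the model, the prompt length, or the number of verification passes and the figure moves, while the buyer still sees a single click.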

But the opacity isn’t just on the buyer’s side. Many vendors don’t know what their AI costs are per customer. Inference costs are commingled with compute, storage, and networking on aggregate infrastructure bills. A vendor might know their AWS bill increased 40% after launching an AI feature, but they can’t attribute how much of that is Customer A’s usage versus Customer B’s. They’re pricing a product whose cost-to-serve they can’t isolate.

And both sides are pricing against a moving target. Model costs drop 50% every six to twelve months. New models change the cost-per-quality curve. Orchestration patterns evolve as engineering teams optimize. A WTP survey conducted today is calibrated against a cost structure that will be obsolete by the time the recommended pricing is implemented.

The bundling trap. Facing this uncertainty, many B2B software companies made a rational short-term decision: they included AI in existing packages without charging separately, reducing sales friction and driving adoption. This solved the go-to-market problem. But it created a monetization trap: customers now expect AI as included, and separating it later triggers the endowment effect described below. They’d be “losing” something they already have.

The token passthrough problem. Other vendors went the opposite direction, passing AI costs directly to customers as tokens, credits, or consumption charges. This might seem like transparent pricing, but it fundamentally undermines value capture. When you price at the token level, you’re no longer selling “we help you optimize your pricing strategy.” You’re selling “our tokens cost $0.003 versus Google’s $0.002.” You’ve commoditized your own product by framing it at the infrastructure layer instead of the business outcome layer. As I’ve written about GenAI pricing challenges, application-layer software companies should sell baked cakes, not itemize the ingredients and baking time. That is not a case against usage-based pricing: a variable consumption unit can be exactly right, as long as it tracks what the buyer reads as their own value rather than your internal cost components. The further you sit from the raw compute and model layers, where those cost units genuinely are the value, the more your price should follow what the customer gets, not what it cost you to make.

Enterprise buyers confirm this. When researchers asked healthcare decision-makers (sophisticated B2B buyers managing multi-million dollar technology budgets) about pricing diagnostic AI, 76% rejected models based on technical usage metrics like tokens or API calls, calling them “economically and operationally misaligned” with how they plan and budget. They preferred hybrid models with predictable base fees and variable components tied to business outcomes, not technical consumption. If buyers can’t even conceptualize the unit of measurement, a WTP survey for that unit is meaningless.

Credits make it worse. Some vendors added yet another abstraction, converting dollars to tokens to credits to “AI units.” Each layer moves the pricing further from anything a buyer can reason about. A WTP survey asking “what would you pay for 500 AI credits per month?” is measuring a fiction filtered through four layers of abstraction. The buyer doesn’t know what a credit represents, how many credits their workflow consumes, or how that consumption will change as the vendor’s AI evolves.

The buyer can’t state WTP because they don’t know their usage. The vendor can’t cost-plus price because they don’t know their per-customer costs. Both numbers are changing quarterly. And the tools most consultancies use to measure willingness-to-pay (surveys that produce a single number at a single point in time) are pricing against a reality that doesn’t hold still long enough to be measured.

Continuous monetization isn’t just better here, it’s the only approach that can adapt as both the costs and the value change in real time.

Most B2B buyers don’t decide on price

Even if WTP surveys produced accurate numbers, there’s a more fundamental problem: for most B2B buyers, price isn’t the primary decision factor.

And there’s a quieter problem hiding underneath it: you are usually measuring the wrong people. Say you survey two of three directors, and they answer carefully and truthfully. The trouble is the owner makes the call, and you never asked the owner. You didn’t undersample. You measured the wrong people, precisely. Peer-reviewed organizational-buying research is blunt about this: job titles are poor proxies for who actually holds influence in a purchase.

More responses from non-deciders don’t fix that. They just make you confidently wrong. A bigger sample of people who don’t control the decision tightens the number around a person who was never going to sign.
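
The statistics behind “confidently wrong” are straightforward: a larger sample shrinks the standard error of the estimate, but it does nothing to the gap between the people surveyed and the person who decides. The figures below are hypothetical and chosen only to show the pattern.

```python
import math


def standard_error(sample_sd, n):
    """Standard error of a sample mean: spread shrinks with the square root of n."""
    return sample_sd / math.sqrt(n)


# Hypothetical numbers for illustration.
decider_wtp = 20_000       # what the person who signs would actually pay
non_decider_mean = 32_000  # average stated WTP among the people surveyed
non_decider_sd = 6_000     # spread of their answers

results = []
for n in (25, 100, 400):
    se = standard_error(non_decider_sd, n)
    bias = non_decider_mean - decider_wtp  # unchanged by sample size
    results.append((n, se, bias))
    print(f"n={n}: standard error {se:.0f}, bias {bias}")
```

Going from 25 to 400 responses cuts the standard error from 1,200 to 300, while the 12,000 gap to the decider stays exactly where it was.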

We have watched this happen. A company that came to us had set its prices off a willingness-to-pay survey, launched, and shortly thereafter its new-customer growth rate had fallen by roughly a third. The response was to do more of the same: a larger sample, the survey re-run, the price points re-cut. Refining the method was easier than disowning it, since the people who had championed the survey were the ones who would have had to admit it was wrong. Growth fell by about another third. The bigger sample did not correct the original mistake. It produced a more confident version of it, a tighter number around a price the market kept walking away from. The timing made it worse, coming just after a major financing round, when a growth stumble is most visible and least easily forgiven. The cost was not only the growth. A pricing call that misses twice after a raise is the kind of decision that, at the executive level, tends to follow the people who made it.

A controlled study of B2B software buyers found the lowest price won the least interest, not the most. Shown a bargain price, buyers turned skeptical, and fewer asked to talk to sales. Interest peaked at a higher, credible price, then fell again when the price climbed too far. The lesson is uncomfortable for anyone tempted to compete on price: in B2B, a price that looks too cheap doesn’t read as a deal. It reads as risk, and it pushes away the buyers you most want.

This wasn’t a lab full of casual consumers. These were real B2B software buyers with budgets and procurement authority, and they still weren’t optimizing for the lowest number. They were reading the price as a signal of whether the product was worth their time. A price that looked too good raised as much doubt as one that looked too high.

A WTP survey treats every respondent as a price optimizer. It assumes the number they give you is the number that determines whether they buy. The B2B SaaS evidence shows the opposite: even among professional software buyers, the relationship between price and purchase intent is non-linear, and the “right” price is the one that falls within a cognitive acceptance range, not the lowest one the buyer will tolerate.

The endowment effect: buyers undervalue what they don’t yet have

There’s a third bias working against WTP accuracy, one that pulls in the opposite direction from hypothetical bias.

Research on new product adoption documented the endowment effect in purchasing decisions. Across multiple experiments, people demanded approximately 3x more compensation to give up a product they already possessed than they were willing to pay to acquire the same product.

In the most cited experiment, sellers consistently demanded multiples of what buyers were willing to pay for identical goods.

For B2B software, this means:

  • Buyers systematically undervalue new software relative to its actual worth to their business
  • They simultaneously overvalue whatever they’re currently using, even if it’s clearly inferior
  • WTP surveys capture this deflated number and present it as the ceiling

The net effect: hypothetical bias inflates the WTP number upward, while the endowment effect deflates the true valuation downward. A WTP survey gives you an inflated estimate of an already-deflated valuation. The error doesn’t cancel, it compounds the confusion.

Surveys don’t account for salespeople’s willingness to discount

Here’s where the academic evidence meets the reality of B2B software sales.

A software company can conduct survey after survey on their prospects’ willingness to pay. But as I’ve written in Forbes, these surveys don’t take into account salespeople’s willingness to discount. It’s a false assumption that the customer primarily drives willingness to pay. Salespeople have significant sway over how much money customers are willing to put forward, and software executives often underestimate that sway.

The academic research confirms this at scale. A study of thousands of enterprise software deals found that salespeople gave away 4.3 percentage points in excess discounts (translating to 6.6% of total vendor revenue) primarily to manipulate deal timing for their commission benefit. Seventy-four percent of deals closed on the last day of the quarter, and deals closing late in the quarter averaged 35–37% discounts versus 30% mid-quarter.
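
One way to see how 4.3 points of discount can amount to 6.6% of revenue: discount points are measured against list price, while revenue is what remains after discounting. The sketch below assumes an average realized price of about 65% of list. That baseline is an assumption for illustration (it is in line with the late-quarter discounts quoted above), not a figure taken from the study.

```python
excess_discount_points = 4.3       # percentage points of list price, from the study cited above
assumed_average_discount = 35.0    # ASSUMPTION: average discount off list, in percent

# Revenue actually collected, as a percentage of list price.
net_price_share = 100.0 - assumed_average_discount

# Excess discount expressed as a share of collected revenue rather than of list price.
revenue_impact_pct = excess_discount_points / net_price_share * 100.0

print(f"{revenue_impact_pct:.1f}% of revenue")
```

The deeper the baseline discount, the more each additional point given away costs as a share of what the vendor actually collects.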

This isn’t a few rogue reps. It’s structural. Non-linear quarterly commission plans create massive compensation differences for identical deals depending on timing. The same deal can earn a salesperson an order of magnitude more in one quarter than another. The rational response is exactly what the data shows: offer deeper discounts to pull deals into the quarter where they help your comp plan the most.

The buyer-behavior research reinforces this. When the same buyers saw a price well below the credible range, they pushed back rather than leaned in, generating more objections than support and fewer requests to talk to sales. A bargain price didn’t raise purchase intent, it lowered it. If discounting triggers the same rejection as overpricing, and salespeople are structurally incentivized to discount, then the WTP survey’s recommended price is being undermined by the very sales process it’s supposed to inform.

We see this play out constantly. A prospect might be willing to pay $30,000 a year for workflow automation software, until they learn that the salesperson they’re working with tends to give out discounts of up to 20%. Suddenly, the prospect won’t want to pay $30,000. And that discount doesn’t disappear come renewal time. It becomes the new baseline from which the renewal negotiation starts, and that customer will likely argue for another discount on top of what they’ve already been given.
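
To see how quickly the baseline erodes, here is a minimal sketch using the $30,000 and 20% figures from the example. The assumption that the customer wins the same 20% at every renewal is hypothetical and deliberately pessimistic; the function name is illustrative.

```python
def renewal_price_path(list_price, discount_pct, cycles):
    """Price after each negotiation when every cycle discounts from the previous price."""
    path = []
    price = list_price
    for _ in range(cycles):
        price = price * (100 - discount_pct) / 100
        path.append(price)
    return path


# Initial deal plus two renewals, each discounted 20% from the prior baseline (assumption).
path = renewal_price_path(30_000, 20, 3)
print(path)
```

Under that assumption, the customer who might have paid $30,000 is paying roughly half of it by the second renewal, and no survey number set that trajectory.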

The compounding failure looks like this:

  1. A WTP survey produces an inflated number (hypothetical bias)
  2. The consultancy recommends a price based on that inflated number
  3. The sales team discounts from the already-questionable ceiling, driven by comp structures
  4. The discounted price becomes the new anchor for renewals
  5. The customer’s perception of fair value has now been permanently depressed

Step 1 inflates the starting point; every step after that pushes the realized price in one direction, downward from true value. The resulting price has only an accidental relationship to the software’s actual worth.

B2B software pricing in practice

The B2B pricing research is damning enough. But the practitioner evidence is worse.

When Philip Ideson of the Art of Procurement podcast and I discussed this, he mentioned seeing price differences of as much as 10x for the same software product sold to different buyers. I’ve personally seen even more. Our own transaction datasets across B2B software companies reveal the full spectrum: the same product or bundle of products discounted anywhere from 100% (given away free) to surcharges of several multiples above list price. When the same software is sold to similar buyers at prices that vary by orders of magnitude, no WTP survey can fix what’s broken, because the survey assumes a world of rational price-setting that doesn’t exist.

Consider the scenario from my Forbes article on What Software Companies Get Wrong About Pricing: Tanya and Tessa are trying to purchase the same software solution for their companies. Their companies are the same size, in different industries, with nearly identical use cases. After they each undergo the sales process with a different salesperson and sign on the dotted line, they meet for coffee, and Tanya learns that Tessa paid half the price that she did.

When customers feel cheated like this, they warn other prospective buyers. The company gains a reputation as one that requires wheeling and dealing just to get a fair price. And a WTP survey conducted in this environment is measuring the aftermath of dysfunction, not some fundamental truth about customer valuations.

I’ve lived this myself. When I co-founded a software company, our initial willingness-to-pay research pointed to a $2,500 price point. We launched there. Over time, as we understood how customers actually used the product and what outcomes it drove, the price moved to $100,000, and eventually to $500,000. The WTP study didn’t just underestimate demand. It undercut our value by orders of magnitude, because the buyers we surveyed couldn’t articulate the value of something they hadn’t yet experienced at scale. No survey redesign would have fixed that; the insight only came from observing real usage and real purchasing behavior over time.

The academic research focuses on overestimation because that’s what controlled experiments on low-cost products reveal. The overstatement is not uniform. It runs larger for complex, high-consideration products than for simple, low-cost ones, and enterprise software is squarely in the high-consideration category. A survey-derived ceiling for B2B software is therefore more likely to sit above what buyers will actually pay than below it, and by a wider margin than the consumer average implies. The bias runs in one direction for the products that matter here: surveys read high, and they read highest where the purchase is complex.

This pattern repeats across the B2B software companies we work with. Executives consistently tell us the same thing: their customers genuinely don’t know what they’d pay. The tools most consultancies use to answer that question are measuring a fiction that gets further distorted by every sales interaction.

Why a snapshot can never replace a system

Even if you could fix every bias in WTP measurement (eliminate hypothetical inflation, correct for the endowment effect, account for buyer type, discipline every salesperson) you’d still have a fundamental problem: a survey produces a snapshot. It tells you what buyers said they’d pay at one moment in time, for one configuration of your product, in one competitive context.

Software doesn’t work that way. Your product changes every sprint. Your market shifts every quarter.

There is a myth about pricing that software companies often let themselves believe: that once a customer purchases their software, the ongoing monthly price entitles the customer to all the new features and value going forward. The problem is that improvements and additions the company makes to its software can drastically alter the cost-value calculus. Over time, the amount the customer pays drifts out of sync with the value the software delivers.

A WTP survey taken in January is stale by March. The consultancy that delivered it has moved on to their next engagement. And you’re making pricing decisions based on data that was biased when it was collected and is now outdated on top of it.

Google’s internal research makes this point directly. Their data documents that Google runs approximately 10,000 pricing and product experiments per year, with about 1,000 running concurrently. The reason: observational data alone “cannot establish causality for pricing decisions, leading to incorrect demand curve estimation.” External factors create false correlations between price and demand that no survey can untangle.

Without data to support them, pricing decisions can only come from narrative, assumptions, and anecdotes, which is a far riskier foundation.

The alternative: continuous monetization

The research (and what we’ve observed across hundreds of billions of dollars in B2B software transactions since 1982) supports a fundamentally different approach. One that replaces the “measure WTP, set price, move on” model with a system of ongoing demand measurement, structured experimentation, and iterative price improvement.

We call this continuous monetization. Top-performing software companies set off on a constant hunt to get paid fairly for their intellectual property. They know that the never-ending development of a software product means that, over time, some elements will over-deliver value while others may be underused. So they persistently monitor the value drivers and adapt their pricing.

Some firms now pair an initial conjoint study with ongoing price optimization. But if the optimization framework will override the study’s recommendations within a quarter (and it usually does, as the product and market evolve) the question is whether the initial study justified its cost and the months it delayed action.

Here’s how it works and why the evidence supports each component.

Measure demand response, not stated preference

The core shift: instead of asking customers what they’d pay, observe what they actually do when pricing changes.

We can’t rely on what customers say to estimate software value. We need to understand what they do, including purchases, usage patterns, upgrade behavior, and churn signals.

We recommend performing a series of controlled incremental price changes to understand and push the boundaries of willingness to pay for customer groups with similar usage and derived value characteristics. This is an empirical, reliable, risk-mitigating method of conducting demand elasticity analysis, firmly rooted in how customers behave, not just what they say.

This method also helps you harmonize pricing with the rate of new value creation from your product roadmap. In B2B software, especially if subscription-based, customer value perception contains a futures element. Customers expect a stream of increasing value. So another way to think about this: you’re taking pricing validation steps in a journey that always keeps you on the safe side of the razor’s edge of being paid fairly for your software’s value.
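As a rough illustration of the controlled incremental price changes described above, the sketch below computes arc (midpoint) elasticity between consecutive price steps for one cohort. The `PriceStep` structure, the prices, and the counts are hypothetical; arc elasticity is one common way to summarize demand response and is an assumption here, not a method the research prescribes. With samples this small the estimates are noisy, so treat the output as directional.

```python
from dataclasses import dataclass


@dataclass
class PriceStep:
    price: float      # price offered to the cohort during this step
    offers: int       # comparable prospects who saw this price
    purchases: int    # prospects who bought at this price


def conversion(step: PriceStep) -> float:
    return step.purchases / step.offers


def arc_elasticity(before: PriceStep, after: PriceStep) -> float:
    """Midpoint elasticity of conversion with respect to price."""
    q0, q1 = conversion(before), conversion(after)
    p0, p1 = before.price, after.price
    pct_change_q = (q1 - q0) / ((q1 + q0) / 2)
    pct_change_p = (p1 - p0) / ((p1 + p0) / 2)
    return pct_change_q / pct_change_p


def revenue_per_offer(step: PriceStep) -> float:
    return step.price * conversion(step)


# Hypothetical cohort: two successive 10% increases.
steps = [
    PriceStep(price=10_000, offers=50, purchases=20),
    PriceStep(price=11_000, offers=50, purchases=19),
    PriceStep(price=12_100, offers=50, purchases=14),
]

elasticities = [arc_elasticity(a, b) for a, b in zip(steps, steps[1:])]
revenues = [revenue_per_offer(s) for s in steps]

for step, rev in zip(steps, revenues):
    print(f"price={step.price:>8,.0f}  conversion={conversion(step):.2f}  revenue/offer={rev:,.0f}")
print([round(e, 2) for e in elasticities])
```

In this made-up data the first increase meets inelastic demand (elasticity between 0 and -1) and raises revenue per offer, while the second crosses into elastic territory and lowers it. That turning point is the boundary the incremental steps are meant to locate.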

Research on price experimentation algorithms confirms the approach can achieve 96.9–99.1% of optimal revenue across different demand distributions. The key insight: structured experimentation that incorporates economic theory dramatically outperforms both naive price testing and static survey-based pricing. Recent theoretical work reinforces this. Researchers proved mathematically that subscription usage data alone can identify willingness to pay without requiring price experiments, because variation in how customers use a product at a fixed price reveals their underlying valuations. The data you need is already flowing through your systems.

Software companies that know precisely how their customers behave (what they buy, the features they use and don’t use, the amount they pay, how quickly and often they upgrade, downgrade, or churn) are able to analyze pricing opportunities and risks. And adapt accordingly. The closer to real-time they have that information, the more quickly they can adapt and the sooner they can benefit from the value they are providing.

We often see how this “truth on the ground” data surprises operators, challenging or outright invalidating assumptions that helped build the original pricing model.

Start below value, iterate upward

Most pricing consultancies deliver a “right price” and their engagement ends. Continuous monetization starts differently: set an initial price based on competitive positioning and value hypothesis, then systematically increase it while measuring demand response.

The research on price compression documents a pattern where B2B companies systematically underprice their most valuable offerings and overprice their least differentiated ones, compressing the price range toward the middle. It is quite common for software executives to underestimate the variety and amounts of discounts that flow through their company’s book of business. They believe they know what is happening, until a scatter chart plotting every deal by discounts and other attributes tells them a very different story.
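Before drawing that scatter chart, the underlying numbers can be summarized with a few lines of code. This is a minimal sketch over hypothetical (list price, net price) pairs; a negative discount represents a surcharge above list.

```python
from statistics import median


def discount_pct(list_price: float, net_price: float) -> float:
    """Discount as a percentage of list; negative means a surcharge."""
    return round((list_price - net_price) / list_price * 100, 1)


# Hypothetical deals for one product: (list price, net price actually paid).
deals = [
    (50_000, 50_000),   # sold at list
    (50_000, 35_000),   # 30% off
    (50_000, 0),        # given away free
    (50_000, 75_000),   # surcharge above list
    (50_000, 42_500),   # 15% off
]

discounts = sorted(discount_pct(lp, net) for lp, net in deals)
summary = {
    "min": discounts[0],
    "median": median(discounts),
    "max": discounts[-1],
}
print(discounts)
print(summary)
```

Even in this tiny invented sample the spread runs from a 50% surcharge to a 100% discount. A median on its own would hide exactly the variation that surprises executives when every deal is plotted.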

The fix isn’t a single repricing event. It’s a process of expanding the range outward: raising prices on high-value products where demand proves inelastic, and restructuring low-value products where prices meet resistance.

The buyer-behavior research supports this directly. The acceptable price band was wide, and buyers responded positively across the whole of it, not just at the bottom. Inside that band the company held real pricing power. A company that starts at the low end and iterates upward, measuring how demand actually responds at each step, captures that headroom without ever fielding a survey question. And as the product improves, the top of the acceptable band moves up with it.

Investors reward this approach. Research on SaaS transitions shows that stock prices increase an average of 2.2% when SaaS is offered alongside perpetual licensing, but companies that force-convert existing products to SaaS-only see a 3.5% value decrease. The market wants to see pricing that evolves with the product, not pricing that gets set once and locked in.

Let versioning reveal WTP through behavior

Versioning research provides the theoretical foundation: pricing structures that let buyers self-select based on their own assessment of value reveal more about willingness-to-pay than any survey. The pricing structure itself becomes the WTP measurement instrument.

Good-better-best tiering is one version of this, but it’s far from the only one. The research validates multiple packaging architectures depending on how customers derive value:

  • Tiered plans (good-better-best): work when customer segments align cleanly with feature tiers
  • Platform with optional modules: work when customers have diverse needs and value different capabilities. Research on multi-component pricing models shows these provide “superior outcomes” for B2B services by aligning price with actual usage patterns
  • Customized bundling (“pick N modules from the catalog”): research shows 13-21% profit improvement over forcing everyone into the same bundle, specifically when customer preferences vary across the product portfolio
  • Usage-based or hybrid models: work when value scales with consumption and customers prefer paying for what they use

The specific architecture matters less than the principle: let buyers reveal their willingness-to-pay through their purchasing choices (which configuration they pick, which modules they adopt, when they upgrade) rather than asking them to guess it in a survey.

This is fundamentally more accurate than stated-preference research for three reasons:

  1. Real consequences. When a buyer chooses a configuration, they’re making a decision with their company’s money. There’s no hypothetical bias because the transaction is real.
  2. Continuous signal. Every renewal, every expansion, every add-on, every upgrade decision is a new data point. You’re not relying on a single snapshot, you’re building a longitudinal dataset of revealed preferences.
  3. Segment-level precision. Packaging structures naturally separate buyer types. Price-sensitive buyers self-select to basic configurations. Value-driven buyers choose comprehensive packages. The structure does the segmentation that a survey tries to do artificially.

But the right packaging architecture (and the right boundaries within it) must be discovered through market feedback, not set once by a consultancy.

Structure the price around how customers decide

The buyer-behavior research shows this directly. Buyers didn’t just weigh the amount, they weighed what the price signaled. The same product priced cut-rate drew suspicion (“what’s wrong with it?”), while a credible price read as a serious tool worth engaging. The price structure shaped the buyer’s frame, and that frame decided whether they took the next step.

For B2B software, how you package and present pricing tiers has more impact on perceived value than the dollar amounts themselves. What we call licensing metrics (per seat vs. per usage, monthly vs. annual, bundled vs. modular) shape buyer behavior more than any specific number on a survey. Research on information goods bundling confirms this: bundling software modules reduces variance in customer valuations, making pricing more predictable and revenue more capturable than selling components individually.

The implication: a pricing page that clearly communicates what’s included at each tier, how pricing scales, and why may outperform one that’s been “optimized” by a WTP study, because the structure itself communicates value that no survey question can capture.

Build the pricing feedback loop into operations

Continuous monetization isn’t a one-time project, it’s an operational capability.

Always-on demand sensing. The right infrastructure captures conversion rates, upgrade/downgrade flows, churn by tier, and expansion revenue by cohort continuously, not when someone remembers to pull a report. AI-augmented pattern recognition can surface pricing signals that human analysis would miss: a subtle shift in tier mix that precedes churn, a packaging configuration that consistently accelerates deal velocity, or a discount pattern that correlates with lower lifetime value. Pricing data is constantly changing. Good data is not a snapshot of past history. It lives within your customers and your organization, and as it changes, so too must your pricing model.

Structured experiments. Test pricing changes on specific segments before rolling out broadly. For B2B SaaS, this might mean testing a 10% price increase on new customers in one segment while holding existing customers constant, then measuring the conversion impact over 60-90 days. When the system is always collecting and analyzing deal data, the experiment isn’t a special event; it’s the normal operating rhythm.
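One way to read the result of such a test is to compare conversion and revenue per prospect between the control price and the test price. The sketch below uses a pooled two-proportion z-statistic, which is a standard textbook choice and an assumption here rather than a method the article specifies; the prices and counts are hypothetical.

```python
from math import sqrt


def two_proportion_z(conv_control: int, n_control: int,
                     conv_test: int, n_test: int) -> float:
    """Pooled z-statistic for the difference in conversion rates (test minus control)."""
    p_control = conv_control / n_control
    p_test = conv_test / n_test
    pooled = (conv_control + conv_test) / (n_control + n_test)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    return (p_test - p_control) / std_err


def revenue_per_prospect(price: float, conversions: int, prospects: int) -> float:
    return price * conversions / prospects


# Hypothetical 60-90 day window: control at 1,000, test at a 10% increase.
control = {"price": 1_000, "prospects": 200, "conversions": 40}
test = {"price": 1_100, "prospects": 200, "conversions": 38}

z = two_proportion_z(control["conversions"], control["prospects"],
                     test["conversions"], test["prospects"])
rev_control = revenue_per_prospect(control["price"], control["conversions"], control["prospects"])
rev_test = revenue_per_prospect(test["price"], test["conversions"], test["prospects"])

print(f"z = {z:.2f}")
print(f"revenue per prospect: control={rev_control:.0f}, test={rev_test:.0f}")
```

In this invented example the drop in conversion is well within noise (|z| is far below 1.96) while revenue per prospect rises, which would argue for keeping the increase and continuing to measure.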

Value-aligned iteration. Every major feature release is a pricing event. If you ship a capability that materially changes the value proposition for a segment, the price should reflect it, not through a new WTP survey but through a structured increase with demand measurement. An always-on system detects when the value-price gap widens and flags the opportunity before it’s left on the table for quarters.

Longitudinal win/loss intelligence. For enterprise sales, the richest WTP data comes from the deals themselves, not as one-time post-mortems, but as a continuously enriched dataset. Not “what price did the customer say they wanted” but “at what price did they actually sign, what did they push back on, and what did they not push back on.” When AI is analyzing every quoting iteration, negotiation response, and contract term across your entire book of business over months and years, the pattern recognition compounds. What starts as 60 deals becomes a longitudinal intelligence layer that gets sharper with every transaction.

Why this works at enterprise scale

The most common objection to continuous monetization: “We close 60 enterprise deals a year. You can’t do statistically significant pricing experiments with 60 data points.”

This misunderstands what the data actually is.

Depth beats breadth. The issue with survey-based pricing was never sample size; it was data quality. Every response in a 2,000-person conjoint study is hypothetical. Every data point in a 60-deal transaction dataset is real. Depth of observation isn’t a concession from quantitative rigor; it’s an upgrade. Qualitative research methodology has established for decades that smaller samples with deep observational data produce more accurate behavioral findings than large-scale surveys with shallow, hypothetical responses. Anthropologists draw valid conclusions from 15-30 deep observations. The same principle applies to enterprise pricing: 60 deals where real money changed hands, real procurement teams pushed back, and real contracts were negotiated contain more signal about willingness-to-pay than 2,000 conjoint responses where nobody signed a check.

Over four decades of working exclusively with B2B software companies, we’ve found that controlled packaging and pricing iterations converge on accurate solutions rapidly, typically within two to three adjustment cycles. The accuracy comes not from the volume of data but from the quality: each iteration is tested against real buyer behavior with real budget consequences, and the feedback loop is measured in weeks, not the months a survey-based engagement requires.

It’s not 60 data points; it’s thousands. Each enterprise deal isn’t one observation. It’s a sequence of pricing decisions. Through our LevelSetter platform, we capture every quoting iteration, packaging configuration, and negotiation response via API, continuously, across the entire book of business. AI analysis runs against this growing dataset, identifying patterns that no quarterly review could surface. A single deal may generate 15-20 observed pricing interactions before close: the initial proposal, the counter, the packaging adjustment, the procurement pushback, the revised quote, the final terms.

Sixty active deals at 15-20 interactions each produce roughly 900 to 1,200 real pricing responses, each made with budget authority, procurement oversight, and actual purchase intent. No survey produces data of this quality at any sample size, because no survey respondent faces the consequence of actually spending the money.

We see where deals break, not just whether they close. Continuous monetization doesn’t just measure outcomes. It maps the full negotiation arc. We see which packaging configuration opened the conversation, which pricing adjustment unlocked budget approval, and which counter-proposal killed the deal. This isn’t win/loss at the deal level; it’s win/loss at each decision point within the deal. Pattern-matching across these sequences reveals pricing dynamics that no point-in-time survey can detect.

Did the deal stall at the first quote? Packaging problem. Did it survive three rounds but fail at procurement? Price structure problem. Did a packaging change in round two unlock the budget? That’s a signal about what the buyer actually values, and it’s a signal you can only see in real transaction data.
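Those three questions amount to a small set of rules applied to each deal’s history. The sketch below encodes them directly; the `DealHistory` fields and stage names are hypothetical and do not describe LevelSetter’s data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DealHistory:
    quote_rounds: int                                # quoting rounds completed
    won: bool
    failed_stage: Optional[str] = None               # e.g. "first_quote", "procurement"
    packaging_changed_in_round: Optional[int] = None
    budget_approved_in_round: Optional[int] = None


def diagnose(deal: DealHistory) -> str:
    """Map a deal's negotiation arc to the signal it most likely carries."""
    if not deal.won and deal.failed_stage == "first_quote":
        return "packaging problem"
    if not deal.won and deal.failed_stage == "procurement" and deal.quote_rounds >= 3:
        return "price structure problem"
    if (deal.packaging_changed_in_round is not None
            and deal.budget_approved_in_round == deal.packaging_changed_in_round):
        return "value signal: packaging change unlocked budget"
    return "no pattern matched"


# Hypothetical deals, one per question in the text.
stalled = DealHistory(quote_rounds=1, won=False, failed_stage="first_quote")
procurement_loss = DealHistory(quote_rounds=3, won=False, failed_stage="procurement")
unlocked = DealHistory(quote_rounds=3, won=True,
                       packaging_changed_in_round=2, budget_approved_in_round=2)

labels = [diagnose(d) for d in (stalled, procurement_loss, unlocked)]
print(labels)
```

Real deal histories are messier than three clean rules, so a production system would more plausibly learn these patterns from data. The point of the sketch is only that the diagnosis depends on stage-level history, which a point-in-time survey never records.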

The data compounds; surveys depreciate. A conjoint study is a depreciating asset: accurate (at best) on the day it’s delivered, stale within a quarter, irrelevant within a year. An always-on system built on transaction data is an appreciating asset. Every deal, win or loss, enriches the dataset. AI-driven pattern recognition gets more precise with every negotiation, every renewal, every expansion. The longer the system runs, the sharper its insights become, the opposite of a survey that begins aging the moment it’s completed.

This is the fundamental difference. Our approach is grounded in how deals are actually forged: the proposals, counteroffers, packaging iterations, and contract terms that produce real revenue. Survey-based methods are grounded in hypothetical worlds where respondents face no consequences for their answers.

The argument in summary

The traditional pricing consultancy model runs a WTP survey, delivers a “right price,” and exits. The evidence says this approach has five structural problems:

  1. The survey is biased. Hypothetical methods overstate WTP, the more so for complex products. The buyer faces no consequences for overstating.
  2. The buyer doesn’t know the answer. Software has no cost anchor. Buyers can’t tell you in a survey what they’d be willing to pay for innovations they haven’t yet experienced.
  3. The question is wrong for most buyers. Peer-reviewed B2B SaaS research shows the lowest price generates as much buyer resistance as the highest. Buyers aren’t optimizing for price, they’re evaluating whether the price signals appropriate value.
  4. The result is static. A snapshot of stated preferences in Q1 has no mechanism to adapt to product changes, competitive shifts, or segment evolution in Q2-Q4.
  5. The implementation erodes it further. Sales teams discount from the already-questionable ceiling (costing up to 6.6% of revenue through comp-structure-driven gaming) and those discounts compound into permanent baseline reductions at renewal.

Continuous monetization addresses each of these by replacing hypothetical measurement with behavioral observation, one-time surveys with ongoing demand sensing, static recommendations with iterative improvement, and consultant-delivered pricing with an operational capability that compounds over time.

The research doesn’t say “never run a survey.” Price-sensitivity surveys and conjoint have a place as exploratory inputs, directional data to inform an initial hypothesis. But they are a starting point, not an answer. And a pricing consultancy that treats them as the answer is selling certainty built on a foundation the research itself has demonstrated is unreliable.

Product management can use continuous monetization to better understand what customers value and will pay for, as opposed to what they tell a survey they’d be willing to pay, which the research shows is error-prone and unreliable as a basis for setting price.

Everything described above (the biases in stated-preference research, the need for always-on demand sensing, the value of capturing every negotiation iteration, the compounding intelligence that gets sharper with every deal) pointed to a gap in the market. There was no platform built specifically for B2B software companies to operationalize continuous monetization.

So we built one. LevelSetter is our AI-augmented pricing platform for B2B software. It connects via API to your quoting and deal management systems, captures the full arc of every pricing interaction, and applies pattern recognition across your entire book of business, longitudinally, not as a one-time study.

It doesn’t replace pricing judgment. It replaces the fiction that a survey can tell you what your customers will pay. Instead of hypothetical willingness-to-pay data, you get observed willingness-to-pay behavior, from real deals, with real money, analyzed continuously.

If the arguments in this article resonate, LevelSetter is where they become operational.


For the current operating framework on this discipline, see continuous monetization as a B2B software pricing discipline.

This article is supported by over four decades of SPP’s direct experience with B2B software pricing and peer-reviewed research on pricing methodology, hypothetical bias, and buyer behavior, including controlled comparisons of stated-preference methods against incentive-aligned benchmarks, studies of response consistency in price sensitivity measurement, and empirical evidence on buyer cognition in enterprise software contexts. We update this analysis as new research is published. For related reading, see Why 90% of B2B Value-Based Pricing Is a Hoax, GenAI Pricing Challenges, and Why Continuous Monetization Is So Vital. Last updated April 2026.
