April 7, 2026

Why Willingness-to-Pay Surveys Fail B2B Software Companies


TL;DR: WTP surveys overestimate willingness to pay by up to 2x, can’t capture value buyers haven’t experienced, and produce static snapshots that are stale within a quarter. AI pricing compounds every one of these problems. The alternative — continuous monetization — measures real demand through transaction data, not hypothetical responses.


Most pricing consultancies start a software pricing engagement the same way: survey your customers, run a Van Westendorp or conjoint analysis, and use the results to set your price. It’s clean, it’s quantitative, and it’s been the default playbook for decades.

There’s one problem. The research shows it doesn’t work the way people think it does.

Not because the methods are poorly designed — Van Westendorp and conjoint are both well-established tools with legitimate academic foundations. The problem is more fundamental: software is not a physical good, B2B buyers are not rational price evaluators, and the number a WTP survey produces is shaped by biases that the survey itself cannot detect.

I’ve argued before that 90% of value-based pricing and selling in B2B software is a hoax — that widely used methods borrowed from B2C markets don’t work in B2B, where products and usage are more complicated, making value harder to compare and estimate. Below is the academic evidence behind that claim — and what we’ve seen work instead after 40 years of pricing B2B software.

What “willingness to pay” actually measures

Willingness to pay (WTP) is the maximum price a buyer will accept before walking away from a purchase. In theory, if you know every customer’s WTP, you can set prices that maximize revenue — charging each segment as close to their ceiling as possible.

The challenge is measuring it. There are three broad approaches:

Hypothetical survey methods ask buyers to state what they’d pay. The most common are:

  • Van Westendorp Price Sensitivity Meter (PSM) — four questions that identify prices the respondent considers “too cheap,” “cheap,” “expensive,” and “too expensive.” Plot the cumulative responses and the intersections define an acceptable price range and an optimal price point (a minimal computation sketch of the PSM curves follows this list).
  • Contingent Valuation (CV) — a direct question: “What is the maximum you would pay for this product?”
  • Conjoint Analysis — an indirect method where respondents rank product configurations with different features and prices. Statistical modeling infers the implied WTP for each attribute.
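
To make the mechanics concrete, here’s a minimal sketch of the standard PSM computation: build the four cumulative curves over a price grid and read off where they cross. The synthetic responses and the crossing convention are illustrative assumptions, not Kunter’s implementation or any particular vendor’s.

```python
import numpy as np

# Synthetic PSM responses for 500 respondents (illustrative distributions only;
# a real study would use actual survey answers).
rng = np.random.default_rng(0)
n = 500
too_cheap     = rng.normal(60, 15, n)               # "suspiciously cheap" below this
cheap         = too_cheap + rng.uniform(10, 30, n)  # "a bargain"
expensive     = cheap + rng.uniform(20, 50, n)      # "getting expensive"
too_expensive = expensive + rng.uniform(20, 60, n)  # "out of the question"

prices = np.linspace(0, 250, 501)

# Cumulative curves: the "cheap" pair falls with price, the "expensive" pair rises.
pct_too_cheap     = [(too_cheap >= p).mean() for p in prices]
pct_cheap         = [(cheap >= p).mean() for p in prices]
pct_expensive     = [(expensive <= p).mean() for p in prices]
pct_too_expensive = [(too_expensive <= p).mean() for p in prices]

def crossing(falling, rising):
    """First grid price where the rising curve overtakes the falling one."""
    idx = np.argmax(np.array(rising) >= np.array(falling))
    return prices[idx]

pmc = crossing(pct_too_cheap, pct_expensive)      # lower bound of acceptable range
pme = crossing(pct_cheap, pct_too_expensive)      # upper bound of acceptable range
opp = crossing(pct_too_cheap, pct_too_expensive)  # the "optimal price point"

print(f"acceptable range ${pmc:.0f}-${pme:.0f}, optimal price point ${opp:.0f}")
```

The point of the sketch is how little machinery sits under the “optimal price point”: the output is only as good as the stated answers feeding it.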

Incentive-aligned methods put real money on the table:

  • Becker-DeGroot-Marschak mechanism (BDM) — respondents state their maximum price, then a random price is drawn. If the random price is at or below their stated WTP, they must actually buy at that price. If it’s above, they can’t buy. Because over-bidding or under-bidding can’t help them, respondents have an incentive to reveal their true WTP. Considered the gold standard for accuracy but impractical in most commercial settings.
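
A short simulation, with an assumed true valuation and a uniform random price draw, makes the incentive property visible: stating anything other than your true WTP can only lower your expected surplus.

```python
import random

random.seed(7)
TRUE_WTP = 0.41          # the buyer's actual valuation (illustrative)
PRICE_RANGE = (0.0, 1.0) # random price drawn uniformly from this range

def expected_surplus(stated_wtp, trials=200_000):
    """Average gain under the BDM rule: buy at the drawn price iff it is <= your stated WTP."""
    total = 0.0
    for _ in range(trials):
        drawn = random.uniform(*PRICE_RANGE)
        if drawn <= stated_wtp:        # forced to buy at the drawn price
            total += TRUE_WTP - drawn  # surplus is negative if you overstated
    return total / trials

for stated in (0.21, 0.41, 0.61):  # understate, truthful, overstate
    print(f"stated {stated:.2f}: expected surplus {expected_surplus(stated):.4f}")
```

Truthful reporting (0.41 here) yields the highest expected surplus; both the understated and overstated bids cost the respondent money, which is exactly what disciplines the answers.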

Revealed preference methods observe what buyers actually do — market transaction data, A/B price testing, auction results.

The distinction matters because the first category — hypothetical methods — is what the vast majority of pricing consultancies use, and it’s where the measurement breaks down.

The overestimation problem: surveys inflate WTP by up to 2x

In 2016, Marcus Kunter published the first empirical comparison of PSM against an incentive-aligned method. The study compared WTP estimates across three methods — contingent valuation, BDM, and Van Westendorp PSM — for the same product with the same population in a controlled field experiment.

The results were stark:

  • Contingent valuation (the method PSM is based on) produced a mean WTP of $0.80 (SD: 0.68)
  • BDM (the incentive-aligned benchmark) produced a mean WTP of $0.41 (SD: 0.23)
  • The difference was highly statistically significant — not a marginal finding, but one that leaves virtually no room for chance

Hypothetical methods produced WTP estimates nearly twice as high as what people actually paid when real money was on the table.

Why? Kunter attributes it to hypothetical bias — respondents in a survey face no consequences for overstating what they’d pay. When there’s no transaction at the end, the cognitive cost of saying “$100” instead of “$60” is zero. The number they give you isn’t what they’d pay. It’s what they can imagine paying in a low-stakes thought experiment.

The author’s own conclusion: PSM “yields biased results because of its hypothetical nature and its focus on minimum customer resistance.” One study did find that PSM’s intersection-based optimal price happened to land near an incentive-aligned benchmark — but that was for 36-cent chocolates, with the author attributing the result to two biases coincidentally cancelling and calling for validation on “more expensive and industrial products” that has never materialized (Kunter, 2016).

Meanwhile, a separate study of 112 IT professionals evaluating B2B SaaS pricing found that buyers don’t have a WTP number — they have a WTP range. The acceptable range for a sales force automation tool spanned $125-$200/user/month, and buyers’ cognitive responses shifted dramatically based on where the price fell within that range. A survey that produces a single “optimal price” is collapsing a $75 band of psychological responses into one number.

“But we use conjoint, not PSM”

This is the most common objection from sophisticated pricing consultancies. They’ll concede that Van Westendorp is a blunt instrument, then argue that their choice-based conjoint analysis is fundamentally different — that by forcing respondents to make trade-offs between realistic product configurations rather than stating a price directly, conjoint sidesteps the hypothetical bias problem.

It doesn’t. Here’s why.

Conjoint is still hypothetical. Kunter’s WTP method classification places discrete-choice analysis — the dominant form of conjoint used in pricing research today — in the Indirect / Hypothetical quadrant.

  • Direct, stated preference (hypothetical): Van Westendorp PSM, Contingent Valuation
  • Direct, stated preference (incentive-aligned): BDM mechanism
  • Direct, revealed preference: Market data analysis
  • Indirect, stated preference (hypothetical): Conjoint / Discrete-Choice
  • Indirect, stated preference (incentive-aligned): Experiments, Auctions

Source: Kunter (2016), WTP measurement method classification.

Van Westendorp and conjoint — the two most common methods in B2B software pricing — both sit in the hypothetical column. Not incentive-aligned. Not revealed preference. Hypothetical. The respondent is choosing between product configurations on a screen, not signing a purchase order. The cognitive environment is identical to any other survey: no money changes hands, no procurement committee reviews the decision, no implementation risk is evaluated. The trade-off is more realistic than “what’s the maximum you’d pay?” but the fundamental problem remains — the respondent faces zero consequences for their stated choices.

Conjoint triggers strategic behavior in B2B. Research on pharmaceutical pricing methods found that direct questioning leads to “bargaining behaviour” where respondents “systematically understate willingness to pay” to influence final pricing (Nagle & Holden, cited in Hanlon & Luery, 2002). The researchers note that even discrete choice methods, while having a “high degree of realism,” require “substantial expertise” and “substantial time and budgets” and that they “know of no studies that have validated the Van Westendorp approach” — a limitation they extend to all stated-preference methods. In B2B software, where your respondents are often trained procurement professionals who know exactly why you’re asking, strategic distortion is the norm, not the exception. Game-theoretic research proves this formally: buyers who know their responses inform pricing will deliberately choose suboptimal configurations to avoid revealing high willingness-to-pay (Aron et al., 2005).

Conjoint can’t capture value the buyer hasn’t experienced. A conjoint study asks respondents to trade off features and prices for a product they’re evaluating. But in B2B software, the most important value drivers are often things the buyer can’t evaluate in a survey — workflow improvements they haven’t seen, integrations they haven’t tested, efficiency gains that only emerge after months of organizational adoption. A conjoint study captures the buyer’s perception of value at the moment of the survey, anchored to their current experience with their current tools. It can’t measure willingness-to-pay for value that hasn’t been delivered yet.

Conjoint produces a snapshot, not a system. Even if a conjoint study perfectly captured every respondent’s true preferences at the moment of the study, those preferences change. The product ships new features quarterly. A competitor drops their price. The buyer’s business context shifts. The conjoint study that cost $150K and took three months to execute is calibrated to a market that no longer exists by the time the recommendations are implemented.

Consultancies that use conjoint claim in-market validation of their price recommendations, but this data is proprietary and self-reported. No independent peer-reviewed study has compared conjoint-derived pricing outcomes to continuous demand measurement outcomes for B2B software.

This isn’t an argument that conjoint is useless. It’s a more sophisticated instrument than PSM, and it produces richer data about feature-price trade-offs. But it shares the same foundational weakness as every stated-preference method: it asks people to predict their own behavior in a consequence-free environment, captures a static snapshot, and presents the result as the answer to a question that the market itself should be answering continuously.

Why software makes the problem worse

Physical goods have a natural price anchor: material costs. A buyer evaluating a manufactured product has at least a rough sense of what the inputs cost — steel, labor, components. Even if they don’t know the exact figure, the physical nature of the product creates a reference frame.

Software has no such anchor.

Research on information goods pricing (Shapiro & Varian) established that software exhibits fundamentally different economics: near-zero marginal reproduction costs and high fixed development costs. A customer buying enterprise software cannot reason backward from “what this costs to produce” to “what it should cost me” — because the answer to the first question is effectively zero per unit, and the answer to the second question depends entirely on the value it creates for their specific business.

As I wrote in Why Continuous Monetization Is So Vital: “When a consumer decides to buy a new refrigerator, they have a pretty complete idea of its value. While certain features and designs can affect that value, the buyer can make an easy price comparison because of the product’s functional parity with other offerings — generally speaking, an icemaker is an icemaker. That’s not the case in software, where customer perceptions of value can miss the subtle distinctions that create huge differences between seemingly similar products.”

There is minimal functional parity in software. An email capability is not an email capability. So when a PSM survey asks a software buyer “at what price would you consider this product too expensive?”, the buyer has no anchor. They’re not estimating — they’re guessing. And that guess is shaped by whatever reference points happen to be in their head: the last software they bought, a competitor’s listed price, a number their CFO mentioned in a budget meeting.

In B2B specifically, value perception is strongly affected by organizational software experiences. Differential value is often concentrated in innovations prospective buyers haven’t yet experienced — innovations that might enable currently unimaginable operational improvements. Buyers can’t tell you in a survey what they’d be willing to pay for something they haven’t yet seen working in their environment.

The WTP number you get from the survey is a composite of arbitrary anchors, filtered through hypothetical bias. It’s precise-looking data built on sand.

AI compounds the problem

If customers couldn’t estimate what they’d pay for software before AI, they certainly can’t now.

Consider what happens when a B2B software company wraps generative AI into its product. A single user action — asking a question, generating a report, running an analysis — might trigger three to five model calls behind the scenes: one to decompose the problem, one to run the inference, one to verify the answer, one to summarize the result. The user sees one click. The vendor’s infrastructure sees a chain of API calls, each with variable cost depending on the model, prompt length, and orchestration logic. Ask the customer what they’d be willing to pay for that feature, and they’re estimating a price for a process they can’t see, powered by costs they can’t comprehend, at a volume they can’t predict.
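
Here’s a back-of-envelope sketch of what one click can cost the vendor. All token counts and per-million-token rates below are placeholder assumptions, not quoted prices from any provider.

```python
# Back-of-envelope cost of ONE user click that fans out into a chain of model calls.
# Every number here is a placeholder assumption, not a quoted price.
calls = [
    # (step,       input_tokens, output_tokens, $/1M input, $/1M output)
    ("decompose",  1_200,   300, 3.00, 15.00),
    ("inference",  4_000, 1_500, 3.00, 15.00),
    ("verify",     5_500,   200, 0.25,  1.25),  # cheaper model for checking
    ("summarize",  1_800,   400, 0.25,  1.25),
]

total = 0.0
for step, tin, tout, rate_in, rate_out in calls:
    cost = tin / 1e6 * rate_in + tout / 1e6 * rate_out
    total += cost
    print(f"{step:<10} ${cost:.4f}")

print(f"one click  ${total:.4f}")
# Multiply by an unknown number of clicks per user per day, and an orchestration
# pattern that changes every sprint, and the buyer is guessing at every term.
```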

But the opacity isn’t just on the buyer’s side. Many vendors don’t know what their AI costs are per customer. Inference costs are commingled with compute, storage, and networking on aggregate infrastructure bills. A vendor might know their AWS bill increased 40% after launching an AI feature — but they can’t attribute how much of that is Customer A’s usage versus Customer B’s. They’re pricing a product whose cost-to-serve they can’t isolate.

And both sides are pricing against a moving target. Model costs drop 50% every six to twelve months. New models change the cost-per-quality curve. Orchestration patterns evolve as engineering teams optimize. A WTP survey conducted today is calibrated against a cost structure that will be obsolete by the time the recommended pricing is implemented.

The bundling trap. Facing this uncertainty, many B2B software companies made a rational short-term decision: they included AI in existing packages without charging separately, reducing sales friction and driving adoption. This solved the go-to-market problem. But it created a monetization trap — customers now expect AI as included, and separating it later triggers the endowment effect we describe below. They’d be “losing” something they already have.

The token passthrough problem. Other vendors went the opposite direction — passing AI costs directly to customers as tokens, credits, or consumption charges. This might seem like transparent pricing, but it fundamentally undermines value capture. When you price at the token level, you’re no longer selling “we help you optimize your pricing strategy.” You’re selling “our tokens cost $0.003 versus Google’s $0.002.” You’ve commoditized your own product by framing it at the infrastructure layer instead of the business outcome layer. As I’ve written about GenAI pricing challenges, software companies should sell baked cakes, not itemize the ingredients and baking time.

Enterprise buyers confirm this. When researchers asked healthcare decision-makers — sophisticated B2B buyers managing multi-million dollar technology budgets — about pricing diagnostic AI, 76% rejected models based on technical usage metrics like tokens or API calls, calling them “economically and operationally misaligned” with how they plan and budget (Ebert et al., 2025). They preferred hybrid models with predictable base fees and variable components tied to business outcomes, not technical consumption. If buyers can’t even conceptualize the unit of measurement, a WTP survey for that unit is meaningless.

Credits make it worse. Some vendors added yet another abstraction — converting dollars to tokens to credits to “AI units.” Each layer moves the pricing further from anything a buyer can reason about. A WTP survey asking “what would you pay for 500 AI credits per month?” is measuring a fiction filtered through four layers of abstraction. The buyer doesn’t know what a credit represents, how many credits their workflow consumes, or how that consumption will change as the vendor’s AI evolves.

The buyer can’t state WTP because they don’t know their usage. The vendor can’t cost-plus price because they don’t know their per-customer costs. Both numbers are changing quarterly. And the tools most consultancies use to measure willingness-to-pay — surveys that produce a single number at a single point in time — are pricing against a reality that doesn’t hold still long enough to be measured.

Continuous monetization isn’t just better here — it’s the only approach that can adapt as both the costs and the value change in real time.

Most B2B buyers don’t decide on price

Even if WTP surveys produced accurate numbers, there’s a more fundamental problem: for most B2B buyers, price isn’t the primary decision factor.

A peer-reviewed study of 112 IT professionals evaluating B2B SaaS pricing provides direct evidence. When buyers evaluated a sales force automation tool at three price points ($75, $167, and $250/user/month), their behavior wasn’t a simple function of price. At the lowest price ($75), buyers generated more negative thoughts than positive ones and only 27% requested sales contact. At the acceptable price ($167), the response was strongly positive and 64% requested contact. At the high price ($250), only 22% requested contact. The differences across price points were statistically significant — not marginal or debatable.

The lowest price performed almost as badly as the highest. These are B2B software buyers — IT professionals with budgets — and they weren’t optimizing for price. They were evaluating whether the price signaled appropriate value. A price that was “too good” triggered as much skepticism as one that was too high.

A WTP survey treats every respondent as a price optimizer. It assumes the number they give you is the number that determines whether they buy. The B2B SaaS evidence shows the opposite: even among professional software buyers, the relationship between price and purchase intent is non-linear, and the “right” price is the one that falls within a cognitive acceptance range — not the lowest one the buyer will tolerate.

The endowment effect: buyers undervalue what they don’t yet have

There’s a third bias working against WTP accuracy — one that pulls in the opposite direction from hypothetical bias.

John Gourville’s research on new product adoption (Harvard Business Review, 2006) documented the endowment effect in purchasing decisions. Across multiple experiments, people demanded approximately 3x more compensation to give up a product they already possessed than they were willing to pay to acquire the same product.

In the most cited experiment, sellers valued coffee mugs at $7.12 while buyers valued the identical mug at $3.12.

For B2B software, this means:

  • Buyers systematically undervalue new software relative to its actual worth to their business
  • They simultaneously overvalue whatever they’re currently using — even if it’s clearly inferior
  • WTP surveys capture this deflated number and present it as the ceiling

The net effect: hypothetical bias inflates the WTP number upward (Kunter), while the endowment effect deflates the true valuation downward (Gourville). A WTP survey gives you an inflated estimate of an already-deflated valuation. The error doesn’t cancel — it compounds the confusion.

Surveys don’t account for salespeople’s willingness to discount

Here’s where the academic evidence meets the reality of B2B software sales.

A software company can conduct survey after survey on their prospects’ willingness to pay. But as I’ve written in Forbes, these surveys don’t take into account salespeople’s willingness to discount. It’s a false assumption that the customer primarily drives willingness to pay. Salespeople have significant sway over how much money customers are willing to put forward — and software executives often underestimate that sway.

The academic research confirms this at scale. A study of 2,938 enterprise software deals (averaging $850K each) found that salespeople gave away 4.3 percentage points in excess discounts — translating to 6.6% of total vendor revenue — primarily to manipulate deal timing for their commission benefit. Seventy-four percent of deals closed on the last day of the quarter, and deals closing late in the quarter averaged 35-37% discounts versus 30% mid-quarter.

This isn’t a few rogue reps. It’s structural. Non-linear quarterly commission plans create massive compensation differences for identical deals depending on timing. The same $250K deal might earn a salesperson $5K in an empty quarter versus $62.5K in a high-revenue quarter. The rational response is exactly what the data shows: offer deeper discounts to pull deals into the quarter where they help your comp plan the most.

The B2B SaaS cognitive response study reinforces this: when IT professionals saw the lowest price ($75/user/month — less than half the acceptable range), they generated more counterarguments than support arguments and only 27% requested sales contact versus 64% at the mid-range price. Lower prices didn’t increase purchase intent — they reduced it. If discounting triggers the same cognitive rejection as overpricing, and the enterprise data shows salespeople are structurally incentivized to discount, then the WTP survey’s recommended price is being undermined by the very sales process it’s supposed to inform.

We see this play out constantly. A prospect might be willing to pay $30,000 a year for workflow automation software — until they learn that the salesperson they’re working with tends to give out discounts of up to 20%. Suddenly, the prospect won’t want to pay $30,000. And that discount doesn’t disappear come renewal time. It becomes the new baseline from which the renewal negotiation proceeds, and that customer will likely argue for another discount on top of what they’ve already been given.

The compounding failure looks like this:

  1. A WTP survey produces an inflated number (hypothetical bias)
  2. The consultancy recommends a price based on that inflated number
  3. The sales team discounts from the already-questionable ceiling, driven by comp structures
  4. The discounted price becomes the new anchor for renewals
  5. The customer’s perception of fair value has now been permanently depressed

Each step introduces error in the same direction — downward from true value. The resulting price has only an accidental relationship to the software’s actual worth.

B2B software pricing in practice

The B2B pricing research is damning enough. But the practitioner evidence is worse.

When Philip Ideson of the Art of Procurement podcast and I discussed this, he mentioned seeing price differences of as much as 10x for the same software product sold to different buyers. I’ve personally seen even more. Our own transaction datasets across B2B software companies reveal the full spectrum — the same product or bundle of products discounted anywhere from 100% (given away free) to surcharges of several multiples above list price. When the same software is sold to similar buyers at prices that vary by orders of magnitude, no WTP survey can fix what’s broken — because the survey assumes a world of rational price-setting that doesn’t exist.

Consider the scenario from my Forbes article on What Software Companies Get Wrong About Pricing: Tanya and Tessa are trying to purchase the same software solution for their companies. Their companies are the same size, in different industries, with nearly identical use cases. After they each undergo the sales process with a different salesperson and sign on the dotted line, they meet for coffee — and Tanya learns that Tessa paid half the price that she did.

When customers feel cheated like this, they warn other prospective buyers. The company gains a reputation as one that requires wheeling and dealing just to get a fair price. And a WTP survey conducted in this environment is measuring the aftermath of dysfunction, not some fundamental truth about customer valuations.

I’ve lived this myself. When I co-founded a software company, our initial willingness-to-pay research pointed to a $2,500 price point. We launched there. Over time, as we understood how customers actually used the product and what outcomes it drove, the price moved to $100,000 — and eventually to $500,000. The WTP study didn’t just underestimate demand. It undercut our value by orders of magnitude, because the buyers we surveyed couldn’t articulate the value of something they hadn’t yet experienced at scale. No survey redesign would have fixed that — the insight only came from observing real usage and real purchasing behavior over time.

The academic research focuses on overestimation because that’s what controlled experiments on low-cost products reveal. But recent research by Ebert et al. (2025) found the bias actually reverses for premium products — while hypothetical WTP overestimates by 31% for low-cost items, it underestimates by up to 28% for high-cost items. B2B enterprise software sits firmly in the premium category. The error cuts both ways, and for software where value compounds with usage and integration, underestimation may be the costlier mistake.

This pattern repeats across the B2B software companies we work with. Executives consistently tell us the same thing: their customers genuinely don’t know what they’d pay. The tools most consultancies use to answer that question are measuring a fiction that gets further distorted by every sales interaction.

Why a snapshot can never replace a system

Even if you could fix every bias in WTP measurement — eliminate hypothetical inflation, correct for the endowment effect, account for buyer type, discipline every salesperson — you’d still have a fundamental problem: a survey produces a snapshot. It tells you what buyers said they’d pay at one moment in time, for one configuration of your product, in one competitive context.

Software doesn’t work that way. Your product changes every sprint. Your market shifts every quarter.

There is a myth about pricing that software companies often let themselves believe: that once a customer purchases their software, the ongoing monthly price entitles the customer to all the new features and value going forward. The problem is that improvements and additions the company makes to its software can drastically alter the cost-value calculus. Over time, the amount the customer pays drifts out of sync with the value the software delivers.

A WTP survey taken in January is stale by March. The consultancy that delivered it has moved on to their next engagement. And you’re making pricing decisions based on data that was biased when it was collected and is now outdated on top of it.

Hal Varian — Google’s chief economist and one of the architects of modern information goods pricing theory — makes this point directly. His research documents that Google runs approximately 10,000 pricing and product experiments per year, with about 1,000 running concurrently. The reason: observational data alone “cannot establish causality for pricing decisions, leading to incorrect demand curve estimation.” External factors create false correlations between price and demand that no survey can untangle.

Without the data to support it, pricing decisions can only come from narrative, assumptions, and anecdotes — a much more precarious and risky foundation.

The alternative: continuous monetization

The research — and what we’ve observed across hundreds of billions of dollars in B2B software transactions since 1982 — supports a fundamentally different approach. One that replaces the “measure WTP, set price, move on” model with a system of ongoing demand measurement, structured experimentation, and iterative price improvement.

We call this continuous monetization. Top-performing software companies set off on a constant hunt to get paid fairly for their intellectual property. They know that the never-ending development of a software product means that, over time, some elements will over-deliver value while others may be underused. So they persistently monitor the value drivers and adapt their pricing.

Some firms now pair an initial conjoint study with ongoing price optimization. But if the optimization framework will override the study’s recommendations within a quarter — and it usually does, as the product and market evolve — the question is whether the initial study justified its cost and the months it delayed action.

Here’s how it works and why the evidence supports each component.

Measure demand response, not stated preference

The core shift: instead of asking customers what they’d pay, observe what they actually do when pricing changes.

We can’t rely on what customers say to estimate software value. We need to understand what they do — including purchases, usage patterns, upgrade behavior, and churn signals.

We recommend performing a series of controlled incremental price changes to understand and push the boundaries of willingness to pay for customer groups with similar usage and derived value characteristics. This is an empirical, reliable, risk-mitigating method of conducting demand elasticity analysis, firmly rooted in how customers behave — not just what they say.
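
As an illustration of the kind of read a controlled increment produces, here’s a minimal sketch using the arc (midpoint) elasticity formula on two observed price-conversion pairs. The numbers are hypothetical, loosely echoing the SaaS study’s acceptable range.

```python
def arc_elasticity(p0, q0, p1, q1):
    """Midpoint price elasticity of demand between two observed (price, quantity) points."""
    dq = (q1 - q0) / ((q1 + q0) / 2)
    dp = (p1 - p0) / ((p1 + p0) / 2)
    return dq / dp

# One controlled increment on a segment with similar usage/value characteristics.
p0, conv0 = 167, 0.64   # current price, observed conversion (illustrative)
p1, conv1 = 184, 0.60   # ~10% higher test price, observed conversion

e = arc_elasticity(p0, conv0, p1, conv1)
rev0, rev1 = p0 * conv0, p1 * conv1
print(f"elasticity {e:.2f}; revenue per prospect ${rev0:.2f} -> ${rev1:.2f}")
# |e| < 1 here (inelastic): the increase raises revenue per prospect,
# so the next increment can be tested on the same segment.
```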

This method also helps you harmonize pricing with the rate of new value creation from your product roadmap. In B2B software, especially if subscription-based, customer value perception contains a futures element. Customers expect a stream of increasing value. So another way to think about this: you’re taking pricing validation steps in a journey that always keeps you on the safe side of the razor’s edge of being paid fairly for your software’s value.

Research on price experimentation algorithms confirms the approach can achieve 96.9–99.1% of optimal revenue across different demand distributions. The key insight: structured experimentation that incorporates economic theory dramatically outperforms both naive price testing and static survey-based pricing. Recent theoretical work reinforces this — Choi et al. (2025) proved mathematically that subscription usage data alone can identify willingness to pay without requiring price experiments, because variation in how customers use a product at a fixed price reveals their underlying valuations. The data you need is already flowing through your systems.

Software companies that know precisely how their customers behave — what they buy, the features they use and don’t use, the amount they pay, how quickly and often they upgrade, downgrade, or churn — are able to analyze pricing opportunities and risks. And adapt accordingly. The closer to real-time they have that information, the more quickly they can adapt and the sooner they can benefit from the value they are providing.

We often see how this “truth on the ground” data surprises operators — challenging or outright invalidating assumptions that helped build the original pricing model.

Start below value, iterate upward

Most pricing consultancies deliver a “right price” and their engagement ends. Continuous monetization starts differently: set an initial price based on competitive positioning and value hypothesis, then systematically increase it while measuring demand response.

The research on price compression documents a pattern where B2B companies systematically underprice their most valuable offerings and overprice their least differentiated ones — compressing the price range toward the middle. It is quite common for software executives to underestimate the variety and amounts of discounts that flow through their company’s book of business. They believe they know what is happening, until a scatter chart plotting every deal by discounts and other attributes tells them a very different story.

The fix isn’t a single repricing event. It’s a process of expanding the range outward: raising prices on high-value products where demand proves inelastic, and restructuring low-value products where prices meet resistance.

The B2B SaaS cognitive response data supports this directly. The acceptable price range for the sales force automation tool was $125-$200/user/month — a $75 band where buyers responded positively. Below $125, buyers resisted. Above $200, they resisted. But within that range, the company had significant pricing power: the difference between $125 and $200 is a 60% spread, and buyers’ cognitive responses were positive across the entire band. A company that starts at $125 and iterates upward toward $200 — measuring demand response at each step — captures that 60% without ever asking a survey question. And if the product improves, the upper bound of the acceptable range moves with it.
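
One way to sketch the “iterate upward” loop: raise in small steps while demand holds, and retreat one step when conversion degrades past a threshold. The 5% step and 10% guardrail below are assumptions for illustration, not a recommended policy; real programs segment customers and measure over 60-90 day windows.

```python
def step_price(price, conversion_history, step=0.05, max_drop=0.10):
    """Raise price one increment unless the last step cost too much conversion.

    conversion_history: conversion rates observed at each price step so far.
    step: fractional price increase per cycle (assumed 5%).
    max_drop: relative conversion loss that triggers a rollback (assumed 10%).
    """
    if len(conversion_history) >= 2:
        prev, curr = conversion_history[-2], conversion_history[-1]
        if curr < prev * (1 - max_drop):
            return round(price / (1 + step), 2)  # retreat one step and hold
    return round(price * (1 + step), 2)          # demand held up: keep climbing

# Walking the $125 -> $200 acceptable band from the SaaS study, hypothetically:
price, history = 125.0, []
for observed_conversion in [0.63, 0.64, 0.62, 0.61, 0.52]:  # illustrative reads
    history.append(observed_conversion)
    price = step_price(price, history)
    print(f"next price: ${price:.2f}")
```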

Investors reward this approach. Research on SaaS transitions (HBR) shows that stock prices increase an average of 2.2% when SaaS is offered alongside perpetual licensing — but companies that force-convert existing products to SaaS-only see a 3.5% value decrease. The market wants to see pricing that evolves with the product, not pricing that gets set once and locked in.

Let versioning reveal WTP through behavior

Varian’s versioning research provides the theoretical foundation: pricing structures that let buyers self-select based on their own assessment of value reveal more about willingness-to-pay than any survey. The pricing structure itself becomes the WTP measurement instrument.

Good-better-best tiering is one version of this, but it’s far from the only one. The research validates multiple packaging architectures depending on how customers derive value:

  • Tiered plans (good-better-best) — work when customer segments align cleanly with feature tiers
  • Platform with optional modules — works when customers have diverse needs and value different capabilities. Research on multi-component pricing models shows these provide “superior outcomes” for B2B services by aligning price with actual usage patterns
  • Customized bundling (“pick N modules from the catalog”) — research shows 13-21% profit improvement over forcing everyone into the same bundle, specifically when customer preferences vary across the product portfolio
  • Usage-based or hybrid models — work when value scales with consumption and customers prefer paying for what they use

The specific architecture matters less than the principle: let buyers reveal their willingness-to-pay through their purchasing choices — which configuration they pick, which modules they adopt, when they upgrade — rather than asking them to guess it in a survey.
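
A toy simulation of the principle: give buyers with heterogeneous valuations a set of tiers, let each pick the tier that maximizes their surplus, and the resulting tier mix becomes the revealed-preference WTP signal. The tier prices and valuation model are invented for illustration.

```python
import random

random.seed(3)
tiers = {"basic": 75, "pro": 167, "enterprise": 250}  # $/user/month (illustrative)

def choose(valuations):
    """Pick the tier with the highest surplus (value minus price); None if all negative."""
    best, best_surplus = None, 0.0
    for tier, price in tiers.items():
        surplus = valuations[tier] - price
        if surplus > best_surplus:
            best, best_surplus = tier, surplus
    return best

# Simulated buyer population: each buyer values higher tiers more.
mix = {None: 0, "basic": 0, "pro": 0, "enterprise": 0}
for _ in range(1_000):
    base = random.uniform(50, 300)  # buyer's underlying value for the product
    valuations = {"basic": base * 0.6, "pro": base * 1.0, "enterprise": base * 1.3}
    mix[choose(valuations)] += 1

print(mix)  # the observed tier mix (including the no-buys) is the WTP signal
```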

This is fundamentally more accurate than stated-preference research for three reasons:

  1. Real consequences. When a buyer chooses a configuration, they’re making a decision with their company’s money. There’s no hypothetical bias because the transaction is real.
  2. Continuous signal. Every renewal, every expansion, every add-on, every upgrade decision is a new data point. You’re not relying on a single snapshot — you’re building a longitudinal dataset of revealed preferences.
  3. Segment-level precision. Packaging structures naturally separate buyer types. Price-sensitive buyers self-select to basic configurations. Value-driven buyers choose comprehensive packages. The structure does the segmentation that a survey tries to do artificially.

But the right packaging architecture — and the right boundaries within it — must be discovered through market feedback, not set once by a consultancy.

Structure the price around how customers decide

The B2B SaaS cognitive response study demonstrates this directly. IT professionals didn’t just evaluate the amount — they evaluated what the price signaled. The same product at $75/user generated suspicion (“what’s wrong with it?”) while $167/user generated confidence (“this is a serious tool”). The price structure shaped the buyer’s cognitive frame, and that frame determined whether they engaged further.

For B2B software, how you package and present pricing tiers has more impact on perceived value than the dollar amounts themselves. What we call licensing metrics — per seat vs. per usage, monthly vs. annual, bundled vs. modular — shape buyer behavior more than any specific number on a survey. Research on information goods bundling confirms this: bundling software modules reduces variance in customer valuations, making pricing more predictable and revenue more capturable than selling components individually (Bakos & Brynjolfsson, 1999).
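
The mechanism behind the Bakos-Brynjolfsson result is the law of large numbers: averaging many module valuations shrinks the spread across buyers. A short simulation with invented uniform valuations shows the effect.

```python
import random
import statistics

random.seed(11)

def valuation():
    """A buyer's value for one module (illustrative uniform draw)."""
    return random.uniform(0, 100)

buyers = 10_000
single = [valuation() for _ in range(buyers)]
# Per-module value of a 10-module bundle: averaging washes out the extremes.
bundle = [statistics.mean(valuation() for _ in range(10)) for _ in range(buyers)]

print(f"single module: sd {statistics.stdev(single):5.1f}")
print(f"10-item bundle (per module): sd {statistics.stdev(bundle):5.1f}")
# The per-module spread shrinks by roughly sqrt(10), so one bundle price can
# sit close to most buyers' valuations.
```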

The implication: a pricing page that clearly communicates what’s included at each tier, how pricing scales, and why may outperform one that’s been “optimized” by a WTP study — because the structure itself communicates value that no survey question can capture.

Build the pricing feedback loop into operations

Continuous monetization isn’t a one-time project — it’s an operational capability.

Always-on demand sensing. The right infrastructure captures conversion rates, upgrade/downgrade flows, churn by tier, and expansion revenue by cohort continuously — not when someone remembers to pull a report. AI-augmented pattern recognition can surface pricing signals that human analysis would miss: a subtle shift in tier mix that precedes churn, a packaging configuration that consistently accelerates deal velocity, or a discount pattern that correlates with lower lifetime value. Pricing data is constantly changing. Good data is not a snapshot of past history. It lives within your customers and your organization, and as it changes, so too must your pricing model.

Structured experiments. Test pricing changes on specific segments before rolling out broadly. For B2B SaaS, this might mean testing a 10% price increase on new customers in one segment while holding existing customers constant — then measuring the conversion impact over 60-90 days. When the system is always collecting and analyzing deal data, the experiment isn’t a special event — it’s the normal operating rhythm.
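
For the measurement step, a two-proportion z-test is one standard way to read the result of such a holdout experiment. The cell sizes and conversion rates below are hypothetical.

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-score and two-sided p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical 90-day read: control at current price vs. +10% test cell.
z, p = two_proportion_z(conv_a=128, n_a=200,   # control: 64% converted
                        conv_b=112, n_b=200)   # test:    56% converted
print(f"z = {z:.2f}, p = {p:.3f}")
# Pair the conversion hit with the revenue gain per conversion before deciding
# whether the increase nets out positive for the segment.
```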

Value-aligned iteration. Every major feature release is a pricing event. If you ship a capability that materially changes the value proposition for a segment, the price should reflect it. Not through a new WTP survey — through a structured increase with demand measurement. An always-on system detects when the value-price gap widens and flags the opportunity before it’s left on the table for quarters.

Longitudinal win/loss intelligence. For enterprise sales, the richest WTP data comes from the deals themselves — not as one-time post-mortems, but as a continuously enriched dataset. Not “what price did the customer say they wanted” but “at what price did they actually sign, what did they push back on, and what did they not push back on.” When AI is analyzing every quoting iteration, negotiation response, and contract term across your entire book of business over months and years, the pattern recognition compounds. What starts as 60 deals becomes a longitudinal intelligence layer that gets sharper with every transaction.

Why this works at enterprise scale

The most common objection to continuous monetization: “We close 60 enterprise deals a year. You can’t do statistically significant pricing experiments with 60 data points.”

This misunderstands what the data actually is.

Depth beats breadth. The issue with survey-based pricing was never sample size — it was data quality. Every response in a 2,000-person conjoint study is hypothetical. Every data point in a 60-deal transaction dataset is real. Depth of observation isn’t a concession from quantitative rigor — it’s an upgrade. Qualitative research methodology has established for decades that smaller samples with deep observational data produce more accurate behavioral findings than large-scale surveys with shallow, hypothetical responses. Anthropologists draw valid conclusions from 15-30 deep observations. The same principle applies to enterprise pricing: 60 deals where real money changed hands, real procurement teams pushed back, and real contracts were negotiated contain more signal about willingness-to-pay than 2,000 conjoint responses where nobody signed a check.

Over four decades of working exclusively with B2B software companies, we’ve found that controlled packaging and pricing iterations converge on accurate solutions rapidly — typically within two to three adjustment cycles. The accuracy comes not from the volume of data but from the quality: each iteration is tested against real buyer behavior with real budget consequences, and the feedback loop is measured in weeks, not the months a survey-based engagement requires.

It’s not 60 data points — it’s thousands. Each enterprise deal isn’t one observation. It’s a sequence of pricing decisions. Through our LevelSetter platform, we capture every quoting iteration, packaging configuration, and negotiation response via API — continuously, across the entire book of business. AI analysis runs against this growing dataset, identifying patterns that no quarterly review could surface. A single deal may generate 15-20 observed pricing interactions before close — the initial proposal, the counter, the packaging adjustment, the procurement pushback, the revised quote, the final terms.

Sixty active deals produce over a thousand real pricing responses — each made with budget authority, procurement oversight, and actual purchase intent. No survey produces data of this quality at any sample size, because no survey respondent faces the consequence of actually spending the money.

We see where deals break — not just whether they close. Continuous monetization doesn’t just measure outcomes. It maps the full negotiation arc. We see which packaging configuration opened the conversation, which pricing adjustment unlocked budget approval, and which counter-proposal killed the deal. This isn’t win/loss at the deal level — it’s win/loss at each decision point within the deal. Pattern-matching across these sequences reveals pricing dynamics that no point-in-time survey can detect.

Did the deal stall at the first quote? Packaging problem. Did it survive three rounds but fail at procurement? Price structure problem. Did a packaging change in round two unlock the budget? That’s a signal about what the buyer actually values — and it’s a signal you can only see in real transaction data.

The data compounds — surveys depreciate. A conjoint study is a depreciating asset: accurate (at best) on the day it’s delivered, stale within a quarter, irrelevant within a year. An always-on system built on transaction data is an appreciating asset. Every deal — win or loss — enriches the dataset. AI-driven pattern recognition gets more precise with every negotiation, every renewal, every expansion. The longer the system runs, the sharper its insights become — the opposite of a survey that begins aging the moment it’s completed.

This is the fundamental difference. Our approach is grounded in how deals are actually forged — the proposals, counteroffers, packaging iterations, and contract terms that produce real revenue. Survey-based methods are grounded in hypothetical worlds where respondents face no consequences for their answers.

The argument in summary

The traditional pricing consultancy model runs a WTP survey, delivers a “right price,” and exits. The evidence says this approach has five structural problems:

  1. The survey is biased. Hypothetical methods overestimate WTP by up to 2x (Kunter, 2016). The buyer faces no consequences for overstating.
  2. The buyer doesn’t know the answer. Software has no cost anchor. Buyers can’t tell you in a survey what they’d be willing to pay for innovations they haven’t yet experienced.
  3. The question is wrong for most buyers. Peer-reviewed B2B SaaS research shows the lowest price generates as much buyer resistance as the highest. Buyers aren’t optimizing for price — they’re evaluating whether the price signals appropriate value.
  4. The result is static. A snapshot of stated preferences in Q1 has no mechanism to adapt to product changes, competitive shifts, or segment evolution in Q2-Q4.
  5. The implementation erodes it further. Sales teams discount from the already-questionable ceiling — costing up to 6.6% of revenue through comp-structure-driven gaming — and those discounts compound into permanent baseline reductions at renewal.

Continuous monetization addresses each of these by replacing hypothetical measurement with behavioral observation, one-time surveys with ongoing demand sensing, static recommendations with iterative improvement, and consultant-delivered pricing with an operational capability that compounds over time.

The research doesn’t say “never run a survey.” PSM and conjoint have a place as exploratory inputs — directional data to inform an initial hypothesis. But they are a starting point, not an answer. And a pricing consultancy that treats them as the answer is selling certainty built on a foundation the research itself has demonstrated is unreliable.

Product management can use continuous monetization to better understand what customers value and will pay for — as opposed to what they tell a survey they’d be willing to pay, which is proven to be error-prone and essentially meaningless.

Everything described above — the biases in stated-preference research, the need for always-on demand sensing, the value of capturing every negotiation iteration, the compounding intelligence that gets sharper with every deal — pointed to a gap in the market. There was no platform built specifically for B2B software companies to operationalize continuous monetization.

So we built one. LevelSetter is our AI-augmented pricing platform for B2B software. It connects via API to your quoting and deal management systems, captures the full arc of every pricing interaction, and applies pattern recognition across your entire book of business — longitudinally, not as a one-time study.

It doesn’t replace pricing judgment. It replaces the fiction that a survey can tell you what your customers will pay. Instead of hypothetical willingness-to-pay data, you get observed willingness-to-pay behavior — from real deals, with real money, analyzed continuously.

If the arguments in this article resonate, LevelSetter is where they become operational.


This article draws on peer-reviewed research including Kunter (2016), Ebert et al. (2025) on hypothetical bias reversal in premium products, Gourville (2006), Varian (2014), Shapiro & Varian (1998), Choi et al. (2025) on subscription demand identification without price variation, Ebert et al. (2025) on diagnostic AI pricing model preferences, B2B SaaS cognitive response studies, enterprise software contracting research, price experimentation algorithms, and Bakos & Brynjolfsson (1999) on information goods bundling — supported by over four decades of SPP’s direct experience with B2B software pricing. For related reading, see 3 Reasons B2C Value-Based Pricing Won’t Work for B2B Software, GenAI Pricing Challenges, and Why Continuous Monetization Is So Vital.
