God of Startups

From first idea to product-market fit — your founder workspace, run by 100+ AI agents.

MVP · By Alex Dalevich · 8 min read · Updated June 9, 2026

MVP Scoping: The Framework for Cutting Features Without Cutting Value

An MVP isn't the smallest version of your product. It's the smallest experiment that can change your mind. Here's how to scope one.

Most "MVPs" are just bad products. A real MVP is the smallest experiment that meaningfully changes what you believe — about your customer, your problem, or your solution.

Why scope creep kills MVPs

Scope creep doesn't usually arrive as one bad decision. It arrives as a hundred reasonable ones. "Users will obviously want settings." "We can't ship without onboarding." "Let's just add login while we're in there." Each feature feels harmless on its own. Together they push your launch out by months and bury the one question you actually needed answered.

The cost isn't only time. Every feature you build is one you have to maintain, explain, and interpret. When a bloated MVP gets a lukewarm response, you can't tell which part fell flat — the core value, the pricing, or the fifth feature nobody asked for. You shipped late and still learned nothing.

A tightly scoped MVP does the opposite. It isolates a single belief and puts it in front of real people fast enough that the answer still matters. Speed of learning — not feature count — is what compounds.

The framework: cut features without cutting value

The trick is to separate value from volume. You can remove most of what you planned to build and keep all the value — if you cut along the right line. Three lenses make that line visible.

1. Sort every feature into three buckets

Before you write a line of code, list everything you imagine the product doing and force each item into exactly one bucket.

Bucket | Definition | MVP decision
Table-stakes | The minimum that makes the product usable at all (it loads, it saves, it doesn't lose data) | Build only the thinnest version
Must-have | The thing that delivers the core value: the reason someone would switch | Build this well
Nice-to-have | Improves the experience but isn't why anyone shows up | Cut. Add back later if the assumption survives

Most founders get this backwards: they polish the nice-to-haves and ship a fragile must-have. The discipline is to be ruthless about buckets one and three so you can be generous with bucket two.

2. Apply the "one riskiest assumption" lens

Every startup rests on at least one assumption that, if wrong, kills the business. Score each assumption on two things: how badly it hurts if it's false, and how unsure you are that it's true. Multiply the two and sort. The top of that list is your riskiest assumption, and your MVP exists to test it, not to showcase your roadmap.
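If it helps to make the ranking concrete, here is a minimal Python sketch of the "impact times uncertainty" sort. The 1-5 scales, the Assumption class, and the example entries (for a hypothetical meal-planning app) are illustrative assumptions, not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumption:
    text: str
    impact: int       # 1-5: how badly it hurts if false (hypothetical scale)
    uncertainty: int  # 1-5: how unsure you are (hypothetical scale)

    @property
    def risk(self) -> int:
        # Risk score = impact multiplied by uncertainty.
        return self.impact * self.uncertainty


def rank_by_risk(assumptions: list[Assumption]) -> list[Assumption]:
    """Return the assumptions sorted riskiest first."""
    return sorted(assumptions, key=lambda a: a.risk, reverse=True)


# Illustrative entries and scores, not data from a real project.
assumptions = [
    Assumption("We can generate a plan from fridge contents", impact=4, uncertainty=1),
    Assumption("Busy people will change how they shop and cook", impact=5, uncertainty=4),
    Assumption("Users want a social feed", impact=1, uncertainty=3),
]

ranked = rank_by_risk(assumptions)
riskiest = ranked[0]
print(f"Test first: {riskiest.text} (risk {riskiest.risk})")
```

The scores are guesses, and that's fine. The point of the exercise is to force a single item to the top of the list, not to produce a precise number.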

Any feature that doesn't help confirm or kill that assumption is a tax on learning speed. This lens often demotes things you assumed were must-haves. If you're testing whether people will pay, you may not need accounts, dashboards, or a polished UI at all. If you haven't pinned down which assumption is riskiest, that's a market validation problem to solve before you scope anything.

3. Ship the smallest testable version

Now ask the uncomfortable question: what is the cheapest thing I could build that would still produce a real answer? Often it isn't software at all:

  • Landing page — to test whether the promise alone earns sign-ups or pre-orders.
  • Concierge MVP — you deliver the service by hand, behind the scenes, before automating anything.
  • Wizard-of-Oz — the front end looks automated; a human does the work invisibly.

"Smallest testable" means small enough to ship in days or weeks, but real enough that the result counts. A demo your friends praise is not a test. A stranger choosing to pay, sign up, or come back is.

Where this breaks down

The riskiest-assumption-plus-fake-backend playbook is the right default, but it has hard edges. If you're in one of these cases, forcing a landing page or concierge test answers the wrong question:

  • Deep-tech, where the riskiest assumption IS "can we build it." If the real risk is technical feasibility — the model has to hit an accuracy bar, the chemistry has to work, the latency has to be physically possible — you can't Wizard-of-Oz your way past it. A human pretending to be the algorithm proves people want the output, but you already suspected that. The MVP here is a technical spike or feasibility prototype, not a demand test. Don't let "talk to users" become procrastination on the question that actually kills you.
  • Two-sided marketplaces, where the risk is liquidity. A concierge test of one side proves nothing about whether both sides show up at the same time and place. You can hand-match ten buyers to sellers and learn that each side likes the idea — and still have no signal on liquidity, the thing that actually makes or breaks a marketplace. Test the constrained side and the matching loop together in one geography or vertical, not each side in isolation.
  • Products whose value only emerges with data or network density. A social feed, a recommendation engine, a benchmarking tool — these are bad on day one by design, and get good only as data and users accumulate. A thin MVP shown to a stranger will correctly look worthless and falsely fail the test. Here you have to either simulate density (seed with credible synthetic or hand-curated data) or test a single dense pocket, never a sparse general launch.

A worked example: scoping down a meal-prep app

Say you want to build an app that plans weekly meals around what's already in someone's fridge and auto-generates a grocery list. The full vision has dozens of features: barcode scanning, nutrition tracking, recipe ratings, dietary filters, calendar sync, a social feed.

Riskiest assumption: Busy people will actually change how they shop and cook if a tool plans it for them. Not "can we build this" — the tech is straightforward. The risk is behavior change.

Run the three lenses:

  • Table-stakes: a way to see a plan and a list. Thin: a shared doc or a simple web page.
  • Must-have: a personalized weekly plan + grocery list that genuinely fits this person's fridge and tastes.
  • Nice-to-have: barcode scanning, nutrition tracking, ratings, social feed — all cut.
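The same sort can be written down as a tiny Python sketch. The feature names come from the meal-prep example above; the Bucket enum, the DECISION mapping, and the mvp_scope helper are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Bucket(Enum):
    TABLE_STAKES = "table-stakes"
    MUST_HAVE = "must-have"
    NICE_TO_HAVE = "nice-to-have"


# MVP decision per bucket, following the three-bucket table.
DECISION = {
    Bucket.TABLE_STAKES: "build the thinnest version",
    Bucket.MUST_HAVE: "build this well",
    Bucket.NICE_TO_HAVE: "cut",
}

# Every feature goes into exactly one bucket.
features = {
    "see the plan and the list": Bucket.TABLE_STAKES,
    "personalized weekly plan + grocery list": Bucket.MUST_HAVE,
    "barcode scanning": Bucket.NICE_TO_HAVE,
    "nutrition tracking": Bucket.NICE_TO_HAVE,
    "recipe ratings": Bucket.NICE_TO_HAVE,
    "social feed": Bucket.NICE_TO_HAVE,
}


def mvp_scope(features: dict[str, Bucket]) -> tuple[dict[str, str], list[str]]:
    """Split features into what the MVP keeps (with its decision) and what it cuts."""
    kept = {
        name: DECISION[bucket]
        for name, bucket in features.items()
        if bucket is not Bucket.NICE_TO_HAVE
    }
    cut = [name for name, bucket in features.items() if bucket is Bucket.NICE_TO_HAVE]
    return kept, cut


kept, cut = mvp_scope(features)
for name, decision in kept.items():
    print(f"KEEP  {name}: {decision}")
for name in cut:
    print(f"CUT   {name}")
```

Using a dictionary means a feature cannot sit in two buckets at once, which mirrors the "exactly one bucket" rule.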

Smallest testable version: a concierge MVP. Ten users fill out a short form about their fridge and preferences. You build each weekly plan and list by hand and send it over. No app. Within two weeks you learn whether people open it, follow it, and ask for next week's plan — the only signal that tests behavior change.

If they ghost the plans, no amount of barcode scanning would have saved you. You found that out in two weeks instead of two quarters. (Doing this without burning cash is its own skill — see launching with no money.)

A second worked example: a B2B workflow tool

Consumer examples make concierge tests look easy. B2B is where founders most often over-build, because "we need real software before a serious company will touch it" feels true. Usually it isn't — for the test.

Say you want to build software that replaces how RevOps teams reconcile their CRM against billing data — today a painful monthly spreadsheet ritual.

Riskiest assumption: Will a team actually change an entrenched process they've run for years? Not "can we build the integration" — the tech is plumbing. The risk is organizational inertia. Teams tolerate broken processes for a long time precisely because changing them is expensive in attention and politics.

Run the three lenses:

  • Table-stakes: ingest two data sources and produce a reconciled output people trust.
  • Must-have: the reconciliation is correct and saves real hours versus the current spreadsheet.
  • Nice-to-have: integrations, dashboards, alerting, role permissions — all cut for the test.

Smallest testable version: recruit 3 design-partner teams and run their reconciliation by hand in a spreadsheet every month, before writing a line of integration code. You're the backend. The signal you're hunting: do they send you next month's data unprompted, route real decisions through your output, and start asking for it sooner? That's a team changing its process. If instead they're polite but keep running their old spreadsheet in parallel "just to be safe," the entrenched-process assumption is failing — and you learned it for the cost of three months of manual work, not an engineering team.

The pattern transfers: in B2B the concierge test is usually you doing the customer's painful workflow for them by hand until they depend on it.

How to set a kill criterion

Before you ship, write down what result would make you stop. Founders who don't pre-commit to a kill criterion never kill anything — they rationalize every weak signal as "early days."

A good kill criterion is decided in advance, numeric, and tied to the riskiest assumption. Templates:

  • "If fewer than 3 of 10 concierge users ask for a second week, the behavior-change assumption is dead."
  • "If the landing page converts under X% after Y qualified visitors, the promise doesn't land."
  • "If no one will pre-pay, willingness-to-pay is unproven — stop building."

Write it where your cofounder can see it, and set the date you'll judge it. Pre-committing protects you from the most expensive bias in startups: falling in love with the product instead of the problem. If the kill criterion is met, you don't grieve; you pivot or kill, having spent days, not months. Tracking these decisions in a living product vision document keeps the why visible after the heat of launch fades.
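One way to make the pre-commitment hard to wriggle out of is to write the criterion as data before the test runs. The sketch below encodes the first template ("fewer than 3 of 10 concierge users"); the KillCriterion class, its field names, and the judging date are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class KillCriterion:
    assumption: str
    metric: str
    minimum: int    # fewer than this many means the assumption is dead
    judge_on: date  # the date you committed to judging the result

    def verdict(self, observed: int, today: date) -> str:
        # No verdict before the agreed date: early numbers invite rationalizing.
        if today < self.judge_on:
            return "wait"
        return "pivot or kill" if observed < self.minimum else "continue"


# Written down before the test ships. Frozen, so it can't be quietly edited later.
criterion = KillCriterion(
    assumption="Busy people will change how they shop and cook",
    metric="concierge users (out of 10) who ask for a second week",
    minimum=3,
    judge_on=date(2026, 7, 1),  # hypothetical date
)

print(criterion.verdict(observed=2, today=date(2026, 7, 1)))
```

With two of ten users asking for a second week on the judging date, this prints "pivot or kill". Three or more would print "continue".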

Calibrate the threshold to the traffic and the ask

The number "X%" is doing a lot of quiet work, and a single threshold across different tests will mislead you. Two things move it:

Traffic quality changes what a conversion means. A landing page fed by cold traffic (broad ads, a generic post) typically collects emails in the low single-digit percent, and that number tells you almost nothing — you're measuring ad creative as much as demand. The far stronger test is qualified traffic: visitors who arrived intent-matched, from the exact problem context, who already nodded along. When intent-matched visitors won't even give you an email, that's a real negative — the promise itself isn't landing. Don't celebrate 4% from cold traffic; don't excuse 2% from qualified traffic.

The ask changes the rate by an order of magnitude. People give an email far more readily than they pre-pay. A pre-order or deposit converts far below an email opt-in for the same demand — so a kill threshold tuned for email signups will murder a perfectly good idea if you apply it to a payment ask. Set the bar to the friction of the ask: a low pre-pay rate can still be a strong positive, because money is a much heavier vote than an email.

Be honest about sample size. Ten concierge users is plenty to read a binary behavior signal — did they come back, did they route real work through you, yes or no. It is far too small to read a conversion rate: "3 of 10 converted" is not "30%," it's noise with a wide error bar. Use small samples for clear binary verdicts; never quote a percentage off them or set rate-based kill criteria you can't actually measure at that n.
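To see how wide that error bar is, you can put a confidence interval around "3 of 10". The article does not name a method; the sketch below uses the Wilson score interval, one common choice for small samples, at roughly 95% confidence (z = 1.96). The function name is an assumption.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(3, 10)
print(f"3 of 10: plausible true rate between {low:.0%} and {high:.0%}")
```

Under these assumptions the interval runs from about 11% to about 60%. A "30% conversion rate" read off ten users could just as easily be a weak idea or a strong one, which is why small samples are only good for binary verdicts.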

Rule out a false negative before you honor the kill

Kill criteria cut both ways. They protect you from sunk-cost denial, but a badly run test can produce a false negative that murders a good idea with bad execution. Before you act on a met kill criterion, confirm you tested the assumption cleanly:

  • Right audience. Did the test reach the actual target segment, or whoever was cheapest to put in front of it? A landing page shown to the wrong people fails for reasons that have nothing to do with your assumption.
  • Credible promise. Was the offer believable and specific, or vague and over-hedged? People don't convert on promises they don't trust, even when they want the underlying thing.
  • Value actually delivered. In a concierge test, did you genuinely solve their problem, or ship something half-baked that would have failed regardless of demand? If the manual service was bad, "they didn't come back" tests your execution, not the market.

If any of these is shaky, the result is contaminated — re-run the test clean before you kill. A kill criterion is only as trustworthy as the cleanliness of the experiment underneath it.
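The decision logic above is small enough to write down. This is a sketch, assuming each of the three checks can be answered honestly with a yes or no; the function and parameter names are hypothetical.

```python
def kill_decision(
    criterion_met: bool,
    right_audience: bool,
    credible_promise: bool,
    value_delivered: bool,
) -> str:
    """Honor a met kill criterion only if the experiment itself was clean."""
    if not criterion_met:
        return "continue"

    checks = {
        "right audience": right_audience,
        "credible promise": credible_promise,
        "value actually delivered": value_delivered,
    }
    shaky = [name for name, passed in checks.items() if not passed]
    if shaky:
        # Contaminated result: fix the test before trusting the negative.
        return "re-run test (shaky: " + ", ".join(shaky) + ")"
    return "pivot or kill"


print(kill_decision(True, right_audience=False, credible_promise=True, value_delivered=True))
```

Note the asymmetry: a shaky check sends you back to re-run the test, never straight to "continue". A contaminated negative is not a positive.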

FAQ

How small is too small?

Too small is when the result can't change your decision. If you'd build the next version regardless of how the test goes, you didn't run a test — you shipped a feature. The floor isn't a feature count; it's "does this produce a believable yes/no on the riskiest assumption?"

Isn't a landing page or concierge MVP "cheating" — it's not a real product?

That's the point. The MVP's job is to buy information, not to be the product. A landing page that proves nobody wants the thing just saved you a year. You build the real product once the assumption survives.

What if I have several risky assumptions?

Test them in sequence, riskiest first. Bundling them into one big build is how you end up unable to tell which one failed. Kill the scariest assumption cheaply, then move to the next.

When does this stop and real product-building begin?

When the riskiest assumption has survived a real test, you graduate from "will anyone want this?" to "how do we make it great?" That transition is also where product-market fit signals start to matter more than MVP signals.

The four steps, in order

  1. Name the riskiest assumption. The one belief that, if wrong, kills the business. Write it down.
  2. Design the smallest test. The cheapest build that would prove or disprove it — often not software.
  3. Cut everything else. Every feature that doesn't test the assumption is a tax on learning speed. Add it back later if the assumption survives.
  4. Set a kill criterion. Decide in advance what result makes you stop. Then honor it.

Scope isn't about doing less. It's about pointing everything you build at the one question whose answer you actually need.
