
The economics of waterfall enrichment

How a 12-provider waterfall pays for itself in the first thousand rows — and the per-field math that makes it inevitable.

Vikesh Tiwari, Founder, TexAu. Building the GTM Spreadsheet Engine. Previously built TexAu V1 and V2, survived two cease-and-desists, and still shipping. From Bombay Slums.

The thesis in three sentences

Single-source email enrichment used to be a defensible business because data acquisition was the moat. That moat has been commoditized by twelve different vendors selling overlapping (but not identical) datasets, which means the buyer who pools them outperforms the buyer who picks one — and the vendor who pools them out-competes the vendors who don't. The waterfall isn't a feature; it's the structurally cheaper architecture, and that's what kills the single-source model.

Section 1 — The 2018 economic model

Stipulate, for a moment, that you're running an email-finder business in 2018. Your unit economics look like this:

You bought (or scraped, or partnered for) a dataset. That cost some real number — call it $X per thousand records, all-in.
You sell access to your dataset for $Y per thousand records, where Y > X by a margin big enough to cover infrastructure, GTM, and a profit.
Your buyer pays you $Y because the alternative — building their own dataset — is much more expensive than the difference between $X and $Y.

This works. It worked for Hunter, ZoomInfo, Clearbit, Apollo, Lusha, and every other vendor that owned data in this period. The buyer's logical move was to pick the vendor with the best dataset for their segment and pay the markup.

The only competitive move available to vendors was: invest in dataset depth. More records, fresher records, better verification, broader geographies. Whichever vendor had the deepest data for a given segment would win that segment.

This was the era of "pick a winner per segment": Hunter for SaaS work emails, Apollo for outbound depth, Lusha for European B2B, ZoomInfo for enterprise. Buyers picked, vendors competed on data depth, money flowed.

Section 2 — What changed

Three things happened between 2020 and 2024, in roughly chronological order:

Data acquisition costs fell. Public-record scraping, partnership deals between data brokers, and the maturation of B2B-data exchanges meant the cost-per-record dropped meaningfully. The moat narrowed.
Dataset overlap increased. As more vendors built datasets, their datasets started looking more like each other. The non-overlapping records — the long tail where one vendor had data and another didn't — became the differentiator, but that long tail is narrow per vendor and broad in aggregate. Said differently: each vendor's unique-to-them records became a smaller percentage of their dataset, but the union of all vendors' unique records remained large.
The waterfall pattern became practical. Custom-built waterfall workflows — call vendor A, fall back to vendor B, fall back to vendor C — used to require an engineering team. By 2022 they could be built in Clay or Make in an afternoon. By 2024 they could be built by typing one sentence into a Co-Pilot.

The combined effect: the buyer who runs a 5-vendor waterfall gets a better fill rate than the buyer who picks any single vendor — and the cost per successfully enriched record is lower if the waterfall is metered correctly (more on this below).
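The cascade itself is simple enough to sketch in a few lines. What follows is a minimal illustration, not any vendor's actual implementation: the `Provider` type, `make_provider`, `run_waterfall`, and the toy lookup tables are all hypothetical stand-ins for real vendor API calls.

```python
from typing import Callable, Optional

# Hypothetical provider type: takes a lead, returns an email or None.
Provider = Callable[[dict], Optional[str]]


def make_provider(known: dict[str, str]) -> Provider:
    """Build a toy provider backed by a domain -> email table (illustrative only)."""
    return lambda lead: known.get(lead["domain"])


def run_waterfall(lead: dict, providers: list[tuple[str, Provider]]) -> dict:
    """Try each provider in cascade order and stop at the first hit."""
    attempts = 0
    for name, lookup in providers:
        attempts += 1
        email = lookup(lead)
        if email:
            return {"email": email, "source": name, "attempts": attempts}
    # Every provider missed: no email, no source.
    return {"email": None, "source": None, "attempts": attempts}


providers = [
    ("vendor_a", make_provider({"acme.com": "jo@acme.com"})),
    ("vendor_b", make_provider({"globex.com": "sam@globex.com"})),
]

print(run_waterfall({"domain": "globex.com"}, providers))
print(run_waterfall({"domain": "unknown.io"}, providers))
```

The first lead falls through vendor_a and is matched by vendor_b; the second misses everywhere. Whether that miss costs anything depends entirely on how the waterfall is billed.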

Section 3 — The math nobody pencils out

Pretend you're enriching a list of 1,000 leads. You have a budget of $100 to spend, and you want to maximize the number of leads where you successfully find a verified email.

Option A: Single-source vendor at $0.10/lookup, 60% match rate.

You spend $100, get 1,000 lookups, and find emails for 600 leads. Cost per enriched lead: $0.167. Coverage: 60%.

Option B: 5-vendor waterfall at $0.05 per successful match (because the waterfall stops at the first hit), with combined 90% match rate.

You pay $0.05 for each of the 900 successful matches, so at most $45 in total. The 100 misses cost zero. Cost per enriched lead: $0.05. Coverage: 90%.
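If you'd rather check the arithmetic than trust it, here is a small sketch that recomputes both options. The function names are made up for this example; the prices and match rates are the ones used in Options A and B.

```python
def per_attempt_cost(leads: int, price_per_lookup: float, match_rate: float):
    """Vendor bills every lookup, hit or miss."""
    matches = round(leads * match_rate)
    spend = leads * price_per_lookup
    return spend, matches, spend / matches


def per_match_cost(leads: int, price_per_match: float, match_rate: float):
    """Vendor bills only successful matches; misses are free."""
    matches = round(leads * match_rate)
    spend = matches * price_per_match
    return spend, matches, spend / matches


option_a = per_attempt_cost(1000, 0.10, 0.60)  # single source, billed per lookup
option_b = per_match_cost(1000, 0.05, 0.90)    # waterfall, billed per match

for label, (spend, matches, unit) in [("A", option_a), ("B", option_b)]:
    print(f"Option {label}: spend ${spend:.2f}, {matches} enriched, "
          f"${unit:.3f} per enriched lead")
```

Swap in your own vendor's price and observed match rate to see where your current setup lands.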

The waterfall is cheaper per enriched lead AND delivers higher coverage than single-source. This is true whenever:

The waterfall's combined match rate exceeds any single vendor's by enough margin to cover the per-lookup price difference, AND
The waterfall is billed on match, not on attempt.

Both conditions hold today across every reasonable B2B segment. The first condition is empirical (combined fill rates above 85% are achievable; single-source coverage tops out below 70% in most segments). The second condition is a pricing-model choice, and the vendors that haven't moved to pay-on-match are about to be priced out by the ones that have.
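One way to pencil out the cost side in general terms. The symbols are introduced here for illustration and are not from any vendor's pricing page: N is the list size, p the single-source price per lookup, m the single-source match rate, q the waterfall price per match, and M the waterfall's combined match rate.

```latex
\[
c_{\text{attempt}} = \frac{p \cdot N}{m \cdot N} = \frac{p}{m},
\qquad
c_{\text{match}} = \frac{q \cdot M \cdot N}{M \cdot N} = q
\]
\[
\text{waterfall is cheaper per enriched lead} \iff q < \frac{p}{m},
\qquad
\text{waterfall has higher coverage} \iff M > m
\]
```

With the numbers from Options A and B, p/m = 0.10/0.60 ≈ 0.167 and q = 0.05, so the cost inequality holds with room to spare. Under these simplified assumptions, the combined match rate M drives coverage, while the unit-cost comparison depends on q against p/m.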

This is the part of the thesis that sounds like marketing but isn't. The math is independent of which vendor you favor. Put real numbers in, and the waterfall pattern wins.

Section 4 — Why incumbents resist this

If you're a single-source vendor with a database, the waterfall pattern is existential. Your dataset becomes one input to a buyer's optimization function, not the buyer's primary tool. Your pricing power evaporates because the buyer doesn't need your data depth — they need your data plus everyone else's data, as a fallback set. The vendor who aggregates becomes the platform; the vendor whose data was being aggregated becomes a feature in someone else's checkout flow.

This is why most established single-source vendors haven't built a public waterfall product:

Cannibalization. Selling a waterfall would cannibalize their highest-margin direct-data revenue.
Identity. Their brand is "the deepest data in segment X." Selling a waterfall is admitting the data alone isn't enough.
Cost of acquisition. Adding 11 other vendors as fallbacks means 12 partner contracts, 12 API integrations, 12 SLA negotiations, and a permanent partner-management function.

What incumbents do ship is a "we improved our coverage" press release every six months. That's the response of a vendor who's defending the single-source category from the inside; it's not the response of a vendor who's adapted.

The shift is happening anyway. Buyers are migrating to waterfall-native vendors faster than the incumbents are shipping. If you've talked to Clay, BitScale, or TexAu customers in the last 18 months, you've heard "we used to be on Apollo / Lusha / ZoomInfo, then we moved to a waterfall and our fill rate jumped from 60% to 90%." That sentence is the entire thesis told in one customer call.

Section 5 — Pay-on-match is the load-bearing pricing innovation

The waterfall's economics only work cleanly when the vendor charges per successful match, not per attempt. Otherwise the buyer pays for the misses across all 12 sources and the waterfall becomes more expensive than single-source, not less.

This is why "pay-on-match" credit pricing is the load-bearing innovation. Vendors that ship waterfalls without pay-on-match are shipping a worse-priced single-source experience. Vendors that ship pay-on-match without waterfall are shipping a worse-coverage single-source experience. The two innovations pay off only in combination.
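To see why per-attempt billing breaks the waterfall, count the attempts. The sketch below assumes a hypothetical 5-stage cascade on 1,000 leads where each stage charges $0.10 per attempt; the per-stage hit counts are invented for illustration and sum to a 90% combined match rate.

```python
def total_attempts(leads: int, stage_hits: list[int]) -> int:
    """Count billable attempts when every unmatched lead is retried at each stage."""
    remaining = leads
    attempts = 0
    for hits in stage_hits:
        attempts += remaining  # every lead still unmatched is looked up here
        remaining -= hits      # leads matched at this stage leave the cascade
    return attempts


LEADS = 1000
STAGE_HITS = [600, 150, 80, 40, 30]  # hypothetical new matches per stage
PRICE_PER_ATTEMPT = 0.10             # assumed flat per-attempt price per source
PRICE_PER_MATCH = 0.05               # assumed pay-on-match price

matches = sum(STAGE_HITS)
attempt_spend = total_attempts(LEADS, STAGE_HITS) * PRICE_PER_ATTEMPT
match_spend = matches * PRICE_PER_MATCH

print(f"per-attempt waterfall: ${attempt_spend / matches:.3f} per enriched lead")
print(f"pay-on-match waterfall: ${match_spend / matches:.3f} per enriched lead")
```

In this toy setup the cascade makes 1,950 billable attempts to find 900 emails, so the per-attempt waterfall costs more per enriched lead than a single source at the same per-lookup price and a 60% match rate. The coverage gain is real, but the buyer funds every miss along the way.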

If you're shopping today, the question to ask vendors is: "On a list where 30% of the records have no findable email, what do I pay?"

A vendor charging on-attempt: 30% of your spend goes to misses.
A vendor charging on-match with a waterfall: 0% of your spend goes to misses, AND your match rate is higher than single-source.

The price difference is enormous. We've seen agency teams cut their per-enriched-lead cost by 60–70% just by moving to a pay-on-match waterfall, before any other optimization. That number is not a marketing claim; it's an arithmetic consequence of the two pricing changes composing.

Section 6 — The market structure shift

If the waterfall pattern is structurally cheaper for the buyer, the market structure has to shift. Here's the shape:

Tier 1: Waterfall platforms

The vendors that aggregate 10–15 sources, charge on match, and ship the orchestration layer (cascade order, dedup, source attribution, retry logic) as a product. This is where buyer dollars consolidate. Examples today: TexAu, Clay (with caveats around their workflow-builder UX), BitScale.

Tier 2: Source-of-record data vendors

The vendors that own a dataset and sell into the Tier 1 platforms via API partnerships. They become infrastructure, not consumer brands. They're profitable but they're not the customer-facing layer. Examples: Hunter, RocketReach, Snov, Datagma.

Tier 3: Fading single-source consumer brands

The vendors that built a consumer brand around their dataset but didn't make the Tier 1 transition. They lose pricing power as buyers move up to Tier 1 platforms. Either they pivot (build their own waterfall) or they get absorbed (sell into Tier 1 as data-only). Examples: not naming names because the diagnosis is uncomfortable, but you can predict the list.

Tier 4: Specialists

Vendors with genuinely unique data — long-tail geographies, regulated industries, specific verticals — survive as specialists. They sell into Tier 1 platforms as premium fallback sources. Their TAM is smaller but their position is durable.

The shape above is consistent with how SaaS markets typically restructure when an aggregation layer wins: the aggregator captures buyer relationships and pricing power; the data sources become wholesale providers; the consumer brands that didn't aggregate fade.

Section 7 — What this means if you're shopping today

Three operating principles:

1. Don't pay per attempt.

If a vendor charges per attempt, model the worst case (50% match rate) into your pricing analysis. In most segments, this immediately makes the vendor look 2x more expensive than the headline price suggests.
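A quick way to run that worst-case model: divide the headline price by the match rate you expect. The function name and the $0.10 headline price below are assumptions for the sake of the example.

```python
def effective_cost(price_per_attempt: float, match_rate: float) -> float:
    """Cost per enriched lead when every lookup is billed, hit or miss."""
    if not 0 < match_rate <= 1:
        raise ValueError("match_rate must be in (0, 1]")
    return price_per_attempt / match_rate


HEADLINE_PRICE = 0.10  # hypothetical per-attempt list price

for rate in (0.9, 0.7, 0.5):
    cost = effective_cost(HEADLINE_PRICE, rate)
    print(f"match rate {rate:.0%}: ${cost:.3f} per enriched lead "
          f"({cost / HEADLINE_PRICE:.2f}x headline)")
```

At a 50% match rate the multiplier is exactly 2x, which is where the "2x more expensive" figure comes from.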

2. Pick the platform with the most aggregated sources.

Match rate rises monotonically with source count, with diminishing returns. The first 5 sources cover ~80% of the gain; the next 5 cover most of the rest. Below 8 sources you're leaving meaningful coverage on the table.
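The shape of that curve falls out of set overlap. Here is a toy sketch, with made-up lead IDs and a hypothetical `cumulative_coverage` helper, that tracks coverage as each source is added to the cascade. It illustrates the mechanism only; the real curve depends on your sources and your list.

```python
def cumulative_coverage(sources: list[set[int]], total_leads: int) -> list[float]:
    """Coverage after adding each source in order (union of matched lead IDs)."""
    covered: set[int] = set()
    coverage = []
    for matched_ids in sources:
        covered |= matched_ids  # overlapping records add nothing new
        coverage.append(len(covered) / total_leads)
    return coverage


# Hypothetical: 10 leads (IDs 0-9), four sources with heavy overlap.
sources = [
    {0, 1, 2, 3, 4, 5},  # source 1 matches 6 leads
    {4, 5, 6, 7},        # source 2 adds 2 new leads
    {0, 1, 7, 8},        # source 3 adds 1 new lead
    {2, 3, 8},           # source 4 adds nothing new
]

coverage = cumulative_coverage(sources, total_leads=10)
print(coverage)
```

Coverage can never go down when you add a source, because a union only grows. How fast the gains flatten depends on how much each new source overlaps with what you already have, which is why the order of the cascade matters.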

3. Source attribution matters.

Knowing which of the 12 sources delivered each match isn't a vanity feature. It tells you which sources are pulling weight in your specific buyer profile, lets you switch primary sources if your geographic mix changes, and surfaces vendor lock-in risk before it becomes a problem.
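If your platform returns a source name with each match, the analysis is a few lines. This sketch assumes hypothetical result records with a `source` key (set to `None` on a miss); the field name and vendor labels are illustrative, not a real API schema.

```python
from collections import Counter


def attribution_share(results: list[dict]) -> dict[str, float]:
    """Share of successful matches delivered by each source, largest first."""
    hits = Counter(r["source"] for r in results if r.get("source"))
    total = sum(hits.values())
    if total == 0:
        return {}
    return {source: count / total for source, count in hits.most_common()}


# Hypothetical enrichment results for four leads.
results = [
    {"email": "jo@acme.com", "source": "vendor_a"},
    {"email": "sam@globex.com", "source": "vendor_b"},
    {"email": "lee@initech.com", "source": "vendor_a"},
    {"email": None, "source": None},  # miss: excluded from the shares
]

print(attribution_share(results))
```

Run this per segment or per geography and it becomes obvious which sources earn their place at the top of the cascade.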

The vendors that get all three right become the buyer's enrichment platform of record for the next several years. The vendors that get two of three right become bargain alternatives but lose enterprise. The vendors that get only one right become single-feature tools that show up in the long tail.

Section 8 — The next two years

If the thesis is right, here's the prediction:

By end of 2026, single-source enrichment as a category has cratered. The dollars that flowed there in 2022 now flow to waterfall platforms. Two or three vendors emerge as the dominant Tier 1 category leaders, with a long tail of specialists.

By end of 2027, the AI-agent layer (see the MCP piece) compounds the waterfall layer's importance. The buyer's question shifts from "which enrichment vendor?" to "which waterfall platform exposes the cleanest agent surface?" Tier 1 vendors that didn't ship MCP get demoted; vendors that did become the new defaults.

By end of 2028, "single-source email finder" is a category description used mainly to mock vendors that didn't adapt, the same way "on-premise email server" is used today.

This is roughly how every category-restructuring story plays out: the better economic model wins, the better-architected vendors capture the value, and the buyers whose procurement teams paid attention come out ahead.

Section 9 — Why this is a TexAu blog post

Because we built TexAu around exactly this thesis. 12-source waterfall, pay-on-match, source-attribution returned on every match, free prospecting, transparent published pricing. We're betting that the architecture wins, and that the buyers who internalize this argument will pick the platform that internalized it first.

If the thesis is right, the next two years are going to be unkind to single-source vendors and very kind to waterfall platforms with clean economics. We plan to be the platform that benefits the most. The way to validate the claim isn't to read another marketing page; it's to run a 1,000-row enrichment job through your current vendor and through us, and compare the cost-per-enriched-lead.

Failed lookups cost zero. The math is the entire argument.