
Who Actually Profits in Scraping?

The bootstrapped and PE-owned proxy-data incumbents fund growth from their own profits; the venture-backed AI-scraper cohort runs on raised capital, and Reworkd's 2025 shutdown shows where that ends.

By Signal Census Editorial Scraping Profitability

The most durable businesses in web scraping are the ones that never depended on a venture round. Bright Data, owned since 2017 by London private-equity firm EMK Capital, has crossed roughly $300mn in annual recurring revenue and describes itself as “highly profitable.” The venture-funded AI-scraping cohort that was supposed to disrupt it is still running on raised capital, and one of its most-hyped names has already closed.

That contrast is the real story of the industry’s economics. Proxy and structured-data incumbents fund their own growth; the AI-native challengers fund theirs with someone else’s money. When the market separates the two, profitability — not product novelty — is what holds.

The incumbents grow on their own cash

Bright Data, formerly Luminati Networks, was bought by EMK Capital out of Hola in 2017 in a deal valued at about $200mn. Eight years on, the company says it has passed $300mn in ARR, growing more than 50% year over year, and expects to reach $400mn by mid-2026. Those are revenue figures the company has chosen to disclose; it has not published margins, and as a private business it is under no obligation to. The “highly profitable” label is its own.

Structurally, the claim is plausible in a way it would not be for a pure software startup. A residential-proxy and dataset business of this scale runs on infrastructure and bandwidth it has already built and amortised. Incremental revenue from existing capacity carries low marginal cost. That is the textbook shape of an operation that can both grow fast and generate cash — the same shape that let EMK hold the asset for eight years rather than flip it, and that now positions Bright Data as web-data infrastructure for AI at a valuation reported in excess of $1bn.

Lithuania’s Oxylabs, built inside the Tesonet venture studio and never independently venture-funded, sits in the same category. It operates one of the largest commercial residential-proxy pools (more than 102mn IPs by its own count) and has grown without an outside funding round to point to. Its revenue and margins are undisclosed, so any profit claim rests on the same structural reasoning rather than reported numbers: a bootstrapped proxy network at scale that has run since 2015 without outside investors is, in all likelihood, covering its costs.

Bootstrapped by necessity, not by virtue

These companies are disciplined because they had to be. Apify, the Prague-based scraping platform, makes the logic explicit: it raised only about $0.5mn of seed capital before 2019, then began bootstrapping, and has said that relying on profit to grow “taught us valuable lessons of frugality and efficiency.” Its chief executive, Jan Čurn, reported record 2023 revenue of $7.5mn and about $1mn of profit, growing 80% year over year, on a team of roughly 75 — small numbers beside Bright Data, but real ones, and self-funded until it took a modest €2.8mn round in 2024.
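For a rough sense of scale, the Apify figures above can be turned into a few back-of-envelope ratios. This is a minimal sketch, not company-reported data: it assumes the 80% growth figure refers to revenue, and it treats “about $1mn” and “roughly 75” as exact, so the outputs are approximations at best.

```python
# Figures as quoted in the article for Apify's 2023 results.
revenue_usd = 7_500_000   # record 2023 revenue
profit_usd = 1_000_000    # "about $1mn" of profit (approximate)
yoy_growth = 0.80         # 80% year over year (assumed here to mean revenue growth)
headcount = 75            # "roughly 75" people (approximate)

# Derived ratios: illustrative arithmetic only, not disclosed by the company.
profit_margin = profit_usd / revenue_usd
revenue_per_head = revenue_usd / headcount
implied_prior_revenue = revenue_usd / (1 + yoy_growth)

print(f"Profit margin: {profit_margin:.1%}")
print(f"Revenue per head: ${revenue_per_head:,.0f}")
print(f"Implied 2022 revenue: ${implied_prior_revenue / 1e6:.2f}mn")
```

On those assumptions the margin works out to a little over 13%, with revenue of about $100,000 per person: modest, but consistent with a business that pays for its own growth.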

The pattern across all three is the same. None spent years subsidising usage to buy growth, because none had a balance sheet that allowed it. The constraint produced the moat: a cost base that revenue actually covers, and pricing set to clear a margin rather than to win a land grab.

The venture cohort runs on raised capital

The AI-scraping wave arrived with a different premise — that large language models would make the old proxy-and-parser stack obsolete, and that the company which wrapped scraping in an agent would take the market. The capital followed the thesis.

Firecrawl, the most prominent of these, has raised $16.2mn in total, the bulk of it an August 2025 Series A led by Nexus Venture Partners, with Shopify chief executive Tobias Lütke and Y Combinator participating. The company says it serves more than 350,000 developers, and its chief executive, Caleb Peffer, has said Firecrawl is “already profitable” — a notable claim for a recently funded startup, though it has disclosed neither revenue nor margins to substantiate it. On the disclosed facts, Firecrawl is a venture-backed company that has raised more than it is known to earn.

Reworkd is the cautionary case

The clearest evidence that the venture route is harder than it looked is Reworkd, which built AI agents to generate scraping code on the fly and raised about $4mn (a $1.25mn pre-seed and a $2.75mn seed) from an investor list that included Paul Graham; Nat Friedman and Daniel Gross’s AI Grant; SV Angel; and General Catalyst. It announced it would sunset the product on 6 February 2025. The company did not detail its reasons publicly, but the outcome is unambiguous: a well-backed, well-connected AI-scraping startup ran out of room before it found a durable business.

One shutdown does not condemn a category. But it sharpens the question every buyer and investor in this market should now ask: which of these companies could survive a year with no new funding? The incumbents have answered it for eight years running. The challengers are still raising the money that would let them find out.

What durable economics look like here

Web data is closer to an infrastructure business than a software one. The defensible position belongs to whoever owns the proxy network, the bandwidth and the refresh pipeline at scale — assets that take years and cash to build and that throw off margin once built. That is why a PE-owned incumbent and a pair of bootstrappers are the ones reporting growth funded from profit, while the agent-layer startups, however elegant, are still buying their way to relevance.

The test ahead is not who ships the cleverest extraction agent. It is who still has a business when the funding that subsidises the clever ones runs dry.