AI agents · 6 min read

MCP Ate the Scraping API in Sixteen Months Flat

Anthropic's Model Context Protocol went from launch to 5,000+ community servers in sixteen months. Every serious scraping vendor now exposes one. The fight is not whether to ship MCP — it is what tool surface to expose.

By Signal Census Editorial April 28, 2026


Apify · MCP integration

Apify exposes its 20,000+ Actor catalog as discoverable tools through the Model Context Protocol — pay-per-use, no platform account required.

When Anthropic released the Model Context Protocol in November 2024, the framing was modest: an open standard for connecting LLM clients to tools and data sources, pitched as a kind of USB-C for AI. Sixteen months later there are more than 5,000 community-maintained MCP servers, every major LLM client speaks the protocol natively, and every serious commercial scraping vendor (Apify, Bright Data, Firecrawl) ships one as a first-class product surface.

The speed of adoption is what the press has emphasized. The architectural shape of what got adopted is what matters more, and it changes how scraping infrastructure gets built and sold from here.

The shape of what shipped

Three patterns have emerged in MCP-server design across the scraping vendors, and they are not interchangeable.

Apify ships an MCP server that exposes its entire 20,000+ Actor catalog as discoverable tools. A client connecting to the Apify MCP can ask “what actors are available for scraping LinkedIn jobs” and get a structured tool list, then invoke any of them by name, passing arguments that match that tool's typed input schema. Crucially, Apify supports x402 (USDC settlement on Base) and Skyfire payments, which means an LLM client can call a paid actor without ever creating an Apify account. The MCP server is, in effect, a per-call marketplace.
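The discover-then-invoke flow described above can be sketched at the message level. MCP frames its traffic as JSON-RPC 2.0, with a `tools/list` request for discovery and a `tools/call` request for invocation. Everything else in this sketch is an assumption for illustration: the tool names, descriptions, and schemas in `SAMPLE_TOOLS` are hypothetical, not real Apify Actors, and `find_tools` is a naive keyword match standing in for whatever selection an LLM client actually performs.

```python
import json
from itertools import count

_request_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope, the framing MCP messages use."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools_request():
    """Discovery: ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool_request(name, arguments):
    """Invocation: call one tool by name with schema-shaped arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


def find_tools(tools, keyword):
    """Naive stand-in for agent-side selection: match on name or description."""
    needle = keyword.lower()
    return [
        tool["name"]
        for tool in tools
        if needle in tool["name"].lower()
        or needle in tool.get("description", "").lower()
    ]


# Hypothetical tools/list result entries; names and schemas are illustrative only.
SAMPLE_TOOLS = [
    {
        "name": "example-linkedin-jobs-scraper",
        "description": "Scrape LinkedIn job postings by keyword.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "example-amazon-product-scraper",
        "description": "Scrape Amazon product pages.",
        "inputSchema": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
]

if __name__ == "__main__":
    matches = find_tools(SAMPLE_TOOLS, "linkedin")
    request = call_tool_request(matches[0], {"query": "data engineer"})
    print(json.dumps(request, indent=2))
```

The sketch omits transport, session initialization, and payment entirely; it only shows that discovery and invocation are two small, uniform messages regardless of which vendor sits behind the server.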

Bright Data ships an MCP server with a fixed tool surface focused on its core unblocking and dataset products: search, scrape_as_html, scrape_as_markdown, plus dataset-specific tools for LinkedIn, Amazon, and a handful of other targets. AIMultiple’s benchmark ranked Bright Data first, at 76.8% task success and a 48.7-second average completion time. The product surface is small and the success rate is high.

Firecrawl ships eight tools (scrape, batch, crawl, search, extract, map, plus two async variants), priced through its existing per-page metering. The MCP wrapper sits on top of the existing API and inherits the same credit economics.

The three approaches reflect three different bets on what the MCP buyer wants. Apify is betting that an LLM client wants the longest tail of capabilities possible, and will tolerate per-call discovery friction to get it. Bright Data is betting on the smallest possible tool list with the highest possible success rate. Firecrawl is betting that buyers want the cleanest possible “page in, markdown out” surface and nothing else.

Why this matters more than another integration story

If MCP were just a new way to call existing scraping APIs, the story would be a one-week press cycle and an SDK update. It is bigger because of two structural things the protocol does.

It standardizes tool discovery. Before MCP, a developer integrating a scraping vendor wrote vendor-specific SDK code, vendor-specific authentication, and vendor-specific schema mapping. Switching from ScrapingBee to ZenRows meant a rewrite. Under MCP, switching vendors can be as small as changing a connection string, because the client rediscovers the tool list and input schemas at runtime. That is a significant erosion of integration lock-in across the entire scraping API category.

It re-prices the per-call unit. When an LLM client calls a tool through MCP, the cost of that call can be made visible to the LLM client. The agent can, and increasingly does, choose between competing tools based on observed cost and observed success rate. This is something traditional REST integrations could not do, because the calling code was hard-coded to one vendor at integration time. Under MCP, the choice happens at runtime, by the agent.
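One way to picture runtime selection on observed cost and success rate is a small scoreboard the agent keeps per tool. This is a minimal sketch under stated assumptions, not how any particular client works: `ToolStats`, `pick_tool`, and the "expected cost per successful call" metric are all hypothetical, and the tool names and prices in the usage example are invented.

```python
from dataclasses import dataclass


@dataclass
class ToolStats:
    """Running record of what the agent has observed for one tool."""
    name: str
    calls: int = 0
    successes: int = 0
    total_cost_usd: float = 0.0

    def record(self, success: bool, cost_usd: float) -> None:
        self.calls += 1
        self.successes += int(success)
        self.total_cost_usd += cost_usd

    @property
    def success_rate(self) -> float:
        # Laplace smoothing so a tool with no history gets 0.5, not a division error.
        return (self.successes + 1) / (self.calls + 2)

    @property
    def cost_per_success(self) -> float:
        # Average observed cost divided by smoothed success rate.
        # An untried tool has no observed cost, so it scores 0.0 and is tried first.
        average_cost = self.total_cost_usd / self.calls if self.calls else 0.0
        return average_cost / self.success_rate


def pick_tool(stats):
    """Return the name of the tool with the lowest expected cost per success."""
    return min(stats, key=lambda s: s.cost_per_success).name


if __name__ == "__main__":
    reliable = ToolStats("vendor_a_scrape")   # hypothetical tool
    cheap = ToolStats("vendor_b_scrape")      # hypothetical tool
    for i in range(10):
        reliable.record(success=i < 9, cost_usd=0.010)  # 9/10 succeed
        cheap.record(success=i < 5, cost_usd=0.008)     # 5/10 succeed
    print(pick_tool([reliable, cheap]))
```

In this invented run the pricier tool wins because its failures are rarer. That is the dynamic the paragraph describes: list price alone stops being the deciding number once the caller can measure outcomes.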

The combined effect is that scraping infrastructure is being commoditized at the integration layer. Differentiation has to come from somewhere else: success rate, price, tool surface breadth, or distribution channel inside MCP-compatible clients.

What the AIMultiple benchmark shows

The AIMultiple benchmark ranking Bright Data first at 76.8% success is worth reading carefully. The number reflects a specific test set — agent-style tasks against e-commerce and reference targets — and the gap between Bright Data and the next vendors is real but not enormous. The point is not who won; it is that the test exists at all.

A year ago, comparing scraping vendors meant pricing pages and case studies. Now it means measured success rates on a standardized agent benchmark. That is an information shift in the buyer’s favor. Vendors that score well on the benchmark will be cited in agent-builder documentation; vendors that do not will be invisible. The “no public benchmark” era of scraping vendor selection is over.

What MCP changes for actor publishers

For publishers in the Apify Store, MCP changes the distribution math.

The historical buying journey for an Apify Actor was: a developer browses the Store, finds a relevant Actor, reads its README, signs up for an Apify account, configures the Actor, and runs it. Every step has friction. Every step has a drop-off curve.

The MCP path collapses that to one step: an LLM agent connected to the Apify MCP server discovers the Actor at runtime and invokes it. No account, no README, no configuration UI. Just a typed tool call with structured input and structured output.

That is good news for any Actor whose value is directly legible from its input schema and a one-paragraph description. It is bad news for any Actor whose value lived in a long README, a marketing landing page, or word-of-mouth in the Apify Discord. The Actors that get found are the Actors whose surface area is machine-readable.

The implication for naming, schema design, and description copy is concrete. If your Actor’s input schema does not parse cleanly to a function signature, an LLM agent will not call it. If your Actor’s description is more than two paragraphs, the relevant signal is buried below the part the agent reads. The Q1 2026 censuses on this site already showed how compressed the buyer-facing surface is: the lead-extractor segment is dominated by actors that fit their entire pitch into the title. MCP intensifies that pressure by another order of magnitude.
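What "parses cleanly to a function signature" means can be made concrete. MCP tool definitions carry a JSON Schema in their `inputSchema` field; the sketch below flattens a simple one into a readable signature. The helper `schema_to_signature`, the tool name `scrape_jobs`, and the example schema are hypothetical, and the mapping only handles flat schemas with primitive types, which is itself the point: nested or loosely typed inputs do not reduce this neatly.

```python
# Map JSON Schema primitive types onto Python type names for display.
TYPE_NAMES = {
    "string": "str",
    "integer": "int",
    "number": "float",
    "boolean": "bool",
    "array": "list",
    "object": "dict",
}


def schema_to_signature(tool_name, input_schema):
    """Render a flat JSON Schema object as a function-style signature string."""
    properties = input_schema.get("properties", {})
    required = set(input_schema.get("required", []))

    required_params, optional_params = [], []
    for name, spec in properties.items():
        type_name = TYPE_NAMES.get(spec.get("type"), "Any")
        if name in required:
            required_params.append(f"{name}: {type_name}")
        else:
            # Optional parameters show their declared default (None if absent).
            optional_params.append(f"{name}: {type_name} = {spec.get('default')!r}")

    return f"{tool_name}({', '.join(required_params + optional_params)})"


# Hypothetical input schema for an illustrative job-scraping tool.
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "searchQuery": {"type": "string"},
        "maxResults": {"type": "integer", "default": 50},
    },
    "required": ["searchQuery"],
}

if __name__ == "__main__":
    print(schema_to_signature("scrape_jobs", EXAMPLE_SCHEMA))
    # prints: scrape_jobs(searchQuery: str, maxResults: int = 50)
```

A schema that renders to one short line like this is easy for an agent to fill in correctly. One that renders to a row of `Any` parameters is a reasonable warning sign for a publisher.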

The thing nobody is paying for yet

There is one large unresolved question: who pays the LLM tokens consumed during MCP tool selection?

When an agent calls an Apify Actor through MCP, the agent first reads the tool list (potentially thousands of entries), then reasons about which tool to call, then formats the call, then parses the response. All of that is LLM token spend, and at scale it can dwarf the per-call price of the underlying scraping API.
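A back-of-envelope calculation shows how the token bill can outgrow the scrape price. Every number here is a hypothetical placeholder chosen for illustration, not a measured figure or a real price list: tool count, tokens per tool entry, reasoning and response sizes, per-million-token rates, and the per-call scrape price. The sketch also assumes the agent reads the full tool list on every call, which a real client may avoid through search or caching.

```python
def selection_token_cost(
    tools_listed,
    tokens_per_tool,
    reasoning_tokens,
    response_tokens,
    usd_per_million_input,
    usd_per_million_output,
):
    """Estimate LLM spend for one discover-select-call-parse cycle, in USD."""
    # Input side: the tool list the model reads plus the tool response it parses.
    input_tokens = tools_listed * tokens_per_tool + response_tokens
    # Output side: the model's reasoning and the formatted tool call.
    output_tokens = reasoning_tokens
    return (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    ) / 1_000_000


if __name__ == "__main__":
    # All values below are hypothetical.
    token_cost = selection_token_cost(
        tools_listed=2_000,
        tokens_per_tool=150,
        reasoning_tokens=500,
        response_tokens=4_000,
        usd_per_million_input=3.0,
        usd_per_million_output=15.0,
    )
    scrape_price = 0.005  # hypothetical per-call price of the scraping tool
    print(f"token spend per cycle: ${token_cost:.4f}")
    print(f"scrape price per call: ${scrape_price:.4f}")
    print(f"ratio: {token_cost / scrape_price:.0f}x")
```

Under these assumptions the tool list dominates: 300,000 of the 304,000 input tokens are catalog entries. Shrinking what the agent has to read moves the result far more than any change to the scrape price.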

For the Apify business model — which monetizes the run, not the discovery — that token cost sits on the buyer side of the ledger. For the Anthropic and OpenAI business models — which monetize the token — it is pure upside. The uncomfortable reading is that MCP is, in part, a mechanism for routing scraping demand into LLM-token consumption. That is good for foundation labs. Whether it is good for the scraping vendors depends on whether they can keep their per-call rate above the marginal token cost of having an agent route around them.

It is the sort of equilibrium question the next benchmark cycle will answer. For now, every scraping vendor ships an MCP server, the protocol works, and the integration layer is genuinely commoditizing. The next twelve months are about which vendors successfully move differentiation upstack — and which become invisible inside the tool-selection loop.
