Cloudflare Just Made Scraping a Pricing Problem
Cloudflare's pay-per-crawl beta made HTTP 402 a real billing primitive for the first time. Combined with the default-block toggle on every new domain, scraping vendors now negotiate prices, not bypasses.
On July 1, 2025, Cloudflare flipped the default on every newly onboarded zone: AI crawlers are now blocked unless the site owner opts in. Existing customers got a one-click toggle in their dashboard. The change rolled out simultaneously with a private beta of “pay per crawl” — a billing system built on the long-dormant HTTP 402 Payment Required status code.
Cloudflare runs in front of roughly 20% of the public web. A change in its defaults is a change in the substrate. For commercial scraping vendors and the publishers who buy from them, this is the largest infrastructure shift since the original Cloudflare WAF rolled out a decade ago.
The notable part is not the block. It is the price.
What pay-per-crawl actually does
The mechanic is straightforward. A crawler hits a Cloudflare-fronted URL. Cloudflare returns HTTP 402 with a signed header indicating the per-request price the publisher has set. The crawler can either (a) walk away, (b) pay the published rate via a Cloudflare-administered settlement, or (c) be on the publisher’s whitelist and bypass the meter entirely.
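The three outcomes above can be sketched as a small decision function. This is a minimal illustration, not Cloudflare's client logic: the header name `crawler-price` and the `USD 0.01` value format are assumptions that should be checked against Cloudflare's documentation, and `decide` and `parse_price` are hypothetical helper names. An allowlisted crawler never sees the 402, so that case appears here as a response that is simply not metered.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional

# Assumption: the quoted price arrives in a response header with this name,
# formatted like "USD 0.01". Verify against Cloudflare's documentation.
PRICE_HEADER = "crawler-price"


def parse_price(headers: Mapping[str, str]) -> Optional[Decimal]:
    """Return the quoted per-request price in USD, or None if absent or malformed."""
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get(PRICE_HEADER)
    if raw is None:
        return None
    parts = raw.split()
    if len(parts) != 2 or parts[0].upper() != "USD":
        return None
    try:
        price = Decimal(parts[1])
    except InvalidOperation:
        return None
    # Reject NaN, infinities, and negative quotes.
    if not price.is_finite() or price < 0:
        return None
    return price


def decide(status: int, headers: Mapping[str, str], max_price: Decimal) -> str:
    """Map a response to 'not_metered', 'pay', or 'walk_away'."""
    if status != 402:
        # No quote was issued, e.g. the crawler is on the publisher's allowlist.
        return "not_metered"
    price = parse_price(headers)
    if price is None or price > max_price:
        return "walk_away"
    return "pay"
```

A crawler with a ceiling of one cent per page would call `decide(402, response_headers, Decimal("0.01"))` and only proceed to settlement when the result is `"pay"`.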
The price is set per-site by the publisher, not negotiated per-crawler. The “marketplace” is one-sided: take it or leave it. Bot operators who want broad access to a Cloudflare-protected corpus end up with a per-domain rate card and a settlement bill at month-end.
This is the part the press under-covered. Pay-per-crawl is not primarily a tool for blocking bots. It is a tool for pricing bots. The 402 response is a quotation, not a refusal.
For a scraping vendor, that changes the engineering problem fundamentally. The historical job was bypass: given a hostile Cloudflare configuration, can we render the page? The new job — for the subset of sites that opt into pay-per-crawl — is procurement: what is the per-page rate, and is the cost-per-record still under the customer’s budget?
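The procurement question reduces to simple arithmetic. The sketch below assumes a record is assembled from a fixed number of pages and that per-page cost is the sum of the crawl fee, compute, and proxy spend. The class, the function names, and every figure in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageCosts:
    """Per-page costs in USD. All fields are illustrative inputs."""
    crawl_fee: float  # per-request rate quoted by the publisher
    compute: float    # compute spend per page
    proxy: float      # proxy bandwidth spend per page


def cost_per_record(costs: PageCosts, pages_per_record: float) -> float:
    """Total cost of producing one record from `pages_per_record` pages."""
    if pages_per_record <= 0:
        raise ValueError("pages_per_record must be positive")
    per_page = costs.crawl_fee + costs.compute + costs.proxy
    return per_page * pages_per_record


def within_budget(costs: PageCosts, pages_per_record: float,
                  budget_per_record: float) -> bool:
    """True when the record can be produced at or under the customer's budget."""
    return cost_per_record(costs, pages_per_record) <= budget_per_record
```

With a made-up rate card of `PageCosts(crawl_fee=0.004, compute=0.001, proxy=0.002)` and two pages per record, the record costs about $0.014, so it fits a $0.02 budget but not a $0.01 one.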
The bot mix has already shifted
Cloudflare publishes aggregate crawler statistics, and the picture in 2025 is dramatically different from 2024. GPTBot rose from 5% to 30% of identified AI crawl traffic in the year ending May 2025. Bytespider — ByteDance’s crawler — collapsed from 42% to 7%, almost certainly because Cloudflare and major publishers actively blocked it. Meta-ExternalAgent debuted at 19%.
The composition tells you what the demand actually is. Cloudflare’s analysis shows 80% of AI crawl traffic is for training data, 18% is for search-style indexing, and 2% is for user-action agents. The pay-per-crawl pricing problem is therefore primarily a training-data pricing problem. The agents you read about — Operator, Mariner, Computer Use — represent a sliver.
For scraping infrastructure vendors, that has a concrete implication: the customers who will pay metered rates to Cloudflare are the foundation labs, not the agent-builder startups. The same buyer profile that drives Bright Data’s enterprise revenue is the buyer profile that will end up settling Cloudflare invoices.
The fingerprint side moved at the same time
Pay-per-crawl is the carrot. The stick moved too. In February 2025 Cloudflare rolled out a browser-validation challenge that fingerprints real hardware via a Chrome bug, breaking many headless setups that had previously survived JS challenge passes. A coordinated set of detection upgrades has continued through 2025, and the effect on the scraping side is visible in the open-source signal: in February 2026 the maintainers of puppeteer-extra-plugin-stealth officially deprecated the package, conceding that patching Chromium flags is no longer viable as a general technique.
The combined message is clear. Cloudflare-protected sites in 2026 are either (a) enrolled in pay-per-crawl, in which case scrapers settle through the meter, or (b) not enrolled, in which case detection is now hard enough that the cost of bypass is higher than the pay-per-crawl fee would have been. Either way, scraping is becoming a line item with a forecastable unit price.
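One way to make the bypass-versus-pay comparison concrete is to price bypass per successful page rather than per attempt. The sketch below assumes attempts succeed independently with a fixed probability, so the expected number of attempts per success is the reciprocal of the success rate. The function names and the figures in the usage note are hypothetical.

```python
def expected_bypass_cost(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successfully fetched page when failures are retried.

    Assumes independent attempts, so expected attempts per success is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    if cost_per_attempt < 0:
        raise ValueError("cost_per_attempt must be non-negative")
    return cost_per_attempt / success_rate


def cheaper_route(crawl_fee: float, cost_per_attempt: float,
                  success_rate: float) -> str:
    """Return 'pay' when the metered fee is at or below the expected bypass cost."""
    bypass = expected_bypass_cost(cost_per_attempt, success_rate)
    return "pay" if crawl_fee <= bypass else "bypass"
```

For example, an attempt that costs $0.002 and succeeds one time in four works out to roughly $0.008 per page, so a quoted fee of $0.005 is the cheaper route while a fee of $0.01 is not.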
How actor pricing absorbs the fee
For actor publishers on the Apify Store, and especially those operating per-event pricing, pay-per-crawl introduces a cost component that did not exist in the financial model six months ago. An actor running against a pay-per-crawl-enabled target will, in steady state, owe Cloudflare a per-request fee on top of compute, proxy, and CAPTCHA-solver budget.
Most actor pricing on the Store today does not bake in this cost because it does not yet exist for the targets the actors hit. That will not last. As more publishers opt in — and Cloudflare has every incentive to make opt-in painless — actor publishers will face a choice: absorb the per-page fee into their PPE rate, pass it through transparently as a meter charge, or stop targeting paid sites.
The PPE model on Apify is well-suited to passing the fee through. It is straightforward to inject a “Cloudflare crawl fee: $0.0042” line into the per-event pricing of an actor that hits a paid target. The model that breaks under this is the unlimited-runs subscription model — which is exactly the model the long-tail “spray” actors on the Store rely on. Their unit economics do not survive a per-page floor.
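The contrast between the two pricing models can be sketched numerically. The example below reuses the illustrative $0.0042 fee from the paragraph above. The base event price and the flat monthly price in the usage note are hypothetical, and neither function reflects an actual Apify API.

```python
from decimal import Decimal

# Illustrative per-page crawl fee taken from the text above.
CRAWL_FEE = Decimal("0.0042")


def ppe_price_with_fee(base_event_price: Decimal, crawl_fee: Decimal,
                       pages_per_event: int = 1) -> Decimal:
    """Per-event price when the crawl fee is passed through as its own line item."""
    if pages_per_event < 0:
        raise ValueError("pages_per_event must be non-negative")
    return base_event_price + crawl_fee * pages_per_event


def pages_covered_by_subscription(monthly_price: Decimal,
                                  crawl_fee: Decimal) -> int:
    """Largest number of paid pages whose crawl fees fit inside a flat monthly price.

    Ignores compute and proxy costs, so the real ceiling is lower.
    """
    if crawl_fee <= 0:
        raise ValueError("crawl_fee must be positive")
    return int(monthly_price // crawl_fee)
```

With a hypothetical base price of $0.01 per event, pass-through yields $0.0142 per event regardless of volume. A hypothetical $29 flat subscription, by contrast, is fully consumed by crawl fees alone after 6,904 paid pages, which is the per-page floor that unlimited-run pricing cannot absorb.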
What does not change
Two things are worth being precise about.
First, only paying Cloudflare customers can opt into pay-per-crawl. The default-block applies only to new zones that have AI crawler controls available on their plan. Sites on free Cloudflare or older zones that have not flipped the switch still allow crawlers as before. The shift is therefore concentrated in the high-value tail of the web — exactly the slice scrapers care about most.
Second, logged-out scraping doctrine is unchanged. The Bright Data v. Meta ruling still holds: Meta’s Terms of Service do not bind logged-out scraping or resale of public data. Pay-per-crawl is a contractual relationship between the publisher and Cloudflare; it does not create new copyright rights or new ToS reach.
But it does create a new economic friction. Scraping at scale, against the most valuable corpora on the web, now has a unit price. That price will compete with the cost of bypass, the cost of licensing, and the cost of just buying the data from Bright Data. The market for web data is becoming legible in a way it was not a year ago.
For the first time, you can imagine a forward curve.
Sources
- Cloudflare: Introducing pay per crawl
- Cloudflare: Introducing AI Crawl Control
- Cloudflare: From Googlebot to GPTBot — who’s crawling your site in 2025
- NiemanLab: Cloudflare blocks AI scraping by default
- Cloudflare Radar: Year in Review 2025
- Browser-use blog: Bot detection in 2026
- Bright Data v. Meta ruling summary