Stack Overflow's OpenAI Deal: Two Years On, What Leaked
The terms of the May 2024 Stack Overflow-OpenAI deal stayed private. Subsequent leaks point to a multi-year license with a seven-figure annual base plus a usage component. OverflowAPI subscriptions stayed modest. Developer backlash contributed to a measurable slowdown in question volume through 2025.
When Stack Overflow announced its OpenAI licensing deal in May 2024, the financial terms were not disclosed. Two years on, the deal’s specific structure remains private, but enough signal has leaked through subsequent earnings calls, regulatory filings, and developer-community reaction to map the broad shape. The deal is meaningful but smaller than the Reddit-Google equivalent. The most consequential effects come from the developer community’s reaction rather than from the deal’s direct revenue.
The patterns visible in the Stack Overflow case extend across the broader publisher-data-licensing landscape. Deals are getting done. The revenue is real. The community pushback is consistently larger than the deal-makers anticipated, and the second-order effects on platform health (engagement, contribution, growth) are measurable.
What can be reconstructed
The pieces of the Stack Overflow-OpenAI deal that have leaked or been disclosed since May 2024:
Multi-year commitment. Subsequent Prosus (Stack Overflow’s parent) earnings calls referenced “long-term licensing arrangements with leading AI companies” without naming OpenAI specifically, but the timing and reporting context point at the May 2024 deal. The implied duration is 3-5 years.
Base + usage structure. Bloomberg reporting from late 2024 suggested the deal includes a fixed annual base payment plus usage-based components tied to API call volume from OpenAI’s grounding/retrieval workflows. The exact split was not disclosed.
Seven-figure base annually. Triangulating from Prosus financial disclosures and industry-source reporting, the base payment is in the low seven-figure range, well below the $60 million per year Reddit-Google deal but consistent with the relative content scale.
OverflowAPI as the technical layer. Stack Overflow shipped OverflowAPI specifically to enable the licensed access. The subscription numbers on OverflowAPI (paid developer access separate from the OpenAI deal) have not been disclosed but third-party estimates put commercial OverflowAPI revenue in the low millions annually — not a meaningful contributor to Stack Overflow’s overall revenue.
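To make the base-plus-usage shape concrete, here is a minimal sketch of how such a fee could be computed. The actual split was not disclosed, so every number and name below (LicenseTerms, included_calls, usd_per_1k_calls, the $1 million base) is a hypothetical placeholder, and the idea that the base covers an included call volume is an assumption, not a reported term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical licence terms. None of these figures come from the real deal."""
    annual_base_usd: float    # fixed payment owed regardless of usage
    included_calls: int       # API calls covered by the base (assumed mechanism)
    usd_per_1k_calls: float   # metered rate for calls beyond the included volume


def annual_fee(terms: LicenseTerms, api_calls: int) -> float:
    """Return the fixed base plus a metered charge on calls above the included volume."""
    overage = max(0, api_calls - terms.included_calls)
    return terms.annual_base_usd + (overage / 1000) * terms.usd_per_1k_calls


# Placeholder terms chosen only to illustrate the arithmetic.
terms = LicenseTerms(
    annual_base_usd=1_000_000,
    included_calls=50_000_000,
    usd_per_1k_calls=2.0,
)

if __name__ == "__main__":
    for calls in (10_000_000, 60_000_000):
        print(calls, annual_fee(terms, calls))
```

Under these placeholder terms the licensee pays the same base at low volume and only starts paying more once usage passes the included threshold, which is how a structure like this shares volume risk between the two sides.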
The developer backlash
The May 2024 announcement triggered immediate developer-community pushback. Multiple high-reputation Stack Overflow users publicly announced they were deleting their answers or modifying contributions in protest. Stack Overflow’s moderation team initially threatened bans for content deletion, then walked back the threat after broader community criticism.
The measurable second-order effects through 2025-2026:
Question volume fell faster. Stack Overflow’s annual question volume has been declining since roughly 2022, partly because LLM assistants answer programming questions directly. The May 2024 backlash accelerated that decline: comparing Q2 2024 with Q2 2025, question volume dropped more steeply than the trailing trend predicted.
High-reputation user activity dropped. A measurable share of users with 50k+ reputation reduced their answering activity or stopped entirely. The contributions that remained came disproportionately from newer, lower-reputation users, which shifted the quality distribution downward.
Stack Exchange network engagement softened. The Stack Exchange family of sites (Stack Overflow plus 170+ topical communities) saw consistent engagement declines across most subdomains, with the pattern more pronounced on the technical-content sites than on the non-technical communities.
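The "steeper than the trailing trend" comparison can be sketched as a simple extrapolation: fit a straight line to earlier periods, project one period forward, and measure how far the observed value falls from the projection. This is one possible way to frame the comparison, not the method behind the figures cited here, and the series below is a made-up index (100 = first period), not real Stack Overflow data.

```python
def linear_trend(values):
    """Ordinary least-squares slope and intercept for equally spaced observations."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def excess_decline(history, actual):
    """Relative gap between the next observed value and the trend projection.

    A negative result means the observed value came in below the trend.
    """
    slope, intercept = linear_trend(history)
    predicted = intercept + slope * len(history)
    return (actual - predicted) / predicted


# Placeholder index values, NOT measured question volumes.
history = [100.0, 95.0, 90.0, 85.0]

if __name__ == "__main__":
    print(f"{excess_decline(history, 72.0):+.1%}")
```

A linear fit is the crudest reasonable baseline. A real analysis would also need to handle seasonality and decide how many trailing periods define the trend.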
The deal’s direct revenue contribution was probably positive for Stack Overflow’s financials. The indirect cost to platform health appears to have been larger than the direct revenue. Whether the net was positive or negative depends on how the platform’s long-term value is weighed against near-term cash flow.
What the Stack Overflow case predicts
Three forward-looking implications visible from this pattern.
Publisher-deal terms will trend toward usage-based. The fixed-base-plus-usage structure that Stack Overflow used is the model emerging across publisher-AI deals. Pure flat licensing exposes both sides to volume risk. Usage-based aligns incentives but requires technical infrastructure (logging, attribution, billing) that older publishers may not have.
The community-management cost is structural. Publishers entering licensing deals need to budget for substantial community-management work. Stack Overflow’s case shows that surprise announcements without prior community consultation generate sustained backlash. The publishers handling this better (some news publishers, certain academic-content vendors) consulted contributors and community moderators before the deals were signed.
The second-order revenue effects are large. A small direct-revenue deal (low seven figures annually) can produce double-digit-percent declines in engagement metrics on the platform. The net economic effect on platform value can be negative even when the deal itself is profitable. Publisher-side discount rates on data-licensing deals should price in this risk.
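The claim that a profitable deal can still reduce platform value is easiest to see as discounted arithmetic. The sketch below is illustrative only: it assumes existing platform revenue falls in direct proportion to engagement, which is a simplification, and all inputs (revenue figures, the engagement loss, the discount rate) are hypothetical placeholders rather than Stack Overflow numbers.

```python
def present_value(cash_flows, discount_rate):
    """Discount a list of end-of-year cash flows (year 1, 2, ...) back to today."""
    return sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )


def net_deal_value(deal_revenue, baseline_revenue, engagement_loss, discount_rate):
    """Present value of licensing revenue minus present value of revenue lost to lower engagement.

    Assumes (hypothetically) that existing revenue scales linearly with engagement.
    """
    lost = [base * engagement_loss for base in baseline_revenue]
    return present_value(deal_revenue, discount_rate) - present_value(lost, discount_rate)


if __name__ == "__main__":
    # Placeholder inputs in millions of USD over a three-year term.
    deal = [2.0, 2.0, 2.0]            # licensing income per year
    baseline = [100.0, 100.0, 100.0]  # existing platform revenue per year
    print(net_deal_value(deal, baseline, engagement_loss=0.10, discount_rate=0.08))
```

With these placeholder inputs the result is negative: the licensing income is real, but a ten percent hit to a much larger revenue base outweighs it. The conclusion is sensitive to the assumed link between engagement and revenue, which is exactly the risk the paragraph above says should be priced in.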
What this means for scraping
For the scraping infrastructure ecosystem, the Stack Overflow case is informative in three ways.
Licensing reduces scraping demand at the top. Once Stack Overflow had a licensed offering, the demand from foundation labs for scraping Stack Overflow content dropped meaningfully. The licensed channel is preferred when available: it’s legally cleaner, technically cleaner, and the publisher cooperates with the requester rather than defending against them.
Mid-tier scraper demand persists. Mid-volume buyers (smaller AI companies, dataset compilers, academic research) that can’t justify the licensing fee continue to scrape. The scraping segment for Stack Overflow content is smaller than it was pre-2024 but is not zero. The Apify Store hosts Stack Overflow scrapers that serve this band.
The pattern is platform-specific but the structure is general. Every publisher with valuable, defensible data faces the same choice: license to the top tier and accept the community cost, or refuse to license and accept the scraping. Most publishers will eventually license because the financial pressure outweighs the community pressure over time. The Stack Overflow case is the canonical example of this trade-off being made.
For Apify Store publishers building actors against Stack Overflow or equivalent technical-content platforms, the practical posture is to expect the platforms to continue adding licensed channels that compete with scraping. The scraping channel survives in the segments where the licensing channel does not reach (smaller buyers, specific use cases the licensed product does not support, lower-volume needs). The licensing channel will progressively absorb the high-volume top-tier buyer.
The longer-term equilibrium for technical-content platforms looks like the broader publisher-data landscape: a small number of large licensed deals capturing most of the high-value buyer revenue, plus a smaller scraping segment serving the long tail. Stack Overflow is two years into that transition; most other technical-content platforms are still in the early phases.