This is the methodology that governs every Signal Census report and scorecard published while v1.0 is the active version. It is dated, public, and supersedable. Any later version that changes scope, scoring, or sourcing is published under a new version number with an explicit supersedes field, and earlier reports remain bound to the version under which they were scored.
Methodology comes before scoring. Scores are downstream of these rules; if the rules change, the scores must be re-run.
1. What a Signal Census report covers
A Signal Census report covers a single product category for a single quarter. The category boundary is named explicitly in the report’s title (for example, “job-board scrapers”) and the inclusion rules below decide which products enter the census.
A product is in scope when it meets all of the following:
- It is a product, not a code repository. A tool is in scope if it is sold, hosted, or operated as a usable product — whether by a vendor, a marketplace listing (e.g. an Apify actor), or a hosted API. A bare GitHub repository that requires the user to deploy and maintain their own infrastructure is out of scope for v1.0.
- It is generally available. Closed beta, waitlist-only, or invite-only products are excluded. The product must be purchasable or self-serve at the time of census.
- It is in the named category. A general-purpose web scraping platform is included in a category-specific report only if the vendor markets a category-specific product or pre-built workflow. A horizontal platform without a category-specific offering is out of scope for that report.
- It is operated commercially or as a sustained free service. A product abandoned by its publisher (no commits, no support, no updates over the prior two quarters) is excluded.
A product is out of scope when:
- It is a bare open-source library or repository with no hosted or commercial offering.
- It lacks evidence of sustained operation, maintenance, and user-facing support.
- It is a vendor’s internal tooling not sold to third parties.
- It is a wrapper around another in-scope product whose only differentiation is repackaging.
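The inclusion and exclusion rules above amount to a conjunction of checks, which can be sketched as a single predicate. The field names below are illustrative only; the methodology defines these criteria in prose, not as a schema.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    # Illustrative fields mirroring the §1 rules; not an official schema.
    is_hosted_or_sold: bool         # a product, not a bare repository
    generally_available: bool       # purchasable or self-serve at census time
    in_named_category: bool         # category-specific product or workflow
    active_last_two_quarters: bool  # commits, support, or updates
    is_mere_wrapper: bool           # repackaging-only wrapper of an in-scope product

def in_scope(c: Candidate) -> bool:
    """A candidate enters the census only if every inclusion rule
    holds and no exclusion rule applies."""
    return (c.is_hosted_or_sold
            and c.generally_available
            and c.in_named_category
            and c.active_last_two_quarters
            and not c.is_mere_wrapper)
```

The point of the sketch is the shape of the decision: every rule is a hard gate, so failing any one of them excludes the product regardless of the others.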
2. Census sourcing
For each report we maintain a discovery protocol that lists the channels and dates searched. The standard channels for v1.0 are:
- The Apify Store, filtered by category-relevant keywords
- Vendor directories (G2, Capterra, AlternativeTo) filtered by category
- Direct vendor websites discovered via search engines using the category’s primary keywords
- Reddit, Hacker News, and developer forum threads from the prior twelve months
- The previous Signal Census report in the category, if one exists
The discovery output is a list of candidate products. Each candidate is then evaluated against the inclusion rules in §1. The full discovery list — including products that were excluded and the reason for exclusion — is published as part of the report’s downloadable dataset. This is what makes the census reproducible: a reader can take the same channels, run the same search, and produce a list that overlaps materially with ours.
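One illustrative shape for a row of the published discovery list follows. The field names and values are hypothetical, chosen to show that excluded products carry their exclusion reason; they are not the dataset's actual schema.

```python
# Hypothetical discovery-list row; real column names may differ.
candidate_row = {
    "product": "Example Scraper",        # hypothetical product name
    "channel": "Apify Store",            # one of the §2 discovery channels
    "date_searched": "2025-01-15",       # hypothetical search date
    "included": False,                   # result of the §1 evaluation
    "exclusion_reason": "bare open-source repository, no hosted offering",
}
```

Publishing excluded rows alongside included ones is what lets a reader re-run the same channels and reconcile their list against ours.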
3. Scoring axes
Each in-scope product is scored on six axes, each on a 0–10 scale. The axes were chosen because they can be evaluated from public evidence and because, taken together, they describe the practical question a buyer asks: will this tool give me the data I need, reliably, at a cost I can predict?
- Coverage — does the product address the category’s full range of sources, or only a narrow slice? Scored against the category’s enumerated source list (which is itself published with the report).
- Extraction quality — when the product runs, does it return complete and accurate records? Scored against a fixed sample of test queries with a predefined expected field set and manually reviewed completeness and accuracy criteria. Absolute ground truth is not always available; where judgment is required, the criteria and reviewer are recorded with the score.
- Reliability — does the product complete its runs without silent failures, partial output, or undocumented limits? Scored primarily from our own repeated test runs. Publisher-documented error rates and operator-reported incidents are used only as secondary, corroborating evidence where they exist. Reliability is evaluated over a fixed minimum number of repeated runs per product within the census window; the number and distribution of runs are recorded in the report’s dataset.
- Freshness and maintenance — is the product actively maintained? Scored from changelog cadence, last release date, and documented response to source-site changes in the prior two quarters.
- Documentation and usability — can a competent technical buyer evaluate, configure, and operate the product from public documentation alone? In v1.0 this axis measures evaluability and operability from public documentation, not end-user UX in a broader sense. Scored against a fixed checklist (input schema clarity, error handling, output schema, sample code, support channels).
- Pricing transparency — is the cost predictable in advance? Scored from whether public pricing exists, whether the unit of cost is published, whether a worked example is given, and whether trial conditions are stated.
Missing evidence. Where public evidence required by an axis does not exist — no changelog, no public pricing, no public documentation — the product is scored as having missing evidence on that axis, not given benefit of the doubt. Opaque vendors score lower than transparent ones; this is intentional.
Overall score. A product’s overall score is the unweighted arithmetic mean of the six axis scores, rounded to one decimal. v1.0 uses equal weighting deliberately: the correct weights depend on an individual buyer’s priorities, and a fixed weighting scheme would hide that subjectivity behind false precision. Readers who weight differently can recompute from the per-axis scores in the dataset.
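A minimal sketch of the v1.0 aggregation, assuming the six axis scores are already determined (including any missing-evidence scoring, which the methodology specifies in prose):

```python
def overall_score(axis_scores: list[float]) -> float:
    """Unweighted arithmetic mean of the six 0-10 axis scores,
    rounded to one decimal, per the v1.0 rule."""
    assert len(axis_scores) == 6, "v1.0 defines exactly six axes"
    # Note: Python's round() rounds halves to even; v1.0 does not
    # specify behavior at the exact half boundary.
    return round(sum(axis_scores) / 6, 1)
```

A reader who weights the axes differently can replace the unweighted sum with a weighted one and recompute from the per-axis scores in the dataset.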
Ties. When two products have the same overall score to one decimal, they share the same rank in the published ranking. Where a strict ordering is required (for example, a single “top pick”), ties are broken first by the Reliability score, then by the Extraction quality score, then alphabetically by product name. The tie-breaker sequence is fixed and published here rather than chosen per report.
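Where a strict ordering is needed, the fixed tie-breaker sequence can be expressed as a single sort key. The dictionary field names here are assumptions for illustration, not the dataset's actual column names.

```python
def ranking_key(product: dict) -> tuple:
    """Sort key for a strict ordering: overall score descending,
    then Reliability descending, then Extraction quality descending,
    then product name ascending."""
    return (-product["overall"],
            -product["reliability"],
            -product["extraction_quality"],
            product["name"].lower())
```

Sorting with this key yields the order used for a single "top pick"; products with equal overall scores still share a rank in the published ranking itself.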
4. Conflict of interest
When the author of a Signal Census report has a commercial relationship with a product in the census — including authorship of the product, an affiliate relationship with the vendor, or paid consulting — the report carries a conflict of interest declaration at the top of the page. The declaration names the relationship and the products affected. An independent reviewer is named in the report’s frontmatter and must have signed off on the scoring of the affected products. The reviewer’s name is published in the page’s structured data. The full COI policy is at /coi-policy.
Signal Census does not artificially suppress conflicted products from rankings. If the methodology produces a high score for a conflicted product, that score is published. Suppressing a methodologically derived result would be a different form of dishonesty.
5. Limitations of v1.0
Several limitations are known and accepted in v1.0:
- No support axis. Vendor support quality varies across plan tiers and over time, and we do not yet have a consistent way to measure it from public evidence. We may add this axis in v2.0.
- No GitHub repos. Open-source-only tools that require the user to self-host are excluded for v1.0. They will likely be a separate report category in a later version, scored against different axes appropriate to that distribution model.
- No vendor responses. v1.0 does not solicit pre-publication responses from vendors. Vendors may submit corrections through /contact; accepted corrections appear in the page’s per-page changelog.
- No paid testing. Scoring is based on free tiers, trial accounts, and publicly observable behavior. A paid-tier evaluation may produce different scores; this is noted on each scorecard where it applies.
6. Supersession and corrections
This document is v1.0. Future methodology versions will be published at /methodology/v2, /methodology/v3, and so on, each with an explicit supersedes field naming the prior version. A report published under v1.0 remains bound to v1.0 even after a successor version is published; we do not retroactively rescore.
Corrections to this methodology — including clarifications, scope boundary refinements, and fixes to ambiguous language — appear in the per-page changelog at the bottom of this page. Substantive changes that alter scoring outcomes require a new version number, not an in-place edit.
Reports for which v1.0 is the binding methodology are listed at /data.