Method

Why the WP.org install bucket and raw download count both lie, and how The Plugins Reviews builds a spike-resistant, burst-aware quality signal on top of them, distilled into a single 0-100 score.

How the score works

The headline number on every scored plugin is a 0-100 score: our single, honest read of how good a plugin really is for a real person. It's a transparent composite, not a black box, and every plugin shows its full breakdown. Most of the weight (70 of the 100 points) is judged by AI reading the actual reviews and readme; the remaining 30 points are measured directly from the install and release data.

Loved · 40	Genuine user satisfaction, grounded in the verified (drive-by-filtered) star rating and how much credible evidence backs it, then adjusted by an AI read of whether the praise is real and the complaints serious. Judged by AI.
Trustworthy · 15	How real the reviews and rating are. We detect hollow, contentless, or "never used yet" 5-star reviews, review bursts, and gaps between the shown and earned rating. A few thin reviews are normal; a rating propped up by them is not. Judged by AI.
Fair & Honest · 15	Is the offering honest, or a trap? A genuinely useful free version with an optional paid tier scores full marks, because that's normal freemium. We dock for crippled free tiers, forced lock-in (no way to use your own API key), misleading "free" claims, auto-renewal dark patterns, or acting on a site without consent. Judged by AI.
Maintained · 18	Actively and honestly cared for: recent updates, healthy release cadence, no fake-busy "release pumping". Measured from data.
Momentum · 12	Install trajectory, year over year. Deliberately the smallest dimension: this is a quality site, not a popularity contest. Measured from data.

On top of that, a light trust guardrail scales the total down when the verified rating is genuinely low or there are signs of review manipulation, so an actively-maintained but genuinely bad plugin can't bank a passing score on maintenance alone. It is keyed off the rating and manipulation signals, not the A-F grade, so a beloved but simply-finished plugin is never punished twice for not needing updates.
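As an illustration, the composite can be sketched as a weighted sum of per-dimension fractions, scaled by the guardrail. The weights come from the table above. The function name, the 0.0-1.0 sub-score scale, and the guardrail being a single multiplier are assumptions made for this sketch, not a description of the production code.

```python
# Dimension weights from the table above; they sum to 100.
WEIGHTS = {
    "loved": 40,
    "trustworthy": 15,
    "fair_honest": 15,
    "maintained": 18,
    "momentum": 12,
}


def composite_score(subscores: dict, guardrail: float = 1.0) -> int:
    """Combine per-dimension fractions (0.0-1.0) into a 0-100 score.

    `guardrail` is a hypothetical multiplier in (0, 1]; 1.0 means no
    penalty. How the real factor is derived is not specified in the text.
    """
    if not 0.0 < guardrail <= 1.0:
        raise ValueError("guardrail must be in (0, 1]")
    total = 0.0
    for name, weight in WEIGHTS.items():
        fraction = min(max(subscores[name], 0.0), 1.0)  # clamp to [0, 1]
        total += weight * fraction
    return round(total * guardrail)
```

With every dimension at full marks the result is 100; a plugin that maxes out only Maintained and Momentum reaches 30, which is why maintenance alone cannot carry a plugin far.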

The AI-judged portion is scored by a large language model from the credibility-filtered reviews and the plugin's own readme. It is opinionated and frank, and we hold our own plugins to exactly the same standard, but it is a read of the available evidence, not a verdict. The A-F grade below remains as a stricter, purely mechanical signal alongside the score.

The problem

Two metrics dominate how WordPress plugin quality gets evaluated:

Active installations is a bucket (10k+, 100k+, 200k+) that almost never moves. Useless for trends.
Daily downloads does move every day, but it is dominated by release-day auto-update spikes. Every release triggers a wave that can be 10× the organic baseline.

A plugin that releases every 5 days looks like it's getting twice the downloads of one that releases once a quarter. That's not real install demand — it's update-pumping.

Meanwhile, ratings can be bought: a plugin can pay for a burst of 50 five-star reviews in a week and ride that average for years.

Estimating "real" active installs

wp.org publishes only a rounded bucket (10k+, 20k+, 100k+, 1M+), and it updates lazily. For a plugin growing from 80k to 120k installs, the bucket can show "80k+" for months before flipping. wp.org exposes no public non-bucketed number.

What we do instead: read the auto-update wave.

WordPress plugin auto-updates fire within 1–3 days of a release. The cumulative downloads in that window approximate the number of active sites that have auto-update enabled for that plugin. We take the median wave volume across the last 3–5 release events (more stable than any single spike day), then divide by an estimated auto-update penetration rate.

That rate varies dramatically by audience: developer-targeted plugins see 40–70% auto-update adoption, while end-user widgets see 8–15%. A single global rate can't capture this honestly, so we report a range bracketing the plausible interval. Because the estimate is the wave divided by the rate, a lower assumed rate yields a higher install count: 10% gives the upper bound, 40% the lower bound, and 25% the midpoint.
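The arithmetic can be sketched as follows. The 10%, 25% and 40% rates and the median over recent waves come from the text; the function name and the shape of the input (one cumulative download total per release wave) are assumptions for illustration.

```python
from statistics import median


def estimate_active_installs(wave_volumes: list) -> dict:
    """Estimate an active-install range from recent auto-update waves.

    `wave_volumes` holds the cumulative downloads in the 1-3 day window
    after each of the last few releases (3-5 events, per the text).
    """
    if not wave_volumes:
        raise ValueError("need at least one release wave")
    wave = median(wave_volumes)
    # A lower assumed auto-update rate implies more total installs.
    return {
        "lower": round(wave / 0.40),
        "midpoint": round(wave / 0.25),
        "upper": round(wave / 0.10),
    }
```

For example, waves of 8,000, 10,000 and 12,000 downloads have a median of 10,000, which brackets the install base between 25,000 and 100,000 with a midpoint of 40,000.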

Display logic: we show our midpoint estimate when it exceeds the wp.org bucket midpoint (signaling wp.org has lagged behind growth). When our estimate falls below the bucket — common for low-auto-update plugins — we trust wp.org and show the bucket. Both numbers are surfaced so you can judge.
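A minimal sketch of that display rule, assuming the bucket midpoint has already been computed elsewhere (the text does not say how a midpoint is derived for an open-ended bucket such as "100k+"). The function and field names are hypothetical.

```python
def displayed_installs(our_midpoint: int, bucket_midpoint: int, bucket_label: str) -> dict:
    """Pick the headline install figure while keeping both numbers visible."""
    # Prefer our estimate only when it exceeds the wp.org bucket midpoint,
    # which suggests the bucket has lagged behind real growth.
    use_estimate = our_midpoint > bucket_midpoint
    return {
        "headline": f"~{our_midpoint:,}" if use_estimate else bucket_label,
        "estimate": our_midpoint,
        "bucket": bucket_label,
    }
```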

Abandoned plugin detection

WordPress core ships major versions every ~4 months. A plugin that hasn't been updated in over a year has stopped being meaningfully maintained relative to that cadence. We classify by recency:

Stale — 6–12 months since last update. Minor flag. Watch this.
Abandoned — 1–2 years. Hard grade floor of D. Hidden from all best-of lists by default.
Deeply abandoned — 2+ years. Hard grade floor of F. Never appears in recommendations.
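The recency tiers can be expressed as a small classifier. The day thresholds below (182, 365 and 730 days) are an assumed conversion of "6 months", "1 year" and "2 years", and the status strings are illustrative labels.

```python
from datetime import date


def recency_status(last_updated: date, today: date) -> str:
    """Map time since the last update onto the tiers listed above."""
    age_days = (today - last_updated).days
    if age_days >= 730:   # 2+ years
        return "deeply_abandoned"
    if age_days >= 365:   # 1-2 years
        return "abandoned"
    if age_days >= 182:   # roughly 6-12 months
        return "stale"
    return "current"
```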

In the directory table, abandoned plugins render at reduced opacity with a strikethrough on the name. The "Include abandoned" toggle reveals them. On their individual pages, a prominent red banner tells visitors not to install on a current site.

Spike-resistant downloads (the "shape" classifier)

For every plugin, we classify each day in the history as one of:

Spike — a known release date (from SVN tags), or a day where downloads exceed 2.5× the p25 floor of the trailing 28-day window. We use p25, not median — because for plugins that release often, more than half of recent days are release days, and a trailing median would itself be inflated. p25 stays anchored to the genuine floor.
Tail — up to 3 days after a spike that are still elevated above the floor. This is the rolling auto-update wave.
Organic — everything else. The true new-install signal.
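The three labels can be sketched like this. The 2.5× multiplier, the 28-day trailing window, the p25 floor and the 3-day tail come from the text. The `TAIL_FACTOR` of 1.25 is a hypothetical stand-in for "still elevated above the floor", and the linear-interpolation percentile is one of several reasonable definitions of p25.

```python
WINDOW = 28          # trailing window length in days
SPIKE_FACTOR = 2.5   # spike threshold relative to the p25 floor
TAIL_DAYS = 3        # how long a post-spike tail may last
TAIL_FACTOR = 1.25   # ASSUMPTION: what counts as "elevated" in the tail


def p25(values: list) -> float:
    """25th percentile with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = 0.25 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def classify_days(downloads: list, release_days: set) -> list:
    """Label each day index as 'spike', 'tail' or 'organic'."""
    labels = []
    last_spike = None
    for i, count in enumerate(downloads):
        window = downloads[max(0, i - WINDOW):i]  # trailing days, excluding today
        floor = p25(window) if window else float(count)
        if i in release_days or count > SPIKE_FACTOR * floor:
            labels.append("spike")
            last_spike = i
        elif (last_spike is not None and i - last_spike <= TAIL_DAYS
              and count > TAIL_FACTOR * floor):
            labels.append("tail")
        else:
            labels.append("organic")
    return labels
```

Because the floor is a low percentile rather than the median, a single release wave inside the window barely moves it, so the days after the wave are still judged against the quiet baseline.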

The "baseline" is the median of organic days in the most recent week. The "trend" compares it to the prior week's organic median — never to days inflated by releases.
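Given per-day labels, the baseline and trend reduce to two medians. This sketch assumes the labels are already available as a list of strings aligned with the download series, and that the trend is expressed as a fractional change; both are illustrative choices.

```python
from statistics import median


def organic_median(downloads: list, labels: list):
    """Median of the days labelled 'organic', or None if there are none."""
    organic = [d for d, label in zip(downloads, labels) if label == "organic"]
    return median(organic) if organic else None


def weekly_trend(downloads: list, labels: list):
    """Fractional change of this week's organic median vs the prior week's."""
    recent = organic_median(downloads[-7:], labels[-7:])
    prior = organic_median(downloads[-14:-7], labels[-14:-7])
    if recent is None or not prior:
        return None  # not enough organic days to compare
    return (recent - prior) / prior
```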

Review burst detection

The wp.org API exposes total review counts but no per-review dates, so we scrape the public reviews page (rate-limited, incrementally cached). For each plugin we compute:

Max month share — the fraction of all reviews that fell in the single busiest 30-day window. Over 30% is suspicious for plugins with sufficient history.
Coefficient of variation across monthly review counts. Above 1.5 indicates a very bursty distribution; under 0.6 is a notably even distribution, consistent with real organic adoption.
Recent burst — the most recent month is 3× or more the trailing 12-month mean. Often a marketing push or paid wave in progress.
5★ wall — near-100% 5★ reviews combined with bursty timing. The two together are a much stronger signal than either alone.
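The four signals above could be computed roughly as follows. The 30-day window and the 3× multiplier come from the text. Several details are assumptions: the coefficient of variation uses the population standard deviation, the "trailing 12-month mean" excludes the most recent month, and the 0.95 cut-off stands in for "near-100%".

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def max_window_share(review_dates: list, window_days: int = 30) -> float:
    """Fraction of all reviews inside the busiest `window_days`-day window."""
    dates = sorted(review_dates)
    if not dates:
        return 0.0
    best, start = 0, 0
    for end, current in enumerate(dates):
        # Advance the left edge until the window spans fewer than window_days.
        while current - dates[start] >= timedelta(days=window_days):
            start += 1
        best = max(best, end - start + 1)
    return best / len(dates)


def coefficient_of_variation(monthly_counts: list) -> float:
    """Standard deviation of monthly counts divided by their mean."""
    if not monthly_counts:
        return 0.0
    avg = mean(monthly_counts)
    return pstdev(monthly_counts) / avg if avg else 0.0


def recent_burst(monthly_counts: list) -> bool:
    """True if the latest month is >= 3x the mean of the 12 months before it."""
    trailing = monthly_counts[-13:-1]
    if not trailing:
        return False
    baseline = mean(trailing)
    return baseline > 0 and monthly_counts[-1] >= 3 * baseline


def five_star_wall(stars: list, bursty: bool, threshold: float = 0.95) -> bool:
    """Near-total 5-star share combined with bursty timing."""
    if not stars:
        return False
    share = sum(1 for s in stars if s == 5) / len(stars)
    return bursty and share >= threshold
```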

Drive-by reviewers (single-review accounts)

A common review-gaming pattern is to spin up disposable wp.org accounts that exist only to leave one 5-star review. We detect these by fetching each reviewer's wp.org profile page and checking their activity timeline. A reviewer whose entire visible activity is one created topic — and that topic is the review they just left — is tagged as a "1-shot" account.

Two ratings are surfaced for every plugin:

Raw rating — what wp.org shows. Includes all reviewers.
Verified rating — recomputed from our scraped sample, excluding 1-shot accounts.
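Recomputing the verified rating is a filtered average. The sketch assumes each scraped review is available as a `(stars, is_one_shot)` pair, which is a hypothetical representation.

```python
from typing import Optional


def verified_rating(reviews: list) -> Optional[float]:
    """Average star rating after dropping reviews from 1-shot accounts.

    `reviews` is a list of (stars, is_one_shot) tuples.
    """
    kept = [stars for stars, is_one_shot in reviews if not is_one_shot]
    if not kept:
        return None  # nothing left to average
    return round(sum(kept) / len(kept), 2)
```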

Calibration: 30–40% of reviewers on healthy popular plugins are 1-shot accounts — most happy WordPress users never engage with the forums except to leave one review of a plugin they love. So drive-by share alone is a weak signal. We only flag it when it's combined with a review burst or a 5★ wall (that's the actual fake-review pattern), or when it crosses 70% (where even casual reviewers wouldn't all be one-and-done).
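The flagging rule can be written as a short predicate. The 70% threshold comes from the text. The text does not say how high the drive-by share must be before a burst or 5★ wall makes it count, so `ELEVATED_SHARE` (set to the top of the normal 30–40% band) is an assumption.

```python
HARD_LIMIT = 0.70      # above this, flag regardless of other signals
ELEVATED_SHARE = 0.40  # ASSUMPTION: top of the normal 30-40% band


def flag_drive_by(one_shot_share: float, review_burst: bool, five_star_wall: bool) -> bool:
    """Decide whether the share of 1-shot reviewers should raise a flag."""
    if one_shot_share > HARD_LIMIT:
        return True
    corroborated = review_burst or five_star_wall
    return corroborated and one_shot_share > ELEVATED_SHARE
```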

Profile fetches are cached for 90 days per username — the same drive-by accounts often appear across many plugins, so the cache hit rate grows quickly.

Distribution-backed plugins

Some authors run plugins inside a parent product with millions of installs (Elementor, Yoast, Awesome Motive, Automattic, …). When the parent product cross-promotes a plugin from inside the WordPress admin, install growth looks like organic demand but is largely captive distribution. We surface a "distributed by X" badge on these so the trend numbers are read with that context: not as an accusation, just as background.

The grade

The grade comes from the sum of two penalty scores, so fewer points is better:

1. Rating-derived base (uses the verified rating when available, otherwise wp.org's reported rating; gated to plugins with ≥5 ratings to avoid noise on brand-new plugins):

≥ 4.0★	0 pts
3.5 to < 4.0★	1 pt
3.0 to < 3.5★	2 pts
2.0 to < 3.0★	3 pts
< 2.0★	4 pts

2. Flag severity — sum of the red flags' severity (1 for minor, 2 for major).

The total maps to a letter:

A	0 pts. Solid rating, no concerns.
B	1 pt. One minor signal or rating in the 3.5–4.0 range.
C	2 pts. One major signal (pumping, review burst, 5★ wall, drive-by majority) or rating in the 3.0–3.5 range.
D	3 pts.
F	4+ pts.

This means: a 1.5★ plugin with no fraud flags is still graded F, because being genuinely bad is also a quality signal. And a 4.9★ plugin gamed via drive-by reviewers gets dragged down by the flag severity, not the rating.
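The two tables combine into a few lines of code. This sketch makes two assumptions: range boundaries are lower-inclusive (a rating of exactly 3.5 scores 1 point), and plugins with fewer than 5 ratings contribute 0 rating points. The abandoned-plugin grade floors described earlier are not modelled here.

```python
from typing import Optional


def rating_points(rating: float) -> int:
    """Penalty points from the rating table (lower-inclusive boundaries assumed)."""
    if rating >= 4.0:
        return 0
    if rating >= 3.5:
        return 1
    if rating >= 3.0:
        return 2
    if rating >= 2.0:
        return 3
    return 4


def grade(rating: Optional[float], rating_count: int, flag_severities: list) -> str:
    """Map rating points plus flag severity (1 = minor, 2 = major) to A-F."""
    # ASSUMPTION: below the 5-rating gate the rating adds no points.
    gated = rating is None or rating_count < 5
    total = (0 if gated else rating_points(rating)) + sum(flag_severities)
    return "ABCDF"[min(total, 4)]
```

Both examples from the paragraph above fall out directly: `grade(1.5, 100, [])` is F on rating alone, and `grade(4.9, 200, [2])` is C because one major flag adds 2 points to a clean rating.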

When the plugin's install base or history window is too small for confident classification, we mark "confidence: low" on the page; the grade is still shown but should be read as preliminary.

What we don't claim

A high grade is not a recommendation, and a low grade is not proof of fraud. The signals here are statistical patterns; they correlate with the behaviors they're named for, but exceptions exist.

A plugin can be flagged for "review burst" because it was featured in a popular newsletter. A plugin can be flagged for "pumping" because it's in active development before a major release. Use the grade as a starting point, not a verdict.

Data sources

api.wordpress.org/stats/plugin/1.0/downloads.php — daily download counts
api.wordpress.org/plugins/info/1.2/ — metadata, ratings, changelog
plugins.svn.wordpress.org/<slug>/tags/ via WebDAV PROPFIND — authoritative per-version release dates
wordpress.org/support/plugin/<slug>/reviews/ — scraped at 1 req/sec, cached incrementally
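Holding the reviews scrape to 1 request per second needs only a small throttle. The class below is a generic sketch, not the scraper's actual code; the injectable `clock` and `sleep` parameters are there so it can be exercised without real delays.

```python
import time


class Throttle:
    """Allow at most one call per `min_interval` seconds."""

    def __init__(self, min_interval: float = 1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted request

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` immediately before each HTTP request keeps consecutive requests at least one second apart, however quickly the surrounding loop runs.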

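No private data and no calls outside the public wp.org surface. The only profile data we read is the public activity timeline on reviewer profile pages, used for the drive-by check described above.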