Why We Gave Bugs an ELO Rating

Zoltan Erdos
A gritty, high-tech industrial leaderboard displaying bug reports as metallic chess pieces, with a glowing ELO score next to each, dark cyberpunk atmosphere, neon red and green lighting, dramatic shadows

We run 80+ MCP tools across half a dozen Cloudflare Workers. Until last week, errors went to console.error() and disappeared. Users had no way to report bugs. And our rate limiting was a flat per-IP window — the same limit for a first-time visitor and a power user who’d been on the platform for months.

This is the story of how we fixed all three problems with one system.


The Problem: Internal DDoS

When you expose 80 tools to the internet via MCP, you’re one bad actor away from a very expensive afternoon. A single agent in a retry loop can hammer your Workers, your D1 database, your KV store — and you can’t distinguish it from a legitimate user making rapid tool calls.

IP-based rate limiting doesn’t cut it. Agents share IPs. VPNs exist. And the real question isn’t “how many requests per minute?” — it’s “should this caller be trusted at all?”

We needed a reputation system.

Why ELO?

We already had an ELO engine in our codebase — the chess package uses it for player rankings. ELO has a beautiful property: it’s a single number that encodes trust, and it adjusts itself based on behavior.

The insight was: ELO isn’t just for chess players. It works for anything that competes.

So we gave ELO ratings to two things:

  1. Bugs — which bugs matter most?
  2. Users — who should be trusted with platform access?

Bug-vs-Bug ELO

Every bug starts at ELO 1200. When someone reports a bug, that bug “wins” a match against a random bug in the same category. Its ELO goes up; the other’s goes down.

This means:

  • Frequently reported bugs climb the leaderboard
  • Old bugs that nobody encounters anymore decay naturally
  • The ELO leaderboard is a live priority list that maintains itself

We didn’t want product managers manually triaging bugs. We wanted bugs to tell us how important they are through the collective signal of user reports.
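The bug-vs-bug update described above can be sketched with the standard ELO expectation formula. This is a minimal illustration, not our actual engine: the `K` value and function names are assumptions.

```typescript
const K = 32; // illustrative K-factor: how far one "match" moves the ratings

function expectedScore(ratingA: number, ratingB: number): number {
  // Standard ELO expectation: probability that A "beats" B
  return 1 / (1 + 10 ** ((ratingB - ratingA) / 400));
}

function updateOnReport(reported: number, opponent: number): [number, number] {
  const expected = expectedScore(reported, opponent);
  // The reported bug "wins" (actual score 1); the random opponent "loses" (actual score 0)
  const winner = reported + K * (1 - expected);
  const loser = opponent + K * (0 - expectedScore(opponent, reported));
  return [Math.round(winner), Math.round(loser)];
}
```

Two bugs at 1200 meeting this way end at 1216 and 1184; a higher-rated bug gains less from each additional report, which is what keeps the leaderboard from being dominated by one noisy bug.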

Bug Lifecycle

CANDIDATE → ACTIVE → FIXED → DEPRECATED

A bug starts as a CANDIDATE when first reported. After 3 independent reports, it auto-promotes to ACTIVE. When fixed, it’s marked FIXED. If it’s irrelevant for 5+ sessions, ELO decay pushes it toward DEPRECATED.
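The lifecycle rules above reduce to a small state transition. The thresholds mirror the prose; the field names and function shape are assumptions, not the real schema:

```typescript
type BugStatus = "CANDIDATE" | "ACTIVE" | "FIXED" | "DEPRECATED";

interface Bug {
  status: BugStatus;
  reportCount: number;             // independent reports received
  sessionsSinceLastReport: number; // how long the bug has been quiet
}

function nextStatus(bug: Bug): BugStatus {
  if (bug.status === "FIXED") return "FIXED"; // fixed is terminal in this sketch
  if (bug.status === "CANDIDATE" && bug.reportCount >= 3) return "ACTIVE";
  // 5+ quiet sessions: ELO decay pushes the bug toward deprecation
  if (bug.sessionsSinceLastReport >= 5) return "DEPRECATED";
  return bug.status;
}
```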

User ELO: Trust as a Number

Every user (and their agents) starts at ELO 1200. Actions shift the score:

| Event | ELO change |
| --- | --- |
| Report a valid bug | +25 |
| Bug you reported gets confirmed | +10 |
| Successful tool use | +1 |
| False bug report | -15 |
| Hit a rate limit | -5 |
| Abuse flag | -50 |

There’s a daily gain cap of +100 to prevent gaming. The math is adapted from chess K-factors — new users have higher volatility (K=40) that stabilizes over time (K=16 above ELO 2400).
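A sketch of that bookkeeping: event deltas come from the table, the daily cap and K-factor shape from the prose, and everything else (names, the `gainedToday` counter) is an assumption for illustration.

```typescript
const EVENT_DELTAS: Record<string, number> = {
  valid_bug_report: 25,
  bug_confirmed: 10,
  successful_tool_use: 1,
  false_bug_report: -15,
  rate_limit_hit: -5,
  abuse_flag: -50,
};

const DAILY_GAIN_CAP = 100;

function kFactor(elo: number): number {
  // Chess-style volatility for match-based updates: new users move faster,
  // established ratings (2400+) stabilize
  return elo >= 2400 ? 16 : 40;
}

function applyEvent(
  elo: number,
  gainedToday: number,
  event: string
): { elo: number; gainedToday: number } {
  let delta = EVENT_DELTAS[event] ?? 0;
  if (delta > 0) {
    // Positive gains count against the daily cap; penalties always apply
    delta = Math.max(Math.min(delta, DAILY_GAIN_CAP - gainedToday), 0);
  }
  return { elo: elo + delta, gainedToday: gainedToday + Math.max(delta, 0) };
}
```

Note the asymmetry: the cap only throttles gains, so a user cannot farm `successful_tool_use` events past +100/day, but an `abuse_flag` lands in full regardless.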

Tier Gating

ELO maps to three tiers:

| Tier | ELO range | Effect |
| --- | --- | --- |
| Free | 0–999 | 4x rate-limit multiplier (stricter) |
| Pro | 1000–1499 | Standard access |
| Elite | 1500+ | Full access, relaxed rate limits |

A brand-new user (1200 ELO) starts as Pro. Good behavior keeps them there. Abuse drops them to Free tier, where rate limits are 4x stricter — effectively 30 requests per minute instead of 120.
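The mapping is a pair of threshold checks. Thresholds and the 30-vs-120 budget come from the text above; function names are assumptions, and any further relaxation for Elite is left out of this sketch.

```typescript
type Tier = "free" | "pro" | "elite";

function tierFor(elo: number): Tier {
  if (elo >= 1500) return "elite";
  if (elo >= 1000) return "pro";
  return "free";
}

function requestsPerMinute(tier: Tier): number {
  const BASE = 120; // standard per-minute budget
  // Free tier gets the 4x-stricter multiplier: 30 requests/minute
  return tier === "free" ? BASE / 4 : BASE;
}
```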

This is the anti-DDoS mechanism. It’s not a wall; it’s a gradient. Bad actors experience progressively worse performance until the platform becomes unusable for them, while legitimate users never notice.

Tools Can Require Tiers

Any MCP tool can declare a requiredTier:

```typescript
{
  name: "expensive_operation",
  requiredTier: "elite",
  // ...
}
```

Most tools have no tier requirement. But computationally expensive operations (image generation, code compilation) can require Pro or Elite access. This prevents a zero-reputation agent from burning through GPU credits.
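The gate itself is a rank comparison before dispatch. The tool shape matches the declaration above; the `TIER_RANK` table and function names are assumptions:

```typescript
type Tier = "free" | "pro" | "elite";

const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1, elite: 2 };

interface ToolDef {
  name: string;
  requiredTier?: Tier; // most tools omit this
}

function canCall(tool: ToolDef, callerTier: Tier): boolean {
  if (!tool.requiredTier) return true; // no gate declared: open to everyone
  return TIER_RANK[callerTier] >= TIER_RANK[tool.requiredTier];
}
```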

Feedback Tools Everywhere

Every MCP server in our platform now has a feedback tool:

| Server | Tool name |
| --- | --- |
| spike-land-mcp | mcp_feedback |
| mcp-image-studio | img_feedback |
| hackernews-mcp | hackernews_feedback |
| esbuild-wasm-mcp | esbuild_feedback |
| openclaw-mcp | openclaw_feedback |

When an agent encounters an error, it can report it through the closest feedback tool. The report flows to the central Bugbook in spike-edge, where it’s matched against existing bugs (by error code + service name) or creates a new entry.

For Cloudflare Workers, this uses service bindings — zero-latency, zero-cost inter-Worker communication. For Node.js MCP servers, it’s an HTTPS POST to the edge API.

The key insight: agents are the best bug reporters. They hit edge cases humans never would. They can describe the exact tool call, parameters, and error message. And they report immediately — no Jira ticket sitting in a backlog for three sprints.
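The matching step can be sketched as an upsert keyed on (service name, error code). Here an in-memory map stands in for the D1 table, and all names are assumptions:

```typescript
interface Report {
  service: string;
  errorCode: string;
  message: string;
}

interface BugEntry {
  key: string;
  reports: number; // independent report count (drives CANDIDATE → ACTIVE)
}

const bugbook = new Map<string, BugEntry>();

function ingest(report: Report): BugEntry {
  const key = `${report.service}:${report.errorCode}`;
  const existing = bugbook.get(key);
  if (existing) {
    existing.reports += 1; // matched an existing bug: one more report
    return existing;
  }
  const entry: BugEntry = { key, reports: 1 }; // no match: new CANDIDATE entry
  bugbook.set(key, entry);
  return entry;
}
```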

Centralized Error Collection

Every unhandled error in spike-edge now gets logged to D1 via waitUntil:

```typescript
app.onError((err, c) => {
  c.executionCtx.waitUntil(
    c.env.DB.prepare(
      "INSERT INTO error_logs (...) VALUES (...)"
    ).bind(/* ... */).run()
  );
  return c.json({ error: "Internal Server Error" }, 500);
});
```

This replaces console.error() with structured, queryable error data. Service name, error code, stack trace, request metadata — all searchable. The Bugbook API can correlate user reports with actual errors, closing the loop between “user says something is broken” and “here’s the stack trace.”

The Public Bugbook

The Bugbook is public at /bugbook. Anyone can see:

  • All active bugs, ranked by ELO
  • Bug detail pages with report history and ELO timeline
  • A leaderboard showing top bugs and top reporters

Users who report bugs can track their reports and see when bugs get fixed. This is deliberate transparency — if you report a bug, you shouldn’t have to wonder if anyone noticed.

Authenticated users can:

  • Submit new bug reports
  • Confirm existing bugs (“I have this bug too”)
  • View their own report history

Blog Comments with ELO Consequences

We also added comments to all blog articles. Logged-in users can comment at any point in an article, and other users can upvote or downvote comments.

Here’s where it gets interesting: if a comment accumulates a score of -10 or lower (overwhelmingly downvoted), the comment author receives an abuse_flag ELO event (-50 points). This is the community self-moderating — if you post spam or abuse in article comments, the community’s downvotes directly impact your platform reputation.
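That rule is a one-shot threshold check on the comment's net score. Threshold (-10) and penalty (-50) come from the prose; the `alreadyFlagged` guard and names are assumptions:

```typescript
const ABUSE_SCORE_THRESHOLD = -10;
const ABUSE_FLAG_DELTA = -50;

function eloDeltaForComment(score: number, alreadyFlagged: boolean): number {
  // Flag fires once per comment, when the net score crosses the threshold
  if (!alreadyFlagged && score <= ABUSE_SCORE_THRESHOLD) return ABUSE_FLAG_DELTA;
  return 0;
}
```

The once-only guard matters: without it, every additional downvote past -10 would re-apply the -50, turning a single bad comment into an unbounded penalty.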

Architecture Summary

                    ┌──────────────────┐
                    │   spike-app      │
                    │   /bugbook UI    │
                    └────────┬─────────┘
                             │ HTTPS
                    ┌────────▼─────────┐
                    │   spike-edge     │
                    │  Bugbook API     │
                    │  ELO Engine      │
                    │  Error Logs      │
                    │  Blog Comments   │
                    └──┬──────────┬────┘
          Service      │          │     HTTPS
          Binding      │          │
    ┌──────────────┐   │    ┌─────▼──────────┐
    │spike-land-mcp│◄──┘    │ Node.js MCP    │
    │ mcp_feedback │        │ servers        │
    │ ELO gating   │        │ *_feedback     │
    └──────────────┘        └────────────────┘

Everything runs on Cloudflare Workers + D1. No external databases, no Redis, no queues. The ELO engine is pure math — 40 lines adapted from our chess package. The entire system adds about 500 lines of business logic and 200 lines of SQL.

What We Learned

  1. ELO is a universal trust primitive. Any system where entities compete for relevance can use ELO. Bugs compete for attention. Users compete for trust. The math is the same.

  2. Agents are better bug reporters than humans. They report immediately, with full context, and they hit edge cases that manual testing misses.

  3. Reputation-based rate limiting is gentler than IP blocking. Instead of hard walls, bad actors experience degraded service. They can recover by behaving well. It’s a gradient, not a gate.

  4. Public bug tracking builds trust. When users can see that their reports lead to fixes, they report more bugs. When they can see the priority order (via ELO), they understand why some bugs get fixed before others.

  5. Feedback tools should be everywhere. If every MCP server has a feedback tool, then every agent interaction is a potential bug report. The coverage is automatic.


The code is open source. The Bugbook is live at spike.land/bugbook. If you find a bug, report it — your ELO will thank you.