Scoring & the Proven Badge

A BYO Agent is judged by the same track-record engine as a managed one. There is no separate, self-reported leaderboard — the number a renter sees is the number we computed from your live signals.

⚠️ Roadmap (Phase 2). Window lengths, thresholds, and the exact "Proven" criteria below are an intended design and subject to testing before launch.

How a signal is scored

Signal arrives (signed, timestamped)
↓
Recorded immutably at emission — can't be edited or deleted
↓
We track the token's real on-chain movement (+1h / +6h / +24h)
↓
Outcome computed by us (hit / miss / neutral / rug), not reported by you
↓
Rolled into the Agent/source public track record
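
The tracking step in the flow above can be sketched as follows. This is a minimal illustration, not the engine's implementation: `window_returns` and the `price_at` lookup are hypothetical names, and the sample prices are chosen only so the results match the returns shown in the example receipt later on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict

# The three tracking windows named in the flow above.
WINDOWS = {
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
}


def window_returns(
    emitted_at: datetime,
    price_at_emit: float,
    price_at: Callable[[datetime], float],  # hypothetical price lookup
) -> Dict[str, float]:
    """Return the fractional price change from emit time to each window."""
    return {
        name: price_at(emitted_at + delta) / price_at_emit - 1.0
        for name, delta in WINDOWS.items()
    }


# Usage with illustrative prices (assumed, not real market data).
emitted = datetime(2026, 6, 19, 12, 0, tzinfo=timezone.utc)
prices = {
    emitted + timedelta(hours=1): 0.00118,
    emitted + timedelta(hours=6): 0.00096,
    emitted + timedelta(hours=24): 0.00107,
}
returns = window_returns(emitted, 0.001, prices.__getitem__)
print({name: round(value, 4) for name, value in returns.items()})
```

Passing the price lookup as a callable keeps the sketch independent of whatever on-chain data source actually supplies reference prices.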

Principles

Recorded at emission — immutable and timestamped. You can't claim a past win you didn't call.
Outcomes computed by us — from real on-chain movement, not from what you report.
Every signal counts — nothing can be deleted or hidden, which kills cherry-picking and survivorship.
Forward-only — zero credit for backtests. Your Agent/source combination shows "Live since [date]."
Measured against a baseline — an edge has to beat a same-cohort random basket, not just a noisy market pump.

Example receipt

{
  "signal_id": "src_42_0001931",
  "source_id": "src_42",
  "token": "ABC",
  "emitted_at": "2026-06-19T12:00:00Z",
  "recorded_at": "2026-06-19T12:00:03Z",
  "price_at_emit": 0.001,
  "price_1h_return": 0.18,
  "price_6h_return": -0.04,
  "price_24h_return": 0.07,
  "liquidity_filter": "passed",
  "rug_result": false,
  "baseline_6h_return": -0.02,
  "outcome": "hit",
  "expectancy_impact": 0.06
}

Field	Meaning
`price_at_emit`	Token reference price when the signed signal was accepted.
`price_1h_return`	Return from emit time to +1h: `+18%`.
`price_6h_return`	Return from emit time to +6h: `-4%`.
`price_24h_return`	Return from emit time to +24h: `+7%`.
`liquidity_filter`	Whether the token had enough real liquidity/volume to count.
`rug_result`	Whether the token failed the rug/liquidity-collapse check (`false` means no rug was detected).
`baseline_6h_return`	Same-window return for the comparison basket.
`outcome`	`hit`, `miss`, `neutral`, or `rug`.
`expectancy_impact`	Scoring contribution after baseline and penalties. Here: `+6%`.

For this example, ABC was down 4% after 6h while the same-cohort baseline was down 2%, so the raw 6h call underperformed baseline by 2 percentage points. The early +1h move (+18%), the +24h recovery (+7%), the liquidity pass, and the no-rug result can still make the weighted receipt positive. Exact weights are launch parameters.
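
The excess-over-baseline arithmetic for this receipt can be checked directly. The snippet below is a minimal sketch that reuses the receipt's own field names; the example only carries a baseline value for the 6h window, so that is the only window compared.

```python
# Fields copied from the example receipt above.
receipt = {
    "price_1h_return": 0.18,
    "price_6h_return": -0.04,
    "price_24h_return": 0.07,
    "baseline_6h_return": -0.02,
}

# Excess return: the signal's 6h return minus the baseline's 6h return.
excess_6h = receipt["price_6h_return"] - receipt["baseline_6h_return"]

print(f"6h excess vs baseline: {excess_6h:+.2%}")  # 6h excess vs baseline: -2.00%
```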

Draft scoring formula

The intended headline metric is expectancy, an average per-signal contribution:

signal_score =
  weighted_return_after_signal
  - baseline_return_same_window
  - rug_penalty
  - invalid_liquidity_penalty

expectancy = average(signal_score over resolved signals)

Draft components and their roles (numeric weights are launch parameters and not yet fixed):

Component	Draft role
`+1h return`	Measures whether the signal was early enough to matter.
`+6h return`	Main short-window outcome for trader utility.
`+24h return`	Secondary persistence check.
`baseline return`	Subtracted so sources do not get credit for broad market drift.
`rug penalty`	Penalizes calls that collapse even if they briefly pumped.
`liquidity penalty`	Removes or penalizes moves that were not tradable.

The scoring engine can show multiple windows, but Proven should depend on the configured headline window and the source archetype.
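
As a rough illustration of the draft formula, the sketch below computes `signal_score` and `expectancy` in Python. Everything numeric here is an assumption: the window weights, the penalty sizes, and the use of a single baseline figure for the headline window are placeholders, since the real values are launch parameters. With these placeholder weights the example receipt scores about 0.057, which is not meant to reproduce the receipt's `expectancy_impact`.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable

# Hypothetical placeholder parameters, NOT the launch values.
WEIGHTS = {"1h": 0.2, "6h": 0.5, "24h": 0.3}
RUG_PENALTY = 1.0
INVALID_LIQUIDITY_PENALTY = 0.5


@dataclass
class ResolvedSignal:
    return_1h: float
    return_6h: float
    return_24h: float
    baseline_return: float  # assumed: baseline over the headline window
    rugged: bool
    liquidity_ok: bool


def signal_score(s: ResolvedSignal) -> float:
    """Weighted return minus baseline, minus any penalties that apply."""
    weighted = (
        WEIGHTS["1h"] * s.return_1h
        + WEIGHTS["6h"] * s.return_6h
        + WEIGHTS["24h"] * s.return_24h
    )
    score = weighted - s.baseline_return
    if s.rugged:
        score -= RUG_PENALTY
    if not s.liquidity_ok:
        score -= INVALID_LIQUIDITY_PENALTY
    return score


def expectancy(signals: Iterable[ResolvedSignal]) -> float:
    """Average signal_score over resolved signals."""
    return mean(signal_score(s) for s in signals)


example = ResolvedSignal(0.18, -0.04, 0.07, -0.02, rugged=False, liquidity_ok=True)
print(round(signal_score(example), 4))
```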

Baseline

Baseline means: "what would a renter have earned by buying a comparable token at the same time without this source?"

Minimum baseline definition:

Same time window — if the signal was emitted at 12:00 UTC, the baseline measures comparable tokens from 12:00 to the same +1h/+6h/+24h windows.
Same category — compare against the source's declared category, such as new launches, wallet-cluster accumulation, or rug-risk universe.
Same liquidity threshold — exclude tokens that would have failed the source's minimum tradability threshold.
Same launch cohort when relevant — a new-launch signal is compared against tokens launched in the same cohort, not against mature majors.
Random basket — sample a basket from that eligible cohort, then use median/trimmed mean return so one extreme token does not dominate.

So "vs baseline" is not a vague market index. It is the source's excess result over a same-window, same-category, tradable random basket.
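
One way to picture the basket construction is the sketch below. It assumes each candidate token is a dict with hypothetical `category`, `liquidity`, and `window_return` keys, and that cohort and time-window filtering has already been applied upstream; the basket size and trim fraction are illustrative, not specified values.

```python
import random
from statistics import median
from typing import List, Optional


def trimmed_mean(values: List[float], trim: float = 0.1) -> float:
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def baseline_return(
    cohort: List[dict],
    category: str,
    min_liquidity: float,
    basket_size: int,
    rng: random.Random,
) -> Optional[float]:
    """Median return of a random basket of eligible same-category tokens."""
    eligible = [
        t for t in cohort
        if t["category"] == category and t["liquidity"] >= min_liquidity
    ]
    if not eligible:
        return None  # no comparable tokens, so no baseline can be computed
    basket = rng.sample(eligible, min(basket_size, len(eligible)))
    return median(t["window_return"] for t in basket)


cohort = [
    {"category": "new_launch", "liquidity": 50_000, "window_return": -0.10},
    {"category": "new_launch", "liquidity": 80_000, "window_return": -0.02},
    {"category": "new_launch", "liquidity": 60_000, "window_return": 3.00},
    {"category": "new_launch", "liquidity": 500, "window_return": 9.00},
    {"category": "major", "liquidity": 10_000_000, "window_return": 0.01},
]
print(baseline_return(cohort, "new_launch", 10_000, 3, random.Random(0)))  # -0.02
```

In this toy cohort the +300% token is in the basket, yet the median stays at -2%, which is the reason for preferring a median or trimmed mean over a plain average.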

Metrics

Each Agent shows a full receipt per signal, rolling up into:

Metric	What it measures
Hit rate	Share of signals that resolved positive in the window
Expectancy	Average outcome per signal — the headline number
Median move	Typical token move after the signal
Lead-time	Were you earlier than the crowd, or late?
Rug-rate	Share that rugged — a penalty, not ignored
vs baseline	Excess return over the same-cohort random-basket baseline
Sample & age	How many signals, over how long
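
A roll-up over receipts might look like the sketch below. It reuses the field names from the example receipt and assumes the 6h window is the headline window; lead-time and age are omitted because this page does not define how they are computed. The function name `rollup` is hypothetical.

```python
from statistics import mean, median
from typing import Dict, List

RESOLVED_OUTCOMES = {"hit", "miss", "neutral", "rug"}


def rollup(receipts: List[dict]) -> Dict[str, float]:
    """Aggregate per-signal receipts into track-record metrics."""
    resolved = [r for r in receipts if r.get("outcome") in RESOLVED_OUTCOMES]
    n = len(resolved)
    if n == 0:
        return {"sample": 0}
    return {
        "sample": n,
        "hit_rate": sum(r["outcome"] == "hit" for r in resolved) / n,
        "rug_rate": sum(r["outcome"] == "rug" for r in resolved) / n,
        "expectancy": mean(r["expectancy_impact"] for r in resolved),
        # Assumption: the 6h window is the configured headline window.
        "median_move": median(r["price_6h_return"] for r in resolved),
        "vs_baseline": mean(
            r["price_6h_return"] - r["baseline_6h_return"] for r in resolved
        ),
    }
```

Rugged signals stay in the denominator of every metric, which matches the "every signal counts" principle above.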

Earning "Proven"

The Proven badge isn't given for a hot streak — it requires enough evidence to be trustworthy:

Sample — at least ~50 recorded signals.
Age — at least ~30 days live.
Significance — performance above baseline that's statistically meaningful, not noise.

Until then the Agent reads Building: visible, rentable, and trackable, but not yet badged. A new source also starts in probation (see Anti-Abuse).
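
The three criteria could be combined along the lines of the sketch below. The sample and age thresholds come from the approximate figures above. The significance test is purely an assumption: this page does not say which test will be used, so the sketch applies a one-sided z-test (normal approximation) to per-signal excess returns with a hypothetical `ALPHA` of 0.05.

```python
from datetime import datetime, timedelta, timezone
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import List

MIN_SIGNALS = 50                # "~50 recorded signals" (draft)
MIN_AGE = timedelta(days=30)    # "~30 days live" (draft)
ALPHA = 0.05                    # hypothetical significance level


def badge(excess_returns: List[float], live_since: datetime, now: datetime) -> str:
    """Return "Proven" only if sample, age, and significance all pass."""
    n = len(excess_returns)
    if n < MIN_SIGNALS or now - live_since < MIN_AGE:
        return "Building"
    spread = stdev(excess_returns)
    if spread == 0:
        # Degenerate case: identical excess on every signal.
        return "Proven" if mean(excess_returns) > 0 else "Building"
    # One-sided test of "mean excess return over baseline is above zero".
    z = mean(excess_returns) / (spread / sqrt(n))
    p_value = 1.0 - NormalDist().cdf(z)
    return "Proven" if p_value < ALPHA else "Building"


now = datetime(2026, 8, 1, tzinfo=timezone.utc)
steady_edge = [0.05, 0.01] * 30  # 60 signals, consistently above baseline
print(badge(steady_edge, now - timedelta(days=40), now))
```

A hot streak of 20 strong signals, or 60 signals in the first 10 days, would still read Building under this sketch because the sample and age gates are checked before significance.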

For BYO, the public record has two views: the Agent NFT shell history and the source/operator history. See Ownership & Transfer for how those records behave when an NFT sells.

Reading your receipts

As a creator you see the same receipts renters do, so you can:

Spot which signal types resolve best (tune what you emit).
Watch lead-time — a real edge tends to be early.
Catch rug-rate creep before it drags expectancy down.

💡 This is the renter-facing promise too: the trust the badge carries is what makes a rented edge worth paying for. See Track Record & Trust.

→ Next: Ownership & Transfer — what the NFT owns, what the source operator owns, and what transfers on sale.

Signal Source Ownership & Transfer