Scoring & the Proven Badge
A BYO Agent is judged by the same track-record engine as a managed one. There is no separate, self-reported leaderboard — the number a renter sees is the number we computed from your live signals.
How a signal is scored
Signal arrives (signed, timestamped)
↓
Recorded immutably at emission — can't be edited or deleted
↓
We track the token's real on-chain movement (+1h / +6h / +24h)
↓
Outcome computed by us (hit / miss / rug), not reported by you
↓
Rolled into the Agent/source public track recordPrinciples
- Recorded at emission — immutable and timestamped. You can't claim a past win you didn't call.
- Outcomes computed by us — from real on-chain movement, not from what you report.
- Every signal counts — nothing can be deleted or hidden, which kills cherry-picking and survivorship.
- Forward-only — zero credit for backtests. Your Agent/source combination shows "Live since [date]."
- Measured against a baseline — an edge has to beat a same-cohort random basket, not just a noisy market pump.
Example receipt
{
"signal_id": "src_42_0001931",
"source_id": "src_42",
"token": "ABC",
"emitted_at": "2026-06-19T12:00:00Z",
"recorded_at": "2026-06-19T12:00:03Z",
"price_at_emit": 0.001,
"price_1h_return": 0.18,
"price_6h_return": -0.04,
"price_24h_return": 0.07,
"liquidity_filter": "passed",
"rug_result": false,
"baseline_6h_return": -0.02,
"outcome": "hit",
"expectancy_impact": 0.06
}| Field | Meaning |
|---|---|
price_at_emit | Token reference price when the signed signal was accepted. |
price_1h_return | Return from emit time to +1h: +18%. |
price_6h_return | Return from emit time to +6h: -4%. |
price_24h_return | Return from emit time to +24h: +7%. |
liquidity_filter | Whether the token had enough real liquidity/volume to count. |
rug_result | Whether the token failed the rug/liquidity-collapse check. |
baseline_6h_return | Same-window return for the comparison basket. |
outcome | hit, miss, neutral, or rug. |
expectancy_impact | Scoring contribution after baseline and penalties. Here: +6%. |
For this example, ABC was down -4% after 6h, but the same-cohort baseline was down -2%. The raw 6h call underperformed baseline by -2%, while the +24h recovery, early +1h move, liquidity pass, and no-rug result can still make the weighted receipt positive. Exact weights are launch parameters.
Draft scoring formula
The intended headline metric is expectancy, an average per-signal contribution:
signal_score =
weighted_return_after_signal
- baseline_return_same_window
- rug_penalty
- invalid_liquidity_penalty
expectancy = average(signal_score over resolved signals)Default draft weights:
| Component | Draft role |
|---|---|
+1h return | Measures whether the signal was early enough to matter. |
+6h return | Main short-window outcome for trader utility. |
+24h return | Secondary persistence check. |
baseline return | Subtracted so sources do not get credit for broad market drift. |
rug penalty | Penalizes calls that collapse even if they briefly pumped. |
liquidity penalty | Removes or penalizes moves that were not tradable. |
The scoring engine can show multiple windows, but Proven should depend on the configured headline window and the source archetype.
Baseline
Baseline means: "what would a renter have earned by buying a comparable token at the same time without this source?"
Minimum baseline definition:
- Same time window — if the signal was emitted at 12:00 UTC, the baseline measures comparable tokens from 12:00 to the same +1h/+6h/+24h windows.
- Same category — compare against the source's declared category, such as new launches, wallet-cluster accumulation, or rug-risk universe.
- Same liquidity threshold — exclude tokens that would have failed the source's minimum tradability threshold.
- Same launch cohort when relevant — a new-launch signal is compared against tokens launched in the same cohort, not against mature majors.
- Random basket — sample a basket from that eligible cohort, then use median/trimmed mean return so one extreme token does not dominate.
So "vs baseline" is not a vague market index. It is the source's excess result over a same-window, same-category, tradable random basket.
Metrics
Each Agent shows a full receipt per signal, rolling up into:
| Metric | What it measures |
|---|---|
| Hit rate | Share of signals that resolved positive in the window |
| Expectancy | Average outcome per signal — the headline number |
| Median move | Typical token move after the signal |
| Lead-time | Were you earlier than the crowd, or late? |
| Rug-rate | Share that rugged — a penalty, not ignored |
| vs baseline | Excess return over the random/market benchmark |
| Sample & age | How many signals, over how long |
Earning "Proven"
The Proven badge isn't given for a hot streak — it requires enough evidence to be trustworthy:
- Sample — at least ~50 recorded signals.
- Age — at least ~30 days live.
- Significance — performance above baseline that's statistically meaningful, not noise.
Until then the Agent reads Building — visible and rentable-track-able, but not yet badged. A new source also starts in probation (see Anti-Abuse).
For BYO, the public record has two views: the Agent NFT shell history and the source/operator history. See Ownership & Transfer for how those records behave when an NFT sells.
Reading your receipts
As a creator you see the same receipts renters do, so you can:
- Spot which signal types resolve best (tune what you emit).
- Watch lead-time — a real edge tends to be early.
- Catch rug-rate creep before it drags expectancy down.
→ Next: Ownership & Transfer — what the NFT owns, what the source operator owns, and what transfers on sale.