Scoring & the Proven Badge

A BYO Agent is judged by the same track-record engine as a managed one. There is no separate, self-reported leaderboard — the number a renter sees is the number we computed from your live signals.

⚠️ Roadmap (Phase 2). Window lengths, thresholds, and the exact "Proven" criteria below describe an intended design and are subject to testing before launch.

How a signal is scored

  1. Signal arrives (signed, timestamped).
  2. Recorded immutably at emission; it can't be edited or deleted.
  3. We track the token's real on-chain movement (+1h / +6h / +24h).
  4. Outcome computed by us (hit / miss / neutral / rug), not reported by you.
  5. Rolled into the Agent/source public track record.

Principles

  • Recorded at emission — immutable and timestamped. You can't claim a past win you didn't call.
  • Outcomes computed by us — from real on-chain movement, not from what you report.
  • Every signal counts — nothing can be deleted or hidden, which kills cherry-picking and survivorship.
  • Forward-only — zero credit for backtests. Your Agent/source combination shows "Live since [date]."
  • Measured against a baseline — an edge has to beat a same-cohort random basket, not just a noisy market pump.

Example receipt

{
  "signal_id": "src_42_0001931",
  "source_id": "src_42",
  "token": "ABC",
  "emitted_at": "2026-06-19T12:00:00Z",
  "recorded_at": "2026-06-19T12:00:03Z",
  "price_at_emit": 0.001,
  "price_1h_return": 0.18,
  "price_6h_return": -0.04,
  "price_24h_return": 0.07,
  "liquidity_filter": "passed",
  "rug_result": false,
  "baseline_6h_return": -0.02,
  "outcome": "hit",
  "expectancy_impact": 0.06
}

Field              | Meaning
price_at_emit      | Token reference price when the signed signal was accepted.
price_1h_return    | Return from emit time to +1h: +18%.
price_6h_return    | Return from emit time to +6h: -4%.
price_24h_return   | Return from emit time to +24h: +7%.
liquidity_filter   | Whether the token had enough real liquidity/volume to count.
rug_result         | Whether the token failed the rug/liquidity-collapse check.
baseline_6h_return | Same-window return for the comparison basket.
outcome            | One of hit, miss, neutral, or rug.
expectancy_impact  | Scoring contribution after baseline and penalties. Here: +6%.
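
The return fields above can be read as simple returns measured from the emit-time reference price. The sketch below is illustrative only: `window_return` is a hypothetical helper, and the observed prices are assumed values chosen so the results match the example receipt, not real market data.

```python
from datetime import datetime, timedelta, timezone


def window_return(price_at_emit: float, price_at_window: float) -> float:
    """Simple return from the emit-time reference price (hypothetical helper)."""
    if price_at_emit <= 0:
        raise ValueError("price_at_emit must be positive")
    return (price_at_window - price_at_emit) / price_at_emit


emitted_at = datetime(2026, 6, 19, 12, 0, tzinfo=timezone.utc)
windows = {
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
}

# When each window closes, measured from the emit timestamp.
checkpoints = {name: emitted_at + delta for name, delta in windows.items()}

# Assumed prices at each checkpoint, picked to reproduce the example receipt.
price_at_emit = 0.001
observed = {"1h": 0.00118, "6h": 0.00096, "24h": 0.00107}

# Rounded to 4 decimals to hide floating-point noise.
returns = {
    name: round(window_return(price_at_emit, price), 4)
    for name, price in observed.items()
}
print(returns)
```

Every window is measured from the same emit-time price, so the three returns are directly comparable and cannot be shifted by choosing a more favourable starting point.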

In this example, ABC was down 4% after 6h while the same-cohort baseline was down 2%, so the raw 6h call underperformed the baseline by 2 percentage points. The early +1h move, the +24h recovery, the liquidity pass, and the no-rug result can still make the weighted receipt positive. Exact weights are launch parameters.

Draft scoring formula

The intended headline metric is expectancy, an average per-signal contribution:

signal_score =
  weighted_return_after_signal
  - baseline_return_same_window
  - rug_penalty
  - invalid_liquidity_penalty

expectancy = average(signal_score over resolved signals)
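
A minimal Python sketch of the draft formula follows. The weights, penalty sizes, and the 1h/24h baseline figures are placeholders invented for illustration (the real values are launch parameters), and applying the same window weights to the baseline is also an assumption. The result is therefore not expected to reproduce the receipt's `expectancy_impact` of 0.06.

```python
from dataclasses import dataclass

# Placeholder values only; the real weights and penalties are launch parameters.
WEIGHTS = {"1h": 0.25, "6h": 0.50, "24h": 0.25}
RUG_PENALTY = 1.0
INVALID_LIQUIDITY_PENALTY = 0.5


@dataclass(frozen=True)
class ResolvedSignal:
    returns: dict[str, float]    # token return per window
    baseline: dict[str, float]   # baseline basket return per window
    rugged: bool = False
    liquidity_ok: bool = True


def weighted(values: dict[str, float]) -> float:
    """Blend per-window values using the placeholder weights."""
    return sum(WEIGHTS[w] * values[w] for w in WEIGHTS)


def signal_score(s: ResolvedSignal) -> float:
    """weighted return - baseline - rug penalty - invalid-liquidity penalty."""
    score = weighted(s.returns) - weighted(s.baseline)
    if s.rugged:
        score -= RUG_PENALTY
    if not s.liquidity_ok:
        score -= INVALID_LIQUIDITY_PENALTY
    return score


def expectancy(signals: list[ResolvedSignal]) -> float:
    """Average signal_score over resolved signals."""
    if not signals:
        raise ValueError("expectancy needs at least one resolved signal")
    return sum(signal_score(s) for s in signals) / len(signals)


# The receipt's returns with its 6h baseline; 1h and 24h baselines are invented.
example = ResolvedSignal(
    returns={"1h": 0.18, "6h": -0.04, "24h": 0.07},
    baseline={"1h": 0.02, "6h": -0.02, "24h": 0.01},
)
# A hypothetical call that rugged.
rugged = ResolvedSignal(
    returns={"1h": -0.90, "6h": -0.90, "24h": -0.90},
    baseline={"1h": 0.0, "6h": 0.0, "24h": 0.0},
    rugged=True,
)

print(signal_score(example))
print(expectancy([example, rugged]))
```

Because expectancy is a plain average over every resolved signal, a single rugged call drags the headline number down in proportion to its penalty and cannot be dropped from the sample.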

Draft components and their roles (numeric weights are launch parameters):

Component         | Draft role
+1h return        | Measures whether the signal was early enough to matter.
+6h return        | Main short-window outcome for trader utility.
+24h return       | Secondary persistence check.
baseline return   | Subtracted so sources do not get credit for broad market drift.
rug penalty       | Penalizes calls that collapse even if they briefly pumped.
liquidity penalty | Removes or penalizes moves that were not tradable.

The scoring engine can show multiple windows, but Proven should depend on the configured headline window and the source archetype.

Baseline

Baseline means: "what would a renter have earned by buying a comparable token at the same time without this source?"

Minimum baseline definition:

  • Same time window — if the signal was emitted at 12:00 UTC, the baseline measures comparable tokens from 12:00 to the same +1h/+6h/+24h windows.
  • Same category — compare against the source's declared category, such as new launches, wallet-cluster accumulation, or rug-risk universe.
  • Same liquidity threshold — exclude tokens that would have failed the source's minimum tradability threshold.
  • Same launch cohort when relevant — a new-launch signal is compared against tokens launched in the same cohort, not against mature majors.
  • Random basket — sample a basket from that eligible cohort, then use median/trimmed mean return so one extreme token does not dominate.

So "vs baseline" is not a vague market index. It is the source's excess result over a same-window, same-category, tradable random basket.
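
One way to picture the baseline definition is the sketch below. The token dictionary keys (`category`, `liquidity_usd`, `launch_cohort`, `window_return`), the basket size, and the seeded sampling are all assumptions made for illustration; the source does not specify how the basket is drawn.

```python
import random
from statistics import median


def trimmed_mean(values: list[float], trim_fraction: float = 0.1) -> float:
    """Mean after dropping the lowest and highest trim_fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def baseline_basket(cohort, *, category, min_liquidity_usd,
                    launch_cohort=None, basket_size=20, seed=0):
    """Sample a random basket from tokens in the same category and launch
    cohort that pass the liquidity threshold. `cohort` is assumed to already
    hold returns measured over the signal's own time window."""
    eligible = [
        t for t in cohort
        if t["category"] == category
        and t["liquidity_usd"] >= min_liquidity_usd
        and (launch_cohort is None or t["launch_cohort"] == launch_cohort)
    ]
    if not eligible:
        raise ValueError("no eligible tokens for this baseline")
    rng = random.Random(seed)  # seeded so the basket can be re-derived later
    return rng.sample(eligible, min(basket_size, len(eligible)))


# Hypothetical cohort: five eligible tokens, one illiquid, one wrong category.
cohort = [
    {"category": "new_launch", "liquidity_usd": 50_000, "launch_cohort": "2026-06-19", "window_return": -0.10},
    {"category": "new_launch", "liquidity_usd": 80_000, "launch_cohort": "2026-06-19", "window_return": -0.02},
    {"category": "new_launch", "liquidity_usd": 60_000, "launch_cohort": "2026-06-19", "window_return": 0.00},
    {"category": "new_launch", "liquidity_usd": 90_000, "launch_cohort": "2026-06-19", "window_return": 0.03},
    {"category": "new_launch", "liquidity_usd": 70_000, "launch_cohort": "2026-06-19", "window_return": 5.00},
    {"category": "new_launch", "liquidity_usd": 1_000, "launch_cohort": "2026-06-19", "window_return": 9.00},
    {"category": "majors", "liquidity_usd": 9_000_000, "launch_cohort": "2020-01-01", "window_return": 0.01},
]

basket = baseline_basket(cohort, category="new_launch",
                         min_liquidity_usd=25_000, launch_cohort="2026-06-19")
basket_returns = [t["window_return"] for t in basket]

baseline_median = median(basket_returns)
baseline_trimmed = trimmed_mean(basket_returns, trim_fraction=0.2)
print(baseline_median, baseline_trimmed)
```

In this toy cohort the +500% outlier would pull a plain mean far above what a typical basket token returned, which is why the definition calls for a median or trimmed mean.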

Metrics

Each Agent shows a full receipt per signal, rolling up into:

Metric       | What it measures
Hit rate     | Share of signals that resolved positive in the window.
Expectancy   | Average outcome per signal (the headline number).
Median move  | Typical token move after the signal.
Lead-time    | Whether you were earlier than the crowd, or late.
Rug-rate     | Share of signals that rugged; counted as a penalty, not ignored.
vs baseline  | Excess return over the same-cohort random basket.
Sample & age | How many signals, over how long.
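
As a rough illustration of the roll-up, the sketch below aggregates receipts shaped like the example receipt. It assumes the +6h window is the headline window and that every receipt passed in is already resolved. Lead-time is left out because the source does not define how it is measured. The sample receipts are invented.

```python
from datetime import datetime, timezone
from statistics import mean, median


def parse_utc(ts: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def rollup(receipts: list[dict], as_of: datetime) -> dict:
    """Aggregate resolved receipts into public track-record metrics."""
    if not receipts:
        raise ValueError("no resolved receipts to roll up")
    n = len(receipts)
    first_emit = min(parse_utc(r["emitted_at"]) for r in receipts)
    return {
        "hit_rate": sum(r["outcome"] == "hit" for r in receipts) / n,
        "rug_rate": sum(r["outcome"] == "rug" for r in receipts) / n,
        "expectancy": mean(r["expectancy_impact"] for r in receipts),
        "median_move": median(r["price_6h_return"] for r in receipts),
        "vs_baseline": mean(
            r["price_6h_return"] - r["baseline_6h_return"] for r in receipts
        ),
        "sample": n,
        "age_days": (as_of - first_emit).days,
    }


# Invented receipts; only the first mirrors the example receipt above.
receipts = [
    {"emitted_at": "2026-06-19T12:00:00Z", "outcome": "hit",
     "price_6h_return": -0.04, "baseline_6h_return": -0.02, "expectancy_impact": 0.06},
    {"emitted_at": "2026-06-21T09:30:00Z", "outcome": "hit",
     "price_6h_return": 0.12, "baseline_6h_return": 0.02, "expectancy_impact": 0.10},
    {"emitted_at": "2026-06-25T18:00:00Z", "outcome": "miss",
     "price_6h_return": -0.06, "baseline_6h_return": -0.01, "expectancy_impact": -0.05},
    {"emitted_at": "2026-07-02T03:15:00Z", "outcome": "rug",
     "price_6h_return": -0.90, "baseline_6h_return": 0.00, "expectancy_impact": -1.00},
]

summary = rollup(receipts, as_of=datetime(2026, 7, 19, 12, 0, tzinfo=timezone.utc))
print(summary)
```

With these invented receipts the hit rate is 50%, yet expectancy is negative because of the single rug, which shows why hit rate alone is not the headline number.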

Earning "Proven"

The Proven badge isn't given for a hot streak — it requires enough evidence to be trustworthy:

  • Sample — at least ~50 recorded signals.
  • Age — at least ~30 days live.
  • Significance — performance above baseline that's statistically meaningful, not noise.

Until then the Agent reads Building: visible, rentable, and trackable, but not yet badged. A new source also starts in probation (see Anti-Abuse).
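
The three criteria can be sketched as a single gate. The source gives only approximate thresholds (~50 signals, ~30 days) and does not name a significance test, so the one-sided t-statistic and its cutoff below are assumptions standing in for whatever test is chosen at launch. Probation is not modelled.

```python
from math import sqrt
from statistics import mean, stdev

MIN_SIGNALS = 50     # draft threshold from the text (~50)
MIN_DAYS_LIVE = 30   # draft threshold from the text (~30)
MIN_T_STAT = 1.645   # assumed cutoff (about 5% one-sided, normal approximation)


def badge(excess_returns: list[float], days_live: int) -> str:
    """Return "Proven" or "Building" from per-signal excess-over-baseline returns."""
    n = len(excess_returns)
    if n < MIN_SIGNALS or days_live < MIN_DAYS_LIVE:
        return "Building"
    avg = mean(excess_returns)
    spread = stdev(excess_returns)
    if spread == 0:
        # Every signal had the same excess; only its sign matters.
        return "Proven" if avg > 0 else "Building"
    t_stat = avg / (spread / sqrt(n))
    return "Proven" if t_stat >= MIN_T_STAT else "Building"


# Invented samples for illustration.
steady_edge = [0.05, 0.01] * 30    # 60 signals, consistently above baseline
coin_flip = [0.05, -0.05] * 30     # 60 signals, no edge on average

print(badge(steady_edge, days_live=45))
print(badge(steady_edge, days_live=10))
print(badge(coin_flip, days_live=45))
print(badge(steady_edge[:10], days_live=45))
```

Only the first call clears all three gates. The others stay at Building because the source is too young, shows no edge over baseline, or has too few signals.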

For BYO, the public record has two views: the Agent NFT shell history and the source/operator history. See Ownership & Transfer for how those records behave when an NFT sells.

Reading your receipts

As a creator you see the same receipts renters do, so you can:

  • Spot which signal types resolve best (tune what you emit).
  • Watch lead-time — a real edge tends to be early.
  • Catch rug-rate creep before it drags expectancy down.
💡 This is the renter-facing promise too: the trust the badge carries is exactly what makes a rented edge worth paying for. See Track Record & Trust.

→ Next: Ownership & Transfer — what the NFT owns, what the source operator owns, and what transfers on sale.