Wallet Scoring Methodology — How We Rank Traders

A transparent breakdown of every metric the leaderboard uses, why we weight them the way we do, and how we filter out lucky wallets.

Last reviewed · Maria Ostrowski, Poly Syncer

Poly Syncer ranks Polymarket wallets with a composite score that blends 30-day risk-adjusted return (Sharpe), realized ROI, hit rate versus implied odds, and maximum drawdown. It is the smart money tracker behind our Polymarket leaderboard, surfacing the top Polymarket traders in real time. The score is recomputed every 15 minutes across 1,891–2,492 indexed wallets, with Hampel/MAD outlier filters and a 30-trade minimum sample to keep small-N flukes off the leaderboard. This page is the plain-English version; the formal spec lives on /methodology.

Why a composite score at all?

Any single metric is gameable. ROI alone rewards leverage and luck. Win-rate alone rewards trading $0.95 favorites that pay $0.01 of edge. Sharpe alone undervalues the trader who hits one massive correct call per quarter. The composite is built precisely because the failure modes of each metric are different — and a wallet has to look healthy on all of them to climb the leaderboard.

Concretely, the Poly Syncer composite is:

Score = 0.40 * z(Sharpe_30d)
      + 0.25 * z(ROI_30d)
      + 0.20 * z(EdgeAdjustedHitRate)
      - 0.15 * z(MaxDrawdown_30d)

Each input is z-scored against the same 30-day cohort so the weights operate on comparable scales. Wallets with fewer than 30 trades in the window are excluded. Wallets with a single position > 35% of bankroll in the window are flagged but not removed — concentration is a fact about the trader you should know.
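
As a sketch, the composite can be computed like this. Field names (`addr`, `sharpe`, `roi`, `edge_hr`, `max_dd`, `trades`) are illustrative, not the production schema; drawdown is stored as a positive depth, so its negative weight penalizes it:

```python
from statistics import mean, stdev

# Weights from the formula above. max_dd carries the -0.15 weight.
WEIGHTS = {"sharpe": 0.40, "roi": 0.25, "edge_hr": 0.20, "max_dd": -0.15}

def zscores(values):
    """z-score a list of metric values against its own cohort."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def composite(cohort):
    """Return {addr: score} for wallets meeting the 30-trade minimum."""
    eligible = [w for w in cohort if w["trades"] >= 30]
    scores = [0.0] * len(eligible)
    for key, weight in WEIGHTS.items():
        for i, z in enumerate(zscores([w[key] for w in eligible])):
            scores[i] += weight * z
    return {w["addr"]: s for w, s in zip(eligible, scores)}
```

Because every z-scored column sums to zero, the composite is centered on the cohort: a positive score means above-average on the weighted blend.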

1. Sharpe ratio for prediction markets

The Sharpe ratio is a return-per-unit-of-volatility measure: it asks "how much return did this wallet get for the swings it endured?" In equities, returns are computed daily on a continuous price series. In Polymarket the series is jaggier — a wallet might fill three trades on Tuesday and zero on Wednesday — so we adapt the calculation to irregular, trade-level return series; the exact adjustments are in the formal spec on /methodology.

A 30-day Sharpe above 2.0 is good in this universe. Above 3.0 is rare and usually concentrated in a small number of categories. We treat anything above 5.0 with strong skepticism and apply additional sample-size and concentration checks before showing it on the public leaderboard.
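
The formal spec defines the exact adaptation; a minimal sketch of one common approach on irregular series (per-trade returns with a √N scaling so more-active wallets are comparable) looks like this:

```python
from math import sqrt
from statistics import mean, stdev

def sharpe_30d(trade_returns, risk_free=0.0):
    """Per-trade Sharpe over a 30-day window: mean excess return over
    return volatility, scaled by sqrt(N). The sqrt(N) scaling is one
    conventional choice, not necessarily the production formula."""
    if len(trade_returns) < 30:      # 30-trade minimum sample
        return None
    excess = [r - risk_free for r in trade_returns]
    vol = stdev(excess)
    if vol == 0:
        return None                  # no swings, no meaningful ratio
    return mean(excess) / vol * sqrt(len(excess))
```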

2. Window selection: why 30 days, with 90-day context

We default to a 30-day rolling window because it captures the recent regime and is long enough that a single lucky day cannot dominate the score. We also publish 7-day and 90-day Sharpes alongside it for shorter- and longer-horizon context.

For copy-trade decisions we recommend filtering on 90-day Sharpe ≥ 1.6 and 30-day Sharpe ≥ 2.0 simultaneously. The combination is a meaningful screen.
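
The recommended screen can be expressed directly; `sharpe_90d` and `sharpe_30d` are assumed field names for illustration, not the API's actual schema:

```python
def copy_candidates(wallets):
    """Wallets passing both the 90-day (>= 1.6) and 30-day (>= 2.0)
    Sharpe thresholds recommended for copy-trade decisions."""
    return [w["addr"] for w in wallets
            if w["sharpe_90d"] >= 1.6 and w["sharpe_30d"] >= 2.0]
```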

3. Outlier detection (Hampel filter / MAD)

A single market resolution can produce a return observation that is 10× the wallet's typical trade. If that one observation drives the Sharpe and the ROI, the wallet's score is brittle. We run a Hampel-style filter based on the median absolute deviation (MAD) on each wallet's return series: observations far from the median, measured in scaled MADs, are winsorized back toward it before Sharpe and ROI are computed.

This is the single most important filter we run. About 12% of wallets that look elite by raw 30-day Sharpe collapse below the threshold under winsorization. Those wallets are not banned — you can still copy them — but the badge tells you what you're looking at.
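
A minimal version of the winsorizing MAD filter, assuming the conventional k = 3 cutoff and the 1.4826 normal-consistency constant (the production threshold lives in the spec):

```python
from statistics import median

def hampel_winsorize(returns, k=3.0):
    """Clip return observations more than k scaled-MADs from the median."""
    med = median(returns)
    mad = median(abs(r - med) for r in returns)
    sigma = 1.4826 * mad                  # MAD -> stdev under normality
    lo, hi = med - k * sigma, med + k * sigma
    return [min(max(r, lo), hi) for r in returns]
```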

4. The win-rate caveat

"Win-rate" on Polymarket is misleading because the contract universe is binary. If you only ever bet on contracts trading at $0.90, you will win ~90% of the time and earn roughly 0% edge. Raw win-rate is therefore not in the composite at all. What we use instead is edge-adjusted hit rate:

EdgeAdjustedHitRate = (ActualWins / Trades) − AvgImpliedProb

If a wallet bets on contracts whose average entry price is $0.42 and wins 51% of the time, the edge-adjusted hit rate is +0.09 — that is the genuinely informative number. A wallet with a 73% raw win rate but average entry of $0.74 has edge-adjusted hit rate −0.01, i.e., is approximately fair-priced and getting paid for nothing.
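
The formula translates directly to code; here each trade is a hypothetical `(entry_price, won)` pair, with the entry price on a binary contract standing in for the implied probability:

```python
def edge_adjusted_hit_rate(trades):
    """trades: list of (entry_price, won) pairs. Returns the actual
    win rate minus the average implied probability at entry."""
    wins = sum(1 for _, won in trades if won)
    avg_implied = sum(price for price, _ in trades) / len(trades)
    return wins / len(trades) - avg_implied
```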

5. Drawdown and recovery

Maximum drawdown is the deepest peak-to-trough decline in equity over the window. It is the drawdown statistic that enters the composite, at −0.15 weight alongside the return and hit-rate terms.
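
Maximum drawdown on an equity series can be computed in one pass:

```python
def max_drawdown(equity):
    """Deepest peak-to-trough decline, as a fraction of the prior peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```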

6. Rank stability between refreshes

The leaderboard refreshes every 15 minutes. If the rank of a wallet changes wildly between refreshes, the score is too sensitive and the user experience is bad. We monitor the Spearman rank correlation between consecutive refreshes for the top-100 segment. Our running target is ρ ≥ 0.92. When ρ drops below 0.85 (it has happened twice in the last six months, both during major election-night resolution waves) we surface a banner explaining the noise.
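
Spearman's ρ between two consecutive score snapshots can be computed without dependencies; this sketch assumes the same wallets appear in both snapshots and that there are no tied scores:

```python
def spearman(prev_scores, curr_scores):
    """Spearman rank correlation between two score lists (no ties)."""
    n = len(prev_scores)
    def ranks(xs):
        order = sorted(range(n), key=lambda i: xs[i], reverse=True)
        r = [0] * n
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    d2 = sum((a - b) ** 2
             for a, b in zip(ranks(prev_scores), ranks(curr_scores)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```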

Stability is a feature, not a side effect. A leaderboard that thrashes is a leaderboard you cannot trust to follow.

7. A worked example

Consider three plausible wallets in a single 30-day window:

Wallet        Trades   ROI    Sharpe   Edge HR   Max DD   Composite
0xAAAA…1111      142   +38%      2.7     +0.08      −9%       +1.84
0xBBBB…2222       61   +91%      3.4     +0.04     −28%       +1.21
0xCCCC…3333      388   +22%      2.1     +0.11      −6%       +1.62

Wallet B has the highest raw ROI but ranks below A and C because of the deeper drawdown and lower edge-adjusted hit rate. Wallet A wins the composite by being good on all four axes. This is the behavior we want: the leaderboard rewards balanced excellence, not a single hot streak.

8. Sample size and the small-N trap

The single most common reason a wallet looks elite for two weeks and then collapses is sample size. With 30 binary trades, the standard error on win-rate alone is roughly 9 percentage points; with 200 trades it is 3.5 percentage points. A wallet with 32 trades and a 65% raw win rate could plausibly be a 50% wallet that got lucky. We respond in three ways: the 30-trade minimum excludes the thinnest samples, low-sample wallets that remain are flagged rather than ranked as equals, and every profile shows the full trade list so you can judge the sample yourself.
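
The standard-error figures quoted above come straight from the binomial formula √(p(1−p)/n):

```python
from math import sqrt

def winrate_se(p, n):
    """Standard error of an observed win rate p over n independent trades."""
    return sqrt(p * (1 - p) / n)
```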

The cleanest practical filter for users picking leaders to copy is to require ≥100 trades in the 30-day window. About 38% of the 12,438 indexed wallets meet that bar; the rest are in the universe but flagged.

9. Concentration and self-trading detection

A wallet whose return is dominated by a single position is structurally fragile and may signal a one-shot lucky guess rather than reproducible edge. We compute a Herfindahl-Hirschman-style concentration index (HHI) on each wallet's USDC volume by market. Any wallet with HHI > 0.4 (a level reached only when volume is heavily skewed toward one or two markets) is flagged "concentrated" on the profile page.
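
The HHI is just the sum of squared volume shares; a sketch with an assumed market-to-volume mapping:

```python
def concentration_hhi(volume_by_market):
    """Sum of squared USDC volume shares across markets, in (0, 1]."""
    total = sum(volume_by_market.values())
    return sum((v / total) ** 2 for v in volume_by_market.values())

def is_concentrated(volume_by_market, threshold=0.4):
    """Apply the HHI > 0.4 'concentrated' flag described above."""
    return concentration_hhi(volume_by_market) > threshold
```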

Self-trading detection runs on transaction-graph data. We watch for cycles where wallet A buys YES at $X and a closely linked wallet B sells YES at $Y > $X seconds later in the same market. Identified clusters are merged for scoring purposes (so one operator does not appear at the top of the leaderboard four times), and clear manipulation patterns trigger removal. The threshold is documented; the false-positive rate has been measured at roughly 0.7% of flagged clusters in our internal audits.
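
The cluster-merging step itself is a standard union-find pass; this sketch assumes the graph heuristic has already produced flagged (wallet, wallet) pairs:

```python
def merge_clusters(wallets, linked_pairs):
    """Group wallets connected by flagged links so one operator
    scores as a single entity. Returns a list of clusters."""
    parent = {w: w for w in wallets}
    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w
    for a, b in linked_pairs:
        parent[find(a)] = find(b)
    clusters = {}
    for w in wallets:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())
```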

10. What the score doesn't capture

We are explicit about the gaps:

  1. Survivorship bias. Wallets that blew up and stopped trading are not in the cohort. We address this in our risk-management post.
  2. Off-chain knowledge. We cannot tell whether a wallet is run by a journalist with a source, a sports analyst with a model, or a lucky novice. Sample size and edge-adjusted hit rate are the proxies.
  3. Capacity. A wallet trading $200 positions cannot be copied with $20,000 positions without slippage that breaks the strategy. Our strategies page models this.
  4. Future-proofing. Past edge is not a guarantee. Composite is a screen, not a promise.

Frequently asked questions

How often is each wallet's score updated?

Every 15 minutes. The leaderboard timestamp shows the exact last-refresh moment, and the API at /developers exposes the same data programmatically.

Why isn't ROI weighted higher in the composite?

Because ROI without volatility context is misleading. A wallet that quadrupled its bankroll in a 30-day window with three concentrated bets has the same ROI as a wallet that earned 300% across 200 well-sized bets. The Sharpe weighting captures the difference; ROI is in the composite at 25% weight precisely so it cannot dominate.

Do you ever remove wallets from the leaderboard?

Yes. Wallets are excluded if (a) sample size drops below 30 trades in the window, (b) the wallet is identified as a known operator's secondary address (we de-duplicate), or (c) clear self-trading patterns are detected. Manipulation flags are documented on /methodology.

Can I see the raw inputs for a specific wallet?

Yes. Every wallet's profile page on the leaderboard shows the four component scores, the trade list, and the winsorized vs raw Sharpe. The API exposes the same fields.

How the score evolves between refreshes

Every 15 minutes the scoring pipeline runs from scratch on a rolling window: fetch all closed positions for each indexed wallet over the trailing 30 days, compute returns, run the MAD filter, recompute Sharpe and edge-adjusted hit rate, refresh the composite, re-rank. Total compute time is roughly 38 seconds across the full 1,891–2,492-wallet cohort on the production indexer; the leaderboard publish cycle is therefore comfortably under the 15-minute refresh budget.

Two refresh-related properties are worth understanding: every refresh recomputes the full 30-day window from scratch rather than incrementally updating, and rank stability between consecutive refreshes is monitored against the ρ ≥ 0.92 target described above.

What we deliberately do not do

Some choices are easier to defend by stating explicitly what we ruled out: raw win-rate never enters the composite, concentrated wallets are flagged rather than removed, and wallets whose Sharpe collapses under winsorization keep a badge instead of a ban.

Want to use the score in your own model?

The full schema is on /methodology, the API is documented at /developers, and the glossary defines every term used here. If you'd rather skip the modeling and just follow vetted wallets, the leaderboard is one click away at /leaderboard — browse free, then upgrade to Pro at /dashboard/billing when you’re ready to mirror trades.