TL;DR
As a smart money tracker for prediction market copy trading, we rank Polymarket wallets by a composite score that blends risk-adjusted return (Sharpe), edge-adjusted win-rate, log-ROI, drawdown discipline, and rank stability across 15-minute refresh windows. Below we walk through the math, the data window, the outlier filter, and the limits of the approach. Nothing in this document is a forecast: every metric is computed from realized on-chain fills over a rolling 30-day window, and every input is reproducible from public Polygon data.
This page is the formal companion to the live leaderboard. For a less formal walkthrough, see the blog post on our wallet scoring methodology; for vocabulary, see the glossary.
1. The composite score
Every wallet that meets the eligibility threshold receives a single scalar between 0 and 1 each time the leaderboard refreshes. The score is a fixed-weight linear combination of five sub-metrics, each itself bounded to the unit interval. The weights were chosen by backtesting against three years of Polymarket fills and selecting the Pareto-optimal point on the curve trading off forward Sharpe against rank stability. They are constant across categories, on purpose: regime-specific weights are easy to overfit and hard to communicate.
score = 0.45 · sharpe_normalized
+ 0.20 · edge_adjusted_winrate
+ 0.15 · roi_normalized
+ 0.10 · drawdown_resilience
+ 0.10 · rank_stability
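As a minimal Python sketch (the dictionary keys and the name `composite_score` are illustrative, not Poly Syncer's published identifiers), the weighted sum takes only a few lines. It validates that each sub-metric already lies in [0, 1]; producing those sub-metrics is covered in the subsections that follow.

```python
# Fixed weights copied from the composite formula; they sum to 1.0.
WEIGHTS = {
    "sharpe_normalized": 0.45,
    "edge_adjusted_winrate": 0.20,
    "roi_normalized": 0.15,
    "drawdown_resilience": 0.10,
    "rank_stability": 0.10,
}


def composite_score(components: dict[str, float]) -> float:
    """Fixed-weight linear combination of five unit-interval sub-metrics."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


# Sub-metric values from the worked example in Section 9.
example = {
    "sharpe_normalized": 0.836,
    "edge_adjusted_winrate": 0.740,
    "roi_normalized": 0.618,
    "drawdown_resilience": 0.593,
    "rank_stability": 0.840,
}
print(round(composite_score(example), 3))
```

Fed the Section 9 components, it returns about 0.760, matching the hand calculation there.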
1.1 Sharpe normalized
The numerator is the wallet's Sharpe ratio computed on log-returns of realized PnL over a rolling 30-day window, with the risk-free rate set to the 4-week US T-bill yield as of the snapshot. The denominator is the 95th-percentile Sharpe in the eligible cohort that day. The ratio is capped at 1.0. Two consequences follow. First, scores are relative — a wallet's Sharpe component is measured against its peers, not against an absolute benchmark, so a quiet market regime does not collapse the leaderboard. Second, the cap prevents a single anomalous wallet from squashing the rest of the distribution. A wallet whose Sharpe exceeds the cohort's 95th percentile receives a 1.0 and the rank tiebreaker passes to the other four components.
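A sketch of the Sharpe component, assuming per-period (for example daily) log-returns, a risk-free rate already converted from the annualized 4-week T-bill yield to that same period, and a sample standard deviation. Annualization is omitted because a common scaling factor cancels in the ratio to the cohort percentile, provided every wallet is computed the same way. Flooring negative ratios at zero is an assumption drawn from the statement that each component is bounded to the unit interval; the function names are hypothetical.

```python
import statistics


def sharpe_ratio(log_returns: list[float], rf_per_period: float) -> float:
    """Per-period Sharpe ratio of excess log-returns (sample stdev, not annualized)."""
    excess = [r - rf_per_period for r in log_returns]
    sd = statistics.stdev(excess)
    if sd == 0.0:
        raise ValueError("zero volatility: the Sharpe ratio is undefined")
    return statistics.fmean(excess) / sd


def percentile(values: list[float], q: int) -> float:
    """q-th percentile (1..99) with linear interpolation between order statistics."""
    return statistics.quantiles(values, n=100, method="inclusive")[q - 1]


def sharpe_normalized(wallet_sharpe: float, cohort_sharpes: list[float]) -> float:
    """Wallet Sharpe over the cohort's 95th-percentile Sharpe, clipped to [0, 1]."""
    p95 = percentile(cohort_sharpes, 95)
    if p95 <= 0.0:
        raise ValueError("cohort 95th-percentile Sharpe must be positive")
    return min(1.0, max(0.0, wallet_sharpe / p95))
```

The cap at 1.0 is what hands tiebreaking to the other components once a wallet clears the cohort's 95th percentile.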
1.2 Edge-adjusted win-rate
Raw win-rate on a binary market is meaningless without the price at entry. Buying YES at six cents and winning is not comparable to buying YES at sixty cents and winning; the second wallet was right about a near-coin-flip, the first was right about a long shot. For each filled position we compute the implied break-even probability 1 / d, where d is the decimal odds at fill; for a binary share bought at price p, d = 1 / p, so the break-even probability is simply p. The wallet's edge is the difference between its realized win-rate and its trade-weighted mean break-even. We bucket the resulting edge values into deciles across the cohort and min-max normalize within each decile, which keeps the metric stable when the global Polymarket category mix shifts.
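Odds formats are easy to mix up, so here is a small sketch with hypothetical function names: 1 / d applies to decimal odds d (total payout per unit staked), 1 / (1 + b) applies to fractional or net odds b = d − 1, and for a binary share bought at price p both reduce to p. Fees are ignored.

```python
def breakeven_from_decimal(decimal_odds: float) -> float:
    """Break-even probability for decimal odds d (total payout per unit staked): 1 / d."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must exceed 1.0")
    return 1.0 / decimal_odds


def breakeven_from_fractional(net_odds: float) -> float:
    """Break-even probability for fractional (net) odds b: 1 / (1 + b)."""
    if net_odds <= 0.0:
        raise ValueError("net odds must be positive")
    return 1.0 / (1.0 + net_odds)


def decimal_odds_from_price(price: float) -> float:
    """A binary share bought at price p pays 1.0 if it wins, so d = 1 / p."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must lie strictly between 0 and 1")
    return 1.0 / price


for p in (0.06, 0.60):
    d = decimal_odds_from_price(p)
    b = d - 1.0  # net odds exclude the returned stake
    print(p, round(breakeven_from_decimal(d), 6), round(breakeven_from_fractional(b), 6))
```

Both conversions return the fill price, which is why edge can be computed directly from entry prices.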
1.3 ROI normalized
We use log-ROI rather than arithmetic ROI because the distribution of arithmetic ROI on prediction markets is severely fat-tailed; one parlay-like long-shot resolution can dominate a year of disciplined trading. Log-ROI is clipped at the 99th percentile of the cohort and min-max normalized. ROI receives only fifteen percent of the composite weight. This is intentional: ROI without risk context is the metric that has historically misled copy-traders into chasing wallets whose returns came from a single fortunate resolution.
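A sketch of the ROI component under stated assumptions: log-ROI is taken as ln(end capital / start capital), and the min-max bounds are the cohort minimum and the 99th-percentile clipping point. The worked example in Section 9 divides by 0.34 directly, which corresponds to a lower bound of zero. Function names are hypothetical.

```python
import math


def log_roi(capital_start: float, capital_end: float) -> float:
    """Log-ROI: ln(end / start). Requires positive capital at both ends."""
    if capital_start <= 0.0 or capital_end <= 0.0:
        raise ValueError("capital must be positive for a log-return")
    return math.log(capital_end / capital_start)


def roi_normalized(wallet_log_roi: float, cohort_min: float, cohort_p99: float) -> float:
    """Clip at the cohort's 99th percentile, then min-max normalize to [0, 1]."""
    if cohort_p99 <= cohort_min:
        raise ValueError("cohort_p99 must exceed cohort_min")
    clipped = min(wallet_log_roi, cohort_p99)
    return max(0.0, (clipped - cohort_min) / (cohort_p99 - cohort_min))


# An eight-cent long shot that resolves YES: +1150% arithmetic, about 2.53 in log terms.
print(round(log_roi(0.08, 1.0), 3))
print(round(roi_normalized(0.21, 0.0, 0.34), 3))
```

The log transform compresses exactly the tail outcome the paragraph above warns about, and the clip stops it from stretching the normalization range.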
1.4 Drawdown resilience
For a wallet with peak-to-trough drawdown max_dd, the resilience component is 1 − (max_dd / cohort_p95_dd), clipped to [0, 1]. A wallet whose worst drawdown matched the cohort's 95th-percentile drawdown receives a zero on this component. A wallet that rode out the same 30-day window with no material drawdown receives a one. Section 6 below describes how max_dd is actually computed; the naïve definition is more misleading than people think.
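The resilience formula translates directly. This hypothetical helper takes drawdown magnitudes, since the worked example in Section 9 quotes drawdowns as negative percentages; the sign handling is an assumption.

```python
def drawdown_resilience(max_dd: float, cohort_p95_dd: float) -> float:
    """1 - |max_dd| / |cohort_p95_dd|, clipped to [0, 1].

    Signs are ignored, so -11.2 and 11.2 both mean an 11.2% drawdown.
    """
    worst, reference = abs(max_dd), abs(cohort_p95_dd)
    if reference == 0.0:
        raise ValueError("cohort 95th-percentile drawdown must be non-zero")
    return min(1.0, max(0.0, 1.0 - worst / reference))


print(round(drawdown_resilience(-11.2, -27.5), 3))
```

A wallet at or beyond the cohort's 95th-percentile drawdown bottoms out at zero rather than going negative.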
1.5 Rank stability
The leaderboard refreshes every fifteen minutes. On each refresh we recompute the wallet's daily aggregate rank, then take the Spearman rank correlation between its daily rank series and its seven-day moving rank. Wallets that climb steadily or hold a stable position score higher; wallets that ricochet between rank 12 and rank 380 from one snapshot to the next score near zero. Rank stability is not a measure of skill, but it is a measure of followability, and Poly Syncer is a prediction market copy trading product, not a hall of fame.
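Spearman's coefficient is the Pearson correlation of ranks. The pure-Python sketch uses average ranks for ties; `scipy.stats.spearmanr` computes the same coefficient if SciPy is available. The two aligned series fed in (daily ranks against the seven-day moving rank) follow the text. How a negative coefficient is mapped onto the unit interval is not specified, so the sketch returns the raw value in [−1, 1]. Function names are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("a constant series has no rank correlation")
    return cov / (vx * vy) ** 0.5
```

Because only ranks matter, a wallet's rank can drift by large absolute amounts without hurting the coefficient, as long as its ordering against its own moving average stays consistent.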
2. Data sources
Every input to the score is derived from public on-chain data. There is no proprietary feed, no closed dataset, no privileged access. This is a deliberate constraint: a methodology that cannot be independently reproduced is a methodology that cannot be trusted.
- Order and fill history — Polymarket's CLOB events on Polygon, read via standard Polygon RPC endpoints. Anyone with access to a Polygon archive node can replay the same events.
- Resolution and outcome data — Polymarket's public subgraph cross-checked against UMA dispute records. Disputed markets are excluded from win-rate calculations until resolution is final, then re-included on the next refresh.
- Cohort definition — at the time of writing, 12,438 wallets meeting the threshold of at least ten trades in the trailing 30 days. The cohort grows and shrinks with venue activity; election-cycle weeks can push it past 18,000.
- Refresh cadence — every 15 minutes, server-side, with the snapshot timestamp published alongside each ranking. The refresh job is idempotent; a missed window is recomputed on the next tick rather than skipped.
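The idempotent-refresh behavior can be illustrated with a small scheduling helper. This is a hypothetical sketch, not Poly Syncer's job code: it assumes windows aligned to 15-minute UTC boundaries and a record of the last window that completed, and it lists every closed window still owed, so a missed window is picked up on the next tick rather than skipped.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 15-minute window."""
    return EPOCH + ((ts - EPOCH) // WINDOW) * WINDOW


def windows_to_compute(last_done: datetime, now: datetime) -> list[datetime]:
    """Start times of every closed window after last_done, oldest first.

    A window stays in this list until it has been computed, so a tick that
    fails or is skipped is recomputed on the next tick instead of lost.
    """
    current = window_start(now)  # the still-open window is excluded
    pending = []
    t = window_start(last_done) + WINDOW
    while t < current:
        pending.append(t)
        t += WINDOW
    return pending
```

Keying each snapshot by its window start means that rerunning a window overwrites the same snapshot rather than appending a duplicate.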
Polymarket's official API documentation, including the contract addresses and event signatures we consume, is at docs.polymarket.com. Poly Syncer's read-only programmatic interface is documented at the API reference.
3. Window selection: why 30 days
The choice of evaluation window is the single most consequential decision in any wallet-ranking system. Too short, and sample sizes collapse; too long, and the metric tracks a regime that no longer exists. We chose 30 days after evaluating five candidates against forward-rank stability.
A seven-day window has the appeal of recency but suffers from severe sample-size shrinkage. The median Polymarket wallet records fewer than twenty fills per week, and the sampling error on a Sharpe ratio computed over twenty observations is large enough to make rank order nearly random. A lifetime window, conversely, anchors the score to a regime that may no longer apply: election-cycle wallets that excelled in 2024 are not representative of sports-cycle wallets in 2026, and treating their lifetime metrics as comparable is a category error.
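To put a number on "nearly random", a standard approximation for the standard error of a per-period Sharpe ratio under i.i.d. returns is sqrt((1 + SR² / 2) / n) (Lo, 2002). The sketch assumes an illustrative per-period Sharpe of 0.2; real prediction-market PnL is neither i.i.d. nor normal, so treat the figures as rough.

```python
import math


def sharpe_standard_error(sharpe: float, n_obs: int) -> float:
    """Asymptotic standard error of a per-period Sharpe ratio, i.i.d. returns:
    sqrt((1 + sharpe**2 / 2) / n_obs)."""
    if n_obs < 2:
        raise ValueError("need at least two observations")
    return math.sqrt((1.0 + 0.5 * sharpe**2) / n_obs)


for n in (20, 142, 1000):
    se = sharpe_standard_error(0.2, n)
    print(f"n={n:4d}  SE={se:.3f}  SE/Sharpe={se / 0.2:.2f}")
```

At 20 observations the standard error (about 0.23) exceeds the Sharpe ratio itself; at 142 it falls to roughly 0.085.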
Within the 30-day window, observations are not weighted equally. We apply exponentially weighted moving averages with a half-life of ten days. Because the window is truncated at 30 days, this puts roughly 57 percent of the effective weight on the most recent ten days and the remaining 43 percent on days eleven through thirty. The weighting makes the score responsive to regime change: a wallet whose performance has clearly turned in the past week will see its score move within days rather than weeks, but the score will not whipsaw on a single trade.
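A sketch of truncated exponential weights with a ten-day half-life, assuming one weight per day with age 0 as the most recent day. How the weights enter each sub-metric is not specified here, so `weighted_mean` is only an illustrative use; both function names are hypothetical.

```python
def ewma_weights(n_days: int, half_life: float) -> list[float]:
    """Normalized weights for ages 0..n_days-1 (age 0 = most recent day).

    Raw weight decays as 0.5 ** (age / half_life); the window is truncated at n_days.
    """
    raw = [0.5 ** (age / half_life) for age in range(n_days)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(daily_values: list[float], half_life: float = 10.0) -> float:
    """Exponentially weighted mean of a series ordered most recent first."""
    weights = ewma_weights(len(daily_values), half_life)
    return sum(w * v for w, v in zip(weights, daily_values))


w = ewma_weights(30, 10.0)
print(round(sum(w[:10]), 4), round(sum(w[10:]), 4))
```

The first ten weights sum to 4/7 (about 0.571) and the remaining twenty to 3/7 (about 0.429). A half-life splits weight evenly only on an unbounded window; truncating at 30 days shifts the balance toward recent days.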
4. Outlier detection: median + MAD
Prediction-market PnL distributions are heavy-tailed. A single resolved long shot can move a wallet's mean return by a factor of three. Mean-and-standard-deviation outlier detection breaks down on these distributions because both moments are themselves dominated by the outliers they are meant to detect. We use a Hampel filter built on median absolute deviation instead.
For each wallet's per-trade PnL series, we compute the median m and the MAD k = median(|x_i − m|). Any trade whose absolute deviation from the median exceeds 3.5 · k is flagged as an outlier. The 3.5 threshold is conventional; lower values flag too many legitimate variance trades, higher values let real anomalies through. Flagged trades are excluded from Sharpe and ROI calculations but retained in win-rate and drawdown calculations. The asymmetric handling is deliberate: we do not want a wallet's volatility profile dominated by one resolution, but we also do not want to artificially censor a wallet's actual win history. The leaderboard, after all, exists to surface what actually happened.
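The rule as stated translates to a short sketch: unscaled MAD, multiplier 3.5, strict inequality. The zero-MAD behavior is an assumption (the text does not cover it), and the helper names are hypothetical. Many references state this rule on a scaled MAD (1.4826 · MAD) or as a modified z-score, so a 3.5 multiplier is not directly comparable across sources.

```python
import statistics


def mad_outlier_mask(pnl: list[float], threshold: float = 3.5) -> list[bool]:
    """Flag trades whose |x - median| exceeds threshold * MAD (unscaled MAD).

    Assumption: when the MAD is zero (more than half the trades sit exactly on
    the median), nothing is flagged rather than every trade off the median.
    """
    m = statistics.median(pnl)
    k = statistics.median(abs(x - m) for x in pnl)
    if k == 0:
        return [False] * len(pnl)
    return [abs(x - m) > threshold * k for x in pnl]


def split_for_metrics(pnl: list[float]) -> tuple[list[float], list[float]]:
    """Sharpe/ROI inputs drop flagged trades; win-rate/drawdown inputs keep all."""
    mask = mad_outlier_mask(pnl)
    filtered = [x for x, flagged in zip(pnl, mask) if not flagged]
    return filtered, list(pnl)


# One lottery-style resolution among small trades.
print(mad_outlier_mask([1, -1, 2, -2, 1, 0, 50]))
```

Only the 50 is flagged: the median is 1, the MAD is 1, and no other trade deviates by more than 3.5.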
5. Win-rate caveats
The naïve win-rate is the most-cited and least-useful statistic on Polymarket. Binary outcomes inflate the metric mechanically: a wallet that only buys heavy favorites, say NO at eighty cents when YES trades at twenty, will post a high raw win-rate while still losing money whenever it wins less than eighty percent of those trades. The break-even probability of a share bought at price p is exactly p, because its decimal odds are 1 / p. A wallet's edge is its realized win-rate minus its trade-weighted mean entry price.
A worked comparison clarifies the stakes. Wallet A posts a 70% raw win-rate on markets it entered at an average price of 60¢. Its edge is 70% − 60% = 10 percentage points. Wallet B posts a 55% raw win-rate on markets it entered at an average price of 30¢. Its edge is 55% − 30% = 25 percentage points. Wallet B is the more skilled trader by a factor of two and a half, even though wallet A's win-rate looks more impressive. The edge-adjusted win-rate component of our composite score is constructed precisely to reverse this naïve ordering.
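The comparison can be reproduced in a few lines. The sketch reads "trade-weighted" as an unweighted mean over fills (every fill counts once, regardless of size), which is an assumption; the `Fill` type and the fill counts are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    price: float  # entry price of the share bought (YES or NO side), 0 < price < 1
    won: bool     # True if that share resolved to 1.0


def edge(fills: list[Fill]) -> float:
    """Realized win-rate minus mean entry price (the mean break-even probability)."""
    if not fills:
        raise ValueError("no fills")
    win_rate = sum(f.won for f in fills) / len(fills)
    mean_breakeven = sum(f.price for f in fills) / len(fills)
    return win_rate - mean_breakeven


# Wallet A: 7 wins in 10 fills, all entered at 60 cents.
wallet_a = [Fill(0.60, i < 7) for i in range(10)]
# Wallet B: 11 wins in 20 fills, all entered at 30 cents.
wallet_b = [Fill(0.30, i < 11) for i in range(20)]
print(round(edge(wallet_a), 4), round(edge(wallet_b), 4))
```

It prints edges of 0.1 and 0.25, matching the 10- and 25-point figures above.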
The connection to the Kelly criterion is direct: the optimal Kelly fraction depends on edge over break-even, not raw win-rate. Position sizing in the broader execution layer is described in the whitepaper.
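For a binary share costing p that pays 1 with probability q, the textbook full-Kelly stake is f* = (q − p) / (1 − p), whose numerator is exactly the edge over break-even. The sketch treats realized win-rates as true probabilities, a strong assumption, and is not the position-sizing logic described in the whitepaper.

```python
def kelly_fraction(win_prob: float, price: float) -> float:
    """Full-Kelly bankroll fraction for a binary share costing `price` that pays
    1.0 with probability `win_prob`: f* = (q - p) / (1 - p).

    A non-positive value means no edge over break-even, so the stake is zero.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must lie strictly between 0 and 1")
    return max(0.0, (win_prob - price) / (1.0 - price))


print(round(kelly_fraction(0.70, 0.60), 3))  # wallet A's numbers
print(round(kelly_fraction(0.55, 0.30), 3))  # wallet B's numbers
```

With wallet A's and wallet B's numbers it gives 0.25 and about 0.357: B warrants the larger stake despite its lower raw win-rate.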
6. Drawdown methodology
Maximum drawdown is widely reported and widely misunderstood. The textbook definition — the largest peak-to-trough decline in cumulative PnL over the window — is correct but incomplete. Two wallets can post identical max-drawdown numbers while one recovered in three days and the other spent twenty-six days underwater. The difference matters enormously to a copy-trader.
We compute three drawdown statistics and feed a transformation of all three into the resilience component:
- High-water mark drawdown — the classic peak-to-trough metric, computed on the cumulative PnL series after outlier filtering.
- Underwater duration — the longest contiguous run of days during which cumulative PnL was below the rolling high-water mark. A wallet with a 5% max-dd that lasted one day is treated very differently from one with a 5% max-dd that lasted three weeks.
- Recovery half-life — the median number of days required for the wallet to recover half of any drawdown it experienced. This is a more robust measure of drawdown discipline than the worst single recovery.
The drawdown resilience component is a weighted blend of these three, with the high-water mark drawdown carrying the largest share. The exact weights are documented in the open-source pseudocode referenced in the API reference.
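A sketch of the three statistics on a daily equity series. Assumptions: equity is starting capital plus cumulative PnL and stays positive, drawdown is measured as a fraction of the high-water mark, and an episode that never regains half its decline before the window ends is left out of the half-life. The blend weights are not reproduced here, and the function names are hypothetical.

```python
import math
import statistics


def underwater_episodes(equity: list[float]) -> list[tuple[float, int, int]]:
    """(peak, start, end) for each run of days below the running high-water mark.

    `end` is exclusive; an episode still open when the window ends has end == len(equity).
    """
    episodes = []
    peak, start = equity[0], None
    for day, value in enumerate(equity):
        if value >= peak:
            if start is not None:  # back at or above the old peak: close the episode
                episodes.append((peak, start, day))
                start = None
            peak = value
        elif start is None:  # first day below the high-water mark
            start = day
    if start is not None:
        episodes.append((peak, start, len(equity)))
    return episodes


def drawdown_stats(equity: list[float]) -> dict[str, float]:
    """Max drawdown (fraction of peak), longest underwater run (days), and recovery
    half-life (median days from trough to regaining half of the decline)."""
    max_dd, longest, half_lives = 0.0, 0, []
    for peak, start, end in underwater_episodes(equity):
        segment = equity[start:end]
        trough = min(segment)
        trough_day = start + segment.index(trough)
        max_dd = max(max_dd, (peak - trough) / peak)
        longest = max(longest, end - start)
        target = peak - (peak - trough) / 2  # halfway back to the old peak
        for day in range(trough_day + 1, len(equity)):
            if equity[day] >= target:
                half_lives.append(day - trough_day)
                break  # episodes that never get halfway back are skipped
    return {
        "max_drawdown": max_dd,
        "underwater_days": float(longest),
        # nan when no drawdown was half-recovered (including no drawdown at all)
        "recovery_half_life": statistics.median(half_lives) if half_lives else math.nan,
    }
```

On a series with a 10% dip recovered in two days and a 20% dip that stays underwater for four, the helper reports a 20% max drawdown, 4 underwater days, and a half-life of 1.5 days. The shallow and deep episodes each contribute to a different statistic, which is the point of tracking all three.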
7. Rank stability and 15-minute snapshots
The leaderboard refreshes every fifteen minutes. Within a single trading day, that produces 96 snapshots. For each snapshot, we compute the wallet's instantaneous rank, then aggregate to a daily rank using the median across snapshots — robust to the occasional single-fill spike. The seven-day moving rank is the simple mean of the daily ranks over the trailing seven days.
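The aggregation steps are simple to sketch, assuming each wallet has a rank in every snapshot of the day (missing snapshots would need handling upstream). Function names are hypothetical.

```python
import statistics


def daily_rank(snapshot_ranks: list[int]) -> float:
    """Median of a wallet's instantaneous ranks across one day's snapshots (96 at 15 min)."""
    return statistics.median(snapshot_ranks)


def seven_day_moving_rank(daily_ranks: list[float]) -> list[float]:
    """Simple mean of the trailing seven daily ranks; defined from the seventh day onward."""
    return [statistics.fmean(daily_ranks[i - 6 : i + 1]) for i in range(6, len(daily_ranks))]


# 95 snapshots at rank 20 plus one single-fill spike to rank 400.
print(daily_rank([20] * 95 + [400]))
```

The spike does not move the daily rank off 20, which is the robustness the median is chosen for. The moving-rank output starts on day seven, so it lines up with `daily_ranks[6:]` when the two series are correlated.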
Rank stability is the Spearman correlation between the daily rank series and the seven-day moving rank, computed pairwise across consecutive snapshots. A wallet that is genuinely a top-twenty trader will see this correlation hover near 0.9 even on volatile days; a wallet that is in the top twenty on Tuesday and the top three hundred on Wednesday will post correlations near zero. Copy-trading a high-churn wallet is structurally a losing strategy: by the time a subscriber's bot has copied the entry, the wallet has already churned to a different strategy. Rank stability captures this in a single number and makes high-churn wallets visible to subscribers before they commit capital.
8. Limitations and honest caveats
- Survivorship bias. The cohort is wallets currently active on Polymarket. Wallets that blew up and walked away are not in the data, so cohort-relative metrics carry an upward bias.
- Regime shifts. An election-week scorer is not the same as a Champions League scorer. Our 30-day window with 10-day half-life smooths some of this but does not eliminate it.
- Self-trading and coordination. Some wallets coordinate fills to manipulate win-rates. A graph-based cluster detection pass flags anomalous counterparty overlap and deranks clustered wallets. The mitigation is partial — coordination through fresh wallets remains hard to detect.
- Small-sample shrinkage. Wallets with fewer than 100 trades over the window receive a shrinkage adjustment toward the cohort median, scaling linearly to zero at 100 trades.
- This is not a forecast. Every metric is realized, historical, and ex-post. Read the risk disclosure before subscribing to any copy-trading strategy.
9. Worked example
Consider a synthetic but plausible wallet, address 0xABCD…XYZ, evaluated on the snapshot of May 7, 2026. The inputs are realistic for a top-decile but not record-setting trader.
- 142 trades over the trailing 30 days, all post-outlier-filter.
- Sharpe ratio 1.84 on log-returns. Cohort 95th-percentile Sharpe that day was 2.20, so sharpe_normalized = 1.84 / 2.20 = 0.836.
- Raw win-rate 61% against a trade-weighted mean break-even of 53.8%. Edge of 7.2 percentage points, which placed the wallet in the eighth decile of the cohort. Decile-normalized edge-adjusted win-rate of 0.74.
- Log-ROI of 0.21 over 30 days. Cohort 99th-percentile log-ROI was 0.34. Normalized ROI of 0.21 / 0.34 = 0.618.
- Maximum drawdown of -11.2%. Cohort 95th-percentile drawdown was -27.5%. Drawdown resilience of 1 − (11.2 / 27.5) = 0.593.
- Rank stability Spearman correlation of 0.84 between daily and seven-day moving rank.
The composite score is then:
score = 0.45 · 0.836
+ 0.20 · 0.740
+ 0.15 · 0.618
+ 0.10 · 0.593
+ 0.10 · 0.840
= 0.376 + 0.148 + 0.093 + 0.059 + 0.084
= 0.760
A composite score of 0.760 placed this wallet inside the top five percent of the cohort that day, the band where the leaderboard concentrates the Polymarket traders most worth following. For an applied analysis of similar wallets, see the data-driven post on top Polymarket wallets and the practitioner guide on how to evaluate a Polymarket wallet.
10. Reproducibility
Pseudocode for the full ranking pipeline — eligibility filter, outlier detection, decile bucketing, exponential weighting, and final composite — is published alongside the API reference. Anyone with a Polygon RPC endpoint can replay any historical snapshot to within rounding error. Two academic groups have reproduced specific snapshots; their work exposed a bug in our UMA dispute handling that we fixed in the April 2026 release. Open methodology is the only way a quantitative ranking can be trusted by users who, by construction, cannot trust the operator.
11. Frequently asked questions
Why isn't pure ROI the metric?
Pure ROI is dominated by tail outcomes on prediction markets. A wallet that bought a single eight-cent long shot and won will post an ROI that no disciplined trader will ever match — and that is not the wallet a subscriber wants to copy. The composite score weights ROI at fifteen percent precisely so that one good resolution cannot vault a wallet to the top of a leaderboard that is supposed to reflect repeatable skill.
How often does a wallet drop off the leaderboard?
The leaderboard refreshes every fifteen minutes. A wallet drops off if it falls below the eligibility threshold of ten trades in the trailing 30 days, if its cluster-detection pass flags it as coordinated, or if its composite score falls below the cohort's tenth percentile. In practice, between five and ten percent of the visible top-200 turns over week-to-week.
What about wallets with fewer than 10 trades?
They are excluded entirely. Sample sizes below ten produce sampling error on Sharpe ratios that exceeds the signal we are trying to measure. Wallets between ten and one hundred trades receive an additional shrinkage adjustment that pulls their score toward the cohort median, with the adjustment fading linearly to zero at 100 trades.
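The shrinkage can be sketched as a linear blend toward the cohort median. The exact schedule is an assumption: the sketch puts weight (100 − n) / 100 on the median, so the pull fades linearly to zero at 100 trades as described, but the starting weight at ten trades is not specified. Parameter names are hypothetical.

```python
def shrink_toward_median(score: float, n_trades: int, cohort_median: float,
                         full_weight_at: int = 100, min_trades: int = 10) -> float:
    """Blend a wallet's score toward the cohort median for small samples.

    Assumption: weight on the median = (full_weight_at - n) / full_weight_at,
    falling linearly to zero at full_weight_at trades.
    """
    if n_trades < min_trades:
        raise ValueError("below the eligibility threshold; wallet is excluded")
    shrink = max(0.0, (full_weight_at - n_trades) / full_weight_at)
    return (1.0 - shrink) * score + shrink * cohort_median


print(shrink_toward_median(0.80, 50, 0.40))   # halfway between score and median
print(shrink_toward_median(0.76, 142, 0.40))  # 100+ trades: unchanged
```

A fresh wallet with a spectacular first few dozen trades is pulled toward the cohort median until its sample is large enough to stand on its own.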
Do you adjust for category mix?
Implicitly, yes — the edge-adjusted win-rate computation is per-trade against each market's break-even price, so a wallet that specializes in low-probability long shots is measured against the actual difficulty of those trades. We do not run an additional category-attribution model on top of this; doing so adds complexity that backtests poorly and is easy to overfit.
Can the leaderboard be gamed?
Partially, yes. Any open scoring system can be gamed by sufficiently motivated actors. The mitigations are the cluster detection pass for coordinated wallets, the small-sample shrinkage that prevents fresh wallets from leaping to the top, the outlier filter that deflates lottery-style PnL, and the rank-stability component that penalizes wallets whose performance is too volatile to be the result of repeatable skill. None of these mitigations is bulletproof. We publish the methodology, including its weaknesses, so subscribers can apply their own judgment on top of the score.
How often does the methodology itself change?
Rarely, and never silently. Material changes to weights, thresholds, or window definitions are announced in the changelog at least two weeks before they take effect. The April 2026 fix to UMA dispute handling is the only retroactive change in the past twelve months.