Poly Syncer wallet scoring methodology

The math, the data window, and the limits of how Poly Syncer ranks Polymarket wallets — composite score, outlier handling, drawdown discipline, and rank stability across 15-minute snapshots.

Last reviewed · Eli Marsh, Quantitative Research

TL;DR

As a smart money tracker for prediction market copy trading, we rank Polymarket wallets by a composite score that blends risk-adjusted return (Sharpe), win-rate after edge adjustment, drawdown discipline, and rank stability across 15-minute refresh windows. Below we walk through the math, the data window, the outlier filter, and the limits of the approach. Nothing in this document is a forecast: every metric is computed from realized on-chain fills over a rolling 30-day window, and every input is reproducible from public Polygon data.

This page is the formal companion to the live leaderboard. For a less formal walkthrough, see the blog post on our wallet scoring methodology; for vocabulary, see the glossary.

1. The composite score

Every wallet that meets the eligibility threshold receives a single scalar between 0 and 1 each time the leaderboard refreshes. The score is a fixed-weight linear combination of five sub-metrics, each itself bounded to the unit interval. The weights were chosen by backtesting against three years of Polymarket fills and selecting the Pareto-optimal point on the curve trading off forward Sharpe against rank stability. They are constant across categories, on purpose: regime-specific weights are easy to overfit and hard to communicate.

score = 0.45 · sharpe_normalized
      + 0.20 · edge_adjusted_winrate
      + 0.15 · roi_normalized
      + 0.10 · drawdown_resilience
      + 0.10 · rank_stability
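Under the weights above, the blend can be sketched as follows (function and key names are ours, not the production pipeline's):

```python
# Fixed weights from the composite-score formula; constant across categories.
WEIGHTS = {
    "sharpe_normalized": 0.45,
    "edge_adjusted_winrate": 0.20,
    "roi_normalized": 0.15,
    "drawdown_resilience": 0.10,
    "rank_stability": 0.10,
}

def composite_score(components: dict[str, float]) -> float:
    """Fixed-weight linear blend of the five unit-interval sub-metrics."""
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

Because each sub-metric is bounded to [0, 1] and the weights sum to 1, the composite is itself guaranteed to land in [0, 1].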

1.1 Sharpe normalized

The numerator is the wallet's Sharpe ratio computed on log-returns of realized PnL over a rolling 30-day window, with the risk-free rate set to the 4-week US T-bill yield as of the snapshot. The denominator is the 95th-percentile Sharpe in the eligible cohort that day. The ratio is capped at 1.0. Two consequences follow. First, scores are relative — a wallet's Sharpe component is measured against its peers, not against an absolute benchmark, so a quiet market regime does not collapse the leaderboard. Second, the cap prevents a single anomalous wallet from squashing the rest of the distribution. A wallet whose Sharpe exceeds the cohort's 95th percentile receives a 1.0 and the rank tiebreaker passes to the other four components.
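A minimal sketch of the normalization step; the production percentile estimator is not specified, so the standard library's `statistics.quantiles` is an assumption:

```python
import statistics

def sharpe_normalized(wallet_sharpe: float, cohort_sharpes: list[float]) -> float:
    # Denominator: the 95th-percentile Sharpe of the eligible cohort that day.
    # quantiles(n=100) returns the 1st..99th percentiles; index 94 is the 95th.
    p95 = statistics.quantiles(cohort_sharpes, n=100)[94]
    if p95 <= 0:
        return 0.0  # degenerate cohort; our assumption, not from the source
    return min(wallet_sharpe / p95, 1.0)  # cap at 1.0
```

The cap is what keeps one anomalous wallet from compressing everyone else's scores: anything above the cohort's 95th percentile maps to exactly 1.0.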

1.2 Edge-adjusted win-rate

Raw win-rate on a binary market is meaningless without the price at entry. Buying YES at six cents and winning is not comparable to buying YES at sixty cents and winning; the second wallet was right about a near-coin-flip, the first was right about a long shot. We compute, for each filled position, the implied break-even probability 1 / d, where d is the decimal odds at fill — equivalently, the contract price at entry, since d = 1/p for a binary contract priced at p. The wallet's edge is the difference between its realized win-rate and its trade-weighted mean break-even. We bucket the resulting edge values into deciles across the cohort and min-max normalize within each decile, which keeps the metric stable when the global Polymarket category mix shifts.
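A sketch of the per-wallet edge computation, assuming the decimal-odds convention d = 1/p for a binary contract priced at p, and equal weight per trade (field names are illustrative):

```python
def trade_break_even(decimal_odds: float) -> float:
    # For a binary contract, the entry price IS the break-even probability,
    # and decimal odds d = 1 / price, so break-even = 1 / d.
    return 1.0 / decimal_odds

def wallet_edge(fills: list[dict]) -> float:
    """Realized win-rate minus mean break-even probability across fills.

    Each fill: {"odds": decimal odds at fill, "won": bool}. The production
    metric is trade-weighted; equal per-trade weight is our simplification.
    """
    n = len(fills)
    win_rate = sum(1 for f in fills if f["won"]) / n
    mean_break_even = sum(trade_break_even(f["odds"]) for f in fills) / n
    return win_rate - mean_break_even
```

A wallet that wins 55% of fills entered at 30¢ (decimal odds ≈ 3.33) comes out with an edge of roughly +0.25, matching the worked comparison in section 5.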

1.3 ROI normalized

We use log-ROI rather than arithmetic ROI because the distribution of arithmetic ROI on prediction markets is severely fat-tailed; one parlay-like long-shot resolution can dominate a year of disciplined trading. Log-ROI is clipped at the 99th percentile of the cohort and min-max normalized. ROI receives only fifteen percent of the composite weight. This is intentional: ROI without risk context is the metric that has historically misled copy-traders into chasing wallets whose returns came from a single fortunate resolution.
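A sketch of the clipping and normalization, again assuming `statistics.quantiles` as the percentile estimator and a clamp to the unit interval:

```python
import statistics

def roi_normalized(wallet_log_roi: float, cohort_log_rois: list[float]) -> float:
    """Clip log-ROI at the cohort's 99th percentile, then min-max normalize.

    Inputs are log-ROI values, e.g. math.log1p(roi); names are illustrative.
    """
    p99 = statistics.quantiles(cohort_log_rois, n=100)[98]  # 99th percentile
    clipped = [min(x, p99) for x in cohort_log_rois]
    lo, hi = min(clipped), max(clipped)
    if hi == lo:
        return 0.0  # degenerate cohort; our assumption
    return min(max((min(wallet_log_roi, p99) - lo) / (hi - lo), 0.0), 1.0)
```

The clip means a lottery-style outlier and a merely excellent month map to the same 1.0 — exactly the behavior the fifteen-percent weight is designed around.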

1.4 Drawdown resilience

For a wallet with peak-to-trough drawdown max_dd, the resilience component is 1 − (max_dd / cohort_p95_dd), clipped to [0, 1]. A wallet whose worst drawdown matched the cohort's 95th-percentile drawdown receives a zero on this component. A wallet that rode out the same 30-day window with no material drawdown receives a one. Section 6 below describes how max_dd is actually computed — the naïve definition is more misleading than people think.
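The component follows directly from the formula above; the `max_drawdown` helper here uses the textbook peak-to-trough definition on cumulative PnL, which section 6 refines:

```python
def max_drawdown(cum_pnl: list[float]) -> float:
    """Largest decline from any prior high-water mark of cumulative PnL."""
    peak = float("-inf")
    worst = 0.0
    for v in cum_pnl:
        peak = max(peak, v)          # running high-water mark
        worst = max(worst, peak - v)  # deepest decline seen so far
    return worst

def drawdown_resilience(wallet_dd: float, cohort_p95_dd: float) -> float:
    # 1 − (max_dd / cohort_p95_dd), clipped to [0, 1].
    return min(max(1.0 - wallet_dd / cohort_p95_dd, 0.0), 1.0)
```

A wallet whose drawdown exceeds the cohort's 95th percentile clips to zero rather than going negative, so one catastrophic wallet cannot distort the component's scale.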

1.5 Rank stability

The leaderboard refreshes every fifteen minutes. For each refresh, we compute the wallet's daily aggregate rank, then take the Spearman rank correlation between the daily rank series and the wallet's seven-day moving rank. Wallets that climb steadily or hold a stable position score higher; wallets that ricochet between rank 12 and rank 380 between snapshots score near zero. Rank stability is not a measure of skill, but it is a measure of followability — and Poly Syncer is a prediction market copy trading product, not a hall of fame.

2. Data sources

Every input to the score is derived from public on-chain data. There is no proprietary feed, no closed dataset, no privileged access. This is a deliberate constraint: a methodology that cannot be independently reproduced is a methodology that cannot be trusted.

Polymarket's official API documentation, including the contract addresses and event signatures we consume, is at docs.polymarket.com. Poly Syncer's read-only programmatic interface is documented at the API reference.

3. Window selection: why 30 days

The choice of evaluation window is the single most consequential decision in any wallet-ranking system. Too short, and sample sizes collapse; too long, and the metric tracks a regime that no longer exists. We chose 30 days after evaluating five candidates against forward-rank stability.

A seven-day window has the appeal of recency but suffers from severe sample-size shrinkage. The median wallet on Polymarket posts fewer than twenty fills per week; the sampling error on a Sharpe ratio computed over twenty observations is large enough to make rank order nearly random. A lifetime window, conversely, anchors the score to a regime that may no longer apply — election-cycle wallets that excelled in 2024 are not representative of sports-cycle wallets in 2026, and treating their lifetime metrics as comparable is a category error.

Within the 30-day window, observations are not weighted equally. We apply exponential weighted moving averages with a half-life of ten days, which puts roughly fifty percent of the effective weight on the most recent ten days and the remaining fifty percent on days eleven through thirty. This is a robust-to-regime adjustment: a wallet whose performance has clearly turned in the past week will see its score move within days rather than weeks, but the score will not whipsaw on a single trade.
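A sketch of the weighting, assuming discrete daily weights w_t = 0.5^(t/10) renormalized over the 30-day window; under this assumption the most recent ten days carry just over half the effective weight, consistent with the "roughly fifty percent" figure:

```python
def ewma_weights(window_days: int = 30, half_life_days: float = 10.0) -> list[float]:
    # Weight for an observation t days old: 0.5 ** (t / half_life),
    # renormalized so the window's weights sum to 1.
    raw = [0.5 ** (t / half_life_days) for t in range(window_days)]
    total = sum(raw)
    return [w / total for w in raw]

weights = ewma_weights()
recent_share = sum(weights[:10])  # effective weight on the newest 10 days, ~0.57
```

Halving the half-life would push the recent share well past 0.75 and make the score whipsaw on single trades; doubling it would blunt the response to a genuine regime turn. Ten days is the compromise the window was tuned for.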

4. Outlier detection: median + MAD

Prediction-market PnL distributions are heavy-tailed. A single resolved long shot can move a wallet's mean return by a factor of three. Mean-and-standard-deviation outlier detection breaks down on these distributions because both moments are themselves dominated by the outliers they are meant to detect. We use a Hampel filter built on median absolute deviation instead.

For each wallet's per-trade PnL series, we compute the median m and the MAD k = median(|x_i − m|). Any trade whose absolute deviation from the median exceeds 3.5 · k is flagged as an outlier. The 3.5 threshold is conventional; lower values flag too many legitimate variance trades, higher values let real anomalies through. Flagged trades are excluded from Sharpe and ROI calculations but retained in win-rate and drawdown calculations. The asymmetric handling is deliberate: we do not want a wallet's volatility profile dominated by one resolution, but we also do not want to artificially censor a wallet's actual win history. The leaderboard, after all, exists to surface what actually happened.
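A sketch of the Hampel-style flag described above (the zero-MAD fallback is our assumption, not specified in the text):

```python
import statistics

def flag_outliers(pnl: list[float], threshold: float = 3.5) -> list[bool]:
    """True where |x − median| exceeds threshold · MAD (Hampel filter)."""
    m = statistics.median(pnl)
    mad = statistics.median(abs(x - m) for x in pnl)
    if mad == 0:
        # Degenerate series (more than half the trades identical): flag any
        # deviation at all. This fallback is our assumption.
        return [x != m for x in pnl]
    return [abs(x - m) > threshold * mad for x in pnl]
```

On a series like [1, 2, 3, 2, 1, 2, 100], the mean-and-stdev approach would let 100 drag the threshold up toward itself; the median/MAD version flags it cleanly while leaving the ordinary variance trades untouched.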

5. Win-rate caveats

The naïve win-rate is the most-cited and least-useful statistic on Polymarket. Binary outcomes inflate the metric mechanically: a wallet that buys only heavy favorites — YES at ninety cents, or NO when NO trades at ninety cents — will post a high raw win-rate while making little or no money, because each win returns only the sliver above the entry price and each loss forfeits the full stake. The break-even probability of a market trading at price p is exactly p; decimal odds simply restate this as d = 1/p. A wallet's edge is its realized win-rate minus its trade-weighted mean entry price.

A worked comparison clarifies the stakes. Wallet A posts a 70% raw win-rate on markets it entered at an average price of 60¢. Its edge is 70% − 60% = 10 percentage points. Wallet B posts a 55% raw win-rate on markets it entered at an average price of 30¢. Its edge is 55% − 30% = 25 percentage points. Wallet B is the more skilled trader by a factor of two and a half, even though wallet A's win-rate looks more impressive. The edge-adjusted win-rate component of our composite score is constructed precisely to reverse this naïve ordering.

The connection to the Kelly criterion is direct: the optimal Kelly fraction depends on edge over break-even, not raw win-rate. Position sizing in the broader execution layer is described in the whitepaper.
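For a binary contract bought at price p that pays 1 on a win, the standard Kelly fraction reduces to edge over the complement of the entry price. A sketch, using the worked numbers above:

```python
def kelly_fraction(p_win: float, entry_price: float) -> float:
    """Kelly fraction for a binary contract bought at entry_price, paying 1.

    With net odds b = (1 − price) / price, the general f* = (b·p − q) / b
    simplifies to (p − price) / (1 − price): edge over break-even drives
    the optimal stake, not raw win-rate.
    """
    return (p_win - entry_price) / (1.0 - entry_price)
```

Wallet A (70% win-rate at 60¢) gets f* = 0.10 / 0.40 = 0.25; wallet B (55% at 30¢) gets f* = 0.25 / 0.70 ≈ 0.357. The Kelly ordering agrees with the edge ordering, not the raw win-rate ordering.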

6. Drawdown methodology

Maximum drawdown is widely reported and widely misunderstood. The textbook definition — the largest peak-to-trough decline in cumulative PnL over the window — is correct but incomplete. Two wallets can post identical max-drawdown numbers while one recovered in three days and the other spent twenty-six days underwater. The difference matters enormously to a copy-trader.

We compute three drawdown statistics and feed a transformation of all three into the resilience component: the high-water-mark drawdown (the largest decline from any prior high-water mark of cumulative PnL), the longest time underwater (the longest stretch of days spent below a prior high-water mark), and the recovery time (how many days the wallet took to regain its high-water mark after the worst trough).

The drawdown resilience component is a weighted blend of these three, with the high-water mark drawdown carrying the largest share. The exact weights are documented in the open-source pseudocode referenced in the API reference.

7. Rank stability and 15-minute snapshots

The leaderboard refreshes every fifteen minutes. Within a single trading day, that produces 96 snapshots. For each snapshot, we compute the wallet's instantaneous rank, then aggregate to a daily rank using the median across snapshots — robust to the occasional single-fill spike. The seven-day moving rank is the simple mean of the daily ranks over the trailing seven days.

Rank stability is the Spearman rank correlation between the wallet's daily rank series and its seven-day moving-rank series over the trailing window. A wallet that is genuinely a top-twenty trader will see this correlation hover near 0.9 even on volatile days; a wallet that is in the top twenty on Tuesday and the top three hundred on Wednesday will post correlations near zero. Copy-trading a high-churn wallet is structurally a losing strategy: by the time a subscriber's bot has copied the entry, the wallet has already churned to a different strategy. Rank stability captures this in a single number and makes high-churn wallets visible to subscribers before they commit capital.
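The aggregation and stability computation can be sketched as follows, with a pure-Python rank helper (average ranks for ties) standing in for a stats library:

```python
import statistics

def daily_rank(snapshot_ranks: list[int]) -> float:
    # Median across the day's 96 snapshots: robust to a single-fill spike.
    return statistics.median(snapshot_ranks)

def _avg_ranks(xs: list[float]) -> list[float]:
    # 1-based ranks, averaging within tie groups.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_stability(daily_ranks: list[float]) -> float:
    """Spearman correlation between daily ranks and their 7-day moving mean."""
    moving = [statistics.fmean(daily_ranks[max(0, i - 6): i + 1])
              for i in range(len(daily_ranks))]
    ra, rb = _avg_ranks(daily_ranks), _avg_ranks(moving)
    ma, mb = statistics.fmean(ra), statistics.fmean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var = (sum((x - ma) ** 2 for x in ra) *
           sum((y - mb) ** 2 for y in rb)) ** 0.5
    return cov / var if var else 0.0
```

A wallet whose daily rank drifts steadily in one direction tracks its own moving rank almost perfectly and scores near 1.0; a wallet that oscillates wildly around its moving rank scores near zero.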

8. Limitations and honest caveats

9. Worked example

Consider a synthetic but plausible wallet, address 0xABCD…XYZ, evaluated on the snapshot of May 7, 2026. The inputs are realistic for a top-decile but not record-setting trader.

The composite score is then:

score = 0.45 · 0.836   (sharpe_normalized)
      + 0.20 · 0.740   (edge_adjusted_winrate)
      + 0.15 · 0.618   (roi_normalized)
      + 0.10 · 0.593   (drawdown_resilience)
      + 0.10 · 0.840   (rank_stability)

      = 0.376 + 0.148 + 0.093 + 0.059 + 0.084

      = 0.760

A composite score of 0.760 placed this wallet inside the top five percent of the cohort that day — the band where the Polymarket leaderboard concentrates the top Polymarket traders worth following. For an applied analysis of similar wallets, see the data-driven post on top Polymarket wallets and the practitioner guide on how to evaluate a Polymarket wallet.

10. Reproducibility

Pseudocode for the full ranking pipeline — eligibility filter, outlier detection, decile bucketing, exponential weighting, and final composite — is published alongside the API reference. Anyone with a Polygon RPC endpoint can replay any historical snapshot to within rounding error. Two academic groups have reproduced specific snapshots; their work exposed a bug in our UMA dispute handling that we fixed in the April 2026 release. Open methodology is the only way a quantitative ranking can be trusted by users who, by construction, cannot trust the operator.

11. Frequently asked questions

Why isn't pure ROI the metric?

Pure ROI is dominated by tail outcomes on prediction markets. A wallet that bought a single eight-cent long shot and won will post an ROI that no disciplined trader will ever match — and that is not the wallet a subscriber wants to copy. The composite score weights ROI at fifteen percent precisely so that one good resolution cannot vault a wallet to the top of a leaderboard that is supposed to reflect repeatable skill.

How often does a wallet drop off the leaderboard?

The leaderboard refreshes every fifteen minutes. A wallet drops off if it falls below the eligibility threshold of ten trades in the trailing 30 days, if its cluster-detection pass flags it as coordinated, or if its composite score falls below the cohort's tenth percentile. In practice, between five and ten percent of the visible top-200 turns over week-to-week.

What about wallets with fewer than 10 trades?

They are excluded entirely. Sample sizes below ten produce sampling error on Sharpe ratios that exceeds the signal we are trying to measure. Wallets between ten and one hundred trades receive an additional shrinkage adjustment that pulls their score toward the cohort median, with the adjustment fading linearly to zero at 100 trades.
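A sketch of the shrinkage, assuming full shrinkage to the cohort median at the 10-trade eligibility floor — the text specifies only the linear fade to zero at 100 trades, so the low-end behavior is our assumption:

```python
def shrink_score(raw_score: float, n_trades: int, cohort_median: float) -> float:
    """Pull small-sample scores toward the cohort median.

    Shrinkage weight fades linearly to zero at 100 trades; full shrinkage
    at exactly 10 trades is our assumed boundary condition.
    """
    if n_trades < 10:
        raise ValueError("ineligible: fewer than 10 trades in trailing 30 days")
    lam = max(0.0, (100 - n_trades) / 90)
    return lam * cohort_median + (1 - lam) * raw_score
```

A fresh wallet with 10 trades and a spectacular raw score sits at the cohort median; at 55 trades it gets half credit for its own score; at 100 trades and beyond the adjustment vanishes entirely.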

Do you adjust for category mix?

Implicitly, yes — the edge-adjusted win-rate computation is per-trade against each market's break-even price, so a wallet that specializes in low-probability long shots is measured against the actual difficulty of those trades. We do not run an additional category-attribution model on top of this; doing so adds complexity that backtests poorly and is easy to overfit.

Can the leaderboard be gamed?

Partially, yes. Any open scoring system can be gamed by sufficiently motivated actors. The mitigations are the cluster detection pass for coordinated wallets, the small-sample shrinkage that prevents fresh wallets from leaping to the top, the outlier filter that deflates lottery-style PnL, and the rank-stability component that penalizes wallets whose performance is too volatile to be the result of repeatable skill. None of these mitigations is bulletproof. We publish the methodology, including its weaknesses, so subscribers can apply their own judgment on top of the score.

How often does the methodology itself change?

Rarely, and never silently. Material changes to weights, thresholds, or window definitions are announced in the changelog at least two weeks before they take effect. The April 2026 fix to UMA dispute handling is the only retroactive change in the past twelve months.