Wallet Scoring Methodology — How We Rank Traders

A transparent breakdown of every metric the leaderboard uses, why we weight them the way we do, and how we filter out lucky wallets.

Last reviewed · Maria Ostrowski, Poly Syncer

Poly Syncer ranks Polymarket wallets with a composite score that blends 30-day risk-adjusted return (Sharpe), realized ROI, hit rate versus implied odds, and maximum drawdown. It is the smart money tracker behind our Polymarket leaderboard, surfacing the top Polymarket traders in real time. The score is recomputed every 15 minutes across 1,891–2,492 indexed wallets, with Hampel/MAD outlier filters and a 30-trade minimum sample to keep small-N flukes off the leaderboard. This page is the plain-English version; the formal spec lives on /methodology.

Why a composite score at all?

Any single metric is gameable. ROI alone rewards leverage and luck. Win-rate alone rewards trading $0.95 favorites that pay $0.01 of edge. Sharpe alone undervalues the trader who hits one massive correct call per quarter. The composite is built precisely because the failure modes of each metric are different — and a wallet has to look healthy on all of them to climb the leaderboard.

Concretely, the Poly Syncer composite is:

Score = 0.40 * z(Sharpe_30d)
      + 0.25 * z(ROI_30d)
      + 0.20 * z(EdgeAdjustedHitRate)
      - 0.15 * z(MaxDrawdown_30d)

Each input is z-scored against the same 30-day cohort so the weights operate on comparable scales. Wallets with fewer than 30 trades in the window are excluded. Wallets with a single position > 35% of bankroll in the window are flagged but not removed — concentration is a fact about the trader you should know.
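
As a sketch, the composite can be computed like this. Field names (`addr`, `sharpe`, `roi`, `edge_hr`, `max_dd`, `trades`) are illustrative, not the production schema; drawdown is stored as a positive depth, so its negative weight penalizes it:

```python
from statistics import mean, stdev

# Weights from the formula above. max_dd carries the -0.15 weight.
WEIGHTS = {"sharpe": 0.40, "roi": 0.25, "edge_hr": 0.20, "max_dd": -0.15}

def zscores(values):
    """z-score a list of metric values against its own cohort."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def composite(cohort):
    """Return {addr: score} for wallets meeting the 30-trade minimum."""
    eligible = [w for w in cohort if w["trades"] >= 30]
    scores = [0.0] * len(eligible)
    for key, weight in WEIGHTS.items():
        for i, z in enumerate(zscores([w[key] for w in eligible])):
            scores[i] += weight * z
    return {w["addr"]: s for w, s in zip(eligible, scores)}
```

Because every z-scored column sums to zero, the composite is centered on the cohort: a positive score means above-average on the weighted blend.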

1. Sharpe ratio for prediction markets

The Sharpe ratio is a return-per-unit-of-volatility measure: it asks "how much return did this wallet get for the swings it endured?" In equities, returns are computed daily on a continuous price series. In Polymarket the series is jaggier — a wallet might fill three trades on Tuesday and zero on Wednesday — so we adapt the calculation to irregular, trade-level return series; the exact adjustments are in the formal spec on /methodology.

A 30-day Sharpe above 2.0 is good in this universe. Above 3.0 is rare and usually concentrated in a small number of categories. We treat anything above 5.0 with strong skepticism and apply additional sample-size and concentration checks before showing it on the public leaderboard.
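
The formal spec defines the exact adaptation; a minimal sketch of one common approach on irregular series (per-trade returns with a √N scaling so more-active wallets are comparable) looks like this:

```python
from math import sqrt
from statistics import mean, stdev

def sharpe_30d(trade_returns, risk_free=0.0):
    """Per-trade Sharpe over a 30-day window: mean excess return over
    return volatility, scaled by sqrt(N). The sqrt(N) scaling is one
    conventional choice, not necessarily the production formula."""
    if len(trade_returns) < 30:      # 30-trade minimum sample
        return None
    excess = [r - risk_free for r in trade_returns]
    vol = stdev(excess)
    if vol == 0:
        return None                  # no swings, no meaningful ratio
    return mean(excess) / vol * sqrt(len(excess))
```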

2. Window selection: why 30 days, with 90-day context

We default to a 30-day rolling window because it captures the recent regime and is long enough that a single lucky day cannot dominate the score. We also publish 7-day and 90-day Sharpes alongside it for shorter- and longer-horizon context.

For copy-trade decisions we recommend filtering on 90-day Sharpe ≥ 1.6 and 30-day Sharpe ≥ 2.0 simultaneously. The combination is a meaningful screen.
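
The recommended screen can be expressed directly; `sharpe_90d` and `sharpe_30d` are assumed field names for illustration, not the API's actual schema:

```python
def copy_candidates(wallets):
    """Wallets passing both the 90-day (>= 1.6) and 30-day (>= 2.0)
    Sharpe thresholds recommended for copy-trade decisions."""
    return [w["addr"] for w in wallets
            if w["sharpe_90d"] >= 1.6 and w["sharpe_30d"] >= 2.0]
```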

3. Outlier detection (Hampel filter / MAD)

A single market resolution can produce a return observation that is 10× the wallet's typical trade. If that one observation drives the Sharpe and the ROI, the wallet's score is brittle. We run a Hampel-style filter based on the median absolute deviation (MAD) on each wallet's return series: observations far from the median, measured in scaled MADs, are winsorized back toward it before Sharpe and ROI are computed.

This is the single most important filter we run. About 12% of wallets that look elite by raw 30-day Sharpe collapse below the threshold under winsorization. Those wallets are not banned — you can still copy them — but the badge tells you what you're looking at.
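
A minimal version of the winsorizing MAD filter, assuming the conventional k = 3 cutoff and the 1.4826 normal-consistency constant (the production threshold lives in the spec):

```python
from statistics import median

def hampel_winsorize(returns, k=3.0):
    """Clip return observations more than k scaled-MADs from the median."""
    med = median(returns)
    mad = median(abs(r - med) for r in returns)
    sigma = 1.4826 * mad                  # MAD -> stdev under normality
    lo, hi = med - k * sigma, med + k * sigma
    return [min(max(r, lo), hi) for r in returns]
```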

4. The win-rate caveat

"Win-rate" on Polymarket is misleading because the contract universe is binary. If you only ever bet on contracts trading at $0.90, you will win ~90% of the time and earn roughly 0% edge. Raw win-rate is therefore not in the composite at all. What we use instead is edge-adjusted hit rate:

EdgeAdjustedHitRate = (ActualWins / Trades) − AvgImpliedProb

If a wallet bets on contracts whose average entry price is $0.42 and wins 51% of the time, the edge-adjusted hit rate is +0.09 — that is the genuinely informative number. A wallet with a 73% raw win rate but average entry of $0.74 has edge-adjusted hit rate −0.01, i.e., is approximately fair-priced and getting paid for nothing.
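
The formula translates directly to code; here each trade is a hypothetical `(entry_price, won)` pair, with the entry price on a binary contract standing in for the implied probability:

```python
def edge_adjusted_hit_rate(trades):
    """trades: list of (entry_price, won) pairs. Returns the actual
    win rate minus the average implied probability at entry."""
    wins = sum(1 for _, won in trades if won)
    avg_implied = sum(price for price, _ in trades) / len(trades)
    return wins / len(trades) - avg_implied
```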

5. Drawdown and recovery

Maximum drawdown is the deepest peak-to-trough decline in equity over the window. It is the drawdown statistic that enters the composite, at −0.15 weight alongside the return and hit-rate terms.
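
Maximum drawdown on an equity series can be computed in one pass:

```python
def max_drawdown(equity):
    """Deepest peak-to-trough decline, as a fraction of the prior peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```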

6. Rank stability between refreshes

The leaderboard refreshes every 15 minutes. If the rank of a wallet changes wildly between refreshes, the score is too sensitive and the user experience is bad. We monitor the Spearman rank correlation between consecutive refreshes for the top-100 segment. Our running target is ρ ≥ 0.92. When ρ drops below 0.85 (it has happened twice in the last six months, both during major election-night resolution waves) we surface a banner explaining the noise.
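
Spearman's ρ between two consecutive score snapshots can be computed without dependencies; this sketch assumes the same wallets appear in both snapshots and that there are no tied scores:

```python
def spearman(prev_scores, curr_scores):
    """Spearman rank correlation between two score lists (no ties)."""
    n = len(prev_scores)
    def ranks(xs):
        order = sorted(range(n), key=lambda i: xs[i], reverse=True)
        r = [0] * n
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    d2 = sum((a - b) ** 2
             for a, b in zip(ranks(prev_scores), ranks(curr_scores)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```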

Stability is a feature, not a side effect. A leaderboard that thrashes is a leaderboard you cannot trust to follow.

7. A worked example

Consider three plausible wallets in a single 30-day window:

Wallet        Trades   ROI    Sharpe   Edge HR   Max DD   Composite
0xAAAA…1111      142   +38%      2.7     +0.08      −9%       +1.84
0xBBBB…2222       61   +91%      3.4     +0.04     −28%       +1.21
0xCCCC…3333      388   +22%      2.1     +0.11      −6%       +1.62

Wallet B has the highest raw ROI but ranks below A and C because of the deeper drawdown and lower edge-adjusted hit rate. Wallet A wins the composite by being good on all four axes. This is the behavior we want: the leaderboard rewards balanced excellence, not a single hot streak.

8. Sample size and the small-N trap

The single most common reason a wallet looks elite for two weeks and then collapses is sample size. With 30 binary trades, the standard error on win-rate alone is roughly 9 percentage points; with 200 trades it is 3.5 percentage points. A wallet with 32 trades and a 65% raw win rate could plausibly be a 50% wallet that got lucky. We respond in three ways: the 30-trade minimum excludes the thinnest samples, low-sample wallets that remain are flagged rather than ranked as equals, and every profile shows the full trade list so you can judge the sample yourself.
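
The standard-error figures quoted above come straight from the binomial formula √(p(1−p)/n):

```python
from math import sqrt

def winrate_se(p, n):
    """Standard error of an observed win rate p over n independent trades."""
    return sqrt(p * (1 - p) / n)
```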

The cleanest practical filter for users picking leaders to copy is to require ≥100 trades in the 30-day window. About 38% of the 12,438 indexed wallets meet that bar; the rest are in the universe but flagged.

9. Concentration and self-trading detection

A wallet whose return is dominated by a single position is structurally fragile and may signal a one-shot lucky guess rather than reproducible edge. We compute a Herfindahl-Hirschman-style concentration index (HHI) on each wallet's USDC volume by market. Any wallet with HHI > 0.4 (a level reached only when volume is heavily skewed toward one or two markets) is flagged "concentrated" on the profile page.
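
The HHI is just the sum of squared volume shares; a sketch with an assumed market-to-volume mapping:

```python
def concentration_hhi(volume_by_market):
    """Sum of squared USDC volume shares across markets, in (0, 1]."""
    total = sum(volume_by_market.values())
    return sum((v / total) ** 2 for v in volume_by_market.values())

def is_concentrated(volume_by_market, threshold=0.4):
    """Apply the HHI > 0.4 'concentrated' flag described above."""
    return concentration_hhi(volume_by_market) > threshold
```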

Self-trading detection runs on transaction-graph data. We watch for cycles where wallet A buys YES at $X and a closely linked wallet B sells YES at $Y > $X seconds later in the same market. Identified clusters are merged for scoring purposes (so one operator does not appear at the top of the leaderboard four times), and clear manipulation patterns trigger removal. The threshold is documented; the false-positive rate has been measured at roughly 0.7% of flagged clusters in our internal audits.
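
The cluster-merging step itself is a standard union-find pass; this sketch assumes the graph heuristic has already produced flagged (wallet, wallet) pairs:

```python
def merge_clusters(wallets, linked_pairs):
    """Group wallets connected by flagged links so one operator
    scores as a single entity. Returns a list of clusters."""
    parent = {w: w for w in wallets}
    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w
    for a, b in linked_pairs:
        parent[find(a)] = find(b)
    clusters = {}
    for w in wallets:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())
```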

10. What the score doesn't capture

We are explicit about the gaps:

  1. Survivorship bias. Wallets that blew up and stopped trading are not in the cohort. We address this in our risk-management post.
  2. Off-chain knowledge. We cannot tell whether a wallet is run by a journalist with a source, a sports analyst with a model, or a lucky novice. Sample size and edge-adjusted hit rate are the proxies.
  3. Capacity. A wallet trading $200 positions cannot be copied with $20,000 positions without slippage that breaks the strategy. Our strategies page models this.
  4. Future-proofing. Past edge is not a guarantee. Composite is a screen, not a promise.

Frequently asked questions

How often is each wallet's score updated?

Every 15 minutes. The leaderboard timestamp shows the exact last-refresh moment, and the API at /developers exposes the same data programmatically.

Why isn't ROI weighted higher in the composite?

Because ROI without volatility context is misleading. A wallet that quadrupled its bankroll in a 30-day window with three concentrated bets has the same ROI as a wallet that earned 300% across 200 well-sized bets. The Sharpe weighting captures the difference; ROI is in the composite at 25% weight precisely so it cannot dominate.

Do you ever remove wallets from the leaderboard?

Yes. Wallets are excluded if (a) sample size drops below 30 trades in the window, (b) the wallet is identified as a known operator's secondary address (we de-duplicate), or (c) clear self-trading patterns are detected. Manipulation flags are documented on /methodology.

Can I see the raw inputs for a specific wallet?

Yes. Every wallet's profile page on the leaderboard shows the four component scores, the trade list, and the winsorized vs raw Sharpe. The API exposes the same fields.

How the score evolves between refreshes

Every 15 minutes the scoring pipeline runs from scratch on a rolling window: fetch all closed positions for each indexed wallet over the trailing 30 days, compute returns, run the MAD filter, recompute Sharpe and edge-adjusted hit rate, refresh the composite, re-rank. Total compute time is roughly 38 seconds across the full 1,891–2,492-wallet cohort on the production indexer; the leaderboard publish cycle is therefore comfortably under the 15-minute refresh budget.

Two refresh-related properties are worth understanding: every refresh recomputes the full 30-day window from scratch rather than incrementally updating, and rank stability between consecutive refreshes is monitored against the ρ ≥ 0.92 target described above.

What we deliberately do not do

Some choices are easier to defend by stating explicitly what we ruled out: raw win-rate never enters the composite, concentrated wallets are flagged rather than removed, and wallets whose Sharpe collapses under winsorization keep a badge instead of a ban.

Want to use the score in your own model?

The full schema is on /methodology, the API is documented at /developers, and the glossary defines every term used here. If you'd rather skip the modeling and just follow vetted wallets, the leaderboard is one click away at /leaderboard — browse free, then upgrade to Pro at /dashboard/billing when you’re ready to mirror trades.