How do you get historical Polymarket data?
Polymarket exposes public APIs: the Gamma API for market metadata and resolutions, and the CLOB API for prices and order books. For backtesting, Pancake maintains a canonical pool of prediction-market evidence datasets: an agent calls the search_datasets tool over MCP and gets validated, content-hashed rows without scraping anything.
For raw access, Polymarket's Gamma API serves market metadata (questions, outcomes, end dates, resolution status), and the CLOB API serves prices and order books. Both are publicly documented. Assembling backtest-grade history from them still takes real work: joining price snapshots to resolution records, aligning timestamps across the two sources, and being disciplined about what was knowable at each decision time.
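A minimal sketch of the point-in-time ("as-of") join described above. It assumes price snapshots and resolutions have already been pulled into plain dictionaries keyed by outcome token; the field names (`token_id`, `price_observed_at`, `resolved_outcome`) and the helpers `price_as_of` and `build_rows` are hypothetical and are not part of either Polymarket API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BacktestRow:
    token_id: str                # hypothetical: the outcome token that was priced
    decision_time: datetime
    entry_price: float
    price_observed_at: datetime  # kept so lookahead can be audited later
    resolved_outcome: int        # 1 if this token's outcome won, else 0


def price_as_of(history, t):
    """Latest (timestamp, price) with timestamp <= t, or None.

    `history` must be sorted by timestamp, ascending.
    """
    i = bisect_right([ts for ts, _ in history], t)
    return history[i - 1] if i > 0 else None


def build_rows(price_history, resolutions, decisions):
    """Join the price knowable at each decision time to that token's resolution.

    price_history: {token_id: [(timestamp, price), ...]} sorted by timestamp
    resolutions:   {token_id: 1 or 0}, keyed by token, not by market
    decisions:     [(token_id, decision_time), ...]
    """
    rows = []
    for token_id, decided_at in decisions:
        snap = price_as_of(price_history.get(token_id, []), decided_at)
        if snap is None or token_id not in resolutions:
            continue  # no price knowable yet, or token unresolved: skip the row
        observed_at, price = snap
        rows.append(BacktestRow(token_id, decided_at, price, observed_at,
                                resolutions[token_id]))
    return rows


def at(hour):
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


history = {"tok-yes": [(at(9), 0.40), (at(12), 0.55), (at(15), 0.90)]}
rows = build_rows(history, {"tok-yes": 1}, [("tok-yes", at(13))])
print(rows[0].entry_price)  # 0.55: the 15:00 price was not knowable at 13:00
```

Keying resolutions by token rather than by market is one way to keep a YES row from inheriting the NO token's outcome, and storing `price_observed_at` on every row leaves the lookahead decision auditable after the fact.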
That assembly step is where most DIY Polymarket backtests silently break. A row that uses a price observed after the decision time, or a resolution joined to the wrong outcome token, produces results that look fine and mean nothing.
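Both failure modes can be caught with cheap per-row checks before any P&L is computed. This is a minimal sketch, assuming each assembled row is a dict carrying the hypothetical fields `market_id`, `token_id`, `decision_time`, `price_observed_at`, `won`, and `winning_token_id`; none of these are Polymarket API field names, and `row_problems` is an illustrative helper.

```python
from datetime import datetime, timezone


def row_problems(row, market_tokens):
    """Return the reasons a (hypothetical) backtest row should be rejected.

    market_tokens maps each market id to the set of its outcome token ids.
    """
    problems = []
    # Lookahead: the price must have been observable when the decision was made.
    if row["price_observed_at"] > row["decision_time"]:
        problems.append("lookahead")
    # The priced token must belong to the market the row claims.
    if row["token_id"] not in market_tokens.get(row["market_id"], set()):
        problems.append("foreign_token")
    # The outcome flag must describe the priced token, not its sibling.
    if row["won"] != (row["token_id"] == row["winning_token_id"]):
        problems.append("outcome_mismatch")
    return problems


def at(hour):
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


markets = {"m1": {"yes-1", "no-1"}}
good = {"market_id": "m1", "token_id": "yes-1", "decision_time": at(12),
        "price_observed_at": at(11), "won": True, "winning_token_id": "yes-1"}
# Price taken two hours late, and the NO token's outcome joined to the YES row.
bad = dict(good, price_observed_at=at(14), won=False)
print(row_problems(good, markets))  # []
print(row_problems(bad, markets))   # ['lookahead', 'outcome_mismatch']
```

Neither bad row would raise an error on its own inside a backtest loop, which is why the checks are written out explicitly rather than left to the P&L calculation.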
Pancake's canonical dataset pool packages this work. Each evidence dataset carries, per row, a market link, decision time, entry price, resolution time, and resolved outcome. Datasets are validated at ingest (schema, lookahead, monotonicity, ranges) and content-hashed (rows_sha256). An agent finds them with the search_datasets MCP tool and runs a backtest against them directly. Custom rows can also be uploaded with create_evidence_dataset, which applies the same validation.
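The ingest checks and the content hash can be illustrated with a short sketch. Pancake's exact rules and hashing scheme are not spelled out here, so this is an assumption-laden illustration: it reads "lookahead" as decision time strictly before resolution time, "monotonicity" as rows ordered by decision time, and "ranges" as a price strictly between 0 and 1 and an outcome of 0 or 1. It computes a `rows_sha256`-style digest over canonical JSON. The field names, `REQUIRED`, `validate`, and `rows_sha256` as written are all hypothetical.

```python
import hashlib
import json
from datetime import datetime

# Hypothetical schema: field name -> required Python type.
REQUIRED = {"market_url": str, "decision_time": str, "entry_price": float,
            "resolution_time": str, "resolved_outcome": int}


def validate(rows):
    """Return a list of error strings; an empty list means every check passed."""
    errors = []
    prev_decision = None
    for i, row in enumerate(rows):
        # Schema: each required field is present with the expected type.
        bad = [k for k, typ in REQUIRED.items() if not isinstance(row.get(k), typ)]
        if bad:
            errors.append(f"row {i}: bad or missing fields {bad}")
            continue
        decided = datetime.fromisoformat(row["decision_time"])
        resolved = datetime.fromisoformat(row["resolution_time"])
        # Lookahead (assumed meaning): the decision precedes the resolution.
        if decided >= resolved:
            errors.append(f"row {i}: decision_time is not before resolution_time")
        # Monotonicity (assumed meaning): rows are ordered by decision_time.
        if prev_decision is not None and decided < prev_decision:
            errors.append(f"row {i}: decision_time goes backwards")
        prev_decision = decided
        # Ranges (assumed): price strictly inside (0, 1), outcome is 0 or 1.
        if not 0.0 < row["entry_price"] < 1.0:
            errors.append(f"row {i}: entry_price out of range")
        if row["resolved_outcome"] not in (0, 1):
            errors.append(f"row {i}: resolved_outcome must be 0 or 1")
    return errors


def rows_sha256(rows):
    """SHA-256 of the rows serialized as canonical JSON (sorted keys, no spaces)."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


rows = [{"market_url": "https://example.com/markets/1",
         "decision_time": "2024-01-01T12:00:00+00:00", "entry_price": 0.42,
         "resolution_time": "2024-02-01T00:00:00+00:00", "resolved_outcome": 1}]
print(validate(rows))     # []
print(rows_sha256(rows))  # 64 hex characters
```

Sorting keys makes the digest independent of dictionary key order, while row order and every field value still change it; anyone holding the same rows can recompute the digest and confirm nothing was altered.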
Every dataset records its provenance (source URLs and the transformations applied), so a result built on pool data remains auditable back to the upstream source.
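A provenance log can be as simple as the source URLs plus an append-only list of the steps applied. The sketch below uses a hypothetical `Provenance` record and `apply_step` helper, not Pancake's actual schema; each step records a description and row counts so an auditor can see where rows were dropped.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class Provenance:
    source_urls: list[str]
    transformations: list[dict[str, Any]] = field(default_factory=list)


def apply_step(rows: list[Row], prov: Provenance, description: str,
               fn: Callable[[list[Row]], list[Row]]) -> list[Row]:
    """Run one transformation and append a record of it to the provenance log."""
    out = fn(rows)
    prov.transformations.append(
        {"step": description, "rows_in": len(rows), "rows_out": len(out)})
    return out


prov = Provenance(source_urls=["https://example.com/raw/markets.json"])
raw = [{"id": 1, "resolved": True}, {"id": 2, "resolved": False},
       {"id": 1, "resolved": True}]
deduped = apply_step(raw, prov, "drop duplicate ids",
                     lambda rs: list({r["id"]: r for r in rs}.values()))
final = apply_step(deduped, prov, "keep resolved markets only",
                   lambda rs: [r for r in rs if r["resolved"]])
for step in prov.transformations:
    print(step)
# {'step': 'drop duplicate ids', 'rows_in': 3, 'rows_out': 2}
# {'step': 'keep resolved markets only', 'rows_in': 2, 'rows_out': 1}
```

Routing every transformation through one helper means the log cannot silently fall out of sync with the data, which is the property that makes a result traceable back to its upstream source.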