# How do you get historical Polymarket data?

Canonical: https://www.usepancake.com/q/how-to-get-historical-polymarket-data

**Answer:** Polymarket exposes public APIs: the Gamma API for market metadata and resolutions, and the CLOB API for prices and order books. For backtesting, Pancake maintains a canonical pool of prediction-market evidence datasets — an agent calls search_datasets over MCP and gets validated, content-hashed rows without scraping anything.
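On the wire, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request. The sketch below shows the message an agent's MCP client would send. It assumes only the tool name `search_datasets` from the answer above. The `query` argument is a hypothetical placeholder, because the real input schema is whatever the Pancake server advertises through `tools/list`. In practice an MCP client SDK builds this message for you.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this client session.
_ids = count(1)

def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)

# HYPOTHETICAL arguments: "query" is an assumption, not the documented schema.
request = tools_call("search_datasets", {"query": "polymarket resolved binary markets"})
print(request)
```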

For raw access, Polymarket's Gamma API serves market metadata (questions, outcomes, end dates, resolution status), and the CLOB API serves pricing data such as price history and order books. Both are publicly documented. Assembling backtest-grade history from them takes real work: joining price snapshots to resolution records, aligning timestamps, and being disciplined about what was knowable at each decision time.
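For the raw route, the two APIs are plain HTTP GET endpoints. The sketch below only builds request URLs, so it runs offline. The base URLs, the `/markets` and `/prices-history` paths, and the parameter names (`closed`, `limit`, `offset`, `market`, `startTs`, `endTs`, `fidelity`) reflect Polymarket's public docs as we understand them. Treat them as assumptions and check the current docs before relying on them. Note that price history is requested per outcome token, not per market.

```python
from urllib.parse import urlencode

# ASSUMED endpoints and parameter names; verify against Polymarket's docs.
GAMMA_BASE = "https://gamma-api.polymarket.com"
CLOB_BASE = "https://clob.polymarket.com"

def gamma_markets_url(closed: bool = True, limit: int = 100, offset: int = 0) -> str:
    """Page through market metadata (questions, outcomes, end dates, resolution status)."""
    params = {"closed": str(closed).lower(), "limit": limit, "offset": offset}
    return f"{GAMMA_BASE}/markets?{urlencode(params)}"

def clob_price_history_url(token_id: str, start_ts: int, end_ts: int, fidelity: int = 60) -> str:
    """Price history for ONE outcome token between two Unix timestamps (seconds).

    fidelity is assumed to be the sampling resolution in minutes.
    """
    params = {"market": token_id, "startTs": start_ts, "endTs": end_ts, "fidelity": fidelity}
    return f"{CLOB_BASE}/prices-history?{urlencode(params)}"

print(gamma_markets_url(limit=2))
print(clob_price_history_url("TOKEN_ID", 1700000000, 1700086400))
```

Fetching each URL (for example with `urllib.request.urlopen`) and paging through results is left out; that is where the joining and timestamp alignment work begins.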

That assembly step is where most DIY Polymarket backtests silently break. A row that uses a price observed after the decision time, or a resolution joined to the wrong outcome token, produces results that look fine and mean nothing.
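Both failure modes have a simple structural guard. The sketch below assumes price snapshots are `(unix_ts, price)` tuples sorted by time, and that you hold a mapping from outcome token id to outcome label. Both shapes and the token ids are hypothetical. It takes the last price observed at or before the decision time (an "as-of" join) and refuses to fall forward to a later price. It also scores the resolution by token id rather than by list position.

```python
from bisect import bisect_right

def price_as_of(snapshots, decision_ts):
    """Return the last price observed at or before decision_ts.

    snapshots: list of (unix_ts, price) tuples sorted by unix_ts ascending.
    Raises ValueError when nothing was observable yet, instead of silently
    using a later (lookahead) price.
    """
    timestamps = [ts for ts, _ in snapshots]
    i = bisect_right(timestamps, decision_ts)  # count of snapshots with ts <= decision_ts
    if i == 0:
        raise ValueError("no price observed at or before decision time")
    return snapshots[i - 1][1]

def payoff(token_id, token_to_outcome, winning_outcome):
    """1.0 if the token that was priced is the winning outcome, else 0.0.

    Joining on token id avoids scoring a YES price against a NO resolution.
    """
    return 1.0 if token_to_outcome[token_id] == winning_outcome else 0.0

snaps = [(100, 0.40), (200, 0.55), (300, 0.90)]
print(price_as_of(snaps, 250))  # 0.55: the 0.90 print at ts=300 was not yet knowable
print(payoff("tok_yes", {"tok_yes": "Yes", "tok_no": "No"}, "Yes"))  # 1.0
```

A snapshot stamped exactly at the decision time counts as knowable here. If your timestamps mark the end of a sampling interval, you may want a strict `<` instead.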

Pancake's canonical dataset pool packages this work: evidence datasets with a market link, decision time, entry price, resolution time, and resolved outcome per row, validated at ingest (schema, lookahead, monotonicity, ranges) and content-hashed (rows_sha256). An agent finds them with the search_datasets MCP tool and runs a backtest against them directly. Custom rows uploaded via create_evidence_dataset go through the same validation.
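To make the ingest checks concrete, here is a minimal sketch of a row type, a few of the listed checks (ranges, a simple lookahead proxy, monotonicity), and a content hash. The field names, the check details, and the hashing scheme (SHA-256 over canonical JSON Lines) are all assumptions for illustration. They are not Pancake's actual schema or its rows_sha256 definition.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class EvidenceRow:
    # HYPOTHETICAL field names mirroring the per-row fields described above.
    market_url: str
    decision_ts: int       # Unix seconds
    entry_price: float     # price of the chosen outcome at decision time
    resolution_ts: int     # Unix seconds
    resolved_outcome: str

def validate(rows):
    """Raise ValueError on the first row that fails a check."""
    prev_ts = None
    for n, r in enumerate(rows):
        if not 0.0 <= r.entry_price <= 1.0:
            raise ValueError(f"row {n}: entry_price outside [0, 1]")
        if r.decision_ts >= r.resolution_ts:
            # Lookahead proxy: the decision must precede the resolution.
            raise ValueError(f"row {n}: decision not before resolution")
        if prev_ts is not None and r.decision_ts < prev_ts:
            raise ValueError(f"row {n}: decision times not monotonic")
        prev_ts = r.decision_ts

def rows_sha256(rows):
    """SHA-256 over one canonical JSON object per row (sorted keys, no spaces)."""
    h = hashlib.sha256()
    for r in rows:
        h.update(json.dumps(asdict(r), sort_keys=True, separators=(",", ":")).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()

rows = [
    EvidenceRow("https://polymarket.com/event/example-a", 100, 0.40, 500, "Yes"),
    EvidenceRow("https://polymarket.com/event/example-b", 200, 0.60, 600, "No"),
]
validate(rows)
print(rows_sha256(rows))
```

This hash is order-sensitive: reordering rows changes the digest. That is a reasonable choice when row order carries meaning (for example, time order). A canonical sort before hashing would make it order-independent instead.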

Every dataset records its provenance (source URLs and transformations), so a result built on pool data stays auditable back to the upstream source.

## Related

- [Quickstart — search the dataset pool](https://www.usepancake.com/quickstart)
- [Q&A — how to backtest a Polymarket strategy](https://www.usepancake.com/q/how-to-backtest-a-polymarket-strategy)
- [Methodology — evidence validation](https://www.usepancake.com/methodology)

---

Markdown twin of https://www.usepancake.com/q/how-to-get-historical-polymarket-data — same content as the HTML page, generated from the same source data. More machine surfaces: https://www.usepancake.com/llms.txt