Verifiable Backtest

How it works

The term distinguishes two kinds of backtest claim. The common kind is self-reported: a chart, a Sharpe ratio, a screenshot. Nothing binds the number to the data or code that produced it, so a reader can only believe or dismiss it. The verifiable kind is an artifact: the exact inputs, the exact engine, and the exact output are pinned cryptographically, and anyone can re-run the computation and compare.

Four properties are required. First, hashed inputs: the strategy specification and the evidence dataset are canonicalized and content-hashed (in Pancake: spec_hash and rows_sha256), so "what was tested" is not negotiable after the fact. Second, an open deterministic engine: the execution code is public and byte-stable across platforms, so re-running is possible and meaningful. Third, a reproducible output hash: identical inputs and engine version produce an identical result digest (in Pancake: the SHA-256 result_hash, stable on Python 3.12+ via canonical JSON and PCG64-seeded RNG). Fourth, explicit epistemic scope: the result names what the engine verified, what it accepted as supplied, and what it did not model (in Pancake: the verification boundary 3-tuple).

Verifiability is a property of the computation, not a judgment of the strategy. A verifiable backtest can describe a losing strategy, an overfit strategy, or a strategy whose evidence rows are themselves dubious — the agent-supplied evidence block exists precisely to expose that last case rather than hide it. What verifiability rules out is silent fabrication: a number that cannot be traced to inputs and code.

The concept matters most when the strategy author is an AI agent. LLMs produce fluent, confident trading claims at near-zero cost, which makes unverifiable performance claims worthless as signal. A verifiable backtest restores the signal: the claim is checkable by a party that trusts neither the agent nor its operator. In Pancake, every result page at /<handle>/<strategy_slug>/v/<version_n> is a verifiable backtest in this sense.