Verification Boundary

The verification boundary is the 3-tuple carried by every Pancake receipt: what the engine verified (structural invariants and runner math), what it accepted as agent-supplied evidence, and which risks it did not model. It replaces a binary pass/fail with an honest epistemic accounting of the backtest's scope.

How it works

A conventional backtest engine returns a number. Pancake's verification boundary is a structured alternative that states explicitly what the engine knows, what it accepted on trust, and what it cannot assess. The 3-tuple appears verbatim in every receipt, so a reader, human or machine, can judge the result against the right standard.
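To make the shape of the 3-tuple concrete, here is a minimal sketch of how a receipt might hold it. The component names verified, agent_supplied_evidence, and unmodeled_risks come from this page; the class VerificationBoundary, the method to_receipt_block, and the inner dictionary keys are hypothetical and are not Pancake's actual receipt schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class VerificationBoundary:
    """Hypothetical container for the three boundary components of a receipt."""

    # Checks the engine performed itself: structural invariants and runner math.
    verified: dict[str, list[str]]
    # Inputs accepted on trust from the agent and surfaced verbatim.
    agent_supplied_evidence: dict[str, Any]
    # Risks the current engine version does not model.
    unmodeled_risks: tuple[str, ...]

    def to_receipt_block(self) -> dict[str, Any]:
        # Emit the 3-tuple as a plain dict so it can be serialized into a receipt.
        return {
            "verified": self.verified,
            "agent_supplied_evidence": self.agent_supplied_evidence,
            "unmodeled_risks": list(self.unmodeled_risks),
        }


boundary = VerificationBoundary(
    verified={
        "structural_invariants": ["schema_match", "lookahead", "monotonicity", "range", "required_columns"],
        "runner_math": ["pnl_ledger", "fees", "slippage", "event_ordering", "statistics"],
    },
    agent_supplied_evidence={"entry_price_source": "mid"},
    unmodeled_risks=("market_impact", "resolution_lag", "resolver_risk", "small_sample"),
)
print(sorted(boundary.to_receipt_block()))  # ['agent_supplied_evidence', 'unmodeled_risks', 'verified']
```

Keeping the three components as separate fields, rather than folding them into one score, is what lets a reader check each layer against a different standard.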

The first component, verified, covers two sub-categories.

Structural invariants: schema_match (all declared columns present with the correct types), lookahead (decision_time < resolution_time in every row), monotonicity (non-negative prices, no reversed timestamps), range (values within declared bounds), and required_columns (each of the five semantic roles present exactly once). If any structural check fails, the run aborts and no partial receipt is emitted.

Runner math: the P&L ledger, fee application, slippage application, event ordering, and all statistical computations (Sharpe, Sortino, CAGR, Wilson CI, Brier score, bootstrap CI, permutation-test p-value) are re-derived from declared inputs. At this layer the engine does not trust any number asserted by the agent.
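The following is a minimal sketch of both verified sub-layers, assuming rows arrive as Python dicts. The column names decision_time and resolution_time come from this page; the price column, the role names a caller passes in, and the helpers StructuralCheckError, check_structural, and wilson_ci are hypothetical. It also assumes "no reversed timestamps" means decision times never decrease from one row to the next. The Wilson interval stands in for the runner-math layer: it recomputes a win-rate interval from raw counts instead of accepting an agent-asserted value.

```python
import math
from collections import Counter
from typing import Any


class StructuralCheckError(Exception):
    """Raised on the first failed structural invariant; no receipt is produced."""


def check_structural(
    rows: list[dict[str, Any]],
    schema: dict[str, type],
    roles: dict[str, str],
    required_roles: list[str],
    bounds: dict[str, tuple[float, float]],
) -> None:
    # Hypothetical sketch: the "price" column name and the ordering rule are assumptions.
    # required_columns: each required semantic role maps to exactly one column.
    role_counts = Counter(roles.values())
    for role in required_roles:
        if role_counts[role] != 1:
            raise StructuralCheckError(
                f"required_columns: role {role!r} appears {role_counts[role]} times"
            )
    previous_decision = None
    for i, row in enumerate(rows):
        # schema_match: every declared column is present with the declared type.
        for col, expected in schema.items():
            if col not in row or not isinstance(row[col], expected):
                raise StructuralCheckError(f"schema_match: row {i}, column {col!r}")
        # lookahead: the decision must strictly precede resolution.
        if not row["decision_time"] < row["resolution_time"]:
            raise StructuralCheckError(f"lookahead: row {i}")
        # monotonicity: non-negative price, decision times never go backwards.
        if row["price"] < 0:
            raise StructuralCheckError(f"monotonicity: negative price in row {i}")
        if previous_decision is not None and row["decision_time"] < previous_decision:
            raise StructuralCheckError(f"monotonicity: reversed timestamp at row {i}")
        previous_decision = row["decision_time"]
        # range: each bounded column lies within its declared [lo, hi].
        for col, (lo, hi) in bounds.items():
            if not lo <= row[col] <= hi:
                raise StructuralCheckError(f"range: row {i}, column {col!r}")


def wilson_ci(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate, re-derived from raw counts."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

Raising on the first failure mirrors the abort rule: there is no code path that returns a partially checked dataset.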

The second component, agent_supplied_evidence, names what the agent provided that the engine cannot independently re-derive: the feature column values in the EvidenceDataset rows, the entry price source (observed, agent_estimate, last_trade, mid, or vwap), and the liquidity source. These inputs are accepted as declared and surfaced verbatim in the receipt, so a reader can inspect the agent_supplied_evidence block and judge whether they are credible.
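A minimal sketch of recording agent-supplied evidence, assuming the five entry price sources listed above form a closed set. EntryPriceSource and build_evidence_block are hypothetical names, as are the sample values momentum_7d and agent_declared_depth. Checking the source label against the documented options is an assumption of this sketch; the values themselves are copied through untouched, matching "accepted as declared".

```python
from enum import Enum
from typing import Any, Iterable


class EntryPriceSource(str, Enum):
    # The entry price sources listed for agent_supplied_evidence.
    OBSERVED = "observed"
    AGENT_ESTIMATE = "agent_estimate"
    LAST_TRADE = "last_trade"
    MID = "mid"
    VWAP = "vwap"


def build_evidence_block(
    feature_columns: Iterable[str],
    entry_price_source: str,
    liquidity_source: str,
) -> dict[str, Any]:
    """Hypothetical helper: record agent inputs as declared, without re-deriving them."""
    # Assumption: only the source label is checked against the documented options
    # (EntryPriceSource raises ValueError otherwise). The values are taken on trust.
    source = EntryPriceSource(entry_price_source)
    return {
        "feature_columns": list(feature_columns),
        "entry_price_source": source.value,
        "liquidity_source": liquidity_source,  # surfaced verbatim
    }


evidence = build_evidence_block(["momentum_7d"], "vwap", "agent_declared_depth")
print(evidence["entry_price_source"])  # vwap
```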

The third component, unmodeled_risks, is a fixed list of what the current engine version does not model: market_impact (the strategy's own order flow moving prices), resolution_lag (final resolution diverging from the price at resolution_time), resolver_risk (the venue resolving differently than the market implied), and small_sample (statistical noise below 10 trades). This list appears in every receipt regardless of trade count, so a reader is never left guessing what the engine covers.
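Because the list is fixed per engine version, it can be modeled as a constant that is attached unconditionally. In this minimal sketch the names UNMODELED_RISKS and attach_unmodeled_risks, and the trade_count key in the sample receipts, are hypothetical; the four risk identifiers come from this page.

```python
from typing import Any

# Fixed list for the current engine version, in the order given above.
UNMODELED_RISKS: tuple[str, ...] = (
    "market_impact",   # the strategy's own order flow moving prices
    "resolution_lag",  # final resolution diverging from the price at resolution_time
    "resolver_risk",   # the venue resolving differently than the market implied
    "small_sample",    # statistical noise below 10 trades
)


def attach_unmodeled_risks(receipt: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a receipt dict with the fixed unmodeled_risks list added."""
    # Attached unconditionally: the trade count never adds or removes entries.
    return {**receipt, "unmodeled_risks": list(UNMODELED_RISKS)}


small = attach_unmodeled_risks({"trade_count": 3})
large = attach_unmodeled_risks({"trade_count": 500})
assert small["unmodeled_risks"] == large["unmodeled_risks"]
```

Returning a new dict rather than mutating the input keeps the receipt assembly free of hidden side effects.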

The verification boundary concept was introduced in Pancake v1.3 (2026-05-22) and is documented formally on the /methodology page. Every ADR that modifies the engine must specify which boundary layer it affects.

Related