Result Hash

How it works

Every Pancake receipt carries a result_hash: a SHA-256 hexadecimal digest computed from the canonical JSON representation of the execution output envelope. This hash is the primary reproducibility guarantee — it lets any reader verify that a receipt is an honest representation of what was actually run.
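As a rough illustration of the hashing step, the sketch below computes a SHA-256 hex digest over a canonical JSON encoding of an envelope. The canonical form used here (sorted keys, compact separators, UTF-8) and the envelope fields are assumptions made for the example; the exact encoding is defined by the batter package, not by this snippet.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    # Assumed canonical form: keys sorted at every level, no extra
    # whitespace, UTF-8 bytes. batter's real encoding may differ.
    text = json.dumps(
        obj,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )
    return text.encode("utf-8")


def result_hash(envelope: dict) -> str:
    # SHA-256 hexadecimal digest of the canonical envelope bytes.
    return hashlib.sha256(canonical_json(envelope)).hexdigest()


# Hypothetical envelopes with the same content but different key order.
envelope_a = {"metrics": {"mean": 0.25, "n": 40}, "engine_version": "0.0.0"}
envelope_b = {"engine_version": "0.0.0", "metrics": {"n": 40, "mean": 0.25}}

print(result_hash(envelope_a) == result_hash(envelope_b))
```

Under this assumed encoding, two envelopes that differ only in key order hash to the same digest, while any change to a value changes it.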

Reproducing the result_hash requires three inputs: the strategy spec (available in the receipt's spec_viewer), the EvidenceDataset rows (identified by rows_sha256 in the receipt), and the batter package at the exact engine_version pinned in the receipt. Given these three inputs and Python 3.12+, running batter will produce an output envelope whose SHA-256 matches the receipt's result_hash exactly.
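The final comparison in a reproduction can be sketched as follows. This example does not call batter; it assumes the output envelope has already been produced by a run, and it uses a hypothetical helper, check_reproduction, together with an assumed canonical JSON form. Only the receipt field names result_hash and engine_version come from the description above.

```python
import hashlib
import hmac
import json
import sys


def envelope_digest(envelope: dict) -> str:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    payload = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_reproduction(receipt, envelope, installed_engine_version,
                       python_version=sys.version_info[:2]):
    """Return a list of problems; an empty list means the run reproduced."""
    problems = []
    if tuple(python_version) < (3, 12):
        problems.append("Python 3.12+ is required")
    if installed_engine_version != receipt["engine_version"]:
        problems.append("engine_version does not match the receipt")
    recomputed = envelope_digest(envelope)
    if not hmac.compare_digest(recomputed, receipt["result_hash"]):
        problems.append("result_hash mismatch")
    return problems


# Hypothetical envelope and receipt, for illustration only.
envelope = {"ci_low": 0.12, "ci_high": 0.31}
receipt = {"engine_version": "1.2.3", "result_hash": envelope_digest(envelope)}

problems = check_reproduction(receipt, envelope, "1.2.3", python_version=(3, 12))
print(problems or "receipt reproduced")
```

Checking the interpreter and engine version before comparing digests makes a mismatch easier to diagnose, since either one alone is enough to change the hash.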

Byte-stability depends on three properties of the batter engine. First, canonical JSON serialization: the keys of the strategy spec are sorted lexicographically at every nesting level before hashing, so key reordering does not affect the hash. Second, PCG64 seeded RNG: all stochastic operations (bootstrap CI, permutation test) use NumPy's PCG64 generator seeded from the spec_hash, producing identical resamples across platforms. Third, Python 3.12+: a CPython change in version 3.12 (gh-100946) altered the sum() implementation for homogeneous float lists, producing a 1-ULP difference from 3.11. That difference propagates through the bootstrap CI and yields a completely different SHA-256. Python 3.11 is permanently out of scope.
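To see why a 1-ULP difference matters, the sketch below sums the same list two ways and hashes each result. It does not reproduce the 3.11 versus 3.12 behaviour of sum() itself; plain left-to-right accumulation and math.fsum stand in for two summation strategies, and the single-field envelope is hypothetical.

```python
import hashlib
import json
import math

values = [0.1] * 10

# Plain left-to-right accumulation, rounding after every addition.
naive = 0.0
for v in values:
    naive += v

# Correctly rounded sum of the same values.
exact = math.fsum(values)


def digest(total: float) -> str:
    # Hash a tiny, hypothetical envelope that contains only the total.
    payload = json.dumps({"total": total}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


print(naive, exact)                    # 0.9999999999999999 1.0
print(digest(naive) == digest(exact))  # False
```

The two totals are adjacent floating-point numbers, yet their digests share nothing, which is why the interpreter version has to be pinned along with the engine version.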

The spec_hash is also the seed for the bootstrap and permutation test RNG. This creates a virtuous dependency: the result_hash is derived from the run output, and the run output is deterministically derived from the spec_hash. The bootstrap resamples are therefore coupled to the exact run that produced them: no other run can produce the same bootstrap CI with a different hash.
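A minimal sketch of seeding the resampling RNG from the spec_hash might look like this. How batter turns the hex digest into a seed is not stated here, so converting it with int(..., 16) and passing it through SeedSequence is an assumption, as are the helper names and the example specs.

```python
import hashlib
import json

import numpy as np


def spec_hash(spec: dict) -> str:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    payload = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def rng_from_spec_hash(digest_hex: str) -> np.random.Generator:
    # Assumption: the hex digest is read as one large integer seed.
    seed = np.random.SeedSequence(int(digest_hex, 16))
    return np.random.Generator(np.random.PCG64(seed))


def bootstrap_means(data: np.ndarray, digest_hex: str, n_resamples: int = 200) -> np.ndarray:
    # Draw index resamples with replacement and return each resample's mean.
    rng = rng_from_spec_hash(digest_hex)
    idx = rng.integers(0, len(data), size=(n_resamples, len(data)))
    return data[idx].mean(axis=1)


data = np.array([0.4, 1.1, 0.7, 0.9, 1.6, 0.2, 1.3, 0.8])
hash_a = spec_hash({"strategy": "example", "window": 20})  # hypothetical spec
hash_b = spec_hash({"strategy": "example", "window": 21})  # one field changed

same = np.array_equal(bootstrap_means(data, hash_a), bootstrap_means(data, hash_a))
different = not np.array_equal(bootstrap_means(data, hash_a), bootstrap_means(data, hash_b))
print(same, different)
```

Repeating a run with the same spec_hash yields the same resamples, while changing a single spec field selects a different stream.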

The result_hash concept is documented in /engine and /engine/determinism. The hash algorithm (SHA-256 of canonical output envelope) is part of the batter open-source package API.