
ADR-002: CalibrationResult is a persistable aggregate sidecar

Status: Accepted (retroactively documented 2026-05-08)
Date: 2026-04-30 (commit f83eed5d, "polymorphic apply_conformal(experiment, ...) with CalibrationResult"); landed on main via PR #361 on 2026-05-02 (commit 2e88cd85)
Authors: ai-tommytf (per git log on models/meta_models/conformalise.py)

Context

Before PR #361, conformal calibration was a transient computation that lived inside the postprocess walk-forward loop. Half-widths were computed from residuals, embedded directly into run_dir/postprocessed/national.parquet as lower/upper columns, and discarded. There was no first-class on-disk representation of "the calibration that was fitted from this run".

That design failed for two concrete reasons:

  1. Forecast had no residuals to fit from. At forecast time the production fold's obs_yield_kg_ha is NaN (the harvest has not happened), so a forecast-time call site cannot reconstruct the OOS residuals that postprocess used. The only residuals available come from the CV folds, which are the same artefacts postprocess had already scanned, so re-fitting per init_date duplicated the walk-forward residual scan that postprocess had just done.
  2. Multiple residual recipes co-exist. PR #361 introduced four residual_mode values (market_insights_models/src/commodity_hindcast/models/meta_models/types.py:16) describing which residuals feed the quantile: per init_md, per year, fully pooled, and an in-sample fallback. Embedding "the calibration" in national.parquet collapses all four to whichever one happened to run; consumers who want a different mode have to re-run postprocess.
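To make point 2 concrete, here is a minimal Python sketch, assuming the standard split-conformal rule (the ceil((n + 1) * level)-th smallest absolute residual, where level is the target coverage). The ADR does not state which quantile rule conformalise.py uses, and the helper names, row layout, and residual values below are hypothetical. Grouping the same six residuals by init_md and pooling them yields different half-widths, which is why a single lower/upper column pair cannot represent every mode.

```python
import math
from collections import defaultdict


def conformal_half_width(residuals, level):
    """Split-conformal half-width: the ceil((n + 1) * level)-th smallest absolute residual."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one residual")
    k = math.ceil((n + 1) * level)
    if k > n:
        # Too few residuals to certify this coverage level: the interval is unbounded.
        return math.inf
    return scores[k - 1]


def half_widths_by(rows, key, level):
    """Group residual rows by `key` and compute one half-width per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["residual"])
    return {group: conformal_half_width(res, level) for group, res in groups.items()}


# Hypothetical OOS residuals (kg/ha) from three CV years at two init dates.
example_rows = [
    {"init_md": "06-01", "year": 2019, "residual": 100.0},
    {"init_md": "06-01", "year": 2020, "residual": -300.0},
    {"init_md": "06-01", "year": 2021, "residual": 200.0},
    {"init_md": "08-01", "year": 2019, "residual": -50.0},
    {"init_md": "08-01", "year": 2020, "residual": 80.0},
    {"init_md": "08-01", "year": 2021, "residual": 20.0},
]

per_init_md = half_widths_by(example_rows, key=lambda r: r["init_md"], level=0.5)
pooled = half_widths_by(example_rows, key=lambda r: "pooled", level=0.5)
print(per_init_md)  # {'06-01': 200.0, '08-01': 50.0}
print(pooled)  # {'pooled': 100.0}
```

With level=0.5 the per-init_md grouping gives 200.0 and 50.0, while the pooled grouping gives 100.0 for every row; each recipe is a legitimate calibration, and only one of them survives if the result is baked into national.parquet.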

The wiki entity page records that an earlier domain_model/AGGREGATES.md entry described CalibrationResult as "a transient value returned from stages/run_meta_models.py" (wiki/commodity_hindcast/entities/CalibrationResult.md "Correction to AGGREGATES.md" section, lines 24-35). That description has been corrected; wiki/domain_model/AGGREGATES.md:214 now calls it "a persisted aggregate sidecar, not a transient value".

Decision

CalibrationResult is a frozen dataclass with first-class persistence, written once per configured mode during postprocess and loaded by every downstream consumer.

  • Frozen dataclass declaration at market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:106-117.
  • Long-format parquet serialisation via to_frame (conformalise.py:136), save (conformalise.py:215), and load (conformalise.py:225). I/O routes through treefera_market_insights.shared.utils.dataframes.read_dataframe / write_dataframe and accepts cloudpathlib.AnyPath for S3.
  • One sidecar per mode at the canonical path run_dir/conformal/{mode}.parquet, computed by calibration_path (stages/run_meta_models.py:60).
  • Postprocess persists every mode listed in config.postprocess.conformalise via fit_and_save_all_configured (stages/run_meta_models.py:85); the first-listed mode is the primary calibration applied at runtime by primary_calibration (stages/run_meta_models.py:119).
  • Application surface: predict_interval(sim, *, fold_year=None, init_md=None) dispatches on whichever field is populated (conformalise.py:360-398). The four mode handlers (conformalise.py:470, :493, :526, :546) each populate exactly one of per_init_md / per_year / pooled, which is the one-of invariant noted in the dataclass docstring (conformalise.py:108-117).
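The Decision bullets can be illustrated with a stdlib-only Python sketch. CalibrationSketch, to_rows, from_rows, and half_width are hypothetical stand-ins for CalibrationResult, to_frame, load, and the dispatch inside predict_interval; the table layout (key mapped to {level: half_width}) is an assumption, a list of dicts replaces the parquet DataFrame, and only two of the persisted metadata columns are carried. It shows a frozen dataclass, the one-of invariant, a long-format round trip that needs no out-of-band metadata, and the empty-row guard.

```python
from dataclasses import dataclass, field

SCOPES = ("per_init_md", "per_year", "pooled")


@dataclass(frozen=True)
class CalibrationSketch:
    """Hypothetical stand-in for CalibrationResult (not the real class).

    Exactly one of per_init_md / per_year / pooled may be populated.
    """

    residual_mode: str
    method: str
    per_init_md: dict = field(default_factory=dict)  # init_md -> {level: half_width}
    per_year: dict = field(default_factory=dict)  # fold_year -> {level: half_width}
    pooled: dict = field(default_factory=dict)  # level -> half_width

    def __post_init__(self):
        # Enforce the one-of invariant at construction time.
        populated = [name for name in SCOPES if getattr(self, name)]
        if len(populated) != 1:
            raise ValueError(f"exactly one of {SCOPES} must be populated, got {populated}")

    def to_rows(self):
        """Long format: one row per (scope, key, level), with metadata repeated on every row."""
        meta = {"residual_mode": self.residual_mode, "method": self.method}
        rows = []
        for scope in ("per_init_md", "per_year"):
            for key, levels in getattr(self, scope).items():
                for level, hw in levels.items():
                    rows.append({**meta, "scope": scope, "key": key,
                                 "level": level, "half_width": hw})
        for level, hw in self.pooled.items():
            rows.append({**meta, "scope": "pooled", "key": None,
                         "level": level, "half_width": hw})
        if not rows:
            # Mirror the ADR's guard: never write an empty sidecar.
            raise ValueError("refusing to serialise a calibration with no half-widths")
        return rows

    @classmethod
    def from_rows(cls, rows):
        """Rebuild from long-format rows; all metadata comes from the rows themselves."""
        if not rows:
            raise ValueError("cannot load a calibration from zero rows")
        tables = {name: {} for name in SCOPES}
        for row in rows:
            if row["scope"] == "pooled":
                tables["pooled"][row["level"]] = row["half_width"]
            else:
                tables[row["scope"]].setdefault(row["key"], {})[row["level"]] = row["half_width"]
        first = rows[0]
        return cls(residual_mode=first["residual_mode"], method=first["method"], **tables)

    def half_width(self, level, *, fold_year=None, init_md=None):
        """Dispatch on whichever table is populated (illustrates predict_interval's idea only)."""
        if self.per_init_md:
            if init_md is None:
                raise ValueError("per-init_md calibration requires init_md")
            return self.per_init_md[init_md][level]
        if self.per_year:
            if fold_year is None:
                raise ValueError("per-year calibration requires fold_year")
            return self.per_year[fold_year][level]
        return self.pooled[level]


cal = CalibrationSketch(
    residual_mode="per_init_md",  # hypothetical mode string
    method="split",  # hypothetical method label
    per_init_md={"06-01": {0.8: 200.0, 0.95: 310.0}, "08-01": {0.8: 50.0, 0.95: 90.0}},
)
restored = CalibrationSketch.from_rows(cal.to_rows())
```

Enforcing the one-of invariant in __post_init__ is a choice made for this sketch; the ADR places it in the four mode handlers, each of which populates a single table.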

Consequences

Positive

  • Forecast, delivery, and diagnostics read a CV-derived calibration without re-running postprocess; get_or_fit_calibration (stages/run_meta_models.py:100-116) loads the sidecar when present and only fits on demand if it is missing.
  • A hindcast run can pre-fit several candidate calibrations without forcing the forecast to use any specific one (stages/run_meta_models.py:119-135); per-mode sidecars allow comparing modes side-by-side from the same run_dir (wiki/commodity_hindcast/sources/prs/PR-361.md "95% half-width comparison across modes").
  • Empty sidecars are rejected at write time: to_frame raises ValueError rather than serialising a calibration with no rows (conformalise.py:187-197).
  • Self-describing: residual_mode, method, experiment_key, n_residuals, per_year_fallback persist as columns (conformalise.py:138-212); load reconstructs without out-of-band metadata.
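The sidecar lifecycle described above (postprocess writes every configured mode once, downstream consumers load rather than re-fit, and the first-listed mode is the primary) can be sketched in stdlib Python. All function names here are hypothetical stand-ins for calibration_path, fit_and_save_all_configured, get_or_fit_calibration, and primary_calibration; JSON replaces parquet so the example runs without pyarrow, and the mode strings and fit function are invented. Whether the real get_or_fit_calibration persists an on-demand fit is not stated in this ADR, so this sketch does not.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

FitFn = Callable[[str], dict]


def calibration_path_sketch(run_dir: Path, mode: str, suffix: str = ".parquet") -> Path:
    # One sidecar per mode at run_dir/conformal/{mode}{suffix}.
    return Path(run_dir) / "conformal" / f"{mode}{suffix}"


def save_json(result: dict, path: Path) -> None:
    # Stdlib stand-in for the project's parquet writer.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))


def load_json(path: Path) -> dict:
    return json.loads(path.read_text())


def fit_and_save_all(run_dir: Path, modes: List[str], fit: FitFn) -> Dict[str, Path]:
    # Postprocess: fit every configured mode once and write each to its own sidecar.
    paths = {}
    for mode in modes:
        path = calibration_path_sketch(run_dir, mode, suffix=".json")
        save_json(fit(mode), path)
        paths[mode] = path
    return paths


def get_or_fit(run_dir: Path, mode: str, fit: FitFn) -> dict:
    # Downstream: load the sidecar when present; fit only if it is missing (not persisted here).
    path = calibration_path_sketch(run_dir, mode, suffix=".json")
    if path.exists():
        return load_json(path)
    return fit(mode)


def primary_calibration_sketch(run_dir: Path, modes: List[str], fit: FitFn) -> dict:
    # The first-listed configured mode is the one applied at runtime.
    if not modes:
        raise ValueError("no conformal modes configured")
    return get_or_fit(run_dir, modes[0], fit)


fit_calls: List[str] = []


def fake_fit(mode: str) -> dict:
    # Hypothetical fit that records each call so the demo can count re-fits.
    fit_calls.append(mode)
    return {"residual_mode": mode, "pooled": {"0.8": 100.0}}


configured = ["per_init_md", "pooled"]  # hypothetical contents of config.postprocess.conformalise
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    fit_and_save_all(run_dir, configured, fake_fit)
    primary = primary_calibration_sketch(run_dir, configured, fake_fit)
```

After postprocess has written both sidecars, the primary lookup loads per_init_md from disk and fit_calls stays at two entries, which is the saving the first Positive bullet describes.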

Negative

  • One additional artefact per configured mode to persist (extra disk).
  • Schema changes to CalibrationResult columns require migrating existing sidecars; consumers must not hardcode the old paths (wiki/commodity_hindcast/sources/prs/PR-361.md "Lessons captured").

Neutral

  • The commodity_ filename prefix was dropped in PR #361 because run_dir already encodes the commodity-region (wiki/commodity_hindcast/sources/prs/PR-361.md "Filename layout").

Alternatives considered

  • Keep transient (status quo ante). Rejected because forecast had to re-fit calibration per init_date, duplicating the CV residual scan postprocess had just done; and because forecast time has no production-fold residuals to fit from at all (production obs is NaN).
  • Embed in postprocessed/national.parquet. Rejected because that collapses calibration to whichever single mode postprocess ran with; storing four modes side-by-side per (year, init_date) row would bloat the wide national frame and tie consumers to its schema.
  • In-sample-only fallback as the default. Rejected on statistical grounds — _in_sample_pooled (conformalise.py:546-568) emits a loguru warning that intervals are biased narrow because in-sample residuals under-estimate true OOS uncertainty; it is retained only as the fallback when no CV folds exist.
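The fallback rule in the last alternative amounts to a residual-selection step, sketched below. This is a hypothetical Python helper, not _in_sample_pooled itself: it assumes CV residuals arrive as a mapping from fold to a list of residuals, pools them whenever any exist, and otherwise falls back to in-sample residuals while logging a warning through the standard logging module (the project itself uses loguru).

```python
import logging

logger = logging.getLogger("conformal_sketch")


def select_residuals(oos_residuals_by_fold, in_sample_residuals):
    """Prefer pooled out-of-sample CV residuals; use in-sample ones only when none exist."""
    oos = [r for fold in oos_residuals_by_fold.values() for r in fold]
    if oos:
        return "oos", oos
    # In-sample residuals under-estimate OOS error, so the resulting intervals are too narrow.
    logger.warning("no CV residuals: falling back to in-sample residuals; intervals biased narrow")
    return "in_sample", list(in_sample_residuals)


source, residuals = select_residuals({2019: [1.0, -2.0], 2020: [3.0]}, [0.1])
```

Keeping the fallback explicit, with a warning, means a run without CV folds still produces a sidecar while making the weaker guarantee visible in the logs.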

Verification

  • Round-trip property: tests/unit/commodity_hindcast/test_apply_conformal_experiment.py:146 (test_apply_conformal_hindcast_oos_per_init_date_round_trips_brackets_and_orders) exercises cal.save(path) then CalibrationResult.load(path) and asserts dictionary equality across per_init_md keys and levels.
  • Empty-row guard at to_frame() raises ValueError (conformalise.py:187-197); dedicated test coverage is [PLACEHOLDER: not located in the audit window].
  • One-of invariant encoded by the four mode handlers (conformalise.py:470, :493, :526, :546).

References

  • market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:106-117 (frozen dataclass declaration)
  • models/meta_models/conformalise.py:136 (to_frame)
  • models/meta_models/conformalise.py:187-197 (empty-row guard)
  • models/meta_models/conformalise.py:215 (save)
  • models/meta_models/conformalise.py:225 (load classmethod)
  • models/meta_models/conformalise.py:360-398 (predict_interval dispatch)
  • models/meta_models/conformalise.py:470, :493, :526, :546 (four mode handlers)
  • models/meta_models/types.py:16 (ResidualMode literal)
  • stages/run_meta_models.py:60 (calibration_path)
  • stages/run_meta_models.py:85 (fit_and_save_all_configured)
  • stages/run_meta_models.py:100-116 (get_or_fit_calibration)
  • stages/run_meta_models.py:119 (primary_calibration)
  • tests/unit/commodity_hindcast/test_apply_conformal_experiment.py:146 (round-trip test)
  • wiki/commodity_hindcast/entities/CalibrationResult.md
  • wiki/commodity_hindcast/sources/prs/PR-361.md
  • wiki/commodity_hindcast/concepts/conformal_modes.md
  • wiki/commodity_hindcast/concepts/residual_modes.md
  • wiki/domain_model/AGGREGATES.md:214
  • PR #361 (commit 2e88cd85, merged 2026-05-02)