# PR #361 - feat(commodity_hindcast): USE MAPIE Conformalise and turn into proper MetaModel - multi-mode CalibrationResult with mode-keyed sidecars

## At a glance
- Author: ai-tommytf
- Merged: 2026-05-02
- Branch: `tl/fix-cis`
- Net effect: Eight-commit refactor that overhauled how conformal prediction intervals are fitted, persisted, and applied. Introduced `CalibrationResult` (a frozen, persistable dataclass), four `residual_mode` values, mode-keyed sidecar parquets under `run_dir/conformal/`, and a polymorphic `apply_conformal` entry point. Removed ~150 lines of interleaved, duplicated calibration logic from the postprocess walk-forward loop.
- Why this matters: Calibration is now a first-class persistent artefact; consumers can load any mode sidecar and call `CalibrationResult.predict_interval(sim, init_md=...)` without re-running postprocess. The `commodity_` prefix was also dropped from run-level artefact filenames.
## PR body (faithful extract)

### What is conformal calibration?
```
residual r   = obs − sim
half-width   = quantile_α( |r| )
interval     = [sim − half-width, sim + half-width]
```
The only non-trivial choice is which residuals feed the quantile; `residual_mode` selects that set.
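A minimal NumPy sketch of the three lines above, assuming α is the coverage level (e.g. 0.95) and NumPy's default linear-interpolation quantile. The helper name `conformal_interval` is hypothetical, and the PR's `quantile` / `split_conformal` methods may compute the quantile differently.

```python
import numpy as np


def conformal_interval(sim: float, obs: np.ndarray, sims: np.ndarray, level: float) -> tuple[float, float]:
    """Symmetric interval around a point forecast from held-out residuals.

    Hypothetical helper: obs/sims are paired held-out observations and
    simulations; level is the coverage level (e.g. 0.95).
    """
    residuals = obs - sims                                      # r = obs - sim
    half_width = float(np.quantile(np.abs(residuals), level))   # quantile of |r|
    return sim - half_width, sim + half_width                   # [sim - hw, sim + hw]


obs = np.array([10.0, 12.0, 9.0, 11.0, 14.0])
sims = np.array([11.0, 11.0, 11.0, 11.0, 11.0])
lo, hi = conformal_interval(100.0, obs, sims, level=0.5)
print(lo, hi)  # 99.0 101.0
```

Because the interval is symmetric, everything about calibration reduces to one number per level: the half-width.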
### The four `residual_mode` values
| `residual_mode` | Data input | Output shape |
|---|---|---|
| `hindcast_oos_per_init_date` | Pool by `init_md` (MM-DD) | One CI per init_date |
| `hindcast_oos_per_year` | Walk-forward bootstrap | One CI per fold-year |
| `hindcast_oos_fully_pooled` | Every (year, init_date) | ONE CI, broadcast |
| `in_sample_pooled` | Production `train_preds` | ONE CI, broadcast (fallback) |
Visual partition of a 3-CV-year × 3-init-date hindcast:
```
                 init_dates →
year ↓       Jan-01      Apr-10      Jul-03
2022         r_22_Jan    r_22_Apr    r_22_Jul
2023         r_23_Jan    r_23_Apr    r_23_Jul
2024         r_24_Jan    r_24_Apr    r_24_Jul

per_init_date   groups COLUMNS → CI_Jan, CI_Apr, CI_Jul
per_year        groups ROWS    → walk-forward: NaN, CI_2023, CI_2024
fully_pooled    one big bag    → one CI broadcast everywhere
```
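The three out-of-sample partitions can be sketched in pandas. This assumes a long table of residuals with hypothetical columns `year`, `init_md`, `residual`, toy values, and a plain quantile of absolute residuals; it illustrates the grouping only, not the PR's implementation.

```python
import numpy as np
import pandas as pd


def half_width(residuals: pd.Series, level: float) -> float:
    """Quantile of absolute residuals; NaN when there is nothing to pool."""
    if residuals.empty:
        return float("nan")
    return float(np.quantile(residuals.abs(), level))


def per_init_date(df: pd.DataFrame, level: float) -> dict:
    # Group COLUMNS: pool every year that shares an init_md.
    return {md: half_width(g["residual"], level) for md, g in df.groupby("init_md")}


def per_year_walk_forward(df: pd.DataFrame, level: float) -> dict:
    # Group ROWS, walk-forward: fold-year Y only sees residuals from years < Y.
    return {
        year: half_width(df.loc[df["year"] < year, "residual"], level)
        for year in sorted(df["year"].unique())
    }


def fully_pooled(df: pd.DataFrame, level: float) -> float:
    # One big bag: a single half-width broadcast everywhere.
    return half_width(df["residual"], level)


# Toy 3 x 3 grid mirroring the diagram (values are made up).
df = pd.DataFrame({
    "year": [2022] * 3 + [2023] * 3 + [2024] * 3,
    "init_md": ["01-01", "04-10", "07-03"] * 3,
    "residual": [-300.0, 250.0, -120.0, 310.0, -200.0, 90.0, -280.0, 260.0, 100.0],
})
print(per_init_date(df, 0.95))
print(per_year_walk_forward(df, 0.95))   # 2022 has no prior years -> NaN
print(fully_pooled(df, 0.95))
```

The walk-forward rule is what produces the leading NaN: the first fold-year has no earlier residuals to draw on.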
### CalibrationResult: fit once, apply anywhere
A frozen dataclass:

```python
CalibrationResult.save(path)   # write long-format parquet
CalibrationResult.load(path)   # reconstruct from disk
.predict_interval(sim, *, fold_year=None, init_md=None, strict=True)   # apply
```

The `strict` flag controls whether unknown lookup keys raise `KeyError` or return NaN bounds.
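To make the lookup and the `strict` flag concrete, here is a minimal, hypothetical stand-in for `CalibrationResult`, called `ToyCalibration`. It assumes half-widths are stored in a dict keyed by `(fold_year, init_md, level)` with `None` marking a broadcast key, and that `predict_interval` returns `{level: (lo, hi)}`, which is consistent with the worked example's `[0.95]` indexing. Persistence is omitted; the 182 and 305 half-widths are read off the worked example's 04-08 intervals.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ToyCalibration:
    """Hypothetical stand-in; not the PR's CalibrationResult."""

    residual_mode: str
    # (fold_year, init_md, level) -> half-width; None means "broadcast over this key".
    half_widths: dict = field(default_factory=dict)

    @property
    def levels(self) -> tuple:
        return tuple(sorted({lvl for (_, _, lvl) in self.half_widths}))

    def predict_interval(self, sim: float, *, fold_year: Optional[int] = None,
                         init_md: Optional[str] = None, strict: bool = True) -> dict:
        out = {}
        for lvl in self.levels:
            key = (fold_year, init_md, lvl)
            if key in self.half_widths:
                hw = self.half_widths[key]
                out[lvl] = (sim - hw, sim + hw)
            elif strict:
                raise KeyError(f"no calibration for {key!r}")
            else:
                out[lvl] = (math.nan, math.nan)   # unknown key -> NaN bounds
        return out


cal = ToyCalibration(
    "hindcast_oos_per_init_date",
    {(None, "04-08", 0.5): 182.0, (None, "04-08", 0.95): 305.0},
)
print(cal.predict_interval(11000.0, init_md="04-08"))
# {0.5: (10818.0, 11182.0), 0.95: (10695.0, 11305.0)}
print(cal.predict_interval(11000.0, init_md="02-29", strict=False)[0.95])  # (nan, nan)
```

Keeping the "unknown key" behaviour behind one flag means callers choose, per call site, whether a gap in calibration is an error or just an empty interval.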
### Pipeline data flow after the refactor
```
Postprocess phase (canonical writer):
    for mode in config.postprocess.conformalise:
        fit_and_save_calibration(result, ci_levels, mode)
            ↓ writes
        run_dir/conformal/{mode}.parquet

    primary = first listed mode
    build_rows uses primary's predict_interval to populate
        run_dir/postprocessed/national.parquet

Forecast / delivery / plot consumers:
    cal = get_or_fit_calibration(result, ci_levels, mode)   # load if present, fit+save on demand
    intervals = cal.predict_interval(sim, init_md=…, strict=False)
```
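The load-or-fit behaviour of `get_or_fit_calibration` follows a common cache-on-disk pattern. A minimal sketch, assuming a JSON payload instead of the real long-format parquet (to stay dependency-free) and a caller-supplied `fit` callable; `sidecar_path` and `get_or_fit` are hypothetical names mirroring `calibration_path` and `get_or_fit_calibration`.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def sidecar_path(run_dir: Path, mode: str) -> Path:
    # Mirrors the documented layout run_dir/conformal/{mode}.* (JSON here, parquet in the PR).
    return run_dir / "conformal" / f"{mode}.json"


def get_or_fit(run_dir: Path, mode: str, fit: Callable[[str], dict]) -> dict:
    """Load the mode's sidecar if present; otherwise fit, save, and return it."""
    path = sidecar_path(run_dir, mode)
    if path.exists():
        return json.loads(path.read_text())
    payload = fit(mode)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload))
    return payload


calls = []


def fake_fit(mode: str) -> dict:
    calls.append(mode)                      # record each (expensive) fit
    return {"mode": mode, "half_width_95": 305.0}


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    first = get_or_fit(run_dir, "hindcast_oos_per_init_date", fake_fit)
    second = get_or_fit(run_dir, "hindcast_oos_per_init_date", fake_fit)  # served from disk
    print(first == second, calls)           # True ['hindcast_oos_per_init_date']
```

Postprocess remains the canonical writer; the on-demand fit only covers consumers that ask for a mode nobody has written yet.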
### Configuration
```yaml
postprocess:
  conformalise:
    - hindcast_oos_per_init_date   # primary: populates delivery CSVs
    - hindcast_oos_fully_pooled    # sidecar
    - hindcast_oos_per_year        # sidecar
```
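How the `conformalise` list could be validated and its primary picked out, as a sketch: it assumes the YAML has already been parsed into a dict and that the allowed values are the four in the table above. `resolve_conformal_modes` and its duplicate check are hypothetical, not the PR's `config.py` code.

```python
ALLOWED_MODES = (
    "hindcast_oos_per_init_date",
    "hindcast_oos_per_year",
    "hindcast_oos_fully_pooled",
    "in_sample_pooled",
)


def resolve_conformal_modes(config: dict) -> tuple[str, list[str]]:
    """Return (primary, others) from config['postprocess']['conformalise']."""
    modes = config["postprocess"]["conformalise"]
    if not modes:
        raise ValueError("postprocess.conformalise must list at least one mode")
    unknown = [m for m in modes if m not in ALLOWED_MODES]
    if unknown:
        raise ValueError(f"unknown residual_mode(s): {unknown}")
    if len(set(modes)) != len(modes):
        raise ValueError("duplicate residual_mode in postprocess.conformalise")
    # Every listed mode gets a sidecar; the first also feeds national.parquet / delivery CSVs.
    return modes[0], list(modes[1:])


config = {"postprocess": {"conformalise": [
    "hindcast_oos_per_init_date",
    "hindcast_oos_fully_pooled",
    "hindcast_oos_per_year",
]}}
primary, others = resolve_conformal_modes(config)
print(primary)   # hindcast_oos_per_init_date
print(others)    # ['hindcast_oos_fully_pooled', 'hindcast_oos_per_year']
```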
### Live demo: corn hindcast sidecar schema
Schema:

```
fold_year:          Int32
fold_init_md:       category
level:              float64
half_width_kg_ha:   float64
n_residuals:        int64
residual_mode:      category
method:             category
commodity:          category
```

Total rows: 210 = 42 init_mds × 5 levels
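The long format lets one schema serve every mode: the key column a mode does not use is null and acts as a broadcast. A pandas sketch that assumes only the column names and dtypes listed above; the rows are illustrative (the 182 and 305 half-widths come from the worked example, the `n_residuals` counts are placeholders), and `lookup_half_width` is a hypothetical helper.

```python
import pandas as pd

# Two per_init_date rows: fold_year is <NA> (broadcast over years).
sidecar = pd.DataFrame({
    "fold_year": pd.array([pd.NA, pd.NA], dtype="Int32"),
    "fold_init_md": pd.Categorical(["04-08", "04-08"]),
    "level": [0.5, 0.95],
    "half_width_kg_ha": [182.0, 305.0],
    "n_residuals": pd.Series([5, 5], dtype="int64"),   # placeholder counts
    "residual_mode": pd.Categorical(["hindcast_oos_per_init_date"] * 2),
    "method": pd.Categorical(["quantile"] * 2),
    "commodity": pd.Categorical(["corn"] * 2),
})


def lookup_half_width(df: pd.DataFrame, *, init_md: str, level: float) -> float:
    # per_init_date rows: match fold_init_md and level; fold_year must be <NA>.
    mask = (
        (df["fold_init_md"].astype(str) == init_md)
        & (df["level"] == level)
        & df["fold_year"].isna()
    )
    rows = df.loc[mask, "half_width_kg_ha"]
    return float(rows.iloc[0]) if len(rows) else float("nan")


print(sidecar.dtypes)
print(lookup_half_width(sidecar, init_md="04-08", level=0.95))   # 305.0
```

A `per_year` sidecar would flip the pattern: `fold_year` populated and `fold_init_md` null.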
### 95% half-width comparison across modes (kg/ha)
```
hindcast_oos_per_init_date[01-01]:  309
hindcast_oos_per_init_date[04-08]:  305
hindcast_oos_per_init_date[07-08]:  422
hindcast_oos_per_init_date[10-15]:  309
hindcast_oos_fully_pooled:          347
hindcast_oos_per_year[2020]:        NaN
hindcast_oos_per_year[2021]:        83
hindcast_oos_per_year[2022]:        330
hindcast_oos_per_year[2023]:        317
hindcast_oos_per_year[2024]:        304
```
The `per_year` fold for 2020 is NaN because walk-forward calibration has no earlier CV residuals to pool. Keeping all three modes on disk means plots and dashboards can be re-run against any mode without re-running postprocess.
### Worked example: apply a calibration
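```python
# Module path inferred from the files-touched list (src layout).
from commodity_hindcast.models.meta_models.conformalise import CalibrationResult

run = "<run_name>"  # placeholder: an existing run directory under runs/

cal = CalibrationResult.load(f"runs/{run}/conformal/hindcast_oos_per_init_date.parquet")
# Loaded: mode=hindcast_oos_per_init_date, n_residuals=210, levels=(0.5, 0.68, 0.8, 0.9, 0.95)

intervals = cal.predict_interval(11000.0, init_md="04-08")
# 50% CI: [10818, 11182]
# 80% CI: [10773, 11227]
# 95% CI: [10695, 11305]

# strict=False: an unknown init_md returns NaN bounds without raising
lo, hi = cal.predict_interval(11000.0, init_md="02-29", strict=False)[0.95]
# -> (nan, nan)
```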
### Filename layout: `commodity_` prefix dropped
```
Before:                                           After:
runs/.../postprocessed/corn_national.parquet  →   runs/.../postprocessed/national.parquet
runs/.../conformal/corn_<mode>.parquet        →   runs/.../conformal/<mode>.parquet
```
`calibration_path()` and `ExperimentResult.postprocessed_national_path` lost their `commodity` argument.
### Test coverage
Two pivotal test modules added:
- `tests/unit/commodity_hindcast/test_apply_conformal_experiment.py`: hypothesis-driven stress test covering round-trip equality, bracketing at every level, and monotonic widening with confidence level.
- `tests/unit/commodity_hindcast/test_apply_delivery_post_transforms.py`: exercises all four flag combinations of `enforce_ci_narrowing` / `drop_frozen_tail`.
21 conformal-related tests pass. Full unit suite: 302 → 306 tests.
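The three properties named for `test_apply_conformal_experiment.py` can be illustrated without the `hypothesis` library by looping over seeded random draws. This sketch checks them against a plain quantile-of-|r| calibration built from hypothetical `calibrate` / `to_records` / `from_records` helpers, not against the PR's `CalibrationResult`.

```python
import numpy as np

LEVELS = (0.5, 0.68, 0.8, 0.9, 0.95)


def calibrate(residuals: np.ndarray) -> dict:
    """level -> half-width, using the quantile of absolute residuals."""
    abs_r = np.abs(residuals)
    return {lvl: float(np.quantile(abs_r, lvl)) for lvl in LEVELS}


def to_records(cal: dict) -> list:
    # Long format: one row per level, as in the sidecar parquet.
    return [{"level": lvl, "half_width": hw} for lvl, hw in cal.items()]


def from_records(rows: list) -> dict:
    return {row["level"]: row["half_width"] for row in rows}


def check_properties(seed: int) -> None:
    rng = np.random.default_rng(seed)
    residuals = rng.normal(0.0, 300.0, size=int(rng.integers(2, 60)))
    sim = float(rng.uniform(5_000, 15_000))
    cal = calibrate(residuals)

    # 1. Round-trip equality through the long-format representation.
    assert from_records(to_records(cal)) == cal
    # 2. Bracketing: every interval contains the point forecast.
    for hw in cal.values():
        assert sim - hw <= sim <= sim + hw
    # 3. Monotonic widening with confidence level.
    widths = [cal[lvl] for lvl in LEVELS]
    assert all(a <= b for a, b in zip(widths, widths[1:]))


for seed in range(200):
    check_properties(seed)
print("all properties hold")
```

Property 3 holds here because a quantile is non-decreasing in its level; a property-based test is useful precisely for catching implementations where that guarantee is accidentally broken.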
### Public API surface (key symbols in `models/meta_models/conformalise.py`)
```
ConformalMethod = Literal["quantile", "split_conformal"]
ResidualMode = Literal[...]      # four values above
class CalibrationResult: ...
def apply_conformal(...): ...    # polymorphic (residuals array OR ExperimentResult)
class MapieConformalRegressor: ...
```
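`MapieConformalRegressor` is described as mirroring the sklearn-style fit / conformalize / predict_interval flow. The hand-rolled class below illustrates that flow for split conformal prediction, using a least-squares base model and the finite-sample quantile rank ⌈(n+1)·level⌉. The class name and internals are hypothetical and do not call MAPIE; whether the PR's `split_conformal` method uses exactly this rank is an assumption.

```python
import math

import numpy as np


class ToySplitConformalRegressor:
    """Hypothetical fit -> conformalize -> predict_interval flow (not MAPIE)."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ToySplitConformalRegressor":
        # Ordinary least squares with an intercept column.
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_

    def conformalize(self, X_cal: np.ndarray, y_cal: np.ndarray) -> "ToySplitConformalRegressor":
        # Conformity scores on held-out data the model was not fitted on.
        self.scores_ = np.sort(np.abs(y_cal - self.predict(X_cal)))
        return self

    def half_width(self, level: float) -> float:
        n = len(self.scores_)
        k = math.ceil((n + 1) * level)          # finite-sample corrected rank
        return math.inf if k > n else float(self.scores_[k - 1])

    def predict_interval(self, X: np.ndarray, level: float) -> tuple:
        pred = self.predict(X)
        hw = self.half_width(level)
        return pred, pred - hw, pred + hw


rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 1.0, size=200)
model = ToySplitConformalRegressor().fit(X[:100], y[:100]).conformalize(X[100:], y[100:])
pred, lo, hi = model.predict_interval(np.array([[5.0]]), level=0.9)
```

The key design point is the split: scores come from data the base model never saw, which is what makes the quantile an honest out-of-sample half-width. With very few calibration points the corrected rank can exceed n, giving an infinite interval.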
Orchestration helpers in `stages/run_meta_models.py`:

| Function | Used by | What it does |
|---|---|---|
| `fit_and_save_calibration(result, ci_levels, mode)` | postprocess | Single-mode fit + save |
| `fit_and_save_all_configured(result, ci_levels)` | postprocess | Loop over all configured modes |
| `get_or_fit_calibration(result, ci_levels, mode)` | forecast / delivery / plots | Load sidecar; fit on demand if missing |
| `primary_calibration(result, ci_levels)` | forecast / delivery | First-listed mode's calibration |
| `calibration_path(run_dir, mode)` | everywhere | Canonical sidecar path |
## Files / lines touched
| Additions | Deletions | File |
|---|---|---|
| +695 | -21 | market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py |
| +103 | -177 | market_insights_models/src/commodity_hindcast/stages/run_meta_models.py |
| +205 | -0 | tests/unit/commodity_hindcast/test_apply_conformal_experiment.py |
| +29 | -70 | market_insights_models/src/commodity_hindcast/models/meta_models/residuals.py |
| +94 | -0 | tests/unit/commodity_hindcast/test_mapie_conformalise.py |
| +90 | -0 | tests/unit/commodity_hindcast/test_apply_delivery_post_transforms.py |
| +65 | -21 | tests/unit/commodity_hindcast/test_postprocess.py |
| +54 | -30 | market_insights_models/src/commodity_hindcast/delivery/conversions.py |
| +8 | -57 | market_insights_models/src/commodity_hindcast/stages/run_forecast.py |
| +38 | -1 | market_insights_models/src/commodity_hindcast/config.py |
## Cross-references
- Related entity pages: CalibrationResult, PostprocessConfig, DeliveryConfig
- Related concept pages: conformal calibration
- Related code pages: meta_models, stages (run_meta_models)
- Directly follows: PR-339 (the 9-phase restructure that created `models/meta_models/`)
- Directly followed by: PR-372 (which made `residual_mode` mandatory on `ForecastConfig`)
## Lessons captured
- Conformal calibration sidecars are long-format parquets under `run_dir/conformal/{mode}.parquet`; the `commodity_` prefix is absent because the run_dir name already encodes the commodity.
- `fold_year` is `<NA>` for `per_init_date` mode (broadcast over years) and `fold_init_md` is `<NA>` for `per_year` mode (broadcast over init_dates).
- The `strict=False` API is the explicit contract for delivery row assembly: missing init_dates return NaN bounds without raising.
- `enforce_ci_narrowing` in delivery exists to handle quantile noise on small calibration sets; that noise is not a bug but an expected property of small CV sets.
- `MapieConformalRegressor` mirrors the sklearn-style fit/conformalize/predict_interval API for use when training features `(X, y)` are still in memory.
- The `commodity_` prefix was removed from `postprocessed/national.parquet` and `conformal/<mode>.parquet` in this PR; consumers must not hard-code the old paths.
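The PR does not spell out what `enforce_ci_narrowing` computes. One plausible reading, offered purely as an assumption, is that half-widths should not grow as the init date moves later in the season, enforced with a running minimum. The 95% `per_init_date` values from the comparison above show why such a guard could be needed: 07-08 (422) is wider than 04-08 (305). The function name `enforce_narrowing` is hypothetical.

```python
import numpy as np


def enforce_narrowing(init_mds: list[str], half_widths: list[float]) -> list[float]:
    """Hypothetical: clamp each half-width to the running minimum of earlier init dates."""
    # Zero-padded "MM-DD" strings sort chronologically within one calendar year.
    order = np.argsort(init_mds)
    clamped = np.minimum.accumulate(np.asarray(half_widths, dtype=float)[order])
    out = np.empty_like(clamped)
    out[order] = clamped                     # restore the caller's order
    return out.tolist()


# 95% per_init_date half-widths (kg/ha) from the comparison above.
mds = ["01-01", "04-08", "07-08", "10-15"]
hws = [309.0, 305.0, 422.0, 309.0]
print(enforce_narrowing(mds, hws))   # [309.0, 305.0, 305.0, 305.0]
```

A running minimum only ever narrows intervals, so under this reading the guard trades a little nominal coverage at later init dates for a monotone presentation.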