PR #361 — feat(commodity_hindcast): USE MAPIE Conformalise and turn into proper MetaModel — multi-mode CalibrationResult with mode-keyed sidecars¶

At a glance¶

Author: ai-tommytf
Merged: 2026-05-02
Branch: tl/fix-cis
Net effect: Eight-commit refactor that overhauled how conformal prediction intervals are fitted, persisted, and applied. Introduced CalibrationResult (a frozen, persistable dataclass), four residual_mode values, mode-keyed sidecar parquets under run_dir/conformal/, and a polymorphic apply_conformal entry point. Removed ~150 lines of interleaved, duplicated calibration logic from the postprocess walk-forward loop.
Why this matters: Calibration is now a first-class persistent artefact; consumers can load any mode sidecar and call CalibrationResult.predict_interval(sim, init_md=...) without re-running postprocess. The commodity_ prefix was also dropped from run-level artefact filenames.

PR body (faithful extract)¶

What is conformal calibration?¶

residual r = obs − sim
half-width = quantile_α( |r| )
interval   = [sim − half-width, sim + half-width]

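Here α is the confidence level (for example, 0.95 for a 95% interval), so a higher α gives a wider interval. The only non-trivial choice is which residuals feed the quantile. That is where residual_mode comes from.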
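
The three formula lines above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the PR's code: the function name `conformal_interval` and the residual values are made up, and it uses the plain empirical quantile with no finite-sample correction.

```python
import numpy as np


def conformal_interval(sim, residuals, level):
    """Symmetric interval around `sim` from the `level` quantile of |residuals|."""
    half_width = float(np.quantile(np.abs(residuals), level))
    return sim - half_width, sim + half_width


# Illustrative residuals (obs - sim) in kg/ha; not taken from any real run.
residuals = np.array([-300.0, -100.0, 50.0, 200.0, 400.0])

# The 0.5 quantile of |r| = [50, 100, 200, 300, 400] is the median, 200.
lo, hi = conformal_interval(11000.0, residuals, level=0.5)
print(lo, hi)  # 10800.0 11200.0
```

Everything that distinguishes the residual modes happens before this call, in deciding which residuals are passed in.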

The four `residual_mode` values¶

residual_mode                   data input                   output shape
----------------------------------------------------------------------------------
hindcast_oos_per_init_date      pool by init_md (MM-DD)      one CI per init_date
hindcast_oos_per_year           walk-forward bootstrap       one CI per fold-year
hindcast_oos_fully_pooled       every (year, init_date)      ONE CI broadcast
in_sample_pooled                production train_preds       ONE CI broadcast (fallback)

Visual partition of a 3-CV-year × 3-init-date hindcast:

                 init_dates →
   year ↓     Jan-01    Apr-10    Jul-03
   2022      r_22_Jan  r_22_Apr  r_22_Jul
   2023      r_23_Jan  r_23_Apr  r_23_Jul
   2024      r_24_Jan  r_24_Apr  r_24_Jul

   per_init_date     groups COLUMNS    →  CI_Jan, CI_Apr, CI_Jul
   per_year          groups ROWS       →  walk-forward: NaN, CI_2023, CI_2024
   fully_pooled      one big bag       →  one CI broadcast everywhere
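
The partition above maps naturally onto pandas group-bys. The sketch below uses made-up residuals and hypothetical column names (`year`, `init_md`, `resid`), and it assumes the walk-forward rule is "fold year Y only sees residuals from years strictly before Y", which is consistent with the first fold being NaN.

```python
import numpy as np
import pandas as pd

# Toy residual grid matching the 3-year x 3-init-date picture; values are made up.
df = pd.DataFrame({
    "year": [2022, 2022, 2022, 2023, 2023, 2023, 2024, 2024, 2024],
    "init_md": ["01-01", "04-10", "07-03"] * 3,
    "resid": [120.0, -80.0, 40.0, -200.0, 150.0, -60.0, 90.0, -110.0, 30.0],
})
df["abs_resid"] = df["resid"].abs()
level = 0.9

# per_init_date: one half-width per column of the grid.
per_init = df.groupby("init_md")["abs_resid"].quantile(level)

# fully_pooled: one half-width computed from every residual.
pooled = df["abs_resid"].quantile(level)

# per_year (walk-forward): each fold year uses only earlier years, so the
# first fold has nothing to calibrate on and its half-width is NaN.
years = sorted(df["year"].unique())
per_year = pd.Series(
    {y: df.loc[df["year"] < y, "abs_resid"].quantile(level) for y in years}
)

print(per_init, pooled, per_year, sep="\n")
```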

CalibrationResult — fit once, apply anywhere¶

A frozen dataclass:

CalibrationResult.save(path)             # write long-format parquet
CalibrationResult.load(path)             # reconstruct from disk
CalibrationResult.predict_interval(sim, *, fold_year=None, init_md=None, strict=True)  # apply

The strict flag controls whether unknown lookup keys raise KeyError or return NaN bounds.
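
To make the strict semantics concrete, here is a hypothetical stand-in class. `ToyCalibration` and its `half_widths` field are invented for illustration and are not the real CalibrationResult; only the keyword names and the raise-versus-NaN behaviour follow the description above. The 305 kg/ha value is the 95% half-width reported for init_md 04-08 further down.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCalibration:
    """Hypothetical stand-in; not the real CalibrationResult."""

    # Maps (init_md, level) -> half-width in kg/ha.
    half_widths: dict

    def predict_interval(self, sim, *, init_md, strict=True):
        levels = sorted({lvl for (_, lvl) in self.half_widths})
        out = {}
        for lvl in levels:
            hw = self.half_widths.get((init_md, lvl))
            if hw is None:
                if strict:
                    raise KeyError(f"no calibration for init_md={init_md!r}")
                hw = math.nan  # strict=False: NaN bounds instead of raising
            out[lvl] = (sim - hw, sim + hw)
        return out


cal = ToyCalibration({("04-08", 0.95): 305.0})
known = cal.predict_interval(11000.0, init_md="04-08")
missing = cal.predict_interval(11000.0, init_md="02-29", strict=False)
print(known[0.95])    # (10695.0, 11305.0)
print(missing[0.95])  # (nan, nan)
```

Because the dataclass is frozen, assigning to `cal.half_widths` after construction raises, which fits the fit-once, apply-anywhere intent.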

Pipeline data flow after the refactor¶

Postprocess phase (canonical writer):

for mode in config.postprocess.conformalise:
    fit_and_save_calibration(result, ci_levels, mode)
        ↓ writes
    run_dir/conformal/{mode}.parquet

primary = first listed mode
build_rows uses primary's predict_interval to populate
run_dir/postprocessed/national.parquet

Forecast / delivery / plot consumers:

cal = get_or_fit_calibration(result, ci_levels, mode)   # load if present, fit+save on demand
intervals = cal.predict_interval(sim, init_md=…, strict=False)
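
The "load if present, fit+save on demand" behaviour is a small caching pattern. The sketch below is an assumption about its shape, not the real get_or_fit_calibration: the helper `get_or_fit` and the `fit` callback are hypothetical, and CSV stands in for parquet so the example needs no parquet engine.

```python
import tempfile
from pathlib import Path

import pandas as pd


def get_or_fit(path, fit_fn):
    """Load a cached table if it exists; otherwise fit, save, and return it."""
    path = Path(path)
    if path.exists():
        return pd.read_csv(path)
    table = fit_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    table.to_csv(path, index=False)
    return table


calls = []


def fit():
    # Stand-in for an expensive calibration fit; records each invocation.
    calls.append(1)
    return pd.DataFrame({"level": [0.95], "half_width_kg_ha": [347.0]})


with tempfile.TemporaryDirectory() as run_dir:
    sidecar = Path(run_dir) / "conformal" / "hindcast_oos_fully_pooled.csv"
    first = get_or_fit(sidecar, fit)   # file missing: fits and writes
    second = get_or_fit(sidecar, fit)  # file present: loads from disk

print(len(calls))  # 1
```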

Configuration¶

postprocess:
    conformalise:
        - hindcast_oos_per_init_date    # primary — populates delivery CSVs
        - hindcast_oos_fully_pooled     # sidecar
        - hindcast_oos_per_year         # sidecar
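
One way to read this list in code is to validate each entry against the four allowed modes and treat the first entry as primary. The helper `split_modes` below is hypothetical, and rejecting an empty list is an assumption rather than documented behaviour.

```python
from typing import Literal, get_args

ResidualMode = Literal[
    "hindcast_oos_per_init_date",
    "hindcast_oos_per_year",
    "hindcast_oos_fully_pooled",
    "in_sample_pooled",
]


def split_modes(conformalise):
    """Validate the configured list and return (primary, sidecar-only modes)."""
    if not conformalise:
        raise ValueError("conformalise must list at least one mode")
    valid = get_args(ResidualMode)
    unknown = [m for m in conformalise if m not in valid]
    if unknown:
        raise ValueError(f"unknown residual_mode(s): {unknown}")
    return conformalise[0], list(conformalise[1:])


primary, sidecars = split_modes([
    "hindcast_oos_per_init_date",
    "hindcast_oos_fully_pooled",
    "hindcast_oos_per_year",
])
print(primary)   # hindcast_oos_per_init_date
print(sidecars)  # ['hindcast_oos_fully_pooled', 'hindcast_oos_per_year']
```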

Live demo — corn hindcast sidecar schema¶

Schema:
  fold_year: Int32
  fold_init_md: category
  level: float64
  half_width_kg_ha: float64
  n_residuals: int64
  residual_mode: category
  method: category
  commodity: category

Total rows: 210 = 42 init_mds x 5 levels
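
For readers who want to mock a sidecar in tests, the schema can be reproduced with pandas dtypes. This is a sketch with placeholder values: only two of the 42 init_mds are used, the half-widths and residual counts are dummies, and `fold_year` is left as `<NA>` as it would be for the per_init_date mode.

```python
from itertools import product

import pandas as pd

levels = [0.5, 0.68, 0.8, 0.9, 0.95]
init_mds = ["01-01", "04-08"]  # two of the 42 init_mds, for brevity
grid = list(product(init_mds, levels))
n = len(grid)

sidecar = pd.DataFrame({
    "fold_year": pd.array([pd.NA] * n, dtype="Int32"),  # nullable integer
    "fold_init_md": pd.Categorical([md for md, _ in grid]),
    "level": [lvl for _, lvl in grid],
    "half_width_kg_ha": [float("nan")] * n,  # placeholder values
    "n_residuals": [0] * n,                  # placeholder values
    "residual_mode": pd.Categorical(["hindcast_oos_per_init_date"] * n),
    "method": pd.Categorical(["quantile"] * n),
    "commodity": pd.Categorical(["corn"] * n),
})

print(sidecar.dtypes)
print(len(sidecar))  # 10 here; 42 init_mds x 5 levels gives the 210 above
```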

95% half-width comparison across modes (kg/ha)¶

hindcast_oos_per_init_date[01-01]: 309
hindcast_oos_per_init_date[04-08]: 305
hindcast_oos_per_init_date[07-08]: 422
hindcast_oos_per_init_date[10-15]: 309
hindcast_oos_fully_pooled: 347
hindcast_oos_per_year[2020]: NaN
hindcast_oos_per_year[2021]: 83
hindcast_oos_per_year[2022]: 330
hindcast_oos_per_year[2023]: 317
hindcast_oos_per_year[2024]: 304

per_year fold 2020 → NaN (no prior CV residuals). The point of having all three on disk: re-run plots/dashboards against any mode without re-running postprocess.

Worked example — apply a calibration¶

cal = CalibrationResult.load(f"runs/{run}/conformal/hindcast_oos_per_init_date.parquet")
# Loaded: mode=hindcast_oos_per_init_date, n_residuals=210, levels=(0.5, 0.68, 0.8, 0.9, 0.95)

intervals = cal.predict_interval(11000.0, init_md="04-08")
#   50% CI: [10818, 11182]
#   80% CI: [10773, 11227]
#   95% CI: [10695, 11305]

# strict=False — unknown init_md returns NaN without raising:
lo, hi = cal.predict_interval(11000.0, init_md="02-29", strict=False)[0.95]
# -> (nan, nan)

Filename layout — `commodity_` prefix dropped¶

Before:                                       After:
runs/.../postprocessed/corn_national.parquet  →  runs/.../postprocessed/national.parquet
runs/.../conformal/corn_<mode>.parquet        →  runs/.../conformal/<mode>.parquet

calibration_path() and ExperimentResult.postprocessed_national_path lost their commodity argument.
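
The new layout means a sidecar path depends only on the run directory and the mode. The function below is a hypothetical re-implementation of that rule for illustration; `sketch_calibration_path` and the run directory `runs/demo` are made-up names.

```python
from pathlib import Path


def sketch_calibration_path(run_dir, mode):
    """Hypothetical path rule: <run_dir>/conformal/<mode>.parquet, no commodity prefix."""
    return Path(run_dir) / "conformal" / f"{mode}.parquet"


path = sketch_calibration_path("runs/demo", "hindcast_oos_per_year")
print(path.as_posix())  # runs/demo/conformal/hindcast_oos_per_year.parquet
```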

Test coverage¶

Two pivotal test modules added:

tests/unit/commodity_hindcast/test_apply_conformal_experiment.py — hypothesis-driven stress test: round-trip equality, bracketing at every level, monotonic widening with confidence level.
tests/unit/commodity_hindcast/test_apply_delivery_post_transforms.py — exercises all four flag combinations of enforce_ci_narrowing / drop_frozen_tail.

21 conformal-related tests pass. Full unit suite: 302 → 306 tests.
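
Two of the properties named above, bracketing and monotonic widening, can be illustrated without the hypothesis library. The sketch below is a plain seeded-random stand-in for the real test module, using the empirical quantile of absolute residuals; `check_properties` is a hypothetical helper.

```python
import numpy as np

LEVELS = (0.5, 0.68, 0.8, 0.9, 0.95)


def check_properties(n_trials=200, seed=0):
    """Return True if bracketing and monotonic widening hold on random inputs."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        n = int(rng.integers(2, 50))
        residuals = rng.normal(0.0, 300.0, size=n)
        sim = float(rng.uniform(5000.0, 15000.0))
        half_widths = [float(np.quantile(np.abs(residuals), lvl)) for lvl in LEVELS]
        # Bracketing: the point forecast lies inside every interval.
        if not all(sim - hw <= sim <= sim + hw for hw in half_widths):
            return False
        # Monotonic widening: a higher level never gives a narrower interval.
        if not all(a <= b for a, b in zip(half_widths, half_widths[1:])):
            return False
    return True


ok = check_properties()
print(ok)
```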

Public API surface (key symbols in `models/meta_models/conformalise.py`)¶

ConformalMethod = Literal["quantile", "split_conformal"]
ResidualMode = Literal[...]      # four values above
class CalibrationResult: ...
def apply_conformal(...): ...    # polymorphic (residuals array OR ExperimentResult)
class MapieConformalRegressor: ...

Orchestration helpers in stages/run_meta_models.py:

Function                                             Used by                      What it does
------------------------------------------------------------------------------------------------------------------------
`fit_and_save_calibration(result, ci_levels, mode)`  postprocess                  Single-mode fit + save
`fit_and_save_all_configured(result, ci_levels)`     postprocess                  Loop over all configured modes
`get_or_fit_calibration(result, ci_levels, mode)`    forecast / delivery / plots  Load sidecar; fit on demand if missing
`primary_calibration(result, ci_levels)`             forecast / delivery          First-listed mode's calibration
`calibration_path(run_dir, mode)`                    everywhere                   Canonical sidecar path

Files / lines touched¶

Additions  Deletions  File
----------------------------------------------------------------------------------------------------------
+695       -21        `market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py`
+103       -177       `market_insights_models/src/commodity_hindcast/stages/run_meta_models.py`
+205       -0         `tests/unit/commodity_hindcast/test_apply_conformal_experiment.py`
+29        -70        `market_insights_models/src/commodity_hindcast/models/meta_models/residuals.py`
+94        -0         `tests/unit/commodity_hindcast/test_mapie_conformalise.py`
+90        -0         `tests/unit/commodity_hindcast/test_apply_delivery_post_transforms.py`
+65        -21        `tests/unit/commodity_hindcast/test_postprocess.py`
+54        -30        `market_insights_models/src/commodity_hindcast/delivery/conversions.py`
+8         -57        `market_insights_models/src/commodity_hindcast/stages/run_forecast.py`
+38        -1         `market_insights_models/src/commodity_hindcast/config.py`

Cross-references¶

Related entity pages: CalibrationResult, PostprocessConfig, DeliveryConfig
Related concept pages: conformal calibration
Related code pages: meta_models, stages (run_meta_models)
Directly follows: PR-339 (the 9-phase restructure that created models/meta_models/)
Directly followed by: PR-372 (which made residual_mode mandatory on ForecastConfig)

Lessons captured¶

Conformal calibration sidecars are long-format parquets under run_dir/conformal/{mode}.parquet; commodity_ prefix is absent — the run_dir name already encodes the commodity.
fold_year is <NA> for per_init_date mode (broadcast over years) and fold_init_md is <NA> for per_year mode (broadcast over init_dates).
The strict=False API is the explicit contract for delivery row assembly — missing init_dates return NaN bounds without raising.
enforce_ci_narrowing in delivery exists to handle quantile noise on small calibration sets; that noise is not a bug but an expected property of small CV sets.
MapieConformalRegressor mirrors MAPIE's sklearn-style fit/conformalize/predict_interval API for use when training features (X, y) are still in memory.
The commodity_ prefix was removed from postprocessed/national.parquet and conformal/<mode>.parquet in this PR; consumers must not hard-code the old paths.
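
The fit/conformalize/predict_interval flow mentioned for MapieConformalRegressor can be sketched with NumPy alone. The class below is a hypothetical illustration of split conformal prediction; it is neither MAPIE nor the PR's wrapper. It assumes a least-squares base model and caps the finite-sample rank at n instead of returning an infinite interval.

```python
import numpy as np


class ToySplitConformal:
    """Hypothetical numpy-only sketch; not MAPIE and not MapieConformalRegressor."""

    def __init__(self, confidence_level=0.9):
        self.confidence_level = confidence_level

    def _design(self, X):
        # Prepend an intercept column to the feature matrix.
        return np.column_stack([np.ones(len(X)), X])

    def fit(self, X, y):
        # Ordinary least squares on the training split.
        self.coef_, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def conformalize(self, X_cal, y_cal):
        # Half-width from held-out absolute residuals, using the usual
        # finite-sample rank ceil((n + 1) * level), capped at n.
        scores = np.sort(np.abs(y_cal - self._design(X_cal) @ self.coef_))
        n = len(scores)
        rank = min(int(np.ceil((n + 1) * self.confidence_level)), n)
        self.half_width_ = float(scores[rank - 1])
        return self

    def predict_interval(self, X):
        pred = self._design(X) @ self.coef_
        bounds = np.column_stack([pred - self.half_width_, pred + self.half_width_])
        return pred, bounds


# Synthetic data: half for fitting, half for conformalizing.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0.0, 1.0, size=200)

model = ToySplitConformal(0.9).fit(X[:100], y[:100]).conformalize(X[100:], y[100:])
pred, bounds = model.predict_interval(X[:5])
print(bounds.shape)  # (5, 2)
```

Keeping the fit and conformalize splits disjoint is what separates this from the in_sample_pooled fallback, where the same data serves both purposes.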