PR #361 — feat(commodity_hindcast): USE MAPIE Conformalise and turn into proper MetaModel — multi-mode CalibrationResult with mode-keyed sidecars

At a glance

  • Author: ai-tommytf
  • Merged: 2026-05-02
  • Branch: tl/fix-cis
  • Net effect: Eight-commit refactor that overhauled how conformal prediction intervals are fitted, persisted, and applied. Introduced CalibrationResult (a frozen, persistable dataclass), four residual_mode values, mode-keyed sidecar parquets under run_dir/conformal/, and a polymorphic apply_conformal entry point. Removed ~150 lines of interleaved, duplicated calibration logic from the postprocess walk-forward loop.
  • Why this matters: Calibration is now a first-class persistent artefact; consumers can load any mode sidecar and call CalibrationResult.predict_interval(sim, init_md=...) without re-running postprocess. The commodity_ prefix was also dropped from run-level artefact filenames.

PR body (faithful extract)

What is conformal calibration?

residual r = obs − sim
half-width = quantile_α( |r| )
interval   = [sim − half-width, sim + half-width]

The only non-trivial choice is which residuals feed the quantile. That is where residual_mode comes from.
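
The three lines above can be sketched in a few lines of NumPy. This is a minimal illustration, not the PR's implementation: the function name conformal_half_width and the toy arrays are assumptions, and it uses a plain empirical quantile, whereas split-conformal variants usually add a finite-sample correction.

```python
import numpy as np


def conformal_half_width(obs, sim, level):
    """Half-width of a symmetric interval: the `level` quantile of |obs - sim|."""
    abs_resid = np.abs(np.asarray(obs, dtype=float) - np.asarray(sim, dtype=float))
    # Plain empirical quantile (assumption: no finite-sample correction applied).
    return float(np.quantile(abs_resid, level))


# Toy calibration data (hypothetical values, chosen only for illustration).
obs = np.array([10.0, 12.0, 9.0, 11.0, 13.0])
sim = np.array([11.0, 11.0, 11.0, 11.0, 11.0])

hw = conformal_half_width(obs, sim, 0.8)

# Apply the half-width symmetrically around a new point forecast.
new_sim = 11.0
interval = (new_sim - hw, new_sim + hw)
```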

The four residual_mode values

residual_mode                   data input                   output shape
----------------------------------------------------------------------------------
hindcast_oos_per_init_date      pool by init_md (MM-DD)      one CI per init_date
hindcast_oos_per_year           walk-forward bootstrap       one CI per fold-year
hindcast_oos_fully_pooled       every (year, init_date)      ONE CI broadcast
in_sample_pooled                production train_preds       ONE CI broadcast (fallback)

Visual partition of a 3-CV-year × 3-init-date hindcast:

                 init_dates →
   year ↓     Jan-01    Apr-10    Jul-03
   2022      r_22_Jan  r_22_Apr  r_22_Jul
   2023      r_23_Jan  r_23_Apr  r_23_Jul
   2024      r_24_Jan  r_24_Apr  r_24_Jul

   per_init_date     groups COLUMNS    →  CI_Jan, CI_Apr, CI_Jul
   per_year          groups ROWS       →  walk-forward: NaN, CI_2023, CI_2024
   fully_pooled      one big bag       →  one CI broadcast everywhere
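
The partition in the diagram can be reproduced with pandas. This is a sketch under stated assumptions: the column names (year, init_md, r) and residual values are hypothetical, and the walk-forward rule "fold y uses residuals from strictly earlier years" is inferred from the NaN in the first fold rather than taken from the PR's code.

```python
import pandas as pd

# Hypothetical residual table: one row per (year, init_md) cell of the diagram.
resid = pd.DataFrame({
    "year": [2022, 2022, 2022, 2023, 2023, 2023, 2024, 2024, 2024],
    "init_md": ["01-01", "04-10", "07-03"] * 3,
    "r": [100.0, -50.0, 20.0, -80.0, 60.0, -10.0, 40.0, -30.0, 5.0],
})
resid["abs_r"] = resid["r"].abs()
level = 0.95

# per_init_date: group the COLUMNS of the diagram, one half-width per init_md.
per_init_date = resid.groupby("init_md")["abs_r"].quantile(level)

# fully_pooled: one big bag, a single half-width broadcast everywhere.
fully_pooled = float(resid["abs_r"].quantile(level))

# per_year (walk-forward, assumed rule): fold y sees only years before y,
# so the first fold has nothing to calibrate on and gets NaN.
per_year = {}
for y in sorted(resid["year"].unique()):
    prior = resid.loc[resid["year"] < y, "abs_r"]
    per_year[int(y)] = float(prior.quantile(level)) if len(prior) else float("nan")
```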

CalibrationResult — fit once, apply anywhere

A frozen dataclass:

CalibrationResult.save(path)             # write long-format parquet
CalibrationResult.load(path)             # reconstruct from disk
.predict_interval(sim, *, fold_year=None, init_md=None, strict=True)  # apply

The strict flag controls whether unknown lookup keys raise KeyError or return NaN bounds.
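
To make the strict semantics concrete, here is a hypothetical stand-in for the per-init-date case only. MiniCalibration, its half_widths field, and the dict return shape are assumptions for illustration; the real CalibrationResult carries more state and also supports fold_year lookups. The half-widths are derived from the worked example in this PR body (11000 ± 182 / 227 / 305).

```python
import math
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple


@dataclass(frozen=True)
class MiniCalibration:
    """Hypothetical, simplified stand-in for CalibrationResult."""

    # half_widths[init_md][level] -> half-width in kg/ha
    half_widths: Mapping[str, Mapping[float, float]]

    def predict_interval(
        self, sim: float, *, init_md: Optional[str] = None, strict: bool = True
    ) -> Dict[float, Tuple[float, float]]:
        if init_md not in self.half_widths:
            if strict:
                # strict=True: unknown lookup keys raise.
                raise KeyError(f"no calibration for init_md={init_md!r}")
            # strict=False: return NaN bounds for every known level instead.
            levels = sorted({lv for hw in self.half_widths.values() for lv in hw})
            return {lv: (math.nan, math.nan) for lv in levels}
        return {lv: (sim - hw, sim + hw) for lv, hw in self.half_widths[init_md].items()}


cal = MiniCalibration({"04-08": {0.5: 182.0, 0.8: 227.0, 0.95: 305.0}})
known = cal.predict_interval(11000.0, init_md="04-08")
unknown = cal.predict_interval(11000.0, init_md="02-29", strict=False)
```

The non-strict path is what lets a delivery table be assembled row by row without wrapping every lookup in try/except.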

Pipeline data flow after the refactor

Postprocess phase (canonical writer):

for mode in config.postprocess.conformalise:
    fit_and_save_calibration(result, ci_levels, mode)
        ↓ writes
    run_dir/conformal/{mode}.parquet

primary = first listed mode
build_rows uses primary's predict_interval to populate
run_dir/postprocessed/national.parquet

Forecast / delivery / plot consumers:

cal = get_or_fit_calibration(result, ci_levels, mode)   # load if present, fit+save on demand
intervals = cal.predict_interval(sim, init_md=…, strict=False)

Configuration

postprocess:
    conformalise:
        - hindcast_oos_per_init_date    # primary — populates delivery CSVs
        - hindcast_oos_fully_pooled     # sidecar
        - hindcast_oos_per_year         # sidecar
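
A short sketch of how such a list might be validated and split into primary and sidecar-only modes. The helper split_modes is hypothetical; the four literal values are the residual_mode names listed earlier, and "first listed is primary" follows the data-flow description above.

```python
from typing import List, Literal, Tuple, get_args

ResidualMode = Literal[
    "hindcast_oos_per_init_date",
    "hindcast_oos_per_year",
    "hindcast_oos_fully_pooled",
    "in_sample_pooled",
]


def split_modes(conformalise: List[str]) -> Tuple[str, List[str]]:
    """Validate configured modes and return (primary, sidecar-only modes)."""
    if not conformalise:
        raise ValueError("at least one residual_mode is required")
    unknown = [m for m in conformalise if m not in get_args(ResidualMode)]
    if unknown:
        raise ValueError(f"unknown residual_mode(s): {unknown}")
    # The first listed mode is the primary one; the rest are written as sidecars only.
    return conformalise[0], conformalise[1:]


primary, sidecars = split_modes([
    "hindcast_oos_per_init_date",
    "hindcast_oos_fully_pooled",
    "hindcast_oos_per_year",
])
```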

Live demo — corn hindcast sidecar schema

Schema:
  fold_year: Int32
  fold_init_md: category
  level: float64
  half_width_kg_ha: float64
  n_residuals: int64
  residual_mode: category
  method: category
  commodity: category

Total rows: 210 = 42 init_mds × 5 levels
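
For readers who want to query a sidecar directly, the sketch below builds a two-row frame with a subset of the schema's columns and looks up one half-width. The rows are illustrative: the half-widths are derived from the worked example in this PR body, the frame is built in memory rather than read from parquet, and the remaining columns (n_residuals, method, commodity) are omitted.

```python
import pandas as pd

# Illustrative subset of a per_init_date sidecar (not read from a real run).
rows = pd.DataFrame({
    # fold_year is <NA> in per_init_date mode, hence the nullable Int32 dtype.
    "fold_year": pd.array([pd.NA, pd.NA], dtype="Int32"),
    "fold_init_md": pd.Categorical(["04-08", "04-08"]),
    "level": [0.8, 0.95],
    "half_width_kg_ha": [227.0, 305.0],
    "residual_mode": pd.Categorical(["hindcast_oos_per_init_date"] * 2),
})

# Long format: one row per (key, level), so a lookup is a boolean filter.
# The tolerance guards against float round-off in the level column.
mask = (rows["fold_init_md"] == "04-08") & ((rows["level"] - 0.95).abs() < 1e-9)
half_width = float(rows.loc[mask, "half_width_kg_ha"].iloc[0])
```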

95% half-width comparison across modes (kg/ha)

hindcast_oos_per_init_date[01-01]: 309
hindcast_oos_per_init_date[04-08]: 305
hindcast_oos_per_init_date[07-08]: 422
hindcast_oos_per_init_date[10-15]: 309
hindcast_oos_fully_pooled: 347
hindcast_oos_per_year[2020]: NaN
hindcast_oos_per_year[2021]: 83
hindcast_oos_per_year[2022]: 330
hindcast_oos_per_year[2023]: 317
hindcast_oos_per_year[2024]: 304

The per_year fold for 2020 is NaN because no prior CV residuals exist. Keeping all three modes on disk lets plots and dashboards be re-run against any mode without re-running postprocess.

Worked example — apply a calibration

cal = CalibrationResult.load(f"runs/{run}/conformal/hindcast_oos_per_init_date.parquet")
# Loaded: mode=hindcast_oos_per_init_date, n_residuals=210, levels=(0.5, 0.68, 0.8, 0.9, 0.95)

intervals = cal.predict_interval(11000.0, init_md="04-08")
#   50% CI: [10818, 11182]
#   80% CI: [10773, 11227]
#   95% CI: [10695, 11305]

# strict=False — unknown init_md returns NaN without raising:
lo, hi = cal.predict_interval(11000.0, init_md="02-29", strict=False)[0.95]
# -> (nan, nan)

Filename layout — commodity_ prefix dropped

Before:                                       After:
runs/.../postprocessed/corn_national.parquet  →  runs/.../postprocessed/national.parquet
runs/.../conformal/corn_<mode>.parquet        →  runs/.../conformal/<mode>.parquet

calibration_path() and ExperimentResult.postprocessed_national_path no longer take a commodity argument.

Test coverage

Two pivotal test modules added:

  • tests/unit/commodity_hindcast/test_apply_conformal_experiment.py — hypothesis-driven stress test: round-trip equality, bracketing at every level, monotonic widening with confidence level.
  • tests/unit/commodity_hindcast/test_apply_delivery_post_transforms.py — exercises all four flag combinations of enforce_ci_narrowing / drop_frozen_tail.
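
Two of the properties named above (bracketing and monotonic widening) can be checked with a few lines. This sketch uses a seeded NumPy loop instead of hypothesis strategies so it has no extra dependency; half_widths and check_properties are hypothetical helpers, not the PR's test code, and the levels are those shown in the worked example.

```python
import numpy as np

LEVELS = (0.5, 0.68, 0.8, 0.9, 0.95)


def half_widths(abs_resid, levels=LEVELS):
    """Empirical-quantile half-width for each confidence level."""
    return {lv: float(np.quantile(abs_resid, lv)) for lv in levels}


def check_properties(seed):
    rng = np.random.default_rng(seed)
    # Random calibration set of 1..49 absolute residuals and a random forecast.
    abs_resid = np.abs(rng.normal(0.0, 300.0, size=int(rng.integers(1, 50))))
    sim = float(rng.uniform(5000.0, 15000.0))
    widths = [half_widths(abs_resid)[lv] for lv in LEVELS]
    # Monotonic widening: a higher confidence level never gives a narrower interval.
    assert all(a <= b for a, b in zip(widths, widths[1:]))
    # Bracketing: the point forecast lies inside its own interval at every level.
    assert all(sim - w <= sim <= sim + w for w in widths)
    return True


results = [check_properties(seed) for seed in range(200)]
```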

21 conformal-related tests pass. Full unit suite: 302 → 306 tests.

Public API surface (key symbols in models/meta_models/conformalise.py)

ConformalMethod = Literal["quantile", "split_conformal"]
ResidualMode = Literal[...]      # four values above
class CalibrationResult: ...
def apply_conformal(...): ...    # polymorphic (residuals array OR ExperimentResult)
class MapieConformalRegressor: ...

Orchestration helpers in stages/run_meta_models.py:

Function                                             Used by                        What it does
--------------------------------------------------------------------------------------------------------------------------
fit_and_save_calibration(result, ci_levels, mode)    postprocess                    Single-mode fit + save
fit_and_save_all_configured(result, ci_levels)       postprocess                    Loop over all configured modes
get_or_fit_calibration(result, ci_levels, mode)      forecast / delivery / plots    Load sidecar; fit on demand if missing
primary_calibration(result, ci_levels)               forecast / delivery            First-listed mode's calibration
calibration_path(run_dir, mode)                      everywhere                     Canonical sidecar path

Files / lines touched

Additions   Deletions   File
----------------------------------------------------------------------------------------------------------------
+695        -21         market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py
+103        -177        market_insights_models/src/commodity_hindcast/stages/run_meta_models.py
+205        -0          tests/unit/commodity_hindcast/test_apply_conformal_experiment.py
+29         -70         market_insights_models/src/commodity_hindcast/models/meta_models/residuals.py
+94         -0          tests/unit/commodity_hindcast/test_mapie_conformalise.py
+90         -0          tests/unit/commodity_hindcast/test_apply_delivery_post_transforms.py
+65         -21         tests/unit/commodity_hindcast/test_postprocess.py
+54         -30         market_insights_models/src/commodity_hindcast/delivery/conversions.py
+8          -57         market_insights_models/src/commodity_hindcast/stages/run_forecast.py
+38         -1          market_insights_models/src/commodity_hindcast/config.py

Lessons captured

  • Conformal calibration sidecars are long-format parquets under run_dir/conformal/{mode}.parquet; commodity_ prefix is absent — the run_dir name already encodes the commodity.
  • fold_year is <NA> for per_init_date mode (broadcast over years) and fold_init_md is <NA> for per_year mode (broadcast over init_dates).
  • The strict=False API is the explicit contract for delivery row assembly — missing init_dates return NaN bounds without raising.
  • enforce_ci_narrowing in delivery exists to handle quantile noise on small calibration sets; that noise is an expected property of small CV sets, not a bug.
  • MapieConformalRegressor mirrors the sklearn-style fit/conformalize/predict_interval API for use when training features (X, y) are still in memory.
  • The commodity_ prefix was removed from postprocessed/national.parquet and conformal/<mode>.parquet in this PR; consumers must not hard-code the old paths.