CalibrationResult¶

Definition¶

CalibrationResult is a frozen, persistable dataclass (@dataclass(frozen=True)) that holds conformal prediction half-widths derived from an experiment's walk-forward OOS residuals. It is a first-class persistent artefact: it serialises to and deserialises from a long-format parquet file via save and load methods. It exposes predict_interval so forecast and delivery consumers can apply calibrated intervals without re-running the POSTPROCESS stage.

Kind: Frozen dataclass. Experiment-scoped artefact (one per (residual_mode, run_dir) pair). Written once, normally during POSTPROCESS (or on demand by get_or_fit_calibration when the sidecar is absent); read by FORECAST, DELIVER, and diagnostics.

Source of truth: market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:106.

Correction to AGGREGATES.md¶

domain_model/AGGREGATES.md:214 states:

"CalibrationResult — not a persisted class in the current codebase; conformal half-widths are computed by primary_calibration() and written into postprocessed/national.parquet and sidecar parquets under run_dir/conformal/. It is a transient value returned from stages/run_meta_models.py."

This is incorrect. CalibrationResult has first-class persistence:

save method at conformalise.py:215 — writes a long-format parquet via write_dataframe.
load classmethod at conformalise.py:225 — reconstructs the full object from disk.

The parquet sidecars at run_dir/conformal/{mode}.parquet are the serialised form of CalibrationResult. The P7 lint pass will fix AGGREGATES.md.

Key attributes¶

Field	Type	Notes
`residual_mode`	`ResidualMode`	Which recipe produced this calibration (see Four Modes below)
`method`	`ConformalMethod`	`"quantile"` or `"split_conformal"`
`experiment_key`	`str`	Links back to the experiment (e.g. `"corn_usa"`)
`levels`	`tuple[float, ...]`	Confidence levels fitted (e.g. `(0.5, 0.68, 0.8, 0.9, 0.95)`)
`n_residuals`	`int`	Total residuals consumed
`per_init_md`	`dict[InitMD, dict[float, float]] | None`	Half-widths keyed by `"MM-DD"`; set in `hindcast_oos_per_init_date` mode only
`per_year`	`dict[int, dict[float, float]] | None`	Half-widths keyed by test year (int); set in `hindcast_oos_per_year` mode only
`pooled`	`dict[float, float] | None`	Single half-width set; set in the two pooled modes
`per_year_fallback`	`PerYearFallback`	`"mean"` or `"max"` — reducer for unknown years in `per_year` mode; persisted in all modes for round-trip consistency

Exactly one of per_init_md, per_year, and pooled is non-None; residual_mode determines which.
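The "exactly one non-None" rule can be illustrated with a small frozen dataclass. This is a minimal sketch, not the real class: CalibrationSketch is a hypothetical stand-in carrying only a subset of the fields, and the __post_init__ check is an assumption, since the source states the invariant but not how (or whether) it is enforced at construction time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationSketch:
    """Hypothetical stand-in for CalibrationResult (subset of fields only)."""

    residual_mode: str
    levels: tuple[float, ...]
    per_init_md: dict[str, dict[float, float]] | None = None
    per_year: dict[int, dict[float, float]] | None = None
    pooled: dict[float, float] | None = None

    def __post_init__(self) -> None:
        # Assumed check: the invariant is documented, its enforcement is not.
        populated = [
            name
            for name in ("per_init_md", "per_year", "pooled")
            if getattr(self, name) is not None
        ]
        if len(populated) != 1:
            raise ValueError(f"expected exactly one populated field, got {populated}")


# A pooled-mode instance: only `pooled` is set.
cal = CalibrationSketch(
    residual_mode="hindcast_oos_fully_pooled",
    levels=(0.8, 0.9),
    pooled={0.8: 3.1, 0.9: 4.6},
)
```

Because the dataclass is frozen, assigning to cal.pooled after construction raises dataclasses.FrozenInstanceError, which is what makes the object safe to share between consumers.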

Four residual modes (introduced in PR #361)¶

ResidualMode is a Literal defined in models/meta_models/types.py:16. Each mode answers a different question about which residuals to pool:

`residual_mode`	Data pooled	Lookup key	Output field
`hindcast_oos_per_init_date`	OOS residuals grouped by `"MM-DD"` init-date label	`init_md`	`per_init_md`
`hindcast_oos_per_year`	Walk-forward bootstrap: fold k calibrated from folds < k	`fold_year`	`per_year`
`hindcast_oos_fully_pooled`	All `(year, init_date)` OOS residuals into one bag	ignored	`pooled`
`in_sample_pooled`	Production fold `train_preds.parquet` residuals	ignored	`pooled` (biased narrow — fallback only)

hindcast_oos_per_init_date is the default (listed first in postprocess.conformalise for all production commodity configs). It is the most calibration-honest mode because it captures the season-of-year uncertainty gradient: early-season forecasts carry wider intervals than late-season ones.

in_sample_pooled issues a logger.warning explicitly labelling the resulting intervals as biased narrow (conformalise.py:546).
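To make the walk-forward recipe behind hindcast_oos_per_year concrete, the sketch below calibrates each fold year from the residuals of strictly earlier years. It is an illustration under assumptions: the function names are hypothetical, and the half-width rule (the ceil((n+1)·level)-th smallest absolute residual, clamped to the largest one) is a common split-conformal convention rather than the rule confirmed to live in conformalise.py.

```python
import math


def half_width(abs_residuals, level):
    """Assumed rule: the ceil((n + 1) * level)-th smallest |residual|."""
    n = len(abs_residuals)
    if n == 0:
        return math.nan  # nothing to calibrate from
    rank = min(math.ceil((n + 1) * level), n)  # clamp to the largest residual
    return sorted(abs_residuals)[rank - 1]


def walk_forward_half_widths(residuals_by_year, levels):
    """Calibrate each year from the residuals of strictly earlier years."""
    result = {}
    for year in sorted(residuals_by_year):
        prior = [
            abs(r)
            for other_year, residuals in residuals_by_year.items()
            if other_year < year
            for r in residuals
        ]
        result[year] = {level: half_width(prior, level) for level in levels}
    return result


residuals = {2020: [1.0, -2.0], 2021: [0.5, -3.0], 2022: [4.0]}
per_year = walk_forward_half_widths(residuals, levels=(0.5,))
# 2020 has no earlier folds, so its half-width is NaN.
```

The earliest year has no prior residuals and therefore gets NaN half-widths, which is the behaviour consumers of per_year mode need to be aware of.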

Persistence — save and load¶

`save` (`conformalise.py:215`)¶

def save(self, path: Path | AnyPath | str) -> None:
    target = AnyPath(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    write_dataframe(self.to_frame(), str(target))

Serialises to long-format parquet via to_frame() (conformalise.py:136), with one row per (key, level) pair. The residual_mode, method, experiment_key, fold_init_md, and per_year_fallback columns use categorical dtype; fold_year uses nullable Int32. AnyPath lets save accept both local paths and S3 URIs. Raises ValueError, with a diagnostic message, if serialisation would produce zero rows.
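The long format described above (one row per (key, level) pair) can be sketched without pandas as a list of dictionaries. This is only an illustration: to_long_rows is a hypothetical helper, the half_width column name is an assumption, and the real to_frame() also carries method, experiment_key, and per_year_fallback columns and sets dtypes.

```python
from __future__ import annotations


def to_long_rows(
    residual_mode: str,
    *,
    per_init_md: dict[str, dict[float, float]] | None = None,
    per_year: dict[int, dict[float, float]] | None = None,
    pooled: dict[float, float] | None = None,
) -> list[dict]:
    """Flatten whichever field is populated into one row per (key, level)."""
    keyed = []  # (fold_init_md, fold_year, {level: half_width}) triples
    if per_init_md is not None:
        keyed += [(md, None, widths) for md, widths in per_init_md.items()]
    elif per_year is not None:
        keyed += [(None, year, widths) for year, widths in per_year.items()]
    elif pooled is not None:
        keyed.append((None, None, pooled))

    rows = [
        {
            "residual_mode": residual_mode,
            "fold_init_md": md,
            "fold_year": year,
            "level": level,
            "half_width": value,  # hypothetical column name
        }
        for md, year, widths in keyed
        for level, value in widths.items()
    ]
    if not rows:
        # Mirrors the documented guard against empty serialisation.
        raise ValueError(f"{residual_mode}: nothing to serialise (zero rows)")
    return rows


rows = to_long_rows(
    "hindcast_oos_per_year",
    per_year={2021: {0.8: 2.5, 0.9: 3.5}, 2022: {0.8: 2.0, 0.9: 3.0}},
)
```

Two years and two levels give four rows; the unused key column (fold_init_md here) stays empty, which is why a nullable dtype is needed for fold_year in the other modes.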

`load` (`conformalise.py:225`)¶

@classmethod
def load(cls, path: Path | AnyPath | str) -> CalibrationResult:
    df = read_dataframe(str(AnyPath(path)))
    # reconstructs per_init_md / per_year / pooled by branching on residual_mode
    ...
    return cls(...)

Reads the parquet, reconstructs the appropriate dict field by branching on residual_mode, reconstitutes levels from the unique level column, then calls cls(...). Full round-trip consistency is preserved.
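The reconstruction step can be sketched in the same spirit: branch on residual_mode, regroup the rows into the matching dictionary, and recover levels from the unique level values. The helper name from_long_rows, the plain-dict return value, and the half_width column name are assumptions for illustration; the real load reads a parquet file and returns a CalibrationResult.

```python
def from_long_rows(rows):
    """Rebuild the populated field from long-format rows (illustrative only)."""
    mode = rows[0]["residual_mode"]
    levels = tuple(sorted({row["level"] for row in rows}))
    fields = {"per_init_md": None, "per_year": None, "pooled": None}

    if mode == "hindcast_oos_per_init_date":
        grouped = {}
        for row in rows:
            grouped.setdefault(row["fold_init_md"], {})[row["level"]] = row["half_width"]
        fields["per_init_md"] = grouped
    elif mode == "hindcast_oos_per_year":
        grouped = {}
        for row in rows:
            grouped.setdefault(int(row["fold_year"]), {})[row["level"]] = row["half_width"]
        fields["per_year"] = grouped
    else:
        # Both pooled modes share a single half-width set.
        fields["pooled"] = {row["level"]: row["half_width"] for row in rows}

    return {"residual_mode": mode, "levels": levels, **fields}


sample_rows = [
    {"residual_mode": "hindcast_oos_per_init_date", "fold_init_md": "06-01",
     "fold_year": None, "level": 0.8, "half_width": 5.0},
    {"residual_mode": "hindcast_oos_per_init_date", "fold_init_md": "06-01",
     "fold_year": None, "level": 0.9, "half_width": 7.0},
    {"residual_mode": "hindcast_oos_per_init_date", "fold_init_md": "08-01",
     "fold_year": None, "level": 0.8, "half_width": 3.0},
    {"residual_mode": "hindcast_oos_per_init_date", "fold_init_md": "08-01",
     "fold_year": None, "level": 0.9, "half_width": 4.0},
]
restored = from_long_rows(sample_rows)
```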

On-disk path¶

{run_dir}/conformal/{residual_mode}.parquet

For example: runs/20260505_143022_corn_usa/conformal/hindcast_oos_per_init_date.parquet. The path is constructed by calibration_path(run_dir, residual_mode) in run_meta_models.py:60. The commodity_ prefix that preceded this layout was removed in PR #361.
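The layout is simple enough to express in a few lines. The function below is a hypothetical re-creation of the path rule only; the real calibration_path may return an AnyPath rather than a pure POSIX path, and its exact signature is not shown in this document.

```python
from pathlib import PurePosixPath


def calibration_path_sketch(run_dir: str, residual_mode: str) -> PurePosixPath:
    """Build {run_dir}/conformal/{residual_mode}.parquet (illustrative only)."""
    return PurePosixPath(run_dir) / "conformal" / f"{residual_mode}.parquet"


path = calibration_path_sketch(
    "runs/20260505_143022_corn_usa", "hindcast_oos_per_init_date"
)
```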

`predict_interval` (`conformalise.py:360`)¶

def predict_interval(
    self,
    sim_yield: float | np.ndarray,
    *,
    fold_year: int | None = None,
    init_md: InitMD | None = None,
) -> dict[float, tuple[np.ndarray, np.ndarray]]:

Returns {level: (lower_array, upper_array)}. Lookup dispatches on which field is non-None:

per_init_md — circular calendar interpolation by day-of-year; exact hit short-circuits; non-exact emits a logger.warning (conformalise.py:279).
per_year — exact-hit or reduce via per_year_fallback (nanmean / nanmax) across all calibrated years on a miss (conformalise.py:319).
pooled — broadcasts single half-width set; ignores both kwargs (conformalise.py:395).
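The three lookup branches can be sketched with scalar inputs and the standard library. Everything here is an assumption made for illustration: the helper names are hypothetical, "MM-DD" labels are placed on a leap-year calendar of 366 days, interpolation between calibrated dates is linear, and the band is taken to be symmetric (sim_yield ± half-width). The real method works on NumPy arrays and logs a warning on non-exact init-date hits.

```python
import math
from datetime import date
from statistics import fmean

PERIOD = 366  # assumption: "MM-DD" labels live on a leap-year calendar


def day_of_year(md: str) -> int:
    month, day = (int(part) for part in md.split("-"))
    return date(2000, month, day).timetuple().tm_yday


def init_md_half_widths(per_init_md, init_md):
    """Exact hit short-circuits; otherwise interpolate linearly around the year."""
    if init_md in per_init_md:
        return dict(per_init_md[init_md])
    target = day_of_year(init_md)
    knots = sorted((day_of_year(md), md) for md in per_init_md)
    # Neighbours on the circle: wrap to the last/first knot across New Year.
    before = max((k for k in knots if k[0] < target), default=knots[-1])
    after = min((k for k in knots if k[0] > target), default=knots[0])
    gap = (after[0] - before[0]) % PERIOD or PERIOD
    weight = ((target - before[0]) % PERIOD) / gap
    lo, hi = per_init_md[before[1]], per_init_md[after[1]]
    return {lvl: (1 - weight) * lo[lvl] + weight * hi[lvl] for lvl in lo}


def year_half_widths(per_year, fold_year, fallback="mean"):
    """Exact hit, else reduce each level across calibrated years, skipping NaN."""
    if fold_year in per_year:
        return dict(per_year[fold_year])
    reducer = max if fallback == "max" else fmean
    levels = next(iter(per_year.values()))
    reduced = {}
    for lvl in levels:
        finite = [w[lvl] for w in per_year.values() if not math.isnan(w[lvl])]
        reduced[lvl] = reducer(finite) if finite else math.nan
    return reduced


def interval(sim_yield, half_widths):
    """Assumed symmetric band: (sim - hw, sim + hw) for each level."""
    return {lvl: (sim_yield - hw, sim_yield + hw) for lvl, hw in half_widths.items()}


widths = init_md_half_widths({"06-01": {0.9: 10.0}, "06-11": {0.9: 6.0}}, "06-06")
bounds = interval(100.0, widths)
```

The wrap-around matters for calibrated dates that straddle New Year: a query for "01-01" between knots at "12-27" and "01-06" is interpolated across the year boundary instead of being extrapolated from one side. In pooled mode no lookup is needed, so interval can be called directly on the single half-width set.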

POSTPROCESS orchestration¶

stages/run_meta_models.py exposes helpers that manage CalibrationResult lifecycle:

Function	Line	Purpose
`calibration_path(run_dir, mode)`	`run_meta_models.py:60`	Canonical sidecar path
`fit_and_save_calibration(result, ci_levels, mode)`	`run_meta_models.py:70`	Fit one mode via `apply_conformal`; call `cal.save(path)`; return
`fit_and_save_all_configured(result, ci_levels)`	`run_meta_models.py:85`	Loop over all modes in `config.postprocess.conformalise`; first = primary
`get_or_fit_calibration(result, ci_levels, mode)`	`run_meta_models.py:100`	Load sidecar if present; fit+save on demand if absent
`primary_calibration(result, ci_levels)`	`run_meta_models.py:119`	Returns calibration for `config.forecast.residual_mode`

Separation of axes: postprocess.conformalise (list) controls which sidecars are written during hindcast. forecast.residual_mode (single, mandatory since PR #372) controls which mode is applied at forecast runtime. A forecast may request a mode absent from postprocess.conformalise — it is fitted on demand by get_or_fit_calibration.
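The load-or-fit pattern and the separation of axes can be sketched as follows. This is not the real helper: get_or_fit and materialise_configured are hypothetical names, the fit callable stands in for apply_conformal, and JSON replaces parquet so the example needs only the standard library.

```python
import json
from pathlib import Path


def get_or_fit(run_dir, mode, fit):
    """Load the sidecar for `mode` if it exists; otherwise fit, save, and return."""
    path = Path(run_dir) / "conformal" / f"{mode}.json"  # JSON stands in for parquet
    if path.exists():
        return json.loads(path.read_text())
    calibration = fit(mode)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(calibration))
    return calibration


def materialise_configured(run_dir, configured_modes, fit):
    """Hindcast side: write one sidecar per mode listed in the config."""
    for mode in configured_modes:
        get_or_fit(run_dir, mode, fit)
```

With this shape, a forecast that asks for a mode already listed in the hindcast config reuses the sidecar without refitting, while a mode that was never materialised is fitted once on first request and then cached alongside the others.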

Lifecycle¶

Created: By apply_conformal(experiment, ci_levels, residual_mode=...) during POSTPROCESS. One CalibrationResult per configured mode. Immediately saved to run_dir/conformal/{mode}.parquet by fit_and_save_calibration.

Consumed:
- FORECAST: primary_calibration(result, ci_levels) → cal.predict_interval(sim, init_md=...) → populates CI columns in postprocessed/national.parquet.
- DELIVER: get_or_fit_calibration(...) → same call chain.
- DIAGNOSTICS: loaded directly for mode comparisons.
- Dashboard: loaded on demand by the Streamlit app.

Destroyed: Never; sidecars accumulate and are re-used across forecast invocations.

Relationships¶

Relationship	Entity	Notes
Scoped to	`RunDir`	One per `(residual_mode, run_dir)`; path encodes both
Produced by	`ExperimentResult`	Via `apply_conformal(experiment, ...)`
Consumed by	`ForecastSlice`	`primary_calibration` provides the slice's CI half-widths
Consumed by	`HindcastDelivery`	`build_rows` applies intervals to national frame rows
Configured by	`PostprocessConfig.conformalise`	List of modes to materialise
Mode selected by	`ForecastConfig.residual_mode`	Mandatory field since PR #372

Concepts and pipelines (forward refs to P5)¶

Concept: Conformal calibration — the statistical framework and four mode recipes
Concept: Residual mode selection — postprocess.conformalise vs forecast.residual_mode separation of axes
Pipeline: Hindcast pipeline — POSTPROCESS stage produces all configured sidecars
Pipeline: Forecast pipeline — primary_calibration is applied during _postprocess_forecast

PRs and commits¶

PR / commit	Relevance
PR #361	Introduced `CalibrationResult` as a frozen, persistable dataclass; added `save`/`load` methods; defined four `residual_mode` values; added mode-keyed sidecars at `run_dir/conformal/{mode}.parquet`; dropped `commodity_` prefix from sidecar filenames
PR #372	Made `forecast.residual_mode` mandatory on `ForecastConfig`; added defensive `ValueError` guard in `CalibrationResult.to_frame()` for empty-row serialisation; extracted `ResidualMode` to `types.py` to avoid circular import

Open questions¶

AGGREGATES.md:214 erroneously describes CalibrationResult as transient. The P7 lint pass will correct this claim.
per_year mode has NaN half-widths for the earliest CV fold (fold k=0 has no prior residuals). Consumers that query the earliest fold year receive NaN bounds — the strict=False API returns (nan, nan) silently. Whether this should be surfaced as a warning in delivery is undecided.
MapieConformalRegressor in the same module provides a feature-conditioned calibration pathway; it is unused in the main hindcast/forecast pipeline and its relationship to CalibrationResult is not documented.