
CalibrationResult

Definition

CalibrationResult is a frozen, persistable dataclass (@dataclass(frozen=True)) that holds conformal prediction half-widths derived from an experiment's walk-forward OOS residuals. It is a first-class persistent artefact: it serialises to and deserialises from a long-format parquet file via save and load methods. It exposes predict_interval so forecast and delivery consumers can apply calibrated intervals without re-running the POSTPROCESS stage.

Kind: Frozen dataclass. Experiment-scoped artefact (one per (residual_mode, run_dir) pair). Written once during POSTPROCESS; read by FORECAST, DELIVER, and diagnostics.

Source of truth: market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:106.

Correction to AGGREGATES.md

domain_model/AGGREGATES.md:214 states:

"CalibrationResult — not a persisted class in the current codebase; conformal half-widths are computed by primary_calibration() and written into postprocessed/national.parquet and sidecar parquets under run_dir/conformal/. It is a transient value returned from stages/run_meta_models.py."

This is incorrect. CalibrationResult has first-class persistence:

  • save method at conformalise.py:215 — writes a long-format parquet via write_dataframe.
  • load classmethod at conformalise.py:225 — reconstructs the full object from disk.

The parquet sidecars at run_dir/conformal/{mode}.parquet are the serialised form of CalibrationResult. The P7 lint pass will fix AGGREGATES.md.

Key attributes

| Field | Type | Notes |
| --- | --- | --- |
| residual_mode | ResidualMode | Which recipe produced this calibration (see Four Modes below) |
| method | ConformalMethod | "quantile" or "split_conformal" |
| experiment_key | str | Links back to the experiment (e.g. "corn_usa") |
| levels | tuple[float, ...] | Confidence levels fitted (e.g. (0.5, 0.68, 0.8, 0.9, 0.95)) |
| n_residuals | int | Total residuals consumed |
| per_init_md | dict[InitMD, dict[float, float]] \| None | Half-widths keyed by "MM-DD"; set in hindcast_oos_per_init_date mode only |
| per_year | dict[int, dict[float, float]] \| None | Half-widths keyed by test year (int); set in hindcast_oos_per_year mode only |
| pooled | dict[float, float] \| None | Single half-width set; set in the two pooled modes |
| per_year_fallback | PerYearFallback | "mean" or "max"; reducer for unknown years in per_year mode; persisted in all modes for round-trip consistency |

Exactly one of per_init_md, per_year, pooled is non-None.
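The exactly-one-populated rule can be illustrated with a minimal sketch. CalibrationSketch is a hypothetical, trimmed-down stand-in, not the real class; whether the real CalibrationResult enforces the rule in __post_init__ (or at all) is an assumption, and the half-width numbers are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for CalibrationResult with only a few of its fields.
@dataclass(frozen=True)
class CalibrationSketch:
    residual_mode: str
    levels: tuple[float, ...]
    per_init_md: Optional[dict[str, dict[float, float]]] = None
    per_year: Optional[dict[int, dict[float, float]]] = None
    pooled: Optional[dict[float, float]] = None

    def __post_init__(self) -> None:
        # Collect the names of the half-width containers that are populated.
        populated = [
            name
            for name in ("per_init_md", "per_year", "pooled")
            if getattr(self, name) is not None
        ]
        if len(populated) != 1:
            raise ValueError(
                f"expected exactly one populated half-width field, got {populated}"
            )


# A pooled-mode instance: only `pooled` is set (illustrative values).
ok = CalibrationSketch(
    residual_mode="hindcast_oos_fully_pooled",
    levels=(0.8, 0.9),
    pooled={0.8: 0.41, 0.9: 0.63},
)
```

Because the dataclass is frozen, the check only needs to run once at construction; no later assignment can break the invariant.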

Four residual modes (introduced in PR #361)

ResidualMode is a Literal defined in models/meta_models/types.py:16. Each mode answers a different question about which residuals to pool:

| residual_mode | Data pooled | Lookup key | Output field |
| --- | --- | --- | --- |
| hindcast_oos_per_init_date | OOS residuals grouped by "MM-DD" init-date label | init_md | per_init_md |
| hindcast_oos_per_year | Walk-forward bootstrap: fold k calibrated from folds < k | fold_year | per_year |
| hindcast_oos_fully_pooled | All (year, init_date) OOS residuals into one bag | ignored | pooled |
| in_sample_pooled | Production fold train_preds.parquet residuals | ignored | pooled (biased narrow; fallback only) |

hindcast_oos_per_init_date is the default (listed first in postprocess.conformalise for all production commodity configs). It is the most calibration-honest mode because it captures the season-of-year uncertainty gradient: early-season forecasts carry wider intervals than late-season ones.

in_sample_pooled issues a logger.warning explicitly labelling the resulting intervals as biased narrow (conformalise.py:546).
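The walk-forward rule behind hindcast_oos_per_year (fold k calibrated only from folds before k) can be sketched as follows. The function names are hypothetical, the residual values are made up, and the nearest-rank quantile is an assumption chosen for simplicity; the real module may use a different estimator for its "quantile" and "split_conformal" methods.

```python
from __future__ import annotations

import math


def half_width(abs_residuals: list[float], level: float) -> float:
    """Empirical `level` quantile of absolute residuals (nearest-rank rule)."""
    if not abs_residuals:
        return math.nan  # nothing to calibrate from
    ordered = sorted(abs_residuals)
    rank = max(1, math.ceil(level * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]


def walk_forward_half_widths(
    residuals_by_year: dict[int, list[float]],
    levels: tuple[float, ...],
) -> dict[int, dict[float, float]]:
    """Calibrate each fold year using only residuals from earlier years."""
    out: dict[int, dict[float, float]] = {}
    prior: list[float] = []
    for year in sorted(residuals_by_year):
        out[year] = {level: half_width(prior, level) for level in levels}
        # Only after calibrating `year` do its residuals join the pool.
        prior = prior + [abs(r) for r in residuals_by_year[year]]
    return out


widths = walk_forward_half_widths(
    {2019: [0.2, -0.4], 2020: [0.1, -0.3], 2021: [0.5]},
    levels=(0.5, 0.9),
)
# 2019 has no earlier folds, so both of its half-widths are NaN.
# 2020 -> {0.5: 0.2, 0.9: 0.4}; 2021 -> {0.5: 0.2, 0.9: 0.4}
```

The earliest fold necessarily has an empty residual pool, which is where the NaN half-widths noted under Open questions come from.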

Persistence — save and load

save (conformalise.py:215)

def save(self, path: Path | AnyPath | str) -> None:
    target = AnyPath(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    write_dataframe(self.to_frame(), str(target))

Serialises to long-format parquet via to_frame() (conformalise.py:136), with one row per (key, level) pair. The residual_mode, method, experiment_key, fold_init_md, and per_year_fallback columns use categorical dtype; fold_year uses nullable Int32. The path may be local or an S3 URI (via AnyPath). Raises ValueError, with a diagnostic message, if serialisation would produce zero rows.
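To make the long format concrete, here is a minimal sketch that flattens a per_init_md mapping into one row per (key, level) pair. It builds plain dicts rather than a DataFrame; to_rows is a hypothetical helper, the half_width column name is an assumption (the source does not name the value column), and the numbers are illustrative.

```python
from __future__ import annotations


def to_rows(
    residual_mode: str,
    per_init_md: dict[str, dict[float, float]],
) -> list[dict[str, object]]:
    """Flatten {init_md: {level: half_width}} into long-format rows."""
    rows: list[dict[str, object]] = []
    for init_md, by_level in per_init_md.items():
        for level, width in by_level.items():
            rows.append(
                {
                    "residual_mode": residual_mode,
                    "fold_init_md": init_md,
                    "fold_year": None,  # unused in per-init-date mode
                    "level": level,
                    "half_width": width,  # hypothetical column name
                }
            )
    if not rows:
        # Mirrors the documented guard against empty serialisation.
        raise ValueError(f"no rows to serialise for mode {residual_mode!r}")
    return rows


rows = to_rows(
    "hindcast_oos_per_init_date",
    {"05-01": {0.8: 1.2, 0.9: 1.6}, "07-01": {0.8: 0.7, 0.9: 0.9}},
)
# Two init dates x two levels gives four rows.
```

Rows of this shape could be handed to a DataFrame constructor and then cast to the categorical and nullable integer dtypes described above.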

load (conformalise.py:225)

@classmethod
def load(cls, path: Path | AnyPath | str) -> CalibrationResult:
    df = read_dataframe(str(AnyPath(path)))
    # reconstructs per_init_md / per_year / pooled by branching on residual_mode
    ...
    return cls(...)

Reads the parquet, reconstructs the appropriate dict field by branching on residual_mode, reconstitutes levels from the unique level column, then calls cls(...). Full round-trip consistency is preserved.
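The branching reconstruction can be sketched over plain row dicts. This is an illustration under assumptions, not the real load body: from_rows is hypothetical, the half_width column name is assumed, and the sample values are made up.

```python
from __future__ import annotations


def from_rows(rows: list[dict[str, object]]) -> dict[str, object]:
    """Rebuild the half-width containers by branching on residual_mode."""
    modes = {row["residual_mode"] for row in rows}
    if len(modes) != 1:
        raise ValueError(f"expected a single residual_mode, got {sorted(modes)}")
    mode = modes.pop()
    # Levels are recovered from the distinct values of the level column.
    levels = tuple(sorted({row["level"] for row in rows}))

    per_init_md = per_year = pooled = None
    if mode == "hindcast_oos_per_init_date":
        per_init_md = {}
        for row in rows:
            per_init_md.setdefault(row["fold_init_md"], {})[row["level"]] = row["half_width"]
    elif mode == "hindcast_oos_per_year":
        per_year = {}
        for row in rows:
            per_year.setdefault(int(row["fold_year"]), {})[row["level"]] = row["half_width"]
    else:  # the two pooled modes share one flat mapping
        pooled = {row["level"]: row["half_width"] for row in rows}

    return {
        "residual_mode": mode,
        "levels": levels,
        "per_init_md": per_init_md,
        "per_year": per_year,
        "pooled": pooled,
    }


sample = {2020: {0.8: 0.3, 0.9: 0.4}, 2021: {0.8: 0.35, 0.9: 0.5}}
rows = [
    {"residual_mode": "hindcast_oos_per_year", "fold_init_md": None,
     "fold_year": year, "level": level, "half_width": width}
    for year, by_level in sample.items()
    for level, width in by_level.items()
]
restored = from_rows(rows)
```

Here restored["per_year"] equals the original nested mapping, which is the round-trip property the section describes.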

On-disk path

{run_dir}/conformal/{residual_mode}.parquet

e.g. runs/20260505_143022_corn_usa/conformal/hindcast_oos_per_init_date.parquet. Path constructed by calibration_path(run_dir, residual_mode) in run_meta_models.py:60. The commodity_ prefix that preceded this layout was removed in PR #361.
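A minimal sketch of the path layout, assuming plain path joining. calibration_path_sketch is a hypothetical stand-in; the real calibration_path presumably works with AnyPath so that S3 URIs are handled too.

```python
from pathlib import PurePosixPath


def calibration_path_sketch(run_dir: str, residual_mode: str) -> PurePosixPath:
    """Build {run_dir}/conformal/{residual_mode}.parquet (layout from the text)."""
    return PurePosixPath(run_dir) / "conformal" / f"{residual_mode}.parquet"


path = calibration_path_sketch(
    "runs/20260505_143022_corn_usa", "hindcast_oos_per_init_date"
)
# str(path) == "runs/20260505_143022_corn_usa/conformal/hindcast_oos_per_init_date.parquet"
```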

predict_interval (conformalise.py:360)

def predict_interval(
    self,
    sim_yield: float | np.ndarray,
    *,
    fold_year: int | None = None,
    init_md: InitMD | None = None,
) -> dict[float, tuple[np.ndarray, np.ndarray]]:

Returns {level: (lower_array, upper_array)}. Lookup dispatches on which field is non-None:

  • per_init_md — circular calendar interpolation by day-of-year; exact hit short-circuits; non-exact emits a logger.warning (conformalise.py:279).
  • per_year — exact-hit or reduce via per_year_fallback (nanmean / nanmax) across all calibrated years on a miss (conformalise.py:319).
  • pooled — broadcasts single half-width set; ignores both kwargs (conformalise.py:395).
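The per_init_md lookup is the least obvious of the three, so here is a hedged sketch of circular interpolation by day-of-year. Linear weighting between the two neighbouring calibrated dates, the 365-day reference year, and all function names are assumptions; the source only states that the interpolation is circular and that exact hits short-circuit. The half-width values are illustrative.

```python
from __future__ import annotations

from datetime import date

DAYS = 365  # non-leap reference year; "02-29" is not handled in this sketch


def day_of_year(md: str) -> int:
    month, day = (int(part) for part in md.split("-"))
    return date(2001, month, day).timetuple().tm_yday


def interpolate_half_width(calibrated: dict[str, float], query_md: str) -> float:
    """Half-width for `query_md`, interpolated around the calendar circle."""
    if not calibrated:
        raise ValueError("no calibrated init dates")
    if query_md in calibrated:
        return calibrated[query_md]  # exact hit short-circuits
    q = day_of_year(query_md)
    knots = sorted((day_of_year(md), width) for md, width in calibrated.items())
    # Neighbours wrap around the year end when the query lies outside the knots.
    prev = max((k for k in knots if k[0] < q), default=knots[-1])
    nxt = min((k for k in knots if k[0] > q), default=knots[0])
    gap_before = (q - prev[0]) % DAYS
    gap_after = (nxt[0] - q) % DAYS
    weight = gap_before / (gap_before + gap_after)
    return prev[1] + weight * (nxt[1] - prev[1])


def interval(sim_yield: float, width: float) -> tuple[float, float]:
    """Symmetric interval around the simulated yield."""
    return sim_yield - width, sim_yield + width


calibrated = {"05-01": 1.0, "07-01": 0.5}
mid_season = interpolate_half_width(calibrated, "06-01")  # between the knots
wrapped = interpolate_half_width(calibrated, "01-01")     # wraps past year end
lower, upper = interval(10.0, calibrated["07-01"])        # (9.5, 10.5)
```

With only one calibrated date, both neighbours are the same knot and the function returns its half-width unchanged.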

POSTPROCESS orchestration

stages/run_meta_models.py exposes helpers that manage CalibrationResult lifecycle:

| Function | Line | Purpose |
| --- | --- | --- |
| calibration_path(run_dir, mode) | run_meta_models.py:60 | Canonical sidecar path |
| fit_and_save_calibration(result, ci_levels, mode) | run_meta_models.py:70 | Fit one mode via apply_conformal; call cal.save(path); return |
| fit_and_save_all_configured(result, ci_levels) | run_meta_models.py:85 | Loop over all modes in config.postprocess.conformalise; first = primary |
| get_or_fit_calibration(result, ci_levels, mode) | run_meta_models.py:100 | Load sidecar if present; fit+save on demand if absent |
| primary_calibration(result, ci_levels) | run_meta_models.py:119 | Returns calibration for config.forecast.residual_mode |

Separation of axes: postprocess.conformalise (list) controls which sidecars are written during hindcast. forecast.residual_mode (single, mandatory since PR #372) controls which mode is applied at forecast runtime. A forecast may request a mode absent from postprocess.conformalise — it is fitted on demand by get_or_fit_calibration.
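The load-if-present, fit-on-demand behaviour can be sketched with the standard library alone. Everything here is a stand-in: get_or_fit and fake_fit are hypothetical, JSON replaces parquet so the example runs without extra dependencies, and the stored value is made up.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Callable


def get_or_fit(run_dir: Path, mode: str, fit: Callable[[str], dict]) -> dict:
    """Return the cached calibration for `mode`, fitting and saving on a miss."""
    path = run_dir / "conformal" / f"{mode}.json"  # JSON stands in for parquet
    if path.exists():
        return json.loads(path.read_text())
    calibration = fit(mode)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(calibration))
    return calibration


fit_calls: list[str] = []


def fake_fit(mode: str) -> dict:
    # Records each call so the caching behaviour is observable.
    fit_calls.append(mode)
    return {"residual_mode": mode, "pooled": {"0.9": 0.63}}


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    first = get_or_fit(run_dir, "in_sample_pooled", fake_fit)   # fits and saves
    second = get_or_fit(run_dir, "in_sample_pooled", fake_fit)  # loads sidecar
```

The second call never reaches fake_fit, which is how a forecast can request a mode that was not listed in postprocess.conformalise and still reuse it on later invocations.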

Lifecycle

Created: By apply_conformal(experiment, ci_levels, residual_mode=...) during POSTPROCESS. One CalibrationResult per configured mode. Immediately saved to run_dir/conformal/{mode}.parquet by fit_and_save_calibration.

Consumed:

  • FORECAST: primary_calibration(result, ci_levels) → cal.predict_interval(sim, init_md=...) → populates CI columns in postprocessed/national.parquet.
  • DELIVER: get_or_fit_calibration(...) → same call chain.
  • DIAGNOSTICS: loaded directly for mode comparisons.
  • Dashboard: loaded on demand by the Streamlit app.

Destroyed: Never; sidecars accumulate and are re-used across forecast invocations.

Relationships

| Relationship | Entity | Notes |
| --- | --- | --- |
| Scoped to | RunDir | One per (residual_mode, run_dir); path encodes both |
| Produced by | ExperimentResult | Via apply_conformal(experiment, ...) |
| Consumed by | ForecastSlice | primary_calibration provides the slice's CI half-widths |
| Consumed by | HindcastDelivery | build_rows applies intervals to national frame rows |
| Configured by | PostprocessConfig.conformalise | List of modes to materialise |
| Mode selected by | ForecastConfig.residual_mode | Mandatory field since PR #372 |

Concepts and pipelines (forward refs to P5)

PRs and commits

| PR / commit | Relevance |
| --- | --- |
| PR-361 | Introduced CalibrationResult as a frozen, persistable dataclass; added save/load methods; defined four residual_mode values; added mode-keyed sidecars at run_dir/conformal/{mode}.parquet; dropped commodity_ prefix from sidecar filenames |
| PR-372 | Made forecast.residual_mode mandatory on ForecastConfig; added defensive ValueError guard in CalibrationResult.to_frame() for empty-row serialisation; extracted ResidualMode to types.py to avoid circular import |

Open questions

  • AGGREGATES.md:214 erroneously describes CalibrationResult as transient. The P7 lint pass will correct this claim.
  • per_year mode has NaN half-widths for the earliest CV fold (fold k=0 has no prior residuals). Consumers that query the earliest fold year receive NaN bounds — the strict=False API returns (nan, nan) silently. Whether this should be surfaced as a warning in delivery is undecided.
  • MapieConformalRegressor in the same module provides a feature-conditioned calibration pathway; it is unused in the main hindcast/forecast pipeline and its relationship to CalibrationResult is not documented.
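To make the per_year question above concrete, the sketch below shows how an exact hit on the earliest fold can return NaN silently while a miss falls back to a NaN-ignoring reducer. The helper names are hypothetical, the reducers only mimic nanmean / nanmax in pure Python, and the values are illustrative.

```python
from __future__ import annotations

import math


def nan_reduce(values: list[float], how: str) -> float:
    """Mean or max over the non-NaN values; NaN if none remain."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return math.nan
    return max(finite) if how == "max" else sum(finite) / len(finite)


def per_year_half_width(
    per_year: dict[int, dict[float, float]],
    fold_year: int,
    level: float,
    fallback: str = "mean",
) -> float:
    if fold_year in per_year:
        # Exact hit: returned as stored, even when the stored value is NaN.
        return per_year[fold_year][level]
    # Miss: reduce across all calibrated years, skipping NaN entries.
    return nan_reduce([by_level[level] for by_level in per_year.values()], fallback)


per_year = {2019: {0.9: math.nan}, 2020: {0.9: 0.4}, 2021: {0.9: 0.6}}
earliest = per_year_half_width(per_year, 2019, 0.9)         # NaN, no warning
unknown_mean = per_year_half_width(per_year, 2025, 0.9)     # 0.5
unknown_max = per_year_half_width(per_year, 2025, 0.9, "max")  # 0.6
```

A delivery-side check such as math.isnan(earliest) would be one place to surface the warning, if that is the direction chosen.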