# CalibrationResult
## Definition
CalibrationResult is a frozen, persistable dataclass (@dataclass(frozen=True)) that holds conformal prediction half-widths derived from an experiment's walk-forward OOS residuals. It is a first-class persistent artefact: it serialises to and deserialises from a long-format parquet file via save and load methods. It exposes predict_interval so forecast and delivery consumers can apply calibrated intervals without re-running the POSTPROCESS stage.
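Kind: Frozen dataclass. Experiment-scoped artefact (one per (`residual_mode`, `run_dir`) pair). Written during POSTPROCESS, or fitted and saved on demand by `get_or_fit_calibration` when a forecast requests a mode that was not configured; read by FORECAST, DELIVER, and diagnostics.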
Source of truth: market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:106.
## Correction to AGGREGATES.md
`domain_model/AGGREGATES.md:214` states:

> "CalibrationResult — not a persisted class in the current codebase; conformal half-widths are computed by `primary_calibration()` and written into `postprocessed/national.parquet` and sidecar parquets under `run_dir/conformal/`. It is a transient value returned from `stages/run_meta_models.py`."
This is incorrect. CalibrationResult has first-class persistence:
- `save` method at `conformalise.py:215`: writes a long-format parquet via `write_dataframe`.
- `load` classmethod at `conformalise.py:225`: reconstructs the full object from disk.
The parquet sidecars at `run_dir/conformal/{mode}.parquet` ARE the serialised form of `CalibrationResult`. The P7 lint pass will fix AGGREGATES.md.
## Key attributes
| Field | Type | Notes |
|---|---|---|
| `residual_mode` | `ResidualMode` | Which recipe produced this calibration (see the four residual modes below) |
| `method` | `ConformalMethod` | `"quantile"` or `"split_conformal"` |
| `experiment_key` | `str` | Links back to the experiment (e.g. `"corn_usa"`) |
| `levels` | `tuple[float, ...]` | Confidence levels fitted (e.g. `(0.5, 0.68, 0.8, 0.9, 0.95)`) |
| `n_residuals` | `int` | Total residuals consumed |
| `per_init_md` | `dict[InitMD, dict[float, float]] \| None` | Half-widths keyed by `"MM-DD"`; set in `hindcast_oos_per_init_date` mode only |
| `per_year` | `dict[int, dict[float, float]] \| None` | Half-widths keyed by test year (`int`); set in `hindcast_oos_per_year` mode only |
| `pooled` | `dict[float, float] \| None` | Single half-width set; set in the two pooled modes |
| `per_year_fallback` | `PerYearFallback` | `"mean"` or `"max"`: reducer for unknown years in `per_year` mode; persisted in all modes for round-trip consistency |
Exactly one of per_init_md, per_year, pooled is non-None.
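The following standalone frozen dataclass sketches the field layout and the exactly-one invariant. It is a minimal illustration, not the class in `conformalise.py`. The class name `CalibrationResultSketch`, the `__post_init__` checks, the `FIELD_FOR_MODE` mapping and the `"mean"` default for `per_year_fallback` are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Optional

# Literal aliases mirroring the values named on this page (assumed shapes;
# the real ResidualMode lives in models/meta_models/types.py).
ResidualMode = Literal[
    "hindcast_oos_per_init_date",
    "hindcast_oos_per_year",
    "hindcast_oos_fully_pooled",
    "in_sample_pooled",
]
ConformalMethod = Literal["quantile", "split_conformal"]
PerYearFallback = Literal["mean", "max"]
InitMD = str  # "MM-DD" init-date label

# Which lookup field each mode is expected to populate (hypothetical helper).
FIELD_FOR_MODE = {
    "hindcast_oos_per_init_date": "per_init_md",
    "hindcast_oos_per_year": "per_year",
    "hindcast_oos_fully_pooled": "pooled",
    "in_sample_pooled": "pooled",
}


@dataclass(frozen=True)
class CalibrationResultSketch:
    residual_mode: ResidualMode
    method: ConformalMethod
    experiment_key: str
    levels: tuple[float, ...]
    n_residuals: int
    per_init_md: Optional[dict[InitMD, dict[float, float]]] = None
    per_year: Optional[dict[int, dict[float, float]]] = None
    pooled: Optional[dict[float, float]] = None
    per_year_fallback: PerYearFallback = "mean"  # assumed default

    def __post_init__(self) -> None:
        # Exactly one of the three lookup fields may be non-None ...
        populated = [
            name
            for name in ("per_init_md", "per_year", "pooled")
            if getattr(self, name) is not None
        ]
        if len(populated) != 1:
            raise ValueError(f"expected exactly one lookup field, got {populated}")
        # ... and it must be the one that matches residual_mode.
        expected = FIELD_FOR_MODE[self.residual_mode]
        if populated[0] != expected:
            raise ValueError(f"{self.residual_mode} must populate {expected}, not {populated[0]}")
```

Because the dataclass is frozen, assigning to a field after construction raises an `AttributeError` subclass. This matches the "written once" nature of the artefact.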
## Four residual modes (introduced in PR #361)
ResidualMode is a Literal defined in models/meta_models/types.py:16. Each mode answers a different question about which residuals to pool:
| `residual_mode` | Data pooled | Lookup key | Output field |
|---|---|---|---|
| `hindcast_oos_per_init_date` | OOS residuals grouped by `"MM-DD"` init-date label | `init_md` | `per_init_md` |
| `hindcast_oos_per_year` | Walk-forward bootstrap: fold k calibrated from folds < k | `fold_year` | `per_year` |
| `hindcast_oos_fully_pooled` | All (year, init_date) OOS residuals in one bag | ignored | `pooled` |
| `in_sample_pooled` | Production fold `train_preds.parquet` residuals | ignored | `pooled` (biased narrow; fallback only) |
`hindcast_oos_per_init_date` is the default (listed first in `postprocess.conformalise` for all production commodity configs). It is the most calibration-honest mode because it captures the season-of-year uncertainty gradient: early-season forecasts carry wider intervals than late-season ones.
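Below is a minimal sketch of how each OOS mode could turn residuals into half-widths. Everything in it is hypothetical:

- `fit_half_widths`, `half_width` and the `(test_year, init_md, residual)` row shape are illustrative names.
- Folds are assumed to be identified by their test year.
- `"quantile"` is taken to be a plain empirical quantile of absolute residuals.
- `"split_conformal"` is assumed to use the standard finite-sample rank ⌈(n+1)·level⌉.

`in_sample_pooled` is left out because it reads `train_preds.parquet` rather than OOS rows.

```python
import math
from collections import defaultdict

import numpy as np


def half_width(abs_residuals, level, method="quantile"):
    """Half-width covering `level` of the absolute residuals (hypothetical helper)."""
    r = np.sort(np.asarray(abs_residuals, dtype=float))
    n = r.size
    if n == 0:
        return float("nan")  # nothing to calibrate from
    if method == "quantile":
        return float(np.quantile(r, level))  # plain empirical quantile
    # Assumed split-conformal rule: the ceil((n + 1) * level)-th smallest |residual|.
    k = math.ceil((n + 1) * level)
    return float("inf") if k > n else float(r[k - 1])


def fit_half_widths(rows, levels, mode, method="quantile"):
    """rows: list of (test_year, init_md, residual) OOS records (assumed shape)."""
    if mode == "hindcast_oos_per_init_date":
        groups = defaultdict(list)
        for _, md, res in rows:
            groups[md].append(abs(res))
        return {md: {lv: half_width(v, lv, method) for lv in levels}
                for md, v in groups.items()}
    if mode == "hindcast_oos_per_year":
        by_year = defaultdict(list)
        for year, _, res in rows:
            by_year[year].append(abs(res))
        # Walk-forward: the fold for `year` sees only residuals from earlier
        # folds, so the earliest fold gets NaN half-widths.
        return {
            year: {lv: half_width([r for y, rs in by_year.items() if y < year for r in rs], lv, method)
                   for lv in levels}
            for year in sorted(by_year)
        }
    if mode == "hindcast_oos_fully_pooled":
        bag = [abs(res) for _, _, res in rows]
        return {lv: half_width(bag, lv, method) for lv in levels}
    raise ValueError(f"mode not covered by this sketch: {mode}")
```

Under these assumptions, the earliest fold year always comes back with `NaN` half-widths, because it has no earlier folds to draw residuals from.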
in_sample_pooled issues a logger.warning explicitly labelling the resulting intervals as biased narrow (conformalise.py:546).
## Persistence: save and load
### save (conformalise.py:215)
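```python
def save(self, path: Path | AnyPath | str) -> None:
    # AnyPath accepts both local paths and S3 URIs.
    target = AnyPath(path)
    # Create run_dir/conformal/ (or equivalent) if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Serialise to long format and write through the project's I/O helper.
    write_dataframe(self.to_frame(), str(target))
```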
Serialises to long-format parquet via `to_frame()` (`conformalise.py:136`), with one row per (key, level) pair. The `residual_mode`, `method`, `experiment_key`, `fold_init_md` and `per_year_fallback` columns use categorical dtype; `fold_year` uses nullable `Int32`. Local paths and S3 URIs are both accepted via `AnyPath`. `to_frame()` raises `ValueError`, with a diagnostic message, if serialisation would produce zero rows.
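The long-format layout can be sketched with pandas. `to_long_frame` and the `half_width` value column are hypothetical names. The categorical and nullable `Int32` dtypes and the empty-frame `ValueError` follow the description above, but the real `to_frame()` may use different column names or ordering.

```python
import pandas as pd


def to_long_frame(residual_mode, method, experiment_key, per_year_fallback,
                  per_init_md=None, per_year=None, pooled=None):
    """Hypothetical long-format serialiser: one row per (key, level) pair."""
    # Collect (fold_init_md, fold_year, level table) groups from whichever field is set.
    if per_init_md is not None:
        groups = [(md, None, table) for md, table in per_init_md.items()]
    elif per_year is not None:
        groups = [(None, year, table) for year, table in per_year.items()]
    else:
        groups = [(None, None, pooled or {})]
    records = [(md, year, level, hw)
               for md, year, table in groups
               for level, hw in table.items()]
    if not records:
        # Mirrors the documented guard against writing an empty sidecar.
        raise ValueError(f"{experiment_key}/{residual_mode}: calibration would serialise to zero rows")
    mds, years, levels, widths = (list(col) for col in zip(*records))
    n = len(records)
    return pd.DataFrame({
        "residual_mode": pd.Categorical([residual_mode] * n),
        "method": pd.Categorical([method] * n),
        "experiment_key": pd.Categorical([experiment_key] * n),
        "per_year_fallback": pd.Categorical([per_year_fallback] * n),
        "fold_init_md": pd.Categorical(mds),
        "fold_year": pd.array(years, dtype="Int32"),  # nullable: None -> <NA>
        "level": pd.Series(levels, dtype="float64"),
        "half_width": pd.Series(widths, dtype="float64"),
    })
```

Writing the mode, method and fallback on every row is redundant, but it keeps each parquet file self-describing, which is what allows `load` to rebuild the object from the file alone.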
### load (conformalise.py:225)
```python
@classmethod
def load(cls, path: Path | AnyPath | str) -> CalibrationResult:
    # Read the long-format parquet (local path or S3 URI).
    df = read_dataframe(str(AnyPath(path)))
    # reconstructs per_init_md / per_year / pooled by branching on residual_mode
    ...
    return cls(...)
```
Reads the parquet, reconstructs the appropriate dict field by branching on residual_mode, reconstitutes levels from the unique level column, then calls cls(...). Full round-trip consistency is preserved.
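The reverse direction rebuilds the one populated lookup field by branching on `residual_mode`, and might look like this. `from_long_frame` and the `half_width` column are assumptions. The sketch returns a plain dict instead of calling `cls(...)`, and it recovers only the fields needed to show the branching.

```python
import pandas as pd


def from_long_frame(df: pd.DataFrame) -> dict:
    """Hypothetical inverse of the long-format layout (assumed column names)."""
    mode = str(df["residual_mode"].iloc[0])
    # levels are reconstituted from the unique values of the level column
    levels = tuple(sorted(float(x) for x in df["level"].unique()))
    # Branch on residual_mode to decide which key column feeds which field.
    if mode == "hindcast_oos_per_init_date":
        key_col, field = "fold_init_md", "per_init_md"
    elif mode == "hindcast_oos_per_year":
        key_col, field = "fold_year", "per_year"
    else:  # hindcast_oos_fully_pooled and in_sample_pooled
        key_col, field = None, "pooled"
    if key_col is None:
        table = {float(lv): float(hw) for lv, hw in zip(df["level"], df["half_width"])}
    else:
        table = {}
        for key, lv, hw in zip(df[key_col], df["level"], df["half_width"]):
            key = int(key) if key_col == "fold_year" else str(key)
            table.setdefault(key, {})[float(lv)] = float(hw)
    return {"residual_mode": mode, "levels": levels, field: table}
```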
### On-disk path
Sidecars live at `run_dir/conformal/{residual_mode}.parquet`, e.g. `runs/20260505_143022_corn_usa/conformal/hindcast_oos_per_init_date.parquet`. The path is constructed by `calibration_path(run_dir, residual_mode)` in `run_meta_models.py:60`. The `commodity_` filename prefix used before this layout was dropped in PR #361.
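The path rule fits in one expression. This is a hypothetical local-filesystem version built on `pathlib`. The real `calibration_path` also has to handle S3 URIs (for example through `AnyPath`), which this sketch does not attempt.

```python
from pathlib import Path


def calibration_path(run_dir, residual_mode: str) -> Path:
    """Hypothetical sketch of run_dir/conformal/{residual_mode}.parquet (local paths only)."""
    # One sidecar per mode; no commodity_ prefix in the filename.
    return Path(run_dir) / "conformal" / f"{residual_mode}.parquet"
```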
## predict_interval (conformalise.py:360)
```python
def predict_interval(
    self,
    sim_yield: float | np.ndarray,
    *,
    fold_year: int | None = None,
    init_md: InitMD | None = None,
) -> dict[float, tuple[np.ndarray, np.ndarray]]:
    ...  # body elided; see conformalise.py:360
```
Returns {level: (lower_array, upper_array)}. Lookup dispatches on which field is non-None:
- `per_init_md`: circular calendar interpolation by day-of-year. An exact hit short-circuits; a non-exact lookup emits a `logger.warning` (`conformalise.py:279`).
- `per_year`: exact hit, or on a miss a reduction via `per_year_fallback` (`nanmean`/`nanmax`) across all calibrated years (`conformalise.py:319`).
- `pooled`: broadcasts the single half-width set and ignores both kwargs (`conformalise.py:395`).
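The three lookup branches can be sketched as free functions over plain dicts. This rests on several assumptions:

- Intervals are symmetric (`sim ± half_width`).
- `"MM-DD"` labels map to day-of-year in the leap reference year 2000, so `02-29` is valid, and interpolation is linear around a 366-day circle.
- The warning text is invented.

The helper names `_doy` and `interp_init_md` and the free-function `predict_interval` are illustrative only; they are not the real method.

```python
import logging
from datetime import date

import numpy as np

logger = logging.getLogger(__name__)
YEAR_DAYS = 366  # leap reference year, so "02-29" maps to a valid day


def _doy(md: str) -> int:
    """Map an "MM-DD" label to day-of-year in the leap reference year 2000."""
    month, day = map(int, md.split("-"))
    return date(2000, month, day).timetuple().tm_yday


def interp_init_md(per_init_md, init_md, level):
    """Circular linear interpolation of half-widths by day-of-year (sketch)."""
    if init_md in per_init_md:  # exact hit short-circuits
        return per_init_md[init_md][level]
    logger.warning("init_md %s not calibrated; interpolating", init_md)
    pts = sorted((_doy(md), table[level]) for md, table in per_init_md.items())
    target = _doy(init_md)
    before = [p for p in pts if p[0] < target]
    after = [p for p in pts if p[0] > target]
    # Wrap around the calendar when target lies before the first or after the last date.
    lo = before[-1] if before else (pts[-1][0] - YEAR_DAYS, pts[-1][1])
    hi = after[0] if after else (pts[0][0] + YEAR_DAYS, pts[0][1])
    frac = (target - lo[0]) / (hi[0] - lo[0])
    return lo[1] + frac * (hi[1] - lo[1])


def predict_interval(sim_yield, levels, *, per_init_md=None, per_year=None,
                     pooled=None, per_year_fallback="mean", fold_year=None, init_md=None):
    """Return {level: (lower, upper)} as symmetric bands around sim_yield."""
    sim = np.atleast_1d(np.asarray(sim_yield, dtype=float))
    out = {}
    for level in levels:
        if per_init_md is not None:
            if init_md is None:
                raise ValueError("per_init_md lookup needs init_md")
            hw = interp_init_md(per_init_md, init_md, level)
        elif per_year is not None:
            if fold_year in per_year:
                hw = per_year[fold_year][level]
            else:  # unknown year: reduce across all calibrated years
                reduce = np.nanmean if per_year_fallback == "mean" else np.nanmax
                hw = float(reduce([table[level] for table in per_year.values()]))
        else:
            hw = pooled[level]  # pooled: fold_year and init_md are ignored
        out[level] = (sim - hw, sim + hw)
    return out
```

Interpolating around a circle, rather than clamping to the nearest calibrated date, means a late-December init date blends toward the early-season widths of the following season.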
## POSTPROCESS orchestration
`stages/run_meta_models.py` exposes helpers that manage the `CalibrationResult` lifecycle:
| Function | Line | Purpose |
|---|---|---|
| `calibration_path(run_dir, mode)` | `run_meta_models.py:60` | Canonical sidecar path |
| `fit_and_save_calibration(result, ci_levels, mode)` | `run_meta_models.py:70` | Fit one mode via `apply_conformal`, call `cal.save(path)`, and return the result |
| `fit_and_save_all_configured(result, ci_levels)` | `run_meta_models.py:85` | Loop over all modes in `config.postprocess.conformalise`; the first is the primary |
| `get_or_fit_calibration(result, ci_levels, mode)` | `run_meta_models.py:100` | Load the sidecar if present; otherwise fit and save on demand |
| `primary_calibration(result, ci_levels)` | `run_meta_models.py:119` | Return the calibration for `config.forecast.residual_mode` |
Separation of axes: postprocess.conformalise (list) controls which sidecars are written during hindcast. forecast.residual_mode (single, mandatory since PR #372) controls which mode is applied at forecast runtime. A forecast may request a mode absent from postprocess.conformalise — it is fitted on demand by get_or_fit_calibration.
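The load-or-fit behaviour of `get_or_fit_calibration` follows a simple cache-on-disk pattern. The sketch below is hypothetical. `load_or_fit` takes injected `load`/`fit`/`save` callables instead of the real `(result, ci_levels, mode)` arguments. The toy usage writes JSON to a temporary directory in place of a parquet sidecar.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_fit(path: Path, load: Callable[[Path], T], fit: Callable[[], T],
                save: Callable[[T, Path], None]) -> T:
    """Hypothetical load-or-fit helper: reuse a sidecar if present, else fit and persist it."""
    if path.exists():
        return load(path)  # sidecar already materialised (e.g. during POSTPROCESS)
    cal = fit()  # mode absent from postprocess.conformalise: fit on demand
    save(cal, path)  # persist so later forecast runs reuse it
    return cal


# Toy usage: a JSON dict stands in for a CalibrationResult sidecar.
fit_calls = []


def toy_fit() -> dict:
    fit_calls.append("fit")
    return {"0.9": 1.5}


def toy_save(cal: dict, path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cal))


def toy_load(path: Path) -> dict:
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "conformal" / "hindcast_oos_fully_pooled.json"
    first = load_or_fit(sidecar, toy_load, toy_fit, toy_save)   # fits and saves
    second = load_or_fit(sidecar, toy_load, toy_fit, toy_save)  # loads from disk
```

The second call never invokes `toy_fit`, which is why sidecars can accumulate and be reused across forecast invocations.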
## Lifecycle
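Created: by `apply_conformal(experiment, ci_levels, residual_mode=...)` during POSTPROCESS, one `CalibrationResult` per configured mode, each immediately saved to `run_dir/conformal/{mode}.parquet` by `fit_and_save_calibration`. A mode requested at forecast time but absent from `postprocess.conformalise` is fitted and saved on demand by `get_or_fit_calibration`.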
Consumed:
- FORECAST: primary_calibration(result, ci_levels) → cal.predict_interval(sim, init_md=...) → populates CI columns in postprocessed/national.parquet.
- DELIVER: get_or_fit_calibration(...) → same call chain.
- DIAGNOSTICS: loaded directly for mode comparisons.
- Dashboard: loaded on demand by the Streamlit app.
Destroyed: Never; sidecars accumulate and are re-used across forecast invocations.
## Relationships
| Relationship | Entity | Notes |
|---|---|---|
| Scoped to | RunDir | One per (`residual_mode`, `run_dir`); path encodes both |
| Produced by | ExperimentResult | Via `apply_conformal(experiment, ...)` |
| Consumed by | ForecastSlice | `primary_calibration` provides the slice's CI half-widths |
| Consumed by | HindcastDelivery | `build_rows` applies intervals to national frame rows |
| Configured by | PostprocessConfig.conformalise | List of modes to materialise |
| Mode selected by | ForecastConfig.residual_mode | Mandatory field since PR #372 |
## Concepts and pipelines (forward refs to P5)
- Concept: Conformal calibration (the statistical framework and the four mode recipes)
- Concept: Residual mode selection (the `postprocess.conformalise` vs `forecast.residual_mode` separation of axes)
- Pipeline: Hindcast pipeline (the POSTPROCESS stage produces all configured sidecars)
- Pipeline: Forecast pipeline (`primary_calibration` is applied during `_postprocess_forecast`)
## PRs and commits
| PR / commit | Relevance |
|---|---|
| PR-361 | Introduced CalibrationResult as a frozen, persistable dataclass; added save/load methods; defined four residual_mode values; added mode-keyed sidecars at run_dir/conformal/{mode}.parquet; dropped commodity_ prefix from sidecar filenames |
| PR-372 | Made forecast.residual_mode mandatory on ForecastConfig; added defensive ValueError guard in CalibrationResult.to_frame() for empty-row serialisation; extracted ResidualMode to types.py to avoid circular import |
## Open questions
- `AGGREGATES.md:214` erroneously describes `CalibrationResult` as transient. The P7 lint pass will correct this claim.
- `per_year` mode has `NaN` half-widths for the earliest CV fold (fold k=0 has no prior residuals). Consumers that query the earliest fold year receive `NaN` bounds; the `strict=False` API returns `(nan, nan)` silently. Whether delivery should surface this as a warning is undecided.
- `MapieConformalRegressor` in the same module provides a feature-conditioned calibration pathway. It is unused in the main hindcast/forecast pipeline, and its relationship to `CalibrationResult` is not documented.