MetaModel

There is no single MetaModel ABC in the code. This page documents two related role hierarchies that together constitute the post-processing layer between raw fold predictions and the delivery boundary:

  1. BiasCorrector — shifts national yield estimates to account for counties excluded from the model universe.
  2. Conformaliser — attaches honest prediction intervals derived from walk-forward OOS residuals.

Both are fit during the POSTPROCESS stage (stages/run_meta_models.py) and persisted to disk so forecast, delivery, and diagnostic consumers can load without recomputing.

BiasCorrector hierarchy

Definition

AbstractBiasCorrector (models/meta_models/bias_correction.py:35) is the ABC for national-grain bias correction. A corrector is fitted once per fold from the NASS county panel and the model's included_geo_identifiers; it exposes a single scalar bias_kg_ha property that all downstream callers subtract from simulated yield.

Kind

ABC (AbstractBiasCorrector at models/meta_models/bias_correction.py:35).

Source of truth

market_insights_models/src/commodity_hindcast/models/meta_models/bias_correction.py:35

Required interface

| Method / property | Signature | Notes |
| --- | --- | --- |
| fit | (nass_panel, included_geo_identifiers, test_year) → None | Must set internal state. |
| bias_kg_ha | @property → float | Raises RuntimeError if fit not called (in CoverageBiasCorrector). |
| apply_national | (sim_kg_ha: float) → float | Returns sim - bias_kg_ha. bias_correction.py:57 |
| apply_frame | (df, *, sim_col="sim_yield_kg_ha") → DataFrame | Copies df, subtracts bias_kg_ha from sim_col. bias_correction.py:62 |
| save | (path: Path) → None | Pickles self; creates parent dirs. bias_correction.py:76 |
| load | classmethod (path: Path) → AbstractBiasCorrector | Fails fast with FileNotFoundError if missing. bias_correction.py:83 |
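
The interface above can be pictured with a minimal sketch. This is not the code in bias_correction.py: the class names AbstractBiasCorrectorSketch and NoBiasCorrectorSketch are hypothetical, apply_frame is left out to keep the example dependency-free, and the method bodies are assumptions that only follow the Notes column.

```python
import pickle
from abc import ABC, abstractmethod
from pathlib import Path


class AbstractBiasCorrectorSketch(ABC):
    """Illustrative stand-in for AbstractBiasCorrector; not the real class."""

    @abstractmethod
    def fit(self, nass_panel, included_geo_identifiers, test_year) -> None:
        """Set whatever internal state bias_kg_ha needs."""

    @property
    @abstractmethod
    def bias_kg_ha(self) -> float:
        """Scalar national bias in kg/ha."""

    def apply_national(self, sim_kg_ha: float) -> float:
        # Callers subtract the bias from the simulated national yield.
        return sim_kg_ha - self.bias_kg_ha

    def save(self, path: Path) -> None:
        # Create parent directories, then pickle the fitted object.
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(self, fh)

    @classmethod
    def load(cls, path: Path) -> "AbstractBiasCorrectorSketch":
        # Fail fast instead of silently refitting.
        if not path.exists():
            raise FileNotFoundError(path)
        with path.open("rb") as fh:
            return pickle.load(fh)


class NoBiasCorrectorSketch(AbstractBiasCorrectorSketch):
    """Baseline: fit is a no-op and the bias is always zero."""

    def fit(self, nass_panel, included_geo_identifiers, test_year) -> None:
        return None

    @property
    def bias_kg_ha(self) -> float:
        return 0.0


corrector = NoBiasCorrectorSketch()
corrector.fit(nass_panel=None, included_geo_identifiers=[], test_year=2022)
unchanged = corrector.apply_national(9800.0)  # zero bias leaves the value as-is
```

Keeping apply_national on the base class means a concrete corrector only has to supply fit and bias_kg_ha.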

Concrete implementations

| Class | Config kind | File | Formula | When to use |
| --- | --- | --- | --- | --- |
| NoBiasCorrector | none | bias_correction.py | bias_kg_ha = 0.0; fit is no-op. | When bias correction is disabled or as a baseline experiment. |
| CoverageBiasCorrector | coverage | bias_correction.py:107 | bias_y = (area_out / area_total) × (yld_in − yld_out); reduced across n_lookback_years by reduction_method. | When the model's county universe omits a material fraction of production and that gap creates a predictable national-level offset. |

CoverageBiasCorrector parameters: n_lookback_years: int, reduction_method: ReductionMethod (only "median" is supported). Unfitted access of bias_kg_ha raises RuntimeError.
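
The coverage formula follows from writing the national yield as an area-weighted mean: national = (area_in × yld_in + area_out × yld_out) / area_total, so yld_in − national = (area_out / area_total) × (yld_in − yld_out). A worked sketch with made-up numbers is below. The function names, the dict layout of history, and the choice to use the n years strictly before test_year as the lookback window are all assumptions for illustration; the real fit works from the NASS county panel.

```python
from statistics import median


def coverage_bias_for_year(area_in, area_out, yld_in, yld_out):
    """bias_y = (area_out / area_total) * (yld_in - yld_out) for one year."""
    area_total = area_in + area_out
    return (area_out / area_total) * (yld_in - yld_out)


def coverage_bias(history, test_year, n_lookback_years):
    """Median of the per-year biases over the lookback window (assumed: years < test_year)."""
    years = [y for y in sorted(history) if y < test_year][-n_lookback_years:]
    if not years:
        raise ValueError("no lookback years available before test_year")
    return median(coverage_bias_for_year(**history[y]) for y in years)


# Hypothetical aggregates: area in hectares, yield in kg/ha.
history = {
    2019: dict(area_in=80.0, area_out=20.0, yld_in=10000.0, yld_out=9000.0),
    2020: dict(area_in=75.0, area_out=25.0, yld_in=10400.0, yld_out=9200.0),
    2021: dict(area_in=90.0, area_out=10.0, yld_in=9800.0, yld_out=9300.0),
}
bias = coverage_bias(history, test_year=2022, n_lookback_years=3)
# Per-year biases are about 200, 300 and 50 kg/ha, so the median is about 200 kg/ha.
```

In 2019 the area-weighted national yield is 9800 kg/ha while the included counties yield 10000 kg/ha, which matches the 200 kg/ha offset the formula gives.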

Factory: build_bias_corrector(config: BiasCorrectorConfig) at bias_correction.py:195. Dispatches on the kind field of the BiasCorrectorConfig, which the experiment config carries at postprocess.bias_corrector.

BiasCorrector lifecycle

Instantiation: build_bias_corrector(config) during POSTPROCESS for each fold.

Fit: Called once per fold with the full NASS county panel and the fold's included_geo_identifiers.

Persist: Saved to postprocessed/{commodity}/{fold_label}/bias_corrector.pkl. HindcastSlice.has_bias_corrector guards access.

Apply: residuals.build_rows (meta_models/residuals.py:22) calls bias_corrector.apply_national(sim_raw) before constructing interval columns.

Tear-down: None. The corrector is not mutated after fitting; the fitted state lives in the pkl.

Conformal calibration

Definition

The conformal layer derives honest prediction intervals from walk-forward OOS residuals collected across CV folds. The fitted artefact is CalibrationResult — a frozen dataclass. There is no ABC for this layer; apply_conformal (conformalise.py:584) is the polymorphic entry point.

A separate class MapieConformalRegressor (conformalise.py:645) provides a feature-conditioned MAPIE-backed pathway that is not used in the main pipeline.

Source of truth

market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:111 (CalibrationResult frozen dataclass).

CalibrationResult fields

| Field | Type | Purpose |
| --- | --- | --- |
| residual_mode | ResidualMode | Which recipe produced this calibration (conformalise.py:64). |
| method | ConformalMethod | "quantile" or "split_conformal". |
| experiment_key | str | Links back to the experiment. |
| levels | tuple[float, ...] | Confidence levels fitted (e.g. (0.5, 0.8, 0.9, 0.95)). |
| n_residuals | int | Total residuals consumed. |
| per_init_md | dict[InitMD, dict[float, float]] \| None | Half-widths keyed by "MM-DD". Set only in hindcast_oos_per_init_date mode. |
| per_year | dict[int, dict[float, float]] \| None | Half-widths keyed by test year. Set only in hindcast_oos_per_year mode. |
| pooled | dict[float, float] \| None | Single half-width set. Set by the two pooled modes. |
| per_year_fallback | PerYearFallback | "mean" or "max": how to reduce across years on a miss in per_year mode. |

Exactly one of per_init_md / per_year / pooled is non-None.
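
One way to make the "exactly one store" rule hard to violate is to check it at construction time. The sketch below is an assumption about how that could look, not a copy of conformalise.py: the class name CalibrationResultSketch is hypothetical, the enum-like types are simplified to str, and the source does not say whether the real dataclass validates this in __post_init__.

```python
from dataclasses import dataclass
from typing import Optional

HalfWidths = dict[float, float]  # confidence level -> half-width


@dataclass(frozen=True)
class CalibrationResultSketch:
    residual_mode: str
    method: str
    experiment_key: str
    levels: tuple[float, ...]
    n_residuals: int
    per_init_md: Optional[dict[str, HalfWidths]] = None
    per_year: Optional[dict[int, HalfWidths]] = None
    pooled: Optional[HalfWidths] = None
    per_year_fallback: str = "mean"

    def __post_init__(self) -> None:
        # Exactly one of the three half-width stores may be populated.
        populated = [
            name
            for name in ("per_init_md", "per_year", "pooled")
            if getattr(self, name) is not None
        ]
        if len(populated) != 1:
            raise ValueError(f"expected exactly one populated store, got {populated}")


pooled_result = CalibrationResultSketch(
    residual_mode="hindcast_oos_fully_pooled",
    method="split_conformal",
    experiment_key="demo",  # hypothetical key
    levels=(0.5, 0.9),
    n_residuals=40,
    pooled={0.5: 300.0, 0.9: 700.0},
)
```

Because the dataclass is frozen, the invariant checked at construction keeps holding for the lifetime of the object.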

Four residual modes

Controlled by ResidualMode (models/meta_models/types.py:16); dispatched via _MODE_DISPATCH (conformalise.py:571).

| Mode | Populated field | Description |
| --- | --- | --- |
| hindcast_oos_per_init_date (default) | per_init_md | One calibration per "MM-DD" init-date label; pools walk-forward residuals across CV years. Honest about the within-season uncertainty gradient. |
| hindcast_oos_per_year | per_year | Walk-forward bootstrap: fold k is calibrated from end-of-season residuals of folds with year < k. Reflects accumulating evidence. |
| hindcast_oos_fully_pooled | pooled | Every (year, init_date) walk-forward residual pooled into one calibration. Statistically loose baseline. |
| in_sample_pooled | pooled | Fallback when no CV folds exist; pools production fold train residuals. Explicitly labelled as biased narrow in the logged warning (conformalise.py:546). |
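
The walk-forward idea behind hindcast_oos_per_year can be sketched as follows. This is an illustration under assumptions, not the pipeline's code: the function names are hypothetical, the half-width is taken as the standard split-conformal rank ceil((n + 1) × level) over absolute residuals (clamped to the largest residual when the rank exceeds n), and the sketch does not filter to end-of-season residuals.

```python
import math


def conformal_half_width(abs_residuals, level):
    """Split-conformal style quantile of absolute residuals at the given level."""
    ordered = sorted(abs_residuals)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one residual")
    # Finite-sample rank; clamping to n is a simplification for tiny samples.
    rank = min(n, math.ceil((n + 1) * level))
    return ordered[rank - 1]


def per_year_calibration(residuals_by_year, levels):
    """For each fold year k, calibrate only from residuals of years strictly before k."""
    out = {}
    for k in sorted(residuals_by_year):
        prior = [abs(r) for y, rs in residuals_by_year.items() if y < k for r in rs]
        if prior:  # the earliest fold has no history and gets no entry
            out[k] = {lvl: conformal_half_width(prior, lvl) for lvl in levels}
    return out


# Hypothetical residuals in kg/ha, keyed by test year.
residuals_by_year = {
    2019: [-300.0, 150.0],
    2020: [400.0, -50.0],
    2021: [100.0, -250.0],
}
calib = per_year_calibration(residuals_by_year, levels=(0.5, 0.9))
# 2020 sees only 2019's residuals; 2021 sees 2019 and 2020.
```

The fully pooled mode corresponds to calling conformal_half_width once on every residual, which is why it cannot reflect evidence accumulating over time.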

predict_interval dispatch

CalibrationResult.predict_interval(sim_yield, *, fold_year, init_md) (conformalise.py:360) returns {level: (lower, upper)}. Half-width lookup branches on the populated field:

  • per_init_md — circular calendar interpolation by day-of-year; exact hits short-circuit.
  • per_year — exact-hit-or-reduce; on a miss collapses via per_year_fallback (nanmean or nanmax). Linear interpolation is intentionally absent.
  • pooled — broadcasts the single half-width set; ignores both keyword args.
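
The three lookup branches can be sketched in a few functions. Everything here is an assumption made for illustration: the function names are hypothetical, the circular interpolation is taken to be linear between the two neighbouring labels on a 365-day circle (so a "02-29" label is not handled), intervals are assumed symmetric around sim_yield, and plain mean/max stand in for nanmean/nanmax.

```python
from datetime import date
from statistics import fmean

DAYS = 365  # assumption: non-leap reference calendar


def day_of_year(md):
    month, day = (int(part) for part in md.split("-"))
    return date(2001, month, day).timetuple().tm_yday


def half_widths_for_init_md(per_init_md, init_md):
    """Circular day-of-year interpolation between the two neighbouring labels."""
    if init_md in per_init_md:  # exact hit short-circuits
        return dict(per_init_md[init_md])
    target = day_of_year(init_md)
    # Days from each label forward (wrapping at year end) to the target.
    fwd = {md: (target - day_of_year(md)) % DAYS for md in per_init_md}
    before = min(fwd, key=fwd.get)  # nearest label preceding the target
    after = max(fwd, key=fwd.get)   # nearest label following the target
    gap = fwd[before] + (DAYS - fwd[after])
    w = fwd[before] / gap
    return {
        lvl: (1 - w) * per_init_md[before][lvl] + w * per_init_md[after][lvl]
        for lvl in per_init_md[before]
    }


def half_widths_for_year(per_year, fold_year, fallback="mean"):
    """Exact hit, else collapse across all years; no interpolation."""
    if fold_year in per_year:
        return dict(per_year[fold_year])
    reduce = fmean if fallback == "mean" else max
    levels = next(iter(per_year.values())).keys()
    return {lvl: reduce([hw[lvl] for hw in per_year.values()]) for lvl in levels}


def predict_interval(sim_yield, half_widths):
    return {lvl: (sim_yield - hw, sim_yield + hw) for lvl, hw in half_widths.items()}


per_init_md = {"06-01": {0.9: 800.0}, "07-01": {0.9: 600.0}}
mid_june = half_widths_for_init_md(per_init_md, "06-16")  # halfway between labels
interval = predict_interval(9800.0, mid_june)
```

The pooled branch needs no lookup at all: the single half-width dict is passed straight to predict_interval.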

Persistence

CalibrationResult.save / load (conformalise.py:215, conformalise.py:224) use a long-format parquet representation. Disk path: {run_dir}/conformal/{residual_mode}.parquet. Both methods accept local paths and S3 URIs via cloudpathlib.AnyPath. to_frame() raises ValueError if serialisation would produce zero rows.
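
A long-format layout means one row per (key, level) pair rather than one nested dict. The sketch below shows that flattening and its inverse in plain Python. The column names store, key, level and half_width are hypothetical, as are the function names, and the parquet write itself is omitted; the real schema is defined in conformalise.py.

```python
def to_long_rows(store_name, store):
    """Flatten {key: {level: half_width}} into one row per (key, level)."""
    rows = [
        {"store": store_name, "key": str(key), "level": level, "half_width": hw}
        for key, by_level in store.items()
        for level, hw in by_level.items()
    ]
    if not rows:
        # Mirrors the documented behaviour: refuse to serialise an empty calibration.
        raise ValueError("calibration would serialise to zero rows")
    return rows


def from_long_rows(rows):
    """Rebuild the nested {key: {level: half_width}} mapping from long rows."""
    store = {}
    for row in rows:
        store.setdefault(row["key"], {})[row["level"]] = row["half_width"]
    return store


per_init_md = {"06-01": {0.5: 350.0, 0.9: 800.0}, "07-01": {0.5: 250.0, 0.9: 600.0}}
rows = to_long_rows("per_init_md", per_init_md)
restored = from_long_rows(rows)
```

A flat table like this maps directly onto a columnar file, which is one plausible reason for choosing it over pickling the dataclass.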

MapieConformalRegressor

Sklearn-style wrapper around mapie.regression.SplitConformalRegressor (conformalise.py:645) for per-county conformal modelling where feature matrices (X, y) are in scope. Exposes fit, conformalize, predict, and predict_interval. Not used in the main hindcast/forecast pipeline — the residuals-only apply_conformal + CalibrationResult path is the active one.

Conformal lifecycle

Instantiation: apply_conformal(experiment, levels, *, residual_mode=..., ...) at conformalise.py:584 (experiment-level overload).

Fit: fit_and_save_all_configured(result, ci_levels) at run_meta_models.py:85 iterates over every mode in config.postprocess.conformalise, fitting and saving each one. It runs before the fold loop so that every calibration exists before build_rows is called.

Load-or-fit: get_or_fit_calibration(result, ci_levels, residual_mode) at run_meta_models.py:100 is used by forecast/delivery consumers. A forecast can request a mode not in postprocess.conformalise — it is fitted on demand.

Apply: residuals.build_rows calls calibration.predict_interval(sim_post_bias, fold_year=..., init_md=...) to emit lower_{pct} / upper_{pct} columns.

Tear-down: None.
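
Putting the Apply steps of both lifecycles together, a single output row could be assembled roughly as below. This is a sketch, not residuals.build_rows: the function name build_row_sketch is hypothetical, the interval is assumed symmetric, and the mapping of level 0.9 to the suffix "90" is an assumption about how {pct} is derived.

```python
def build_row_sketch(sim_raw, bias_kg_ha, half_widths):
    """Bias-correct first, then centre the intervals on the corrected value."""
    sim_post_bias = sim_raw - bias_kg_ha
    row = {"sim_yield_kg_ha": sim_post_bias}
    for level, hw in sorted(half_widths.items()):
        pct = round(level * 100)  # assumption: 0.9 becomes "90"
        row[f"lower_{pct}"] = sim_post_bias - hw
        row[f"upper_{pct}"] = sim_post_bias + hw
    return row


# Hypothetical values in kg/ha.
row = build_row_sketch(sim_raw=10000.0, bias_kg_ha=200.0, half_widths={0.5: 300.0, 0.9: 700.0})
```

The ordering matters: because the intervals are centred on the bias-corrected value, a change in the bias shifts both bounds without changing their width.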

Relationships

  • Regressor: raw fold predictions flow from runtime.predict into postprocess_experiment; the BiasCorrector and CalibrationResult both consume those predictions.
  • HindcastSlice: has_bias_corrector guards bias_corrector_path; calibration_path is separate.
  • ExperimentConfig: carries postprocess.bias_corrector (BiasCorrectorConfig) and postprocess.conformalise (tuple of modes).
  • DeliveryRow: lower_* / upper_* interval columns are populated by build_rows and flow directly into the delivery CSV.
  • lib/reference_data/nass.py: load_nass_county_panel_yield_area is called by CoverageBiasCorrector.fit.

Concepts and pipelines

  • source_meta_models — detailed module-level breakdown.
  • Conformal prediction — split-conformal and quantile regression methods for honest uncertainty quantification.
  • Walk-forward cross-validation — CalibrationResult pools residuals across CV folds; it is experiment-scoped, not fold-scoped.

PRs and commits

  • PR #361 introduced the four residual_mode variants (hindcast_oos_per_init_date, hindcast_oos_per_year, hindcast_oos_fully_pooled, in_sample_pooled).

Open questions

  • postprocess.conformalise and forecast.residual_mode are decoupled: a forecast can request any mode, triggering on-demand fitting. This is convenient but means a forecast run can silently compute a calibration that was not validated at hindcast time.
  • in_sample_pooled is explicitly labelled biased narrow; there is no run-time guard preventing its use in production delivery.
  • MapieConformalRegressor is present in the codebase but unused in the main pipeline; it is unclear whether it will be integrated or removed.
  • CalibrationResult is experiment-scoped (pooled across all folds), while BiasCorrector is fold-scoped. This asymmetry means the two corrections capture different sources of variance and are not directly comparable.