MetaModel
There is no single MetaModel ABC in the code. This page documents two related components that together form the post-processing layer between raw fold predictions and the delivery boundary:
- BiasCorrector — shifts national yield estimates to account for counties excluded from the model universe.
- Conformaliser — attaches honest prediction intervals derived from walk-forward OOS residuals.
Both are fit during the POSTPROCESS stage (stages/run_meta_models.py) and persisted to disk so forecast, delivery, and diagnostic consumers can load without recomputing.
BiasCorrector hierarchy
Definition
AbstractBiasCorrector (models/meta_models/bias_correction.py:35) is the ABC for national-grain bias correction. A corrector is fitted once per fold from the NASS county panel and the model's included_geo_identifiers; it exposes a single scalar bias_kg_ha property that all downstream callers subtract from simulated yield.
Kind
ABC (AbstractBiasCorrector at models/meta_models/bias_correction.py:35).
Source of truth
market_insights_models/src/commodity_hindcast/models/meta_models/bias_correction.py:35
Required interface
| Method / property | Signature | Notes |
|---|---|---|
| fit | (nass_panel, included_geo_identifiers, test_year) → None | Must set internal state. |
| bias_kg_ha | @property → float | Raises RuntimeError if fit has not been called (in CoverageBiasCorrector). |
| apply_national | (sim_kg_ha: float) → float | Returns sim_kg_ha - bias_kg_ha. bias_correction.py:57 |
| apply_frame | (df, *, sim_col="sim_yield_kg_ha") → DataFrame | Copies df, then subtracts bias_kg_ha from sim_col. bias_correction.py:62 |
| save | (path: Path) → None | Pickles self; creates parent dirs. bias_correction.py:76 |
| load | classmethod (path: Path) → AbstractBiasCorrector | Fails fast with FileNotFoundError if the file is missing. bias_correction.py:83 |
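The interface in the table above can be sketched as a small Python ABC. This is a minimal illustration, not the code at bias_correction.py:35: argument types are deliberately loose, apply_frame is omitted so the sketch needs no pandas, and NoBiasCorrector is reconstructed only from its one-line description in the implementations table.

```python
import pickle
from abc import ABC, abstractmethod
from collections.abc import Iterable
from pathlib import Path


class AbstractBiasCorrector(ABC):
    """Sketch of the documented interface; not the project's actual class."""

    @abstractmethod
    def fit(self, nass_panel, included_geo_identifiers: Iterable[str], test_year: int) -> None:
        """Populate internal state from the NASS county panel."""

    @property
    @abstractmethod
    def bias_kg_ha(self) -> float:
        """Scalar national-grain bias in kg/ha."""

    def apply_national(self, sim_kg_ha: float) -> float:
        # Corrected national yield = simulated yield minus the fitted bias.
        return sim_kg_ha - self.bias_kg_ha

    def save(self, path: Path) -> None:
        # Pickle the whole fitted object, creating parent directories first.
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(self, fh)

    @classmethod
    def load(cls, path: Path) -> "AbstractBiasCorrector":
        # Fail fast with an explicit message if the pickle is missing.
        if not path.exists():
            raise FileNotFoundError(f"No bias corrector at {path}")
        with path.open("rb") as fh:
            return pickle.load(fh)


class NoBiasCorrector(AbstractBiasCorrector):
    """Identity corrector: fit is a no-op and the bias is always 0.0."""

    def fit(self, nass_panel, included_geo_identifiers, test_year) -> None:
        pass

    @property
    def bias_kg_ha(self) -> float:
        return 0.0
```

Putting apply_national on the base class keeps the subtraction in one place, so a subclass only decides how bias_kg_ha is computed.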
Concrete implementations
| Class | Config kind | File | Formula | When to use |
|---|---|---|---|---|
| NoBiasCorrector | none | bias_correction.py | bias_kg_ha = 0.0; fit is a no-op. | When bias correction is disabled, or as a baseline experiment. |
| CoverageBiasCorrector | coverage | bias_correction.py:107 | bias_y = (area_out / area_total) × (yld_in − yld_out); reduced across n_lookback_years by reduction_method. | When the model's county universe omits a material fraction of production and that gap creates a predictable national-level offset. |
CoverageBiasCorrector parameters: n_lookback_years: int, reduction_method: ReductionMethod (only "median" is supported). Unfitted access of bias_kg_ha raises RuntimeError.
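The CoverageBiasCorrector formula can be sketched in plain Python. Several details are assumptions, not confirmed by the source: the panel is a list of hypothetical CountyYear records rather than the frame returned by load_nass_county_panel_yield_area, the lookback window is the n_lookback_years strictly before test_year, in- and out-of-universe yields are area-weighted means, and a window with no usable years raises. The algebra behind the formula: if the national yield is the area-weighted mean of yld_in and yld_out with area_total = area_in + area_out, then yld_in − bias_y equals exactly that national yield, which is why callers subtract the bias.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class CountyYear:
    """Hypothetical stand-in for one row of the NASS county panel."""
    geo_id: str
    year: int
    area_ha: float
    yield_kg_ha: float


def _area_weighted(rows: list[CountyYear]) -> tuple[float, float]:
    """Return (area-weighted mean yield, total area) for a set of counties."""
    area = sum(r.area_ha for r in rows)
    return sum(r.area_ha * r.yield_kg_ha for r in rows) / area, area


def coverage_bias(panel: list[CountyYear], included: set[str],
                  test_year: int, n_lookback_years: int) -> float:
    """Median over lookback years of (area_out / area_total) * (yld_in - yld_out)."""
    yearly = []
    # Assumption: the window is the n years strictly before test_year (no leakage).
    for year in range(test_year - n_lookback_years, test_year):
        rows = [r for r in panel if r.year == year]
        rows_in = [r for r in rows if r.geo_id in included]
        rows_out = [r for r in rows if r.geo_id not in included]
        if not rows_in:
            continue  # no in-universe data: this year says nothing about the offset
        if not rows_out:
            yearly.append(0.0)  # full coverage: no offset this year
            continue
        yld_in, area_in = _area_weighted(rows_in)
        yld_out, area_out = _area_weighted(rows_out)
        yearly.append(area_out / (area_in + area_out) * (yld_in - yld_out))
    if not yearly:
        raise ValueError("no usable lookback years")
    return median(yearly)  # reduction_method == "median"
```

For example, an included county at 10 000 kg/ha and an excluded county of equal area at 8 000 kg/ha give a bias of 1 000 kg/ha, and 10 000 − 1 000 = 9 000 kg/ha is the area-weighted national yield.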
Factory: build_bias_corrector(config: BiasCorrectorConfig) at bias_correction.py:195 selects the concrete class from the config's kind field (config.postprocess.bias_corrector.kind at the experiment level).
BiasCorrector lifecycle
Instantiation: build_bias_corrector(config) during POSTPROCESS for each fold.
Fit: Called once per fold with the full NASS county panel and the fold's included_geo_identifiers.
Persist: Saved to postprocessed/{commodity}/{fold_label}/bias_corrector.pkl. HindcastSlice.has_bias_corrector guards access.
Apply: residuals.build_rows (meta_models/residuals.py:22) calls bias_corrector.apply_national(sim_raw) before constructing interval columns.
Tear-down: None. Nothing needs releasing; the fitted state lives entirely in the pkl.
Conformal calibration
Definition
The conformal layer derives honest prediction intervals from walk-forward OOS residuals collected across CV folds. The fitted artefact is CalibrationResult — a frozen dataclass. There is no ABC for this layer; apply_conformal (conformalise.py:584) is the polymorphic entry point.
A separate class MapieConformalRegressor (conformalise.py:645) provides a feature-conditioned MAPIE-backed pathway that is not used in the main pipeline.
Source of truth
market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:111 — CalibrationResult frozen dataclass.
CalibrationResult fields
| Field | Type | Purpose |
|---|---|---|
| residual_mode | ResidualMode | Which recipe produced this calibration (conformalise.py:64). |
| method | ConformalMethod | "quantile" or "split_conformal". |
| experiment_key | str | Links back to the experiment. |
| levels | tuple[float, ...] | Confidence levels fitted (e.g. (0.5, 0.8, 0.9, 0.95)). |
| n_residuals | int | Total number of residuals consumed. |
| per_init_md | dict[InitMD, dict[float, float]] \| None | Half-widths keyed by "MM-DD". Set only in hindcast_oos_per_init_date mode. |
| per_year | dict[int, dict[float, float]] \| None | Half-widths keyed by test year. Set only in hindcast_oos_per_year mode. |
| pooled | dict[float, float] \| None | A single set of half-widths. Set by the two pooled modes. |
| per_year_fallback | PerYearFallback | "mean" or "max": how to reduce across years on a miss in per_year mode. |
Exactly one of per_init_md / per_year / pooled is non-None.
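The field table and the one-populated-field invariant can be expressed as a frozen dataclass that validates itself on construction. CalibrationResultSketch is a hypothetical name; InitMD, ResidualMode, ConformalMethod and PerYearFallback are approximated with str and Literal aliases; per_year_fallback is moved ahead of the optional tables so it can remain a required field; and the mode-to-field check mirrors the residual-mode table rather than known code.

```python
from dataclasses import dataclass
from typing import Literal, Optional

ResidualMode = Literal[
    "hindcast_oos_per_init_date",
    "hindcast_oos_per_year",
    "hindcast_oos_fully_pooled",
    "in_sample_pooled",
]
HalfWidths = dict[float, float]  # confidence level -> half-width (kg/ha)

# Which half-width table each residual mode is expected to populate.
_FIELD_FOR_MODE = {
    "hindcast_oos_per_init_date": "per_init_md",
    "hindcast_oos_per_year": "per_year",
    "hindcast_oos_fully_pooled": "pooled",
    "in_sample_pooled": "pooled",
}


@dataclass(frozen=True)
class CalibrationResultSketch:
    residual_mode: ResidualMode
    method: Literal["quantile", "split_conformal"]
    experiment_key: str
    levels: tuple[float, ...]
    n_residuals: int
    per_year_fallback: Literal["mean", "max"]
    per_init_md: Optional[dict[str, HalfWidths]] = None  # keyed by "MM-DD"
    per_year: Optional[dict[int, HalfWidths]] = None  # keyed by test year
    pooled: Optional[HalfWidths] = None

    def __post_init__(self) -> None:
        populated = [name for name in ("per_init_md", "per_year", "pooled")
                     if getattr(self, name) is not None]
        if len(populated) != 1:
            raise ValueError(f"exactly one half-width table must be set, got {populated}")
        expected = _FIELD_FOR_MODE[self.residual_mode]
        if populated[0] != expected:
            raise ValueError(f"{self.residual_mode} must populate {expected}, not {populated[0]}")
```

Validating in __post_init__ means a malformed calibration can never be constructed, so downstream lookups can branch on whichever table is non-None without re-checking.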
Four residual modes
Controlled by ResidualMode (models/meta_models/types.py:16); dispatched via _MODE_DISPATCH (conformalise.py:571).
| Mode | Populated field | Description |
|---|---|---|
| hindcast_oos_per_init_date (default) | per_init_md | One calibration per "MM-DD" init-date label, pooling walk-forward residuals across CV years. Preserves the within-season uncertainty gradient. |
| hindcast_oos_per_year | per_year | Walk-forward bootstrap: the fold for test year k is calibrated from the end-of-season residuals of folds whose test year is earlier than k. Reflects accumulating evidence. |
| hindcast_oos_fully_pooled | pooled | Every (year, init_date) walk-forward residual is pooled into one calibration. A statistically loose baseline. |
| in_sample_pooled | pooled | Fallback when no CV folds exist; pools the production fold's training residuals. The logged warning (conformalise.py:546) explicitly labels the resulting intervals as biased narrow. |
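The hindcast_oos_per_year recipe can be sketched as an expanding window over end-of-season residuals combined with the textbook split-conformal quantile. Assumptions: residuals_by_year maps each test year to its end-of-season residuals in kg/ha; the half-width is the ceil((n + 1) × level)-th smallest absolute residual, which is the standard split-conformal rule but is not confirmed to be exactly what the "split_conformal" method computes; and the earliest year, which has no prior evidence, is skipped. Both function names are hypothetical.

```python
import math


def conformal_half_width(abs_residuals: list[float], level: float) -> float:
    """Split-conformal half-width: the ceil((n + 1) * level)-th smallest |residual|."""
    n = len(abs_residuals)
    k = math.ceil((n + 1) * level)
    if k > n:
        return math.inf  # too few residuals to certify this coverage level
    return sorted(abs_residuals)[k - 1]


def walk_forward_per_year(
    residuals_by_year: dict[int, list[float]], levels: tuple[float, ...]
) -> dict[int, dict[float, float]]:
    """Calibrate each test year only from end-of-season residuals of strictly earlier years."""
    table: dict[int, dict[float, float]] = {}
    for year in sorted(residuals_by_year):
        history = [abs(r) for y, rs in residuals_by_year.items() if y < year for r in rs]
        if not history:
            continue  # the earliest fold has no prior evidence and is skipped
        table[year] = {lvl: conformal_half_width(history, lvl) for lvl in levels}
    return table
```

Returning math.inf when the history is too short makes the finite-sample limit visible rather than silently under-covering; early folds in a short hindcast will hit this at high levels such as 0.95.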
predict_interval dispatch
CalibrationResult.predict_interval(sim_yield, *, fold_year, init_md) (conformalise.py:360) returns {level: (lower, upper)}. The half-width lookup branches on whichever field is populated:
- per_init_md: circular calendar interpolation by day-of-year; exact hits short-circuit.
- per_year: exact hit or reduce; on a miss, collapses across years via per_year_fallback (nanmean or nanmax). Linear interpolation is intentionally absent.
- pooled: broadcasts the single half-width set and ignores both keyword arguments.
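The three lookup branches can be sketched as follows. This assumes symmetric intervals (sim ± half-width), linear interpolation between the two calibrated init-dates that bracket the query on a 365-day circle (a fixed non-leap reference year), every init-date or year carrying the same set of levels, and stdlib NaN filtering in place of numpy's nanmean/nanmax. The free function and its keyword layout are hypothetical; the real method lives on CalibrationResult.

```python
import math
import statistics
from datetime import date

HalfWidths = dict[float, float]  # confidence level -> half-width (kg/ha)


def _doy(md: str) -> int:
    """Day-of-year of an "MM-DD" label in a fixed non-leap reference year."""
    month, day = map(int, md.split("-"))
    return date(2001, month, day).timetuple().tm_yday


def _circular_lookup(per_init_md: dict[str, HalfWidths], init_md: str, level: float) -> float:
    if init_md in per_init_md:  # exact hit short-circuits
        return per_init_md[init_md][level]
    t = _doy(init_md)
    anchors = sorted((_doy(md), widths[level]) for md, widths in per_init_md.items())
    # Nearest calibrated dates on either side, wrapping across the year boundary.
    before = next((a for a in reversed(anchors) if a[0] < t), (anchors[-1][0] - 365, anchors[-1][1]))
    after = next((a for a in anchors if a[0] > t), (anchors[0][0] + 365, anchors[0][1]))
    (d0, w0), (d1, w1) = before, after
    return w0 + (t - d0) / (d1 - d0) * (w1 - w0)


def _per_year_lookup(per_year: dict[int, HalfWidths], fold_year, level: float, fallback: str) -> float:
    if fold_year in per_year:  # exact hit
        return per_year[fold_year][level]
    # Miss: reduce across years, ignoring NaNs; no interpolation between years.
    values = [w[level] for w in per_year.values() if not math.isnan(w[level])]
    return statistics.fmean(values) if fallback == "mean" else max(values)


def predict_interval(sim_yield: float, *, per_init_md=None, per_year=None, pooled=None,
                     per_year_fallback: str = "mean",  # assumed default for the sketch
                     fold_year=None, init_md=None) -> dict[float, tuple[float, float]]:
    if per_init_md is not None:
        levels = next(iter(per_init_md.values())).keys()  # levels shared by all dates
        widths = {lvl: _circular_lookup(per_init_md, init_md, lvl) for lvl in levels}
    elif per_year is not None:
        levels = next(iter(per_year.values())).keys()  # levels shared by all years
        widths = {lvl: _per_year_lookup(per_year, fold_year, lvl, per_year_fallback) for lvl in levels}
    else:  # pooled: broadcast, ignoring fold_year and init_md
        widths = pooled
    return {lvl: (sim_yield - hw, sim_yield + hw) for lvl, hw in widths.items()}
```

Wrapping the calendar means an init-date earlier than every calibrated label interpolates from the last label of the previous season instead of extrapolating off the end of the table.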
Persistence
CalibrationResult.save / load (conformalise.py:215, conformalise.py:224) use a long-format parquet representation. Disk path: {run_dir}/conformal/{residual_mode}.parquet. Both methods accept local paths and S3 URIs via cloudpathlib.AnyPath. to_frame() raises ValueError if serialisation would produce zero rows.
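A long-format layout gives one fixed tabular schema regardless of which half-width table is populated, which suits a parquet file. The sketch below uses plain row dictionaries rather than a DataFrame, and its column names (key_kind, key, level, half_width) are hypothetical; only the zero-row ValueError is taken from the description of to_frame().

```python
from typing import Optional


def to_long_rows(per_init_md=None, per_year=None, pooled=None) -> list[dict]:
    """Flatten whichever half-width table is populated into (key_kind, key, level, half_width) rows."""
    rows = []
    for kind, table in (("init_md", per_init_md), ("year", per_year)):
        for key, widths in (table or {}).items():
            for level, hw in widths.items():
                rows.append({"key_kind": kind, "key": str(key), "level": level, "half_width": hw})
    for level, hw in (pooled or {}).items():
        rows.append({"key_kind": "pooled", "key": "", "level": level, "half_width": hw})
    if not rows:
        # Mirrors the documented to_frame() behaviour: never write an empty calibration.
        raise ValueError("refusing to serialise a calibration with zero rows")
    return rows


def from_long_rows(rows: list[dict]) -> dict[str, Optional[dict]]:
    """Inverse of to_long_rows: rebuild the nested tables, leaving unused ones as None."""
    tables: dict[str, dict] = {"init_md": {}, "year": {}, "pooled": {}}
    for row in rows:
        kind, level, hw = row["key_kind"], row["level"], row["half_width"]
        if kind == "pooled":
            tables["pooled"][level] = hw
        else:
            key = int(row["key"]) if kind == "year" else row["key"]
            tables[kind].setdefault(key, {})[level] = hw
    return {kind: (table or None) for kind, table in tables.items()}
```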
MapieConformalRegressor
Sklearn-style wrapper around mapie.regression.SplitConformalRegressor (conformalise.py:645) for per-county conformal modelling where feature matrices (X, y) are in scope. Exposes fit, conformalize, predict, and predict_interval. Not used in the main hindcast/forecast pipeline — the residuals-only apply_conformal + CalibrationResult path is the active one.
Conformal lifecycle
Instantiation: apply_conformal(experiment, levels, *, residual_mode=..., ...) at conformalise.py:584 (experiment-level overload).
Fit: fit_and_save_all_configured(result, ci_levels) at run_meta_models.py:85 iterates over every mode in config.postprocess.conformalise, fitting and saving each. It runs before the fold loop so that every calibration exists before build_rows is called.
Load-or-fit: get_or_fit_calibration(result, ci_levels, residual_mode) at run_meta_models.py:100 is used by forecast and delivery consumers. A forecast can request a mode that is not listed in postprocess.conformalise; that mode is then fitted on demand.
Apply: residuals.build_rows calls calibration.predict_interval(sim_post_bias, fold_year=..., init_md=...) to emit lower_{pct} / upper_{pct} columns.
Tear-down: None.
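The Apply steps of the two lifecycles compose in a fixed order: the bias correction shifts the point estimate, and the intervals are then centred on the corrected value. A minimal sketch of one output row, assuming symmetric half-widths and a lower_{pct}/upper_{pct} naming in which pct is the level times 100; interval_columns and the sim_yield_kg_ha key are illustrative, not the signature of residuals.build_rows.

```python
def interval_columns(sim_raw: float, bias_kg_ha: float,
                     half_widths: dict[float, float]) -> dict[str, float]:
    """Build one output row: bias-correct the point estimate, then centre intervals on it."""
    sim_post_bias = sim_raw - bias_kg_ha  # the apply_national step
    row = {"sim_yield_kg_ha": sim_post_bias}
    for level, hw in sorted(half_widths.items()):
        pct = round(level * 100)  # 0.8 -> 80, so columns read lower_80 / upper_80
        row[f"lower_{pct}"] = sim_post_bias - hw
        row[f"upper_{pct}"] = sim_post_bias + hw
    return row
```

Centring on the bias-corrected value keeps each interval around the same number that reaches the delivery CSV; centring on sim_raw would shift every interval by bias_kg_ha relative to the reported estimate.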
Relationships
- Regressor: raw fold predictions flow from runtime.predict into postprocess_experiment; the BiasCorrector and CalibrationResult both consume those predictions.
- HindcastSlice: has_bias_corrector guards bias_corrector_path; calibration_path is separate.
- ExperimentConfig: carries postprocess.bias_corrector (BiasCorrectorConfig) and postprocess.conformalise (tuple of modes).
- DeliveryRow: the lower_* / upper_* interval columns are populated by build_rows and flow directly into the delivery CSV.
- lib/reference_data/nass.py: load_nass_county_panel_yield_area is called by CoverageBiasCorrector.fit.
Concepts and pipelines
- source_meta_models: detailed module-level breakdown.
- Conformal prediction: split-conformal and quantile regression methods for honest uncertainty quantification.
- Walk-forward cross-validation: CalibrationResult pools residuals across CV folds; it is experiment-scoped, not fold-scoped.
PRs and commits
- PR #361 introduced the four residual_mode variants (hindcast_oos_per_init_date, hindcast_oos_per_year, hindcast_oos_fully_pooled, in_sample_pooled).
Open questions
- postprocess.conformalise and forecast.residual_mode are decoupled: a forecast can request any mode, triggering on-demand fitting. This is convenient, but it means a forecast run can silently compute a calibration that was never validated at hindcast time.
- in_sample_pooled is explicitly labelled biased narrow, yet no run-time guard prevents its use in production delivery.
- MapieConformalRegressor is present in the codebase but unused in the main pipeline; it is unclear whether it will be integrated or removed.
- CalibrationResult is experiment-scoped (pooled across all folds), while BiasCorrector is fold-scoped. Because of this asymmetry, the two corrections capture different sources of variance and are not directly comparable.