
Entity: PostprocessConfig

Definition

PostprocessConfig is the frozen Pydantic model that configures the POSTPROCESS pipeline stage. It holds two concerns: (1) the bias corrector configuration (BiasCorrectorConfig), and (2) an ordered tuple of conformal calibration mode strings (conformalise). The first element of conformalise is the primary mode whose half-widths populate delivery CSVs; subsequent elements are diagnostic sidecars persisted as separate parquets under run_dir/conformal/{mode}.parquet.

Kind

Pydantic BaseModel, frozen=True. Nested inside ExperimentConfig.

Source of truth

market_insights_models/src/commodity_hindcast/config.py:545

Key attributes

Field | Type | Default | Meaning | YAML example
bias_corrector | BiasCorrectorConfig | required | Selects and parameterises the national-scale residual corrector. See BiasCorrectorConfig. | {kind: none}
conformalise | tuple[Literal["hindcast_oos_per_init_date", "hindcast_oos_per_year", "hindcast_oos_fully_pooled", "in_sample_pooled"], ...] | ("hindcast_oos_per_init_date",) | Ordered tuple of conformal calibration modes to fit and persist. Must be non-empty and contain no duplicates. | see below

conformalise field detail

The four valid mode strings, their data inputs, and output shapes:

Mode string | Residuals used | Output shape
"hindcast_oos_per_init_date" | Walk-forward OOS residuals, pooled by init_md (MM-DD calendar day) | One CI half-width per init_md (MM-DD) value
"hindcast_oos_per_year" | Walk-forward OOS residuals, pooled by fold year | One CI half-width per fold year (walk-forward bootstrap; the first fold has NaN)
"hindcast_oos_fully_pooled" | All walk-forward OOS (year, init_date) residuals in one pool | Single broadcast half-width
"in_sample_pooled" | Production-fold train_preds.parquet residuals | Single broadcast half-width. Requires only make fit-production, but the intervals are optimistically narrow because the model was trained on these rows.
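A minimal sketch of how the three walk-forward pooling strategies differ. It assumes a split-conformal half-width (the ceil((n + 1) * level)-th smallest absolute residual) and assumes that the per-year mode calibrates each fold only on strictly earlier fold years, which would account for the NaN on the first fold. Neither assumption is taken from the codebase, and the function names and toy residuals are hypothetical. in_sample_pooled would reuse the single-pool computation of fully_pooled, fed with training residuals instead of OOS ones.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (fold_year, init_md, residual); hypothetical stand-in for walk-forward OOS rows.
Residual = Tuple[int, str, float]


def conformal_half_width(abs_residuals: List[float], level: float) -> float:
    """ceil((n + 1) * level)-th smallest |residual| (split-conformal quantile).

    NaN for an empty pool; inf when the pool is too small for `level`.
    """
    n = len(abs_residuals)
    if n == 0:
        return math.nan
    k = math.ceil((n + 1) * level)
    if k > n:
        return math.inf
    return sorted(abs_residuals)[k - 1]


def per_init_date(rows: List[Residual], level: float) -> Dict[str, float]:
    # One pool per MM-DD init day, across all fold years.
    pools: Dict[str, List[float]] = defaultdict(list)
    for _, init_md, r in rows:
        pools[init_md].append(abs(r))
    return {md: conformal_half_width(v, level) for md, v in sorted(pools.items())}


def per_year(rows: List[Residual], level: float) -> Dict[int, float]:
    # Assumed walk-forward rule: fold year y uses only residuals from years < y,
    # so the earliest fold has an empty pool and gets NaN.
    years = sorted({y for y, _, _ in rows})
    return {
        y: conformal_half_width([abs(r) for yr, _, r in rows if yr < y], level)
        for y in years
    }


def fully_pooled(rows: List[Residual], level: float) -> float:
    # Every (year, init_date) residual in one pool -> one broadcast half-width.
    return conformal_half_width([abs(r) for _, _, r in rows], level)


if __name__ == "__main__":
    toy = [
        (2019, "03-01", 1.0), (2019, "06-01", -2.0),
        (2020, "03-01", 0.5), (2020, "06-01", 3.0),
        (2021, "03-01", -1.5), (2021, "06-01", 0.25),
    ]
    print(fully_pooled(toy, 0.5))   # 1.5
    print(per_year(toy, 0.5))       # {2019: nan, 2020: 2.0, 2021: 2.0}
    print(per_init_date(toy, 0.5))  # {'03-01': 1.0, '06-01': 2.0}
```

Returning inf when a pool is too small for the requested level is one common convention; the real implementation may handle small pools differently.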

The conformalise literal values match ResidualMode defined at models/meta_models/types.py:16.

YAML example:

postprocess:
  bias_corrector:
    kind: none
  conformalise:
    - hindcast_oos_per_init_date      # primary — populates delivery CSVs
    - hindcast_oos_fully_pooled       # sidecar
    - hindcast_oos_per_year           # sidecar

Validator

_validate_conformalise_non_empty (config.py:569, mode=after):
  • Raises ValueError when conformalise is empty ("postprocess.conformalise must contain at least one mode").
  • Raises ValueError when conformalise contains duplicates.
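A minimal sketch of the validator's documented rules, written with a stdlib frozen dataclass so it runs without Pydantic. PostprocessConfigSketch is a hypothetical stand-in, not the real model: bias_corrector is omitted, __post_init__ plays the role of the Pydantic after-validator, and the duplicate-mode error message is invented (the source gives only the empty-tuple message).

```python
from dataclasses import dataclass
from typing import Literal, Tuple, get_args

# Mirrors the four documented mode strings (ResidualMode in models/meta_models/types.py).
ResidualMode = Literal[
    "hindcast_oos_per_init_date",
    "hindcast_oos_per_year",
    "hindcast_oos_fully_pooled",
    "in_sample_pooled",
]
VALID_MODES = frozenset(get_args(ResidualMode))


@dataclass(frozen=True)
class PostprocessConfigSketch:
    """Hypothetical stdlib stand-in for PostprocessConfig (bias_corrector omitted)."""

    conformalise: Tuple[ResidualMode, ...] = ("hindcast_oos_per_init_date",)

    def __post_init__(self) -> None:
        # Runs after field assignment, like a Pydantic after-validator.
        if not self.conformalise:
            raise ValueError("postprocess.conformalise must contain at least one mode")
        if len(set(self.conformalise)) != len(self.conformalise):
            raise ValueError("postprocess.conformalise must not contain duplicates")
        # Dataclasses ignore Literal annotations, so membership is checked by hand;
        # Pydantic would reject unknown strings during field validation instead.
        unknown = sorted(set(self.conformalise) - VALID_MODES)
        if unknown:
            raise ValueError(f"unknown conformalise mode(s): {unknown}")

    @property
    def primary_mode(self) -> str:
        # By convention the first element is the primary mode.
        return self.conformalise[0]
```

With this sketch, conformalise=() or a tuple with a repeated mode raises ValueError at construction, and assigning to cfg.conformalise afterwards raises dataclasses.FrozenInstanceError, much as frozen=True rejects mutation on the real model.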

Why ConformalConfig does not exist

There is no separate ConformalConfig class anywhere in the codebase. The orchestrator's seed vocabulary mentions "ConformalConfig", but nothing in the code backs it. The conformalise field is a plain tuple[Literal[...], ...] declared directly on PostprocessConfig, with no wrapper class. The primary mode is the first element of the tuple by convention; there is no dedicated field for it.

This design was intentional (PR #361): the calibration mode is a simple tuple of strings whose semantics are fully defined by the ResidualMode literal type in models/meta_models/types.py. Adding a dedicated class would add indirection without benefit. See PR-361.md for the full calibration architecture.

Lifecycle

  1. Constructed as part of ExperimentConfig; validated atomically with the rest of the config tree.
  2. During POSTPROCESS (stages/run_meta_models.py):
     a. The bias corrector is built from postprocess.bias_corrector, fitted on per-fold residuals, and saved as bias_corrector.pkl.
     b. For each mode in postprocess.conformalise, fit_and_save_calibration(result, ci_levels, mode) writes run_dir/conformal/{mode}.parquet.
  3. primary_calibration(result, ci_levels) reads the first-listed mode's parquet to populate postprocessed/national.parquet and delivery CSVs.
  4. Forecast and plot consumers call get_or_fit_calibration(result, ci_levels, mode) to load any sidecar on demand.
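The lifecycle above can be sketched with an in-memory stand-in for the parquet sidecars. This is a hypothetical illustration, not the pipeline's code: SidecarStore, its methods, and the fit callback are invented here, and the assumption that get_or_fit_calibration fits a missing mode on demand is inferred from its name only. Nothing is written to disk; path_for only computes the documented run_dir/conformal/{mode}.parquet location.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Sequence

Table = Dict[str, float]  # hypothetical stand-in for a calibration table


class SidecarStore:
    """In-memory stand-in for run_dir/conformal/{mode}.parquet files."""

    def __init__(self, run_dir: str, fit: Callable[[str], Table]) -> None:
        self.run_dir = PurePosixPath(run_dir)
        self._fit = fit
        self._tables: Dict[str, Table] = {}

    def path_for(self, mode: str) -> PurePosixPath:
        # Documented sidecar location for one calibration mode.
        return self.run_dir / "conformal" / f"{mode}.parquet"

    def fit_and_save(self, mode: str) -> PurePosixPath:
        # Stand-in for fit_and_save_calibration: fit, then "persist" in memory.
        self._tables[mode] = self._fit(mode)
        return self.path_for(mode)

    def get_or_fit(self, mode: str) -> Table:
        # Assumed on-demand behaviour: reuse a saved sidecar, else fit it now.
        if mode not in self._tables:
            self.fit_and_save(mode)
        return self._tables[mode]


def postprocess(store: SidecarStore, conformalise: Sequence[str]) -> Table:
    # Step 2b: one sidecar per configured mode, in configured order.
    for mode in conformalise:
        store.fit_and_save(mode)
    # Step 3: the first-listed mode is primary and feeds delivery outputs.
    return store.get_or_fit(conformalise[0])
```

Because every configured mode is fitted in step 2b, the primary lookup in step 3 and later sidecar reads in step 4 reuse saved tables; only a mode absent from conformalise would trigger a fresh fit under the on-demand assumption.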

Relationships

  • Owned by ExperimentConfig.postprocess (1:1).
  • Contains BiasCorrectorConfig (1:1).
  • Drives CalibrationResult persistence: one parquet sidecar per mode under run_dir/conformal/.
  • Consumed by stages/run_meta_models.py and stages/run_hindcast.py for postprocess orchestration.
  • Constrains ForecastConfig.residual_mode, which selects the sidecar applied for interval calibration at forecast time. The two fields are declared independently: conformalise lists the sidecars to write during postprocess, and residual_mode names the one to read at forecast time (a mismatch is caught only at runtime; see Open questions).

Concepts and pipelines that touch this entity

PRs and commits

  • PR #361 (PR-361.md) — conformalise tuple added to PostprocessConfig; four mode strings introduced; CalibrationResult class and mode-keyed sidecars introduced. Before this PR the calibration was an inline step with no persistent artefact.
  • PR #372 (PR-372.md) — ForecastConfig.residual_mode made mandatory, decoupling "which sidecars to fit" (this config) from "which sidecar to apply at forecast time" (ForecastConfig).

Open questions

  • No validator checks that ForecastConfig.residual_mode (when set) is present in PostprocessConfig.conformalise. The validate_residual_mode gate in stages/run_forecast.py performs this check at runtime rather than at config-load time, so a mismatch is caught only when the forecast stage begins.
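One way to close this gap would be a cross-field check at config-load time. A minimal sketch, assuming it would be called from an after-validator on a model that can see both postprocess.conformalise and the forecast settings (ExperimentConfig is the natural candidate if it owns both); check_residual_mode is a hypothetical name, not an existing function.

```python
from typing import Sequence


def check_residual_mode(residual_mode: str, conformalise: Sequence[str]) -> None:
    """Hypothetical load-time check: forecast must read a sidecar postprocess writes."""
    if residual_mode not in conformalise:
        raise ValueError(
            f"forecast.residual_mode={residual_mode!r} is not in "
            f"postprocess.conformalise={tuple(conformalise)!r}"
        )
```

Moving the check to load time would surface the mismatch before any stage runs, at the cost of coupling the two configs that PR #372 deliberately declared separately.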