ForecastSlice

Definition

ForecastSlice is the per-(season_year, init_date) artefact handle for the commodity_hindcast forecast pipeline. It is a frozen dataclass that exposes all paths and loaders for one in-season forecast. It satisfies the AbstractSlice protocol alongside HindcastSlice. Its distinguishing design principle is artefact isolation: all forecast-specific files land under run_dir/forecast/{season_year}/{init_date}/ and never touch canonical hindcast paths. Trained artefacts (model, detrender, fill values) are delegated to the production HindcastSlice via the training property.

Kind: Frozen dataclass (@dataclass(frozen=True, kw_only=True)). Aggregate root within ExperimentResult; not independently addressable without its run_dir.

Source of truth: market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:301.

Path layout (introduced in PR #369)

Before PR #369, forecast artefacts lived at run_dir/forecast/{init_date}/, so forecasts for different season_year values issued at the same init_date overwrote each other. PR #369 restructured the layout to:

run_dir/forecast/{season_year}/{init_date}/
├── indices.zarr                          ← spliced obs+climo daily indices
├── features/
│   └── pred.parquet                      ← per-init feature matrix (never touches canonical hindcast copy)
├── preds/
│   ├── walk_forward_preds.parquet        ← county-level rolling predictions
│   └── year_data.parquet                 ← raw prediction slice for diagnostics
├── postprocessed/
│   └── national.parquet                  ← ADM0-aggregated predictions with CI
└── delivery/
    ├── Treefera_{key}_ADM0_Forecast_{init_date}.csv
    ├── Treefera_{key}_ADM1_Forecast_{init_date}.csv
    └── Treefera_{key}_ADM2_Forecast_{init_date}.csv

ForecastSlice.root is the canonical source of truth for this layout (results_slice.py:326):

@property
def root(self) -> Path | CloudPath:
    return self.run_dir / "forecast" / str(self.season_year) / f"{self.init_date:%Y-%m-%d}"

Key attributes

Identity fields

| Field | Type | Notes |
| --- | --- | --- |
| run_dir | Path \| CloudPath | Experiment root; must be an existing directory (validated in __post_init__) |
| experiment_key | str | Commodity key (e.g. "corn_usa") |
| season_year | int | The harvest year being forecasted (e.g. 2027) |
| init_date | date | Calendar date of forecast issuance |

__post_init__ (results_slice.py:319) raises FileNotFoundError if run_dir is not an existing directory — the only upfront validation at construction.

Path properties (all Path | CloudPath)

| Property | Path | Notes |
| --- | --- | --- |
| root | run_dir/forecast/{season_year}/{init_date}/ | Per-(season_year, init_date) root (results_slice.py:326) |
| indices_zarr | root/indices.zarr | Spliced obs+climo daily indices zarr |
| features_dir | root/features/ | Directory for this init's features |
| features_parquet | root/features/pred.parquet | Per-init prediction feature matrix |
| preds_dir | root/preds/ | Directory for prediction outputs |
| walk_forward_preds_path | root/preds/walk_forward_preds.parquet | County-level rolling predictions |
| year_data_path | root/preds/year_data.parquet | Raw prediction slice for diagnostics |
| postprocessed_dir | root/postprocessed/ | Directory for postprocessed national output |
| postprocessed_national_path | root/postprocessed/national.parquet | ADM0 postprocessed parquet |
| delivery_dir | root/delivery/ | Directory for client-facing CSVs |
| bias_corrector_path | root/postprocessed/{key}/production/bias_corrector.pkl | Production bias corrector |

delivery_csv(level) (results_slice.py:375) constructs Treefera_{key}_{level}_Forecast_{init_date:%Y-%m-%d}.csv.
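
The filename pattern can be sketched as a standalone function. This is an illustration only: delivery_csv_name is a hypothetical helper, it assumes {key} is the slice's experiment_key, and it returns just the filename, whereas the real method may return a full path.

```python
from datetime import date


def delivery_csv_name(experiment_key: str, level: str, init_date: date) -> str:
    """Build the client-facing CSV filename for one administrative level."""
    # Assumption: {key} in the documented pattern is the experiment_key.
    return f"Treefera_{experiment_key}_{level}_Forecast_{init_date:%Y-%m-%d}.csv"


# One filename per administrative level listed in the path layout.
names = [
    delivery_csv_name("corn_usa", level, date(2026, 5, 5))
    for level in ("ADM0", "ADM1", "ADM2")
]
```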

Feature paths (shared, via lazy config load)

| Property | Returns | Notes |
| --- | --- | --- |
| features_fit_path | Path \| CloudPath | features_dir/{key}/fit.parquet (canonical hindcast fit matrix) |
| features_pred_path | Path \| CloudPath | features_dir/{key}/pred.parquet (canonical hindcast pred matrix) |

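These are read-only references to the canonical hindcast feature matrices (DESIGN.md: forecast SHALL NOT write to these). Given the lazy config load, features_dir in these two paths is presumably the shared features directory from the experiment config, not the slice's own features_dir property (root/features/).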

Cutoff

@property
def cutoff(self) -> date:
    return self.init_date

Defined at results_slice.py:383. The property is symmetric with HindcastSlice.cutoff: the AbstractSlice protocol requires a uniform cutoff surface regardless of the slice type.

Trained artefact delegation

ForecastSlice does not own trained artefacts. It reaches them via the production HindcastSlice:

@property
def training(self) -> HindcastSlice:
    production = production_hindcast_slice(self.run_dir, self.experiment_key)
    if production is None:
        raise FileNotFoundError(
            f"No production hindcast slice found under {self.run_dir}; "
            "cannot access trained artefacts for this forecast."
        )
    return production

Defined at results_slice.py:411. The three delegation methods, load_model(), load_detrender(config), and load_feature_fill_values(), each call the same-named method on self.training.

The training property uses production_hindcast_slice() directly (the shared free function at results_slice.py:271) rather than routing through ExperimentResult.from_run_dir, avoiding the historical import cycle (PR #339, Phase 5).
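
The delegation pattern can be sketched with stand-in classes. ProductionStub and ForecastStub are hypothetical, and the production field replaces the real production_hindcast_slice() lookup so that the example runs without a run_dir on disk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProductionStub:
    """Hypothetical stand-in for the production HindcastSlice."""

    model_name: str

    def load_model(self) -> str:
        return self.model_name


@dataclass(frozen=True)
class ForecastStub:
    """Hypothetical stand-in for ForecastSlice."""

    # Stands in for the result of the production_hindcast_slice() lookup.
    production: Optional[ProductionStub]

    @property
    def training(self) -> ProductionStub:
        if self.production is None:
            raise FileNotFoundError("No production hindcast slice found")
        return self.production

    def load_model(self) -> str:
        # Pure delegation: the forecast slice owns no trained artefacts.
        return self.training.load_model()


loaded = ForecastStub(production=ProductionStub("prod-model")).load_model()

try:
    ForecastStub(production=None).load_model()
    missing_error = None
except FileNotFoundError as exc:
    missing_error = str(exc)
```

Note that the failure surfaces on first access to training, not when the forecast stub is constructed.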

Lifecycle

Created: By ForecastSlice(run_dir=..., experiment_key=..., season_year=..., init_date=...) inside run_forecast.run_features() and run_forecast.run_predict(). Also discovered lazily by ExperimentResult.from_run_dir at run_result.py:96–122.
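
Discovery over the two-level layout might look like the sketch below. It is not the code in run_result.py: discover_forecast_keys is a hypothetical helper, it lists keys eagerly rather than lazily, and the rule of skipping directories that do not parse as a year or a date is an assumption.

```python
from datetime import date, datetime
from pathlib import Path
import tempfile


def discover_forecast_keys(run_dir: Path) -> list[tuple[int, date]]:
    """List (season_year, init_date) pairs found under run_dir/forecast/."""
    forecast_dir = run_dir / "forecast"
    if not forecast_dir.is_dir():
        return []
    keys: list[tuple[int, date]] = []
    for season_dir in forecast_dir.iterdir():  # level 1: season_year
        if not (season_dir.is_dir() and season_dir.name.isdigit()):
            continue
        for init_dir in season_dir.iterdir():  # level 2: init_date
            if not init_dir.is_dir():
                continue
            try:
                init = datetime.strptime(init_dir.name, "%Y-%m-%d").date()
            except ValueError:
                continue  # not an init_date directory
            keys.append((int(season_dir.name), init))
    return sorted(keys)


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "forecast" / "2027" / "2026-05-05").mkdir(parents=True)
    (run_dir / "forecast" / "2028" / "2026-05-05").mkdir(parents=True)
    (run_dir / "forecast" / "notes").mkdir()  # ignored: not a year
    found = discover_forecast_keys(run_dir)
```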

Populated (stage by stage within run_forecast.run()):

  1. indices_zarr: written by materialise_forecast_indices (obs+climo splice).
  2. features_parquet: written by _build_forecast_features (feature assembly from zarr).
  3. walk_forward_preds_path, year_data_path: written by run_predict_stage.
  4. postprocessed_national_path: written by _postprocess_forecast.
  5. delivery/ CSVs: written by _deliver_forecast.
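
Since each stage leaves a known file behind, how far a forecast has progressed can be read off the subtree. The sketch below assumes the relative paths from the layout above; the stage labels and the stage_status helper are hypothetical and not part of the pipeline.

```python
from pathlib import Path
import tempfile

# Stage label (hypothetical) -> artefacts it writes, relative to the slice root.
STAGE_ARTEFACTS = {
    "indices": ["indices.zarr"],
    "features": ["features/pred.parquet"],
    "predict": ["preds/walk_forward_preds.parquet", "preds/year_data.parquet"],
    "postprocess": ["postprocessed/national.parquet"],
}


def stage_status(root: Path) -> dict[str, bool]:
    """Report, per stage, whether all of its artefacts exist under root."""
    status = {
        stage: all((root / rel).exists() for rel in rels)
        for stage, rels in STAGE_ARTEFACTS.items()
    }
    # Delivery filenames depend on key and date, so just look for any CSV.
    delivery = root / "delivery"
    status["delivery"] = delivery.is_dir() and any(delivery.glob("*.csv"))
    return status


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "indices.zarr").mkdir()  # a zarr store is a directory
    (root / "features").mkdir()
    (root / "features" / "pred.parquet").touch()
    status = stage_status(root)  # only the first two stages are populated
```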

Consumed: The delivery layer reads walk_forward_preds_path and postprocessed_national_path. The dashboard queries delivery_csv(level). The export stage globs forecast/*/*/delivery/... (delivery/export.py).

Destroyed: Never explicitly; the run_dir/forecast/{season_year}/{init_date}/ subtree persists until the operator prunes old run_dirs.

Multi-season_year support (PR #369)

The restructured root path means forecasts for 2027 and 2028 at the same init_date write to disjoint subtrees:

run_dir/forecast/2027/2026-05-05/  ← season 2027 forecast
run_dir/forecast/2028/2026-05-05/  ← season 2028 forecast

Before PR #369 both would have written to run_dir/forecast/2026-05-05/, with the second silently overwriting the first.
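
The collision can be checked with a few lines of path arithmetic. This sketch assumes nothing beyond the two layouts described above; legacy_root and current_root are illustrative helper names.

```python
from datetime import date
from pathlib import PurePosixPath


def legacy_root(run_dir: PurePosixPath, init_date: date) -> PurePosixPath:
    """Layout before PR #369: keyed by init_date only."""
    return run_dir / "forecast" / f"{init_date:%Y-%m-%d}"


def current_root(
    run_dir: PurePosixPath, season_year: int, init_date: date
) -> PurePosixPath:
    """Layout after PR #369: keyed by (season_year, init_date)."""
    return run_dir / "forecast" / str(season_year) / f"{init_date:%Y-%m-%d}"


run_dir = PurePosixPath("run_dir")
init = date(2026, 5, 5)
seasons = (2027, 2028)

# Two seasons collapse onto one directory in the old layout ...
legacy_roots = {legacy_root(run_dir, init) for _ in seasons}
# ... and onto two disjoint directories in the new one.
current_roots = {current_root(run_dir, year, init) for year in seasons}

print(len(legacy_roots), len(current_roots))  # prints "1 2"
```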

Relationships

| Relationship | Entity | Notes |
| --- | --- | --- |
| Child of | ExperimentResult | run_dir IS the identity anchor; not independently addressable |
| Implements | AbstractSlice protocol | Symmetric surface shared with HindcastSlice |
| Delegates trained artefacts to | HindcastSlice (production) | self.training property; raises if production model absent |
| Consumes | CalibrationResult | Loaded from conformal/{mode}.parquet by _postprocess_forecast |
| Generates | HindcastDelivery rows | _deliver_forecast builds delivery rows from walk_forward_preds |

Concepts and pipelines (forward refs to P5)

  • Concept: Forecast path layout — (season_year, init_date) keying
  • Concept: Long-range climo stub — fills missing z-score features for future season_year values
  • Concept: AbstractSlice protocol — symmetric surface with HindcastSlice
  • Pipeline: Forecast pipeline — full run_forecast.run() walkthrough

PRs and commits

| PR / commit | Relevance |
| --- | --- |
| PR-339 | Created lib/results/results_slice.py; co-located ForecastSlice with HindcastSlice to break import cycle; introduced production_hindcast_slice helper |
| PR-369 | Restructured root from forecast/{init_date}/ to forecast/{season_year}/{init_date}/; enabled multi-season_year forecasting from a single init_date; updated run_result.py discovery to iterate two levels |
| PR-372 | Made forecast.residual_mode mandatory on ForecastConfig; added validate_residual_mode gate at the start of run_forecast.run() to fail fast before any feature I/O |

Open questions

  • The bias corrector path for ForecastSlice is root/postprocessed/{key}/production/bias_corrector.pkl, which differs in structure from HindcastSlice's postprocessed/{key}/{fold}/bias_corrector.pkl. This asymmetry is undocumented in DESIGN.md and may cause confusion when writing consumers that handle both slice types.
  • ForecastSlice.__post_init__ validates that run_dir is an existing directory but does not check that the production model exists. A consumer calling self.training before production is trained will get a FileNotFoundError at delegation time, not at construction time. validate_residual_mode in run_forecast.run() catches this for the CLI path, but library consumers bypass it.
  • Long-range forecasting (season_year beyond climo coverage) relies on a stub (forecast_long_range_stub.py) that is explicitly marked temporary; the slice itself has no field recording whether climo fill was synthetic.