ForecastSlice¶
Definition¶
ForecastSlice is the per-(season_year, init_date) artefact handle for the commodity_hindcast forecast pipeline. It is a frozen dataclass that exposes all paths and loaders for one in-season forecast. It satisfies the AbstractSlice protocol alongside HindcastSlice. Its distinguishing design principle is artefact isolation: all forecast-specific files land under run_dir/forecast/{season_year}/{init_date}/ and never touch canonical hindcast paths. Trained artefacts (model, detrender, fill values) are delegated to the production HindcastSlice via the training property.
Kind: Frozen dataclass (@dataclass(frozen=True, kw_only=True)). Aggregate root within ExperimentResult; not independently addressable without its run_dir.
Source of truth: market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:301.
Path layout (introduced in PR #369)¶
Before PR #369, forecast artefacts lived at run_dir/forecast/{init_date}/, so forecasts for different season_year values issued on the same init_date overwrote each other. PR #369 restructured the layout to:
    run_dir/forecast/{season_year}/{init_date}/
    ├── indices.zarr                        ← spliced obs+climo daily indices
    ├── features/
    │   └── pred.parquet                    ← per-init feature matrix (never touches canonical hindcast copy)
    ├── preds/
    │   ├── walk_forward_preds.parquet      ← county-level rolling predictions
    │   └── year_data.parquet               ← raw prediction slice for diagnostics
    ├── postprocessed/
    │   └── national.parquet                ← ADM0-aggregated predictions with CI
    └── delivery/
        ├── Treefera_{key}_ADM0_Forecast_{init_date}.csv
        ├── Treefera_{key}_ADM1_Forecast_{init_date}.csv
        └── Treefera_{key}_ADM2_Forecast_{init_date}.csv
ForecastSlice.root is the canonical source of truth for this layout (results_slice.py:326):
    @property
    def root(self) -> Path | CloudPath:
        return self.run_dir / "forecast" / str(self.season_year) / f"{self.init_date:%Y-%m-%d}"
Key attributes¶
Identity fields¶
| Field | Type | Notes |
|---|---|---|
| run_dir | Path \| CloudPath | Experiment root; must be an existing directory (validated in __post_init__) |
| experiment_key | str | Commodity key (e.g. "corn_usa") |
| season_year | int | The harvest year being forecasted (e.g. 2027) |
| init_date | date | Calendar date of forecast issuance |
__post_init__ (results_slice.py:319) raises FileNotFoundError if run_dir is not an existing directory — the only upfront validation at construction.
Path properties (all Path | CloudPath)¶
| Property | Path | Notes |
|---|---|---|
| root | run_dir/forecast/{season_year}/{init_date}/ | Per-(season_year, init_date) root (results_slice.py:326) |
| indices_zarr | root/indices.zarr | Spliced obs+climo daily indices zarr |
| features_dir | root/features/ | Directory for this init's features |
| features_parquet | root/features/pred.parquet | Per-init prediction feature matrix |
| preds_dir | root/preds/ | Directory for prediction outputs |
| walk_forward_preds_path | root/preds/walk_forward_preds.parquet | County-level rolling predictions |
| year_data_path | root/preds/year_data.parquet | Raw prediction slice for diagnostics |
| postprocessed_dir | root/postprocessed/ | Directory for postprocessed national output |
| postprocessed_national_path | root/postprocessed/national.parquet | ADM0 postprocessed parquet |
| delivery_dir | root/delivery/ | Directory for client-facing CSVs |
| bias_corrector_path | root/postprocessed/{key}/production/bias_corrector.pkl | Production bias corrector |
delivery_csv(level) (results_slice.py:375) constructs Treefera_{key}_{level}_Forecast_{init_date:%Y-%m-%d}.csv.
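As a rough illustration of the filename template, the sketch below builds one delivery path. `delivery_csv_sketch` is a hypothetical free function; the real `delivery_csv(level)` is a method on the slice, and whether it returns a bare filename or a path under `delivery_dir` is an assumption here.

```python
from datetime import date
from pathlib import Path


def delivery_csv_sketch(delivery_dir: Path, key: str, level: str, init_date: date) -> Path:
    """Hypothetical helper: apply the Treefera_{key}_{level}_Forecast_{date}.csv template."""
    return delivery_dir / f"Treefera_{key}_{level}_Forecast_{init_date:%Y-%m-%d}.csv"


# "delivery" is an illustrative directory; "corn_usa" is the example key from the table above.
path = delivery_csv_sketch(Path("delivery"), "corn_usa", "ADM0", date(2026, 5, 5))
print(path.name)  # Treefera_corn_usa_ADM0_Forecast_2026-05-05.csv
```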
Feature paths (shared, via lazy config load)¶
| Property | Returns | Notes |
|---|---|---|
| features_fit_path | Path \| CloudPath | features_dir/{key}/fit.parquet (canonical hindcast fit matrix) |
| features_pred_path | Path \| CloudPath | features_dir/{key}/pred.parquet (canonical hindcast pred matrix) |
These are read-only references to the canonical hindcast feature matrices (DESIGN.md: forecast SHALL NOT write to these).
Cutoff¶
Defined at results_slice.py:383. The cutoff surface mirrors HindcastSlice.cutoff: the AbstractSlice protocol requires a uniform cutoff surface regardless of the slice type.
Trained artefact delegation¶
ForecastSlice does not own trained artefacts. It reaches them via the production HindcastSlice:
    @property
    def training(self) -> HindcastSlice:
        production = production_hindcast_slice(self.run_dir, self.experiment_key)
        if production is None:
            raise FileNotFoundError(
                f"No production hindcast slice found under {self.run_dir}; "
                "cannot access trained artefacts for this forecast."
            )
        return production
Defined at results_slice.py:411. The three delegation methods, load_model(), load_detrender(config), and load_feature_fill_values(), each call self.training.<method>().
The training property uses production_hindcast_slice() directly (the shared free function at results_slice.py:271) rather than routing through ExperimentResult.from_run_dir, avoiding the historical import cycle (PR #339, Phase 5).
Lifecycle¶
Created: By ForecastSlice(run_dir=..., experiment_key=..., season_year=..., init_date=...) inside run_forecast.run_features() and run_forecast.run_predict(). Also discovered lazily by ExperimentResult.from_run_dir at run_result.py:96–122.
Populated (stage by stage within run_forecast.run()):
1. indices_zarr — written by materialise_forecast_indices (obs+climo splice).
2. features_parquet — written by _build_forecast_features (feature assembly from zarr).
3. walk_forward_preds_path, year_data_path — written by run_predict_stage.
4. postprocessed_national_path — written by _postprocess_forecast.
5. delivery/ CSVs — written by _deliver_forecast.
Consumed: Delivery layer reads walk_forward_preds_path and postprocessed_national_path. Dashboard queries delivery_csv(level). Export stage globs forecast/*/*/delivery/... (delivery/export.py).
Destroyed: Never explicitly; the run_dir/forecast/{season_year}/{init_date}/ subtree persists until the operator prunes old run_dirs.
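Because slices persist on disk and are discovered from the directory tree, a two-level walk is enough to recover every (season_year, init_date) pair. The sketch below is an assumption about how such discovery could look; `discover_forecasts` is a hypothetical helper and is not the code in run_result.py.

```python
import tempfile
from datetime import date, datetime
from pathlib import Path


def discover_forecasts(run_dir: Path) -> list[tuple[int, date]]:
    """Hypothetical helper: walk forecast/{season_year}/{init_date}/ two levels deep."""
    found: list[tuple[int, date]] = []
    forecast_root = run_dir / "forecast"
    if not forecast_root.is_dir():
        return found
    for season_dir in forecast_root.iterdir():
        # Skip anything that is not a numeric season directory.
        if not (season_dir.is_dir() and season_dir.name.isdigit()):
            continue
        for init_dir in season_dir.iterdir():
            if not init_dir.is_dir():
                continue
            try:
                init = datetime.strptime(init_dir.name, "%Y-%m-%d").date()
            except ValueError:
                continue  # not an init-date directory
            found.append((int(season_dir.name), init))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "forecast" / "2027" / "2026-05-05").mkdir(parents=True)
    (run_dir / "forecast" / "2028" / "2026-05-05").mkdir(parents=True)
    discovered = discover_forecasts(run_dir)

print(discovered)
```

In this sketch, a directory in the older forecast/{init_date}/ shape is skipped by the numeric check on the first level; how the real discovery code treats such directories is not stated in the source.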
Multi-season_year support (PR #369)¶
The restructured root path means forecasts for 2027 and 2028 at the same init_date write to disjoint subtrees:
    run_dir/forecast/2027/2026-05-05/   ← season 2027 forecast
    run_dir/forecast/2028/2026-05-05/   ← season 2028 forecast
Before PR #369 both would have written to run_dir/forecast/2026-05-05/, with the second silently overwriting the first.
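The collision is easy to verify by computing both layouts for the same pair of requests. `legacy_root` and `current_root` are hypothetical helper names for the path shapes described before and after PR #369, and "runs/example" is an illustrative run directory.

```python
from datetime import date
from pathlib import Path


def legacy_root(run_dir: Path, init_date: date) -> Path:
    """Hypothetical helper: pre-PR #369 layout, keyed by init_date only."""
    return run_dir / "forecast" / f"{init_date:%Y-%m-%d}"


def current_root(run_dir: Path, season_year: int, init_date: date) -> Path:
    """Hypothetical helper: post-PR #369 layout, keyed by (season_year, init_date)."""
    return run_dir / "forecast" / str(season_year) / f"{init_date:%Y-%m-%d}"


run_dir = Path("runs/example")
requests = [(2027, date(2026, 5, 5)), (2028, date(2026, 5, 5))]

legacy = {legacy_root(run_dir, init) for _, init in requests}
current = {current_root(run_dir, year, init) for year, init in requests}

print(len(legacy))   # 1: both seasons map to the same directory
print(len(current))  # 2: each season gets its own subtree
```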
Relationships¶
| Relationship | Entity | Notes |
|---|---|---|
| Child of | ExperimentResult | run_dir IS the identity anchor; not independently addressable |
| Implements | AbstractSlice protocol | Symmetric surface shared with HindcastSlice |
| Delegates trained artefacts to | HindcastSlice (production) | self.training property; raises if production model absent |
| Consumes | CalibrationResult | Loaded from conformal/{mode}.parquet by _postprocess_forecast |
| Generates | HindcastDelivery rows | _deliver_forecast builds delivery rows from walk_forward_preds |
Concepts and pipelines (forward refs to P5)¶
- Concept: Forecast path layout, keyed by (season_year, init_date)
- Concept: Long-range climo stub, which fills missing z-score features for future season_year values
- Concept: AbstractSlice protocol, the symmetric surface shared with HindcastSlice
- Pipeline: Forecast pipeline, the full run_forecast.run() walkthrough
PRs and commits¶
| PR / commit | Relevance |
|---|---|
| PR-339 | Created lib/results/results_slice.py; co-located ForecastSlice with HindcastSlice to break import cycle; introduced production_hindcast_slice helper |
| PR-369 | Restructured root from forecast/{init_date}/ to forecast/{season_year}/{init_date}/; enabled multi-season_year forecasting from a single init_date; updated run_result.py discovery to iterate two levels |
| PR-372 | Made forecast.residual_mode mandatory on ForecastConfig; added validate_residual_mode gate at the start of run_forecast.run() to fail fast before any feature I/O |
Open questions¶
- The bias corrector path for ForecastSlice is root/postprocessed/{key}/production/bias_corrector.pkl, which differs in structure from HindcastSlice's postprocessed/{key}/{fold}/bias_corrector.pkl. This asymmetry is undocumented in DESIGN.md and may cause confusion when writing consumers that handle both slice types.
- ForecastSlice.__post_init__ validates that run_dir is an existing directory but does not check that the production model exists. A consumer calling self.training before production is trained will get a FileNotFoundError at delegation time, not at construction time. validate_residual_mode in run_forecast.run() catches this for the CLI path, but library consumers bypass it.
- Long-range forecasting (season_year beyond climo coverage) relies on a stub (forecast_long_range_stub.py) that is explicitly marked temporary; the slice itself has no field recording whether climo fill was synthetic.