
ADR-004: Forecast outputs nest under forecast/<season_year>/<init_date>/

Status: Accepted (retroactively documented 2026-05-08)

Date: 2026-05-05 (PR #369 merged via squash commit f5399b96, "feat(commodity_hindcast): forecast multiple season_years per init_date")

Authors: [PLACEHOLDER: PR-369 author per git log f5399b96]

Context

The forecast pipeline writes per-init artefacts (spliced indices zarr, features parquet, walk-forward predictions, postprocessed nationals, delivery CSVs) under <run_dir>/forecast/.... The pre-PR-369 layout keyed those artefacts solely on init_date (forecast/<init_date>/), which sufficed when each run_dir carried exactly one season_year.

Multi-year forecasting, driven by client demand for outlooks beyond the current season (e.g. forecasting 2026 corn and 2027 corn from the same April-2026 init), broke that assumption. Two season_years sharing an init_date resolved to the same subtree, so the second write silently overwrote the first. Every artefact was affected: indices.zarr, pred.parquet, walk_forward_preds.parquet, national.parquet, and the delivery CSV (market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:333,343,353,358,368,378).
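The collision can be reproduced in a few lines. The sketch below is illustrative only: flat_root is a hypothetical helper standing in for the pre-PR-369 path logic, and a dict stands in for the filesystem.

```python
from pathlib import Path


def flat_root(run_dir: Path, init_date: str) -> Path:
    """Old layout: the subtree is keyed on init_date alone (hypothetical helper)."""
    return run_dir / "forecast" / init_date


run_dir = Path("/tmp/run_dir")  # hypothetical run directory
init_date = "2026-04-01"

# Stand-in for the filesystem: maps an output root to the season_year last written there.
written = {}
for season_year in (2026, 2027):
    written[flat_root(run_dir, init_date)] = season_year

# Both season_years resolved to one root, so only the later write survives.
print(len(written), written[flat_root(run_dir, init_date)])
```

Because season_year never enters the path, nothing in the old layout could distinguish the two writes.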

A second pressure: forecasting season_years beyond the observed climo zarr's coverage requires synthesising climo (and, for some commodities, stress) for unseen years, which only makes sense once season_year is a first-class addressable dimension on disk.

Decision

Forecast outputs nest under <run_dir>/forecast/<season_year>/<init_date>/{indices.zarr, features/, preds/, postprocessed/, delivery/} (lib/results/results_slice.py:328). ForecastSlice parameterises on (season_year, init_date) (lib/results/results_slice.py:316-317) and exposes one path property per artefact derived from self.root (lib/results/results_slice.py:333,343,353,358,368,378). Trained artefacts (model, detrender, fill values) remain on the production hindcast slice and are reached via self.training (lib/results/results_slice.py:412-427); only the per-forecast artefacts move into the new subtree.
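A minimal sketch of the rooting rule follows. It is not the code in results_slice.py: the class name ForecastSliceSketch, the property names other than root, postprocessed_dir and bias_corrector_path, and the pickle filename are assumptions made for illustration, and experiment_key and the training delegation are omitted.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ForecastSliceSketch:
    """Illustrative stand-in for ForecastSlice; not the real class."""

    run_dir: Path
    season_year: int
    init_date: str  # ISO date string, e.g. "2026-04-01"

    @property
    def root(self) -> Path:
        # season_year is the outer key, init_date the inner key.
        return self.run_dir / "forecast" / str(self.season_year) / self.init_date

    @property
    def indices_path(self) -> Path:
        return self.root / "indices.zarr"

    @property
    def features_dir(self) -> Path:
        return self.root / "features"

    @property
    def preds_dir(self) -> Path:
        return self.root / "preds"

    @property
    def postprocessed_dir(self) -> Path:
        return self.root / "postprocessed"

    @property
    def delivery_dir(self) -> Path:
        return self.root / "delivery"

    @property
    def bias_corrector_path(self) -> Path:
        # Composed off postprocessed_dir; the filename here is hypothetical.
        return self.postprocessed_dir / "bias_corrector.pkl"


# Two season_years from the same init now resolve to disjoint subtrees.
sy2026 = ForecastSliceSketch(Path("/tmp/run_dir"), 2026, "2026-04-01")
sy2027 = ForecastSliceSketch(Path("/tmp/run_dir"), 2027, "2026-04-01")
print(sy2026.root)
print(sy2027.root)
```

Deriving every artefact path from root means a change to the layout touches one property rather than each artefact.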

Long-range climo and stress synthesis stubs run inside build_forecast_features to populate inputs for season_years beyond observed coverage; they no-op when the source already covers the season (stages/run_forecast.py:301, stages/run_forecast.py:310; implementations in features/forecast_long_range_stub.py:67,217).
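The no-op contract can be sketched as below. This is an assumption-laden illustration, not the stub in forecast_long_range_stub.py: the function name, the dict-of-lists data shape, and the fill rule (per-step mean across observed years) are all hypothetical.

```python
from statistics import fmean


def synthesise_climo_sketch(
    climo_by_year: dict[int, list[float]], season_year: int
) -> dict[int, list[float]]:
    """Return climo that covers season_year, leaving covered inputs untouched."""
    if season_year in climo_by_year:
        # Source already covers the season: hand back the same object unchanged.
        return climo_by_year
    if not climo_by_year:
        raise ValueError("cannot synthesise climo without any observed years")

    n_steps = len(next(iter(climo_by_year.values())))
    # Hypothetical fill rule: per-step mean across all observed years.
    mean_profile = [
        fmean(series[i] for series in climo_by_year.values())
        for i in range(n_steps)
    ]
    extended = dict(climo_by_year)  # copy so the caller's mapping is not mutated
    extended[season_year] = mean_profile
    return extended


observed = {2024: [1.0, 3.0], 2025: [3.0, 5.0]}
covered = synthesise_climo_sketch(observed, 2025)     # no-op path
extended = synthesise_climo_sketch(observed, 2027)    # synthesis path
print(covered is observed, sorted(extended))
```

Returning the input object itself on the covered path makes the no-op easy to assert in a test with an identity check.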

Consequences

Positive

  • Multi-year forecast becomes embarrassingly parallel: distinct (season_year, init_date) pairs write to disjoint subtrees so concurrent runs never collide (lib/results/results_slice.py:303-308).
  • ForecastSlice cleanly addresses any historical forecast slice via (run_dir, experiment_key, season_year, init_date) (lib/results/results_slice.py:314-317).
  • Forecast season_years beyond the observed climo extent are handled by an explicit synthesis stub rather than an implicit failure (stages/run_forecast.py:297-310).

Negative

  • Hand-built paths in notebooks or one-off scripts keyed on forecast/<init_date>/ break; they must be rebased on ForecastSlice or the new forecast/<season_year>/<init_date>/ prefix.
  • The long-range stub adds a new code path that needs explicit testing under unseen-year scenarios; the in-source TODO markers flag it as removable once the upstream zarr/source covers the horizon (stages/run_forecast.py:297-310).
  • The bias corrector path is now composed from postprocessed_dir, so any external tool reading the corrector pickle must use ForecastSlice.bias_corrector_path rather than the old flat path (lib/results/results_slice.py:444-446).

Alternatives considered

  • Stamp init_date directories with a season_year suffix (e.g. forecast/2026-04-01_sy2027/). Rejected: harder to glob across inits within a season_year, and the suffix duplicates information already available as a parent directory.
  • Keep flat layout, require unique init_date per run. Rejected: rules out the legitimate workflow of forecasting multiple season_years from a single init_date, which is exactly the multi-year use case driving the change (wiki/commodity_hindcast/pipelines/multi_year_forecast.md).
  • Compose into a single (season_year, init_date) tuple-keyed filename rather than nested directories. Rejected: artefacts are multi-file (zarr stores, features/builders/, postprocessed/ subtree) and a flat filename scheme cannot represent them.
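The globbing argument against the suffix scheme can be made concrete. In the sketch below, both helper names are hypothetical and the demo tree uses made-up dates; it only shows that the nested layout answers "which inits exist for a season_year" with a directory listing and "which season_years exist for an init" with a one-level glob.

```python
import tempfile
from pathlib import Path


def init_dates_for_season(run_dir: Path, season_year: int) -> list[str]:
    """All init_dates forecast for one season_year: a single directory listing."""
    season_root = run_dir / "forecast" / str(season_year)
    if not season_root.is_dir():
        return []
    return sorted(p.name for p in season_root.iterdir() if p.is_dir())


def season_years_for_init(run_dir: Path, init_date: str) -> list[int]:
    """All season_years forecast from one init_date: a one-level glob."""
    pattern = f"forecast/[0-9][0-9][0-9][0-9]/{init_date}"
    return sorted(int(p.parent.name) for p in run_dir.glob(pattern) if p.is_dir())


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    demo_run_dir = Path(tmp)
    for season_year, init_date in [
        (2026, "2026-04-01"),
        (2027, "2026-04-01"),
        (2026, "2026-05-01"),
    ]:
        (demo_run_dir / "forecast" / str(season_year) / init_date).mkdir(parents=True)

    inits_2026 = init_dates_for_season(demo_run_dir, 2026)
    seasons_april = season_years_for_init(demo_run_dir, "2026-04-01")

print(inits_2026)
print(seasons_april)
```

With a suffix scheme such as forecast/2026-04-01_sy2027/, both queries would need to parse directory names instead.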

Verification

  • ForecastSlice path properties unit-tested for (season_year, init_date) rooting: [PLACEHOLDER: cite specific tests under tests/commodity_hindcast/lib/results/test_results_slice.py].
  • Smoke test: cli run forecast --season-year YYYY --init-date YYYY-MM-DD followed by ls <run_dir>/forecast/<YYYY>/<YYYY-MM-DD>/ confirms the nested layout at every artefact path.
  • Long-range stub no-op assertion: when the climo zarr covers the requested season, synthesise_long_range_climo_for_unseen_years must not mutate inputs (features/forecast_long_range_stub.py:67).
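The smoke test's ls step could be automated along these lines. This is a sketch, not an existing test: missing_entries is a hypothetical helper, and the expected top-level names are taken from the layout stated in the Decision section.

```python
import tempfile
from pathlib import Path

# Top-level entries the Decision section lists under each forecast root.
EXPECTED_ENTRIES = ("indices.zarr", "features", "preds", "postprocessed", "delivery")


def missing_entries(run_dir: Path, season_year: int, init_date: str) -> list[str]:
    """Names from EXPECTED_ENTRIES absent under forecast/<season_year>/<init_date>/."""
    root = run_dir / "forecast" / str(season_year) / init_date
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Demo: a partially written forecast root in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_run_dir = Path(tmp)
    demo_root = demo_run_dir / "forecast" / "2026" / "2026-04-01"
    for name in ("indices.zarr", "features", "preds"):
        (demo_root / name).mkdir(parents=True)

    still_missing = missing_entries(demo_run_dir, 2026, "2026-04-01")

print(still_missing)
```

An empty list after a forecast run indicates that every top-level entry landed in the nested layout.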

Migration

Existing run_dirs written under the flat forecast/<init_date>/ layout are not auto-migrated. The forecast subtree is fully regenerable from the production training slice + climo zarr; the recommended migration is to re-run forecast for the desired (season_year, init_date) pairs against the existing run_dir, which writes into the new layout without touching trained artefacts (lib/results/results_slice.py:412-427). Any caller still pointing at forecast/<init_date>/ paths must be rebased on ForecastSlice.
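Before re-running, it may help to find run_dirs that still carry the flat layout. The helper below is a hypothetical sketch that assumes init_date directories are named YYYY-MM-DD, so a date-named directory directly under forecast/ marks a legacy subtree while a four-digit directory is a season_year.

```python
import re
import tempfile
from pathlib import Path

# Assumption: init_date directories are named as ISO dates (YYYY-MM-DD).
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def legacy_init_dirs(run_dir: Path) -> list[Path]:
    """Directories directly under forecast/ that look like flat-layout init_dates."""
    forecast_root = run_dir / "forecast"
    if not forecast_root.is_dir():
        return []
    return sorted(
        p for p in forecast_root.iterdir()
        if p.is_dir() and ISO_DATE.fullmatch(p.name)
    )


# Demo: one legacy flat subtree next to one nested subtree.
with tempfile.TemporaryDirectory() as tmp:
    demo_run_dir = Path(tmp)
    (demo_run_dir / "forecast" / "2025-04-01").mkdir(parents=True)          # legacy flat
    (demo_run_dir / "forecast" / "2026" / "2026-04-01").mkdir(parents=True)  # new nested

    legacy_names = [p.name for p in legacy_init_dirs(demo_run_dir)]

print(legacy_names)
```

The helper only reports candidates; whether to delete or keep the old subtrees after regenerating is left to the operator.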

References

  • market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:302-451: ForecastSlice aggregate
  • lib/results/results_slice.py:328: root = run_dir/forecast/<season_year>/<init_date>/
  • lib/results/results_slice.py:333,343,353,358,368,378: per-artefact path properties
  • lib/results/results_slice.py:412-427: training delegation to production hindcast slice
  • lib/results/results_slice.py:444-446: bias_corrector_path composed off postprocessed_dir
  • market_insights_models/src/commodity_hindcast/stages/run_forecast.py:301: long-range climo stub call site
  • stages/run_forecast.py:310: long-range stress stub call site
  • market_insights_models/src/commodity_hindcast/features/forecast_long_range_stub.py:67,217: stub implementations
  • wiki/commodity_hindcast/entities/ForecastSlice.md
  • wiki/commodity_hindcast/sources/prs/PR-369.md
  • wiki/commodity_hindcast/pipelines/multi_year_forecast.md
  • wiki/commodity_hindcast/pipelines/forecast.md
  • PR #369 (squash commit f5399b96, merged 2026-05-05)