ADR-004: Forecast outputs nest under forecast/<season_year>/<init_date>/
Status: Accepted (retroactively documented 2026-05-08)
Date: 2026-05-05 (PR #369 merged via squash commit f5399b96,
"feat(commodity_hindcast): forecast multiple season_years per init_date")
Authors: [PLACEHOLDER: PR-369 author per git log f5399b96]
Context
The forecast pipeline writes per-init artefacts (spliced indices zarr,
features parquet, walk-forward predictions, postprocessed nationals,
delivery CSVs) under <run_dir>/forecast/.... The pre-PR-369 layout
keyed those artefacts solely on init_date
(forecast/<init_date>/), which sufficed when each run_dir carried
exactly one season_year.
Multi-year forecasting — driven by client demand for outlooks beyond
the current season (e.g. forecasting 2026 corn and 2027 corn from the
same April-2026 init) — broke that assumption. Two season_years sharing
an init_date collided on the same subtree, so the second write
silently overwrote the first. The collision affected every
artefact: indices.zarr, pred.parquet, walk_forward_preds.parquet,
national.parquet, and the delivery CSV
(market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:333,343,353,358,368,378).
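To make the collision concrete, the sketch below keys a forecast root both ways for two season_years that share one init_date. The helper names legacy_forecast_root and nested_forecast_root are hypothetical, and rendering init_date as an ISO YYYY-MM-DD directory name is an assumption taken from the smoke-test path in the Verification section.

```python
from datetime import date
from pathlib import Path


def legacy_forecast_root(run_dir: Path, init_date: date) -> Path:
    # Pre-PR-369 layout: the subtree is keyed on init_date alone.
    return run_dir / "forecast" / init_date.isoformat()


def nested_forecast_root(run_dir: Path, season_year: int, init_date: date) -> Path:
    # Post-PR-369 layout: season_year sits above init_date.
    return run_dir / "forecast" / str(season_year) / init_date.isoformat()


run_dir = Path("runs/corn")
init = date(2026, 4, 1)

# Forecast 2026 and 2027 corn from the same April-2026 init.
legacy = {sy: legacy_forecast_root(run_dir, init) for sy in (2026, 2027)}
nested = {sy: nested_forecast_root(run_dir, sy, init) for sy in (2026, 2027)}

print(legacy[2026] == legacy[2027])  # True: the 2027 write lands on the 2026 subtree
print(nested[2026] == nested[2027])  # False: each season_year gets its own subtree
```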
A second pressure: forecasting season_years beyond the observed climo zarr's coverage requires synthesising climo (and, for some commodities, stress) for unseen years, which only makes sense once season_year is a first-class addressable dimension on disk.
Decision
Forecast outputs nest under
<run_dir>/forecast/<season_year>/<init_date>/{indices.zarr, features/,
preds/, postprocessed/, delivery/}
(lib/results/results_slice.py:328). ForecastSlice is parameterised by
(season_year, init_date) (lib/results/results_slice.py:316-317)
and exposes one path property per artefact, each derived from
self.root (lib/results/results_slice.py:333,343,353,358,368,378).
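A minimal sketch of that path scheme, assuming a frozen dataclass. ForecastSliceSketch is a hypothetical stand-in, and its property names other than root and postprocessed_dir are illustrative; the real ForecastSlice also carries experiment_key and the training delegation, both omitted here, and the file names inside each subdirectory are not modelled.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class ForecastSliceSketch:
    """Hypothetical stand-in for ForecastSlice; only the path scheme is modelled."""

    run_dir: Path
    season_year: int
    init_date: date

    @property
    def root(self) -> Path:
        # <run_dir>/forecast/<season_year>/<init_date>/
        return self.run_dir / "forecast" / str(self.season_year) / self.init_date.isoformat()

    # One property per artefact, each composed off root.
    @property
    def indices_path(self) -> Path:
        return self.root / "indices.zarr"

    @property
    def features_dir(self) -> Path:
        return self.root / "features"

    @property
    def preds_dir(self) -> Path:
        return self.root / "preds"

    @property
    def postprocessed_dir(self) -> Path:
        return self.root / "postprocessed"

    @property
    def delivery_dir(self) -> Path:
        return self.root / "delivery"


s = ForecastSliceSketch(Path("runs/corn"), 2027, date(2026, 4, 1))
print(s.indices_path.as_posix())  # runs/corn/forecast/2027/2026-04-01/indices.zarr
```

Because every artefact path is composed off root, changing the layout again would mean touching one property rather than every caller.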
Trained artefacts (model, detrender, fill values) remain on the
production hindcast slice and are reached via self.training
(lib/results/results_slice.py:412-427); only the per-forecast
artefacts move into the new subtree.
Long-range climo and stress synthesis stubs run inside
build_forecast_features to populate inputs for season_years beyond
observed coverage; they no-op when the source already covers the
season (stages/run_forecast.py:301, stages/run_forecast.py:310;
implementations in
features/forecast_long_range_stub.py:67,217).
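The no-op contract can be sketched on a toy input in which climo is a plain mapping from season_year to a scalar. synthesise_climo_sketch is a hypothetical name, and the mean-of-observed-years fill is only a placeholder: this ADR does not describe how the real stubs synthesise climo or stress.

```python
from collections.abc import Mapping
from statistics import mean


def synthesise_climo_sketch(
    climo_by_year: Mapping[int, float], season_year: int
) -> Mapping[int, float]:
    """Hypothetical long-range climo stub over a toy {season_year: value} mapping."""
    if season_year in climo_by_year:
        # Observed coverage already includes the season: no-op, return the input as-is.
        return climo_by_year
    # Placeholder synthesis (mean of observed years) written into a copy,
    # so the input is never mutated. The real stub's method is not described here.
    synthesised = dict(climo_by_year)
    synthesised[season_year] = mean(climo_by_year.values())
    return synthesised


observed = {2024: 1.0, 2025: 3.0}
assert synthesise_climo_sketch(observed, 2025) is observed  # covered: no-op
extended = synthesise_climo_sketch(observed, 2027)          # unseen: synthesised
print(extended[2027], observed)  # 2.0 {2024: 1.0, 2025: 3.0}
```

Returning the identical object on the covered path makes the no-op easy to assert in a test.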
Consequences
Positive
- Multi-year forecast becomes embarrassingly parallel: distinct
(season_year, init_date) pairs write to disjoint subtrees, so concurrent
runs never collide (lib/results/results_slice.py:303-308). ForecastSlice
cleanly addresses any historical forecast slice via
(run_dir, experiment_key, season_year, init_date)
(lib/results/results_slice.py:314-317).
- Forecast season_years beyond the observed climo extent are handled
by an explicit synthesis stub rather than an implicit failure
(stages/run_forecast.py:297-310).
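A minimal sketch of the parallel fan-out, assuming a thread pool and a throwaway temporary run_dir. write_forecast and the placeholder.txt file are hypothetical stand-ins for the real forecast stage and its artefacts; the point is only that each pair writes beneath its own root.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from itertools import product
from pathlib import Path


def write_forecast(run_dir: Path, season_year: int, init_date: date) -> Path:
    # Hypothetical writer: every (season_year, init_date) pair owns its own
    # subtree, so no two workers ever open the same file.
    root = run_dir / "forecast" / str(season_year) / init_date.isoformat()
    out = root / "preds" / "placeholder.txt"  # placeholder, not a real artefact name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(f"{season_year} {init_date.isoformat()}\n")
    return out


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    pairs = list(product((2026, 2027), (date(2026, 3, 1), date(2026, 4, 1))))
    with ThreadPoolExecutor(max_workers=4) as pool:
        written = list(pool.map(lambda pair: write_forecast(run_dir, *pair), pairs))
    # Four pairs, four distinct files, each holding only its own content.
    assert len(set(written)) == len(pairs)
    contents = [p.read_text() for p in written]
```

Shared parents such as forecast/2026/ may be created by several workers at once; mkdir with exist_ok=True tolerates that race.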
Negative
- Hand-built paths in any notebook or one-off script keyed on
forecast/<init_date>/ break and must be rebased on ForecastSlice or the
new forecast/<season_year>/<init_date>/ prefix.
- The long-range stub adds a new code path that needs explicit
testing under unseen-year scenarios; the in-source TODO markers flag it
as removable once the upstream zarr/source covers the horizon
(stages/run_forecast.py:297-310).
- The bias corrector path now composes off postprocessed_dir, so any
external tool reading the corrector pickle must use
ForecastSlice.bias_corrector_path rather than the old flat path
(lib/results/results_slice.py:444-446).
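For a one-off script that cannot easily import ForecastSlice, the prefix rebase amounts to inserting a season_year component after forecast/. rebase_legacy_forecast_path is a hypothetical helper, not part of the codebase; because legacy paths never encode season_year, the caller must supply it, and the helper does nothing about the relocated bias corrector pickle. Going through ForecastSlice remains the preferred route.

```python
from pathlib import PurePosixPath


def rebase_legacy_forecast_path(path: str, season_year: int) -> str:
    """Hypothetical helper: forecast/<init_date>/... -> forecast/<season_year>/<init_date>/...

    The legacy path does not encode season_year, so the caller must supply it.
    """
    parts = PurePosixPath(path).parts
    try:
        i = parts.index("forecast")  # first forecast/ component
    except ValueError:
        raise ValueError(f"no forecast/ component in {path!r}") from None
    return str(PurePosixPath(*parts[: i + 1], str(season_year), *parts[i + 1 :]))


print(rebase_legacy_forecast_path("runs/corn/forecast/2026-04-01/indices.zarr", 2027))
# runs/corn/forecast/2027/2026-04-01/indices.zarr
```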
Alternatives considered
- Stamp init_date directories with a season_year suffix (e.g.
forecast/2026-04-01_sy2027/). Rejected: harder to glob across inits
within a season_year, and the suffix duplicates information already
available as a parent directory.
- Keep the flat layout and require a unique init_date per run.
Rejected: rules out the legitimate workflow of forecasting multiple
season_years from a single init_date, which is exactly the multi-year
use case driving the change
(wiki/commodity_hindcast/pipelines/multi_year_forecast.md).
- Compose a single (season_year, init_date) tuple-keyed filename
instead of nested directories. Rejected: artefacts are multi-file (zarr
stores, features/builders/, postprocessed/ subtree), and a flat
filename scheme cannot represent them.
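As an illustration of why the nested layout is convenient to query, the sketch below builds a toy forecast/ tree in a temporary directory and answers both natural questions with standard pathlib calls. The tree contents are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    forecast = Path(tmp) / "forecast"
    # Toy tree: three (season_year, init_date) slices.
    for season_year, init_date in [(2026, "2026-03-01"), (2026, "2026-04-01"), (2027, "2026-04-01")]:
        (forecast / str(season_year) / init_date).mkdir(parents=True)

    # Every init for one season_year is a single directory listing.
    inits_2026 = sorted(p.name for p in (forecast / "2026").iterdir())
    # Every season_year forecast from one init is one glob over the parent level.
    years_from_april = sorted(p.parent.name for p in forecast.glob("*/2026-04-01"))

print(inits_2026)        # ['2026-03-01', '2026-04-01']
print(years_from_april)  # ['2026', '2027']
```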
Verification
- ForecastSlice path properties unit-tested for (season_year, init_date)
rooting: [PLACEHOLDER: cite specific tests under
tests/commodity_hindcast/lib/results/test_results_slice.py].
- Smoke test: cli run forecast --season-year YYYY --init-date YYYY-MM-DD
followed by ls <run_dir>/forecast/<YYYY>/<YYYY-MM-DD>/ confirms the
nested layout at every artefact path.
- Long-range stub no-op assertion: when the climo zarr covers the
requested season, synthesise_long_range_climo_for_unseen_years must not
mutate inputs (features/forecast_long_range_stub.py:67).
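The ls smoke test can also be scripted. missing_forecast_artefacts is a hypothetical helper; it assumes every forecast run writes all five children listed in the Decision section, which may not hold if some stages are optional.

```python
import tempfile
from pathlib import Path

# Children expected under <run_dir>/forecast/<season_year>/<init_date>/ per this ADR.
EXPECTED_ARTEFACTS = ("indices.zarr", "features", "preds", "postprocessed", "delivery")


def missing_forecast_artefacts(run_dir: Path, season_year: int, init_date: str) -> list[str]:
    """Hypothetical check: which expected children are absent from the nested root?"""
    root = run_dir / "forecast" / str(season_year) / init_date
    return [name for name in EXPECTED_ARTEFACTS if not (root / name).exists()]


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    root = run_dir / "forecast" / "2027" / "2026-04-01"
    for name in EXPECTED_ARTEFACTS[:-1]:  # simulate a run that never wrote delivery/
        (root / name).mkdir(parents=True)
    missing = missing_forecast_artefacts(run_dir, 2027, "2026-04-01")

print(missing)  # ['delivery']
```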
Migration
Existing run_dirs written under the flat forecast/<init_date>/
layout are not auto-migrated. The forecast subtree is fully
regenerable from the production training slice + climo zarr; the
recommended migration is to re-run forecast for the desired
(season_year, init_date) pairs against the existing run_dir,
which writes into the new layout without touching trained artefacts
(lib/results/results_slice.py:412-427). Any caller still pointing
at forecast/<init_date>/ paths must be rebased on ForecastSlice.
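Before re-running, the affected init_dates can be listed by scanning forecast/. This sketch assumes legacy init_date directories are named YYYY-MM-DD and new season_year directories are bare four-digit years, so the two layouts are distinguishable by name alone. find_legacy_forecast_dirs is hypothetical and only lists directories; it does not migrate or delete anything, and which season_year to re-run for each legacy init must come from the run's own configuration.

```python
import re
import tempfile
from pathlib import Path

# Legacy children of forecast/ are init_date dirs; new ones are 4-digit season_years.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def find_legacy_forecast_dirs(run_dir: Path) -> list[Path]:
    """Hypothetical scan for flat forecast/<init_date>/ subtrees needing a re-run."""
    forecast = run_dir / "forecast"
    if not forecast.is_dir():
        return []
    return sorted(p for p in forecast.iterdir() if p.is_dir() and ISO_DATE.fullmatch(p.name))


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "forecast" / "2026-04-01").mkdir(parents=True)          # legacy flat layout
    (run_dir / "forecast" / "2027" / "2026-04-01").mkdir(parents=True)  # new nested layout
    legacy = [p.name for p in find_legacy_forecast_dirs(run_dir)]

print(legacy)  # ['2026-04-01']
```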
References
- market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:302-451 - ForecastSlice aggregate
- lib/results/results_slice.py:328 - root = run_dir/forecast/<season_year>/<init_date>/
- lib/results/results_slice.py:333,343,353,358,368,378 - per-artefact path properties
- lib/results/results_slice.py:412-427 - training delegation to production hindcast slice
- lib/results/results_slice.py:444-446 - bias_corrector_path composed off postprocessed_dir
- market_insights_models/src/commodity_hindcast/stages/run_forecast.py:301 - long-range climo stub call site
- stages/run_forecast.py:310 - long-range stress stub call site
- market_insights_models/src/commodity_hindcast/features/forecast_long_range_stub.py:67,217 - stub implementations
- wiki/commodity_hindcast/entities/ForecastSlice.md
- wiki/commodity_hindcast/sources/prs/PR-369.md
- wiki/commodity_hindcast/pipelines/multi_year_forecast.md
- wiki/commodity_hindcast/pipelines/forecast.md
- PR #369 (squash commit f5399b96, merged 2026-05-05)