PR #369 — feat(commodity_hindcast): forecast multiple season_years per init_date

At a glance

Author: ai-tommytf
Merged: 2026-05-05
Branch: tl/extend-forecast-to-5-years
Net effect: Two coupled changes: (1) forecast artefacts moved from run_dir/forecast/{init_date}/ to run_dir/forecast/{season_year}/{init_date}/ so different season_years at the same init_date no longer overwrite each other; (2) a long-range climatology stub (forecast_long_range_stub.py) fills missing z-score features for season_years beyond the climo zarr's coverage, enabling --season-year YYYY+1..YYYY+N invocations from a single init_date.
Why this matters: This is the foundation for issuing multi-year yield outlooks from a single init_date; without it, a second forecast call at the same init_date for a different season_year overwrote the first call's artefacts and then failed in the predict step.

PR body (faithful extract)

tl;dr

This branch teaches the commodity-hindcast forecast pipeline to issue **multiple yield forecasts**
(e.g. for the 2026, 2027, and 2028 wheat seasons) from a single `init_date` — the calendar day
on which the forecast is issued. Two coupled changes:

1. **Filesystem layout**: forecast outputs now live under `run_dir/forecast/{season_year}/{init_date}/`
   so two forecasts at the same `init_date` for different season_years no longer overwrite each other.
2. **Long-range climo stub**: when the upstream climatology data store does not yet cover a future
   season_year, a generalised **panel imputer** fills the missing rows (per-county trailing-3-year
   median by default) instead of the build crashing.

The problem

Bug A — single forecast directory per init_date. All per-call artefacts lived under run_dir/forecast/{init_date}/. Calling forecast twice at the same init_date for different season_years overwrote everything on the second call, then the predict step raised:

ValueError: No rows in pred.parquet for season_year=2027, init_date=2026-05-05

Bug B — climatology zarr stops at the latest observed year. For wheat 2027 the season touches calendar years {2026, 2027}. The 2027 portion is not in the zarr, so the climo builder emitted 0 rows and assembly failed with:

ValueError: Configured commodity.feature_cols missing from merged feature matrix:
  ['gdd_zscore_apr_jul', 'tavg_zscore_apr_jul', 'dtr_zscore_apr_jul', ...]

Change 1 — Path restructure (`2206b2ea`)

ForecastSlice.root is the single source of truth for where forecast artefacts live:

@property
def root(self) -> Path:
    """Per-(season_year, init_date) root: run_dir/forecast/{season_year}/{init_date}/."""
    return self.run_dir / 'forecast' / str(self.season_year) / f'{self.init_date:%Y-%m-%d}'

Four other places hard-coded the same path string and were updated: run/preflight.py, stages/run_predict.py, lib/results/run_result.py (now does two-level iteration), delivery/export.py (glob forecast/*/delivery/... → forecast/*/*/delivery/...).
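
As a rough illustration of why keying on both values removes the collision, the sketch below wraps the same `root` expression in a minimal dataclass. The dataclass fields and the `run_dir` value are assumptions made for this example; the real `ForecastSlice` likely carries more state.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class ForecastSlice:
    """Illustrative stand-in only; not the real class definition."""

    run_dir: Path
    season_year: int
    init_date: date

    @property
    def root(self) -> Path:
        # Same expression as the PR extract: season_year first, then init_date.
        return self.run_dir / "forecast" / str(self.season_year) / f"{self.init_date:%Y-%m-%d}"


init = date(2026, 5, 5)
roots = [ForecastSlice(Path("run_dir"), sy, init).root for sy in (2026, 2027, 2028)]
for r in roots:
    print(r.as_posix())
```

Three season_years at one init_date now resolve to three distinct directories, so a later call cannot overwrite an earlier one.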

Before / after layout (Mermaid)

BEFORE — keyed only on init_date          AFTER — keyed on (season_year, init_date)
run_dir/forecast/                          run_dir/forecast/
  2026-05-05/                                2026/
    indices.zarr                               2026-05-05/
    features/pred.parquet                        indices.zarr
  2026-06-12/                                2027/
    ...                                          2026-05-05/
                                                   indices.zarr
                                             2028/
                                                 2026-05-05/
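
The new layout means any discovery code has to walk two directory levels instead of one. The following is a minimal sketch of such a walk, not the code in `lib/results/run_result.py`; the function name `discover_forecast_slices` and the digit-only filter on season directories are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def discover_forecast_slices(run_dir: Path) -> list[tuple[int, str]]:
    """Return (season_year, init_date) pairs found under run_dir/forecast/."""
    found: list[tuple[int, str]] = []
    forecast_dir = run_dir / "forecast"
    if not forecast_dir.is_dir():
        return found
    for season_dir in sorted(forecast_dir.iterdir()):
        # Outer level: season_year directories such as 2026/, 2027/.
        if not (season_dir.is_dir() and season_dir.name.isdigit()):
            continue
        for init_dir in sorted(season_dir.iterdir()):
            # Inner level: init_date directories such as 2026-05-05/.
            if init_dir.is_dir():
                found.append((int(season_dir.name), init_dir.name))
    return found


# Build the AFTER layout in a temporary directory and walk it.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    for season_year in (2026, 2027, 2028):
        (run_dir / "forecast" / str(season_year) / "2026-05-05").mkdir(parents=True)
    slices = discover_forecast_slices(run_dir)
    print(slices)
```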

Change 2 — Long-range climo stub

The stub lives in features/forecast_long_range_stub.py. Its filename announces it is temporary. It fires only when the climo zarr does not cover all calendar years needed by the requested season_year:

needed_cal_years = {
    cfg.commodity.season_start_date(season_year).year,
    cfg.commodity.harvest_date(season_year).year,
}
available_cal_years = {int(y) for y in ds['year'].values}

if needed_cal_years.issubset(available_cal_years):
    return   # zarr covers this season — let the normal climo builder run

When it fires it emits three logger.warning lines listing the missing years. Removal criteria are documented in the module docstring (delete when the climo zarr is extended to cover the horizon and no caller imports from the module).
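
To make the coverage check concrete, here is a self-contained sketch of the same set comparison. The season boundaries (1 October of the prior year to 31 August of the season_year) are assumptions loosely based on the glossary's US wheat description, and `missing_calendar_years` is a hypothetical helper, not a function from the PR.

```python
from datetime import date


def season_start_date(season_year: int) -> date:
    # Assumption: the season starts on 1 October of the prior calendar year.
    return date(season_year - 1, 10, 1)


def harvest_date(season_year: int) -> date:
    # Assumption: harvest falls on 31 August of the season_year itself.
    return date(season_year, 8, 31)


def missing_calendar_years(season_year: int, available_cal_years: set[int]) -> set[int]:
    """Calendar years the season touches that the climatology store lacks."""
    needed = {season_start_date(season_year).year, harvest_date(season_year).year}
    return needed - available_cal_years


available = set(range(2000, 2027))  # hypothetical coverage ending at 2026

print(missing_calendar_years(2026, available))  # nothing missing: normal builder runs
print(missing_calendar_years(2027, available))  # 2027 missing: the stub would fire
print(missing_calendar_years(2028, available))  # both 2027 and 2028 missing
```

An empty result corresponds to the early `return` in the extract; a non-empty result is the case in which the stub takes over.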

Change 3 — Imputer generalisation (`0bc4ce64`)

The existing per-county trailing-median imputer was generalised into impute_missing_panel_columns with method dispatch (trailing_median, trailing_mean, zero). Two callers: _impute_forecast_area (area fill) and synthesise_long_range_climo_for_unseen_years (z-score fill).
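
The method dispatch can be pictured with a small standard-library sketch. This is not the signature or implementation of `impute_missing_panel_columns`; the function name `impute_panel_columns`, the row-dict representation, the `window` parameter, and the sample values are all assumptions for illustration.

```python
from statistics import mean, median
from typing import Callable

# Method name -> reducer over a county's trailing values.
FILL_METHODS: dict[str, Callable[[list[float]], float]] = {
    "trailing_median": median,
    "trailing_mean": mean,
    "zero": lambda values: 0.0,
}


def impute_panel_columns(history, target, columns, method="trailing_median", window=3):
    """Fill missing `columns` in `target` rows from each county's trailing history."""
    fill = FILL_METHODS[method]  # raises KeyError for an unknown method name
    filled = []
    for row in target:
        new_row = dict(row)
        # The most recent `window` years for this county, strictly before the target year.
        past = sorted(
            (h for h in history if h["county"] == row["county"] and h["year"] < row["year"]),
            key=lambda h: h["year"],
        )[-window:]
        for col in columns:
            if new_row.get(col) is None:
                values = [h[col] for h in past if h.get(col) is not None]
                # "zero" needs no history; the other methods need at least one value.
                if values or method == "zero":
                    new_row[col] = fill(values)
        filled.append(new_row)
    return filled


col = "gdd_zscore_apr_jul"
history = [
    {"county": "A", "year": 2022, col: 9.0},  # falls outside the 3-year window
    {"county": "A", "year": 2023, col: -1.0},
    {"county": "A", "year": 2024, col: 0.5},
    {"county": "A", "year": 2025, col: 3.5},
]
target = [{"county": "A", "year": 2027, col: None}]

by_method = {m: impute_panel_columns(history, target, [col], method=m)[0][col] for m in FILL_METHODS}
print(by_method)
```

Keeping the reducers in one table means a new fill method is a one-line addition, and both callers share the same windowing logic.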

Why long-range forecasts are trend-only by design

The wheat model carries a piecewise-linear season_doy_weather_weight schedule. For an init_date before the target season starts, season-DOY ≤ 1, so the weather weight w = 0 and yield = trend(year, county) exactly. The runs below vary the climo fill method with the schedule modified so that the weather term is active; the differing predictions verify that the imputer dispatch is wired correctly, but all methods collapse to the same prediction once the original schedule is restored:

method	yield (bu/ac)	weather_corr
zero	47.675	−3.907
trailing_median	47.000	−4.581
trailing_mean	47.115	−4.467
spike_plus5 (+5 SD probe)	54.423	+2.841

With the original schedule restored, all four methods give 51.581 bu/ac with weather_corr = 0.0000, which shows that the trend-only behaviour comes from the schedule, not the imputer.
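
The figures in the table are consistent with an additive reading, yield = trend + weather_corr. That decomposition is inferred from the numbers rather than stated in the PR body, so treat the check below as a sanity sketch: subtracting weather_corr from each yield should recover the trend-only value of 51.581 bu/ac, up to rounding.

```python
# (yield bu/ac, weather_corr) pairs copied from the table above.
results = {
    "zero": (47.675, -3.907),
    "trailing_median": (47.000, -4.581),
    "trailing_mean": (47.115, -4.467),
    "spike_plus5": (54.423, 2.841),
}
TREND = 51.581  # yield reported with the original schedule restored

for method, (yield_bu_ac, weather_corr) in results.items():
    implied_trend = yield_bu_ac - weather_corr
    # Allow for rounding to three decimals in the reported figures.
    assert abs(implied_trend - TREND) < 0.002, method
    print(f"{method:16s} implied trend = {implied_trend:.3f}")
```

Every row lands within 0.001 of the trend value, which fits the claim that the fill method only moves the weather correction.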

Test status

......................................................................   [100%]
70 passed in 7.46s

All 70 targeted unit tests pass (imputation, edit-panel, results-slice, preflight, forecast-slice discovery).

Files / lines touched

Additions	Deletions	File
+319	-0	`market_insights_models/src/commodity_hindcast/features/forecast_long_range_stub.py`
+212	-0	`tests/unit/commodity_hindcast/features/test_forecast_long_range_stress_stub.py`
+139	-51	`market_insights_models/src/commodity_hindcast/lib/edit_and_imputation/imputation.py`
+45	-11	`market_insights_models/src/commodity_hindcast/stages/run_forecast.py`
+22	-18	`market_insights_models/src/commodity_hindcast/lib/results/run_result.py`
+26	-3	`market_insights_models/src/commodity_hindcast/delivery/schemas.py`
+16	-7	`market_insights_models/src/commodity_hindcast/run/preflight.py`
+10	-8	`market_insights_models/src/commodity_hindcast/lib/results/results_slice.py`
+12	-6	`market_insights_models/src/commodity_hindcast/stages/run_predict.py`
+8	-5	`market_insights_models/src/commodity_hindcast/delivery/export.py`

Cross-references

Related entity pages: ForecastSlice, ExperimentResult
Related concept pages: forecast path layout, long-range climo stub, imputation panel
Related code pages: lib (results_slice), stages (run_forecast)
Directly follows: PR-372 (which made residual_mode mandatory — also merged same day)

Glossary (from PR body)

season_year — the harvest year the forecast targets. For US wheat, the 2027 season_year covers October 2026 → August 2027.
init_date — the calendar day on which the forecast is issued.
season-DOY (sDOY) — day-of-year measured from season_start rather than 1 January.
z-score climo features — anomaly features (gdd_zscore_apr_jul etc.) relative to the long-run climatology. Population mean is 0 by definition.
canonical hindcast pred.parquet — the historical feature matrix consulted by the panel imputer for trailing values.

Lessons captured

Forecast artefacts are keyed by (season_year, init_date), not init_date alone; ForecastSlice.root is the single source of truth for the path.
lib/results/run_result.py discovery loop must iterate two levels (outer season_year, inner init_date).
The long-range climo stub fires automatically when the zarr does not cover the requested season; it logs three WARNING lines and writes a synthetic builders/climo.parquet using per-county trailing-3-year medians.
Long-range forecasts (init_date before target season start) collapse to trend-only output because season_doy_weather_weight = 0; this is a deliberate model behaviour, not a stub limitation.
impute_missing_panel_columns is the shared primitive for any panel-column fill; impute_missing_area is a thin single-column wrapper kept for backwards compatibility with existing callers.
Stress builder long-range support is not in this PR; stress_score has a non-trivial bounded range and needs explicit per-column method choices before a stub can be wired in.