Forecast for one (season_year, init_date)

Run a single point-in-time forecast against an existing hindcast run_dir. One
(season_year, init_date) tuple is one unit of work: it owns its own subtree
under run_dir/forecast/{season_year}/{init_date}/ and is independent of every
other tuple. This runbook covers the standalone forecast invocation only, not
the upstream make hindcast or cli run fit-production paths that produce the
run_dir itself.
Visual reference: /data/processing/tmp/tmi-explainers/output/ch_02_forecast.png.
1. When to use
- First-time forecast on a new init_date for an existing trained run_dir (e.g. a Tuesday refresh adding 2026-04-15 to a corn run dated 2026-04-08).
- Weekly client refresh for an already-served run_dir: each new init date produces a fresh delivery CSV without retraining the model.
- Smoke test after a forecast-pipeline change (e.g. preflight, residual calibration, or long-range stub edits): pick a known-good (run_dir, season_year, init_date) triple and re-run end-to-end.
Out of scope: training a new model (use the hindcast or fit-production
runbooks), back-filling many init dates at once (drive this runbook from a
loop), and any non-commodity_hindcast forecast (crop_yield has its own
pipeline).
2. Preconditions
The forecast pipeline only ever reads trained artefacts and writes into the per-init subtree, so these must already exist on disk before you start:
- An existing hindcast run_dir produced by cli run hindcast, cli run all, or cli run fit-production. The path you pass to --run-dir is the experiment root that contains models/, config_resolved.yaml, and either hindcast walk-forward folds, a production fold, or both.
- Production model artefacts at <run_dir>/models/<experiment_key>/production/. Preflight asserts the presence of detrender.pkl and feature_fill_values.parquet; see market_insights_models/src/commodity_hindcast/run/preflight.py:147 (preflight_paths_for_forecast_predict, lines 147–180).
- included_geo_identifiers.txt at <run_dir>/included_geo_identifiers.txt. Written by the hindcast stage and consumed by the bias corrector during forecast postprocess.
- Canonical hindcast pred.parquet at <features_dir>/<experiment_key>/pred.parquet. Used to impute NaN area_harvested_ha for the freshly-built forecast features (NASS has not released the current season's harvested area yet). Preflight asserts this in preflight_paths_for_forecast_features at market_insights_models/src/commodity_hindcast/run/preflight.py:132 (lines 132–144).
- forecast.residual_mode set in the resolved config (mandatory since PR-372; see drafts/decisions/ADR-003-mandatory-residual-mode.md). Permitted values: hindcast_oos_per_init_date, hindcast_oos_per_year, hindcast_oos_fully_pooled, in_sample_pooled.
- Residual-mode requirements satisfied on the run_dir:
  - any hindcast_oos_* mode requires walk-forward CV fold predictions on disk (i.e. the run_dir was produced by a hindcast, not just a fit-production);
  - in_sample_pooled requires a production fold train_preds.parquet.
The orchestrator validates this combination before any feature build via
validate_residual_mode at
market_insights_models/src/commodity_hindcast/stages/run_forecast.py:91,
invoked from run at line 161.
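The mode-versus-artefact rule can be restated as a small decision function. This is a minimal sketch, not the body of validate_residual_mode: the function name, the boolean inputs, the error messages, and the ValueError for an unknown mode are all assumptions made for illustration. Only the mode names and the three failing combinations come from this runbook.

```python
# Hypothetical restatement of the residual-mode gate; not the project's code.
OOS_MODES = {
    "hindcast_oos_per_init_date",
    "hindcast_oos_per_year",
    "hindcast_oos_fully_pooled",
}
IN_SAMPLE_MODE = "in_sample_pooled"


def check_residual_mode(mode: str, has_cv_folds: bool, has_production_preds: bool) -> None:
    """Raise if the configured residual mode cannot be served from disk.

    has_cv_folds: walk-forward CV fold predictions exist under run_dir.
    has_production_preds: the production fold has a train_preds.parquet.
    """
    if mode not in OOS_MODES and mode != IN_SAMPLE_MODE:
        # Assumption: the real pipeline rejects this earlier, at config validation.
        raise ValueError(f"unknown forecast.residual_mode: {mode!r}")
    if not has_cv_folds and not has_production_preds:
        raise FileNotFoundError("neither CV folds nor production fold present")
    if mode in OOS_MODES and not has_cv_folds:
        raise FileNotFoundError(f"{mode} needs walk-forward CV folds on disk")
    if mode == IN_SAMPLE_MODE and not has_production_preds:
        raise FileNotFoundError("in_sample_pooled needs production train_preds.parquet")
```

A run_dir produced only by fit-production passes with in_sample_pooled and fails with any hindcast_oos_* mode, which matches the failure table in section 5.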
If any precondition fails, preflight raises SystemExit with the path that
was missing — see run_preflight at
market_insights_models/src/commodity_hindcast/run/preflight.py:42 (lines
42–51).
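Before invoking the CLI, the file-level preconditions can be checked by hand. The sketch below is a hypothetical helper, not the project's preflight module. It assumes detrender.pkl and feature_fill_values.parquet sit directly under production/, and it takes features_dir as an optional argument because this runbook does not say how that directory is resolved.

```python
from pathlib import Path
from typing import List, Optional


def missing_forecast_inputs(
    run_dir: str,
    experiment_key: str,
    features_dir: Optional[str] = None,
) -> List[Path]:
    """Return the required input paths that do not exist yet (hypothetical helper)."""
    root = Path(run_dir)
    production = root / "models" / experiment_key / "production"
    required = [
        root / "config_resolved.yaml",
        root / "included_geo_identifiers.txt",
        # Assumption: both artefacts live directly under production/.
        production / "detrender.pkl",
        production / "feature_fill_values.parquet",
    ]
    if features_dir is not None:
        # Canonical hindcast features, used for area_harvested_ha imputation.
        required.append(Path(features_dir) / experiment_key / "pred.parquet")
    return [path for path in required if not path.exists()]
```

An empty list means the file-level preconditions listed above are met; it does not replace the residual-mode validation or the real preflight.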
3. Procedure
3a. Single-call form (recommended)
Source: market_insights_models/src/commodity_hindcast/cli.py:450
(@run.command("forecast"), options at lines 451–475, handler
run_forecast_cmd at line 477). All three flags are required.
The handler delegates to the orchestrator at
market_insights_models/src/commodity_hindcast/stages/run_forecast.py:143
(run), which calls validate_residual_mode at line 161 before doing any
work, then runs features (line 163) and predict (line 164) in turn.
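For scripted use, such as the back-fill loop mentioned under "Out of scope", the single call can be wrapped in Python. This is a sketch under two assumptions: that the entry point is the same cli executable used in the split form, and that the three required flags are the same --run-dir, --season-year, and --init-date flags shown in 3b. The helper names are hypothetical.

```python
import subprocess
from datetime import date
from typing import List


def forecast_command(run_dir: str, season_year: int, init_date: str) -> List[str]:
    """Build the argv for one (season_year, init_date) forecast (hypothetical helper)."""
    # Reject init dates that are not ISO dates before handing them to the CLI.
    date.fromisoformat(init_date)
    return [
        "cli", "run", "forecast",  # assumed entry point, as in the split form
        "--run-dir", str(run_dir),
        "--season-year", str(season_year),
        "--init-date", init_date,
    ]


def run_forecast(run_dir: str, season_year: int, init_date: str) -> None:
    """Run the forecast and raise CalledProcessError on a non-zero exit."""
    subprocess.run(forecast_command(run_dir, season_year, init_date), check=True)
```

Because each tuple owns its own subtree, a back-fill is a plain loop over init dates calling run_forecast once per date.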
3b. Split form (features then predict)
When debugging a feature build or repredicting against an existing forecast features parquet, split the call:

```
cli run forecast-features \
  --run-dir <run_root> \
  --season-year YYYY \
  --init-date YYYY-MM-DD

cli run forecast-predict \
  --run-dir <run_root> \
  --season-year YYYY \
  --init-date YYYY-MM-DD
```

Sources: cli.py:355 (@run.command("forecast-features")) and cli.py:406
(@run.command("forecast-predict")); they delegate to run_features at
stages/run_forecast.py:167 and run_predict at stages/run_forecast.py:221
respectively.
forecast-features accepts --force/--no-force (cli.py:376) — when --force
is set, both indices.zarr and features/pred.parquet are rebuilt
unconditionally; otherwise the build is a no-op when both already exist (see
the existence guard at stages/run_forecast.py:207).
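The rebuild rule amounts to a two-file existence check. The function below is an illustrative restatement, not the guard at stages/run_forecast.py:207. Its name is hypothetical, and the two output paths are taken from the subtree layout in 3c.

```python
from pathlib import Path


def needs_feature_build(slice_root: str, force: bool = False) -> bool:
    """Return True when forecast-features would have work to do (hypothetical helper)."""
    root = Path(slice_root)
    outputs = [root / "indices.zarr", root / "features" / "pred.parquet"]
    # --force always rebuilds; otherwise build only if an output is missing.
    return force or not all(path.exists() for path in outputs)
```

Note that this checks presence only. A stale parquet from an earlier partial run still counts as existing, which is why the failure table recommends --force in that case.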
3c. What gets written
The orchestrator drives a ForecastSlice rooted at
<run_dir>/forecast/{season_year}/{init_date}/; see
market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:302
(class ForecastSlice, path properties at lines 325–378). The subtree layout
is fixed:

```
<run_dir>/forecast/<season_year>/<init_date>/
├── indices.zarr/                         # spliced obs + climo daily indices
├── features/pred.parquet                 # per-init feature matrix
├── preds/walk_forward_preds.parquet      # county-level predictions
├── preds/year_data.parquet               # pre-sim diagnostics slice
├── postprocessed/national.parquet        # ADM0 with bias + CI columns
└── delivery/Treefera_<key>_ADM{0,1,2}_Forecast_<init_date>.csv
```

4. Verification
Confirm three things after the run completes:
- Delivery CSVs on disk. All three ADM levels must exist:

  ```
  <run_dir>/forecast/<season_year>/<init_date>/delivery/
    Treefera_<experiment_key>_ADM0_Forecast_<init_date>.csv
    Treefera_<experiment_key>_ADM1_Forecast_<init_date>.csv
    Treefera_<experiment_key>_ADM2_Forecast_<init_date>.csv
  ```

  Filename construction: results_slice.py:377 (delivery_csv).
- Postprocessed national parquet. Path <run_dir>/forecast/<season_year>/<init_date>/postprocessed/national.parquet. Expected shape: one row per (year, init_date) row in the production walk-forward predictions, columns including sim_yield_kg_ha, obs_yield_kg_ha, the configured bias-corrected mean, and the configured CI columns. The writer is _postprocess_forecast at stages/run_forecast.py:362 (writes at line 402).
- MLflow run is FINISHED. Inspect the latest run for the experiment in the tracking store and confirm the run status is FINISHED, not RUNNING or FAILED. A FAILED status with delivery CSVs on disk indicates a partial write; re-run before serving.
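The first check is easy to script. The sketch below rebuilds the three expected delivery filenames from the pattern above and reports any that are absent. The helper name is hypothetical and it does not call the project's delivery_csv property. It does not inspect CSV contents or the MLflow status.

```python
from pathlib import Path
from typing import List


def missing_delivery_csvs(
    run_dir: str, season_year: int, init_date: str, experiment_key: str
) -> List[Path]:
    """Return the expected ADM0/ADM1/ADM2 delivery CSVs that are not on disk."""
    delivery = Path(run_dir) / "forecast" / str(season_year) / init_date / "delivery"
    expected = [
        delivery / f"Treefera_{experiment_key}_ADM{level}_Forecast_{init_date}.csv"
        for level in (0, 1, 2)
    ]
    return [path for path in expected if not path.is_file()]
```

An empty result confirms presence only; the national parquet and the MLflow run status still need to be checked separately.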
5. Failure modes and recovery

| Symptom | Where it raises | Fix |
|---|---|---|
| Pydantic validation: forecast.residual_mode missing on the resolved config | Config validation, before any stage runs (mandatory since PR-372; see ADR-003) | Add forecast.residual_mode: <value> to the YAML and re-resolve. |
| FileNotFoundError: OOS mode but run_dir has no walk-forward CV folds | validate_residual_mode, stages/run_forecast.py:127 | Either run make hindcast EXPERIMENT_KEY=<key> against a fresh run_dir, or set forecast.residual_mode: 'in_sample_pooled' in the YAML. |
| FileNotFoundError: in_sample_pooled but no production fold train_preds.parquet | validate_residual_mode, stages/run_forecast.py:135 | Run cli run fit-production --config configs/<key>.yaml against the same run_dir. |
| FileNotFoundError: neither CV folds nor production fold present | validate_residual_mode, stages/run_forecast.py:118 | Run make hindcast (preferred: honest CIs) or cli run fit-production (fast: narrow CIs). |
| SystemExit: production model artefacts missing under models/<key>/production/ | preflight_paths_for_forecast_predict, run/preflight.py:147 (asserts detrender.pkl etc.) | Run cli run fit-production --config configs/<key>.yaml against the same run_dir; see ADR-005. |
| SystemExit: canonical hindcast pred.parquet missing for area imputation | preflight_paths_for_forecast_features, run/preflight.py:132 | Run cli run features --config configs/<key>.yaml to produce the canonical features. |
| SystemExit: per-init forecast features parquet absent at predict-time | preflight_paths_for_forecast_predict, run/preflight.py:147 | Run cli run forecast-features first, or use the single-call cli run forecast form. |
| Out-of-zarr-extent season_year (e.g. 2030 against a 2027-bounded climo zarr) | Long-range climo stub fires automatically: synthesise_long_range_climo_for_unseen_years, stages/run_forecast.py:301 | No action required for routine runs. Known limitation per PR-369; the stub fills missing years from a trailing-3-year median per county. |
| Stale forecast features parquet from an earlier partial run | n/a (silent reuse) | Re-run cli run forecast-features --force to rebuild indices.zarr and features/pred.parquet. Force flag at cli.py:376. |

For a deeper dive on the residual-mode gate, see
drafts/decisions/ADR-003-mandatory-residual-mode.md. For the path layout
and rationale (why per-(season_year, init_date) subtrees), see
drafts/decisions/ADR-004-forecast-path-restructure.md.
6. Rollback
Each (season_year, init_date) tuple is a disjoint subtree at
<run_dir>/forecast/<season_year>/<init_date>/. The unit of work is the
subdirectory; no shared state outside it is mutated by a forecast run.
To roll back a single init:
- Stop any consumer reading the delivery CSVs for the affected (experiment_key, init_date).
- Remove (or archive) <run_dir>/forecast/<season_year>/<init_date>/ in its entirety.
- Re-run the procedure in section 3.
Rollback does not touch other init dates, other season years, or any training
artefact under <run_dir>/models/. There is no shared cache, no shared
pred.parquet write, and no MLflow side-effect that survives directory
removal beyond the original run record.
If a delivery CSV has already been served to a client, follow the data-correction
process owned by [PLACEHOLDER] before deleting. Re-running with the same
(season_year, init_date) overwrites the per-init subtree in place.
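Where archiving is preferred over deletion, the move can be scripted. This is a minimal sketch with a hypothetical helper name and a hypothetical archive_root location. It moves only the per-init subtree and refuses to overwrite an existing archive of the same tuple.

```python
import shutil
from pathlib import Path


def archive_forecast_subtree(
    run_dir: str, season_year: int, init_date: str, archive_root: str
) -> Path:
    """Move one per-init forecast subtree out of run_dir and return its new path."""
    source = Path(run_dir) / "forecast" / str(season_year) / init_date
    if not source.is_dir():
        raise FileNotFoundError(f"no forecast subtree at {source}")
    target = Path(archive_root) / str(season_year) / init_date
    if target.exists():
        # Keep earlier archives intact rather than merging into them.
        raise FileExistsError(f"archive already exists at {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target
```

Stop downstream consumers first, as in the steps above; the sketch does nothing about readers that hold the delivery CSVs open.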
References
- Source: market_insights_models/src/commodity_hindcast/stages/run_forecast.py (run at :143, validate_residual_mode at :91 and :161, run_features at :167, run_predict at :221, long-range climo stub call at :301).
- Source: market_insights_models/src/commodity_hindcast/run/preflight.py (preflight_paths_for_forecast_features at :132, preflight_paths_for_forecast_predict at :147, preflight_paths_for_forecast at :183).
- Source: market_insights_models/src/commodity_hindcast/lib/results/results_slice.py (ForecastSlice at :302).
- CLI: market_insights_models/src/commodity_hindcast/cli.py (@run.command("forecast") at :450, forecast-features at :355, forecast-predict at :406).
- Wiki: wiki/commodity_hindcast/pipelines/forecast.md, wiki/commodity_hindcast/pipelines/multi_year_forecast.md, wiki/commodity_hindcast/entities/ForecastSlice.md.
- ADRs: drafts/decisions/ADR-003-mandatory-residual-mode.md, drafts/decisions/ADR-004-forecast-path-restructure.md, drafts/decisions/ADR-005-fit-production-endpoint.md.
- Visual: /data/processing/tmp/tmi-explainers/output/ch_02_forecast.png.