# Concept: Walk-Forward Cross-Validation

## What it is
Walk-forward CV is the hindcast pipeline's out-of-sample evaluation strategy.
Unlike random k-fold splits, it respects temporal ordering: fold k trains on all
data strictly before season_year = k, then predicts only on year k. Each fold
sees a strictly growing training set — the "expanding window" property — which mirrors
how the model would have been deployed in production.
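A minimal sketch of the expanding-window property on toy data (the column names here are illustrative, not the pipeline's actual schema):

```python
import pandas as pd

# Toy fit data spanning five seasons (illustrative schema).
df = pd.DataFrame({"year": [2016, 2017, 2018, 2019, 2020],
                   "yield_kg_ha": [3.1, 2.9, 3.4, 3.0, 3.2]})

test_years = [2018, 2019, 2020]
folds = {}
for ty in test_years:
    train = df[df["year"] < ty]    # all data strictly before the test year
    test = df[df["year"] == ty]    # only the test year itself
    folds[ty] = sorted(train["year"])

# Each fold's training set strictly contains the previous fold's:
# 2018 trains on {2016, 2017}; 2019 adds 2018; 2020 adds 2019.
```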
The held-out years are config.experiment_protocol.test_years (e.g.
[2018, 2019, 2020, 2021, 2022, 2023]). After all CV folds complete, a final
production fit trains on every available year (no holdout) — the model consumed
by the forecast pipeline.
Walk-forward CV serves two purposes:
- Honest OOS scoring. Because each fold's predictions were never seen during training, the resulting metrics (RMSE, skill score) are honest estimates of real-world skill.
- Conformal calibration data. Per-fold OOS residuals (`obs − sim`) feed the three `hindcast_oos_*` residual modes; `in_sample_pooled` is the fallback for `fit_production`-only run_dirs. See `conformal_modes.md`.
## Where it lives in the code

| Component | Location | Purpose |
|---|---|---|
| `ExpandingFoldGenerator` | `run/experiment_protocol.py:110` | Yields `(fold_label, train_data, test_data, year_data, references_fold)` per fold |
| `run_walk_forward` | `run/runner.py:27` | Iterates folds; accumulates rolling preds in memory; writes once per fold |
| `run_experiment` | `run/experiment_protocol.py:22` | Fits detrender + model for one fold; writes `train_preds.parquet` |
| `_run_walk_forward_phase` | `stages/run_hindcast.py` | Sets up generator, calls `run_walk_forward` |
| `_run_production_fit_phase` | `stages/run_hindcast.py` | Trains with `fold_label="production"` on all data |
## ExpandingFoldGenerator

`ExpandingFoldGenerator.generate_folds` (`experiment_protocol.py:135`) yields, per test year:
```python
train_data = self.fit_df[self.fit_df["year"] < ty]   # strictly expanding
test_data = self.fit_df[self.fit_df["year"] == ty]
year_data = self.pred_df[self.pred_df["year"] == ty]
yield str(ty), train_data, test_data, year_data, references_fold
```
The fold label is the string representation of the test year (e.g. "2022").
year_data contains all init_date values in hindcast_init_season_doys and is
the input to the within-fold rolling prediction sweep.
AbstractFoldGenerator (experiment_protocol.py:92) is the ABC; only the expanding
strategy is used in production.
## Rolling prediction sweep within each fold
After fitting via run_experiment, run_walk_forward calls _predict_fold_rolling
(runner.py:86) which:
- Loads the fold's artefacts from disk (`load_model`, `load_detrender`, `load_feature_fill_values`).
- Iterates `sorted(year_data['init_date'].unique())`.
- For each `init_date`, detrends the slice, runs the regression kernel, and fills positional masks on an in-memory copy of `year_data`.
- Converts to the wide `walk_forward_preds` schema and writes once per fold (`runner.py:74`).
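The steps above can be sketched as follows; the predict stub and function shape are assumptions, since the real loop (with artefact loading and mask filling) lives at `runner.py:86`:

```python
import pandas as pd

def predict_one(slice_df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for detrend + regression kernel + mask filling."""
    out = slice_df.copy()
    out["pred"] = 0.0
    return out

def predict_fold_rolling(year_data: pd.DataFrame) -> pd.DataFrame:
    frames = []
    for init_date in sorted(year_data["init_date"].unique()):
        frames.append(predict_one(year_data[year_data["init_date"] == init_date]))
    # Accumulate every init_date in memory; the caller writes once per fold.
    return pd.concat(frames, ignore_index=True)

year_data = pd.DataFrame({"init_date": ["2022-04-01", "2022-05-01", "2022-04-01"],
                          "region": ["A", "A", "B"]})
preds = predict_fold_rolling(year_data)
```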
The runner does NOT route through stages/run_predict.run_predict() because that
function has blind-overwrite semantics (run_predict.py:334): K sequential calls
would destroy K-1 init_dates on disk, leaving only the last 1/K of the rows.
The rationale is documented at runner.py:36–49.
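A toy illustration of why blind-overwrite semantics are incompatible with K sequential per-init-date writes (file paths are elided; only the row-count arithmetic matters):

```python
import pandas as pd

# Per-init-date prediction frames for one fold (K = 3 here).
preds_by_init = [
    pd.DataFrame({"init_date": ["2022-04-01"], "pred": [3.1]}),
    pd.DataFrame({"init_date": ["2022-05-01"], "pred": [3.0]}),
    pd.DataFrame({"init_date": ["2022-06-01"], "pred": [3.3]}),
]

# Blind overwrite: each call replaces the file, so only the last
# call's rows (1/K of the fold) would survive on disk.
blind_overwrite_result = preds_by_init[-1]

# Accumulate-then-write-once: all K init_dates reach disk together.
write_once_result = pd.concat(preds_by_init, ignore_index=True)
```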
## Production fit
After all CV folds, _run_production_fit_phase calls run_experiment with
fold_label="production" and train_data = fit_df (all available years). The
production HindcastSlice:
- Is not in `ExperimentResult.hindcast_slices` (the CV tuple); it is accessed via `ExperimentResult.production`.
- Is required before any `ForecastSlice` can delegate trained artefacts via `ForecastSlice.training`.
- Has `obs_yield_kg_ha = NaN` in `walk_forward_preds` (no ground truth at prediction time).
The fit_production fast-path (run_hindcast.py:229) runs only the production fit —
no walk-forward CV, no postprocess, no evaluate, no deliver. It is the entry point
when only in_sample_pooled calibration is required.
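A hedged sketch of the production-fit contrast with a CV fold; the column and label names follow the text, but the frames are toy data:

```python
import numpy as np
import pandas as pd

fit_df = pd.DataFrame({"year": [2016, 2017, 2018, 2019],
                       "obs_yield_kg_ha": [3.0, 2.8, 3.2, 3.1]})

# CV fold "2019": strict temporal holdout.
cv_train = fit_df[fit_df["year"] < 2019]

# fold_label="production": no holdout, every available year.
production_train = fit_df

# The production slice's walk_forward_preds carries NaN observations,
# since there is no ground truth at prediction time.
production_preds = pd.DataFrame({"year": [2024],
                                 "obs_yield_kg_ha": [np.nan],
                                 "pred": [3.15]})
```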
## Key invariants

- Training data for fold `k` contains only years `< k` — strict filter at `experiment_protocol.py:140`.
- Walk-forward rolling predictions are accumulated in memory and written to disk once per fold (`runner.py:74`).
- `run_experiment` fits the model and writes `train_preds.parquet` but does NOT run the per-init-date prediction sweep — that is `_predict_fold_rolling`.
- DESIGN.md line 100: "no in-memory state crosses a stage boundary." The runner loop is single-stage; cross-stage handoff happens via disk.
## How it interacts with the pipeline
Walk-forward CV is entirely a hindcast concern; the forecast pipeline does not iterate
folds. Artefacts from each CV fold are encapsulated in HindcastSlice.
The complete set forms the ExperimentResult aggregate root.
After all folds, postprocess_experiment pools CV residuals for conformal calibration
(fit_and_save_all_configured, run_meta_models.py:85).
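A sketch of pooling per-fold OOS residuals into a single calibration set. The quantile step follows the standard split-conformal recipe and is an assumption, not necessarily the exact implementation in `run_meta_models.py`:

```python
import numpy as np

# Per-fold OOS residuals (obs − sim), keyed by fold label (toy values).
fold_residuals = {
    "2021": np.array([0.12, -0.05, 0.30]),
    "2022": np.array([-0.20, 0.08]),
    "2023": np.array([0.15, -0.10, 0.02]),
}
pooled = np.concatenate(list(fold_residuals.values()))

# Standard split-conformal half-width for a symmetric 90% interval:
# finite-sample-corrected quantile of the absolute residuals.
alpha = 0.1
n = pooled.size
level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
half_width = np.quantile(np.abs(pooled), level)
```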
See the hindcast pipeline (forward ref: ../pipelines/hindcast.md) for the end-to-end walkthrough.
## Pitfalls and historical bugs
Blind-overwrite in write_walk_forward_outputs: run_predict.write_walk_forward_outputs
replaces the entire parquet on each call (run_predict.py:334). The runner avoids
routing through it for walk-forward precisely to prevent K sequential calls leaving only
the last init_date on disk (runner.py:36–49).
## Related entities and concepts

- `HindcastSlice` — per-fold artefact handle
- `ExperimentResult` — discovers CV slices on load
- `conformal_modes.md` — how CV residuals feed calibration
- `hindcast_vs_forecast.md` — hindcast/forecast separation
## PRs and commits
No single PR introduced walk-forward CV; the mechanism predates the PR history captured
here. runner.py and experiment_protocol.py were co-located in the restructure
surrounding the captured PRs.
## Open questions

- `ExpandingFoldGenerator` is the only concrete implementation; fixed-window variants could be valuable but are not currently registered.
- `ExpandingFoldGenerator` yields `references_fold` per fold; its schema and contract (`{spec.name: DataFrame}`) are not documented in the entity pages.