# HindcastSlice
## Definition
HindcastSlice is the per-fold artefact handle inside the commodity_hindcast pipeline. It is a frozen dataclass that holds all on-disk paths for one walk-forward cross-validation fold (or the production fold), and exposes typed loaders for every artefact relevant to that fold. It satisfies the AbstractSlice protocol alongside ForecastSlice.
Kind: Frozen dataclass (`@dataclass(frozen=True, kw_only=True)`). Entity inside the `ExperimentResult` aggregate; not independently addressable without its `run_dir`.
Source of truth: market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:112.
## Key attributes
### Identity fields
| Field | Type | Notes |
|---|---|---|
| `run_dir` | Path \| CloudPath | Experiment root; primary identity anchor |
| `fold_label` | `str` | Either a digit string (e.g. `"2020"`) or `"production"` |
### Path fields (all `Path | CloudPath`)
| Field | On-disk location (relative to `run_dir`) | Written by |
|---|---|---|
| `train_preds_path` | `preds/{key}/{fold}/train_preds.parquet` | FIT stage (`run_fit.py`) |
| `model_path` | `models/{key}/{fold}/` (directory) | FIT stage |
| `detrender_path` | `models/{key}/{fold}/detrender.pkl` | FIT stage |
| `feature_fill_values_path` | `models/{key}/{fold}/feature_fill_values.parquet` | FIT stage |
| `year_data_path` | `preds/{key}/{fold}/year_data.parquet` | PREDICT stage |
| `walk_forward_preds_path` | `preds/{key}/{fold}/walk_forward_preds.parquet` | PREDICT stage |
| `bias_corrector_path` | `postprocessed/{key}/{fold}/bias_corrector.pkl` | POSTPROCESS stage |
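Every path in the table is a pure function of `run_dir`, the key and the fold label. The sketch below rebuilds the layout under two assumptions: the relative locations are joined onto `run_dir`, and `{key}` is an opaque string whose meaning is not defined on this page. `slice_paths` and the example key `"maize_us"` are hypothetical; this is not the real factory.

```python
from pathlib import Path


def slice_paths(run_dir: Path, key: str, fold_label: str) -> dict[str, Path]:
    """Hypothetical helper: build the per-fold artefact paths from the documented layout."""
    preds = run_dir / "preds" / key / fold_label
    models = run_dir / "models" / key / fold_label
    post = run_dir / "postprocessed" / key / fold_label
    return {
        # FIT stage
        "train_preds_path": preds / "train_preds.parquet",
        "model_path": models,  # a directory, not a single file
        "detrender_path": models / "detrender.pkl",
        "feature_fill_values_path": models / "feature_fill_values.parquet",
        # PREDICT stage
        "year_data_path": preds / "year_data.parquet",
        "walk_forward_preds_path": preds / "walk_forward_preds.parquet",
        # POSTPROCESS stage (optional)
        "bias_corrector_path": post / "bias_corrector.pkl",
    }


paths = slice_paths(Path("/runs/exp1"), "maize_us", "2020")
print(paths["detrender_path"])  # /runs/exp1/models/maize_us/2020/detrender.pkl
```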
### Computed properties
| Property | Returns | Notes |
|---|---|---|
| `cutoff` | `date` | `date(int(fold_label), 1, 1)` for numeric folds; `date(feature_end_year + 1, 1, 1)` for the production fold (results_slice.py:151) |
| `features_fit_path` | Path \| CloudPath | `{features_dir}/{key}/fit.parquet`; shared across all folds (results_slice.py:164) |
| `features_pred_path` | Path \| CloudPath | `{features_dir}/{key}/pred.parquet`; shared across all folds (results_slice.py:169) |
| `has_bias_corrector` | `bool` | Existence check on `bias_corrector_path` (results_slice.py:174) |
Note: feature parquets are shared across all slices in a run; they live in `cfg.features_dir/{key}/`, not under `run_dir/`. Every artefact in the path-field table, including the trained model, detrender and fill values, is fold-unique.
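The cutoff rule and the shared feature locations can be restated as two small functions. This sketch assumes `feature_end_year` is passed in explicitly (how the real property obtains it is not shown on this page); `fold_cutoff` and `feature_paths` are hypothetical names, not the real properties.

```python
from datetime import date
from pathlib import Path


def fold_cutoff(fold_label: str, feature_end_year: int) -> date:
    """Restates the documented cutoff rule (sketch, not the real property)."""
    if fold_label == "production":
        # Production fold: 1 January of the year after the last feature year.
        return date(feature_end_year + 1, 1, 1)
    # Numeric fold: 1 January of the fold year.
    return date(int(fold_label), 1, 1)


def feature_paths(features_dir: Path, key: str) -> tuple[Path, Path]:
    """Shared feature parquets; fold_label plays no part in these paths."""
    base = features_dir / key
    return base / "fit.parquet", base / "pred.parquet"


print(fold_cutoff("2020", feature_end_year=2024))        # 2020-01-01
print(fold_cutoff("production", feature_end_year=2024))  # 2025-01-01
```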
## Loaders
| Method | Returns | Notes |
|---|---|---|
| `load_train_preds()` | `pd.DataFrame` | Wide training-prediction frame |
| `load_walk_forward_preds()` | `pd.DataFrame` | 8-column rolling predictions over all `init_date`s for the fold year |
| `load_year_data()` | `pd.DataFrame` | Raw per-init prediction slice for diagnostics |
| `load_model()` | `AbstractRegressionImpl` | Probes `PcaRidgeRegressor`, `RidgeRegressor`, `XGBRegressor` in sequence; raises `RuntimeError` if none succeeds (results_slice.py:182) |
| `load_detrender(config)` | `AbstractDetrend` | Dispatches on the `config.model.detrend` key; raises `ValueError` on an unknown key (results_slice.py:243) |
| `load_feature_fill_values()` | `pd.Series` | Indexed by feature name; fill values fitted at FIT time |
| `load_test_slice(fold_year)` | `pd.DataFrame` | End-of-season rows (latest `init_date`) for the given fold year; used by diagnostics and the benchmark grid |
## Factory
`HindcastSlice.from_config` (results_slice.py:132) is used by stage code (FIT, PREDICT) that has the resolved `ExperimentConfig` in scope. Discovery via `ExperimentResult.from_run_dir` builds slices directly, without calling `from_config`.
## Lifecycle
Created: Returned by run_fit.train(...) after artefacts are written (run_fit.py → HindcastSlice.from_config). Also discovered lazily by ExperimentResult.from_run_dir.
Populated:
1. train_preds_path, model_path, detrender_path, feature_fill_values_path — written atomically by FIT stage.
2. walk_forward_preds_path, year_data_path — written atomically by PREDICT stage.
3. bias_corrector_path — written by POSTPROCESS stage (optional; guarded by has_bias_corrector).
Consumed:
- PREDICT loads detrender, model, feature_fill_values.
- POSTPROCESS loads walk_forward_preds.
- EVALUATE reads predictions via load_walk_forward_preds, load_test_slice.
- DELIVER reads predictions via load_walk_forward_preds.
- FORECAST delegates trained artefacts from the production slice via ForecastSlice.training.
Destroyed: Never explicitly; artefacts are overwritten on stage re-run (atomic overwrite semantics).
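"Written atomically" and "atomic overwrite semantics" are commonly implemented by writing to a temporary file in the same directory and renaming it over the target, so readers see either the old file or the new one, never a partial write. The sketch below shows that pattern with the standard library. Whether the stages use exactly this mechanism, and how it maps onto `CloudPath` targets where rename semantics differ, is an assumption. `atomic_write_bytes` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, data: bytes) -> None:
    """Replace target with data so that a reader never observes a half-written file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=f".{target.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # flush the bytes to disk before the rename
        os.replace(tmp_name, target)  # atomic on POSIX when source and target share a filesystem
    except BaseException:
        # On failure, drop the temp file; the previous target (if any) is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "preds" / "k" / "2020" / "walk_forward_preds.parquet"
    atomic_write_bytes(target, b"first run")
    atomic_write_bytes(target, b"re-run")  # a stage re-run simply replaces the file
    print(target.read_bytes())  # b're-run'
```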
## Walk-forward preds schema
The `walk_forward_preds.parquet` file written by PREDICT has 8 columns (from results_slice.py:219–226):
`geo_identifier`, `year`, `init_date`, `sim_yield_kg_ha`, `sim_yield_kg_ha_detrended`, `obs_yield_kg_ha`, `area_harvested_ha`, `crop_type`
All yield values are in kg/ha — delivery-unit conversion happens only at the delivery/ boundary.
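A cheap guard a consumer could run before relying on this schema is to compare the frame's columns with the documented eight. The sketch accepts any iterable of column names (for example `df.columns` of the frame returned by `load_walk_forward_preds()`), so it needs no pandas import. The function name is hypothetical, and the check covers names only, not dtypes or units.

```python
from typing import Iterable

WALK_FORWARD_COLUMNS = (
    "geo_identifier",
    "year",
    "init_date",
    "sim_yield_kg_ha",
    "sim_yield_kg_ha_detrended",
    "obs_yield_kg_ha",
    "area_harvested_ha",
    "crop_type",
)


def check_walk_forward_columns(columns: Iterable[str]) -> None:
    """Raise ValueError if the columns differ from the documented 8-column schema."""
    actual = list(columns)
    missing = [c for c in WALK_FORWARD_COLUMNS if c not in actual]
    extra = [c for c in actual if c not in WALK_FORWARD_COLUMNS]
    if missing or extra:
        raise ValueError(f"walk_forward_preds schema mismatch: missing={missing}, extra={extra}")


check_walk_forward_columns(list(WALK_FORWARD_COLUMNS))  # passes silently
```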
## Production fold
The production `HindcastSlice` (`fold_label="production"`) is special:
- Trained on all available data (no holdout).
- Its `obs_yield_kg_ha` column in `walk_forward_preds` is `NaN` (no ground truth is available at prediction time).
- Not included in `ExperimentResult.hindcast_slices`; accessed via `ExperimentResult.production` or the `production_hindcast_slice` helper (results_slice.py:271).
- Required to exist before any `ForecastSlice` can access trained artefacts.
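Keeping the production slice out of `hindcast_slices` amounts to partitioning slices on `fold_label`. The sketch below mirrors that split under stated assumptions: `SliceStub` and `split_slices` are hypothetical, sorting the numeric folds by year is an illustrative choice, and requiring exactly one production slice is an assumption drawn from "required to exist". It says nothing about how `production_hindcast_slice` is actually implemented.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SliceStub:
    """Hypothetical stand-in carrying only the field the split needs."""

    fold_label: str


def split_slices(slices: Sequence[SliceStub]) -> tuple[list[SliceStub], SliceStub]:
    """Return (numeric folds sorted by year, production slice); fail if production is absent."""
    hindcast = sorted(
        (s for s in slices if s.fold_label != "production"),
        key=lambda s: int(s.fold_label),  # numeric folds are digit strings such as "2020"
    )
    production = [s for s in slices if s.fold_label == "production"]
    if len(production) != 1:
        # Assumption: forecasting needs exactly one production fold to borrow artefacts from.
        raise LookupError(f"expected exactly one production slice, found {len(production)}")
    return hindcast, production[0]


folds, prod = split_slices([SliceStub("production"), SliceStub("2021"), SliceStub("2019")])
print([s.fold_label for s in folds])  # ['2019', '2021']
```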
## Relationships
| Relationship | Entity | Notes |
|---|---|---|
| Child of | `ExperimentResult` | Not independently addressable; `run_dir` is the identity anchor |
| Implements | `AbstractSlice` protocol | Symmetric surface shared with `ForecastSlice` |
| Delegates to | Nothing | All artefacts are owned by this slice (contrast: `ForecastSlice` delegates to this one) |
| Depended on by | `ForecastSlice` | The production fold provides trained artefacts via `.training` |
| Carries | Detrender | Loaded from `detrender.pkl` |
| Carries | Regressor | Loaded from `model.*` |
## Concepts and pipelines (forward refs to P5)
- Concept: Walk-forward CV (fold generation and the `"production"` label)
- Concept: AbstractSlice protocol (symmetric surface with `ForecastSlice`)
- Pipeline: Hindcast pipeline (FIT → PREDICT stages produce slice artefacts)
## PRs and commits
| PR / commit | Relevance |
|---|---|
| PR-339 | Moved HindcastSlice from steps/experiment_result.py to lib/results/results_slice.py; broke the import cycle by co-locating with ForecastSlice in the same module |
## Open questions
- `load_model()` tries all three regressor classes in sequence without consulting the config. An earlier version consulted `config.model.regression`; the probe-all approach was adopted to handle heterogeneous run directories. The trade-off is that a corrupted model file silently falls through to the next class before the loader raises.
- `load_test_slice` assumes "end-of-season" means the latest `init_date` row. This is correct for current usage but could be wrong if a fold has `init_date`s beyond harvest (e.g. long-range wheat folds).
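The end-of-season concern can be made concrete. The sketch works on plain row dictionaries instead of a `pd.DataFrame` and is not the real `load_test_slice`. It takes "latest" as the single maximum `init_date` across the fold year, and the `not_after` parameter is a hypothetical illustration of one way to exclude init dates past harvest, not part of the current API. The rows are made-up example data.

```python
from datetime import date
from typing import Iterable, Optional


def end_of_season_rows(
    rows: Iterable[dict],
    fold_year: int,
    not_after: Optional[date] = None,
) -> list[dict]:
    """Keep the fold-year rows whose init_date equals the latest one (optionally capped)."""
    in_year = [r for r in rows if r["year"] == fold_year]
    if not_after is not None:
        # Hypothetical guard: ignore init dates after the given cap (e.g. harvest).
        in_year = [r for r in in_year if r["init_date"] <= not_after]
    if not in_year:
        return []
    latest = max(r["init_date"] for r in in_year)
    return [r for r in in_year if r["init_date"] == latest]


rows = [
    {"geo_identifier": "A", "year": 2020, "init_date": date(2020, 8, 1)},
    {"geo_identifier": "A", "year": 2020, "init_date": date(2020, 10, 1)},
    {"geo_identifier": "A", "year": 2020, "init_date": date(2021, 2, 1)},  # past harvest
]
print(end_of_season_rows(rows, 2020)[0]["init_date"])  # 2021-02-01
print(end_of_season_rows(rows, 2020, not_after=date(2020, 12, 31))[0]["init_date"])  # 2020-10-01
```

Without a cap, the post-harvest row wins the "latest" comparison, which is exactly the failure mode described for long-range folds.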