commodity_hindcast: Overview
What it does
commodity_hindcast produces within-season yield forecasts for five agricultural commodities — US corn, US soybeans, US cotton, US wheat, and Brazilian soybeans — at three geographic admin levels (ADM0 country, ADM1 state, ADM2 county). The package operates in two modes that share a single artefact tree: hindcast mode runs a walk-forward cross-validation loop across past seasons, producing the audit-grade time series delivered to clients; forecast mode issues a point-in-time prediction for a specific (season_year, init_date) pair, reusing the production model fitted by the hindcast run. Every run is reproducible: configuration is validated atomically at load time, every artefact is written to a timestamped run_dir under INPUT_DATA_DIR/runs/, and MLflow tracks params and artefacts against each invocation.
Outputs are client-facing CSVs — three per run, one per ADM level — carrying predicted mean yield, conformal prediction intervals at five coverage levels (50/68/80/90/95), and benchmark columns sourced from NASS, WASDE, and CONAB. The schema is validated row-by-row by DeliveryRow on construction so corrupt deliveries are caught before they leave the pipeline.
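Validating "row-by-row on construction" can look like the minimal sketch below, which uses a stdlib dataclass in place of the real model. `DeliveryRowSketch`, its field names, and its two checks (every interval contains the point prediction; wider coverage levels nest around narrower ones) are hypothetical. The actual `DeliveryRow` schema lives in `delivery/` and may check different things.

```python
from dataclasses import dataclass

# Coverage levels carried on every delivery row, narrowest first (50/68/80/90/95).
COVERAGE_LEVELS = (50, 68, 80, 90, 95)


@dataclass
class DeliveryRowSketch:
    """Hypothetical stand-in for DeliveryRow that validates itself on construction."""

    geo_identifier: str
    season_year: int
    predicted_yield: float
    # coverage level (%) -> (lower, upper) interval bounds, in delivery units
    intervals: dict[int, tuple[float, float]]

    def __post_init__(self) -> None:
        if set(self.intervals) != set(COVERAGE_LEVELS):
            raise ValueError(f"expected intervals for {COVERAGE_LEVELS}, got {sorted(self.intervals)}")
        prev_lo = prev_hi = self.predicted_yield
        for level in COVERAGE_LEVELS:
            lo, hi = self.intervals[level]
            # Every interval must contain the point prediction ...
            if not lo <= self.predicted_yield <= hi:
                raise ValueError(f"{level}% interval ({lo}, {hi}) excludes the prediction")
            # ... and a higher coverage level must never be narrower than a lower one.
            if lo > prev_lo or hi < prev_hi:
                raise ValueError(f"{level}% interval is narrower than the previous level")
            prev_lo, prev_hi = lo, hi
```

Because the checks run in `__post_init__`, a malformed row fails where it is built, before any CSV is written.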
How it does it (10-line tour)
- Config load: `ExperimentConfig` (pydantic-settings) validates the full nested config atomically. All `ResolvablePath` fields resolve against `INPUT_DATA_DIR`. See ExperimentConfig.
- Preflight: stage-specific `preflight_paths_for_<stage>()` checks abort the run before any compute if required inputs are absent. See Pipeline: preflight.
- Features: five builder functions (yields, weather, climo, NDVI, stress) produce `fit.parquet` and `pred.parquet` keyed by `(year, geo_identifier, init_date)`. See Pipeline: feature_build.
- FIT (hindcast walk-forward): an `ExpandingFoldGenerator` walks `test_years`; each fold fits a `Detrender` + `Regressor` and writes predictions to `preds/{commodity}/{fold_label}/`. The production fold is a separate no-holdout fit on all data. See Pipeline: fit.
- PREDICT: fold models score `pred.parquet` to produce one `walk_forward_preds.parquet` per fold. See Pipeline: predict.
- POSTPROCESS: per-fold bias correctors (NASS coverage gap) and conformal calibration produce `postprocessed/{commodity}_national.parquet` and per-mode `conformal/{mode}.parquet` sidecars. See Pipeline: postprocess.
- EVALUATE: metrics and plots are written to `reports/` and logged as MLflow artefacts. See Pipeline: evaluate.
- DELIVER: `HindcastDelivery` / `DeliveryRow` assemble the three client CSVs (one per ADM level), converting internal kg/ha to bu/ac or lbs/ac. See Pipeline: deliver.
- FORECAST (sibling): `run_forecast.run()` builds weather-spliced features, scores them with the production model, postprocesses, and delivers to `forecast/{season_year}/{init_date}/delivery/`. See Pipeline: forecast.
- Dashboard: a Streamlit app reads run-dir CSVs directly at startup; no API layer exists between the app and the artefact tree. See Pipeline: dashboard.
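The DELIVER unit conversion is plain arithmetic once a bushel weight is fixed. The sketch below uses the exact SI definitions of the pound and the acre and the standard USDA test weights (56 lb/bu for corn, 60 lb/bu for soybeans and wheat). The function names are hypothetical, and which commodities the package reports in lbs/ac (cotton is the obvious candidate) is an assumption, not something this page states.

```python
# Exact definitions: 1 lb = 0.45359237 kg, 1 acre = 0.40468564224 ha.
KG_PER_LB = 0.45359237
HA_PER_ACRE = 0.40468564224

# Standard USDA test weights in lb per bushel. Assumed mapping; the package's
# own table may differ or cover more commodities.
LB_PER_BUSHEL = {"corn": 56.0, "soybeans": 60.0, "wheat": 60.0}


def kg_ha_to_lbs_ac(yield_kg_ha: float) -> float:
    """Convert kg/ha to lbs/ac: kg -> lb, then per-hectare -> per-acre."""
    return yield_kg_ha / KG_PER_LB * HA_PER_ACRE


def kg_ha_to_bu_ac(yield_kg_ha: float, crop: str) -> float:
    """Convert kg/ha to bu/ac using the crop's bushel weight."""
    return kg_ha_to_lbs_ac(yield_kg_ha) / LB_PER_BUSHEL[crop]
```

One kg/ha is about 0.892 lbs/ac, so 3,500 kg/ha of soybeans comes out at roughly 52 bu/ac.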
Architecture at a glance
```mermaid
flowchart LR
CONFIG[EXPERIMENT_CONFIG\npydantic-settings root]
PREFLIGHT[PREFLIGHT\nCheck list gate]
FEATURES[FEATURE_BUILD\nfit + pred parquets]
FIT[FIT\nDetrender + Regressor\nper fold]
PREDICT[PREDICT\nwalk_forward_preds\nper fold]
POSTPROCESS[POSTPROCESS\nbias_corrector + conformal\nnational.parquet]
EVALUATE[EVALUATE\nmetrics + plots]
DELIVER[DELIVER\nDeliveryRow CSVs]
FORECAST[FORECAST\nweather splice\nper season_year init_date]
RUNDIR[(RUN_DIR\nartefact tree)]
CONFIG --> PREFLIGHT
CONFIG --> FEATURES
PREFLIGHT --> FEATURES
FEATURES --> FIT
FIT --> PREDICT
PREDICT --> POSTPROCESS
POSTPROCESS --> EVALUATE
POSTPROCESS --> DELIVER
RUNDIR -. read-only .-> FORECAST
FORECAST --> RUNDIR
FIT --> RUNDIR
PREDICT --> RUNDIR
POSTPROCESS --> RUNDIR
DELIVER --> RUNDIR
```
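Read as a dependency graph, the solid stage-to-stage edges above form a DAG, so any topological sort yields a valid stage order. The sketch below restates the diagram with the stdlib `graphlib` module; it is an illustration only, not the package's scheduler, and `STAGE_DEPENDENCIES` is a hypothetical name.

```python
from graphlib import TopologicalSorter

# Solid edges from the diagram, as {stage: set of upstream stages}.
# RUN_DIR edges are omitted: they describe artefact reads and writes, and
# RUN_DIR -> FORECAST -> RUN_DIR would otherwise form a cycle.
STAGE_DEPENDENCIES = {
    "PREFLIGHT": {"CONFIG"},
    "FEATURE_BUILD": {"CONFIG", "PREFLIGHT"},
    "FIT": {"FEATURE_BUILD"},
    "PREDICT": {"FIT"},
    "POSTPROCESS": {"PREDICT"},
    "EVALUATE": {"POSTPROCESS"},
    "DELIVER": {"POSTPROCESS"},
}

# static_order() yields each stage only after all of its upstream stages.
execution_order = list(TopologicalSorter(STAGE_DEPENDENCIES).static_order())
```

EVALUATE and DELIVER depend only on POSTPROCESS, so either may come first; nothing in the diagram orders them relative to each other.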
The five aggregates
| Aggregate root | Owned children | Persistence | Entity page |
|---|---|---|---|
| `ExperimentConfig` | `CommodityConfig`, `ModelConfig`, `ExperimentProtocolConfig`, `PostprocessConfig`, `DeliveryConfig`, `ForecastConfig` (optional), `list[ReferenceYieldSpec]` | `run_dir/config_resolved.yaml` | ExperimentConfig |
| `ExperimentResult` | `tuple[HindcastSlice, …]`, `tuple[ForecastSlice, …]` | Filesystem tree at `run_dir_base/<timestamp>_<name>/` | ExperimentResult |
| `HindcastSlice` | `detrender.pkl`, `model.*`, `feature_fill_values.parquet`, `train_preds.parquet`, `walk_forward_preds.parquet`, `year_data.parquet`, optional `bias_corrector.pkl` | `run_dir/models/{commodity}/{fold_label}/` and `run_dir/preds/{commodity}/{fold_label}/` | HindcastSlice |
| `ForecastSlice` | `indices.zarr`, `features/pred.parquet`, `walk_forward_preds.parquet`, `postprocessed/national.parquet`, `delivery/*.csv` | `run_dir/forecast/{season_year}/{init_date}/` | ForecastSlice |
| `HindcastDelivery` | `list[DeliveryRow]`, `generated_date` | `delivery/Treefera_{commodity}_{ADM}_Hindcast_{YYYYMMDD}.csv` (three files per run) | HindcastDelivery |
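The `HindcastDelivery` filename pattern expands to one file per ADM level. In this sketch `delivery_filenames` is hypothetical, and both the commodity token (`us_corn` below) and the literal `ADM0`/`ADM1`/`ADM2` tokens are assumptions about how the placeholders are filled.

```python
from datetime import date

ADM_LEVELS = ("ADM0", "ADM1", "ADM2")  # country, state, county


def delivery_filenames(commodity: str, generated: date) -> list[str]:
    """Build the three per-run hindcast CSV names, one per ADM level."""
    stamp = generated.strftime("%Y%m%d")
    return [f"Treefera_{commodity}_{adm}_Hindcast_{stamp}.csv" for adm in ADM_LEVELS]
```

For example, `delivery_filenames("us_corn", date(2024, 3, 1))[0]` is `"Treefera_us_corn_ADM0_Hindcast_20240301.csv"`.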
Two further aggregates exist: ExperimentProtocolConfig + Fold schedule (the walk-forward CV schedule embedded in ExperimentConfig, persisted only via config_resolved.yaml) and Check list (ephemeral per-preflight-call gate, never persisted). See AGGREGATES.md.
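The Check list gate can be pictured as a list that is built, evaluated, and discarded within one call, which is why nothing is persisted. `CheckSketch`, `ExistsCapable`, and `run_preflight_sketch` are hypothetical stand-ins for `Check` and `run_preflight()`. The path is typed structurally so that either `pathlib.Path` or `cloudpathlib.AnyPath` fits; how the real checks report failures may differ.

```python
from dataclasses import dataclass
from typing import Protocol


class ExistsCapable(Protocol):
    """Anything with .exists(): pathlib.Path locally, cloudpathlib.AnyPath for S3."""

    def exists(self) -> bool: ...


@dataclass(frozen=True)
class CheckSketch:
    """Hypothetical stand-in for Check: one required input and why a stage needs it."""

    path: ExistsCapable
    reason: str


def run_preflight_sketch(checks: list[CheckSketch]) -> None:
    """Evaluate every check, then abort once, listing all missing inputs together."""
    missing = [c for c in checks if not c.path.exists()]
    if missing:
        details = "\n".join(f"  missing {c.path} ({c.reason})" for c in missing)
        raise RuntimeError(f"preflight failed:\n{details}")
```

Collecting every failure before raising means a single aborted run reports all absent inputs, rather than one per attempt.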
The eleven bounded contexts
| # | Context | Subpackage(s) | Public surface |
|---|---|---|---|
| 1 | Configuration & Orchestration | `config.py`, `cli.py`, `configs/*.yaml`, `lib/path_utils.py`, `lib/calendar.py` | `ExperimentConfig`, `CommodityConfig`, `cli` Click group |
| 2 | Preflight | `run/preflight.py` | `Check`, `run_preflight()`, `preflight_paths_for_*` |
| 3 | Feature Engineering | `features/`, `lib/edit_and_imputation/`, `lib/calendar.py` | `build_features()`, `Builder` protocol, `assemble()` |
| 4 | Experiment & Modelling | `run/`, `stages/run_fit.py`, `stages/run_hindcast.py`, `stages/run_predict.py`, `models/detrend/`, `models/regression/`, `lib/results/` | `ExperimentResult`, `HindcastSlice`, `train()`, `run()` |
| 5 | Post-processing | `stages/run_meta_models.py`, `models/meta_models/` | `postprocess_experiment()`, `AbstractBiasCorrector`, conformal helpers |
| 6 | Evaluation & Diagnostics | `stages/run_diagnostics.py`, `diagnostics/` | `evaluate_experiment()`, `PlotGroup`, `PLOT_REGISTRY`, `gen_metrics` |
| 7 | Delivery | `delivery/`, `stages/run_deliver.py` | `HindcastDelivery`, `DeliveryRow`, `deliver_experiment()` |
| 8 | Forecast | `stages/run_forecast.py`, `features/forecast_weather.py`, `features/forecast_long_range_stub.py` | `ForecastSlice`, `run()`, `materialise_forecast_indices()` |
| 9 | Experiment Tracking | `lib/tracking/` | MLflow run helpers, `metadata_<stage>.yaml` side-channel |
| 10a | Reference Data | `lib/reference_data/` | `load_nass_panel()`, `load_wasde()`, `load_conab()`, `BaseReferenceYieldLoader` |
| 10b | Geo & Identifiers | `lib/geo/`, `delivery/geo_normalise.py` | `GeoIdentifier` NewType, `make_geo_identifier()`, `area_weighted_mean()` |
| 11 | Dashboard | `app/` | `app.py` Streamlit entry point, `run_loader.py` |
See BOUNDED_CONTEXTS.md for the Mermaid context map and full anti-corruption layer analysis.
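The Geo & Identifiers surface pairs a `NewType` identifier with area-weighted aggregation, the usual way to roll county values up to states: $\bar{y} = \sum_i y_i a_i / \sum_i a_i$. In the sketch below, the `-`-joined identifier format and both function names are hypothetical, and which area (planted or harvested) the real `area_weighted_mean()` uses is not stated on this page.

```python
from typing import NewType

# A distinct type for static checkers; at runtime it is just a str.
GeoIdentifier = NewType("GeoIdentifier", str)


def make_geo_identifier_sketch(country: str, *admin_codes: str) -> GeoIdentifier:
    """Join a country code and admin codes into one identifier (hypothetical format)."""
    return GeoIdentifier("-".join((country, *admin_codes)))


def area_weighted_mean_sketch(
    values: dict[GeoIdentifier, float], areas: dict[GeoIdentifier, float]
) -> float:
    """Aggregate child-unit values to a parent unit, weighting each by its area."""
    total_area = sum(areas[g] for g in values)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return sum(values[g] * areas[g] for g in values) / total_area
```

Because `NewType` costs nothing at runtime, mixing a raw string into a `GeoIdentifier` slot is caught only by a type checker, not when the code runs.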
What is distinctive about this package
Atomic config validation at load time. ExperimentConfig inherits pydantic_settings.BaseSettings and is the single config authority for every stage. All nested config blocks (commodity calendar, model selection, postprocess settings, delivery CI levels, forecast paths) are validated by chained pydantic model validators before any compute begins. Every ResolvablePath field anywhere in the nested tree is resolved against data_root (set from INPUT_DATA_DIR) by a single model_validator(mode="after"). A missing env var, an invalid CI level, or a misnamed ReferenceYieldSpec raises a RuntimeError or ValidationError at load time, never mid-run. AGGREGATES.md explains this design choice at length, including the rationale for treating the whole config as one aggregate rather than seven.
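The single post-load resolution pass can be pictured without pydantic. In this sketch `RelPath`, `resolve_all`, `FeatureConfigSketch`, and `ConfigSketch` are hypothetical: a `str` subclass marks the fields that `ResolvablePath` would mark, and `__post_init__` on the root plays the role of `model_validator(mode="after")`. Real `ExperimentConfig` reads `INPUT_DATA_DIR` through pydantic-settings rather than `os.environ` directly.

```python
import os
from dataclasses import dataclass, fields, is_dataclass


class RelPath(str):
    """Marker type standing in for ResolvablePath: resolved once, after loading."""


def resolve_all(node: object, data_root: str) -> None:
    """Walk nested dataclasses, anchoring every relative RelPath at data_root in place."""
    for f in fields(node):
        value = getattr(node, f.name)
        if isinstance(value, RelPath):
            # URIs and absolute paths are already resolved; leave them untouched.
            if "://" not in value and not value.startswith("/"):
                # Plain string join, not pathlib, so an s3:// root keeps its "//".
                setattr(node, f.name, RelPath(f"{data_root.rstrip('/')}/{value}"))
        elif is_dataclass(value) and not isinstance(value, type):
            resolve_all(value, data_root)


@dataclass
class FeatureConfigSketch:
    climatology: RelPath


@dataclass
class ConfigSketch:
    name: str
    features: FeatureConfigSketch

    def __post_init__(self) -> None:
        # Children are already built here, so one pass over the whole tree
        # resolves every path, and a missing env var fails before any compute.
        data_root = os.environ.get("INPUT_DATA_DIR")
        if not data_root:
            raise RuntimeError("INPUT_DATA_DIR is not set")
        resolve_all(self, data_root)
```

Resolving at the root rather than in each child keeps the env-var lookup in one place, which is the property the "one aggregate rather than seven" argument relies on.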
ResolvablePath + AnyPath — transparent S3/local path handling. Every data path in the config is typed as ResolvablePath (a Pydantic custom type resolved at load time) and every downstream consumer uses cloudpathlib.AnyPath so that S3 URIs and local paths are handled identically. Wrapping an S3Path in a bare pathlib.Path() collapses the URI to a local-cache path and drops S3 semantics — the root cause of the QA failure in PR #345. See Concept: s3_path_safety and Concept: input_data_dir_contract.
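The string-level version of the same hazard can be shown with the stdlib alone. It is not identical to the PR #345 failure, which involved an S3Path object being collapsed to a local-cache path, but the root cause is shared: pathlib has no notion of a URI scheme. `join_uri` is a hypothetical helper; inside the package, AnyPath's `/` operator is the supported route.

```python
from pathlib import PurePosixPath

uri = "s3://example-bucket/features/fit.parquet"

# pathlib parses the URI as a POSIX path and collapses the "//" after the
# scheme, so the result is no longer a valid S3 URI.
collapsed = str(PurePosixPath(uri))


def join_uri(base: str, *parts: str) -> str:
    """Hypothetical helper: join URI segments as strings, preserving the scheme."""
    return "/".join([base.rstrip("/"), *(p.strip("/") for p in parts)])
```

Here `collapsed` is `"s3:/example-bucket/features/fit.parquet"`, while `join_uri("s3://example-bucket/", "features", "fit.parquet")` keeps the `s3://` prefix intact.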
Walk-forward CV that mirrors production scoring exactly. The hindcast loop uses ExpandingFoldGenerator to produce one fold per test year. Within each fold the model is fit once on harvest-time data and then reused, unchanged, at every init_date in the season; only the feature matrix changes. The "production" fold is a no-holdout fit on all available data. As a result, out-of-sample (OOS) hindcast accuracy, conformal calibration, and production forecasting all use the identical code path and artefact schema, so the prediction intervals reported on the delivery CSVs are directly grounded in walk-forward residuals. See Concept: walk_forward_cv.
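An expanding-window fold schedule of this kind can be sketched as a generator whose training window grows by one season per fold, followed by the no-holdout production fold. `FoldSketch`, `expanding_folds`, and the `test_<year>` / `production` labels are hypothetical; the real ExpandingFoldGenerator may add constraints (for example a minimum training length) not shown here.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class FoldSketch:
    label: str
    train_years: tuple[int, ...]
    test_year: Optional[int]  # None marks the production (no-holdout) fold


def expanding_folds(years: list[int], test_years: list[int]) -> Iterator[FoldSketch]:
    """Yield one fold per test year, trained only on strictly earlier years,
    then a production fold trained on every available year."""
    ordered = sorted(years)
    for test_year in sorted(test_years):
        train = tuple(y for y in ordered if y < test_year)
        if not train:
            raise ValueError(f"no training years before {test_year}")
        yield FoldSketch(f"test_{test_year}", train, test_year)
    yield FoldSketch("production", tuple(ordered), None)
```

Each fold's model would then score the pred features at every init_date of its test year; only the feature matrix changes between init_dates, which is what lets the walk-forward residuals stand in for production errors.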
Multi-mode conformal calibration. PR #361 introduced CalibrationResult and four residual_mode values: hindcast_oos_per_init_date, hindcast_oos_per_year, hindcast_oos_fully_pooled, and in_sample_pooled. The first element of PostprocessConfig.conformalise is the primary mode used for delivery interval bands; additional elements are saved as diagnostic sidecars under run_dir/conformal/{mode}.parquet. The _OOS_MODES frozenset in run_meta_models.py controls which modes require CV-fold artefacts to be present — it is hand-maintained rather than derived from the ResidualMode Literal, which is a known drift risk. See Concept: conformal_modes and Concept: residual_modes.
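To make the residual modes concrete, the sketch below computes split-conformal half-widths from absolute residuals and shows how the grouping key distinguishes the modes: keying by init_date mimics hindcast_oos_per_init_date, keying by year mimics hindcast_oos_per_year, and a single constant key mimics fully pooled calibration. That mapping is an assumption read off the mode names, not taken from run_meta_models.py, and the package's own quantile rule may differ.

```python
import math
from collections import defaultdict
from typing import Hashable, Iterable


def conformal_half_width(abs_residuals: list[float], coverage: float) -> float:
    """Split-conformal half-width: the ceil((n + 1) * coverage)-th smallest |residual|."""
    n = len(abs_residuals)
    k = math.ceil((n + 1) * coverage)
    if k > n:
        return math.inf  # too few residuals to guarantee this coverage
    return sorted(abs_residuals)[k - 1]


def half_widths_by_group(
    records: Iterable[tuple[Hashable, float]], coverage: float
) -> dict[Hashable, float]:
    """Group (key, residual) pairs and calibrate each group separately."""
    groups: dict[Hashable, list[float]] = defaultdict(list)
    for key, residual in records:
        groups[key].append(abs(residual))
    return {key: conformal_half_width(vals, coverage) for key, vals in groups.items()}
```

Finer grouping adapts the band to, say, early-season uncertainty, but each group has fewer residuals, so high coverage levels can become infinite (uncalibratable) sooner than in the pooled modes.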
Multi-year forecasting per init_date. PR #369 extended the forecast pipeline so that a single init_date can produce forecasts for multiple season_year values. The artefact tree was restructured from forecast/{init_date}/ to forecast/{season_year}/{init_date}/ to avoid path collisions. For season_year values beyond the climatology zarr's coverage the long-range stub (forecast_long_range_stub.py) fills with trailing medians, causing the forecast to collapse to trend-only output — this is a documented stub, not a complete implementation. See Pipeline: multi_year_forecast.
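Two parts of this paragraph can be sketched: the collision-free path layout and a trailing-median fill. Both functions are hypothetical simplifications. In particular, the real stub fills zarr-backed indices rather than a year-to-value dict, and its window length is not documented on this page. A trailing median is the same whatever the target season, which is consistent with forecasts collapsing to trend-only output beyond coverage.

```python
from statistics import median


def forecast_dir(run_dir: str, season_year: int, init_date: str) -> str:
    """Artefact location after PR #369: season_year first, then init_date."""
    return f"{run_dir.rstrip('/')}/forecast/{season_year}/{init_date}"


def climatology_or_stub(climatology: dict[int, float], year: int, window: int = 5) -> float:
    """Return the climatology value for `year`, or a trailing median beyond coverage.

    `window` is a hypothetical parameter; the real stub's window is not stated here.
    """
    if year in climatology:
        return climatology[year]
    earlier = sorted(y for y in climatology if y < year)
    if not earlier:
        raise KeyError(f"no climatology before {year}")
    return median([climatology[y] for y in earlier[-window:]])
```

Under the old forecast/{init_date}/ layout, two season_year values sharing an init_date would have written to the same directory; putting season_year first separates them.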
Key reading paths
As a newcomer
- This page
- domain_model/BOUNDED_CONTEXTS.md
- pipelines/forecast.md or pipelines/predict.md for depth
- wiki/sources/docs/DESIGN.md
- Then dig into specific entity pages
As an LLM ingesting a new source
- wiki/AGENTS.md
- domain_model/ENTITIES.md for the canonical name vocabulary
- The relevant pipeline page
- Append to index.md and log.md
As a maintainer making structural changes
- domain_model/AGGREGATES.md
- domain_model/BOUNDED_CONTEXTS.md
- wiki/concepts/s3_path_safety.md if touching paths
- wiki/concepts/input_data_dir_contract.md if touching configs
Open questions
- `marketing_year` vs `season_year`: WASDE reports on a marketing year (for example, September–August for US corn and soybeans and June–May for US wheat) that is not fully collapsed into `season_year`. The Reference Data → Post-processing seam carries an implicit translation; surfaces in domain_model/BOUNDED_CONTEXTS.md open question 1.
- Conformal helpers in `stages/` vs `lib/`: `delivery/conversions.py` imports the conformal half-width computation from `stages/run_meta_models.py`, creating an upward layer edge from Delivery into Experiment orchestration. The correct home is `lib/`; surfaces in domain_model/BOUNDED_CONTEXTS.md open question 2.
- `included_geo_identifiers` ownership: computed in the FIT stage, persisted to `run_dir/included_geo_identifiers.txt`, and consumed as a required kwarg by three downstream contexts. Whether it should be a first-class property on `ExperimentResult` is unresolved; surfaces in domain_model/BOUNDED_CONTEXTS.md open question 4.
- `_OOS_MODES` frozenset drift risk: the set of residual modes that require CV folds is hand-maintained in `stages/run_meta_models.py` rather than derived from the `ResidualMode` Literal. Adding a new mode without updating this frozenset would silently produce incorrect results.
- Long-range stub completeness: beyond the climatology zarr's coverage, forecasts collapse to trend-only. PR #369 acknowledges this as a stub, but it is not tracked as a formal open issue; surfaces in pipelines/multi_year_forecast.md.
- `CalibrationResult` persistence vs transience: AGGREGATES.md marks it transient, while ENTITIES.md documents its `save`/`load` methods. The discrepancy lies between a conservative hedge and what the source code actually does; surfaces in domain_model/delta_vs_existing.md.
- `build_detrender()` / `build_regressor()` on `ExperimentConfig`: factory methods on the config class are a known misplacement, flagged by a TODO at `config.py:719`; surfaces in entities/ExperimentConfig.md open questions.
- Forecast vs hindcast static mode boundary: mode is determined solely by whether `ExperimentConfig.forecast` is set, so a single run cannot produce both without `run all`; surfaces in domain_model/BOUNDED_CONTEXTS.md open question 3.
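One possible remedy for the `_OOS_MODES` drift risk is to derive the set from the Literal with `typing.get_args`, relying on the `hindcast_oos_` prefix. This is a sketch, not the current code: `OOS_MODES` and `requires_cv_folds` are hypothetical names, and the approach assumes every mode that needs CV-fold artefacts keeps that prefix, which is itself a convention someone has to maintain.

```python
from typing import Literal, get_args

ResidualMode = Literal[
    "hindcast_oos_per_init_date",
    "hindcast_oos_per_year",
    "hindcast_oos_fully_pooled",
    "in_sample_pooled",
]

# Derived rather than hand-maintained: a new "hindcast_oos_*" mode added to
# the Literal is picked up automatically.
OOS_MODES: frozenset[str] = frozenset(
    mode for mode in get_args(ResidualMode) if mode.startswith("hindcast_oos_")
)


def requires_cv_folds(mode: str) -> bool:
    """True if the mode needs walk-forward fold artefacts; rejects unknown modes."""
    if mode not in get_args(ResidualMode):
        raise ValueError(f"unknown residual mode: {mode!r}")
    return mode in OOS_MODES
```

Rejecting unknown strings turns a silent wrong answer into a loud error, which addresses the "silent incorrect result" half of the risk even if the prefix convention is broken.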
Cross-references
- domain_model/ — formal entity-relationship model (entities, aggregates, bounded contexts, ER diagrams)
- wiki/AGENTS.md — maintenance schema; read before writing any wiki page
- wiki/synthesis/thesis.md — editorial thesis on design choices and tensions
- wiki/sources/ — immutable source pages (code, configs, docs, PRs, commits)
- wiki/entities/ — domain entity pages
- wiki/concepts/ — cross-cutting concept pages
- wiki/pipelines/ — stage-by-stage pipeline walkthroughs