Relationships — commodity_hindcast

Naming convention

Each relationship is recorded as: Source-Entity --verb--> Target-Entity (cardinality) with a short description and a code citation. Cardinality notation:

  • 1:1 — one-to-one
  • 1:N — one-to-many
  • 0..1:1 — optional on the source side
  • 0..N — zero-or-more on the source side
  • N:1 — many-to-one (reference)
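
The notation above is regular enough to be parsed mechanically. The following is a minimal sketch, assuming each entry starts with `Source --verb--> Target (cardinality)`; the `Relationship` tuple and `parse_relationship` helper are hypothetical names introduced for illustration and are not part of commodity_hindcast.

```python
import re
from typing import NamedTuple, Optional


class Relationship(NamedTuple):
    source: str
    verb: str
    target: str
    cardinality: str


# Matches "Source --verb--> Target (cardinality)"; any trailing description
# or code citation after the closing parenthesis is ignored.
_PATTERN = re.compile(
    r"^\s*(?P<source>.+?)\s+--(?P<verb>[\w-]+)-->\s+"
    r"(?P<target>.+?)\s+\((?P<cardinality>[^)]*)\)"
)


def parse_relationship(line: str) -> Optional[Relationship]:
    """Return the parsed relationship, or None if the line does not match."""
    match = _PATTERN.match(line)
    if match is None:
        return None
    return Relationship(**match.groupdict())
```

For example, `parse_relationship("ExperimentConfig --has-one--> CommodityConfig (1:1)")` yields the four fields separately, which could be used to recount the entries per section.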

Configuration relationships

  • ExperimentConfig --has-one--> CommodityConfig (1:1) — the experiment is parameterised by exactly one commodity spec; CommodityConfig is a nested required field. config.py:639

  • ExperimentConfig --has-one--> ModelConfig (1:1) — detrend strategy and regression estimator; required with a default. config.py:647

  • ExperimentConfig --has-one--> ExperimentProtocolConfig (1:1) — CV strategy and test-year list; required. config.py:644

  • ExperimentConfig --has-one--> PostprocessConfig (1:1) — bias-corrector and conformal-mode selection; required. config.py:657

  • ExperimentConfig --has-one--> DeliveryConfig (1:1) — CI levels, public model name, narrowing and frozen-tail options; required. config.py:663

  • ExperimentConfig --has-one--> EvaluationConfig (0..1:1) — WASDE benchmark path used only by EVALUATE; embedded inside ExperimentConfig via field. domain-modelling/schema.yaml:308

  • ExperimentConfig --has-many--> ReferenceYieldSpec (0..N) — ordered list of reference-yield specs; empty list is the no-reference sentinel. config.py:654

  • ExperimentConfig --has-one--> ForecastConfig (0..1:1) — optional; None means hindcast-only mode, set means forecast mode. config.py:666

  • PostprocessConfig --has-one--> BiasCorrectorConfig (1:1) — bias-corrector kind and lookback window embedded inside PostprocessConfig. config.py:523

  • CommodityConfig --has-many--> Builder (1:N) — dict of named builders; type discriminator auto-injected from dict key. config.py:337

  • CommodityConfig --has-many--> SeasonWindow (1:N) — separate lists for climo windows and weather windows, each expressing season-DOY aggregation bounds. config.py:311

  • YieldsBuilder --has-many--> EditRuleConfig (0..N) — declarative Fellegi-Holt edits applied before the yields pivot; validated as a discriminated union at YAML parse time. config.py:197

  • ModelConfig --references--> FitAggregationPolicy (derived, 1:1) — weather_correction_fit_level on ModelConfig drives the FitAggregationPolicy constructed by ExperimentConfig.build_fit_aggregation_policy(). config.py:707

  • ReferenceYieldSpec --resolves-to--> ReferenceYieldLoader (N:1, polymorphic) — each spec subclass is registered against a concrete loader; dispatch via ReferenceYieldLoader.from_spec(). lib/reference_data/base_reference_yield_loader.py:97,101
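
The "type discriminator auto-injected from dict key" behaviour of CommodityConfig.builders can be illustrated with a small sketch. It uses stdlib dataclasses rather than the project's pydantic models, and the `WeatherBuilder` class, the `path` and `variables` fields, the `BUILDER_TYPES` mapping and `parse_builders` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class YieldsBuilder:
    type: str
    path: str  # hypothetical field


@dataclass(frozen=True)
class WeatherBuilder:  # hypothetical builder
    type: str
    variables: Tuple[str, ...]  # hypothetical field


# Hypothetical mapping from discriminator value to builder class.
BUILDER_TYPES = {"yields": YieldsBuilder, "weather": WeatherBuilder}


def parse_builders(raw: Dict[str, Dict[str, Any]]) -> Dict[str, object]:
    """Build one builder per dict entry, using the key as the default type."""
    builders = {}
    for name, fields in raw.items():
        # The dict key supplies "type" unless the entry sets it explicitly.
        payload = {"type": name, **fields}
        cls = BUILDER_TYPES.get(payload["type"])
        if cls is None:
            raise ValueError(f"unknown builder type: {payload['type']!r}")
        builders[name] = cls(**payload)
    return builders
```

With this shape, a YAML mapping such as `yields: {path: ...}` needs no explicit `type` entry, which is the convenience the relationship describes.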

Pipeline relationships

  • ExperimentConfig --produces--> ExperimentResult (1:1 per run) — run_hindcast.run() creates a timestamped run_root and writes config_resolved.yaml; the subsequent ExperimentResult.from_run_dir() call discovers the slices. stages/run_hindcast.py:77,91

  • ExperimentResult --composes--> HindcastSlice (1:N) — frozen tuple of per-fold handles; discovered from preds/{commodity}/{fold}/train_preds.parquet on disk. lib/results/run_result.py:65,76

  • ExperimentResult --composes--> ForecastSlice (0..N) — discovered from forecast/{season_year}/{init_date}/preds/walk_forward_preds.parquet. lib/results/run_result.py:96,108

  • ExperimentResult --exposes--> HindcastSlice["production"] (0..1:1) — the production property returns the no-holdout fold via production_hindcast_slice(); None before fit_production runs. lib/results/run_result.py:175

  • Hindcast (run) --produces-many--> HindcastSlice (1:N) — run_walk_forward() generates one slice per test year plus one "production" slice. run/runner.py:52

  • Forecast (run) --produces-many--> ForecastSlice (1:N) — run_forecast.run() creates one ForecastSlice per (season_year, init_date) pair. stages/run_forecast.py:80

  • ForecastSlice --reuses--> HindcastSlice["production"] (N:1) — ForecastSlice.training calls production_hindcast_slice() to reach the production model, detrender and fill values; a ForecastSlice cannot exist without the production slice. lib/results/results_slice.py:410

  • HindcastSlice --persisted-in--> RunDir/models/{commodity}/{fold}/ (1:1) — model_path, detrender_path and feature_fill_values_path all point under this directory. lib/results/results_slice.py:134

  • HindcastSlice --persisted-in--> RunDir/preds/{commodity}/{fold}/ (1:1) — train_preds_path, walk_forward_preds_path and year_data_path all point under this directory. lib/results/results_slice.py:140

  • ForecastSlice --persisted-in--> RunDir/forecast/{season_year}/{init_date}/ (1:1) — the root property resolves the per-init subtree; all forecast artefacts sit under it. lib/results/results_slice.py:324

  • HindcastSlice --has-one--> bias_corrector.pkl (0..1:1) — optional; path from _bias_corrector_path(); has_bias_corrector existence-checks before loading. lib/results/results_slice.py:41,173

  • ForecastSlice --has-one--> bias_corrector.pkl (0..1:1) — postprocessed at forecast time, lives under ForecastSlice.postprocessed_dir. lib/results/results_slice.py:442

  • CalibrationResult --calibrates--> ExperimentResult (N:1) — primary_calibration(experiment, ci_levels) reads all fold residuals from the ExperimentResult's hindcast slices to compute conformal half-widths. stages/run_forecast.py:343

  • ExperimentResult --produces--> HindcastDelivery (1:N) — deliver_experiment() calls walk_forward_preds_to_delivery_rows() for each of the three ADM levels, producing one HindcastDelivery per level. stages/run_hindcast.py:224
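
Several entries above describe slices being discovered from the on-disk layout rather than from an index file. A minimal sketch of that idea, assuming only the `preds/{commodity}/{fold}/train_preds.parquet` layout quoted above; the function names, the `demo` helper and the "corn" commodity are hypothetical and do not reproduce ExperimentResult.from_run_dir().

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def discover_fold_labels(run_dir: Path, commodity: str) -> Tuple[str, ...]:
    """Return fold labels that have a train_preds.parquet on disk, sorted."""
    preds_root = run_dir / "preds" / commodity
    return tuple(
        sorted(p.parent.name for p in preds_root.glob("*/train_preds.parquet"))
    )


def production_label(labels: Tuple[str, ...]) -> Optional[str]:
    """Return "production" if the no-holdout fold exists, else None."""
    return "production" if "production" in labels else None


def demo() -> Tuple[str, ...]:
    """Build a throwaway run directory and discover its folds."""
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        for fold in ("2021", "production", "2020"):
            fold_dir = run_dir / "preds" / "corn" / fold
            fold_dir.mkdir(parents=True)
            (fold_dir / "train_preds.parquet").touch()
        # A fold directory without train_preds.parquet is not discovered.
        (run_dir / "preds" / "corn" / "2022").mkdir()
        return discover_fold_labels(run_dir, "corn")
```

Returning None when no "production" directory exists mirrors the stated behaviour of the production property before fit_production has run.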

Reference-data relationships

  • ExperimentConfig --has-many--> ReferenceYieldSpec (0..N) — the reference_data list; each spec carries a unique name used as a metric/column prefix. config.py:654

  • ReferenceYieldSpec --is-a--> _ReferenceYieldSpecBase — all three concrete spec classes (WasdeRefSpec, ConabFinalRefSpec, ConabLevantamentoRefSpec) subclass the frozen pydantic base. lib/reference_data/base_reference_yield_loader.py:36

  • WasdeRefSpec --registered-to--> WasdeLoader (1:1) — explicit registry entry; ReferenceYieldLoader.from_spec() dispatches to WasdeLoader for this spec type. lib/reference_data/loader.py:51

  • ConabFinalRefSpec --registered-to--> ConabFinalLoader (1:1) — registered at module import. lib/reference_data/loader.py:52

  • ConabLevantamentoRefSpec --registered-to--> ConabLevantamentoLoader (1:1) — registered at module import. lib/reference_data/loader.py:53

  • ReferenceYieldLoader --loads--> Yield (N:M) — each loader reads and returns yield data (kg/ha) keyed by (harvest_year, geo_identifier) for benchmarking. lib/reference_data/base_reference_yield_loader.py:77

  • DeliveryRow --references--> NASS benchmarks (N:1 by ID) — the nass_actual, nass_actual_area_weighted_all and nass_actual_prod_div_area_all fields are populated at delivery time. delivery/schemas.py:140

  • DeliveryRow --references--> WASDE/CONAB benchmark (N:1 by ID) — the optional wasde_in_season, conab_final_in_season and conab_lev_in_season fields are populated from the corresponding ReferenceYieldLoader. delivery/schemas.py:143
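
The spec-to-loader registration described above is a type-keyed registry with a dispatching factory. The sketch below shows one way such a registry could look, using dataclasses in place of the frozen pydantic base; the `register` method and its signature are assumptions, and only two of the three spec types are shown.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class _ReferenceYieldSpecBase:
    name: str  # unique name, used as a metric/column prefix


@dataclass(frozen=True)
class WasdeRefSpec(_ReferenceYieldSpecBase):
    pass


@dataclass(frozen=True)
class ConabFinalRefSpec(_ReferenceYieldSpecBase):
    pass


class ReferenceYieldLoader:
    # Shared registry: spec type -> loader class.
    _registry: Dict[type, type] = {}

    def __init__(self, spec: _ReferenceYieldSpecBase) -> None:
        self.spec = spec

    @classmethod
    def register(cls, spec_type: type, loader_cls: type) -> None:
        """Hypothetical registration hook, called once at import time."""
        cls._registry[spec_type] = loader_cls

    @classmethod
    def from_spec(cls, spec: _ReferenceYieldSpecBase) -> "ReferenceYieldLoader":
        """Dispatch on the exact spec type to the registered loader."""
        try:
            loader_cls = cls._registry[type(spec)]
        except KeyError:
            raise TypeError(
                f"no loader registered for {type(spec).__name__}"
            ) from None
        return loader_cls(spec)


class WasdeLoader(ReferenceYieldLoader):
    pass


# Explicit registry entry, executed when the module is imported.
ReferenceYieldLoader.register(WasdeRefSpec, WasdeLoader)
```

Keying on `type(spec)` rather than `isinstance` keeps each spec class bound to exactly one loader, which matches the 1:1 cardinality recorded for the registered-to relationships.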

Feature-assembly relationships

  • CommodityConfig --drives--> FeatureBuilder (1:N) — each entry in CommodityConfig.builders instantiates a concrete builder module in features/builders/. config.py:337

  • FeatureBuilder --reads--> Yield (N:M) — YieldsBuilder reads the NASS parquet and produces rows keyed by (year, geo_identifier, init_date). config.py:187

  • FeatureBuilder --feeds--> fit.parquet/pred.parquet (1:1 per run) — each builder writes to features_dir/{commodity}/builders/{name}.parquet; assemble then inner/left-joins the builder outputs into fit and pred. domain-modelling/DOMAIN_MODEL2.md §8.1

  • YieldsBuilder --applies--> EditRuleConfig (0..N, ordered) — edits applied sequentially in YAML order; each rule produces or consumes from EditReport. lib/edit_and_imputation/edit.py:383

  • EditRuleConfig --produces--> EditReport (1:1 per rule fire) — fire counts and boolean flag frame accumulated across sequential rule application. lib/edit_and_imputation/edit.py:370

  • ExperimentConfig --gates-via--> Check (1:N, per stage) — run_preflight() runs a list of Check VOs before each stage; a critical failure raises SystemExit. run/preflight.py:42
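
The preflight gate in the last entry can be sketched as a list of value objects evaluated before a stage starts. The `Check` fields and the return value below are assumptions; only the "critical failure raises SystemExit" behaviour comes from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Check:
    """Hypothetical value object: the outcome of one preflight condition."""
    name: str
    passed: bool
    critical: bool = True


def run_preflight(checks: Iterable[Check]) -> List[Check]:
    """Abort on any critical failure; return non-critical failures."""
    failures = [c for c in checks if not c.passed]
    critical = [c for c in failures if c.critical]
    if critical:
        names = ", ".join(c.name for c in critical)
        raise SystemExit(f"preflight failed: {names}")
    # Non-critical failures are handed back so the caller can log them.
    return failures
```

Raising SystemExit rather than a regular exception means a failed gate stops the stage before any artefact is written.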

Delivery relationships

  • HindcastDelivery --composes--> DeliveryRow (1:N) — rows: list[DeliveryRow]; no duplicates on (year, init_date, geo_identifier). delivery/schemas.py:236

  • HindcastSlice --aggregates-to--> DeliveryRow (N:M, per ADM level) — walk_forward_preds_to_delivery_rows() loads non-production fold predictions and converts them to DeliveryRows at each ADM level. domain-modelling/DOMAIN_MODEL2.md §8.4

  • ForecastSlice --aggregates-to--> DeliveryRow (N:M, per ADM level) — _deliver_forecast() calls walk_forward_preds_to_delivery_rows(mode="forecast") and writes one CSV per level. stages/run_forecast.py:357

  • DeliveryRow --persisted-in--> delivery/Treefera_{commodity}{level}_Hindcast.csv (N:1) — three files per run (ADM0, ADM1, ADM2); written by deliver_experiment(). stages/run_hindcast.py:224

  • DeliveryRow --persisted-in--> forecast/{season_year}/{init_date}/delivery/*.csv (N:1) — one file per ADM level; written by _deliver_forecast(). stages/run_forecast.py:398

  • FoldSchedule --drives--> DeliveryRow (N:M, dashboard only) — the Streamlit dashboard uses FoldSchedule to map fold labels to init_dates for display; does not affect pipeline delivery. app/_dashboard_config.py:199
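
The uniqueness constraint on HindcastDelivery rows, no duplicates on (year, init_date, geo_identifier), can be checked with a counter over the key triple. This is a sketch under assumptions: the `DeliveryRow` here carries only the key fields plus a hypothetical `yield_kg_ha` value, and `duplicate_keys` is an illustrative helper rather than the validator in delivery/schemas.py.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple

Key = Tuple[int, date, str]


@dataclass(frozen=True)
class DeliveryRow:
    year: int
    init_date: date
    geo_identifier: str
    yield_kg_ha: float  # hypothetical payload field


def duplicate_keys(rows: Sequence[DeliveryRow]) -> List[Key]:
    """Return every (year, init_date, geo_identifier) that occurs twice or more."""
    counts = Counter((r.year, r.init_date, r.geo_identifier) for r in rows)
    return [key for key, n in counts.items() if n > 1]
```

A delivery is valid under this rule when `duplicate_keys(rows)` is empty.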

Behavioural-role relationships

  • ModelConfig --instantiates--> Detrender (1:1, per fold) — ExperimentConfig.build_detrender() dispatches on the model.detrend literal to the concrete AbstractDetrend subclass. config.py:682

  • ModelConfig --instantiates--> Regressor (1:1, per fold) — ExperimentConfig.build_regressor() dispatches on the model.regression literal to the concrete AbstractRegressionImpl subclass. config.py:695

  • BiasCorrectorConfig --instantiates--> MetaModel (1:1, per fold) — build_bias_corrector(config.postprocess.bias_corrector) dispatches on kind to NoBiasCorrector or CoverageBiasCorrector. stages/run_forecast.py:339

  • HindcastSlice --loads--> Detrender (1:1) — load_detrender(config) uses model.detrend to select the class and calls cls.load(detrender_path, config). lib/results/results_slice.py:243

  • HindcastSlice --loads--> Regressor (1:1) — load_model() tries each concrete AbstractRegressionImpl class against model_path until one succeeds. lib/results/results_slice.py:182

  • ForecastSlice --delegates-to--> HindcastSlice["production"] (for trained artefacts) — load_model(), load_detrender() and load_feature_fill_values() all proxy to self.training. lib/results/results_slice.py:427

  • AbstractSlice <--is-a (protocol)-- HindcastSlice — isinstance(slice, AbstractSlice) is true at runtime; HindcastSlice satisfies the runtime-checkable protocol. lib/results/results_slice.py:72

  • AbstractSlice <--is-a (protocol)-- ForecastSlice — ForecastSlice also satisfies AbstractSlice; the shared surface lets downstream consumers handle both uniformly. lib/results/results_slice.py:72

  • ExperimentProtocolConfig --drives--> Fold (1:N) — ExpandingFoldGenerator consumes test_years from ExperimentProtocolConfig to yield one fold per test year. run/experiment_protocol.py (referenced in run/runner.py:21)

  • Fold --produces--> HindcastSlice (1:1) — each fold label corresponds to exactly one HindcastSlice written during walk-forward. run/runner.py:63
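
The protocol and delegation entries above combine two standard Python mechanisms: a `typing.Protocol` marked `runtime_checkable`, and a forecast slice that proxies artefact loading to its training slice. The sketch below is simplified on purpose: the method bodies return placeholder strings, and `load_detrender` takes no config argument, unlike the real signature cited above.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class AbstractSlice(Protocol):
    """Shared surface; isinstance() only checks that these methods exist."""

    def load_model(self) -> object: ...

    def load_detrender(self) -> object: ...


class HindcastSlice:
    def __init__(self, fold_label: str) -> None:
        self.fold_label = fold_label

    def load_model(self) -> object:
        return f"model[{self.fold_label}]"  # placeholder for a real artefact

    def load_detrender(self) -> object:
        return f"detrender[{self.fold_label}]"  # placeholder


class ForecastSlice:
    def __init__(self, training: HindcastSlice) -> None:
        # The production hindcast slice that owns the trained artefacts.
        self.training = training

    def load_model(self) -> object:
        return self.training.load_model()

    def load_detrender(self) -> object:
        return self.training.load_detrender()
```

Neither class inherits from `AbstractSlice`; the `isinstance` check succeeds structurally, which is what lets downstream code treat both slice kinds uniformly.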

Cross-cutting relationships

  • ExperimentResult --tagged-in--> MLflow Run (1:1) — one MLflow run per pipeline invocation; run_dir is stored as a tag. domain-modelling/DOMAIN_MODEL2.md §3

  • ExperimentConfig --preflight-checked-by--> Check (1:N, per stage) — preflight_paths_for_hindcast(), preflight_paths_for_forecast_features(), etc. emit Check lists from ExperimentConfig. run/preflight.py:59

  • Region --referenced-by--> geo_identifier (N:M, via NewType) — GeoIdentifier is a NewType("GeoIdentifier", str) alias; there is no standalone entity class. domain-modelling/DOMAIN_MODEL.md §3.1

  • Yield --keyed-by--> (year, geo_identifier, init_date) (N:M) — every feature parquet, prediction parquet and delivery row uses this triple as the canonical join key. domain-modelling/DOMAIN_MODEL.md §1

  • ExperimentConfig --resolves--> ResolvablePath (1:N) — _iter_resolvable_fields() walks all nested config fields carrying ResolvablePath annotations and resolves them against data_root at load time. config.py:785

  • ExperimentResult --persists--> included_geo_identifiers (1:1) — save_included_geo_identifiers() writes the top-N county frozenset to run_dir/included_geo_identifiers.txt during FIT; load_included_geo_identifiers() reads it back during PREDICT. lib/results/run_result.py:157,164

  • Detrender --transforms--> Yield (N:M) — every Detrender implementation removes temporal trend from the training yield panel before the Regressor fits residuals. Referenced via AbstractDetrend protocol used in run/runner.py:116.

  • Regressor --fits--> Yield residuals (N:M) — AbstractRegressionImpl implementations (RidgeRegressor, PcaRidgeRegressor, XGBRegressor) fit on detrended residuals and score against feature columns. lib/results/results_slice.py:182
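
The last two entries describe a two-step fit: a Detrender removes the temporal trend, then a Regressor fits the residuals. As an illustration only, the sketch below assumes an ordinary least-squares linear trend in year; the actual detrend strategies behind AbstractDetrend are not specified here, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_linear_trend(
    years: Sequence[float], yields: Sequence[float]
) -> Tuple[float, float]:
    """Ordinary least-squares fit of yield = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def detrend(
    years: Sequence[float],
    yields: Sequence[float],
    slope: float,
    intercept: float,
) -> List[float]:
    """Residuals left after subtracting the fitted trend from each yield."""
    return [y - (slope * x + intercept) for x, y in zip(years, yields)]
```

The residuals returned by `detrend` are what a downstream regressor would be trained on; adding the trend back to a predicted residual recovers a yield in kg/ha.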

Artefact-schema relationships

  • HindcastSlice --writes--> walk_forward_preds.parquet (1:1 per fold) — schema: (geo_identifier, year, init_date, sim_yield_kg_ha, sim_yield_kg_ha_detrended, obs_yield_kg_ha, area_harvested_ha, crop_type). run/runner.py:74

  • HindcastSlice --writes--> train_preds.parquet (1:1 per fold) — training-set predictions used for bias correction and conformal calibration. run/runner.py and run/experiment_protocol.py

  • HindcastSlice --writes--> year_data.parquet (1:1 per fold) — raw pre-simulation feature snapshot for (year, init_date); a diagnostic output preserved per fold. run/runner.py:71

  • ForecastSlice --writes--> indices.zarr (1:1) — daily spliced observed-plus-climatology weather indices produced by materialise_forecast_indices(). stages/run_forecast.py:152

  • ForecastSlice --writes--> features/pred.parquet (1:1) — per-init forecast features; independent from the canonical run-level pred.parquet. stages/run_forecast.py:252

  • ForecastSlice --writes--> postprocessed/national.parquet (1:1) — national aggregated postprocessed frame with bias correction and conformal CIs applied. stages/run_forecast.py:347

  • ExperimentResult --reads--> features_dir/{commodity}/fit.parquet (N:1) — run-level fit feature matrix, shared across all HindcastSlices; resolved via _features_path(run_dir, "fit"). lib/results/results_slice.py:60

  • ExperimentResult --reads--> features_dir/{commodity}/pred.parquet (N:1) — run-level pred feature matrix, shared across all HindcastSlices; resolved via _features_path(run_dir, "pred"). lib/results/results_slice.py:60

  • metadata.json --describes--> fit.parquet/pred.parquet (1:1 each) — carries index_cols, feature_cols, target_col; written by features/assemble.py alongside each parquet. domain-modelling/DOMAIN_MODEL2.md §8.1
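
The metadata.json sidecar names the index, feature and target columns of its parquet, so a consumer can verify a loaded frame against it. A minimal sketch, assuming the JSON is a flat object with exactly the three keys named above; the `missing_columns` helper and the example column names are hypothetical.

```python
import json
from typing import Iterable, List


def missing_columns(metadata_json: str, columns: Iterable[str]) -> List[str]:
    """List columns the metadata declares that are absent from `columns`."""
    meta = json.loads(metadata_json)
    expected = [*meta["index_cols"], *meta["feature_cols"], meta["target_col"]]
    present = set(columns)
    return [c for c in expected if c not in present]


# Example sidecar; the feature and target names are made up for illustration.
EXAMPLE_METADATA = json.dumps(
    {
        "index_cols": ["year", "geo_identifier", "init_date"],
        "feature_cols": ["tmax_w1"],
        "target_col": "obs_yield_kg_ha",
    }
)
```

An empty result means the frame carries every column the sidecar promises; extra columns in the frame are deliberately not reported.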

Temporal-vocabulary relationships

  • SeasonYear --partitioned-by--> InitDate (1:N) — a single crop year has many in-season init_dates; CommodityConfig.hindcast_init_dates(season_year) returns the full weekly grid. config.py:369

  • InitDate --derived-from--> SeasonDOY (N:1) — CommodityConfig.to_date(season_doy, season_year) converts a season DOY to a calendar date; to_season_doy() inverts. config.py:363,366

  • SeasonWindow --aggregates--> feature columns (1:N) — each SeasonWindow defines (sdoy_start, sdoy_end) bounds for weather and climo accumulation; multiple feature columns per window depending on weather_vars / climo_zscore_vars. config.py:267,311

  • Fold --identified-by--> fold_label (1:1) — fold_label is the string filesystem key; numeric labels ("2020") for walk-forward folds, literal "production" for the no-holdout fit. Cutoff is date(int(fold_label), 1, 1) for numeric folds. lib/results/results_slice.py:151
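
The fold-label rule in the last entry is small enough to state as code. The cutoff expression `date(int(fold_label), 1, 1)` is taken from the entry itself; returning None for the "production" fold and the `fold_labels` helper are assumptions made for this sketch.

```python
from datetime import date
from typing import List, Optional, Sequence

PRODUCTION = "production"


def fold_labels(test_years: Sequence[int]) -> List[str]:
    """One numeric label per test year, plus the no-holdout production fold."""
    return [str(year) for year in test_years] + [PRODUCTION]


def fold_cutoff(fold_label: str) -> Optional[date]:
    """Cutoff date for numeric folds; None (assumed) for the production fold."""
    if fold_label == PRODUCTION:
        return None
    return date(int(fold_label), 1, 1)
```

Because the label doubles as a directory name, keeping it a plain string avoids any formatting mismatch between the filesystem and the in-memory fold.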

Summary counts

Section               Relationship count
Configuration         14
Pipeline              14
Reference-data        8
Feature-assembly      6
Delivery              6
Behavioural-role      10
Cross-cutting         8
Artefact-schema       9
Temporal-vocabulary   4
Total                 79