Relationships — commodity_hindcast

Naming convention

Each relationship is recorded as: Source-Entity --verb--> Target-Entity (cardinality) with a short description and a code citation. Cardinality notation:

1:1 — one-to-one
1:N — one-to-many
0..1:1 — optional: the target may be absent (e.g. ForecastConfig is None in hindcast-only mode)
0..N — zero or more targets per source
N:1 — many-to-one (reference)
N:M — many-to-many

Configuration relationships

ExperimentConfig --has-one--> CommodityConfig (1:1) — the experiment is parameterised by exactly one commodity spec; CommodityConfig is a nested required field. config.py:639
ExperimentConfig --has-one--> ModelConfig (1:1) — detrend strategy and regression estimator; always present, falling back to a default when omitted. config.py:647
ExperimentConfig --has-one--> ExperimentProtocolConfig (1:1) — CV strategy and test-year list; required. config.py:644
ExperimentConfig --has-one--> PostprocessConfig (1:1) — bias-corrector and conformal-mode selection; required. config.py:657
ExperimentConfig --has-one--> DeliveryConfig (1:1) — CI levels, public model name, narrowing and frozen-tail options; required. config.py:663
ExperimentConfig --has-one--> EvaluationConfig (0..1:1) — WASDE benchmark path, used only by EVALUATE; embedded as a field of ExperimentConfig. domain-modelling/schema.yaml:308
ExperimentConfig --has-many--> ReferenceYieldSpec (0..N) — ordered list of reference-yield specs; empty list is the no-reference sentinel. config.py:654
ExperimentConfig --has-one--> ForecastConfig (0..1:1) — optional; None means hindcast-only mode, set means forecast mode. config.py:666
PostprocessConfig --has-one--> BiasCorrectorConfig (1:1) — bias-corrector kind and lookback window embedded inside PostprocessConfig. config.py:523
CommodityConfig --has-many--> Builder (1:N) — dict of named builders; type discriminator auto-injected from dict key. config.py:337
CommodityConfig --has-many--> SeasonWindow (1:N) — separate lists for climo windows and weather windows, each expressing season-DOY aggregation bounds. config.py:311
YieldsBuilder --has-many--> EditRuleConfig (0..N) — declarative Fellegi-Holt edits applied before the yields pivot; validated as a discriminated union at YAML parse time. config.py:197
ModelConfig --references--> FitAggregationPolicy (derived, 1:1) — weather_correction_fit_level on ModelConfig drives the FitAggregationPolicy constructed by ExperimentConfig.build_fit_aggregation_policy(). config.py:707
ReferenceYieldSpec --resolves-to--> ReferenceYieldLoader (N:1, polymorphic) — each spec subclass is registered against a concrete loader; dispatch via ReferenceYieldLoader.from_spec(). lib/reference_data/base_reference_yield_loader.py:97,101
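A minimal stdlib sketch of the builder mapping on CommodityConfig: the type discriminator is injected from the dict key, then each body is dispatched to a concrete config class. Everything here is hypothetical. `BUILDER_REGISTRY`, `inject_builder_types`, `build_specs` and the two spec dataclasses stand in for the real pydantic discriminated union, and the rule that an explicit `type` entry wins over the key is an assumption.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins for concrete builder configs; the real ones are
# pydantic models validated as a discriminated union.
@dataclass(frozen=True)
class YieldsBuilderSpec:
    name: str
    source_path: str
    edits: tuple = ()


@dataclass(frozen=True)
class WeatherBuilderSpec:
    name: str
    weather_vars: tuple = ()


# Hypothetical registry: discriminator value -> spec class.
BUILDER_REGISTRY: dict[str, type] = {
    "yields": YieldsBuilderSpec,
    "weather": WeatherBuilderSpec,
}


def inject_builder_types(raw_builders: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Copy each builder body and set `type` from its dict key if absent (assumption)."""
    injected = {}
    for key, body in raw_builders.items():
        body = dict(body)  # do not mutate the parsed YAML
        body.setdefault("type", key)
        injected[key] = body
    return injected


def build_specs(raw_builders: dict[str, dict[str, Any]]) -> dict[str, object]:
    """Dispatch each injected body to its spec class via the `type` discriminator."""
    specs = {}
    for key, body in inject_builder_types(raw_builders).items():
        kind = body.pop("type")
        cls = BUILDER_REGISTRY.get(kind)
        if cls is None:
            raise ValueError(f"unknown builder type {kind!r} for builder {key!r}")
        specs[key] = cls(name=key, **body)
    return specs
```

Injecting the key before validation means a YAML author does not have to repeat the builder kind, while an unknown key still fails at parse time rather than deep inside feature assembly.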

Pipeline relationships

ExperimentConfig --produces--> ExperimentResult (1:1 per run) — run_hindcast.run() creates a timestamped run_root, writes config_resolved.yaml, and the subsequent ExperimentResult.from_run_dir() call discovers slices. stages/run_hindcast.py:77,91
ExperimentResult --composes--> HindcastSlice (1:N) — frozen tuple of per-fold handles; discovered from preds/{commodity}/{fold}/train_preds.parquet on disk. lib/results/run_result.py:65,76
ExperimentResult --composes--> ForecastSlice (0..N) — discovered from forecast/{season_year}/{init_date}/preds/walk_forward_preds.parquet. lib/results/run_result.py:96,108
ExperimentResult --exposes--> HindcastSlice["production"] (0..1:1) — the production property returns the no-holdout fold via production_hindcast_slice(); None before fit_production runs. lib/results/run_result.py:175
Hindcast (run) --produces-many--> HindcastSlice (1:N) — run_walk_forward() generates one slice per test year plus one "production" slice. run/runner.py:52
Forecast (run) --produces-many--> ForecastSlice (1:N) — run_forecast.run() creates one ForecastSlice per (season_year, init_date) pair. stages/run_forecast.py:80
ForecastSlice --reuses--> HindcastSlice["production"] (N:1) — ForecastSlice.training calls production_hindcast_slice() to reach the production model, detrender and fill values; cannot exist without production. lib/results/results_slice.py:410
HindcastSlice --persisted-in--> RunDir/models/{commodity}/{fold}/ (1:1) — model_path, detrender_path, feature_fill_values_path all point under this directory. lib/results/results_slice.py:134
HindcastSlice --persisted-in--> RunDir/preds/{commodity}/{fold}/ (1:1) — train_preds_path, walk_forward_preds_path, year_data_path point under this directory. lib/results/results_slice.py:140
ForecastSlice --persisted-in--> RunDir/forecast/{season_year}/{init_date}/ (1:1) — root property resolves the per-init subtree; all forecast artefacts sit under it. lib/results/results_slice.py:324
HindcastSlice --has-one--> bias_corrector.pkl (0..1:1) — optional; the path comes from _bias_corrector_path(), and has_bias_corrector checks that the file exists before loading. lib/results/results_slice.py:41,173
ForecastSlice --has-one--> bias_corrector.pkl (0..1:1) — produced during forecast-time postprocessing; lives under ForecastSlice.postprocessed_dir. lib/results/results_slice.py:442
CalibrationResult --calibrates--> ExperimentResult (N:1) — primary_calibration(experiment, ci_levels) reads all fold residuals from the ExperimentResult's hindcast slices to compute conformal half-widths. stages/run_forecast.py:343
ExperimentResult --produces--> HindcastDelivery (1:N) — deliver_experiment() calls walk_forward_preds_to_delivery_rows() for each of the three ADM levels, producing one HindcastDelivery per level. stages/run_hindcast.py:224
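The on-disk discovery behind ExperimentResult's slices can be sketched with pathlib. It assumes only the directory layouts listed in this section (preds/{commodity}/{fold}/train_preds.parquet and forecast/{season_year}/{init_date}/preds/walk_forward_preds.parquet); `HindcastSliceHandle`, `discover_hindcast_slices`, `production_slice` and `discover_forecast_inits` are hypothetical names, not the real API.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

PRODUCTION = "production"


@dataclass(frozen=True)
class HindcastSliceHandle:
    """Hypothetical lightweight handle; the real HindcastSlice exposes many more paths."""
    run_dir: Path
    commodity: str
    fold_label: str

    @property
    def preds_dir(self) -> Path:
        return self.run_dir / "preds" / self.commodity / self.fold_label

    @property
    def models_dir(self) -> Path:
        return self.run_dir / "models" / self.commodity / self.fold_label


def discover_hindcast_slices(run_dir: Path, commodity: str) -> tuple[HindcastSliceHandle, ...]:
    """One handle per fold directory that contains train_preds.parquet."""
    preds_root = run_dir / "preds" / commodity
    if not preds_root.is_dir():
        return ()
    folds = sorted(
        p.name for p in preds_root.iterdir()
        if (p / "train_preds.parquet").is_file()
    )
    return tuple(HindcastSliceHandle(run_dir, commodity, f) for f in folds)


def production_slice(slices: tuple[HindcastSliceHandle, ...]) -> Optional[HindcastSliceHandle]:
    """The no-holdout fold, or None if fit_production has not run yet."""
    return next((s for s in slices if s.fold_label == PRODUCTION), None)


def discover_forecast_inits(run_dir: Path) -> list[tuple[str, str]]:
    """(season_year, init_date) pairs that have walk_forward_preds.parquet on disk."""
    pattern = "forecast/*/*/preds/walk_forward_preds.parquet"
    # parents[1] is the init_date directory, parents[2] the season_year directory
    return sorted((p.parents[2].name, p.parents[1].name) for p in run_dir.glob(pattern))
```

Because discovery only checks for file existence, a run stopped before fit_production simply yields no production handle, which matches the None behaviour of the production property.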

Reference-data relationships

ExperimentConfig --has-many--> ReferenceYieldSpec (0..N) — reference_data list; each spec carries a unique name used as a metric/column prefix. config.py:654
ReferenceYieldSpec --is-a--> _ReferenceYieldSpecBase — all three concrete spec classes (WasdeRefSpec, ConabFinalRefSpec, ConabLevantamentoRefSpec) subclass the frozen pydantic base. lib/reference_data/base_reference_yield_loader.py:36
WasdeRefSpec --registered-to--> WasdeLoader (1:1) — explicit registry entry; ReferenceYieldLoader.from_spec() dispatches to WasdeLoader for this spec type. lib/reference_data/loader.py:51
ConabFinalRefSpec --registered-to--> ConabFinalLoader (1:1) — registered at module import. lib/reference_data/loader.py:52
ConabLevantamentoRefSpec --registered-to--> ConabLevantamentoLoader (1:1) — registered at module import. lib/reference_data/loader.py:53
ReferenceYieldLoader --loads--> Yield (N:M) — each loader reads and returns yield data (kg/ha) keyed by (harvest_year, geo_identifier) for benchmarking. lib/reference_data/base_reference_yield_loader.py:77
DeliveryRow --references--> NASS benchmarks (N:1 by ID) — the nass_actual, nass_actual_area_weighted_all and nass_actual_prod_div_area_all fields are populated at delivery time. delivery/schemas.py:140
DeliveryRow --references--> WASDE/CONAB benchmark (N:1 by ID) — the optional wasde_in_season, conab_final_in_season and conab_lev_in_season fields are populated from the corresponding ReferenceYieldLoader. delivery/schemas.py:143

Feature-assembly relationships

CommodityConfig --drives--> FeatureBuilder (1:N) — each entry in CommodityConfig.builders instantiates a concrete builder module in features/builders/. config.py:337
FeatureBuilder --reads--> Yield (N:M) — YieldsBuilder reads the NASS parquet and produces rows keyed by (year, geo_identifier, init_date). config.py:187
FeatureBuilder --feeds--> fit.parquet/pred.parquet (N:1 per run) — each builder writes features_dir/{commodity}/builders/{name}.parquet; the assemble step (features/assemble.py) then inner- or left-joins these builder outputs into fit.parquet and pred.parquet. domain-modelling/DOMAIN_MODEL2.md §8.1
YieldsBuilder --applies--> EditRuleConfig (0..N, ordered) — edits are applied sequentially in YAML order; each rule either writes to or reads from the accumulated EditReport. lib/edit_and_imputation/edit.py:383
EditRuleConfig --produces--> EditReport (1:1 per rule fire) — fire counts and a boolean flag frame, accumulated across the sequential rule application. lib/edit_and_imputation/edit.py:370
ExperimentConfig --gates-via--> Check (1:N, per stage) — run_preflight() runs a list of Check value objects before each stage; a critical failure raises SystemExit. run/preflight.py:42
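Sequential application of edit rules with an accumulating report might look like the following. It is a plain rule loop, not a Fellegi-Holt error-localisation solver; `EditRule`, `apply_edits`, the list-of-dicts record format and the choice to blank a failing `yield_kg_ha` (so that later rules see the edited value) are all assumptions. The report mirrors only what the text names: fire counts and a per-record boolean flag table.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict  # one yields record, e.g. {"geo_identifier": ..., "yield_kg_ha": ...}


@dataclass(frozen=True)
class EditRule:
    """Hypothetical edit: `fails` returns True when a record violates the rule."""
    name: str
    fails: Callable[[Row], bool]


@dataclass
class EditReport:
    fire_counts: dict[str, int] = field(default_factory=dict)
    # flags[i][rule_name] is True if that rule fired on record i
    flags: list[dict[str, bool]] = field(default_factory=list)


def apply_edits(rows: list[Row], rules: list[EditRule]) -> tuple[list[Row], EditReport]:
    """Apply rules in list (YAML) order, accumulating one shared report."""
    rows = [dict(r) for r in rows]  # work on copies of the input records
    report = EditReport(flags=[{} for _ in rows])
    for rule in rules:
        count = 0
        for i, row in enumerate(rows):
            fired = rule.fails(row)
            report.flags[i][rule.name] = fired
            if fired:
                count += 1
                row["yield_kg_ha"] = None  # hypothetical action: mark for imputation
        report.fire_counts[rule.name] = count
    return rows, report
```

Order matters in this sketch: a "missing value" rule placed after a range check also fires on records the range check has just blanked.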

Delivery relationships

HindcastDelivery --composes--> DeliveryRow (1:N) — rows: list[DeliveryRow]; no duplicates on (year, init_date, geo_identifier). delivery/schemas.py:236
HindcastSlice --aggregates-to--> DeliveryRow (N:M, per ADM level) — walk_forward_preds_to_delivery_rows() loads non-production fold predictions and converts them to DeliveryRows at each ADM level. domain-modelling/DOMAIN_MODEL2.md §8.4
ForecastSlice --aggregates-to--> DeliveryRow (N:M, per ADM level) — _deliver_forecast() calls walk_forward_preds_to_delivery_rows(mode="forecast") and writes one CSV per level. stages/run_forecast.py:357
DeliveryRow --persisted-in--> delivery/Treefera_{commodity}{level}_Hindcast.csv (N:1) — three files per run (ADM0, ADM1, ADM2); written by deliver_experiment(). stages/run_hindcast.py:224
DeliveryRow --persisted-in--> forecast/{season_year}/{init_date}/delivery/*.csv (N:1) — one file per ADM level; written by _deliver_forecast(). stages/run_forecast.py:398
FoldSchedule --drives--> DeliveryRow (N:M, dashboard only) — the Streamlit dashboard uses FoldSchedule to map fold labels to init_dates for display; does not affect pipeline delivery. app/_dashboard_config.py:199
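The no-duplicates invariant on HindcastDelivery rows, plus a roll-up from a finer to a coarser ADM level, can be sketched as follows. The area-weighted mean (weighting by area_harvested_ha from walk_forward_preds.parquet) and the `parent_of` mapping are assumptions, since this section does not say how walk_forward_preds_to_delivery_rows() aggregates; `DeliveryRowSketch` carries only a hypothetical subset of DeliveryRow's fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DeliveryRowSketch:
    """Hypothetical subset of DeliveryRow fields."""
    year: int
    init_date: date
    geo_identifier: str
    sim_yield_kg_ha: float


def assert_unique_keys(rows: list[DeliveryRowSketch]) -> None:
    """Enforce no duplicates on (year, init_date, geo_identifier)."""
    seen = set()
    for row in rows:
        key = (row.year, row.init_date, row.geo_identifier)
        if key in seen:
            raise ValueError(f"duplicate delivery key {key}")
        seen.add(key)


def roll_up(preds: list[dict], parent_of: dict[str, str]) -> list[DeliveryRowSketch]:
    """Area-weighted mean of child predictions per (year, init_date, parent).

    `preds` are dicts with walk_forward_preds columns; `parent_of` is a
    hypothetical child -> parent geo_identifier mapping.
    """
    weighted: dict[tuple, float] = defaultdict(float)
    area: dict[tuple, float] = defaultdict(float)
    for p in preds:
        key = (p["year"], p["init_date"], parent_of[p["geo_identifier"]])
        weighted[key] += p["sim_yield_kg_ha"] * p["area_harvested_ha"]
        area[key] += p["area_harvested_ha"]
    return [
        DeliveryRowSketch(year, init, geo, weighted[(year, init, geo)] / area[(year, init, geo)])
        for (year, init, geo) in sorted(area)
        if area[(year, init, geo)] > 0  # skip parents with no harvested area
    ]
```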

Behavioural-role relationships

ModelConfig --instantiates--> Detrender (1:1, per fold) — ExperimentConfig.build_detrender() dispatches on model.detrend literal to the concrete AbstractDetrend subclass. config.py:682
ModelConfig --instantiates--> Regressor (1:1, per fold) — ExperimentConfig.build_regressor() dispatches on model.regression literal to the concrete AbstractRegressionImpl subclass. config.py:695
BiasCorrectorConfig --instantiates--> MetaModel (1:1, per fold) — build_bias_corrector(config.postprocess.bias_corrector) dispatches on kind to NoBiasCorrector or CoverageBiasCorrector. stages/run_forecast.py:339
HindcastSlice --loads--> Detrender (1:1) — load_detrender(config) uses model.detrend to select the class and calls cls.load(detrender_path, config). lib/results/results_slice.py:243
HindcastSlice --loads--> Regressor (1:1) — load_model() tries each concrete AbstractRegressionImpl class against model_path until one succeeds. lib/results/results_slice.py:182
ForecastSlice --delegates-to--> HindcastSlice["production"] (N:1) — for trained artefacts, load_model(), load_detrender() and load_feature_fill_values() all proxy to self.training. lib/results/results_slice.py:427
HindcastSlice --is-a (protocol)--> AbstractSlice — isinstance(slice, AbstractSlice) is true at runtime; HindcastSlice satisfies the runtime-checkable protocol. lib/results/results_slice.py:72
ForecastSlice --is-a (protocol)--> AbstractSlice — ForecastSlice also satisfies AbstractSlice; the shared surface lets downstream consumers handle both slice types uniformly. lib/results/results_slice.py:72
ExperimentProtocolConfig --drives--> Fold (1:N) — ExpandingFoldGenerator consumes test_years from ExperimentProtocolConfig to yield one fold per test year. run/experiment_protocol.py (referenced in run/runner.py:21)
Fold --produces--> HindcastSlice (1:1) — each fold label corresponds to exactly one HindcastSlice written during walk-forward. run/runner.py:63
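A minimal expanding-window fold generator, consistent with one fold per test year plus one production fold. Training on every year strictly before the test year is an assumption (it agrees with the January-1 cutoff given under the temporal vocabulary), and `FoldSketch` and `expanding_folds` are hypothetical names rather than ExpandingFoldGenerator's real interface.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class FoldSketch:
    label: str                    # filesystem key: "2020", ..., or "production"
    train_years: tuple[int, ...]
    test_year: Optional[int]      # None for the no-holdout production fold


def expanding_folds(available_years: Iterable[int], test_years: Iterable[int]) -> Iterator[FoldSketch]:
    """One fold per test year, training on strictly earlier years, then a production fold."""
    years = sorted(set(available_years))
    for test_year in sorted(test_years):
        train = tuple(y for y in years if y < test_year)
        if not train:
            raise ValueError(f"no training years before test year {test_year}")
        yield FoldSketch(str(test_year), train, test_year)
    # Production fit: no holdout, every available year is used for training.
    yield FoldSketch("production", tuple(years), None)
```

Each yielded fold then maps to exactly one HindcastSlice directory named by its label.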

Cross-cutting relationships

ExperimentResult --tagged-in--> MLflow Run (1:1) — one MLflow run per pipeline invocation; run_dir is stored as a tag. domain-modelling/DOMAIN_MODEL2.md §3
ExperimentConfig --preflight-checked-by--> Check (1:N, per stage) — preflight_paths_for_hindcast(), preflight_paths_for_forecast_features(), etc. emit Check lists from ExperimentConfig. run/preflight.py:59
Region --referenced-by--> geo_identifier (N:M, via NewType) — GeoIdentifier is a NewType("GeoIdentifier", str) alias; no standalone entity class. domain-modelling/DOMAIN_MODEL.md §3.1
Yield --keyed-by--> (year, geo_identifier, init_date) (N:M) — every feature parquet, prediction parquet and delivery row uses this triple as the canonical join key. domain-modelling/DOMAIN_MODEL.md §1
ExperimentConfig --resolves--> ResolvablePath (1:N) — _iter_resolvable_fields() walks all nested config fields carrying ResolvablePath annotations and resolves them against data_root at load time. config.py:785
ExperimentResult --persists--> included_geo_identifiers (1:1) — save_included_geo_identifiers() writes the top-N county frozenset to run_dir/included_geo_identifiers.txt during FIT; load_included_geo_identifiers() reads it back during PREDICT. lib/results/run_result.py:157,164
Detrender --transforms--> Yield (N:M) — every Detrender implementation removes temporal trend from the training yield panel before the Regressor fits residuals. Referenced via AbstractDetrend protocol used in run/runner.py:116.
Regressor --fits--> Yield residuals (N:M) — AbstractRegressionImpl implementations (RidgeRegressor, PcaRidgeRegressor, XGBRegressor) fit on detrended residuals and score against feature columns. lib/results/results_slice.py:182
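To make the Detrender/Regressor split concrete, here is a least-squares linear detrender in plain Python. This section does not say which strategies model.detrend can select, so the linear trend and the `LinearDetrendSketch` name are assumptions; the point is the contract: fit a trend on the training years, hand the residuals to the regressor, then add the trend back onto its predictions.

```python
from dataclasses import dataclass


@dataclass
class LinearDetrendSketch:
    """Hypothetical detrender: ordinary least-squares line of yield on year."""
    slope: float = 0.0
    intercept: float = 0.0

    def fit(self, years: list[int], yields: list[float]) -> "LinearDetrendSketch":
        n = len(years)
        if n < 2:
            raise ValueError("need at least two years to fit a trend")
        mean_x = sum(years) / n
        mean_y = sum(yields) / n
        sxx = sum((x - mean_x) ** 2 for x in years)
        if sxx == 0:
            raise ValueError("years must not all be equal")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def trend(self, year: int) -> float:
        return self.intercept + self.slope * year

    def transform(self, years: list[int], yields: list[float]) -> list[float]:
        """Residuals that the regressor is trained on."""
        return [y - self.trend(x) for x, y in zip(years, yields)]

    def inverse_transform(self, years: list[int], residuals: list[float]) -> list[float]:
        """Add the trend back onto predicted residuals."""
        return [r + self.trend(x) for x, r in zip(years, residuals)]
```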

Artefact-schema relationships

HindcastSlice --writes--> walk_forward_preds.parquet (1:1 per fold) — schema: (geo_identifier, year, init_date, sim_yield_kg_ha, sim_yield_kg_ha_detrended, obs_yield_kg_ha, area_harvested_ha, crop_type). run/runner.py:74
HindcastSlice --writes--> train_preds.parquet (1:1 per fold) — training-set predictions used for bias correction and conformal calibration. run/runner.py and run/experiment_protocol.py
HindcastSlice --writes--> year_data.parquet (1:1 per fold) — raw pre-simulation feature snapshot for (year, init_date); a diagnostic output preserved per fold. run/runner.py:71
ForecastSlice --writes--> indices.zarr (1:1) — daily spliced observed-plus-climatology weather indices produced by materialise_forecast_indices(). stages/run_forecast.py:152
ForecastSlice --writes--> features/pred.parquet (1:1) — per-init forecast features; independent from the canonical run-level pred.parquet. stages/run_forecast.py:252
ForecastSlice --writes--> postprocessed/national.parquet (1:1) — national aggregated postprocessed frame with bias correction and conformal CIs applied. stages/run_forecast.py:347
ExperimentResult --reads--> features_dir/{commodity}/fit.parquet (N:1) — run-level fit feature matrix, shared across all HindcastSlices; resolved via _features_path(run_dir, "fit"). lib/results/results_slice.py:60
ExperimentResult --reads--> features_dir/{commodity}/pred.parquet (N:1) — run-level pred feature matrix, shared across all HindcastSlices; resolved via _features_path(run_dir, "pred"). lib/results/results_slice.py:60
metadata.json --describes--> fit.parquet/pred.parquet (1:1 each) — carries index_cols, feature_cols, target_col; written by features/assemble.py alongside each parquet. domain-modelling/DOMAIN_MODEL2.md §8.1
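The metadata.json sidecar can be consumed roughly as below to split a fit or pred row into its index key, feature vector and target. The field names index_cols, feature_cols and target_col come from the text; the `FeatureMetadata` class, the distinct-columns check and returning None for a missing target (as in pred rows) are assumptions.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FeatureMetadata:
    index_cols: tuple[str, ...]
    feature_cols: tuple[str, ...]
    target_col: str

    @classmethod
    def load(cls, path: Path) -> "FeatureMetadata":
        raw = json.loads(Path(path).read_text())
        meta = cls(tuple(raw["index_cols"]), tuple(raw["feature_cols"]), raw["target_col"])
        # Assumed invariant: no column plays two roles.
        all_cols = [*meta.index_cols, *meta.feature_cols, meta.target_col]
        if len(all_cols) != len(set(all_cols)):
            raise ValueError("index, feature and target columns must be distinct")
        return meta

    def split(self, record: dict):
        """Split one fit/pred row into (index key, feature values, target or None)."""
        key = tuple(record[c] for c in self.index_cols)
        features = [record[c] for c in self.feature_cols]
        return key, features, record.get(self.target_col)
```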

Temporal-vocabulary relationships

SeasonYear --partitioned-by--> InitDate (1:N) — a single crop year has many in-season init_dates; CommodityConfig.hindcast_init_dates(season_year) returns the full weekly grid. config.py:369
InitDate --derived-from--> SeasonDOY (N:1) — CommodityConfig.to_date(season_doy, season_year) converts a season DOY to a calendar date; to_season_doy() inverts. config.py:363,366
SeasonWindow --aggregates--> feature columns (1:N) — each SeasonWindow defines (sdoy_start, sdoy_end) bounds for weather and climo accumulation; multiple feature columns per window depending on weather_vars / climo_zscore_vars. config.py:267,311
Fold --identified-by--> fold_label (1:1) — fold_label is the string filesystem key; numeric labels ("2020") for walk-forward folds, literal "production" for the no-holdout fit. Cutoff is date(int(fold_label), 1, 1) for numeric folds. lib/results/results_slice.py:151
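The season-DOY conversions and the fold cutoff rule can be sketched with datetime. The season anchor (season DOY 1 = 1 September of the preceding calendar year) is purely hypothetical, as are the `weekly_init_dates` helper, the argument order of `to_season_doy`, and returning None as the cutoff of the production fold; only the `date(int(fold_label), 1, 1)` rule for numeric folds is taken from the text.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical anchor: season DOY 1 falls on 1 September of the calendar year
# before season_year. The real anchor comes from CommodityConfig.
SEASON_START_MONTH, SEASON_START_DAY = 9, 1


def season_start(season_year: int) -> date:
    return date(season_year - 1, SEASON_START_MONTH, SEASON_START_DAY)


def to_date(season_doy: int, season_year: int) -> date:
    """Season DOY 1 is the season start date."""
    return season_start(season_year) + timedelta(days=season_doy - 1)


def to_season_doy(d: date, season_year: int) -> int:
    """Inverse of to_date (assumed signature)."""
    return (d - season_start(season_year)).days + 1


def weekly_init_dates(season_year: int, first_sdoy: int, last_sdoy: int, step_days: int = 7) -> list[date]:
    """Hypothetical weekly grid of init dates between two season DOYs, inclusive."""
    return [to_date(s, season_year) for s in range(first_sdoy, last_sdoy + 1, step_days)]


def fold_cutoff(fold_label: str) -> Optional[date]:
    """Jan 1 of the fold year for numeric labels; None for "production" (assumption)."""
    if fold_label == "production":
        return None
    return date(int(fold_label), 1, 1)
```

Keeping the anchor in one function keeps to_date and to_season_doy exact inverses, even for seasons that straddle a calendar-year boundary.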

Summary counts

Section	Relationship count
Configuration	14
Pipeline	14
Reference-data	8
Feature-assembly	6
Delivery	6
Behavioural-role	10
Cross-cutting	8
Artefact-schema	9
Temporal-vocabulary	4
Total	79