Relationships — commodity_hindcast
Naming convention
Each relationship is recorded as: Source-Entity --verb--> Target-Entity (cardinality) with a short description and a code citation. Cardinality notation:
- `1:1` — one-to-one
- `1:N` — one-to-many
- `0..1:1` — optional on the source side
- `0..N` — zero-or-more on the source side
- `N:1` — many-to-one (reference)
- `N:M` — many-to-many
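The recording convention above is regular enough to parse mechanically. The following is a minimal sketch, assuming the head of each entry has exactly the shape `Source --verb--> Target (cardinality)`; the `Relationship` dataclass, the `parse_relationship` helper and the regex group names are illustrative and not part of the codebase.

```python
import re
from dataclasses import dataclass

# Pattern for "Source --verb--> Target (cardinality)"; group names are illustrative.
_RELATIONSHIP = re.compile(
    r"(?P<source>.+?) --(?P<verb>[a-z-]+)--> (?P<target>.+?) \((?P<cardinality>[^()]+)\)"
)


@dataclass(frozen=True)
class Relationship:
    source: str
    verb: str
    target: str
    cardinality: str


def parse_relationship(text: str) -> Relationship:
    """Split one relationship head into source, verb, target and cardinality."""
    match = _RELATIONSHIP.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a relationship head: {text!r}")
    return Relationship(**match.groupdict())


simple = parse_relationship("ExperimentConfig --has-one--> CommodityConfig (1:1)")
qualified = parse_relationship("Hindcast (run) --produces-many--> HindcastSlice (1:N)")
print(simple.verb, simple.cardinality)  # has-one 1:1
```

The lazy source and target groups let entity names carry their own parentheses, such as `Hindcast (run)`, while the final parenthesised group is always read as the cardinality.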
Configuration relationships
- ExperimentConfig --has-one--> CommodityConfig (1:1) — the experiment is parameterised by exactly one commodity spec; CommodityConfig is a nested required field. `config.py:639`
- ExperimentConfig --has-one--> ModelConfig (1:1) — detrend strategy and regression estimator; required with a default. `config.py:647`
- ExperimentConfig --has-one--> ExperimentProtocolConfig (1:1) — CV strategy and test-year list; required. `config.py:644`
- ExperimentConfig --has-one--> PostprocessConfig (1:1) — bias-corrector and conformal-mode selection; required. `config.py:657`
- ExperimentConfig --has-one--> DeliveryConfig (1:1) — CI levels, public model name, narrowing and frozen-tail options; required. `config.py:663`
- ExperimentConfig --has-one--> EvaluationConfig (0..1:1) — WASDE benchmark path used only by EVALUATE; embedded in ExperimentConfig as a field. `domain-modelling/schema.yaml:308`
- ExperimentConfig --has-many--> ReferenceYieldSpec (0..N) — ordered list of reference-yield specs; empty list is the no-reference sentinel. `config.py:654`
- ExperimentConfig --has-one--> ForecastConfig (0..1:1) — optional; `None` means hindcast-only mode, any set value means forecast mode. `config.py:666`
- PostprocessConfig --has-one--> BiasCorrectorConfig (1:1) — bias-corrector kind and lookback window embedded inside PostprocessConfig. `config.py:523`
- CommodityConfig --has-many--> Builder (1:N) — dict of named builders; the `type` discriminator is auto-injected from the dict key. `config.py:337`
- CommodityConfig --has-many--> SeasonWindow (1:N) — separate lists for climo windows and weather windows, each expressing season-DOY aggregation bounds. `config.py:311`
- YieldsBuilder --has-many--> EditRuleConfig (0..N) — declarative Fellegi-Holt edits applied before the yields pivot; validated as a discriminated union at YAML parse time. `config.py:197`
- ModelConfig --references--> FitAggregationPolicy (derived, 1:1) — `weather_correction_fit_level` on ModelConfig drives the FitAggregationPolicy constructed by `ExperimentConfig.build_fit_aggregation_policy()`. `config.py:707`
- ReferenceYieldSpec --resolves-to--> ReferenceYieldLoader (N:1, polymorphic) — each spec subclass is registered against a concrete loader; dispatch via `ReferenceYieldLoader.from_spec()`. `lib/reference_data/base_reference_yield_loader.py:97,101`
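The builder entry above says the `type` discriminator is injected from the dict key. A minimal sketch of that idea on plain dictionaries follows; the real code presumably does this inside a config validator, and the `inject_type_from_key` helper, the builder names and the `path` field are assumptions made for illustration.

```python
from typing import Any


def inject_type_from_key(builders: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return a copy of the builders mapping with `type` set to each dict key."""
    injected: dict[str, dict[str, Any]] = {}
    for name, spec in builders.items():
        declared = spec.get("type", name)
        if declared != name:
            # An explicit type that disagrees with the key is ambiguous, so reject it.
            raise ValueError(f"builder {name!r} declares conflicting type {declared!r}")
        injected[name] = {**spec, "type": name}
    return injected


# Hypothetical YAML-derived input: keys name the builders, no explicit `type`.
raw_builders = {"yields": {"path": "nass.parquet"}, "weather": {}}
resolved_builders = inject_type_from_key(raw_builders)
print(resolved_builders["yields"]["type"])  # yields
```

Once every entry carries a `type`, a discriminated union can pick the concrete builder class without the YAML author having to repeat the key.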
Pipeline relationships
- ExperimentConfig --produces--> ExperimentResult (1:1 per run) — `run_hindcast.run()` creates a timestamped `run_root`, writes `config_resolved.yaml`, and the subsequent `ExperimentResult.from_run_dir()` call discovers slices. `stages/run_hindcast.py:77,91`
- ExperimentResult --composes--> HindcastSlice (1:N) — frozen tuple of per-fold handles; discovered from `preds/{commodity}/{fold}/train_preds.parquet` on disk. `lib/results/run_result.py:65,76`
- ExperimentResult --composes--> ForecastSlice (0..N) — discovered from `forecast/{season_year}/{init_date}/preds/walk_forward_preds.parquet`. `lib/results/run_result.py:96,108`
- ExperimentResult --exposes--> HindcastSlice["production"] (0..1:1) — the `production` property returns the no-holdout fold via `production_hindcast_slice()`; `None` before `fit_production` runs. `lib/results/run_result.py:175`
- Hindcast (run) --produces-many--> HindcastSlice (1:N) — `run_walk_forward()` generates one slice per test year plus one `"production"` slice. `run/runner.py:52`
- Forecast (run) --produces-many--> ForecastSlice (1:N) — `run_forecast.run()` creates one ForecastSlice per `(season_year, init_date)` pair. `stages/run_forecast.py:80`
- ForecastSlice --reuses--> HindcastSlice["production"] (N:1) — `ForecastSlice.training` calls `production_hindcast_slice()` to reach the production model, detrender and fill values; cannot exist without production. `lib/results/results_slice.py:410`
- HindcastSlice --persisted-in--> RunDir/models/{commodity}/{fold}/ (1:1) — `model_path`, `detrender_path`, `feature_fill_values_path` all point under this directory. `lib/results/results_slice.py:134`
- HindcastSlice --persisted-in--> RunDir/preds/{commodity}/{fold}/ (1:1) — `train_preds_path`, `walk_forward_preds_path`, `year_data_path` point under this directory. `lib/results/results_slice.py:140`
- ForecastSlice --persisted-in--> RunDir/forecast/{season_year}/{init_date}/ (1:1) — `root` property resolves the per-init subtree; all forecast artefacts sit under it. `lib/results/results_slice.py:324`
- HindcastSlice --has-one--> bias_corrector.pkl (0..1:1) — optional; path from `_bias_corrector_path()`; `has_bias_corrector` existence-checks before loading. `lib/results/results_slice.py:41,173`
- ForecastSlice --has-one--> bias_corrector.pkl (0..1:1) — postprocessed at forecast time, lives under `ForecastSlice.postprocessed_dir`. `lib/results/results_slice.py:442`
- CalibrationResult --calibrates--> ExperimentResult (N:1) — `primary_calibration(experiment, ci_levels)` reads all fold residuals from the ExperimentResult's hindcast slices to compute conformal half-widths. `stages/run_forecast.py:343`
- ExperimentResult --produces--> HindcastDelivery (1:N) — `deliver_experiment()` calls `walk_forward_preds_to_delivery_rows()` for each of the three ADM levels, producing one HindcastDelivery per level. `stages/run_hindcast.py:224`
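Several entries above describe slices being discovered from files on disk rather than from an index. A minimal sketch of that discovery for hindcast folds, assuming the `preds/{commodity}/{fold}/train_preds.parquet` layout quoted above; the `discover_fold_labels` helper and the `corn` commodity name are hypothetical.

```python
import tempfile
from pathlib import Path


def discover_fold_labels(run_dir: Path, commodity: str) -> tuple[str, ...]:
    """Return fold labels whose directory contains a train_preds.parquet marker."""
    preds_root = run_dir / "preds" / commodity
    markers = preds_root.glob("*/train_preds.parquet")
    return tuple(sorted(marker.parent.name for marker in markers))


# Build a throwaway run directory to show the behaviour.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    for fold in ("2021", "production", "2020"):
        fold_dir = run_dir / "preds" / "corn" / fold
        fold_dir.mkdir(parents=True)
        (fold_dir / "train_preds.parquet").touch()
    # A fold directory without the marker file is not discovered.
    (run_dir / "preds" / "corn" / "2022").mkdir()
    labels = discover_fold_labels(run_dir, "corn")

print(labels)  # ('2020', '2021', 'production')
```

Returning a sorted tuple keeps the result immutable and deterministic, which matches the "frozen tuple of per-fold handles" wording, though the actual ordering used by `ExperimentResult` is not stated in the source.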
Reference-data relationships
- ExperimentConfig --has-many--> ReferenceYieldSpec (0..N) — `reference_data` list; each spec carries a unique `name` used as a metric/column prefix. `config.py:654`
- ReferenceYieldSpec --is-a--> _ReferenceYieldSpecBase — all three concrete spec classes (`WasdeRefSpec`, `ConabFinalRefSpec`, `ConabLevantamentoRefSpec`) subclass the frozen pydantic base. `lib/reference_data/base_reference_yield_loader.py:36`
- WasdeRefSpec --registered-to--> WasdeLoader (1:1) — explicit registry entry; `ReferenceYieldLoader.from_spec()` dispatches to WasdeLoader for this spec type. `lib/reference_data/loader.py:51`
- ConabFinalRefSpec --registered-to--> ConabFinalLoader (1:1) — registered at module import. `lib/reference_data/loader.py:52`
- ConabLevantamentoRefSpec --registered-to--> ConabLevantamentoLoader (1:1) — registered at module import. `lib/reference_data/loader.py:53`
- ReferenceYieldLoader --loads--> Yield (N:M) — each loader reads and returns yield data (kg/ha) keyed by `(harvest_year, geo_identifier)` for benchmarking. `lib/reference_data/base_reference_yield_loader.py:77`
- DeliveryRow --references--> NASS benchmarks (N:1 by ID) — `nass_actual`, `nass_actual_area_weighted_all`, `nass_actual_prod_div_area_all` fields populated at delivery time. `delivery/schemas.py:140`
- DeliveryRow --references--> WASDE/CONAB benchmark (N:1 by ID) — `wasde_in_season`, `conab_final_in_season`, `conab_lev_in_season` optional fields populated from the corresponding ReferenceYieldLoader. `delivery/schemas.py:143`
Feature-assembly relationships
- CommodityConfig --drives--> FeatureBuilder (1:N) — each entry in `CommodityConfig.builders` instantiates a concrete builder module in `features/builders/`. `config.py:337`
- FeatureBuilder --reads--> Yield (N:M) — YieldsBuilder reads the NASS parquet and produces rows keyed by `(year, geo_identifier, init_date)`. `config.py:187`
- FeatureBuilder --reads--> fit.parquet/pred.parquet (1:1 per run) — each builder writes to `features_dir/{commodity}/builders/{name}.parquet`; the assemble step inner- or left-joins the builder outputs into fit and pred. `domain-modelling/DOMAIN_MODEL2.md §8.1`
- YieldsBuilder --applies--> EditRuleConfig (0..N, ordered) — edits applied sequentially in YAML order; each rule produces or consumes from EditReport. `lib/edit_and_imputation/edit.py:383`
- EditRuleConfig --produces--> EditReport (1:1 per rule fire) — fire counts and boolean flag frame accumulated across sequential rule application. `lib/edit_and_imputation/edit.py:370`
- ExperimentConfig --gates-via--> Check (1:N, per stage) — `run_preflight()` runs a list of `Check` value objects before each stage; a critical failure raises SystemExit. `run/preflight.py:42`
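The preflight gate in the last entry can be pictured as a list of small immutable results with a single abort rule. This is a minimal sketch under stated assumptions: the fields of `Check` (`name`, `passed`, `critical`) and the return value of `run_preflight` are invented for illustration; only the names `Check`, `run_preflight` and the SystemExit behaviour come from the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    passed: bool
    critical: bool = True  # hypothetical field: non-critical checks only warn


def run_preflight(checks: list[Check]) -> list[Check]:
    """Abort on any critical failure; return the non-critical failures otherwise."""
    failures = [check for check in checks if not check.passed]
    critical = [check for check in failures if check.critical]
    if critical:
        names = ", ".join(check.name for check in critical)
        raise SystemExit(f"preflight failed: {names}")
    return failures


# A non-critical failure is reported but does not stop the stage.
soft_failures = run_preflight(
    [
        Check(name="features_dir", passed=True),
        Check(name="wasde_benchmark", passed=False, critical=False),
    ]
)

# A critical failure stops the stage before any work is done.
try:
    run_preflight([Check(name="nass_parquet", passed=False)])
    aborted = False
except SystemExit:
    aborted = True

print([check.name for check in soft_failures], aborted)
```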
Delivery relationships
- HindcastDelivery --composes--> DeliveryRow (1:N) — `rows: list[DeliveryRow]`; no duplicates on `(year, init_date, geo_identifier)`. `delivery/schemas.py:236`
- HindcastSlice --aggregates-to--> DeliveryRow (N:M, per ADM level) — `walk_forward_preds_to_delivery_rows()` loads non-production fold predictions and converts them to DeliveryRows at each ADM level. `domain-modelling/DOMAIN_MODEL2.md §8.4`
- ForecastSlice --aggregates-to--> DeliveryRow (N:M, per ADM level) — `_deliver_forecast()` calls `walk_forward_preds_to_delivery_rows(mode="forecast")` and writes one CSV per level. `stages/run_forecast.py:357`
- DeliveryRow --persisted-in--> delivery/Treefera_{commodity}{level}_Hindcast.csv (N:1) — three files per run (ADM0, ADM1, ADM2); written by `deliver_experiment()`. `stages/run_hindcast.py:224`
- DeliveryRow --persisted-in--> forecast/{season_year}/{init_date}/delivery/*.csv (N:1) — one file per ADM level; written by `_deliver_forecast()`. `stages/run_forecast.py:398`
- FoldSchedule --drives--> DeliveryRow (N:M, dashboard only) — the Streamlit dashboard uses FoldSchedule to map fold labels to init_dates for display; does not affect pipeline delivery. `app/_dashboard_config.py:199`
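The uniqueness rule on `(year, init_date, geo_identifier)` for delivery rows can be checked with a simple key count. The sketch below assumes a stripped-down `DeliveryRow` with one payload field; the `find_duplicate_keys` helper, the `sim_yield_kg_ha` field on the row and the sample values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DeliveryRow:
    year: int
    init_date: date
    geo_identifier: str
    sim_yield_kg_ha: float  # hypothetical payload field for the example


def find_duplicate_keys(rows: list[DeliveryRow]) -> list[tuple[int, date, str]]:
    """Return every (year, init_date, geo_identifier) key that occurs more than once."""
    counts = Counter((row.year, row.init_date, row.geo_identifier) for row in rows)
    return [key for key, count in counts.items() if count > 1]


rows = [
    DeliveryRow(2020, date(2020, 6, 1), "US", 11000.0),
    DeliveryRow(2020, date(2020, 6, 8), "US", 11100.0),
    DeliveryRow(2020, date(2020, 6, 1), "US", 10950.0),  # same key as the first row
]
duplicates = find_duplicate_keys(rows)
print(duplicates)  # [(2020, datetime.date(2020, 6, 1), 'US')]
```

A validator on the delivery container could raise when this list is non-empty; how the real schema enforces the rule is not described in the source.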
Behavioural-role relationships
- ModelConfig --instantiates--> Detrender (1:1, per fold) — `ExperimentConfig.build_detrender()` dispatches on `model.detrend` literal to the concrete `AbstractDetrend` subclass. `config.py:682`
- ModelConfig --instantiates--> Regressor (1:1, per fold) — `ExperimentConfig.build_regressor()` dispatches on `model.regression` literal to the concrete `AbstractRegressionImpl` subclass. `config.py:695`
- BiasCorrectorConfig --instantiates--> MetaModel (1:1, per fold) — `build_bias_corrector(config.postprocess.bias_corrector)` dispatches on `kind` to `NoBiasCorrector` or `CoverageBiasCorrector`. `stages/run_forecast.py:339`
- HindcastSlice --loads--> Detrender (1:1) — `load_detrender(config)` uses `model.detrend` to select the class and calls `cls.load(detrender_path, config)`. `lib/results/results_slice.py:243`
- HindcastSlice --loads--> Regressor (1:1) — `load_model()` tries each concrete `AbstractRegressionImpl` class against `model_path` until one succeeds. `lib/results/results_slice.py:182`
- ForecastSlice --delegates-to--> HindcastSlice["production"] for trained artefacts — `load_model()`, `load_detrender()`, `load_feature_fill_values()` all proxy to `self.training`. `lib/results/results_slice.py:427`
- HindcastSlice --is-a (protocol)--> AbstractSlice — `isinstance(slice, AbstractSlice)` is true at runtime; HindcastSlice satisfies the runtime-checkable protocol. `lib/results/results_slice.py:72`
- ForecastSlice --is-a (protocol)--> AbstractSlice — ForecastSlice also satisfies AbstractSlice; the shared surface lets downstream consumers handle both uniformly. `lib/results/results_slice.py:72`
- ExperimentProtocolConfig --drives--> Fold (1:N) — `ExpandingFoldGenerator` consumes `test_years` from ExperimentProtocolConfig to yield one fold per test year. `run/experiment_protocol.py` (referenced in `run/runner.py:21`)
- Fold --produces--> HindcastSlice (1:1) — each fold label corresponds to exactly one HindcastSlice written during walk-forward. `run/runner.py:63`
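The two protocol entries and the delegation entry above combine naturally: a runtime-checkable protocol defines the shared surface, and the forecast slice forwards artefact loading to its training slice. The sketch below assumes a two-method protocol and string stand-ins for the loaded artefacts; the real `AbstractSlice` surface and method signatures are not given in the source.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class AbstractSlice(Protocol):
    # Assumed shared surface; the real protocol may declare more members.
    def load_model(self) -> object: ...
    def load_detrender(self) -> object: ...


class HindcastSlice:
    def load_model(self) -> object:
        return "production-model"  # stand-in for a deserialised regressor

    def load_detrender(self) -> object:
        return "production-detrender"  # stand-in for a deserialised detrender


class ForecastSlice:
    def __init__(self, training: HindcastSlice) -> None:
        self.training = training

    # Trained artefacts are not duplicated; calls proxy to the training slice.
    def load_model(self) -> object:
        return self.training.load_model()

    def load_detrender(self) -> object:
        return self.training.load_detrender()


hindcast = HindcastSlice()
forecast = ForecastSlice(training=hindcast)

# Neither class inherits from AbstractSlice, yet both pass the structural check.
print(isinstance(hindcast, AbstractSlice), isinstance(forecast, AbstractSlice))  # True True
```

Note that a runtime `isinstance` check against a protocol only verifies that the named members exist, not that their signatures match.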
Cross-cutting relationships
- ExperimentResult --tagged-in--> MLflow Run (1:1) — one MLflow run per pipeline invocation; `run_dir` is stored as a tag. `domain-modelling/DOMAIN_MODEL2.md §3`
- ExperimentConfig --preflight-checked-by--> Check (1:N, per stage) — `preflight_paths_for_hindcast()`, `preflight_paths_for_forecast_features()`, etc. emit Check lists from ExperimentConfig. `run/preflight.py:59`
- Region --referenced-by--> geo_identifier (N:M, via NewType) — `GeoIdentifier` is a `NewType("GeoIdentifier", str)` alias; no standalone entity class. `domain-modelling/DOMAIN_MODEL.md §3.1`
- Yield --keyed-by--> (year, geo_identifier, init_date) (N:M) — every feature parquet, prediction parquet and delivery row uses this triple as the canonical join key. `domain-modelling/DOMAIN_MODEL.md §1`
- ExperimentConfig --resolves--> ResolvablePath (1:N) — `_iter_resolvable_fields()` walks all nested config fields carrying `ResolvablePath` annotations and resolves them against `data_root` at load time. `config.py:785`
- ExperimentResult --persists--> included_geo_identifiers (1:1) — `save_included_geo_identifiers()` writes the top-N county frozenset to `run_dir/included_geo_identifiers.txt` during FIT; `load_included_geo_identifiers()` reads it back during PREDICT. `lib/results/run_result.py:157,164`
- Detrender --transforms--> Yield (N:M) — every Detrender implementation removes temporal trend from the training yield panel before the Regressor fits residuals. Referenced via `AbstractDetrend` protocol used in `run/runner.py:116`.
- Regressor --fits--> Yield residuals (N:M) — `AbstractRegressionImpl` implementations (`RidgeRegressor`, `PcaRidgeRegressor`, `XGBRegressor`) fit on detrended residuals and score against feature columns. `lib/results/results_slice.py:182`
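Two of the entries above, the `GeoIdentifier` NewType and the `(year, geo_identifier, init_date)` join key, can be illustrated together. This is a minimal sketch on plain dictionaries; the `inner_join` helper, the `gdd` feature name, the county code and the yield values are hypothetical.

```python
from datetime import date
from typing import NewType

# Same definition as quoted above: a distinct static type, a plain str at runtime.
GeoIdentifier = NewType("GeoIdentifier", str)
JoinKey = tuple[int, GeoIdentifier, date]


def inner_join(left: dict[JoinKey, object], right: dict[JoinKey, object]) -> dict[JoinKey, tuple]:
    """Keep only keys present on both sides, pairing their values."""
    return {key: (left[key], right[key]) for key in left if key in right}


geo = GeoIdentifier("19153")  # hypothetical county code
features = {(2020, geo, date(2020, 6, 1)): {"gdd": 512.0}}
predictions = {
    (2020, geo, date(2020, 6, 1)): 11250.0,
    (2021, geo, date(2021, 6, 1)): 11400.0,
}
joined = inner_join(features, predictions)

print(type(geo) is str, len(joined))  # True 1
```

Because the NewType adds no runtime wrapper, keys built from raw strings and keys built from `GeoIdentifier` values compare equal, which is what lets the triple act as a join key across parquet files.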
Artefact-schema relationships
- HindcastSlice --writes--> walk_forward_preds.parquet (1:1 per fold) — schema: `(geo_identifier, year, init_date, sim_yield_kg_ha, sim_yield_kg_ha_detrended, obs_yield_kg_ha, area_harvested_ha, crop_type)`. `run/runner.py:74`
- HindcastSlice --writes--> train_preds.parquet (1:1 per fold) — training-set predictions used for bias correction and conformal calibration. `run/runner.py` and `run/experiment_protocol.py`
- HindcastSlice --writes--> year_data.parquet (1:1 per fold) — raw pre-simulation feature snapshot for `(year, init_date)`; diagnostic output preserved per fold. `run/runner.py:71`
- ForecastSlice --writes--> indices.zarr (1:1) — daily spliced observed-plus-climatology weather indices produced by `materialise_forecast_indices()`. `stages/run_forecast.py:152`
- ForecastSlice --writes--> features/pred.parquet (1:1) — per-init forecast features; independent from the canonical run-level `pred.parquet`. `stages/run_forecast.py:252`
- ForecastSlice --writes--> postprocessed/national.parquet (1:1) — national aggregated postprocessed frame with bias correction and conformal CIs applied. `stages/run_forecast.py:347`
- ExperimentResult --reads--> features_dir/{commodity}/fit.parquet (N:1) — run-level fit feature matrix, shared across all HindcastSlices; resolved via `_features_path(run_dir, "fit")`. `lib/results/results_slice.py:60`
- ExperimentResult --reads--> features_dir/{commodity}/pred.parquet (N:1) — run-level pred feature matrix, shared across all HindcastSlices; resolved via `_features_path(run_dir, "pred")`. `lib/results/results_slice.py:60`
- metadata.json --describes--> fit.parquet/pred.parquet (1:1 each) — carries `index_cols`, `feature_cols`, `target_col`; written by `features/assemble.py` alongside each parquet. `domain-modelling/DOMAIN_MODEL2.md §8.1`
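The metadata sidecar in the last entry is useful mainly as a contract a consumer can verify before reading a feature matrix. A minimal sketch of such a check follows, assuming the sidecar is a flat JSON object with the three keys named above; the `missing_columns` helper, the feature column names and the choice of `obs_yield_kg_ha` as target are assumptions.

```python
import json

REQUIRED_KEYS = ("index_cols", "feature_cols", "target_col")


def missing_columns(metadata_text: str, columns: list[str]) -> list[str]:
    """List the columns the metadata promises that the frame does not contain."""
    metadata = json.loads(metadata_text)
    absent_keys = [key for key in REQUIRED_KEYS if key not in metadata]
    if absent_keys:
        raise KeyError(f"metadata is missing keys: {absent_keys}")
    expected = [*metadata["index_cols"], *metadata["feature_cols"], metadata["target_col"]]
    return [column for column in expected if column not in columns]


# Hypothetical sidecar content and frame columns.
metadata_text = json.dumps(
    {
        "index_cols": ["year", "geo_identifier", "init_date"],
        "feature_cols": ["gdd_sum", "precip_sum"],
        "target_col": "obs_yield_kg_ha",
    }
)
frame_columns = ["year", "geo_identifier", "init_date", "gdd_sum"]
missing = missing_columns(metadata_text, frame_columns)

print(missing)  # ['precip_sum', 'obs_yield_kg_ha']
```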
Temporal-vocabulary relationships
- SeasonYear --partitioned-by--> InitDate (1:N) — a single crop year has many in-season init_dates; `CommodityConfig.hindcast_init_dates(season_year)` returns the full weekly grid. `config.py:369`
- InitDate --derived-from--> SeasonDOY (N:1) — `CommodityConfig.to_date(season_doy, season_year)` converts a season DOY to a calendar date; `to_season_doy()` inverts. `config.py:363,366`
- SeasonWindow --aggregates--> feature columns (1:N) — each SeasonWindow defines `(sdoy_start, sdoy_end)` bounds for weather and climo accumulation; multiple feature columns per window depending on `weather_vars`/`climo_zscore_vars`. `config.py:267,311`
- Fold --identified-by--> fold_label (1:1) — fold_label is the string filesystem key; numeric labels (`"2020"`) for walk-forward folds, literal `"production"` for the no-holdout fit. Cutoff is `date(int(fold_label), 1, 1)` for numeric folds. `lib/results/results_slice.py:151`
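The fold-label rule in the last entry maps directly to a small function. In this sketch the cutoff expression is taken from the text; returning `None` for the `"production"` label is an assumption, since the document only states the cutoff for numeric folds, and the `fold_cutoff` name is hypothetical.

```python
from datetime import date
from typing import Optional

PRODUCTION_LABEL = "production"


def fold_cutoff(fold_label: str) -> Optional[date]:
    """Return the training cutoff for a numeric fold, or None for the production fit."""
    if fold_label == PRODUCTION_LABEL:
        # Assumption: the no-holdout fit has no cutoff.
        return None
    return date(int(fold_label), 1, 1)


cutoff_2020 = fold_cutoff("2020")
cutoff_production = fold_cutoff("production")
print(cutoff_2020, cutoff_production)  # 2020-01-01 None
```

Any label that is neither numeric nor `"production"` raises `ValueError` from `int()`, which surfaces a malformed fold directory name early.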
Summary counts
| Section | Relationship count |
|---|---|
| Configuration | 14 |
| Pipeline | 14 |
| Reference-data | 8 |
| Feature-assembly | 6 |
| Delivery | 6 |
| Behavioural-role | 10 |
| Cross-cutting | 8 |
| Artefact-schema | 9 |
| Temporal-vocabulary | 4 |
| Total | 79 |