Entity-Relationship Diagrams: commodity_hindcast
The domain is split across three diagrams because a single combined view would be unreadable. See ENTITIES for entity definitions and RELATIONSHIPS for relationship rationale.
Diagram 1: Configuration
erDiagram
EXPERIMENT_CONFIG ||--|| COMMODITY_CONFIG : "is parameterised by"
EXPERIMENT_CONFIG ||--|| MODEL_CONFIG : "uses"
EXPERIMENT_CONFIG ||--|| EXPERIMENT_PROTOCOL_CONFIG : "governed by"
EXPERIMENT_CONFIG ||--|| POSTPROCESS_CONFIG : "postprocessed via"
EXPERIMENT_CONFIG ||--|| DELIVERY_CONFIG : "delivered via"
EXPERIMENT_CONFIG ||--o{ REFERENCE_YIELD_SPEC : "anchors against"
EXPERIMENT_CONFIG ||--o| FORECAST_CONFIG : "may include"
POSTPROCESS_CONFIG ||--|| BIAS_CORRECTOR_CONFIG : "embeds"
COMMODITY_CONFIG ||--o{ SEASON_WINDOW : "defines windows for"
COMMODITY_CONFIG ||--o{ BASE_BUILDER_CONFIG : "has builders"
BASE_BUILDER_CONFIG ||--o| EDIT_RULE_CONFIG : "applies edits via"
EXPERIMENT_CONFIG {
string experiment_name PK
string data_root
int feature_start_year
int feature_end_year
int random_seed
string mlflow_tracking_uri
}
COMMODITY_CONFIG {
string commodity PK
string country_code
string delivery_unit
float bushel_weight_lbs
int harvest_season_doy
string target_col
string target_detrended_col
string yield_col
string area_col
}
MODEL_CONFIG {
string detrend
string regression
string weather_correction_fit_level
boolean use_sample_weights
string weight_column
}
POSTPROCESS_CONFIG {
string conformalise
}
BIAS_CORRECTOR_CONFIG {
string kind
int n_lookback_years
string reduction_method
}
DELIVERY_CONFIG {
string model_public_name
boolean enforce_ci_narrowing
boolean drop_frozen_tail
}
FORECAST_CONFIG {
string raw_obs_filepath
string materialised_climo_filepath
date init_date
}
REFERENCE_YIELD_SPEC {
string name PK
string kind
string filepath
string commodity
string geography
string unit
}
SEASON_WINDOW {
string name PK
int sdoy_start
int sdoy_end
}
BASE_BUILDER_CONFIG {
string type PK
string filepath
string geo_id_col
boolean required_for_pred_parquet
}
EDIT_RULE_CONFIG {
string kind PK
string name
string target
string on_fail
}
EXPERIMENT_PROTOCOL_CONFIG {
string cv_strategy
float production_cumulative_threshold
int production_recent_years
}
ExperimentConfig is the aggregate root for all configuration. CommodityConfig is a child aggregate that owns a builders dict (a discriminated union over five concrete BaseBuilderConfig subclasses: YieldsBuilder, WeatherBuilder, ClimoBuilder, StressBuilder, NDVIBuilder) and two tuples of SeasonWindow, one for climo and one for weather. PostprocessConfig embeds BiasCorrectorConfig and a conformalise tuple of mode strings. ReferenceYieldSpec is a discriminated union (WasdeRefSpec | ConabFinalRefSpec | ConabLevantamentoRefSpec); each spec carries a unique name, which is used downstream as a metric and column prefix. ForecastConfig is optional: hindcast-only runs omit it. EditRuleConfig is itself a discriminated union and is used only inside YieldsBuilder.edits.
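The discriminated-union pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the tag strings ("wasde", "conab_final", "conab_levantamento"), the helper names parse_reference_spec and parse_reference_data, and the reduced field set are all assumptions made for the example. The real config layer may well use a validation library instead of plain dataclasses.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class WasdeRefSpec:
    name: str
    filepath: str
    kind: Literal["wasde"] = "wasde"  # assumed tag value


@dataclass(frozen=True)
class ConabFinalRefSpec:
    name: str
    filepath: str
    kind: Literal["conab_final"] = "conab_final"  # assumed tag value


@dataclass(frozen=True)
class ConabLevantamentoRefSpec:
    name: str
    filepath: str
    kind: Literal["conab_levantamento"] = "conab_levantamento"  # assumed tag value


ReferenceYieldSpec = Union[WasdeRefSpec, ConabFinalRefSpec, ConabLevantamentoRefSpec]

# Hypothetical tag-to-class table used to resolve the union member.
_SPEC_BY_KIND = {
    "wasde": WasdeRefSpec,
    "conab_final": ConabFinalRefSpec,
    "conab_levantamento": ConabLevantamentoRefSpec,
}


def parse_reference_spec(raw: dict) -> ReferenceYieldSpec:
    """Pick the concrete spec class from the `kind` tag, then build it."""
    fields = dict(raw)  # copy so the caller's dict is not mutated
    kind = fields.pop("kind", None)
    if kind not in _SPEC_BY_KIND:
        raise ValueError(f"unknown reference spec kind: {kind!r}")
    return _SPEC_BY_KIND[kind](**fields)


def parse_reference_data(raw_specs: list[dict]) -> list[ReferenceYieldSpec]:
    """Parse every spec and enforce that spec names are unique."""
    specs = [parse_reference_spec(raw) for raw in raw_specs]
    names = [spec.name for spec in specs]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate reference spec names: {duplicates}")
    return specs
```

The uniqueness check matters because each name becomes a metric and column prefix downstream, so two specs sharing a name would collide there.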
Diagram 2: Pipeline artefacts
erDiagram
EXPERIMENT_RESULT ||--|| EXPERIMENT_CONFIG : "loaded from"
EXPERIMENT_RESULT ||--o{ HINDCAST_SLICE : "composes"
EXPERIMENT_RESULT ||--o{ FORECAST_SLICE : "composes"
FORECAST_SLICE }o--o| HINDCAST_SLICE : "delegates training to"
HINDCAST_SLICE ||--o| CALIBRATION_RESULT : "may produce"
HINDCAST_SLICE ||--o{ DELIVERY_ROW : "exports as"
FORECAST_SLICE ||--o{ DELIVERY_ROW : "exports as"
HINDCAST_DELIVERY ||--o{ DELIVERY_ROW : "contains"
HINDCAST_SLICE ||--|| HINDCAST_DELIVERY : "assembled into"
FORECAST_SLICE ||--o| HINDCAST_DELIVERY : "may produce"
HINDCAST_SLICE ||--|| FOLD : "represents"
EXPERIMENT_RESULT {
path run_dir PK
boolean has_walk_forward_preds
boolean has_postprocessed
}
HINDCAST_SLICE {
string fold_label PK
path run_dir
path train_preds_path
path model_path
path detrender_path
path feature_fill_values_path
path walk_forward_preds_path
path year_data_path
path bias_corrector_path
}
FORECAST_SLICE {
path run_dir PK
string commodity PK
int season_year PK
date init_date PK
path indices_zarr
path features_parquet
path walk_forward_preds_path
path postprocessed_national_path
}
FOLD {
string fold_label PK
date cutoff
}
CALIBRATION_RESULT {
string fold_label PK
string conformal_mode
path bias_corrector_path
}
HINDCAST_DELIVERY {
string generated_date PK
}
DELIVERY_ROW {
string commodity PK
int year PK
string init_date PK
string geo_identifier PK
string variable
string model
float mean
float weather_correction_bu_ac
float nass_actual
float wasde_in_season
float lower_95
float lower_80
float upper_80
float upper_95
}
ExperimentResult is the aggregate root for a run directory and the sole hand-off contract between pipeline stages. It carries two optional collections of slices: HindcastSlice (one per walk-forward fold, plus the "production" no-holdout fit) and ForecastSlice (one per (season_year, init_date) pair). ForecastSlice delegates trained artefacts to the production HindcastSlice via its training property. Fold is modelled separately to make explicit that fold_label and cutoff (Jan 1 of the test year, or a real init_date for forecasts) together form the temporal identity shared by both slice types. CalibrationResult represents the per-fold bias corrector plus conformal sidecar written by POSTPROCESS. HindcastDelivery wraps all DeliveryRows with structural validators (no duplicate keys, fold consistency, CI ordering).
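Two of the structural validators named above, duplicate-key detection and CI ordering, might look roughly like the sketch below. It assumes the row key is the four PK columns from the diagram and that "CI ordering" means lower_95 <= lower_80 <= upper_80 <= upper_95; the function name validate_delivery is hypothetical. The fold-consistency check is omitted because its exact rule is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryRow:
    # Subset of the DELIVERY_ROW columns needed for the two checks.
    commodity: str
    year: int
    init_date: str
    geo_identifier: str
    mean: float
    lower_95: float
    lower_80: float
    upper_80: float
    upper_95: float

    @property
    def key(self) -> tuple:
        """Composite key built from the four PK columns in the diagram."""
        return (self.commodity, self.year, self.init_date, self.geo_identifier)


def validate_delivery(rows: list[DeliveryRow]) -> None:
    """Raise ValueError on a duplicate key or out-of-order interval bounds."""
    seen = set()
    for row in rows:
        if row.key in seen:
            raise ValueError(f"duplicate delivery key: {row.key}")
        seen.add(row.key)
        # Assumed ordering: the 95% interval must enclose the 80% interval.
        bounds = (row.lower_95, row.lower_80, row.upper_80, row.upper_95)
        if any(lo > hi for lo, hi in zip(bounds, bounds[1:])):
            raise ValueError(f"CI bounds out of order for {row.key}: {bounds}")
```

Whether the mean must also sit inside the 80% interval is not specified, so this sketch leaves it unchecked.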
Diagram 3: Behavioural roles
erDiagram
ABSTRACT_SLICE ||..|| HINDCAST_SLICE : "realised by"
ABSTRACT_SLICE ||..|| FORECAST_SLICE : "realised by"
FEATURE_BUILDER ||..|| YIELDS_BUILDER : "realised by"
FEATURE_BUILDER ||..|| WEATHER_BUILDER : "realised by"
FEATURE_BUILDER ||..|| CLIMO_BUILDER : "realised by"
FEATURE_BUILDER ||..|| NDVI_BUILDER : "realised by"
FEATURE_BUILDER ||..|| STRESS_BUILDER : "realised by"
ABSTRACT_DETREND ||..|| LINEAR_STATE_DETREND : "realised by"
ABSTRACT_DETREND ||..|| GAUSSIAN_WINDOW_STATE_DETREND : "realised by"
ABSTRACT_DETREND ||..|| PARTIAL_POOLING_DETREND : "realised by"
ABSTRACT_REGRESSION_IMPL ||..|| RIDGE_REGRESSOR : "realised by"
ABSTRACT_REGRESSION_IMPL ||..|| PCA_RIDGE_REGRESSOR : "realised by"
ABSTRACT_REGRESSION_IMPL ||..|| XGB_REGRESSOR : "realised by"
REFERENCE_YIELD_LOADER ||..|| WASDE_LOADER : "realised by"
REFERENCE_YIELD_LOADER ||..|| CONAB_FINAL_LOADER : "realised by"
REFERENCE_YIELD_LOADER ||..|| CONAB_LEVANTAMENTO_LOADER : "realised by"
ABSTRACT_SLICE {
path run_dir
date cutoff
path walk_forward_preds_path
path features_fit_path
path features_pred_path
boolean has_bias_corrector
}
FEATURE_BUILDER {
string type PK
string filepath
boolean required_for_pred_parquet
}
ABSTRACT_DETREND {
string kind PK
}
ABSTRACT_REGRESSION_IMPL {
string kind PK
}
REFERENCE_YIELD_LOADER {
string name PK
string kind
string filepath
string geography
}
AbstractSlice is a @runtime_checkable Protocol (defined at lib/results/results_slice.py:73) that both HindcastSlice and ForecastSlice satisfy, providing a symmetric path/loader surface so that consumers typed against the protocol work for both pipeline modes without branching. FeatureBuilder represents the BuilderFn callable protocol dispatched through BUILDER_REGISTRY in features/builders/registry.py; each of the five concrete builders (yields, weather, climo, ndvi, stress) maps to a registered function and a corresponding BaseBuilderConfig subclass from the config layer. The three detrender implementations (linear_state, gaussian_state, partial_pooling) all extend AbstractDetrend and are loaded lazily by HindcastSlice.load_detrender. Similarly, the three regressors (ridge, pca_ridge, xgboost) extend AbstractRegressionImpl. ReferenceYieldLoader is the ABC dispatched from _ReferenceYieldSpecBase; WasdeLoader, ConabFinalLoader, and ConabLevantamentoLoader are the three concrete implementations registered at import time in lib/reference_data/loader.py.
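To illustrate how a @runtime_checkable Protocol lets consumers treat both slice types uniformly, here is a small sketch. The class names SliceLike, HindcastSliceSketch and ForecastSliceSketch, the describe helper, and the directory layouts are invented for the example and do not reflect the actual definitions in lib/results/results_slice.py. Only the idea that a forecast slice uses its init_date as the cutoff comes from the text above.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class SliceLike(Protocol):
    """Reduced stand-in for the shared path/loader surface."""

    @property
    def run_dir(self) -> Path: ...

    @property
    def cutoff(self) -> date: ...

    @property
    def walk_forward_preds_path(self) -> Path: ...


@dataclass(frozen=True)
class HindcastSliceSketch:
    run_dir: Path
    fold_label: str
    cutoff: date

    @property
    def walk_forward_preds_path(self) -> Path:
        # Hypothetical layout: one sub-directory per fold.
        return self.run_dir / "folds" / self.fold_label / "walk_forward_preds.parquet"


@dataclass(frozen=True)
class ForecastSliceSketch:
    run_dir: Path
    season_year: int
    init_date: date

    @property
    def cutoff(self) -> date:
        # For forecasts the cutoff is the real init_date.
        return self.init_date

    @property
    def walk_forward_preds_path(self) -> Path:
        # Hypothetical layout: one sub-directory per (season_year, init_date).
        leaf = f"{self.season_year}_{self.init_date.isoformat()}"
        return self.run_dir / "forecast" / leaf / "walk_forward_preds.parquet"


def describe(slice_: SliceLike) -> str:
    """Consumer typed against the protocol; no branching on slice type."""
    return f"{slice_.cutoff.isoformat()} -> {slice_.walk_forward_preds_path.name}"
```

Note that isinstance against a runtime-checkable protocol only verifies that the members exist; it does not check their types or signatures, so static type checking remains the stronger guarantee.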
Cross-diagram references
The following entities appear in more than one diagram:
- ExperimentConfig: Diagram 1 (full detail) and Diagram 2 (ExperimentResult loads it from disk).
- HindcastSlice / ForecastSlice: Diagram 2 (pipeline artefact structure) and Diagram 3 (realise the AbstractSlice protocol).
- BaseBuilderConfig / concrete builders: Diagram 1 (CommodityConfig.builders discriminated union) and Diagram 3 (behavioural FeatureBuilder role).
- ReferenceYieldSpec / ReferenceYieldLoader: Diagram 1 (ExperimentConfig.reference_data list) and Diagram 3 (ReferenceYieldLoader ABC hierarchy).
- Fold: Diagram 2 (pipeline artefact identity for HindcastSlice) and implied by AbstractSlice.cutoff in Diagram 3.
- DeliveryRow: Diagram 2 (exported from both slice types and collected by HindcastDelivery) and indirectly Diagram 1 (DeliveryConfig governs CI levels and column schema).