Entity-Relationship Diagrams — commodity_hindcast

The domain is split across three diagrams because a single combined view would be unreadable. See ENTITIES for entity definitions and RELATIONSHIPS for relationship rationale.

Diagram 1 — Configuration

erDiagram
  EXPERIMENT_CONFIG ||--|| COMMODITY_CONFIG : "is parameterised by"
  EXPERIMENT_CONFIG ||--|| MODEL_CONFIG : "uses"
  EXPERIMENT_CONFIG ||--|| EXPERIMENT_PROTOCOL_CONFIG : "governed by"
  EXPERIMENT_CONFIG ||--|| POSTPROCESS_CONFIG : "postprocessed via"
  EXPERIMENT_CONFIG ||--|| DELIVERY_CONFIG : "delivered via"
  EXPERIMENT_CONFIG ||--o{ REFERENCE_YIELD_SPEC : "anchors against"
  EXPERIMENT_CONFIG ||--o| FORECAST_CONFIG : "may include"
  POSTPROCESS_CONFIG ||--|| BIAS_CORRECTOR_CONFIG : "embeds"
  COMMODITY_CONFIG ||--o{ SEASON_WINDOW : "defines windows for"
  COMMODITY_CONFIG ||--o{ BASE_BUILDER_CONFIG : "has builders"
  BASE_BUILDER_CONFIG ||--o| EDIT_RULE_CONFIG : "applies edits via"

  EXPERIMENT_CONFIG {
    string experiment_name PK
    string data_root
    int feature_start_year
    int feature_end_year
    int random_seed
    string mlflow_tracking_uri
  }
  COMMODITY_CONFIG {
    string commodity PK
    string country_code
    string delivery_unit
    float bushel_weight_lbs
    int harvest_season_doy
    string target_col
    string target_detrended_col
    string yield_col
    string area_col
  }
  MODEL_CONFIG {
    string detrend
    string regression
    string weather_correction_fit_level
    boolean use_sample_weights
    string weight_column
  }
  POSTPROCESS_CONFIG {
    string conformalise
  }
  BIAS_CORRECTOR_CONFIG {
    string kind
    int n_lookback_years
    string reduction_method
  }
  DELIVERY_CONFIG {
    string model_public_name
    boolean enforce_ci_narrowing
    boolean drop_frozen_tail
  }
  FORECAST_CONFIG {
    string raw_obs_filepath
    string materialised_climo_filepath
    date init_date
  }
  REFERENCE_YIELD_SPEC {
    string name PK
    string kind
    string filepath
    string commodity
    string geography
    string unit
  }
  SEASON_WINDOW {
    string name PK
    int sdoy_start
    int sdoy_end
  }
  BASE_BUILDER_CONFIG {
    string type PK
    string filepath
    string geo_id_col
    boolean required_for_pred_parquet
  }
  EDIT_RULE_CONFIG {
    string kind PK
    string name
    string target
    string on_fail
  }
  EXPERIMENT_PROTOCOL_CONFIG {
    string cv_strategy
    float production_cumulative_threshold
    int production_recent_years
  }

ExperimentConfig is the aggregate root for all configuration. CommodityConfig is a child aggregate that owns a builders dict and two tuples of SeasonWindow (climo and weather). The builders dict is a discriminated union over five concrete BaseBuilderConfig subclasses: YieldsBuilder, WeatherBuilder, ClimoBuilder, StressBuilder, and NDVIBuilder. PostprocessConfig embeds BiasCorrectorConfig and a conformalise tuple of mode strings (flattened to a single string attribute in the diagram). ReferenceYieldSpec is a discriminated union (WasdeRefSpec | ConabFinalRefSpec | ConabLevantamentoRefSpec); each spec carries a unique name, which is used downstream as a metric and column prefix. ForecastConfig is optional: hindcast-only runs omit it. EditRuleConfig is itself a discriminated union and is used only inside YieldsBuilder.edits.
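The discriminated-union pattern described above can be sketched as follows. The source does not show how the unions are implemented, so this is only an illustration using stdlib dataclasses and a tag lookup table. The `kind` values, the field subset, and the helper names `parse_reference_spec` and `parse_reference_data` are assumptions, not the project's API.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Union


@dataclass(frozen=True)
class WasdeRefSpec:
    name: str
    filepath: str
    kind: str = "wasde"  # assumed tag value


@dataclass(frozen=True)
class ConabFinalRefSpec:
    name: str
    filepath: str
    kind: str = "conab_final"  # assumed tag value


@dataclass(frozen=True)
class ConabLevantamentoRefSpec:
    name: str
    filepath: str
    kind: str = "conab_levantamento"  # assumed tag value


ReferenceYieldSpec = Union[WasdeRefSpec, ConabFinalRefSpec, ConabLevantamentoRefSpec]

# Hypothetical tag -> class table; the real discriminator values may differ.
_SPEC_BY_KIND = {
    "wasde": WasdeRefSpec,
    "conab_final": ConabFinalRefSpec,
    "conab_levantamento": ConabLevantamentoRefSpec,
}


def parse_reference_spec(raw: Mapping[str, Any]) -> ReferenceYieldSpec:
    """Pick the concrete spec class from the `kind` tag, then build it."""
    kind = raw.get("kind")
    try:
        spec_cls = _SPEC_BY_KIND[kind]
    except KeyError:
        raise ValueError(f"unknown reference spec kind: {kind!r}") from None
    fields = {k: v for k, v in raw.items() if k != "kind"}
    return spec_cls(**fields)


def parse_reference_data(raw_specs: List[Mapping[str, Any]]) -> List[ReferenceYieldSpec]:
    """Parse every spec and reject repeated names, since names become column prefixes."""
    specs = [parse_reference_spec(raw) for raw in raw_specs]
    names = [spec.name for spec in specs]
    dupes = sorted({n for n in names if names.count(n) > 1})
    if dupes:
        raise ValueError(f"duplicate reference spec names: {dupes}")
    return specs
```

The same tag-to-class lookup would apply to the builders dict and to EditRuleConfig; only the tag field (`type` or `kind` in the diagram) changes.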

Diagram 2 — Pipeline artefacts

erDiagram
  EXPERIMENT_RESULT ||--|| EXPERIMENT_CONFIG : "loaded from"
  EXPERIMENT_RESULT ||--o{ HINDCAST_SLICE : "composes"
  EXPERIMENT_RESULT ||--o{ FORECAST_SLICE : "composes"
  FORECAST_SLICE }o--o| HINDCAST_SLICE : "delegates training to"
  HINDCAST_SLICE ||--o| CALIBRATION_RESULT : "may produce"
  HINDCAST_SLICE ||--o{ DELIVERY_ROW : "exports as"
  FORECAST_SLICE ||--o{ DELIVERY_ROW : "exports as"
  HINDCAST_DELIVERY ||--o{ DELIVERY_ROW : "contains"
  HINDCAST_SLICE ||--|| HINDCAST_DELIVERY : "assembled into"
  FORECAST_SLICE ||--o| HINDCAST_DELIVERY : "may produce"
  HINDCAST_SLICE ||--|| FOLD : "represents"

  EXPERIMENT_RESULT {
    path run_dir PK
    boolean has_walk_forward_preds
    boolean has_postprocessed
  }
  HINDCAST_SLICE {
    string fold_label PK
    path run_dir
    path train_preds_path
    path model_path
    path detrender_path
    path feature_fill_values_path
    path walk_forward_preds_path
    path year_data_path
    path bias_corrector_path
  }
  FORECAST_SLICE {
    path run_dir PK
    string commodity PK
    int season_year PK
    date init_date PK
    path indices_zarr
    path features_parquet
    path walk_forward_preds_path
    path postprocessed_national_path
  }
  FOLD {
    string fold_label PK
    date cutoff
  }
  CALIBRATION_RESULT {
    string fold_label PK
    string conformal_mode
    path bias_corrector_path
  }
  HINDCAST_DELIVERY {
    string generated_date PK
  }
  DELIVERY_ROW {
    string commodity PK
    int year PK
    string init_date PK
    string geo_identifier PK
    string variable
    string model
    float mean
    float weather_correction_bu_ac
    float nass_actual
    float wasde_in_season
    float lower_95
    float lower_80
    float upper_80
    float upper_95
  }

ExperimentResult is the aggregate root for a run directory and the sole hand-off contract between pipeline stages. It carries two optional collections of slices: HindcastSlice (one per walk-forward fold, plus the "production" no-holdout fit) and ForecastSlice (one per (season_year, init_date) pair). ForecastSlice delegates trained artefacts to the production HindcastSlice via its training property. Fold is modelled separately to make explicit that fold_label plus cutoff (Jan 1 of the test year for hindcasts, or a real init_date for forecasts) form the temporal identity shared by both slice types. CalibrationResult represents the per-fold bias corrector plus conformal sidecar written by POSTPROCESS. HindcastDelivery wraps all DeliveryRows with structural validators (no duplicate keys, fold consistency, CI ordering).
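To make the structural validators concrete, here is a minimal sketch of two of them (duplicate keys and CI ordering). It assumes the key is the four PK columns from the diagram and that "CI ordering" means lower_95 <= lower_80 <= mean <= upper_80 <= upper_95. The function name `validate_delivery` is hypothetical, and the fold-consistency check is left out because the source does not define it.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DeliveryRow:
    # Subset of the DELIVERY_ROW attributes needed for the two checks.
    commodity: str
    year: int
    init_date: str
    geo_identifier: str
    mean: float
    lower_95: float
    lower_80: float
    upper_80: float
    upper_95: float

    @property
    def key(self) -> Tuple[str, int, str, str]:
        """Composite key built from the four PK columns in the diagram."""
        return (self.commodity, self.year, self.init_date, self.geo_identifier)


def validate_delivery(rows: Sequence[DeliveryRow]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    seen = set()
    for row in rows:
        if row.key in seen:
            errors.append(f"duplicate key: {row.key}")
        seen.add(row.key)
        # Assumed ordering: the bounds must be non-decreasing around the mean.
        bounds = (row.lower_95, row.lower_80, row.mean, row.upper_80, row.upper_95)
        if any(lo > hi for lo, hi in zip(bounds, bounds[1:])):
            errors.append(f"CI out of order: {row.key}")
    return errors
```

Collecting errors instead of raising on the first one lets a caller report every bad row in a single pass; a real validator might raise instead.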

Diagram 3 — Behavioural roles

erDiagram
  ABSTRACT_SLICE ||..|| HINDCAST_SLICE : "realised by"
  ABSTRACT_SLICE ||..|| FORECAST_SLICE : "realised by"
  FEATURE_BUILDER ||..|| YIELDS_BUILDER : "realised by"
  FEATURE_BUILDER ||..|| WEATHER_BUILDER : "realised by"
  FEATURE_BUILDER ||..|| CLIMO_BUILDER : "realised by"
  FEATURE_BUILDER ||..|| NDVI_BUILDER : "realised by"
  FEATURE_BUILDER ||..|| STRESS_BUILDER : "realised by"
  ABSTRACT_DETREND ||..|| LINEAR_STATE_DETREND : "realised by"
  ABSTRACT_DETREND ||..|| GAUSSIAN_WINDOW_STATE_DETREND : "realised by"
  ABSTRACT_DETREND ||..|| PARTIAL_POOLING_DETREND : "realised by"
  ABSTRACT_REGRESSION_IMPL ||..|| RIDGE_REGRESSOR : "realised by"
  ABSTRACT_REGRESSION_IMPL ||..|| PCA_RIDGE_REGRESSOR : "realised by"
  ABSTRACT_REGRESSION_IMPL ||..|| XGB_REGRESSOR : "realised by"
  REFERENCE_YIELD_LOADER ||..|| WASDE_LOADER : "realised by"
  REFERENCE_YIELD_LOADER ||..|| CONAB_FINAL_LOADER : "realised by"
  REFERENCE_YIELD_LOADER ||..|| CONAB_LEVANTAMENTO_LOADER : "realised by"

  ABSTRACT_SLICE {
    path run_dir
    date cutoff
    path walk_forward_preds_path
    path features_fit_path
    path features_pred_path
    boolean has_bias_corrector
  }
  FEATURE_BUILDER {
    string type PK
    string filepath
    boolean required_for_pred_parquet
  }
  ABSTRACT_DETREND {
    string kind PK
  }
  ABSTRACT_REGRESSION_IMPL {
    string kind PK
  }
  REFERENCE_YIELD_LOADER {
    string name PK
    string kind
    string filepath
    string geography
  }

AbstractSlice is a @runtime_checkable Protocol (defined at lib/results/results_slice.py:73) that both HindcastSlice and ForecastSlice satisfy, providing a symmetric path/loader surface so that consumers typed against the protocol work for both pipeline modes without branching. FeatureBuilder represents the BuilderFn callable protocol dispatched through BUILDER_REGISTRY in features/builders/registry.py; each of the five concrete builders (yields, weather, climo, ndvi, stress) maps to a registered function and a corresponding BaseBuilderConfig subclass from the config layer. The three detrender implementations (linear_state, gaussian_state, partial_pooling) all extend AbstractDetrend and are loaded lazily by HindcastSlice.load_detrender. Similarly, the three regressors (ridge, pca_ridge, xgboost) extend AbstractRegressionImpl. ReferenceYieldLoader is the ABC dispatched from _ReferenceYieldSpecBase; WasdeLoader, ConabFinalLoader, and ConabLevantamentoLoader are the three concrete implementations registered at import time in lib/reference_data/loader.py.
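The structural-typing idea behind AbstractSlice can be illustrated with a small sketch. Only three of the protocol's attributes are modelled, and the directory layout inside each `walk_forward_preds_path` is invented for the example; the real definition lives at lib/results/results_slice.py:73 and may differ.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class AbstractSlice(Protocol):
    """Reduced protocol: just enough surface to show the pattern."""

    @property
    def run_dir(self) -> Path: ...

    @property
    def cutoff(self) -> date: ...

    @property
    def walk_forward_preds_path(self) -> Path: ...


@dataclass(frozen=True)
class HindcastSlice:
    run_dir: Path
    fold_label: str
    cutoff: date  # Jan 1 of the test year for a walk-forward fold

    @property
    def walk_forward_preds_path(self) -> Path:
        # Hypothetical layout, not the project's actual paths.
        return self.run_dir / "folds" / self.fold_label / "walk_forward_preds.parquet"


@dataclass(frozen=True)
class ForecastSlice:
    run_dir: Path
    season_year: int
    init_date: date

    @property
    def cutoff(self) -> date:
        # For forecasts the cutoff is the real init_date.
        return self.init_date

    @property
    def walk_forward_preds_path(self) -> Path:
        # Hypothetical layout, not the project's actual paths.
        sub = f"{self.season_year}_{self.init_date.isoformat()}"
        return self.run_dir / "forecast" / sub / "walk_forward_preds.parquet"


def describe(s: AbstractSlice) -> str:
    """A consumer typed against the protocol; no branching on slice type."""
    return f"{s.cutoff.isoformat()} -> {s.walk_forward_preds_path.name}"
```

Neither slice class inherits from AbstractSlice. An `isinstance` check against a runtime-checkable protocol only verifies that the named attributes exist, not their types, so static type checking remains the stronger guarantee.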

Cross-diagram references

The following entities appear in more than one diagram:

  • ExperimentConfig — Diagram 1 (full detail) and Diagram 2 (ExperimentResult loads it from disk).
  • HindcastSlice / ForecastSlice — Diagram 2 (pipeline artefact structure) and Diagram 3 (realise AbstractSlice protocol).
  • BaseBuilderConfig / concrete builders — Diagram 1 (CommodityConfig.builders discriminated union) and Diagram 3 (behavioural FeatureBuilder role).
  • ReferenceYieldSpec / ReferenceYieldLoader — Diagram 1 (ExperimentConfig.reference_data list) and Diagram 3 (ReferenceYieldLoader ABC hierarchy).
  • Fold — Diagram 2 (pipeline artefact identity for HindcastSlice) and implied by AbstractSlice.cutoff in Diagram 3.
  • DeliveryRow — Diagram 2 (exported from both slice types and collected by HindcastDelivery) and indirectly Diagram 1 (DeliveryConfig governs CI levels and column schema).