Domain Model — commodity_hindcast

This directory holds the formal entity-relationship model of the commodity_hindcast domain. It complements (does not replace) the auto-generated LinkML schema living inside the package at market_insights_models/src/commodity_hindcast/domain-modelling/. See delta_vs_existing.md for what is new here vs. what is already there.

Reading order

  1. ENTITIES.md — canonical entity catalogue (~30 entities across 5 tiers)
  2. RELATIONSHIPS.md — inter-entity relationships (~73 relationships across 9 sections)
  3. AGGREGATES.md — DDD aggregates and consistency boundaries (7 aggregates)
  4. BOUNDED_CONTEXTS.md — DDD bounded contexts (11 contexts) + Mermaid context map
  5. ER_DIAGRAM.md — three Mermaid ER diagrams (configuration / pipeline artefacts / behavioural roles)
  6. delta_vs_existing.md — relation to the existing in-package domain model

At a glance

The commodity_hindcast pipeline produces walk-forward yield predictions for agricultural commodities (corn, soybean, wheat, cotton, brazil_soybean) at three admin levels (ADM0 / ADM1 / ADM2). Every run is parameterised by a frozen ExperimentConfig, the root aggregate validated atomically at load time, and materialises its artefacts under a timestamped run_dir. The sole hand-off contract between pipeline stages is the filesystem: ExperimentResult is a lazy handle to that directory tree, and no in-memory objects cross stage boundaries.

The two modes share one artefact tree and one delivery schema (HindcastDelivery / DeliveryRow). Hindcast runs walk-forward CV to produce the audit-grade client time series; forecast makes a single init_date prediction reusing the production model.

The package has 11 bounded contexts. The one real layering violation currently tracked is delivery/conversions.py importing conformal helpers from stages/run_meta_models.py rather than from lib/.

Value objects such as Commodity, SeasonYear, InitDate, Fold, and RunDir are not Python classes (they are string/int/path values and NewType aliases), so the auto-generated LinkML schema omits them; this knowledge-base model names them explicitly. The five behavioural roles (FeatureBuilder, Detrender, Regressor, MetaModel, ReferenceYieldLoader) are Protocols and ABCs that the LinkML generator similarly skips.
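The points about frozen configuration, value objects, and behavioural roles can be illustrated with a minimal Python sketch. It is not the package's code: the fields on ExperimentConfigSketch, the validation rules, and the build() signature on the Protocol are assumptions made up for illustration. It only shows why a generator that walks concrete classes would find the config class but nothing for a NewType alias or a structurally typed role.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import NewType, Protocol, runtime_checkable

# Value objects as aliases: at runtime these are plain int/str/Path values,
# so a tool that enumerates classes has nothing to discover for them.
SeasonYear = NewType("SeasonYear", int)
Commodity = NewType("Commodity", str)
RunDir = NewType("RunDir", Path)


@runtime_checkable
class FeatureBuilder(Protocol):
    """Behavioural role; the build() signature here is hypothetical."""

    def build(self, season_year: SeasonYear) -> dict[str, float]: ...


@dataclass(frozen=True)
class ExperimentConfigSketch:
    """Hypothetical stand-in for a frozen root config (fields are assumed)."""

    commodity: Commodity
    test_years: tuple[SeasonYear, ...]

    def __post_init__(self) -> None:
        # Validate the whole object at construction time, so an invalid
        # config never exists in a half-built state.
        if not self.commodity:
            raise ValueError("commodity must not be empty")
        if not self.test_years:
            raise ValueError("test_years must not be empty")
        if len(set(self.test_years)) != len(self.test_years):
            raise ValueError("test_years must not contain duplicates")


class ConstantBuilder:
    """Toy builder; it satisfies FeatureBuilder without inheriting from it."""

    def build(self, season_year: SeasonYear) -> dict[str, float]:
        return {"season_year": float(season_year)}
```

Because the dataclass is frozen, any attempt to reassign a field after construction raises an error, which is one simple way to get the "validated once, then immutable" behaviour the text describes.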

Quick-reference tables

Aggregate roots

| Aggregate root | Owned children | Persistence location |
| --- | --- | --- |
| ExperimentConfig | CommodityConfig, ModelConfig, ExperimentProtocolConfig, PostprocessConfig (→ BiasCorrectorConfig), DeliveryConfig, EvaluationConfig, ForecastConfig (optional), list[ReferenceYieldSpec], Builder union, SeasonWindow list, EditRuleConfig list | <run_dir>/config_resolved.yaml |
| ExperimentResult | tuple[HindcastSlice, ...], tuple[ForecastSlice, ...] | Filesystem directory <run_dir_base>/<timestamp>_<experiment_name>/ |
| HindcastSlice | detrender.pkl, model.*, feature_fill_values.parquet, train_preds.parquet, walk_forward_preds.parquet, year_data.parquet, optional bias_corrector.pkl | <run_dir>/models/{commodity}/{fold_label}/ and <run_dir>/preds/{commodity}/{fold_label}/ |
| ForecastSlice | indices.zarr, features/pred.parquet, walk_forward_preds.parquet, year_data.parquet, postprocessed/national.parquet, delivery/*.csv | <run_dir>/forecast/{season_year}/{init_date}/ |
| HindcastDelivery | list[DeliveryRow], generated_date | delivery/Treefera_{commodity}_{ADM}_Hindcast_{YYYYMMDD}.csv (one per ADM level) |
| ExperimentProtocolConfig + Fold schedule | cv_strategy, test_years, thresholds | Embedded in config_resolved.yaml |
| Check list | list[Check] (per preflight call) | Not persisted; ephemeral gate output |
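The persistence locations in this table can be read as a small path scheme. The sketch below is a hypothetical helper, not the package's ExperimentResult: the class name RunLayout, its method names, and the assumption that init_date appears in the path as a plain string are all illustrative. It shows how a handle can stay lazy by listing directories only when asked.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunLayout:
    """Hypothetical helper mirroring the persistence locations in the table."""

    run_dir: Path

    def config_path(self) -> Path:
        return self.run_dir / "config_resolved.yaml"

    def hindcast_model_dir(self, commodity: str, fold_label: str) -> Path:
        return self.run_dir / "models" / commodity / fold_label

    def hindcast_preds_dir(self, commodity: str, fold_label: str) -> Path:
        return self.run_dir / "preds" / commodity / fold_label

    def forecast_dir(self, season_year: int, init_date: str) -> Path:
        # Assumption: init_date is already formatted the way the pipeline
        # writes it; the exact format is not specified here.
        return self.run_dir / "forecast" / str(season_year) / init_date

    def hindcast_folds(self, commodity: str) -> list[str]:
        """Discover fold labels on demand by listing the models directory."""
        root = self.run_dir / "models" / commodity
        if not root.is_dir():
            return []
        return sorted(p.name for p in root.iterdir() if p.is_dir())


def delivery_filename(commodity: str, adm: str, yyyymmdd: str) -> str:
    # Filename pattern copied from the HindcastDelivery row of the table.
    return f"Treefera_{commodity}_{adm}_Hindcast_{yyyymmdd}.csv"
```

Nothing is read from disk until hindcast_folds() is called, which is the sense in which a directory handle can be "lazy": constructing it is free, and the filesystem remains the only source of truth.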

Bounded contexts

| Context | Subpackage(s) | Public surface |
| --- | --- | --- |
| 1. Configuration & Orchestration | config.py, cli.py, configs/*.yaml, lib/path_utils.py, lib/calendar.py | ExperimentConfig, CommodityConfig, cli Click group |
| 2. Preflight | run/preflight.py | Check, run_preflight(), preflight_paths_for_* |
| 3. Feature Engineering | features/, lib/edit_and_imputation/, lib/calendar.py | build_features(), Builder protocol, assemble() |
| 4. Experiment & Modelling | run/, stages/run_fit.py, stages/run_hindcast.py, stages/run_predict.py, models/detrend/, models/regression/, lib/results/ | ExperimentResult, HindcastSlice, AbstractSlice, train(), run() |
| 5. Post-processing | stages/run_meta_models.py, models/meta_models/ | postprocess_experiment(), AbstractBiasCorrector, conformal half-width helpers |
| 6. Evaluation & Diagnostics | stages/run_diagnostics.py, diagnostics/ | evaluate_experiment(), PlotGroup, PlotSpec, PLOT_REGISTRY, gen_metrics |
| 7. Delivery | delivery/, stages/run_deliver.py | HindcastDelivery, DeliveryRow, deliver_experiment(), walk_forward_preds_to_delivery_rows() |
| 8. Forecast | stages/run_forecast.py, features/forecast_weather.py, features/forecast_long_range_stub.py | ForecastSlice, run(), run_features(), run_predict(), materialise_forecast_indices() |
| 9. Experiment Tracking | lib/tracking/ | MLflow run helpers, metadata_<stage>.yaml side-channel |
| 10a. Reference Data | lib/reference_data/ | load_nass_panel(), load_wasde(), load_conab(), BaseReferenceYieldLoader |
| 10b. Geo & Identifiers | lib/geo/, delivery/geo_normalise.py | GeoIdentifier NewType, make_geo_identifier(), area_weighted_mean(), select_included_geos() |
| 11. Dashboard | app/ | app.py Streamlit entry point, run_loader.py |

Top-tier entities

| Entity | Kind | Source-of-truth file path |
| --- | --- | --- |
| Commodity | Value (identity string on CommodityConfig) | market_insights_models/src/commodity_hindcast/config.py:271 |
| SeasonYear | Value (int field) | market_insights_models/src/commodity_hindcast/config.py:285 |
| InitDate | Value (date field) | market_insights_models/src/commodity_hindcast/config.py:668 |
| Region | Value (GeoIdentifier NewType) | market_insights_models/src/commodity_hindcast/lib/geo/identifiers.py |
| Yield | Value (float columns) | market_insights_models/src/commodity_hindcast/delivery/schemas.py:109 |
| Fold | Value (string key fold_label) | market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:151 |
| ExperimentConfig | Aggregate root | market_insights_models/src/commodity_hindcast/config.py:573 |
| CommodityConfig | Aggregate child (config) | market_insights_models/src/commodity_hindcast/config.py:271 |
| ModelConfig | Aggregate child (config) | market_insights_models/src/commodity_hindcast/config.py:464 |
| FeatureBuilderConfig | Aggregate child (config, discriminated union) | market_insights_models/src/commodity_hindcast/config.py:165 |
| ReferenceYieldSpec | Aggregate child (config, discriminated union) | market_insights_models/src/commodity_hindcast/lib/reference_data/loader.py:59 |
| BiasCorrectorConfig | Aggregate child (config) | market_insights_models/src/commodity_hindcast/config.py:501 |
| ConformalConfig | Aggregate child (config tuple) | market_insights_models/src/commodity_hindcast/config.py:532 |
| EditRuleConfig | Aggregate child (config, discriminated union) | market_insights_models/src/commodity_hindcast/lib/edit_and_imputation/edit.py:361 |
| ForecastConfig | Aggregate child (optional config) | market_insights_models/src/commodity_hindcast/config.py:552 |
| RunDir | Value (Path field on ExperimentResult) | market_insights_models/src/commodity_hindcast/lib/results/run_result.py:40 |
| ExperimentResult | Aggregate root | market_insights_models/src/commodity_hindcast/lib/results/run_result.py:31 |
| HindcastSlice | Aggregate child (artefact) | market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:112 |
| ForecastSlice | Aggregate child (artefact) | market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:299 |
| CalibrationResult | Aggregate child (artefact) | market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:111 |
| FeatureBuilder | Protocol (behavioural role) | market_insights_models/src/commodity_hindcast/features/builders/interface.py:25 |
| Detrender | ABC (behavioural role) | market_insights_models/src/commodity_hindcast/models/detrend/base.py:21 |
| Regressor | ABC (behavioural role) | market_insights_models/src/commodity_hindcast/models/regression/base.py:9 |
| MetaModel: BiasCorrector | ABC (behavioural role) | market_insights_models/src/commodity_hindcast/config.py:501 |
| MetaModel: Conformaliser | Module-level function (behavioural role) | market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:1 |
| ReferenceYieldLoader | ABC (behavioural role) | market_insights_models/src/commodity_hindcast/lib/reference_data/loader.py:68 |
| HindcastDelivery | Aggregate root (delivery) | market_insights_models/src/commodity_hindcast/delivery/schemas.py:227 |
| DeliveryRow | Aggregate child (delivery) | market_insights_models/src/commodity_hindcast/delivery/schemas.py:109 |
| Check | Value object (preflight) | market_insights_models/src/commodity_hindcast/run/preflight.py:20 |
| FoldSchedule | Value object (dashboard, out of core scope) | market_insights_models/src/commodity_hindcast/app/_dashboard_config.py:198 |

How to use this domain model

As a reader new to commodity_hindcast — start with this README, then read BOUNDED_CONTEXTS.md for the big-picture map of the 11 contexts and the Mermaid context map. Next, read ER_DIAGRAM.md to see the three structure views (configuration, pipeline artefacts, behavioural roles). Then dip into ENTITIES.md and RELATIONSHIPS.md as you encounter unfamiliar names in code. The delta_vs_existing.md page will tell you how this model relates to the in-package DOMAIN_MODEL.md and DOMAIN_MODEL2.md docs you may have already read.

As an LLM ingesting new sources — use ENTITIES.md as the canonical name vocabulary. When reading a new source file that introduces a class, check ENTITIES first: if the class is already catalogued, use the canonical name and tier exactly; if it is absent, propose an addition with the five-field template (Definition, Cardinality, Key attributes, Source citations, Notes). Do not invent names — every entity name in any wiki page must trace back to a source citation in ENTITIES.md.
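The check-then-propose rule above can be sketched as a small data structure. This is an illustrative assumption, not tooling that exists in the repository: the EntityEntry class, the canonical_or_proposal() function, and the in-memory dict standing in for ENTITIES.md are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityEntry:
    """One catalogue entry using the five-field template named above."""

    name: str
    definition: str
    cardinality: str
    key_attributes: tuple[str, ...]
    source_citations: tuple[str, ...]
    notes: str = ""

    def __post_init__(self) -> None:
        # Every entity name must trace back to at least one source citation.
        if not self.source_citations:
            raise ValueError(f"{self.name}: at least one source citation is required")


def canonical_or_proposal(
    class_name: str,
    catalogue: dict[str, EntityEntry],
    proposal: Optional[EntityEntry] = None,
) -> tuple[str, EntityEntry]:
    """Return ("catalogued", entry) if known, else ("proposed", proposal)."""
    if class_name in catalogue:
        return "catalogued", catalogue[class_name]
    if proposal is None or proposal.name != class_name:
        # Refuse to invent a name: an unknown class needs a full proposal.
        raise LookupError(f"{class_name} is not catalogued and no proposal was given")
    return "proposed", proposal
```

The design choice is that an uncatalogued name can never pass silently: it either arrives with a complete, cited proposal or the lookup fails.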

As a maintainer making structural changes — consult AGGREGATES.md first: does your change cross a consistency boundary (e.g. adding a cross-field validator that spans two aggregates)? Then consult BOUNDED_CONTEXTS.md: does your change introduce coupling across a context boundary, or add a new import edge that would violate the single-direction import DAG rule? The open questions section of BOUNDED_CONTEXTS lists the known boundary ambiguities (conformal helpers in stages/ vs lib/, marketing_year vs season_year, included_geo_identifiers ownership) where changes are most likely to touch a seam.
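A new import edge can be checked mechanically before review. The sketch below uses Python's standard ast module to list imports in a source string and flag those that cross a forbidden edge. The FORBIDDEN_EDGES table and the function names are hypothetical, the single rule in it encodes only the delivery-to-stages example mentioned in this README, and the code assumes absolute imports rooted at commodity_hindcast (relative imports are skipped).

```python
import ast

# Forbidden (importing context, imported package prefix) pairs. Only the one
# edge discussed in this README is listed; a real rule set is an assumption.
FORBIDDEN_EDGES = {("delivery", "commodity_hindcast.stages")}


def imported_modules(source: str) -> set[str]:
    """Return absolute module names imported by a piece of Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative ones are ignored.
            modules.add(node.module)
    return modules


def violations(context: str, source: str) -> list[str]:
    """List imports in `source` that cross a forbidden edge for `context`."""
    found = []
    for module in sorted(imported_modules(source)):
        for ctx, prefix in FORBIDDEN_EDGES:
            if ctx == context and (module == prefix or module.startswith(prefix + ".")):
                found.append(module)
    return found
```

Run against the text of a file in delivery/, a non-empty result would point at exactly the kind of upward edge the import DAG rule is meant to prevent.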

Open questions

The following open questions were flagged by the actors writing AGGREGATES.md and BOUNDED_CONTEXTS.md.

  • marketing_year vs season_year: WASDE reports on a marketing year (for example Sep–Aug for US corn and soybeans, Jun–May for US wheat) that is not fully collapsed into season_year. The Reference Data → Post-processing seam carries an implicit translation that is currently ad hoc; formalising it as an explicit value object would remove the ambiguity.
  • Conformal helpers in stages/ vs lib/: delivery/conversions.py imports conformal half-width computation from stages/run_meta_models.py, creating an upward edge from the Delivery context into the Experiment orchestration layer. The correct home is lib/ (e.g. a new lib/conformal/). Until moved, this is the single real layering violation in the package.
  • included_geo_identifiers ownership: computed in the Experiment context (FIT stage), persisted to run_dir, and consumed by Post-processing, Evaluation, and Delivery. It travels as a required kwarg rather than being encapsulated as a first-class property on ExperimentResult.
  • Forecast vs Hindcast static mode boundary: mode is determined statically by ExperimentConfig.forecast being set. If a single workflow ever needs both hindcast and forecast artefacts in one pass beyond run all, the static config split would need rethinking.
  • Dashboard as diagnostic vs operational tool: if app/ ever needs to trigger pipeline stages (e.g. re-deliver from the UI), it would cross from a leaf context into the Experiment orchestration layer, violating the current import DAG rule.
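For the first open question, an explicit value object might look like the sketch below. Everything in it is hypothetical: the MarketingYear class, its fields, and in particular the offset used by to_season_year(), since this README does not define how a marketing year should map onto season_year. It is meant only to show what "formalising the translation" could mean.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MarketingYear:
    """Hypothetical value object for a 12-month marketing year.

    start_year is the calendar year in which the marketing year begins and
    start_month is its first month; both are supplied by the caller.
    """

    start_year: int
    start_month: int

    def __post_init__(self) -> None:
        if not 1 <= self.start_month <= 12:
            raise ValueError("start_month must be between 1 and 12")

    @property
    def start(self) -> date:
        return date(self.start_year, self.start_month, 1)

    @property
    def end_exclusive(self) -> date:
        # First day of the following marketing year.
        return date(self.start_year + 1, self.start_month, 1)

    def contains(self, day: date) -> bool:
        return self.start <= day < self.end_exclusive

    def to_season_year(self, offset: int = 0) -> int:
        """Explicit translation; the offset convention is an assumption."""
        return self.start_year + offset

    @classmethod
    def containing(cls, day: date, start_month: int) -> "MarketingYear":
        """Build the marketing year that contains the given day."""
        start_year = day.year if day.month >= start_month else day.year - 1
        return cls(start_year, start_month)
```

With the translation in one named method, the seam between Reference Data and Post-processing would carry a typed value instead of a bare integer whose meaning depends on the caller.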