Domain Model — commodity_hindcast
This directory holds the formal entity-relationship model of the commodity_hindcast domain. It complements (does not replace) the auto-generated LinkML schema living inside the package at market_insights_models/src/commodity_hindcast/domain-modelling/. See delta_vs_existing.md for what is new here vs. what is already there.
Reading order
- ENTITIES.md — canonical entity catalogue (~30 entities across 5 tiers)
- RELATIONSHIPS.md — inter-entity relationships (~73 relationships across 9 sections)
- AGGREGATES.md — DDD aggregates and consistency boundaries (7 aggregates)
- BOUNDED_CONTEXTS.md — DDD bounded contexts (11 contexts) + Mermaid context map
- ER_DIAGRAM.md — three Mermaid ER diagrams (configuration / pipeline artefacts / behavioural roles)
- delta_vs_existing.md — relation to the existing in-package domain model
At a glance
The commodity_hindcast pipeline produces walk-forward yield predictions for agricultural commodities (corn, soybean, wheat, cotton, brazil_soybean) at three admin levels (ADM0 / ADM1 / ADM2). Every run is parameterised by a frozen ExperimentConfig, the root aggregate, which is validated atomically at load time. The run then materialises its artefacts under a timestamped run_dir. The filesystem is the sole hand-off contract between pipeline stages: ExperimentResult is a lazy handle to that directory tree, and no in-memory objects cross stage boundaries.

The two modes share one artefact tree and one delivery schema (HindcastDelivery / DeliveryRow). Hindcast runs walk-forward CV to produce the audit-grade client time series. Forecast makes a single init_date prediction that reuses the production model.

The package has 11 bounded contexts (context 10 is split into 10a Reference Data and 10b Geo & Identifiers). The one real layering violation currently tracked is delivery/conversions.py importing conformal helpers from stages/run_meta_models.py rather than from lib/.

Value objects such as Commodity, SeasonYear, InitDate, Fold, and RunDir are not Python classes. They are plain string, int, date, or path values, or NewType aliases (Region is the GeoIdentifier NewType). The auto-generated LinkML schema therefore omits them, whereas this model names them explicitly. The LinkML generator likewise skips the five behavioural roles (FeatureBuilder, Detrender, Regressor, MetaModel, ReferenceYieldLoader). These are Protocols and ABCs, plus one module-level function for the MetaModel's Conformaliser.
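A minimal sketch of why such value objects are invisible to a class-based schema generator. typing.NewType creates no runtime class: calling an alias simply returns its argument unchanged, so there is nothing for class introspection to find. Only GeoIdentifier is documented as a NewType in the package. The other aliases and the slice_key helper are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path
from typing import NewType

# Hypothetical aliases. In the package only GeoIdentifier is a NewType;
# the rest are plain str/int/date/Path fields, aliased here for illustration.
Commodity = NewType("Commodity", str)
SeasonYear = NewType("SeasonYear", int)
InitDate = NewType("InitDate", date)
FoldLabel = NewType("FoldLabel", str)
RunDir = NewType("RunDir", Path)
GeoIdentifier = NewType("GeoIdentifier", str)


def slice_key(commodity: Commodity, fold: FoldLabel, geo: GeoIdentifier) -> tuple[str, str, str]:
    # A static type checker would reject slice_key(fold, commodity, geo),
    # but at runtime all three arguments are ordinary strings.
    return (commodity, fold, geo)
```

The aliases give static checkers something to enforce while costing nothing at runtime. This is also why they never appear as classes in a generated schema.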
Quick-reference tables
Aggregate roots¶
| Aggregate root | Owned children | Persistence location |
|---|---|---|
| ExperimentConfig | CommodityConfig, ModelConfig, ExperimentProtocolConfig, PostprocessConfig (→ BiasCorrectorConfig), DeliveryConfig, EvaluationConfig, ForecastConfig (optional), list[ReferenceYieldSpec], Builder union, SeasonWindow list, EditRuleConfig list | `<run_dir>/config_resolved.yaml` |
| ExperimentResult | tuple[HindcastSlice, ...], tuple[ForecastSlice, ...] | Filesystem directory `<run_dir_base>/<timestamp>_<experiment_name>/` |
| HindcastSlice | detrender.pkl, model.*, feature_fill_values.parquet, train_preds.parquet, walk_forward_preds.parquet, year_data.parquet, optional bias_corrector.pkl | `<run_dir>/models/{commodity}/{fold_label}/` and `<run_dir>/preds/{commodity}/{fold_label}/` |
| ForecastSlice | indices.zarr, features/pred.parquet, walk_forward_preds.parquet, year_data.parquet, postprocessed/national.parquet, delivery/*.csv | `<run_dir>/forecast/{season_year}/{init_date}/` |
| HindcastDelivery | list[DeliveryRow], generated_date | `delivery/Treefera_{commodity}_{ADM}_Hindcast_{YYYYMMDD}.csv` (one per ADM level) |
| ExperimentProtocolConfig + Fold schedule | cv_strategy, test_years, thresholds | Embedded in `config_resolved.yaml` |
| Check list | list[Check] (per preflight call) | Not persisted; ephemeral gate output |
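The persistence locations above can be sketched as a lazy handle that stores only the run directory and lists the tree on demand, in the spirit of how ExperimentResult is described. Everything in this sketch is an assumption rather than the package's actual path logic, which may live in lib/path_utils.py. That includes the helper names, RunDirHandle, the timestamp format, and the ISO-date naming of init_date directories.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime
from pathlib import Path


def make_run_dir(run_dir_base: Path, experiment_name: str, started: datetime) -> Path:
    """<run_dir_base>/<timestamp>_<experiment_name>/ (timestamp format is assumed)."""
    return run_dir_base / f"{started:%Y%m%d_%H%M%S}_{experiment_name}"


def delivery_csv_name(commodity: str, adm_level: str, generated: date) -> str:
    """Treefera_{commodity}_{ADM}_Hindcast_{YYYYMMDD}.csv, one file per ADM level."""
    return f"Treefera_{commodity}_{adm_level}_Hindcast_{generated:%Y%m%d}.csv"


@dataclass(frozen=True)
class RunDirHandle:
    """Hypothetical lazy handle: holds only a path and reads the tree when asked."""

    run_dir: Path

    def model_dir(self, commodity: str, fold_label: str) -> Path:
        return self.run_dir / "models" / commodity / fold_label

    def preds_dir(self, commodity: str, fold_label: str) -> Path:
        return self.run_dir / "preds" / commodity / fold_label

    def forecast_dir(self, season_year: int, init_date: date) -> Path:
        # ISO formatting of init_date in the directory name is an assumption.
        return self.run_dir / "forecast" / str(season_year) / init_date.isoformat()

    def fold_labels(self, commodity: str) -> tuple[str, ...]:
        # Nothing is cached: each call re-lists the directory, so a later
        # stage sees whatever earlier stages have written to disk.
        root = self.run_dir / "models" / commodity
        if not root.is_dir():
            return ()
        return tuple(sorted(p.name for p in root.iterdir() if p.is_dir()))

    def forecast_init_dates(self, season_year: int) -> tuple[date, ...]:
        root = self.run_dir / "forecast" / str(season_year)
        if not root.is_dir():
            return ()
        return tuple(sorted(date.fromisoformat(p.name) for p in root.iterdir() if p.is_dir()))
```

Because the handle holds nothing but a path, two stages running in separate processes can each construct one from the same run_dir. This matches the rule that only the filesystem crosses stage boundaries.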
Bounded contexts
| Context | Subpackage(s) | Public surface |
|---|---|---|
| 1. Configuration & Orchestration | config.py, cli.py, configs/*.yaml, lib/path_utils.py, lib/calendar.py | ExperimentConfig, CommodityConfig, cli Click group |
| 2. Preflight | run/preflight.py | Check, run_preflight(), preflight_paths_for_* |
| 3. Feature Engineering | features/, lib/edit_and_imputation/, lib/calendar.py | build_features(), Builder protocol, assemble() |
| 4. Experiment & Modelling | run/, stages/run_fit.py, stages/run_hindcast.py, stages/run_predict.py, models/detrend/, models/regression/, lib/results/ | ExperimentResult, HindcastSlice, AbstractSlice, train(), run() |
| 5. Post-processing | stages/run_meta_models.py, models/meta_models/ | postprocess_experiment(), AbstractBiasCorrector, conformal half-width helpers |
| 6. Evaluation & Diagnostics | stages/run_diagnostics.py, diagnostics/ | evaluate_experiment(), PlotGroup, PlotSpec, PLOT_REGISTRY, gen_metrics |
| 7. Delivery | delivery/, stages/run_deliver.py | HindcastDelivery, DeliveryRow, deliver_experiment(), walk_forward_preds_to_delivery_rows() |
| 8. Forecast | stages/run_forecast.py, features/forecast_weather.py, features/forecast_long_range_stub.py | ForecastSlice, run(), run_features(), run_predict(), materialise_forecast_indices() |
| 9. Experiment Tracking | lib/tracking/ | MLflow run helpers, `metadata_<stage>.yaml` side-channel |
| 10a. Reference Data | lib/reference_data/ | load_nass_panel(), load_wasde(), load_conab(), BaseReferenceYieldLoader |
| 10b. Geo & Identifiers | lib/geo/, delivery/geo_normalise.py | GeoIdentifier NewType, make_geo_identifier(), area_weighted_mean(), select_included_geos() |
| 11. Dashboard | app/ | app.py Streamlit entry point, run_loader.py |
Top-tier entities
| Entity | Kind | Source-of-truth file path |
|---|---|---|
| Commodity | Value (identity string on CommodityConfig) | market_insights_models/src/commodity_hindcast/config.py:271 |
| SeasonYear | Value (int field) | market_insights_models/src/commodity_hindcast/config.py:285 |
| InitDate | Value (date field) | market_insights_models/src/commodity_hindcast/config.py:668 |
| Region | Value (GeoIdentifier NewType) | market_insights_models/src/commodity_hindcast/lib/geo/identifiers.py |
| Yield | Value (float columns) | market_insights_models/src/commodity_hindcast/delivery/schemas.py:109 |
| Fold | Value (string key fold_label) | market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:151 |
| ExperimentConfig | Aggregate root | market_insights_models/src/commodity_hindcast/config.py:573 |
| CommodityConfig | Aggregate child (config) | market_insights_models/src/commodity_hindcast/config.py:271 |
| ModelConfig | Aggregate child (config) | market_insights_models/src/commodity_hindcast/config.py:464 |
| FeatureBuilderConfig | Aggregate child (config, discriminated union) | market_insights_models/src/commodity_hindcast/config.py:165 |
| ReferenceYieldSpec | Aggregate child (config, discriminated union) | market_insights_models/src/commodity_hindcast/lib/reference_data/loader.py:59 |
| BiasCorrectorConfig | Aggregate child (config) | market_insights_models/src/commodity_hindcast/config.py:501 |
| ConformalConfig | Aggregate child (config tuple) | market_insights_models/src/commodity_hindcast/config.py:532 |
| EditRuleConfig | Aggregate child (config, discriminated union) | market_insights_models/src/commodity_hindcast/lib/edit_and_imputation/edit.py:361 |
| ForecastConfig | Aggregate child (optional config) | market_insights_models/src/commodity_hindcast/config.py:552 |
| RunDir | Value (Path field on ExperimentResult) | market_insights_models/src/commodity_hindcast/lib/results/run_result.py:40 |
| ExperimentResult | Aggregate root | market_insights_models/src/commodity_hindcast/lib/results/run_result.py:31 |
| HindcastSlice | Aggregate child (artefact) | market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:112 |
| ForecastSlice | Aggregate child (artefact) | market_insights_models/src/commodity_hindcast/lib/results/results_slice.py:299 |
| CalibrationResult | Aggregate child (artefact) | market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:111 |
| FeatureBuilder | Protocol (behavioural role) | market_insights_models/src/commodity_hindcast/features/builders/interface.py:25 |
| Detrender | ABC (behavioural role) | market_insights_models/src/commodity_hindcast/models/detrend/base.py:21 |
| Regressor | ABC (behavioural role) | market_insights_models/src/commodity_hindcast/models/regression/base.py:9 |
| MetaModel (BiasCorrector) | ABC (behavioural role) | market_insights_models/src/commodity_hindcast/config.py:501 |
| MetaModel (Conformaliser) | Module-level function (behavioural role) | market_insights_models/src/commodity_hindcast/models/meta_models/conformalise.py:1 |
| ReferenceYieldLoader | ABC (behavioural role) | market_insights_models/src/commodity_hindcast/lib/reference_data/loader.py:68 |
| HindcastDelivery | Aggregate root (delivery) | market_insights_models/src/commodity_hindcast/delivery/schemas.py:227 |
| DeliveryRow | Aggregate child (delivery) | market_insights_models/src/commodity_hindcast/delivery/schemas.py:109 |
| Check | Value object (preflight) | market_insights_models/src/commodity_hindcast/run/preflight.py:20 |
| FoldSchedule | Value object (dashboard, out of core scope) | market_insights_models/src/commodity_hindcast/app/_dashboard_config.py:198 |
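The behavioural roles come in two kinds. A Protocol such as FeatureBuilder is satisfied structurally by any object with the right methods, whereas an ABC such as Detrender must be subclassed. The minimal sketch below shows the difference. Every method name and signature in it (build, fit, trend, detrend, retrend) is hypothetical, as are both concrete classes, and none of them reflect the package's actual interfaces. The linear trend is ordinary least squares.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class FeatureBuilder(Protocol):
    """Structural role: any object with a matching build() method qualifies."""

    def build(self, geo_identifier: str, season_year: int) -> dict[str, float]: ...


class Detrender(ABC):
    """Nominal role: concrete detrenders must subclass and implement fit() and trend()."""

    @abstractmethod
    def fit(self, years: list[int], yields: list[float]) -> Detrender: ...

    @abstractmethod
    def trend(self, years: list[int]) -> list[float]: ...

    def detrend(self, years: list[int], yields: list[float]) -> list[float]:
        # Residual = observed yield minus fitted trend.
        return [y - t for y, t in zip(yields, self.trend(years))]

    def retrend(self, years: list[int], residuals: list[float]) -> list[float]:
        # Inverse of detrend: add the trend back onto (predicted) residuals.
        return [r + t for r, t in zip(residuals, self.trend(years))]


class LinearDetrender(Detrender):
    """Ordinary least-squares line through (year, yield) pairs."""

    def __init__(self) -> None:
        self.intercept = 0.0
        self.slope = 0.0

    def fit(self, years: list[int], yields: list[float]) -> LinearDetrender:
        if len(set(years)) < 2:
            raise ValueError("need at least two distinct years to fit a trend")
        n = len(years)
        mean_x = sum(years) / n
        mean_y = sum(yields) / n
        sxx = sum((x - mean_x) ** 2 for x in years)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def trend(self, years: list[int]) -> list[float]:
        return [self.intercept + self.slope * x for x in years]


class SeasonYearFeature:
    """Satisfies FeatureBuilder without inheriting from it."""

    def build(self, geo_identifier: str, season_year: int) -> dict[str, float]:
        return {"season_year": float(season_year)}
```

Neither kind carries persistent fields, which is why a schema generator that models data rather than behaviour has nothing to emit for them.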
How to use this domain model
As a reader new to commodity_hindcast — start with this README, then read BOUNDED_CONTEXTS.md for the big-picture map of the 11 contexts and the Mermaid context map. Next, read ER_DIAGRAM.md to see the three structure views (configuration, pipeline artefacts, behavioural roles). Then dip into ENTITIES.md and RELATIONSHIPS.md as you encounter unfamiliar names in code. The delta_vs_existing.md page will tell you how this model relates to the in-package DOMAIN_MODEL.md and DOMAIN_MODEL2.md docs you may have already read.
As an LLM ingesting new sources — use ENTITIES.md as the canonical name vocabulary. When a new source file introduces a class, check ENTITIES.md first. If the class is already catalogued, use its canonical name and tier exactly. If it is absent, propose an addition using the five-field template (Definition, Cardinality, Key attributes, Source citations, Notes). Do not invent names: every entity name on any wiki page must trace back to a source citation in ENTITIES.md.
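The ingestion rule above can be approximated by a small pre-submission check. This is a hypothetical sketch. EntityProposal and review are illustrative names, and the citation pattern (a .py path, optionally followed by :line) is an assumption modelled on the source-of-truth column of the entity table. The five fields come from the template; Notes is allowed to be empty here, which is also an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed citation shape: a path ending in .py, optionally followed by :<line number>.
CITATION_PATTERN = re.compile(r"^[\w./-]+\.py(:\d+)?$")


@dataclass(frozen=True)
class EntityProposal:
    """A candidate ENTITIES.md entry: the name plus the five template fields."""

    name: str
    definition: str
    cardinality: str
    key_attributes: tuple[str, ...]
    source_citations: tuple[str, ...]
    notes: str = ""  # may be empty in this sketch


def review(proposal: EntityProposal, catalogue: set[str]) -> list[str]:
    """Return the reasons a proposal should not be added; an empty list means it passes."""
    problems: list[str] = []
    if proposal.name in catalogue:
        problems.append(f"{proposal.name} is already catalogued: reuse its canonical entry")
    if not proposal.definition.strip():
        problems.append("missing Definition")
    if not proposal.cardinality.strip():
        problems.append("missing Cardinality")
    if not proposal.key_attributes:
        problems.append("missing Key attributes")
    if not proposal.source_citations:
        problems.append("missing Source citations")
    for citation in proposal.source_citations:
        if not CITATION_PATTERN.match(citation):
            problems.append(f"unrecognised citation format: {citation!r}")
    return problems
```

Returning a list of reasons rather than a boolean lets the ingesting agent fix every problem in one pass before resubmitting.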
As a maintainer making structural changes — consult AGGREGATES.md first: does your change cross a consistency boundary (e.g. adding a cross-field validator that spans two aggregates)? Then consult BOUNDED_CONTEXTS.md: does your change couple two contexts, or add an import edge that would violate the single-direction import DAG rule? The open-questions section of BOUNDED_CONTEXTS.md lists the known boundary ambiguities (conformal helpers in stages/ vs lib/, marketing_year vs season_year, included_geo_identifiers ownership). These are the seams a change is most likely to touch.
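One way to mechanise the import-DAG question is to parse each module with Python's ast module and flag any edge into a higher layer. The sketch below is hypothetical. LAYER_RANK is an illustrative ranking of subpackages, not the rule set recorded in BOUNDED_CONTEXTS.md. It treats app/ as a leaf that must not reach into run/ or stages/. Relative imports are ignored for brevity.

```python
from __future__ import annotations

import ast

# Hypothetical layer ranks (lower = more foundational). Illustrative only.
LAYER_RANK = {
    "lib": 0,
    "models": 1,
    "features": 1,
    "delivery": 1,
    "diagnostics": 1,
    "app": 1,  # leaf context: may read lib/, must not import run/ or stages/
    "run": 2,
    "stages": 2,
    "cli": 3,
}
PACKAGE = "commodity_hindcast"


def subpackage(module: str) -> str | None:
    """'commodity_hindcast.delivery.conversions' -> 'delivery'; None for other packages."""
    parts = module.split(".")
    return parts[1] if len(parts) > 1 and parts[0] == PACKAGE else None


def absolute_imports(source: str) -> list[str]:
    """Module names pulled in by absolute import statements (relative imports skipped)."""
    modules: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules.append(node.module)
    return modules


def upward_edges(module: str, source: str) -> list[tuple[str, str]]:
    """Import edges from `module` into a strictly higher-ranked subpackage."""
    src_rank = LAYER_RANK.get(subpackage(module))
    edges: list[tuple[str, str]] = []
    for target in absolute_imports(source):
        tgt_rank = LAYER_RANK.get(subpackage(target))
        if src_rank is not None and tgt_rank is not None and tgt_rank > src_rank:
            edges.append((module, target))
    return edges
```

Fed the source of delivery/conversions.py, a check like this would report the delivery-to-stages edge described under Open questions. Moving the conformal helpers into lib/ would clear it.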
Open questions
The following open questions were flagged while writing AGGREGATES.md and BOUNDED_CONTEXTS.md.
- marketing_year vs season_year: WASDE uses a marketing year (Oct–Sep for US grains) that is not fully collapsed into season_year. The Reference Data → Post-processing seam carries an implicit translation that is currently ad hoc; formalising it as an explicit value object would remove the ambiguity.
- Conformal helpers in stages/ vs lib/: delivery/conversions.py imports the conformal half-width computation from stages/run_meta_models.py, creating an upward edge from the Delivery context into the Experiment orchestration layer. The correct home is lib/ (e.g. a new lib/conformal/). Until the helpers move, this remains the single real layering violation in the package.
- included_geo_identifiers ownership: computed in the Experiment context (FIT stage), persisted to run_dir, and consumed by Post-processing, Evaluation, and Delivery. It travels as a required kwarg rather than being encapsulated as a first-class property on ExperimentResult.
- Forecast vs Hindcast static mode boundary: the mode is determined statically by whether ExperimentConfig.forecast is set. If a single workflow ever needs both hindcast and forecast artefacts in one pass, beyond what run all already provides, the static config split would need rethinking.
- Dashboard as diagnostic vs operational tool: if app/ ever needs to trigger pipeline stages (e.g. re-deliver from the UI), it would cross from a leaf context into the Experiment orchestration layer, violating the current import DAG rule.