Bounded Contexts — commodity_hindcast

Overview

A bounded context is a region of the domain within which a particular ubiquitous language and model are internally consistent. The same word can mean different things in different contexts: "yield" in feature engineering is a raw target column, while "yield" in delivery is a validated DeliveryRow.mean field in client units.

This document identifies the natural context seams in commodity_hindcast/, derived from three complementary lenses: the subpackage layout under market_insights_models/src/commodity_hindcast/, the CLI subcommand structure in cli.py, and the EARS-format stage contracts in DESIGN.md. The existing editorial domain model (domain-modelling/DOMAIN_MODEL2.md) identifies seven contexts; this document refines and extends that list to ten, splitting out Reference Data, Geo & Identifiers, and the Dashboard as first-class contexts and adding an explicit Anti-corruption layers section.

Context map

flowchart LR
  CFG[1 · Configuration]
  PFL[2 · Preflight]
  FEAT[3 · Feature Engineering]
  EXP[4 · Experiment & Modelling]
  POST[5 · Post-processing]
  EVAL[6 · Evaluation & Diagnostics]
  DELIV[7 · Delivery]
  FCST[8 · Forecast]
  TRK[9 · Tracking]
  REF[10a · Reference Data]
  GEO[10b · Geo & Identifiers]
  DASH[11 · Dashboard]

  CFG --> PFL
  CFG --> FEAT
  CFG --> EXP
  CFG --> FCST
  FEAT --> EXP
  FEAT --> FCST
  EXP --> POST
  EXP --> EVAL
  EXP --> FCST
  POST --> EVAL
  POST --> DELIV
  EXP --> DELIV
  FCST --> POST
  FCST --> DELIV
  EXP --> TRK
  FCST --> TRK
  REF --> POST
  REF --> EVAL
  REF --> DELIV
  GEO --> FEAT
  GEO --> EXP
  GEO --> POST
  GEO --> DELIV
  DASH --> EXP
  DASH --> CFG

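Dependency direction is left-to-right: an edge A --> B means that B consumes what A produces. The two DASH edges are the exception; they point from the Dashboard to the contexts it reads (EXP and CFG). Leaf contexts (DASH, TRK, PFL) consume from pipeline contexts but are never consumed by them.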

Contexts

1. Configuration & Orchestration

Subpackages: config.py, cli.py, Makefile, configs/*.yaml, lib/path_utils.py, lib/calendar.py

Ubiquitous language: ExperimentConfig, CommodityConfig, ResolvablePath, INPUT_DATA_DIR, data_root, experiment_name, run_dir_base, commodity, season_start, MonthDay, SeasonWindow, feature_cols, ci_levels, ForecastConfig, BuilderConfig, ModelConfig, ExperimentProtocolConfig

Public surface: ExperimentConfig (pydantic-settings root, config.py:547); CommodityConfig (config.py:284); the cli Click group entry point (cli.py).

Internal vs published model: The config classes are pure data — no build_* factories, no I/O. Factories live in sibling build.py files beside the types they construct (DESIGN.md line 13). config.py is the DAG root; every other context imports from it, never the reverse.

Boundary contracts:

- Input: INPUT_DATA_DIR env var; per-commodity configs/<commodity>_experiment.yaml YAML.
- Output: a frozen ExperimentConfig instance consumed by every downstream context.
- ResolvablePath fields are resolved against data_root at load time; the require_input_data_dir() helper (lib/path_utils.py) is the sole env-var reader.

Notes: The single-direction import rule (DESIGN.md line 49) makes this context the root of the entire import DAG. No other module may create a back-edge into config.py. cli.py is a leaf that reads from this context but is never imported by it.
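The load-time resolution rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: SketchConfig, resolve() and load_config() are hypothetical names, a plain frozen dataclass stands in for the pydantic-settings ExperimentConfig, and leaving absolute paths untouched is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def require_input_data_dir() -> Path:
    """Sole reader of the INPUT_DATA_DIR env var (body is an assumption)."""
    value = os.environ.get("INPUT_DATA_DIR")
    if not value:
        raise SystemExit("INPUT_DATA_DIR is not set")
    return Path(value)


def resolve(path: str, data_root: Path) -> Path:
    """Resolve a relative path against data_root; absolute paths pass through."""
    candidate = Path(path)
    return candidate if candidate.is_absolute() else data_root / candidate


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for the frozen ExperimentConfig."""

    data_root: Path
    features_dir: Path


def load_config(raw: dict[str, str]) -> SketchConfig:
    """Build the frozen config once; downstream code never reads the env var."""
    data_root = require_input_data_dir()
    return SketchConfig(
        data_root=data_root,
        features_dir=resolve(raw["features_dir"], data_root),
    )
```

Because the dataclass is frozen, a downstream context that tries to mutate a field gets an exception instead of silently diverging from the resolved configuration.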

2. Preflight

Subpackages: run/preflight.py

Ubiquitous language: Check, run_preflight(), passed, critical, message, preflight_paths_for_<stage>, has_X existence flag, _iter_resolvable_fields

Public surface: Check dataclass (run/preflight.py:20); run_preflight(checks) -> None (run/preflight.py:42); per-stage preflight_paths_for_* functions.

Internal vs published model: Check is a value object (no identity, immutable). run_preflight is a pure gate — it either exits or returns. No persistent state.

Boundary contracts:

- Input: list of Check objects built by the calling stage from _iter_resolvable_fields(config).
- Output: none on success; SystemExit on any critical failure.
- Every stage entry point calls preflight before any compute (DESIGN.md line 69).

Notes: Deliberately kept as a cross-cutting slice. Its only dependency is config.py (for ResolvablePath field traversal). The DESIGN.md rule (line 69) forbids deferring existence validation to I/O helpers — preflight is the only place FileNotFoundError is acceptable.
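The gate behaviour described above can be sketched in a few lines. The field names come from the ubiquitous language; the exit code, the stderr reporting, and the treatment of non-critical failures as warnings are assumptions rather than the actual run/preflight.py behaviour.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """Immutable value object describing one preflight result."""

    passed: bool
    critical: bool
    message: str


def run_preflight(checks: list[Check]) -> None:
    """Report every failed check, then exit if any failure is critical."""
    failures = [c for c in checks if not c.passed]
    for check in failures:
        level = "CRITICAL" if check.critical else "WARNING"
        print(f"[{level}] {check.message}", file=sys.stderr)
    if any(c.critical for c in failures):
        raise SystemExit(1)
```

Reporting all failures before exiting means one run surfaces every missing input rather than only the first.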

3. Feature Engineering

Subpackages: features/ (run.py, assemble.py, builders/, forecast_weather.py), lib/edit_and_imputation/, lib/calendar.py

Ubiquitous language: Builder, FeatureFactory, fit.parquet, pred.parquet, metadata.json, assemble(), EditRule, EditReport, gstd (growing-season-to-date), init_date, season_doy, harvest_season_doy, index_cols, feature_cols, target_col, builders_dir, required_for_pred_parquet

Public surface: build_features(cfg, force) (features/run.py:33); Builder protocol with interface.py and registry.py (features/builders/interface.py, features/builders/registry.py); assemble() (features/assemble.py:171).

Internal vs published model: Individual builder modules (yields.py, weather.py, climo.py, ndvi.py, stress.py) are internal. The public contract is the parquet schema: rows keyed by (year, geo_identifier, init_date); metadata.json sidecar declaring column roles.

Boundary contracts:

- Input: per-commodity source files (NASS parquet, ERA5 zarr, climo zarr, NDVI, stress) at paths declared in CommodityConfig.builders[*].filepath.
- Output: features/{commodity}/fit.parquet, features/{commodity}/pred.parquet, features/{commodity}/metadata.json (README.md pipeline diagram).
- The assemble() step writes INDEX_COLS first, then target_col, then feature columns; row alignment is guaranteed by construction (DESIGN.md line 82).

Notes: The EditRule system (lib/edit_and_imputation/edit.py) is an anti-corruption layer within this context — it translates raw external data into the internal clean panel before pivoting. Each Builder applies its own edits: list[EditRuleConfig] in YAML order. The gstd prefix convention (DESIGN.md line 45) is enforced here, not downstream.
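The column-order contract of assemble() and the role-declaring sidecar can be sketched as below. The index columns follow the row key given above; the function names, the duplicate-role check, and the JSON key names are hypothetical, since this document does not spell out the metadata.json schema.

```python
import json

# Index columns follow the row key (year, geo_identifier, init_date).
INDEX_COLS = ["year", "geo_identifier", "init_date"]


def ordered_columns(target_col: str, feature_cols: list[str]) -> list[str]:
    """Return the output column order: index, then target, then features."""
    ordered = [*INDEX_COLS, target_col, *feature_cols]
    if len(set(ordered)) != len(ordered):
        raise ValueError("a column appears in more than one role")
    return ordered


def metadata_sidecar(target_col: str, feature_cols: list[str]) -> str:
    """Serialise column roles; the JSON key names are hypothetical."""
    roles = {
        "index_cols": INDEX_COLS,
        "target_col": target_col,
        "feature_cols": feature_cols,
    }
    return json.dumps(roles, indent=2)
```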

4. Experiment & Modelling

Subpackages: run/ (runner.py, experiment_protocol.py), stages/run_fit.py, stages/run_hindcast.py, stages/run_predict.py, models/detrend/, models/regression/, lib/results/

Ubiquitous language: ExperimentResult, HindcastSlice, AbstractSlice, fold_label, cutoff, ExpandingFoldGenerator, train_preds.parquet, walk_forward_preds.parquet, year_data.parquet, detrender, regressor, run_dir, included_geo_identifiers, production fold, sim_yield_kg_ha, TrendAxis, AbstractDetrend, AbstractRegressionImpl

Public surface: ExperimentResult (lib/results/run_result.py:32); HindcastSlice (lib/results/results_slice.py:112); AbstractSlice protocol (lib/results/results_slice.py:73); train(train_data, fold_label, config) (stages/run_fit.py); run(config_path) + fit_production(config_path) (stages/run_hindcast.py).

Internal vs published model: The run_dir is the published model — a filesystem layout consumed by downstream contexts. The in-memory ExperimentResult is a lazy handle to that layout; it carries no computed data (DESIGN.md line 99). Detrender and regressor registries (models/detrend/, models/regression/) are internal dispatch details.

Boundary contracts:

- Input: features/{commodity}/{fit,pred}.parquet from Feature Engineering.
- Output: run_dir/models/{commodity}/{fold_label}/ (detrender, regressor, fill values); run_dir/preds/{commodity}/{fold_label}/ (train_preds, walk_forward_preds, year_data).
- included_geo_identifiers persisted as run_dir/included_geo_identifiers.json and threaded as a required kwarg through the entire evaluation chain (DESIGN.md line 113).
- No stage imports another stage's internals; cross-stage reuse is via lib/ helpers only (DESIGN.md line 99).

Notes: The FIT stage is deliberately zero-metrics, zero-plots (DESIGN.md line 101). All four phases (FIT, POSTPROCESS, EVALUATE, DELIVER) are independently re-runnable against the same run_dir. The walk-forward loop fits the model once per season_year and reuses frozen coefficients across all init_dates for that year (DESIGN.md lines 91–92).
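A minimal sketch of expanding-window folds and the fit-once-per-year walk-forward loop follows. It assumes each fold trains on every year strictly before its test year; the Fold class, the fold_label format, the min_train_years parameter, and the fit/predict callables are all hypothetical and do not reproduce ExpandingFoldGenerator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fold:
    fold_label: str  # label format is an assumption
    train_years: tuple[int, ...]
    test_year: int


def expanding_folds(years: list[int], min_train_years: int) -> list[Fold]:
    """Each fold trains on every year strictly before its test year."""
    ordered = sorted(set(years))
    return [
        Fold(f"fold_{ordered[i]}", tuple(ordered[:i]), ordered[i])
        for i in range(min_train_years, len(ordered))
    ]


def walk_forward(fold, init_dates, fit, predict):
    """Fit once per season year, then reuse the frozen model for every init_date."""
    model = fit(fold.train_years)
    return {d: predict(model, fold.test_year, d) for d in init_dates}
```

Fitting outside the init_date loop is what keeps coefficients frozen across all init_dates of a season year.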

5. Post-processing

Subpackages: stages/run_meta_models.py, models/meta_models/ (bias_correction.py, conformalise.py)

Ubiquitous language: postprocess_experiment, bias_corrector, NoBiasCorrector, CoverageBiasCorrector, conformal interval, conformal half-width, coverage, selection bias correction, postprocessed/{commodity}_national.parquet, SUPPORTED_CI_LEVELS, area-weighted mean

Public surface: postprocess_experiment(run_root, included_geo_identifiers) (stages/run_meta_models.py:89); AbstractBiasCorrector protocol; compute_conformal_half_widths_from_training, compute_pooled_conformal_half_widths_from_training (consumed by Delivery context — see Anti-corruption layers below).

Internal vs published model: The bias corrector pickle and the postprocessed/{commodity}_national.parquet are published. Internal implementation of conformal calibration (residual quantile accumulation, fold sorting) is opaque to downstream contexts.
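For orientation only, the textbook split-conformal rule for a half-width from calibration residuals looks like the sketch below. This is an assumption about the general technique, not a description of what compute_conformal_half_widths_from_training does; the function name and the integer-percent level argument are hypothetical.

```python
import math


def conformal_half_width(residuals: list[float], ci_level_pct: int) -> float:
    """Split-conformal half-width: the ceil((n+1)*level)-th smallest |residual|."""
    if not residuals:
        raise ValueError("need at least one calibration residual")
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    k = math.ceil((n + 1) * ci_level_pct / 100)
    # With too few residuals the finite-sample guarantee needs an unbounded interval.
    return math.inf if k > n else scores[k - 1]
```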

Boundary contracts:

- Input: HindcastSlice.walk_forward_preds_path per fold from ExperimentResult; NASS reference panel (loaded once via lib/reference_data/nass.py).
- Output: run_dir/postprocessed/{commodity}_national.parquet; per-fold bias_corrector.pkl at models/{commodity}/{fold_label}/bias_corrector.pkl.

Notes: Aggregation from ADM2 to ADM0 happens here using area-weighted means (lib/geo/aggregation.py); unweighted averaging is explicitly forbidden (DESIGN.md line 122). CI levels must be members of SUPPORTED_CI_LEVELS and monotonically ordered (DOMAIN_MODEL2.md §6).
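The CI-level rule can be sketched as a small validator. The contents of SUPPORTED_CI_LEVELS shown here are hypothetical (this document does not list them), and reading "monotonically ordered" as strictly increasing is an assumption.

```python
# Hypothetical contents; the real SUPPORTED_CI_LEVELS values are not given here.
SUPPORTED_CI_LEVELS = (50, 80, 95)


def validate_ci_levels(ci_levels: list[int]) -> list[int]:
    """Reject unsupported levels and any ordering that is not strictly increasing."""
    unsupported = [lvl for lvl in ci_levels if lvl not in SUPPORTED_CI_LEVELS]
    if unsupported:
        raise ValueError(f"unsupported CI levels: {unsupported}")
    if any(a >= b for a, b in zip(ci_levels, ci_levels[1:])):
        raise ValueError("ci_levels must be strictly increasing")
    return ci_levels
```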

6. Evaluation & Diagnostics

Subpackages: stages/run_diagnostics.py, diagnostics/ (metrics.py, runners.py, plots/)

Ubiquitous language: evaluate_experiment, PlotRunner, PlotGroup, PlotSpec, PLOT_REGISTRY, prepare_data, resolve_kwargs, gen_metrics, compute_marginal_pdp, reports/, metrics_table.csv, rolling_forecast.png, improvement_heatmap.png

Public surface: evaluate_experiment(run_root) (stages/run_diagnostics.py:12); PlotGroup / PlotSpec / PLOT_REGISTRY (diagnostics/plots/registry.py); gen_metrics (diagnostics/metrics.py).

Internal vs published model: Individual plot functions are internal — they accept a flat DataFrame and return a Figure (no I/O, no config access). Only PlotRunner owns side-effects (disk writes, MLflow logging). Plot functions are read-only consumers; they never call predict() except for PDPs that genuinely require perturbed inputs (DESIGN.md line 106).

Boundary contracts:

- Input: ExperimentResult (lazy handle); postprocessed/{commodity}_national.parquet; reference data via lib/reference_data/.
- Output: run_dir/reports/**/*.png; run_dir/reports/metrics_table.csv and related tabular artefacts; all logged as MLflow artefacts.
- Read-only with respect to run_dir model artefacts: evaluate_experiment is a consumer, not a producer of predictions.

Notes: gen_metrics is the only consumer whose parameter is widened to AbstractSlice (polymorphic over hindcast folds and forecast slices). All other consumers in this context stay typed against HindcastSlice (DOMAIN_MODEL.md §7.2).
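The widening to AbstractSlice can be illustrated with a structural protocol. The sketch assumes the shared surface is a single walk_forward_preds_path property; the dataclass fields and the preds_path_for_metrics() stand-in are hypothetical, while the directory layouts follow the boundary contracts in this document.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


class AbstractSlice(Protocol):
    """Minimal shared surface; the single property here is an assumption."""

    @property
    def walk_forward_preds_path(self) -> Path: ...


@dataclass(frozen=True)
class HindcastSlice:
    run_dir: Path
    commodity: str
    fold_label: str

    @property
    def walk_forward_preds_path(self) -> Path:
        base = self.run_dir / "preds" / self.commodity / self.fold_label
        return base / "walk_forward_preds.parquet"


@dataclass(frozen=True)
class ForecastSlice:
    run_dir: Path
    init_date: str

    @property
    def walk_forward_preds_path(self) -> Path:
        return self.run_dir / "forecast" / self.init_date / "walk_forward_preds.parquet"


def preds_path_for_metrics(slice_: AbstractSlice) -> Path:
    """Stand-in for gen_metrics: depends only on the protocol surface."""
    return slice_.walk_forward_preds_path
```

Consumers that need fold-specific attributes such as fold_label stay typed against HindcastSlice, which is why only one function is widened.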

7. Delivery

Subpackages: delivery/ (schemas.py, conversions.py, geo_normalise.py, export.py), stages/run_deliver.py

Ubiquitous language: HindcastDelivery, DeliveryRow, deliver_experiment, walk_forward_preds_to_delivery_rows, delivery_to_dataframe, enforce_ci_narrowing, drop_frozen_tail, mean, lower_95, upper_95, nass_actual, wasde_in_season, weather_correction_bu_ac, variable, model, generated_date, ADM level, Treefera_{commodity}_{ADM}_Hindcast_{YYYYMMDD}.csv

Public surface: HindcastDelivery and DeliveryRow (delivery/schemas.py); deliver_experiment(run_root) (stages/run_deliver.py:40); walk_forward_preds_to_delivery_rows(result, level, ci_levels, mode) (delivery/conversions.py:136).

Internal vs published model: DeliveryRow is the published model — a pydantic model with explicit schema invariants enforced at construction. Column names and units in the CSV are the client-facing contract; internal _kg_ha columns are never written to delivery files.

Boundary contracts:

- Input: walk_forward_preds.parquet from each fold; postprocessed/{commodity}_national.parquet; NASS and WASDE benchmarks via lib/reference_data/.
- Output: per-ADM-level delivery/Treefera_*.csv; also run_dir/forecast/{init_date}/Treefera_*_Forecast_*.csv for the forecast path.
- Invariants on DeliveryRow (delivery/schemas.py): CI-band ordering (lower_95 ≤ … ≤ mean ≤ … ≤ upper_95), ISO-8601 dates, init_date year within ±1 of harvest year, no duplicate (year, init_date, geo_identifier), equal init_date count per (year, geo).
- Unit conversion from kg/ha to bu/ac or lbs/ac happens only here, nowhere else (DESIGN.md line 116).
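Several of the DeliveryRow invariants can be sketched with construction-time checks. This uses a stdlib dataclass rather than pydantic to stay self-contained; DeliveryRowSketch and check_no_duplicates() are hypothetical names, only the 95% band is modelled, and the real validators in delivery/schemas.py may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DeliveryRowSketch:
    year: int  # harvest year
    init_date: str  # ISO-8601 date string
    geo_identifier: str
    lower_95: float
    mean: float
    upper_95: float

    def __post_init__(self) -> None:
        # fromisoformat raises ValueError for non-ISO strings.
        parsed = date.fromisoformat(self.init_date)
        if abs(parsed.year - self.year) > 1:
            raise ValueError("init_date year must be within 1 of harvest year")
        if not (self.lower_95 <= self.mean <= self.upper_95):
            raise ValueError("CI band must satisfy lower_95 <= mean <= upper_95")


def check_no_duplicates(rows: list[DeliveryRowSketch]) -> None:
    """Reject repeated (year, init_date, geo_identifier) keys."""
    keys = [(r.year, r.init_date, r.geo_identifier) for r in rows]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate (year, init_date, geo_identifier)")
```

Validating at construction means an invalid row can never reach the CSV writer.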

Notes: ADM1/ADM2 aggregations from county-level predictions happen in deliver_experiment, not in Post-processing. This means the Delivery context is the only place that resolves what "state-level yield" means for the client.

8. Forecast

Subpackages: stages/run_forecast.py, features/forecast_weather.py, features/forecast_long_range_stub.py, run/ (shared with Experiment context)

Ubiquitous language: ForecastSlice, materialise_forecast_indices, build_forecast_features, run_forecast_features, run_forecast_predict, indices.zarr, raw_obs_filepath, materialised_climo_filepath, splice, init_date, season_year, forecast/{init_date}/, run_features, run_predict

Public surface: ForecastSlice (lib/results/results_slice.py:299); run(run_dir, season_year, init_date), run_features(...), run_predict(...) in stages/run_forecast.py; materialise_forecast_indices(config, results) (features/forecast_weather.py).

Internal vs published model: The run_dir/forecast/{init_date}/ subtree is the published model — a self-contained directory that can be rerun, archived, or diffed. The splice logic (observed ERA5 up to init_date + materialised climatology beyond) is an internal implementation detail.

Boundary contracts:

- Input: the production HindcastSlice model artefacts from Experiment context (borrowed via ForecastSlice.training); forecast.raw_obs_filepath (ERA5 zarr); forecast.materialised_climo_filepath (materialised climatology zarr); canonical pred.parquet for area imputation.
- Output: run_dir/forecast/{init_date}/indices.zarr; …/features/pred.parquet; …/walk_forward_preds.parquet; …/postprocessed_{init_date}.parquet; …/Treefera_*_Forecast_*.csv.
- The forecast pipeline must not write to canonical hindcast artefacts under features_dir/{commodity}/ (DESIGN.md line 124).
- init_date must match exactly an init_date present in pred.parquet; no fuzzy resolution (DESIGN.md line 89).

Notes: Forecast features are built into an isolated per-init directory, not into the shared features_dir. The hindcast_init_season_doys grid governs hindcast scoring only; forecast accepts any calendar date (DESIGN.md line 93).
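The splice idea (observed values up to init_date, climatology beyond) can be sketched on plain date-keyed series. The function name and signature are hypothetical, the real code works on zarr stores, and treating init_date itself as observed is an assumption.

```python
from datetime import date


def splice(
    observed: dict[date, float],
    climo: dict[date, float],
    init_date: date,
) -> dict[date, float]:
    """Observed values up to and including init_date, climatology afterwards."""
    out = {d: v for d, v in observed.items() if d <= init_date}
    out.update({d: v for d, v in climo.items() if d > init_date})
    return dict(sorted(out.items()))
```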

9. Experiment Tracking

Subpackages: lib/tracking/, mlflow_*.py (if present at top level)

Ubiquitous language: mlflow_run_id, mlflow_tracking_uri, run_name, data_hash, config_resolved.yaml, metadata_<stage>.yaml, log_data_hash, mlflow.autolog, resume mode, nested run

Public surface: helpers in lib/tracking/; the MLflow run id persisted in metadata_<stage>.yaml; the run_dir tag on the MLflow run.

Internal vs published model: The metadata_<stage>.yaml is a published side-channel — it enables ExperimentResult.from_run_dir to look up the MLflow run and enables run forecast --run-dir to resume the same MLflow run across multiple init_date invocations (README.md MLflow section).

Boundary contracts:

- Input: any stage output (artefacts logged via mlflow.log_artifacts).
- Output: MLflow run with tags (git_commit, run_dir), params (full resolved config), and artefacts (reports/, config_resolved.yaml).
- Cross-cutting: every run hindcast, run fit-production, and run forecast opens or resumes a run; run forecast --run-dir resumes rather than creates (DESIGN.md line 23).

Notes: MLflow is a hard dependency (mlflow >= 3, DESIGN.md line 23). Pure-transform stages (run features, run diagnostics) use Config directly without RunRunner to avoid tracking overhead (DESIGN.md line 25).

10a. Reference Data

Subpackages: lib/reference_data/ (nass.py, wasde.py, conab.py, nass_benchmarks.py, base_reference_yield_loader.py, loader.py)

Ubiquitous language: NASS, WASDE, CONAB, marketing_year, nass_actual, wasde_in_season, nass_benchmarks, reference_yield, area_harvested_ha, production_kg, ReferenceYieldLoader

Public surface: load_nass_panel(config), load_wasde(config), load_conab(config) and the base_reference_yield_loader.BaseReferenceYieldLoader abstraction.

Internal vs published model: The loader returns standardised DataFrames keyed by (year, geo_identifier) in internal units (kg/ha). Source-specific formats (NASS bu/ac, WASDE marketing-year alignment) are translated here and never leak into the consuming contexts.

Boundary contracts:

- Input: raw NASS parquet, WASDE CSV, CONAB files at paths declared in ExperimentConfig.reference_data[*].filepath.
- Output: standardised DataFrames consumed by Post-processing (bias correction, NASS panel), Evaluation (benchmark comparison), and Delivery (benchmark columns in DeliveryRow).

Notes: The marketing_year concept is used by WASDE alignment but is a known fuzzy seam — it should eventually be modelled as a subset of season_year rather than a parallel concept (DOMAIN_MODEL2.md §9).
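The bu/ac to kg/ha translation at this boundary can be sketched as below. The function names are hypothetical and the bushel weight is passed in explicitly as an assumption about how a caller would select the commodity; the physical constants are the exact pound and acre definitions, and 56 lb is the standard US bushel weight for corn.

```python
KG_PER_LB = 0.45359237  # exact, by definition of the pound
HA_PER_ACRE = 0.40468564224  # exact, by definition of the acre


def bu_ac_to_kg_ha_sketch(value_bu_ac: float, lb_per_bushel: float) -> float:
    """Convert a yield in bushels per acre to kilograms per hectare."""
    return value_bu_ac * lb_per_bushel * KG_PER_LB / HA_PER_ACRE


def kg_ha_to_bu_ac_sketch(value_kg_ha: float, lb_per_bushel: float) -> float:
    """Inverse conversion, as would be needed on the delivery side."""
    return value_kg_ha * HA_PER_ACRE / (lb_per_bushel * KG_PER_LB)
```

Because the bushel is a volume unit, the weight per bushel differs by commodity, so a conversion that omits it cannot be correct for more than one crop.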

10b. Geo & Identifiers

Subpackages: lib/geo/ (identifiers.py, aggregation.py, selection.py), delivery/geo_normalise.py

Ubiquitous language: GeoIdentifier (NewType), geo_identifier, AggregationLevel, ADM0/ADM1/ADM2, make_geo_identifier, normalise_geo_identifier, area_weighted_mean, included_geo_identifiers, FitAggregationPolicy, production_cumulative_threshold

Public surface: GeoIdentifier NewType alias (lib/geo/identifiers.py); make_geo_identifier, normalise_geo_identifier; area_weighted_mean(df, weight_col) (lib/geo/aggregation.py); select_included_geos(fit_data, threshold) (lib/geo/selection.py).

Internal vs published model: GeoIdentifier is a NewType("GeoIdentifier", str) — a plain str at runtime matching the regex ^ADM0:[a-z0-9]+(/ADM1:.+(/ADM2:.+)?)?$. No separate level field is stored; the ADM level is inferred from the prefix (DOMAIN_MODEL.md §3.1).
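A short sketch of the NewType plus prefix-based level inference follows. The regex is the one quoted above; as_geo_identifier() and adm_level() are hypothetical helpers and are not the signatures of make_geo_identifier or normalise_geo_identifier.

```python
import re
from typing import NewType

GeoIdentifier = NewType("GeoIdentifier", str)

_GEO_RE = re.compile(r"^ADM0:[a-z0-9]+(/ADM1:.+(/ADM2:.+)?)?$")


def as_geo_identifier(value: str) -> GeoIdentifier:
    """Validate an already-canonical string and brand it as a GeoIdentifier."""
    if not _GEO_RE.match(value):
        raise ValueError(f"not a canonical geo identifier: {value!r}")
    return GeoIdentifier(value)


def adm_level(geo: GeoIdentifier) -> str:
    """Infer the level from the deepest ADM prefix present."""
    if "/ADM2:" in geo:
        return "ADM2"
    if "/ADM1:" in geo:
        return "ADM1"
    return "ADM0"
```

Since NewType is erased at runtime, the value remains a plain str and can be used directly as a join key; only the type checker distinguishes it.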

Boundary contracts:

- Input: raw geo keys from any source (FIPS integers, mixed-case names, NASS state/county strings).
- Output: canonical lowercase ADM0:usa/ADM1:{state}/ADM2:{county} strings used as join keys on every artefact across the entire pipeline.
- area_weighted_mean raises loudly if any NaN weight slips through, guaranteeing that missed county-area cells surface as exceptions rather than degraded numbers (DESIGN.md line 127).
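The fail-loud behaviour on NaN weights can be sketched on plain lists. The real area_weighted_mean(df, weight_col) takes a DataFrame; this list-based signature, the function name, and the extra length and zero-total checks are assumptions made to keep the example self-contained.

```python
import math


def area_weighted_mean_sketch(values: list[float], weights: list[float]) -> float:
    """Weighted mean that refuses NaN weights instead of silently dropping them."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(math.isnan(w) for w in weights):
        raise ValueError("NaN weight: a county is missing its area")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total
```

For example, values [10, 20] with areas [1, 3] give 17.5, whereas the unweighted mean would be 15.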

Notes: Legacy prefix handling (country/state/county → ADM) belongs solely in geobounds_processor.py (project memory note), not in identifiers.py. The FIPS-to-ADM translation is a one-way anti-corruption layer at the input boundary only.

11. Dashboard

Subpackages: app/ (app.py, app_utils.py, charts.py, charts_evolution.py, _chart_helpers.py, _dashboard_config.py, _eval_shim.py, run_loader.py)

Ubiquitous language: FoldSchedule, run_loader, eval_shim, PlotGroup, charts_evolution, dashboard_config, Streamlit app

Public surface: app.py (Streamlit entry point); run_loader.py (loads ExperimentResult for display).

Internal vs published model: The Dashboard is a read-only consumer of ExperimentResult and ExperimentConfig. It has no write path into the pipeline artefact tree.

Boundary contracts:

- Input: an existing run_dir discovered at startup; ExperimentConfig loaded via require_input_data_dir().
- Output: browser-rendered visualisations only; no filesystem writes.
- app/ is intentionally a parallel leaf entry point; nothing in the package imports from it (DOMAIN_MODEL.md §8.1 rule 8).

Notes: _eval_shim.py exists to avoid importing the full Streamlit context at import time. FoldSchedule (_dashboard_config.py:198) is a dashboard-internal value object describing per-commodity fold calendars.

Anti-corruption layers

The following translation steps prevent one context's model from leaking raw representations into another.

| Translation | From context | To context | Location |
| --- | --- | --- | --- |
| Raw NASS bu/ac → internal kg/ha | Reference Data (raw source) | Feature Engineering | features/builders/yields.py: bu_acre_to_kg_ha() via lib/unit_utils.py |
| FIPS / mixed-case names → GeoIdentifier ADM path | External raw data | Geo & Identifiers (then everywhere) | lib/geo/identifiers.py: normalise_geo_identifier() |
| Internal kg/ha → client delivery units (bu/ac, lbs/ac) | Experiment / Post-processing | Delivery | delivery/schemas.py: DeliveryRow validators; delivery/conversions.py rename pass |
| WASDE marketing-year alignment → internal season_year | Reference Data (WASDE CSV) | Post-processing / Delivery | lib/reference_data/wasde.py |
| Observed ERA5 + materialised climatology → unified forecast feature matrix | Forecast (external obs + climo) | Feature Engineering (forecast path) | features/forecast_weather.py: materialise_forecast_indices() |
| delivery/conversions.py consuming conformal helpers from stages/run_meta_models.py | Post-processing internals | Delivery | delivery/conversions.py imports compute_conformal_half_widths_from_training from stages/run_meta_models.py; tracked tech-debt (DOMAIN_MODEL.md §8.1 rule 4); these helpers should migrate to lib/ |
| delivery/export.py consuming preflight helpers from run/preflight.py | Preflight internals | Delivery | delivery/export.py imports preflight_paths_for_export from run/preflight.py; tracked tech-debt (same source); should migrate to lib/ |

Open questions on context boundaries

  1. marketing_year vs season_year — WASDE uses a marketing year (Oct–Sep for US grains) that is not fully collapsed into season_year. The Reference Data → Post-processing seam carries an implicit translation that is currently ad hoc. Formalising the mapping as an explicit value object would remove the ambiguity (DOMAIN_MODEL2.md §9).

  2. Conformal helpers in stages/ vs lib/: delivery/conversions.py imports conformal half-width computation from stages/run_meta_models.py, creating an upward edge from the Delivery context into the Experiment orchestration layer. The correct home for these helpers is lib/ (Post-processing or a new lib/conformal/). Until moved, this is the single real layering violation in the package.

  3. Forecast vs Hindcast mode boundary — Mode is determined statically by whether ExperimentConfig.forecast is set (DESIGN.md line 504). There is no runtime-toggle path. If a single workflow ever needs to produce both hindcast and forecast artefacts in one pass (beyond run all), the static config split would need rethinking.

  4. included_geo_identifiers ownership — This frozenset is computed in the Experiment context (FIT stage), persisted to run_dir, and consumed by Post-processing, Evaluation, and Delivery. It currently travels as a required kwarg rather than being encapsulated within ExperimentResult. Whether it should become a first-class property on ExperimentResult is an open design question.

  5. Dashboard as diagnostic vs operational tool: app/ currently imports from lib/results/ and config.py directly. If the Dashboard ever needs to trigger pipeline stages (e.g. re-deliver from the UI), it would need to cross from a leaf context into the Experiment orchestration layer, which would violate the current import DAG rule.