Bounded Contexts: commodity_hindcast
Overview
A bounded context is a region of the domain within which a particular ubiquitous language and model are internally consistent. The same word can mean different things in different contexts: "yield" in feature engineering is a raw target column, while "yield" in delivery is a validated DeliveryRow.mean field in client units. This document identifies the natural context seams in commodity_hindcast/, derived from three complementary lenses: the subpackage layout under market_insights_models/src/commodity_hindcast/, the CLI subcommand structure in cli.py, and the EARS-format stage contracts in DESIGN.md. The existing editorial domain model (domain-modelling/DOMAIN_MODEL2.md) identifies seven contexts; this document refines and extends that list to ten, splitting out Reference Data, Geo & Identifiers, and the Dashboard as first-class contexts and adding an explicit Anti-corruption layers section.
Context map
flowchart LR
CFG[1 · Configuration]
PFL[2 · Preflight]
FEAT[3 · Feature Engineering]
EXP[4 · Experiment & Modelling]
POST[5 · Post-processing]
EVAL[6 · Evaluation & Diagnostics]
DELIV[7 · Delivery]
FCST[8 · Forecast]
TRK[9 · Tracking]
REF[10a · Reference Data]
GEO[10b · Geo & Identifiers]
DASH[11 · Dashboard]
CFG --> PFL
CFG --> FEAT
CFG --> EXP
CFG --> FCST
FEAT --> EXP
FEAT --> FCST
EXP --> POST
EXP --> EVAL
EXP --> FCST
POST --> EVAL
POST --> DELIV
EXP --> DELIV
FCST --> POST
FCST --> DELIV
EXP --> TRK
FCST --> TRK
REF --> POST
REF --> EVAL
REF --> DELIV
GEO --> FEAT
GEO --> EXP
GEO --> POST
GEO --> DELIV
EXP --> DASH
CFG --> DASH
Dependency direction is left-to-right or top-to-bottom; leaf contexts (DASH, TRK, PFL) consume but are not consumed by pipeline contexts.
Contexts
1. Configuration & Orchestration
Subpackages: config.py, cli.py, Makefile, configs/*.yaml, lib/path_utils.py, lib/calendar.py
Ubiquitous language: ExperimentConfig, CommodityConfig, ResolvablePath, INPUT_DATA_DIR, data_root, experiment_name, run_dir_base, commodity, season_start, MonthDay, SeasonWindow, feature_cols, ci_levels, ForecastConfig, BuilderConfig, ModelConfig, ExperimentProtocolConfig
Public surface: ExperimentConfig (pydantic-settings root, config.py:547); CommodityConfig (config.py:284); the cli Click group entry point (cli.py).
Internal vs published model: The config classes are pure data — no build_* factories, no I/O. Factories live in sibling build.py files beside the types they construct (DESIGN.md line 13). config.py is the DAG root; every other context imports from it, never the reverse.
Boundary contracts:
- Input: INPUT_DATA_DIR env var; per-commodity configs/<commodity>_experiment.yaml YAML.
- Output: a frozen ExperimentConfig instance consumed by every downstream context.
- ResolvablePath fields are resolved against data_root at load time; the require_input_data_dir() helper (lib/path_utils.py) is the sole env-var reader.
Notes: The single-direction import rule (DESIGN.md line 49) makes this context the root of the entire import DAG. No other module may create a back-edge into config.py. cli.py is a leaf that reads from this context but is never imported by it.
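The load-time path resolution described above can be sketched as follows. This is a minimal illustration, not the package's code: require_input_data_dir() is named in the source, but its body here, the resolve_against_root helper, and the example paths are all assumptions.

```python
import os
from pathlib import Path


def require_input_data_dir() -> Path:
    """Return INPUT_DATA_DIR as a Path, failing loudly when it is unset.

    Assumed behaviour: the real helper may report the error differently.
    """
    raw = os.environ.get("INPUT_DATA_DIR")
    if not raw:
        raise RuntimeError("INPUT_DATA_DIR is not set")
    return Path(raw)


def resolve_against_root(value: str, data_root: Path) -> Path:
    """Hypothetical helper: absolute paths pass through, relative ones join data_root."""
    path = Path(value)
    return path if path.is_absolute() else data_root / path


os.environ["INPUT_DATA_DIR"] = "/data/inputs"  # illustrative value only
data_root = require_input_data_dir()
resolved = resolve_against_root("nass/corn.parquet", data_root)
```

Keeping the environment lookup in one function means every other module receives already-resolved paths and never touches os.environ itself.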
2. Preflight
Subpackages: run/preflight.py
Ubiquitous language: Check, run_preflight(), passed, critical, message, preflight_paths_for_<stage>, has_X existence flag, _iter_resolvable_fields
Public surface: Check dataclass (run/preflight.py:20); run_preflight(checks) -> None (run/preflight.py:42); per-stage preflight_paths_for_* functions.
Internal vs published model: Check is a value object (no identity, immutable). run_preflight is a pure gate — it either exits or returns. No persistent state.
Boundary contracts:
- Input: list of Check objects built by the calling stage from _iter_resolvable_fields(config).
- Output: none on success; SystemExit on any critical failure.
- Every stage entry point calls preflight before any compute (DESIGN.md line 69).
Notes: Deliberately kept as a cross-cutting slice. Its only dependency is config.py (for ResolvablePath field traversal). The DESIGN.md rule (line 69) forbids deferring existence validation to I/O helpers — preflight is the only place FileNotFoundError is acceptable.
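A minimal sketch of the gate described above, assuming Check carries only the three fields listed in the ubiquitous language (passed, critical, message). The path_exists_check helper is hypothetical, and the real run_preflight may format its output differently.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Check:
    """Value object: immutable and compared by value, with no identity."""
    passed: bool
    critical: bool
    message: str


def path_exists_check(path: Path, critical: bool = True) -> Check:
    """Hypothetical helper that turns one resolved path into a Check."""
    ok = path.exists()
    status = "found" if ok else "missing"
    return Check(passed=ok, critical=critical, message=f"{status}: {path}")


def run_preflight(checks: list[Check]) -> None:
    """Return silently when every critical check passes; otherwise exit."""
    failures = [c for c in checks if c.critical and not c.passed]
    if failures:
        raise SystemExit("\n".join(c.message for c in failures))
```

Non-critical failures are ignored by the gate in this sketch; a stage would build its list of checks first and call run_preflight before any compute.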
3. Feature Engineering
Subpackages: features/ (run.py, assemble.py, builders/, forecast_weather.py), lib/edit_and_imputation/, lib/calendar.py
Ubiquitous language: Builder, FeatureFactory, fit.parquet, pred.parquet, metadata.json, assemble(), EditRule, EditReport, gstd (growing-season-to-date), init_date, season_doy, harvest_season_doy, index_cols, feature_cols, target_col, builders_dir, required_for_pred_parquet
Public surface: build_features(cfg, force) (features/run.py:33); Builder protocol with interface.py and registry.py (features/builders/interface.py, features/builders/registry.py); assemble() (features/assemble.py:171).
Internal vs published model: Individual builder modules (yields.py, weather.py, climo.py, ndvi.py, stress.py) are internal. The public contract is the parquet schema: rows keyed by (year, geo_identifier, init_date); metadata.json sidecar declaring column roles.
Boundary contracts:
- Input: per-commodity source files (NASS parquet, ERA5 zarr, climo zarr, NDVI, stress) at paths declared in CommodityConfig.builders[*].filepath.
- Output: features/{commodity}/fit.parquet, features/{commodity}/pred.parquet, features/{commodity}/metadata.json (README.md pipeline diagram).
- The assemble() step writes INDEX_COLS first, then target_col, then feature columns — row alignment is guaranteed by construction (DESIGN.md line 82).
Notes: The EditRule system (lib/edit_and_imputation/edit.py) is an anti-corruption layer within this context — it translates raw external data into the internal clean panel before pivoting. Each Builder applies its own edits: list[EditRuleConfig] in YAML order. The gstd prefix convention (DESIGN.md line 45) is enforced here, not downstream.
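The column-ordering contract of assemble() can be illustrated without any I/O. The sketch below assumes INDEX_COLS equals the row key named above, and the shape of the metadata payload and the example column names are hypothetical; the real assemble() operates on parquet data rather than name lists.

```python
INDEX_COLS = ("year", "geo_identifier", "init_date")  # assumed from the row key above


def ordered_columns(target_col, feature_cols):
    """Return the output column order: index columns, then target, then features."""
    names = [*INDEX_COLS, target_col, *feature_cols]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate column names in {names}")
    return names


def column_roles(target_col, feature_cols):
    """Hypothetical metadata.json payload declaring each column's role."""
    return {
        "index_cols": list(INDEX_COLS),
        "target_col": target_col,
        "feature_cols": list(feature_cols),
    }


columns = ordered_columns("yield_kg_ha", ["gstd_precip", "ndvi_mean"])
roles = column_roles("yield_kg_ha", ["gstd_precip", "ndvi_mean"])
```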
4. Experiment & Modelling
Subpackages: run/ (runner.py, experiment_protocol.py), stages/run_fit.py, stages/run_hindcast.py, stages/run_predict.py, models/detrend/, models/regression/, lib/results/
Ubiquitous language: ExperimentResult, HindcastSlice, AbstractSlice, fold_label, cutoff, ExpandingFoldGenerator, train_preds.parquet, walk_forward_preds.parquet, year_data.parquet, detrender, regressor, run_dir, included_geo_identifiers, production fold, sim_yield_kg_ha, TrendAxis, AbstractDetrend, AbstractRegressionImpl
Public surface: ExperimentResult (lib/results/run_result.py:32); HindcastSlice (lib/results/results_slice.py:112); AbstractSlice protocol (lib/results/results_slice.py:73); train(train_data, fold_label, config) (stages/run_fit.py); run(config_path) + fit_production(config_path) (stages/run_hindcast.py).
Internal vs published model: The run_dir is the published model — a filesystem layout consumed by downstream contexts. The in-memory ExperimentResult is a lazy handle to that layout; it carries no computed data (DESIGN.md line 99). Detrender and regressor registries (models/detrend/, models/regression/) are internal dispatch details.
Boundary contracts:
- Input: features/{commodity}/{fit,pred}.parquet from Feature Engineering.
- Output: run_dir/models/{commodity}/{fold_label}/ (detrender, regressor, fill values); run_dir/preds/{commodity}/{fold_label}/ (train_preds, walk_forward_preds, year_data).
- included_geo_identifiers persisted as run_dir/included_geo_identifiers.json and threaded as a required kwarg through the entire evaluation chain (DESIGN.md line 113).
- No stage imports another stage's internals — cross-stage reuse is via lib/ helpers only (DESIGN.md line 99).
Notes: The FIT stage is deliberately zero-metrics, zero-plots (DESIGN.md line 101). All four phases (FIT, POSTPROCESS, EVALUATE, DELIVER) are independently re-runnable against the same run_dir. The walk-forward loop fits the model once per season_year and reuses frozen coefficients across all init_dates for that year (DESIGN.md lines 91–92).
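The walk-forward behaviour in the notes above (one fit per season_year, frozen coefficients reused across init_dates) can be sketched generically. The Fold type, expanding_folds, and walk_forward below are illustrative stand-ins and are not the package's ExpandingFoldGenerator; min_train_years is an assumed parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fold:
    fold_label: str
    train_years: tuple
    test_year: int


def expanding_folds(years, min_train_years):
    """Yield folds whose training window grows by one year each step."""
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        yield Fold(str(ordered[i]), tuple(ordered[:i]), ordered[i])


def walk_forward(folds, init_dates, fit, predict):
    """Fit once per fold, then score every init_date with that frozen model."""
    rows = []
    for fold in folds:
        model = fit(fold.train_years)  # one fit per season_year
        for init_date in init_dates:  # no refit inside this loop
            rows.append((fold.test_year, init_date, predict(model, init_date)))
    return rows
```

With three init_dates and two folds, fit is called twice and predict six times, which is the cost profile the notes describe.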
5. Post-processing
Subpackages: stages/run_meta_models.py, models/meta_models/ (bias_correction.py, conformalise.py)
Ubiquitous language: postprocess_experiment, bias_corrector, NoBiasCorrector, CoverageBiasCorrector, conformal interval, conformal half-width, coverage, selection bias correction, postprocessed/{commodity}_national.parquet, SUPPORTED_CI_LEVELS, area-weighted mean
Public surface: postprocess_experiment(run_root, included_geo_identifiers) (stages/run_meta_models.py:89); AbstractBiasCorrector protocol; compute_conformal_half_widths_from_training, compute_pooled_conformal_half_widths_from_training (consumed by Delivery context — see Anti-corruption layers below).
Internal vs published model: The bias corrector pickle and the postprocessed/{commodity}_national.parquet are published. Internal implementation of conformal calibration (residual quantile accumulation, fold sorting) is opaque to downstream contexts.
Boundary contracts:
- Input: HindcastSlice.walk_forward_preds_path per fold from ExperimentResult; NASS reference panel (loaded once via lib/reference_data/nass.py).
- Output: run_dir/postprocessed/{commodity}_national.parquet; per-fold bias_corrector.pkl at models/{commodity}/{fold_label}/bias_corrector.pkl.
Notes: Aggregation from ADM2 to ADM0 happens here using area-weighted means (lib/geo/aggregation.py); unweighted averaging is explicitly forbidden (DESIGN.md line 122). CI levels must be members of SUPPORTED_CI_LEVELS and monotonically ordered (DOMAIN_MODEL2.md §6).
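For readers unfamiliar with conformal intervals, the following is a generic split-conformal sketch of how a half-width can be derived from held-out residuals. It is an assumption-laden illustration and not the implementation behind compute_conformal_half_widths_from_training, whose internals the text above treats as opaque.

```python
import math


def conformal_half_width(residuals, coverage):
    """Finite-sample quantile of |residual| for the requested coverage.

    Uses the standard split-conformal rank ceil((n + 1) * coverage); when the
    rank exceeds n there are too few residuals and the width is unbounded.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    if n == 0:
        raise ValueError("at least one residual is required")
    rank = math.ceil((n + 1) * coverage)
    return math.inf if rank > n else scores[rank - 1]


residuals = [-1.0, 2.0, -3.0, 4.0, 5.0, -6.0, 7.0, -8.0, 9.0]
half_width_50 = conformal_half_width(residuals, 0.5)
half_width_95 = conformal_half_width(residuals, 0.95)
```

The interval for a point prediction is then the prediction plus or minus the half-width, and a higher coverage level can never produce a narrower band.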
6. Evaluation & Diagnostics
Subpackages: stages/run_diagnostics.py, diagnostics/ (metrics.py, runners.py, plots/)
Ubiquitous language: evaluate_experiment, PlotRunner, PlotGroup, PlotSpec, PLOT_REGISTRY, prepare_data, resolve_kwargs, gen_metrics, compute_marginal_pdp, reports/, metrics_table.csv, rolling_forecast.png, improvement_heatmap.png
Public surface: evaluate_experiment(run_root) (stages/run_diagnostics.py:12); PlotGroup / PlotSpec / PLOT_REGISTRY (diagnostics/plots/registry.py); gen_metrics (diagnostics/metrics.py).
Internal vs published model: Individual plot functions are internal — they accept a flat DataFrame and return a Figure (no I/O, no config access). Only PlotRunner owns side-effects (disk writes, MLflow logging). Plot functions are read-only consumers; they never call predict() except for PDPs that genuinely require perturbed inputs (DESIGN.md line 106).
Boundary contracts:
- Input: ExperimentResult (lazy handle); postprocessed/{commodity}_national.parquet; reference data via lib/reference_data/.
- Output: run_dir/reports/**/*.png; run_dir/reports/metrics_table.csv and related tabular artefacts; all logged as MLflow artefacts.
- Read-only with respect to run_dir model artefacts — evaluate_experiment is a consumer, not a producer of predictions.
Notes: gen_metrics is the only consumer whose parameter is widened to AbstractSlice (polymorphic over hindcast folds and forecast slices). All other consumers in this context stay typed against HindcastSlice (DOMAIN_MODEL.md §7.2).
7. Delivery
Subpackages: delivery/ (schemas.py, conversions.py, geo_normalise.py, export.py), stages/run_deliver.py
Ubiquitous language: HindcastDelivery, DeliveryRow, deliver_experiment, walk_forward_preds_to_delivery_rows, delivery_to_dataframe, enforce_ci_narrowing, drop_frozen_tail, mean, lower_95, upper_95, nass_actual, wasde_in_season, weather_correction_bu_ac, variable, model, generated_date, ADM level, Treefera_{commodity}_{ADM}_Hindcast_{YYYYMMDD}.csv
Public surface: HindcastDelivery and DeliveryRow (delivery/schemas.py); deliver_experiment(run_root) (stages/run_deliver.py:40); walk_forward_preds_to_delivery_rows(result, level, ci_levels, mode) (delivery/conversions.py:136).
Internal vs published model: DeliveryRow is the published model — a pydantic model with explicit schema invariants enforced at construction. Column names and units in the CSV are the client-facing contract; internal _kg_ha columns are never written to delivery files.
Boundary contracts:
- Input: walk_forward_preds.parquet from each fold; postprocessed/{commodity}_national.parquet; NASS and WASDE benchmarks via lib/reference_data/.
- Output: per-ADM-level delivery/Treefera_*.csv; also run_dir/forecast/{init_date}/Treefera_*_Forecast_*.csv for the forecast path.
- Invariants on DeliveryRow (delivery/schemas.py): CI-band ordering (lower_95 ≤ … ≤ mean ≤ … ≤ upper_95), ISO-8601 dates, init_date year within ±1 of harvest year, no duplicate (year, init_date, geo_identifier), equal init_date count per (year, geo).
- Unit conversion from kg/ha to bu/ac or lbs/ac happens only here, nowhere else (DESIGN.md line 116).
Notes: The ADM1 and ADM2 delivery outputs are derived from county-level predictions in deliver_experiment, not in Post-processing. The Delivery context is therefore the only place that resolves what "state-level yield" means for the client.
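The construction-time invariants and the delivery-only unit conversion can be sketched with the standard library. The real DeliveryRow is a pydantic model; the dataclass below is a simplified stand-in with a reduced field set, and kg_ha_to_bu_ac with its lb_per_bushel parameter is a hypothetical helper (bushel weight depends on the commodity, for example 56 lb for corn).

```python
from dataclasses import dataclass
from datetime import date

LB_TO_KG = 0.45359237
ACRE_TO_HA = 0.40468564224


def kg_ha_to_bu_ac(kg_ha: float, lb_per_bushel: float) -> float:
    """Convert kg/ha to bu/ac for a commodity with the given bushel weight."""
    return kg_ha * ACRE_TO_HA / (lb_per_bushel * LB_TO_KG)


@dataclass(frozen=True)
class DeliveryRowSketch:
    year: int
    init_date: str  # ISO-8601 calendar date
    geo_identifier: str
    lower_95: float
    mean: float
    upper_95: float

    def __post_init__(self):
        parsed = date.fromisoformat(self.init_date)  # ValueError if not ISO-8601
        if abs(parsed.year - self.year) > 1:
            raise ValueError("init_date year must be within 1 of harvest year")
        if not (self.lower_95 <= self.mean <= self.upper_95):
            raise ValueError("CI band must satisfy lower_95 <= mean <= upper_95")


def check_no_duplicates(rows):
    """Reject repeated (year, init_date, geo_identifier) keys."""
    keys = [(r.year, r.init_date, r.geo_identifier) for r in rows]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate (year, init_date, geo_identifier)")
```

Because validation runs in the constructor, an invalid row cannot exist in memory, so the export step does not need its own checks for these properties.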
8. Forecast
Subpackages: stages/run_forecast.py, features/forecast_weather.py, features/forecast_long_range_stub.py, run/ (shared with Experiment context)
Ubiquitous language: ForecastSlice, materialise_forecast_indices, build_forecast_features, run_forecast_features, run_forecast_predict, indices.zarr, raw_obs_filepath, materialised_climo_filepath, splice, init_date, season_year, forecast/{init_date}/, run_features, run_predict
Public surface: ForecastSlice (lib/results/results_slice.py:299); run(run_dir, season_year, init_date), run_features(...), run_predict(...) in stages/run_forecast.py; materialise_forecast_indices(config, results) (features/forecast_weather.py).
Internal vs published model: The run_dir/forecast/{init_date}/ subtree is the published model — a self-contained directory that can be rerun, archived, or diffed. The splice logic (observed ERA5 up to init_date + materialised climatology beyond) is an internal implementation detail.
Boundary contracts:
- Input: the production HindcastSlice model artefacts from Experiment context (borrowed via ForecastSlice.training); forecast.raw_obs_filepath (ERA5 zarr); forecast.materialised_climo_filepath (materialised climatology zarr); canonical pred.parquet for area imputation.
- Output: run_dir/forecast/{init_date}/indices.zarr; …/features/pred.parquet; …/walk_forward_preds.parquet; …/postprocessed_{init_date}.parquet; …/Treefera_*_Forecast_*.csv.
- The forecast pipeline must not write to canonical hindcast artefacts under features_dir/{commodity}/ (DESIGN.md line 124).
- init_date must match exactly an init_date present in pred.parquet; no fuzzy resolution (DESIGN.md line 89).
Notes: Forecast features are built into an isolated per-init directory, not into the shared features_dir. The hindcast_init_season_doys grid governs hindcast scoring only; forecast accepts any calendar date (DESIGN.md line 93).
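The splice mentioned above (observed data up to init_date, climatology beyond it) can be illustrated on plain date-keyed series. This is a toy sketch: the real pipeline works on zarr stores, and treating init_date itself as observed is an assumption.

```python
from datetime import date


def splice(observed, climo, init_date):
    """Observed values up to and including init_date, climatology afterwards."""
    out = {}
    for day in sorted(set(observed) | set(climo)):
        source = observed if day <= init_date else climo
        if day not in source:
            # Fail loudly rather than silently filling a gap.
            raise KeyError(f"no value for {day.isoformat()} on the required side of the splice")
        out[day] = source[day]
    return out


observed = {date(2024, 6, 1): 1.0, date(2024, 6, 2): 2.0}
climo = {date(2024, 6, 1): 10.0, date(2024, 6, 2): 20.0, date(2024, 6, 3): 30.0}
spliced = splice(observed, climo, init_date=date(2024, 6, 2))
```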
9. Experiment Tracking
Subpackages: lib/tracking/, mlflow_*.py (if present at top level)
Ubiquitous language: mlflow_run_id, mlflow_tracking_uri, run_name, data_hash, config_resolved.yaml, metadata_<stage>.yaml, log_data_hash, mlflow.autolog, resume mode, nested run
Public surface: helpers in lib/tracking/; the MLflow run id persisted in metadata_<stage>.yaml; the run_dir tag on the MLflow run.
Internal vs published model: The metadata_<stage>.yaml is a published side-channel — it enables ExperimentResult.from_run_dir to look up the MLflow run and enables run forecast --run-dir to resume the same MLflow run across multiple init_date invocations (README.md MLflow section).
Boundary contracts:
- Input: any stage output (artefacts logged via mlflow.log_artifacts).
- Output: MLflow run with tags (git_commit, run_dir), params (full resolved config), and artefacts (reports/, config_resolved.yaml).
- Cross-cutting: every run hindcast, run fit-production, and run forecast opens or resumes a run; run forecast --run-dir resumes rather than creates (DESIGN.md line 23).
Notes: MLflow is a hard dependency (mlflow >= 3, DESIGN.md line 23). Pure-transform stages (run features, run diagnostics) use Config directly without RunRunner to avoid tracking overhead (DESIGN.md line 25).
10a. Reference Data
Subpackages: lib/reference_data/ (nass.py, wasde.py, conab.py, nass_benchmarks.py, base_reference_yield_loader.py, loader.py)
Ubiquitous language: NASS, WASDE, CONAB, marketing_year, nass_actual, wasde_in_season, nass_benchmarks, reference_yield, area_harvested_ha, production_kg, ReferenceYieldLoader
Public surface: load_nass_panel(config), load_wasde(config), load_conab(config) and the base_reference_yield_loader.BaseReferenceYieldLoader abstraction.
Internal vs published model: The loader returns standardised DataFrames keyed by (year, geo_identifier) in internal units (kg/ha). Source-specific formats (NASS bu/ac, WASDE marketing-year alignment) are translated here and never leak into the consuming contexts.
Boundary contracts:
- Input: raw NASS parquet, WASDE CSV, CONAB files at paths declared in ExperimentConfig.reference_data[*].filepath.
- Output: standardised DataFrames consumed by Post-processing (bias correction, NASS panel), Evaluation (benchmark comparison), and Delivery (benchmark columns in DeliveryRow).
Notes: The marketing_year concept is used by WASDE alignment but is a known fuzzy seam — it should eventually be modelled as a subset of season_year rather than a parallel concept (DOMAIN_MODEL2.md §9).
10b. Geo & Identifiers
Subpackages: lib/geo/ (identifiers.py, aggregation.py, selection.py), delivery/geo_normalise.py
Ubiquitous language: GeoIdentifier (NewType), geo_identifier, AggregationLevel, ADM0/ADM1/ADM2, make_geo_identifier, normalise_geo_identifier, area_weighted_mean, included_geo_identifiers, FitAggregationPolicy, production_cumulative_threshold
Public surface: GeoIdentifier NewType alias (lib/geo/identifiers.py); make_geo_identifier, normalise_geo_identifier; area_weighted_mean(df, weight_col) (lib/geo/aggregation.py); select_included_geos(fit_data, threshold) (lib/geo/selection.py).
Internal vs published model: GeoIdentifier is a NewType("GeoIdentifier", str) — a plain str at runtime matching the regex ^ADM0:[a-z0-9]+(/ADM1:.+(/ADM2:.+)?)?$. No separate level field is stored; the ADM level is inferred from the prefix (DOMAIN_MODEL.md §3.1).
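A short sketch of the NewType-plus-regex pattern described above. The regex is copied from the text; parse_geo_identifier and adm_level are hypothetical helper names, not the functions exported by lib/geo/identifiers.py.

```python
import re
from typing import NewType

GeoIdentifier = NewType("GeoIdentifier", str)

_GEO_RE = re.compile(r"^ADM0:[a-z0-9]+(/ADM1:.+(/ADM2:.+)?)?$")


def parse_geo_identifier(raw: str) -> GeoIdentifier:
    """Validate a canonical identifier; the result is still a plain str at runtime."""
    if _GEO_RE.match(raw) is None:
        raise ValueError(f"not a canonical geo identifier: {raw!r}")
    return GeoIdentifier(raw)


def adm_level(geo: GeoIdentifier) -> int:
    """Infer the ADM level from the deepest prefix present; nothing is stored."""
    if "/ADM2:" in geo:
        return 2
    if "/ADM1:" in geo:
        return 1
    return 0


county = parse_geo_identifier("ADM0:usa/ADM1:iowa/ADM2:story")
```

NewType adds no runtime wrapper, so identifiers stay cheap join keys while static type checkers can still distinguish them from arbitrary strings.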
Boundary contracts:
- Input: raw geo keys from any source (FIPS integers, mixed-case names, NASS state/county strings).
- Output: canonical lowercase ADM0:usa/ADM1:{state}/ADM2:{county} strings used as join keys on every artefact across the entire pipeline.
- area_weighted_mean raises loudly if any NaN weight slips through, guaranteeing that missed county-area cells surface as exceptions rather than degraded numbers (DESIGN.md line 127).
Notes: Legacy prefix handling (country/state/county → ADM) belongs solely in geobounds_processor.py (project memory note), not in identifiers.py. The FIPS-to-ADM translation is a one-way anti-corruption layer at the input boundary only.
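The fail-loud behaviour of area_weighted_mean on NaN weights can be sketched on plain sequences. The real function takes (df, weight_col); the list-based signature and the name area_weighted_mean_sketch below are simplifications for illustration.

```python
import math


def area_weighted_mean_sketch(values, weights):
    """Weighted mean that raises instead of silently dropping NaN weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(math.isnan(w) for w in weights):
        raise ValueError("NaN weight: a county is missing its area")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


national = area_weighted_mean_sketch([10.0, 20.0], [1.0, 3.0])
```

Raising here is the point of the design: a skipped NaN would still return a plausible-looking number, which is harder to detect than an exception.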
11. Dashboard
Subpackages: app/ (app.py, app_utils.py, charts.py, charts_evolution.py, _chart_helpers.py, _dashboard_config.py, _eval_shim.py, run_loader.py)
Ubiquitous language: FoldSchedule, run_loader, eval_shim, PlotGroup, charts_evolution, dashboard_config, Streamlit app
Public surface: app.py (Streamlit entry point); run_loader.py (loads ExperimentResult for display).
Internal vs published model: The Dashboard is a read-only consumer of ExperimentResult and ExperimentConfig. It has no write path into the pipeline artefact tree.
Boundary contracts:
- Input: an existing run_dir discovered at startup; ExperimentConfig loaded via require_input_data_dir().
- Output: browser-rendered visualisations only; no filesystem writes.
- app/ is intentionally a parallel leaf entry point — nothing in the package imports from it (DOMAIN_MODEL.md §8.1 rule 8).
Notes: _eval_shim.py exists to avoid importing the full Streamlit context at import time. FoldSchedule (_dashboard_config.py:198) is a dashboard-internal value object describing per-commodity fold calendars.
Anti-corruption layers
The following translation steps prevent one context's model from leaking raw representations into another.
| Translation | From context | To context | Location |
|---|---|---|---|
| Raw NASS bu/ac → internal kg/ha | Reference Data (raw source) | Feature Engineering | features/builders/yields.py: bu_acre_to_kg_ha() via lib/unit_utils.py |
| FIPS / mixed-case names → GeoIdentifier ADM path | External raw data | Geo & Identifiers (then everywhere) | lib/geo/identifiers.py: normalise_geo_identifier() |
| Internal kg/ha → client delivery units (bu/ac, lbs/ac) | Experiment / Post-processing | Delivery | delivery/schemas.py: DeliveryRow validators; delivery/conversions.py rename pass |
| WASDE marketing-year alignment → internal season_year | Reference Data (WASDE CSV) | Post-processing / Delivery | lib/reference_data/wasde.py |
| Observed ERA5 + materialised climatology → unified forecast feature matrix | Forecast (external obs + climo) | Feature Engineering (forecast path) | features/forecast_weather.py: materialise_forecast_indices() |
| delivery/conversions.py consuming conformal helpers from stages/run_meta_models.py | Post-processing internals | Delivery | delivery/conversions.py imports compute_conformal_half_widths_from_training from stages/run_meta_models.py; tracked tech-debt (DOMAIN_MODEL.md §8.1 rule 4); these helpers should migrate to lib/ |
| delivery/export.py consuming preflight helpers from run/preflight.py | Preflight internals | Delivery | delivery/export.py imports preflight_paths_for_export from run/preflight.py; tracked tech-debt (same source); should migrate to lib/ |
Open questions on context boundaries
- marketing_year vs season_year: WASDE uses a marketing year (Oct–Sep for US grains) that is not fully collapsed into season_year. The Reference Data → Post-processing seam carries an implicit translation that is currently ad hoc. Formalising the mapping as an explicit value object would remove the ambiguity (DOMAIN_MODEL2.md §9).
- Conformal helpers in stages/ vs lib/: delivery/conversions.py imports conformal half-width computation from stages/run_meta_models.py, creating an upward edge from the Delivery context into the Experiment orchestration layer. The correct home for these helpers is lib/ (Post-processing or a new lib/conformal/). Until moved, this is the single real layering violation in the package.
- Forecast vs Hindcast mode boundary: Mode is determined statically by whether ExperimentConfig.forecast is set (DESIGN.md line 504). There is no runtime-toggle path. If a single workflow ever needs to produce both hindcast and forecast artefacts in one pass (beyond run all), the static config split would need rethinking.
- included_geo_identifiers ownership: This frozenset is computed in the Experiment context (FIT stage), persisted to run_dir, and consumed by Post-processing, Evaluation, and Delivery. It currently travels as a required kwarg rather than being encapsulated within ExperimentResult. Whether it should become a first-class property on ExperimentResult is an open design question.
- Dashboard as diagnostic vs operational tool: app/ currently imports from lib/results/ and config.py directly. If the Dashboard ever needs to trigger pipeline stages (e.g. re-deliver from the UI), it would need to cross from a leaf context into the Experiment orchestration layer, which would violate the current import DAG rule.
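To make the included_geo_identifiers ownership question concrete, here is one possible shape for the encapsulated alternative. It is purely illustrative: RunHandleSketch is a hypothetical stand-in for ExperimentResult, and the assumption that included_geo_identifiers.json holds a flat JSON list of strings is not confirmed by the source.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunHandleSketch:
    """Lazy handle over a run_dir; it holds a path, not computed data."""
    run_dir: Path

    @property
    def included_geo_identifiers(self) -> frozenset:
        """Read the persisted set from disk on each access."""
        path = self.run_dir / "included_geo_identifiers.json"
        return frozenset(json.loads(path.read_text()))


def postprocess_sketch(handle: RunHandleSketch) -> int:
    """Hypothetical consumer: no included_geo_identifiers kwarg is needed."""
    return len(handle.included_geo_identifiers)
```

The trade-off is visibility: the required kwarg makes the dependency explicit in every signature, while the property keeps signatures short but hides a file read behind attribute access.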