Aggregates — commodity_hindcast

What an aggregate is here

An aggregate is a cluster of entities and value objects treated as a single unit for the purposes of change, validation, and persistence. The aggregate root is the only externally-addressable entry point; child entities and value objects are accessed exclusively through it and do not exist independently outside the aggregate boundary. Invariants that span multiple child objects are enforced on the root. In commodity_hindcast, "persistence" means the on-disk artefact tree under run_dir/ or the YAML config file — no relational database is involved for primary domain state.

Identified aggregates

ExperimentConfig (root)

Children: CommodityConfig, ModelConfig, ExperimentProtocolConfig, PostprocessConfig (→ BiasCorrectorConfig), DeliveryConfig, ForecastConfig (optional), list[ReferenceYieldSpec] (each is a _ReferenceYieldSpecBase subclass), and transitively the Builder discriminated union + SeasonWindow list held by CommodityConfig, EditRuleConfig list held by YieldsBuilder, and AssembleStressConfig (a nested Pydantic model on StressBuilder, config.py:203) governing regeneration of the stress parquet from a raw indices zarr. Note: EvaluationConfig is mentioned in the in-package DOMAIN_MODEL.md and DOMAIN_MODEL2.md as a conceptual child, and appears as an example in DESIGN.md:11, but no EvaluationConfig class exists in config.py — evaluation scoring is handled by parameters within ExperimentProtocolConfig and the diagnostics layer directly.

Consistency rule: The entire config is loaded and validated atomically by pydantic-settings and pydantic model validators before any stage compute begins. Partial or invalid configs are rejected at load time. Key cross-field invariants:

experiment_name matches [a-zA-Z0-9_-]+ (config.py:629).
Every ResolvablePath field anywhere in the nested tree is resolved against data_root by _iter_resolvable_fields() inside a model_validator(mode="after") (config.py:785). Relative paths anchor at data_root; S3 URIs and absolute paths pass through.
model.use_sample_weights=True requires model.weather_correction_fit_level="ADM2" (config.py:492).
forecast.init_date must be set when the forecast pipeline is invoked; None signals hindcast mode (config.py:666).
reference_data entries must have unique name values (config.py:793).
delivery.ci_levels must be a subset of SUPPORTED_CI_LEVELS and all values in (0, 1) (config.py:432).
postprocess.conformalise must be non-empty and contain no duplicates (config.py:543).
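
The cross-field rules above can be illustrated without pydantic. What follows is a stdlib sketch, not the real ExperimentConfig. ConfigInvariantsSketch is a hypothetical class that flattens the relevant fields onto one object, and the contents of SUPPORTED_CI_LEVELS are a placeholder because this page does not list them.

```python
import re
from dataclasses import dataclass

# Placeholder: the real SUPPORTED_CI_LEVELS constant is defined in the package
# and its contents are not listed on this page.
SUPPORTED_CI_LEVELS = frozenset({0.5, 0.8, 0.9, 0.95})
_EXPERIMENT_NAME = re.compile(r"[a-zA-Z0-9_-]+")


@dataclass(frozen=True)
class ConfigInvariantsSketch:
    """Hypothetical flat stand-in for the cross-field checks on ExperimentConfig."""

    experiment_name: str
    conformalise: tuple[str, ...]
    use_sample_weights: bool = False
    weather_correction_fit_level: str = "ADM2"
    ci_levels: tuple[float, ...] = (0.8, 0.95)
    reference_names: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        errors: list[str] = []
        if not _EXPERIMENT_NAME.fullmatch(self.experiment_name):
            errors.append("experiment_name must match [a-zA-Z0-9_-]+")
        if self.use_sample_weights and self.weather_correction_fit_level != "ADM2":
            errors.append("use_sample_weights=True requires weather_correction_fit_level='ADM2'")
        for level in self.ci_levels:
            if not 0 < level < 1 or level not in SUPPORTED_CI_LEVELS:
                errors.append(f"ci level {level} is not a supported value in (0, 1)")
        if not self.conformalise or len(set(self.conformalise)) != len(self.conformalise):
            errors.append("conformalise must be non-empty and free of duplicates")
        if len(set(self.reference_names)) != len(self.reference_names):
            errors.append("reference_data names must be unique")
        if errors:
            # Reject the whole object: no partially valid config escapes construction.
            raise ValueError("; ".join(errors))
```

This sketch collects every failure before raising. That choice belongs to the sketch only and does not describe how pydantic reports model_validator errors.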

Persistence: Written as <run_dir>/config_resolved.yaml immediately after the run root is created; loaded lazily and cached once per run_dir per process by _load_config() in lib/results/results_slice.py:49.
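
Per-process caching keyed on run_dir maps naturally onto functools.lru_cache. The sketch below assumes a flat `key: value` file so that no YAML library is needed. load_config_cached and READ_COUNT are hypothetical names, and the real _load_config() rebuilds the full validated config rather than a flat dict.

```python
import functools
from pathlib import Path

# Instrumentation for the sketch only: counts real disk reads.
READ_COUNT = {"n": 0}


@functools.lru_cache(maxsize=None)
def load_config_cached(run_dir: str) -> dict[str, str]:
    """Hypothetical stand-in for _load_config(): one disk read per run_dir per process."""
    READ_COUNT["n"] += 1
    text = (Path(run_dir) / "config_resolved.yaml").read_text()
    # Toy parser for flat "key: value" lines only; the real helper rebuilds
    # and validates the full nested config.
    parsed: dict[str, str] = {}
    for line in text.splitlines():
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            parsed[key.strip()] = value.strip()
    return parsed
```

The cache key is the exact run_dir string, so a caller that mixes relative and absolute forms of the same directory would read it twice. Normalising with `str(Path(run_dir).resolve())` before the call avoids that. Every caller also receives the same dict object, so it should be treated as read-only.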

Source: config.py:573, lib/results/results_slice.py:49

ExperimentResult (root)

Children: tuple[HindcastSlice, ...] (one per numeric fold; the production fold is reached through the production property rather than through this tuple), tuple[ForecastSlice, ...] (one per (season_year, init_date) pair). It also holds the resolved ExperimentConfig loaded from config_resolved.yaml.

Consistency rule: ExperimentResult is a frozen dataclass (lib/results/run_result.py:30). It is a lazy handle — it does not carry computed data in memory; the disk is the contract. Callers see whatever artefacts are currently present on disk. Valid states:

`hindcast_slices`	`forecast_slices`	State
empty	empty	Fresh `run_dir` — features built, no modelling yet.
non-empty	empty	Hindcast complete, no forecasts issued.
non-empty	non-empty	Full: production model trained and forecasts issued.
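
The three valid states can be checked mechanically. What follows is a minimal sketch: run_state and its string labels are hypothetical. In the real code, a forecast without a production model surfaces lazily through ForecastSlice.training rather than through a classifier like this one.

```python
from collections.abc import Sequence


def run_state(hindcast_slices: Sequence[object], forecast_slices: Sequence[object]) -> str:
    """Classify a run directory by which slice tuples are populated."""
    if not hindcast_slices and not forecast_slices:
        return "fresh"              # features built, no modelling yet
    if hindcast_slices and not forecast_slices:
        return "hindcast_complete"  # no forecasts issued
    if hindcast_slices and forecast_slices:
        return "full"               # production model trained, forecasts issued
    # Forecasts without hindcast slices is not one of the valid states.
    raise ValueError("forecast slices present without any hindcast slices")
```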

Key invariants:

If forecast_slices is non-empty, a production HindcastSlice must exist under run_dir/models/{commodity}/production/ (enforced by ForecastSlice.training raising FileNotFoundError if the detrender is absent). lib/results/results_slice.py:418
from_run_dir() discovers slices from preds/{commodity}/{fold_label}/train_preds.parquet (numeric labels only; production is accessed via the production property). lib/results/run_result.py:66
Forecast slices are discovered from forecast/{season_year}/{init_date}/preds/walk_forward_preds.parquet. lib/results/run_result.py:96
included_geo_identifiers.txt is written by save_included_geo_identifiers() and read by load_included_geo_identifiers(); the FIT phase must write it before PREDICT runs. lib/results/run_result.py:157,164
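
Slice discovery in from_run_dir() amounts to a pair of filesystem globs. The sketch below makes three assumptions: the path layout is as stated above, numeric fold labels are all-digit directory names, and init_date directories are named as ISO date strings. discover_fold_labels and discover_forecast_keys are hypothetical helpers, not the real methods.

```python
from pathlib import Path


def discover_fold_labels(run_dir: Path, commodity: str) -> tuple[str, ...]:
    """Numeric fold labels with a train_preds.parquet; 'production' is skipped."""
    preds_root = run_dir / "preds" / commodity
    if not preds_root.is_dir():
        return ()
    labels = {
        p.parent.name
        for p in preds_root.glob("*/train_preds.parquet")
        if p.parent.name.isdigit()
    }
    return tuple(sorted(labels, key=int))


def discover_forecast_keys(run_dir: Path) -> tuple[tuple[str, str], ...]:
    """(season_year, init_date) directory names that hold walk-forward preds."""
    forecast_root = run_dir / "forecast"
    if not forecast_root.is_dir():
        return ()
    keys = {
        # p = forecast/<season_year>/<init_date>/preds/walk_forward_preds.parquet
        (p.parents[2].name, p.parents[1].name)
        for p in forecast_root.glob("*/*/preds/walk_forward_preds.parquet")
    }
    return tuple(sorted(keys))
```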

Persistence: Filesystem directory rooted at <run_dir_base>/<timestamp>_<experiment_name>/, e.g. runs/20260505_143022_corn_v1/. stages/run_hindcast.py:83
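
The run root name combines a second-resolution timestamp with the experiment name. The sketch below assumes the timestamp format is `%Y%m%d_%H%M%S`, as the example suggests. run_dir_path is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def run_dir_path(run_dir_base: Path, experiment_name: str, now: datetime) -> Path:
    """<run_dir_base>/<YYYYMMDD_HHMMSS>_<experiment_name>"""
    return run_dir_base / f"{now:%Y%m%d_%H%M%S}_{experiment_name}"
```

Because experiment_name is validated against `[a-zA-Z0-9_-]+` at config load, the directory name is always filesystem-safe. Creating the directory with `mkdir(parents=True, exist_ok=False)` would turn an accidental same-second collision into an error rather than a silent reuse. This page does not say whether the real stage does that.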

Source: lib/results/run_result.py:30, stages/run_hindcast.py:77

HindcastSlice (root within ExperimentResult)

Children: Lazy-loaded artefacts under the slice's own on-disk subtree — detrender.pkl, model.{ridge|pca_ridge|xgboost}, feature_fill_values.parquet, train_preds.parquet, walk_forward_preds.parquet, year_data.parquet, and optional bias_corrector.pkl. Also exposes shared run-level feature paths (features_fit_path, features_pred_path) resolved via _load_config(run_dir).

Consistency rule: All slice artefacts are either fully present (fold is complete) or treated as not-yet-computed. No partial artefact state is meaningful; stages overwrite each fold in full on re-run. has_bias_corrector is the only conditional check — bias correction is optional depending on BiasCorrectorConfig.kind. The slice is frozen (frozen=True) so its identity fields (run_dir, fold_label) cannot change after construction. lib/results/results_slice.py:112

Key invariants:

cutoff for a numeric fold is date(int(fold_label), 1, 1). For the production fold it is date(feature_end_year + 1, 1, 1). lib/results/results_slice.py:151
load_model() tries three concrete classes in sequence and raises RuntimeError if none succeeds, rather than silently returning a wrong type. lib/results/results_slice.py:182
The bias-corrector path is always canonical (postprocessed/{commodity}/{fold_label}/bias_corrector.pkl); loading is mediated by has_bias_corrector. lib/results/results_slice.py:41,173
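
The cutoff rule and the try-each-loader pattern in load_model() can be sketched as follows. slice_cutoff and load_first_matching are hypothetical names. The real load_model() tries three concrete model classes, whereas this sketch accepts any sequence of loader callables.

```python
from collections.abc import Callable, Sequence
from datetime import date
from typing import TypeVar

T = TypeVar("T")


def slice_cutoff(fold_label: str, feature_end_year: int) -> date:
    """Training cutoff: Jan 1 of the test year, or of the year after the data ends."""
    if fold_label == "production":
        return date(feature_end_year + 1, 1, 1)
    return date(int(fold_label), 1, 1)


def load_first_matching(path: str, loaders: Sequence[Callable[[str], T]]) -> T:
    """Try each loader in order; fail loudly if none can read the file."""
    failures: list[str] = []
    for loader in loaders:
        try:
            return loader(path)
        except Exception as exc:  # a loader signals "not my format" by raising
            failures.append(f"{getattr(loader, '__name__', repr(loader))}: {exc}")
    raise RuntimeError(f"no loader could read {path}: {failures}")
```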

Persistence:
- Models: <run_dir>/models/{commodity}/{fold_label}/
- Predictions: <run_dir>/preds/{commodity}/{fold_label}/
- Bias corrector: <run_dir>/postprocessed/{commodity}/{fold_label}/bias_corrector.pkl
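
Because the layout is canonical, every artefact path is a pure function of (run_dir, commodity, fold_label). The sketch below assumes commodity is available alongside the slice's identity fields; the real slice may obtain it from the config instead. SlicePathsSketch is hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SlicePathsSketch:
    """Hypothetical path helper; frozen, so identity fields cannot be reassigned."""

    run_dir: Path
    commodity: str
    fold_label: str

    def _under(self, area: str) -> Path:
        return self.run_dir / area / self.commodity / self.fold_label

    @property
    def models_dir(self) -> Path:
        return self._under("models")

    @property
    def preds_dir(self) -> Path:
        return self._under("preds")

    @property
    def bias_corrector_path(self) -> Path:
        # Always the canonical location, whether or not the file exists.
        return self._under("postprocessed") / "bias_corrector.pkl"

    @property
    def has_bias_corrector(self) -> bool:
        return self.bias_corrector_path.is_file()
```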

Source: lib/results/results_slice.py:112

ForecastSlice (root within ExperimentResult)

Children: Per-init artefacts under run_dir/forecast/{season_year}/{init_date}/: indices.zarr (spliced obs+climo), features/pred.parquet, preds/walk_forward_preds.parquet, preds/year_data.parquet, postprocessed/national.parquet, and delivery/Treefera_{commodity}_{ADM}_Forecast_{init_date}.csv. Trained artefacts (model, detrender, fill values) are not owned by the slice. They are delegated to the production HindcastSlice via self.training.

Consistency rule: The slice is frozen (frozen=True). Its identity is (run_dir, commodity, season_year, init_date). __post_init__ validates that run_dir is an existing directory. lib/results/results_slice.py:317

Key invariants:

self.training resolves by calling production_hindcast_slice(self.run_dir, self.commodity) and raises FileNotFoundError if the production detrender is absent. The forecast pipeline cannot proceed without a production model. lib/results/results_slice.py:410
Forecast artefacts live under their own root subtree and never collide with canonical hindcast artefacts. run_features writes the per-init pred.parquet to features_dir/ (inside the slice root), not to the canonical cfg.features_dir/{commodity}/pred.parquet. lib/results/results_slice.py:340
cutoff returns self.init_date, so both slice types expose cutoff as a date. lib/results/results_slice.py:382
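
The sketch below covers the ForecastSlice identity and the training lookup. It assumes the production detrender is stored as models/{commodity}/production/detrender.pkl and that init_date directories use ISO format. The exception raised in __post_init__ and the return value of training are simplifications: the real property returns the production HindcastSlice rather than a directory. ForecastSliceSketch is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class ForecastSliceSketch:
    """Hypothetical stand-in; identity is (run_dir, commodity, season_year, init_date)."""

    run_dir: Path
    commodity: str
    season_year: int
    init_date: date

    def __post_init__(self) -> None:
        # Fail at construction rather than on first artefact access.
        if not self.run_dir.is_dir():
            raise NotADirectoryError(f"run_dir is not a directory: {self.run_dir}")

    @property
    def root(self) -> Path:
        # Isolated per-init subtree; never overlaps canonical hindcast paths.
        return self.run_dir / "forecast" / str(self.season_year) / self.init_date.isoformat()

    @property
    def cutoff(self) -> date:
        return self.init_date

    @property
    def training(self) -> Path:
        # Trained artefacts are borrowed from the production fold, not owned.
        production = self.run_dir / "models" / self.commodity / "production"
        detrender = production / "detrender.pkl"
        if not detrender.is_file():
            raise FileNotFoundError(f"no production detrender at {detrender}")
        return production
```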

Persistence: <run_dir>/forecast/{season_year}/{init_date}/ — fully isolated subtree. lib/results/results_slice.py:323

Source: lib/results/results_slice.py:299

HindcastDelivery (root)

Children: list[DeliveryRow] plus the scalar generated_date field. Each DeliveryRow is a frozen pydantic model (value object) carrying identity (commodity, year, init_date, geo_identifier, variable, model), the predicted mean, optional benchmark fields (nass_actual, wasde_in_season, etc.), and conformal interval band pairs.

Consistency rule: Validated atomically by pydantic validators on construction:

generated_date must be ISO YYYY-MM-DD. delivery/schemas.py:239
No duplicate (year, init_date, geo_identifier) tuples across all rows. delivery/schemas.py:249
Every (year, geo_identifier) group must have the same number of init_date entries (fold consistency). delivery/schemas.py:223 (DOMAIN_MODEL2.md §4.7 citation)
Per-row DeliveryRow invariants: CI ordering lower_95 ≤ … ≤ mean ≤ … ≤ upper_95 (delivery/schemas.py:176); init_date within [year − LONG_RANGE_HORIZON_YEARS, year + 1] (delivery/schemas.py:197); init_date is ISO YYYY-MM-DD (delivery/schemas.py:164).
extra="forbid" on DeliveryRow ensures any undeclared benchmark column (e.g. a mis-named ReferenceYieldSpec) produces a loud ValidationError rather than silent data loss. delivery/schemas.py:126
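
The delivery invariants reduce to a few set and ordering checks. What follows is a stdlib sketch, not the pydantic schema, and it makes three assumptions. RowSketch flattens the CI band pairs into a single ordered_bounds tuple, with lower_95 first and upper_95 last. HORIZON_YEARS is a placeholder for LONG_RANGE_HORIZON_YEARS. Comparing init_date's year against the window bounds is a guess at how the window is applied. RowSketch and check_delivery are hypothetical names.

```python
import re
from collections import Counter
from collections.abc import Sequence
from dataclasses import dataclass
from datetime import date

_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
HORIZON_YEARS = 1  # placeholder for LONG_RANGE_HORIZON_YEARS; real value not given here


@dataclass(frozen=True)
class RowSketch:
    """Hypothetical flattening of DeliveryRow: only the fields the checks need."""

    year: int
    init_date: str
    geo_identifier: str
    ordered_bounds: tuple[float, ...]  # lower_95, ..., mean, ..., upper_95

    def __post_init__(self) -> None:
        # Strict YYYY-MM-DD: the regex rejects forms such as "20200501" that newer
        # date.fromisoformat() versions accept; fromisoformat checks the calendar.
        if not _ISO_DATE.fullmatch(self.init_date):
            raise ValueError(f"init_date not YYYY-MM-DD: {self.init_date!r}")
        init_year = date.fromisoformat(self.init_date).year
        if not self.year - HORIZON_YEARS <= init_year <= self.year + 1:
            raise ValueError(f"init_date {self.init_date} outside window for {self.year}")
        b = self.ordered_bounds
        if any(lo > hi for lo, hi in zip(b, b[1:])):
            raise ValueError(f"CI bounds out of order: {b}")


def check_delivery(rows: Sequence[RowSketch]) -> None:
    keys = Counter((r.year, r.init_date, r.geo_identifier) for r in rows)
    dupes = [k for k, n in keys.items() if n > 1]
    if dupes:
        raise ValueError(f"duplicate (year, init_date, geo_identifier): {dupes}")
    # With duplicates excluded, rows per (year, geo) equals init_dates per group.
    per_group = Counter((r.year, r.geo_identifier) for r in rows)
    if len(set(per_group.values())) > 1:
        raise ValueError(f"fold inconsistency: init_date counts {dict(per_group)}")
```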

Persistence: One CSV per ADM level per run — delivery/Treefera_{commodity}_{ADM0|ADM1|ADM2}_Hindcast_{YYYYMMDD}.csv for hindcast; forecast/{season_year}/{init_date}/delivery/Treefera_{commodity}_{level}_Forecast_{init_date}.csv for forecast. stages/run_hindcast.py:224, stages/run_forecast.py:398
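
The delivery file names follow directly from the persistence paths above. This page does not say which date fills the hindcast {YYYYMMDD} stamp, so the sketch takes it as a parameter. init_date is passed through as a string, assumed to be already in the form used for forecast directory names. Both helpers are hypothetical.

```python
from datetime import date

ADM_LEVELS = ("ADM0", "ADM1", "ADM2")


def _check_level(level: str) -> None:
    if level not in ADM_LEVELS:
        raise ValueError(f"unknown ADM level: {level!r}")


def hindcast_delivery_relpath(commodity: str, level: str, stamp: date) -> str:
    """Path relative to run_dir for one hindcast delivery CSV."""
    _check_level(level)
    return f"delivery/Treefera_{commodity}_{level}_Hindcast_{stamp:%Y%m%d}.csv"


def forecast_delivery_relpath(commodity: str, level: str, season_year: int, init_date: str) -> str:
    """Path relative to run_dir for one forecast delivery CSV."""
    _check_level(level)
    return (
        f"forecast/{season_year}/{init_date}/delivery/"
        f"Treefera_{commodity}_{level}_Forecast_{init_date}.csv"
    )
```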

Source: delivery/schemas.py:109,227

ExperimentProtocolConfig + Fold schedule (root)

Children: cv_strategy (string), test_years (list of ints), production_cumulative_threshold (float), production_recent_years (int). The ExpandingFoldGenerator is instantiated from this config; each generated fold is a transient value, not a persisted entity. FoldSchedule in app/_dashboard_config.py is a derived read-only view of this schedule for dashboard display only.

Consistency rule: test_years must be a non-empty ordered list; the walk-forward generator yields them in sorted order. cv_strategy is currently always "expanding" — the field is declared to allow future extension without a schema change. production_cumulative_threshold is in (0, 1]. config.py:456
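
An expanding-window schedule trains each fold on every year strictly before its test year. This matches the numeric-fold cutoff date(int(fold_label), 1, 1). The sketch below is not the real ExpandingFoldGenerator: expanding_folds, FoldSketch, and validate_protocol are hypothetical names.

```python
from collections.abc import Iterable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldSketch:
    label: str
    train_years: tuple[int, ...]
    test_year: int


def validate_protocol(cv_strategy: str, test_years: Sequence[int], threshold: float) -> None:
    if cv_strategy != "expanding":
        raise ValueError(f"unsupported cv_strategy: {cv_strategy!r}")
    if not test_years:
        raise ValueError("test_years must be non-empty")
    if not 0 < threshold <= 1:
        raise ValueError("production_cumulative_threshold must be in (0, 1]")


def expanding_folds(available_years: Iterable[int], test_years: Iterable[int]) -> tuple[FoldSketch, ...]:
    years = sorted(set(available_years))
    return tuple(
        # Train on every year before the test year: cutoff is Jan 1 of test_year.
        FoldSketch(str(t), tuple(y for y in years if y < t), t)
        for t in sorted(set(test_years))
    )
```

Folds are cheap to regenerate from the config, which is consistent with treating each fold as a transient value rather than a persisted entity.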

Persistence: Embedded in ExperimentConfig; persisted only as part of config_resolved.yaml.

Source: config.py:456, run/runner.py:27

Check list (aggregate, per preflight call)

Children: list[Check] — each Check is a frozen dataclass with (name, passed, message, critical). The list is produced by a per-stage factory function (preflight_paths_for_hindcast, preflight_paths_for_forecast_features, etc.) and consumed atomically by run_preflight().

Consistency rule: run_preflight() iterates the list; non-critical failures log a WARNING; the first critical failure logs an ERROR and raises SystemExit. The check list itself is immutable once returned from the factory. run/preflight.py:42
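
What follows is a minimal sketch of the preflight gate, assuming the Check fields listed above. path_checks is a hypothetical factory in the spirit of preflight_paths_for_hindcast(), run_preflight_sketch is a hypothetical stand-in for run_preflight(), and the exit status passed to SystemExit is an assumption. Returning a tuple from the factory mirrors the rule that the check list is immutable once built.

```python
import logging
from collections.abc import Mapping, Sequence
from dataclasses import dataclass
from pathlib import Path

logger = logging.getLogger("preflight_sketch")


@dataclass(frozen=True)
class Check:
    name: str
    passed: bool
    message: str
    critical: bool


def path_checks(paths: Mapping[str, Path], critical: bool = True) -> tuple[Check, ...]:
    """Hypothetical factory: one existence check per named path."""
    return tuple(
        Check(name, path.exists(), f"{name} not found at {path}", critical)
        for name, path in paths.items()
    )


def run_preflight_sketch(checks: Sequence[Check]) -> None:
    for check in checks:
        if check.passed:
            continue
        if not check.critical:
            logger.warning("preflight: %s", check.message)
            continue
        # First critical failure ends the stage; no retry, no partial run.
        logger.error("preflight: %s", check.message)
        raise SystemExit(1)
```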

Key invariant: a critical check failure aborts the entire stage — there is no partial recovery or soft retry. The pipeline is a linear DAG with no state machine; stages are idempotent and re-runnable from scratch after fixing the failing condition.

Persistence: Not persisted. Check results are ephemeral gate outputs; the MLflow run records whether the overall run succeeded.

Source: run/preflight.py:20,42

Cross-aggregate references

Aggregates reference each other by ID (path, fold label, or commodity name) rather than by direct object nesting. The one exception is ExperimentResult, which embeds its resolved ExperimentConfig. The table below lists every cross-boundary reference.

From	To	Reference type	Notes
`ExperimentResult`	`ExperimentConfig`	Direct embed (config loaded from `config_resolved.yaml`)	The config is read-only once loaded; it is carried as a field on ExperimentResult. `lib/results/run_result.py:34`
`HindcastSlice`	`ExperimentConfig`	By path — `_load_config(run_dir)` (cached)	Slices do not store config; they load it lazily from `run_dir/config_resolved.yaml`. `lib/results/results_slice.py:49`
`ForecastSlice`	`HindcastSlice["production"]`	By `run_dir` + `commodity` string — `production_hindcast_slice()`	ForecastSlice reaches trained artefacts via the production HindcastSlice without holding a reference to ExperimentResult. `lib/results/results_slice.py:410`
`ForecastSlice`	`ExperimentConfig`	By path — same `_load_config(run_dir)` helper	`features_fit_path` and `features_pred_path` resolve via the cached config. `lib/results/results_slice.py:388`
`HindcastDelivery`	`ExperimentResult`	By parameter: callers pass the ExperimentResult to conversion helpers	`walk_forward_preds_to_delivery_rows(experiment, ...)` takes the ExperimentResult as a parameter; HindcastDelivery itself carries no run_dir. `stages/run_hindcast.py:224`
`Check` list	`ExperimentConfig`	Factory functions consume ExperimentConfig to build Check lists	Preflight is cross-cutting; it touches the Config aggregate to verify ResolvablePath fields. `run/preflight.py:59`
MLflow Run	`ExperimentResult`	By `run_dir` tag	One MLflow run per pipeline invocation; `run_dir` is stored as a tag and as a logged artefact. `domain-modelling/DOMAIN_MODEL2.md §2`
`FoldSchedule` (dashboard)	`CommodityConfig`	By commodity name string — `build_fold_schedule(commodity)`	Dashboard-only; no pipeline stage depends on FoldSchedule. `app/_dashboard_config.py:262`

Aggregate invariant summary

The table below gives a compact per-aggregate view of each key invariant, the lifecycle event that establishes it, and the code that enforces it.

Aggregate	Key invariant	Established by	Enforcer
ExperimentConfig	All ResolvablePaths resolve under data_root	Config load	`model_validator _resolve_data_paths` `config.py:777`
ExperimentConfig	experiment_name is slug-safe	Config load	`field_validator _validate_experiment_name` `config.py:627`
ExperimentConfig	ci_levels ∈ SUPPORTED_CI_LEVELS ∩ (0,1)	Config load	`field_validator _validate_ci_levels` `config.py:433`
ExperimentConfig	reference_data names are unique	Config load	`model_validator _reference_data_names_unique` `config.py:793`
ExperimentConfig	use_sample_weights=True only with ADM2 fit level	Config load	`model_validator _validate_sample_weight_usage` `config.py:492`
ExperimentResult	production slice exists before forecasts	Forecast run	`ForecastSlice.training` raises `FileNotFoundError` `results_slice.py:418`
ExperimentResult	included_geo_identifiers written before PREDICT	FIT stage	`_persist_included()` called in `run_hindcast.run()` `stages/run_hindcast.py:218`
HindcastSlice	All fold artefacts present or treated as missing	Per fold write	Atomic overwrite; no partial checkpoint semantics
HindcastDelivery	No duplicate (year, init_date, geo_identifier)	Delivery construction	`model_validator _validate_no_duplicate_keys` `delivery/schemas.py:249`
HindcastDelivery	Fold consistency: equal init_date count per (year, geo)	Delivery construction	`model_validator` `delivery/schemas.py:223`
DeliveryRow	CI ordering lower_95 ≤ … ≤ mean ≤ … ≤ upper_95	Row construction	`model_validator _validate_ci_ordering` `delivery/schemas.py:176`
DeliveryRow	init_date within [year − LONG_RANGE_HORIZON_YEARS, year + 1]	Row construction	`model_validator _validate_init_date_year` `delivery/schemas.py:197`
Check list	Critical failure aborts the stage via SystemExit	Preflight execution	`run_preflight()` `run/preflight.py:42`

Aggregate boundary decisions

The following decisions explain why the aggregate boundaries were drawn where they are. These are editorial decisions captured from DOMAIN_MODEL.md and DOMAIN_MODEL2.md.

Why ExperimentConfig is one aggregate, not seven

Config validation is an atomic pre-computation gate. All seven nested config objects must be internally consistent before any stage can start — for example, use_sample_weights on ModelConfig depends on weather_correction_fit_level, and every ResolvablePath anywhere in the tree must resolve under the shared data_root. Splitting the config into separate aggregates would require cross-aggregate validators, which is more complex than a single pydantic model_validator(mode="after") walking the whole tree.
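
A single tree-walking validator can be sketched with plain dataclasses. The sketch makes three assumptions. ResolvablePath is modelled as a hypothetical str subclass that marks resolvable fields. The tree consists of dataclasses, lists, and tuples. Paths are POSIX-style. The real code does this inside model_validator(mode="after") via _iter_resolvable_fields(); resolve_one and resolve_tree are hypothetical names.

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from pathlib import Path
from typing import Any


class ResolvablePath(str):
    """Hypothetical marker: a str subclass flagging fields to resolve."""


def resolve_one(value: str, data_root: Path) -> str:
    # S3 URIs and absolute paths pass through; relative paths anchor at data_root.
    if value.startswith("s3://") or Path(value).is_absolute():
        return value
    return str(data_root / value)


def resolve_tree(node: Any, data_root: Path) -> Any:
    """Return a copy of node with every ResolvablePath resolved, at any depth."""
    if isinstance(node, ResolvablePath):
        return ResolvablePath(resolve_one(node, data_root))
    if isinstance(node, (list, tuple)):
        return type(node)(resolve_tree(item, data_root) for item in node)
    if is_dataclass(node) and not isinstance(node, type):
        updates = {
            f.name: resolve_tree(getattr(node, f.name), data_root)
            for f in fields(node)
            if f.init  # replace() only accepts init fields
        }
        return replace(node, **updates)
    return node
```

Returning a new tree rather than mutating in place keeps the sketch compatible with frozen models. The original tree is left untouched.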

Why ExperimentResult is a lazy handle, not a data container

Stage isolation is the core architectural principle: no in-memory objects cross phase boundaries. If ExperimentResult carried computed data (e.g., loaded DataFrames), a stage restart would require re-executing the prior stage to rebuild the in-memory state rather than simply re-reading from disk. Making it a lazy handle (a frozen dataclass of paths and loaders) means any stage can reconstruct the full domain context from a run_dir string alone. This is documented explicitly in lib/results/run_result.py:31.

Why HindcastSlice and ForecastSlice are children of ExperimentResult, not independent roots

Slices are not independently addressable without their run_dir, because the run_dir is the primary identity. A HindcastSlice for fold "2020" in run A is a completely different object from one in run B, even though the fold labels match. Modelling slices as children of ExperimentResult (the run_dir root) expresses this correctly: the lifecycle of a slice is tied to its run_dir. ExperimentResult.from_run_dir() is the entry point for building the slice collection. Every slice, including one built directly by production_hindcast_slice(), requires a run_dir.

Why HindcastDelivery is a separate aggregate from ExperimentResult

Delivery is the client contract. Its invariants (CI ordering, fold consistency, ISO date formats) are schema-level claims auditable by clients and QA scripts. Keeping delivery validation separate from the pipeline's internal representation means:

Delivery validators can be run against re-exported CSVs without loading the full pipeline context.
ExperimentResult invariants (slice completeness, production model existence) and delivery invariants (CI ordering, fold consistency) have different owners and different failure modes.
One ExperimentResult produces three HindcastDelivery instances (one per ADM level) — the 1:N relationship is cleaner when delivery is its own aggregate.

Why the Check list is an aggregate per preflight call, not a global singleton

Preflight checks are stage-specific: the hindcast and forecast stages validate different paths and different required artefacts. The list is assembled fresh by a stage-specific factory function, consumed atomically by run_preflight(), and then discarded. There is no persistent check state between stages — this is intentional because each stage is independently re-runnable.

Notes on the canonical entity list

The canonical entity list provided for this phase uses the following names, which are worth clarifying against the code:

Commodity — not a domain class; it is the commodity string field on CommodityConfig. The entity with commodity-specific behaviour is CommodityConfig.
SeasonYear — not a class; it is an int field on ForecastSlice and in feature parquets. The temporal concept is documented in the ubiquitous language glossary.
InitDate — not a class; it is a date field on ForecastSlice and in feature parquets.
Region — not a class; it is a NewType("GeoIdentifier", str) alias. No aggregate boundary applies.
Yield — not a class; it is a float column (yield_kg_ha, sim_yield_kg_ha, etc.) in parquet files.
Fold — not a persistent class; it is the transient output of ExpandingFoldGenerator.generate_folds(). The persistent artefact for a fold is HindcastSlice.
RunDir — not a domain class; it is a Path field on ExperimentResult and on every slice. The aggregate that owns the run directory is ExperimentResult.
CalibrationResult — a persisted aggregate sidecar, not a transient value. It is a @dataclass(frozen=True) in models/meta_models/conformalise.py:111 with first-class save (conformalise.py:215) and load (conformalise.py:225) methods backed by long-format parquet. The four residual modes each produce a distinct per-mode sidecar at <run_dir>/conformal/{mode}.parquet (e.g. conformal/hindcast_oos_per_init_date.parquet). The wiki/sources/code/meta_models.md page provides full detail on serialisation.
ADM0Row / ADM1Row / ADM2Row — these are not separate classes; delivery is at the ADM level implied by the geo_identifier value on each DeliveryRow. All three levels use the same DeliveryRow schema.