Aggregates — commodity_hindcast¶
What an aggregate is here¶
An aggregate is a cluster of entities and value objects treated as a single unit for the purposes of change, validation, and persistence. The aggregate root is the only externally-addressable entry point; child entities and value objects are accessed exclusively through it and do not exist independently outside the aggregate boundary. Invariants that span multiple child objects are enforced on the root. In commodity_hindcast, "persistence" means the on-disk artefact tree under run_dir/ or the YAML config file — no relational database is involved for primary domain state.
Identified aggregates¶
ExperimentConfig (root)¶
Children: CommodityConfig, ModelConfig, ExperimentProtocolConfig, PostprocessConfig (→ BiasCorrectorConfig), DeliveryConfig, ForecastConfig (optional), list[ReferenceYieldSpec] (each is a _ReferenceYieldSpecBase subclass), and transitively the Builder discriminated union + SeasonWindow list held by CommodityConfig, EditRuleConfig list held by YieldsBuilder, and AssembleStressConfig (a nested Pydantic model on StressBuilder, config.py:203) governing regeneration of the stress parquet from a raw indices zarr. Note: EvaluationConfig is mentioned in the in-package DOMAIN_MODEL.md and DOMAIN_MODEL2.md as a conceptual child, and appears as an example in DESIGN.md:11, but no EvaluationConfig class exists in config.py — evaluation scoring is handled by parameters within ExperimentProtocolConfig and the diagnostics layer directly.
Consistency rule: The entire config is loaded and validated atomically by pydantic-settings and pydantic model validators before any stage compute begins. Partial or invalid configs are rejected at load time. Key cross-field invariants:
- experiment_name matches [a-zA-Z0-9_-]+ (config.py:629).
- Every ResolvablePath field anywhere in the nested tree is resolved against data_root by _iter_resolvable_fields() inside a model_validator(mode="after") (config.py:785). Relative paths anchor at data_root; S3 URIs and absolute paths pass through.
- model.use_sample_weights=True requires model.weather_correction_fit_level="ADM2" (config.py:492).
- forecast.init_date must be set when the forecast pipeline is invoked; None signals hindcast mode (config.py:666).
- reference_data entries must have unique name values (config.py:793).
- delivery.ci_levels must be a subset of SUPPORTED_CI_LEVELS and all values in (0, 1) (config.py:432).
- postprocess.conformalise must be non-empty and contain no duplicates (config.py:543).
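As a rough illustration of how several of these invariants can be rejected together at construction time, the sketch below uses a plain frozen dataclass rather than pydantic so that it runs without dependencies. ConfigSketch, its flattened field layout, and the contents of SUPPORTED_CI_LEVELS are assumptions made for the example; the real checks are pydantic validators in config.py.

```python
import re
from dataclasses import dataclass

# Assumed values for illustration only; the real set is defined in config.py.
SUPPORTED_CI_LEVELS = frozenset({0.5, 0.8, 0.9, 0.95})
_NAME_RE = re.compile(r"[a-zA-Z0-9_-]+")


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in for ExperimentConfig; every check runs at construction."""

    experiment_name: str
    use_sample_weights: bool = False
    weather_correction_fit_level: str = "ADM2"
    reference_names: tuple[str, ...] = ()
    ci_levels: tuple[float, ...] = (0.8, 0.95)

    def __post_init__(self) -> None:
        # Slug-safe experiment name.
        if not _NAME_RE.fullmatch(self.experiment_name):
            raise ValueError(f"experiment_name {self.experiment_name!r} is not slug-safe")
        # Cross-field rule: sample weights only make sense at the ADM2 fit level.
        if self.use_sample_weights and self.weather_correction_fit_level != "ADM2":
            raise ValueError("use_sample_weights=True requires fit level 'ADM2'")
        # Reference datasets are addressed by name, so names must be unique.
        if len(set(self.reference_names)) != len(self.reference_names):
            raise ValueError("reference_data names must be unique")
        # Every CI level must be supported and lie strictly inside (0, 1).
        bad = [c for c in self.ci_levels if c not in SUPPORTED_CI_LEVELS or not 0 < c < 1]
        if bad:
            raise ValueError(f"unsupported ci_levels: {bad}")
```

Because every rule fires in __post_init__, no partially valid instance can exist, which is the same property the atomic load gives the real config.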
Persistence: Written as <run_dir>/config_resolved.yaml immediately after the run root is created; loaded lazily and cached once per run_dir per process by _load_config() in lib/results/results_slice.py:49.
Source: config.py:573, lib/results/results_slice.py:49
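The once-per-run_dir caching described under Persistence could look roughly like the sketch below. It assumes a simple per-process memo built on functools.lru_cache, which is not confirmed by the source, and load_config_text is a hypothetical stand-in that returns raw text instead of a parsed ExperimentConfig so that no YAML parser is needed.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_config_text(run_dir: Path) -> str:
    """Hypothetical stand-in for _load_config(): read the resolved config once per run_dir."""
    # The real helper parses the YAML into ExperimentConfig; returning the raw
    # text keeps this sketch free of third-party dependencies.
    return (run_dir / "config_resolved.yaml").read_text(encoding="utf-8")
```

A consequence of this design is that edits made to config_resolved.yaml after the first load are not seen by the same process.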
ExperimentResult (root)¶
Children: tuple[HindcastSlice, ...] (one per numeric fold plus the production fold), tuple[ForecastSlice, ...] (one per (season_year, init_date) pair). Also holds a reference back to ExperimentConfig (the resolved config loaded from config_resolved.yaml).
Consistency rule: ExperimentResult is a frozen dataclass (lib/results/run_result.py:30). It is a lazy handle — it does not carry computed data in memory; the disk is the contract. Callers see whatever artefacts are currently present on disk. Valid states:
| hindcast_slices | forecast_slices | State |
|---|---|---|
| empty | empty | Fresh run_dir — features built, no modelling yet. |
| non-empty | empty | Hindcast complete, no forecasts issued. |
| non-empty | non-empty | Full: production model trained and forecasts issued. |
Key invariants:
- If forecast_slices is non-empty, a production HindcastSlice must exist under run_dir/models/{commodity}/production/ (enforced by ForecastSlice.training raising FileNotFoundError if the detrender is absent). lib/results/results_slice.py:418
- from_run_dir() discovers slices from preds/{commodity}/{fold_label}/train_preds.parquet (numeric labels only; production is accessed via the production property). lib/results/run_result.py:66
- Forecast slices are discovered from forecast/{season_year}/{init_date}/preds/walk_forward_preds.parquet. lib/results/run_result.py:96
- included_geo_identifiers.txt is written by save_included_geo_identifiers() and read by load_included_geo_identifiers(); the FIT phase must write it before PREDICT runs. lib/results/run_result.py:157,164
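The discovery rules above can be sketched with pathlib alone. RunHandle is a hypothetical, reduced stand-in for ExperimentResult: it keeps only fold labels and forecast keys and takes the commodity as an explicit argument, which may not match the real from_run_dir() signature.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunHandle:
    """Hypothetical lazy handle: stores labels and paths, never loaded data."""

    run_dir: Path
    commodity: str
    hindcast_folds: tuple[str, ...]
    forecast_keys: tuple[tuple[str, str], ...]  # (season_year, init_date) directory names

    @classmethod
    def from_run_dir(cls, run_dir: Path, commodity: str) -> "RunHandle":
        preds = run_dir / "preds" / commodity
        # Numeric fold labels only; the production fold is reached separately.
        folds = sorted(
            p.parent.name
            for p in preds.glob("*/train_preds.parquet")
            if p.parent.name.isdigit()
        )
        # parents[1] is the init_date directory, parents[2] the season_year directory.
        forecasts = sorted(
            (p.parents[2].name, p.parents[1].name)
            for p in (run_dir / "forecast").glob("*/*/preds/walk_forward_preds.parquet")
        )
        return cls(run_dir, commodity, tuple(folds), tuple(forecasts))
```

Calling from_run_dir() again after another stage has written artefacts returns a handle that reflects the new disk state, which is what "the disk is the contract" means in practice.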
Persistence: Filesystem directory rooted at <run_dir_base>/<timestamp>_<experiment_name>/, e.g. runs/20260505_143022_corn_v1/. stages/run_hindcast.py:83
Source: lib/results/run_result.py:30, stages/run_hindcast.py:77
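For reference, the run directory name in the example above can be reproduced as follows. The helper name make_run_dir_path is hypothetical, and the timestamp format is inferred from the single example runs/20260505_143022_corn_v1 rather than read from stages/run_hindcast.py.

```python
from datetime import datetime
from pathlib import Path


def make_run_dir_path(base: Path, experiment_name: str, now: datetime) -> Path:
    """Build <run_dir_base>/<timestamp>_<experiment_name> (format assumed: YYYYMMDD_HHMMSS)."""
    return base / f"{now:%Y%m%d_%H%M%S}_{experiment_name}"
```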
HindcastSlice (root within ExperimentResult)¶
Children: Lazy-loaded artefacts under the slice's own on-disk subtree — detrender.pkl, model.{ridge|pca_ridge|xgboost}, feature_fill_values.parquet, train_preds.parquet, walk_forward_preds.parquet, year_data.parquet, and optional bias_corrector.pkl. Also exposes shared run-level feature paths (features_fit_path, features_pred_path) resolved via _load_config(run_dir).
Consistency rule: All slice artefacts are either fully present (fold is complete) or treated as not-yet-computed. No partial artefact state is meaningful; stages overwrite each fold in full on re-run. has_bias_corrector is the only conditional check — bias correction is optional depending on BiasCorrectorConfig.kind. The slice is frozen (frozen=True) so its identity fields (run_dir, fold_label) cannot change after construction. lib/results/results_slice.py:112
Key invariants:
- cutoff for a numeric fold is date(int(fold_label), 1, 1). For the production fold it is date(feature_end_year + 1, 1, 1). lib/results/results_slice.py:151
- load_model() tries three concrete classes in sequence and raises RuntimeError if none succeeds, rather than silently returning a wrong type. lib/results/results_slice.py:182
- The bias-corrector path is always canonical (postprocessed/{commodity}/{fold_label}/bias_corrector.pkl); loading is mediated by has_bias_corrector. lib/results/results_slice.py:41,173
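The first two invariants are small enough to sketch directly. fold_cutoff and load_first are hypothetical helper names; the sketch assumes the production fold is labelled "production" (as in the on-disk paths) and takes the candidate loaders as plain callables instead of the three concrete model classes.

```python
from datetime import date
from typing import Callable, Sequence


def fold_cutoff(fold_label: str, feature_end_year: int) -> date:
    """Cutoff date for a fold; the production fold uses the year after the last feature year."""
    if fold_label == "production":
        return date(feature_end_year + 1, 1, 1)
    return date(int(fold_label), 1, 1)


def load_first(path: str, loaders: Sequence[Callable[[str], object]]) -> object:
    """Try each loader in order; raise RuntimeError if every one fails."""
    errors = []
    for loader in loaders:
        try:
            return loader(path)
        except Exception as exc:  # a failed attempt is expected; move on to the next loader
            errors.append(f"{getattr(loader, '__name__', loader)}: {exc}")
    raise RuntimeError(f"no loader could read {path}: {errors}")
```

Raising when every loader fails keeps a corrupt or unknown model file from being mistaken for a valid model of the wrong type.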
Persistence:
- Models: <run_dir>/models/{commodity}/{fold_label}/
- Predictions: <run_dir>/preds/{commodity}/{fold_label}/
- Bias corrector: <run_dir>/postprocessed/{commodity}/{fold_label}/bias_corrector.pkl
Source: lib/results/results_slice.py:112
ForecastSlice (root within ExperimentResult)¶
Children: Per-init artefacts under run_dir/forecast/{season_year}/{init_date}/ — indices.zarr (spliced obs+climo), features/pred.parquet, preds/walk_forward_preds.parquet, preds/year_data.parquet, postprocessed/national.parquet, delivery/{ADM}_Forecast_{init_date}.csv. Trained artefacts (model, detrender, fill values) are not owned — they are delegated to the production HindcastSlice via self.training.
Consistency rule: The slice is frozen (frozen=True). Its identity is (run_dir, commodity, season_year, init_date). __post_init__ validates that run_dir is an existing directory. lib/results/results_slice.py:317
Key invariants:
- self.training resolves by calling production_hindcast_slice(self.run_dir, self.commodity) and raises FileNotFoundError if the production detrender is absent. The forecast pipeline cannot proceed without a production model. lib/results/results_slice.py:410
- Forecast artefacts live under their own root subtree and never collide with canonical hindcast artefacts. run_features writes the per-init pred.parquet to features_dir/ (inside the slice root), not to the canonical cfg.features_dir/{commodity}/pred.parquet. lib/results/results_slice.py:340
- cutoff is self.init_date (typed uniformly with HindcastSlice). lib/results/results_slice.py:382
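A minimal sketch of the identity, isolation, and delegation rules above follows. ForecastSliceSketch is hypothetical; the exception type raised by __post_init__, the ISO formatting of the init_date directory name, and the exact location of detrender.pkl inside the production fold are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class ForecastSliceSketch:
    """Hypothetical stand-in for ForecastSlice; identity fields only."""

    run_dir: Path
    commodity: str
    season_year: int
    init_date: date

    def __post_init__(self) -> None:
        # Assumed exception type; the source only says run_dir is validated here.
        if not self.run_dir.is_dir():
            raise NotADirectoryError(f"run_dir does not exist: {self.run_dir}")

    @property
    def root(self) -> Path:
        # Isolated subtree: never overlaps models/, preds/ or postprocessed/.
        return self.run_dir / "forecast" / str(self.season_year) / self.init_date.isoformat()

    @property
    def cutoff(self) -> date:
        return self.init_date

    @property
    def production_detrender(self) -> Path:
        # Trained artefacts are borrowed from the production hindcast fold, not owned.
        path = self.run_dir / "models" / self.commodity / "production" / "detrender.pkl"
        if not path.exists():
            raise FileNotFoundError(f"production detrender missing: {path}")
        return path
```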
Persistence: <run_dir>/forecast/{season_year}/{init_date}/ — fully isolated subtree. lib/results/results_slice.py:323
Source: lib/results/results_slice.py:299
HindcastDelivery (root)¶
Children: list[DeliveryRow] plus the scalar generated_date field. Each DeliveryRow is a frozen pydantic model (value object) carrying identity (commodity, year, init_date, geo_identifier, variable, model), the predicted mean, optional benchmark fields (nass_actual, wasde_in_season, etc.), and conformal interval band pairs.
Consistency rule: Validated atomically by pydantic validators on construction:
- generated_date must be ISO YYYY-MM-DD. delivery/schemas.py:239
- No duplicate (year, init_date, geo_identifier) tuples across all rows. delivery/schemas.py:249
- Every (year, geo_identifier) group must have the same number of init_date entries (fold consistency). delivery/schemas.py:223 (DOMAIN_MODEL2.md §4.7 citation)
- Per-row DeliveryRow invariants: CI ordering lower_95 ≤ … ≤ mean ≤ … ≤ upper_95 (delivery/schemas.py:176); init_date within [year − LONG_RANGE_HORIZON_YEARS, year + 1] (delivery/schemas.py:197); init_date is ISO YYYY-MM-DD (delivery/schemas.py:164).
- extra="forbid" on DeliveryRow ensures any undeclared benchmark column (e.g. a mis-named ReferenceYieldSpec) produces a loud ValidationError rather than silent data loss. delivery/schemas.py:126
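The three collection-level rules can be illustrated without pydantic. Row is a hypothetical, heavily reduced DeliveryRow with a single interval band, and validate_rows is an illustrative helper, not the real validator set in delivery/schemas.py.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    """Hypothetical reduced DeliveryRow: key fields plus one interval band."""

    year: int
    init_date: str
    geo_identifier: str
    lower_95: float
    mean: float
    upper_95: float


def validate_rows(rows: Iterable[Row]) -> None:
    rows = list(rows)
    # Per-row rule: the interval must bracket the mean.
    for r in rows:
        if not r.lower_95 <= r.mean <= r.upper_95:
            raise ValueError(f"CI ordering violated for {r}")
    # No duplicate (year, init_date, geo_identifier) keys.
    keys = Counter((r.year, r.init_date, r.geo_identifier) for r in rows)
    dupes = [k for k, n in keys.items() if n > 1]
    if dupes:
        raise ValueError(f"duplicate keys: {dupes}")
    # Fold consistency: every (year, geo) group has the same number of init dates.
    per_group = defaultdict(set)
    for r in rows:
        per_group[(r.year, r.geo_identifier)].add(r.init_date)
    if len({len(dates) for dates in per_group.values()}) > 1:
        raise ValueError("groups differ in number of init_date entries")
```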
Persistence: One CSV per ADM level per run — delivery/Treefera_{commodity}_{ADM0|ADM1|ADM2}_Hindcast_{YYYYMMDD}.csv for hindcast; forecast/{season_year}/{init_date}/delivery/Treefera_{commodity}_{level}_Forecast_{init_date}.csv for forecast. stages/run_hindcast.py:224, stages/run_forecast.py:398
Source: delivery/schemas.py:109,227
ExperimentProtocolConfig + Fold schedule (root)¶
Children: cv_strategy (string), test_years (list of ints), production_cumulative_threshold (float), production_recent_years (int). The ExpandingFoldGenerator is instantiated from this config; each generated fold is a transient value, not a persisted entity. FoldSchedule in app/_dashboard_config.py is a derived read-only view of this schedule for dashboard display only.
Consistency rule: test_years must be a non-empty ordered list; the walk-forward generator yields them in sorted order. cv_strategy is currently always "expanding" — the field is declared to allow future extension without a schema change. production_cumulative_threshold is in (0, 1]. config.py:456
Persistence: Embedded in ExperimentConfig; persisted only as part of config_resolved.yaml.
Source: config.py:456, run/runner.py:27
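An expanding walk-forward schedule of the kind described above might be generated as in this sketch. expanding_folds is a hypothetical function, not the ExpandingFoldGenerator API, and the rule that each fold trains on every available year strictly before its test year is an assumption about what "expanding" means here.

```python
from typing import Iterator, Sequence


def expanding_folds(
    all_years: Sequence[int], test_years: Sequence[int]
) -> Iterator[tuple[list[int], int]]:
    """Yield (train_years, test_year) pairs with a growing training window."""
    if not test_years:
        raise ValueError("test_years must be non-empty")
    for test_year in sorted(test_years):
        # Assumption: training uses every available year strictly before the test year.
        train = [y for y in sorted(all_years) if y < test_year]
        yield train, test_year
```

Each yielded pair is a transient value; nothing is persisted until a stage writes the corresponding HindcastSlice artefacts.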
Check list (aggregate, per preflight call)¶
Children: list[Check] — each Check is a frozen dataclass with (name, passed, message, critical). The list is produced by a per-stage factory function (preflight_paths_for_hindcast, preflight_paths_for_forecast_features, etc.) and consumed atomically by run_preflight().
Consistency rule: run_preflight() iterates the list; non-critical failures log a WARNING; the first critical failure logs an ERROR and raises SystemExit. The check list itself is immutable once returned from the factory. run/preflight.py:42
Key invariant: a critical check failure aborts the entire stage — there is no partial recovery or soft retry. The pipeline is a linear DAG with no state machine; stages are idempotent and re-runnable from scratch after fixing the failing condition.
Persistence: Not persisted. Check results are ephemeral gate outputs; the MLflow run records whether the overall run succeeded.
Source: run/preflight.py:20,42
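The gate behaviour described above can be sketched in a few lines. The Check fields follow the tuple given under Children; the logger name and the exit code of 1 are assumptions.

```python
import logging
from dataclasses import dataclass
from typing import Sequence

logger = logging.getLogger("preflight_sketch")  # assumed logger name


@dataclass(frozen=True)
class Check:
    name: str
    passed: bool
    message: str
    critical: bool


def run_preflight(checks: Sequence[Check]) -> None:
    """Warn on non-critical failures; exit on the first critical one."""
    for check in checks:
        if check.passed:
            continue
        if check.critical:
            logger.error("%s: %s", check.name, check.message)
            raise SystemExit(1)  # assumed exit code
        logger.warning("%s: %s", check.name, check.message)
```

Because the function raises on the first critical failure, later checks in the list are never evaluated, so a factory would want to order its checks accordingly.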
Cross-aggregate references¶
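Aggregates generally reference each other by ID (path, fold label, or commodity name) rather than by direct object nesting; the one exception is ExperimentResult, which carries the loaded ExperimentConfig as a field. The table below lists every cross-boundary reference.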
| From | To | Reference type | Notes |
|---|---|---|---|
| ExperimentResult | ExperimentConfig | Direct embed (config loaded from config_resolved.yaml) | The config is read-only once loaded; it is carried as a field on ExperimentResult. lib/results/run_result.py:34 |
| HindcastSlice | ExperimentConfig | By path: _load_config(run_dir) (cached) | Slices do not store config; they load it lazily from run_dir/config_resolved.yaml. lib/results/results_slice.py:49 |
| ForecastSlice | HindcastSlice["production"] | By run_dir + commodity string: production_hindcast_slice() | ForecastSlice reaches trained artefacts via the production HindcastSlice without holding a reference to ExperimentResult. lib/results/results_slice.py:410 |
| ForecastSlice | ExperimentConfig | By path: same _load_config(run_dir) helper | features_fit_path and features_pred_path resolve via the cached config. lib/results/results_slice.py:388 |
| HindcastDelivery | ExperimentResult | By run_dir string: callers pass the result to conversion helpers | walk_forward_preds_to_delivery_rows(experiment, ...) takes the ExperimentResult as a parameter; HindcastDelivery itself carries no run_dir. stages/run_hindcast.py:224 |
| Check list | ExperimentConfig | Factory functions consume ExperimentConfig to build Check lists | Preflight is cross-cutting; it touches the Config aggregate to verify ResolvablePath fields. run/preflight.py:59 |
| MLflow Run | ExperimentResult | By run_dir tag | One MLflow run per pipeline invocation; run_dir is stored as a tag and as a logged artefact. domain-modelling/DOMAIN_MODEL2.md §2 |
| FoldSchedule (dashboard) | CommodityConfig | By commodity name string: build_fold_schedule(commodity) | Dashboard-only; no pipeline stage depends on FoldSchedule. app/_dashboard_config.py:262 |
Aggregate invariant summary¶
The table below gives a compact per-aggregate view of each key invariant, the lifecycle event that establishes it, and the code that enforces it.
| Aggregate | Key invariant | Established by | Enforcer |
|---|---|---|---|
| ExperimentConfig | All ResolvablePaths resolve under data_root | Config load | model_validator _resolve_data_paths config.py:777 |
| ExperimentConfig | experiment_name is slug-safe | Config load | field_validator _validate_experiment_name config.py:627 |
| ExperimentConfig | ci_levels ⊆ SUPPORTED_CI_LEVELS ∩ (0, 1) | Config load | field_validator _validate_ci_levels config.py:433 |
| ExperimentConfig | reference_data names are unique | Config load | model_validator _reference_data_names_unique config.py:793 |
| ExperimentConfig | use_sample_weights=True only with ADM2 fit level | Config load | model_validator _validate_sample_weight_usage config.py:492 |
| ExperimentResult | production slice exists before forecasts | Forecast run | ForecastSlice.training raises FileNotFoundError results_slice.py:418 |
| ExperimentResult | included_geo_identifiers written before PREDICT | FIT stage | _persist_included() called in run_hindcast.run() stages/run_hindcast.py:218 |
| HindcastSlice | All fold artefacts present or treated as missing | Per fold write | Atomic overwrite; no partial checkpoint semantics |
| HindcastDelivery | No duplicate (year, init_date, geo_identifier) | Delivery construction | model_validator _validate_no_duplicate_keys delivery/schemas.py:249 |
| HindcastDelivery | Fold consistency: equal init_date count per (year, geo) | Delivery construction | model_validator delivery/schemas.py:223 |
| DeliveryRow | CI ordering lower_95 ≤ … ≤ mean ≤ … ≤ upper_95 | Row construction | model_validator _validate_ci_ordering delivery/schemas.py:176 |
| DeliveryRow | init_date within [year − LONG_RANGE_HORIZON_YEARS, year + 1] | Row construction | model_validator _validate_init_date_year delivery/schemas.py:197 |
| Check list | Critical failure aborts the stage via SystemExit | Preflight execution | run_preflight() run/preflight.py:42 |
Aggregate boundary decisions¶
The following decisions explain why the aggregate boundaries were drawn where they are. These are editorial decisions captured from DOMAIN_MODEL.md and DOMAIN_MODEL2.md.
Why ExperimentConfig is one aggregate, not seven¶
Config validation is an atomic pre-computation gate. All seven nested config objects must be internally consistent before any stage can start — for example, use_sample_weights on ModelConfig depends on weather_correction_fit_level, and every ResolvablePath anywhere in the tree must resolve under the shared data_root. Splitting the config into separate aggregates would require cross-aggregate validators, which is more complex than a single pydantic model_validator(mode="after") walking the whole tree.
Why ExperimentResult is a lazy handle, not a data container¶
Stage isolation is the core architectural principle: no in-memory objects cross phase boundaries. If ExperimentResult carried computed data (e.g., loaded DataFrames), a stage restart would require re-executing the prior stage to rebuild the in-memory state rather than simply re-reading from disk. Making it a lazy handle (a frozen dataclass of paths and loaders) means any stage can reconstruct the full domain context from a run_dir string alone. This is documented explicitly in lib/results/run_result.py:31.
Why HindcastSlice and ForecastSlice are children of ExperimentResult, not independent roots¶
Slices are not independently addressable without their run_dir — the run_dir IS the primary identity. A HindcastSlice for fold "2020" in run A is a completely different object from one in run B, even though the fold labels match. Modelling them as children of ExperimentResult (the run_dir root) correctly expresses this: the lifecycle of a slice is tied to its run_dir. ExperimentResult.from_run_dir() is the only constructor — you cannot build a valid slice collection without a run_dir.
Why HindcastDelivery is a separate aggregate from ExperimentResult¶
Delivery is the client contract. Its invariants (CI ordering, fold consistency, ISO date formats) are schema-level claims auditable by clients and QA scripts. Keeping delivery validation separate from the pipeline's internal representation means:
- Delivery validators can be run against re-exported CSVs without loading the full pipeline context.
- ExperimentResult invariants (slice completeness, production model existence) and delivery invariants (CI ordering, fold consistency) have different owners and different failure modes.
- One ExperimentResult produces three HindcastDelivery instances (one per ADM level) — the 1:N relationship is cleaner when delivery is its own aggregate.
Why the Check list is an aggregate per preflight call, not a global singleton¶
Preflight checks are stage-specific: the hindcast and forecast stages validate different paths and different required artefacts. The list is assembled fresh by a stage-specific factory function, consumed atomically by run_preflight(), and then discarded. There is no persistent check state between stages — this is intentional because each stage is independently re-runnable.
Notes on the canonical entity list¶
The canonical entity list provided for this phase uses the following names that are worth clarifying against the code:
- Commodity: not a domain class; it is the commodity string field on CommodityConfig. The entity with commodity-specific behaviour is CommodityConfig.
- SeasonYear: not a class; it is an int field on ForecastSlice and in feature parquets. The temporal concept is documented in the ubiquitous language glossary.
- InitDate: not a class; it is a date field on ForecastSlice and in feature parquets.
- Region: not a class; it is a NewType("GeoIdentifier", str) alias. No aggregate boundary applies.
- Yield: not a class; it is a float column (yield_kg_ha, sim_yield_kg_ha, etc.) in parquet files.
- Fold: not a persistent class; it is the transient output of ExpandingFoldGenerator.generate_folds(). The persistent artefact for a fold is HindcastSlice.
- RunDir: not a domain class; it is a Path field on ExperimentResult and on every slice. The aggregate that owns the run directory is ExperimentResult.
- CalibrationResult: a persisted aggregate sidecar, not a transient value. It is a @dataclass(frozen=True) in models/meta_models/conformalise.py:111 with first-class save (conformalise.py:215) and load (conformalise.py:225) methods backed by long-format parquet. The four residual modes each produce a distinct per-mode sidecar at <run_dir>/conformal/{mode}.parquet (e.g. conformal/hindcast_oos_per_init_date.parquet). The wiki/sources/code/meta_models.md page provides full detail on serialisation.
- ADM0Row / ADM1Row / ADM2Row: these are not separate classes; delivery is at the ADM level implied by the geo_identifier value on each DeliveryRow. All three levels use the same DeliveryRow schema.