Entity: ExperimentConfig
Definition
ExperimentConfig is the frozen, validated root configuration object for a single commodity pipeline run. It inherits from pydantic_settings.BaseSettings and is the sole config authority passed to every stage: FEATURES, FIT, POSTPROCESS, EVALUATE, DELIVER, and FORECAST. All subordinate config blocks (CommodityConfig, ModelConfig, etc.) are nested fields. Every ResolvablePath field anywhere in the nested tree is resolved against data_root at construction time, before any stage begins.
Kind
Pydantic BaseSettings subclass: root config aggregate. Frozen after construction by convention rather than by a frozen flag: model_config is a SettingsConfigDict, not {"frozen": True}, so immutability relies on the validator pattern and not on Pydantic rejecting attribute assignment. In practice the config is treated as immutable once written to config_resolved.yaml; _create_run_root still reassigns models_dir and preds_dir in place (see Lifecycle).
Source of truth
market_insights_models/src/commodity_hindcast/config.py:611
YAML loading order
settings_customise_sources at config.py:860 overrides the pydantic-settings default resolution chain. Priority from highest to lowest:
| Priority | Source | Notes |
|---|---|---|
| 1 (highest) | CliSettingsSource | Click CLI flags forwarded via _prepare_config() setting COMMODITY_HINDCAST_CONFIG env var |
| 2 | env_settings | Environment variables; nested delimiter __ (e.g. MODEL__REGRESSION=ridge) |
| 3 | YamlConfigSettingsSource | YAML file resolved by _experiment_config_yaml_path(): env var COMMODITY_HINDCAST_CONFIG if set, otherwise <project_root>/configs/config.yaml |
| 4 (lowest) | init_settings | Constructor keyword arguments; _prepare_config() calls ExperimentConfig() with none, so in effect Pydantic field defaults apply |
dotenv_settings and file_secret_settings are explicitly removed from the chain. cli_parse_args=False (config.py:630) prevents pydantic-settings from consuming sys.argv directly — Click handles CLI flags independently (issue #264).
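The precedence described above can be illustrated with a minimal standard-library sketch. It does not use pydantic-settings and is not the project's implementation: it only models how higher-priority sources could be layered over lower ones and how a `__`-delimited environment variable maps onto a nested field. The helper names (`env_to_nested`, `deep_merge`, `resolve`), the lowercasing of keys, and the sample values `ols`, `lasso` and `detrend` are assumptions made for illustration.

```python
from typing import Any, Mapping

NESTED_DELIMITER = "__"  # delimiter named in the priority table


def env_to_nested(environ: Mapping[str, str]) -> dict[str, Any]:
    """Turn {"MODEL__REGRESSION": "ridge"} into {"model": {"regression": "ridge"}}.

    Assumes the mapping was already filtered to config-related variables.
    """
    nested: dict[str, Any] = {}
    for key, value in environ.items():
        parts = key.lower().split(NESTED_DELIMITER)
        cursor = nested
        for part in parts[:-1]:
            cursor = cursor.setdefault(part, {})
        cursor[parts[-1]] = value
    return nested


def deep_merge(low: Mapping[str, Any], high: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of `low` with `high` layered on top, merging nested dicts."""
    merged = dict(low)
    for key, value in high.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(sources_high_to_low: list[Mapping[str, Any]]) -> dict[str, Any]:
    """Layer sources so that earlier (higher-priority) entries win."""
    result: dict[str, Any] = {}
    for source in reversed(sources_high_to_low):
        result = deep_merge(result, source)
    return result


# Hypothetical values for each layer, lowest priority last.
defaults = {"random_seed": 42, "model": {"regression": "ols", "detrend": "linear"}}
yaml_values = {"experiment_name": "corn_yield_prediction", "model": {"regression": "lasso"}}
env_values = env_to_nested({"MODEL__REGRESSION": "ridge"})

resolved = resolve([env_values, yaml_values, defaults])
```

In this sketch the environment variable overrides only `model.regression`; `model.detrend` and `random_seed` fall through from the defaults layer, and `experiment_name` comes from the YAML layer.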
Key attributes
| Field | Type | Default | Meaning | YAML example |
|---|---|---|---|---|
| data_root | AnyPath | require_input_data_dir() | Base directory; all relative paths anchor here. Read from INPUT_DATA_DIR env var (alias data_root) | n/a (env var only) |
| experiment_name | str | required | Slug-safe identifier used for MLflow experiment name and run_dir naming. Pattern: [a-zA-Z0-9_-]+ | corn_yield_prediction |
| random_seed | int | 42 | Global RNG seed for reproducibility | 42 |
| mlflow_tracking_uri | str | sqlite:///mlruns.db | MLflow tracking backend URI | sqlite:///mlruns.db |
| feature_start_year | int | 1980 | Earliest year for which feature rows are built | 1980 |
| feature_end_year | int | 2025 | Latest year (inclusive) for feature parquets | 2025 |
| check_data_exists | list[str] | [] | Extra preflight paths that must exist before a run | [] |
| raw_dir | AnyPath \| None | data_root / "raw" | Raw data directory; filled by _fill_defaults_from_data_root | n/a |
| features_dir | AnyPath \| None | data_root / "features" | Features parquet directory | n/a |
| models_dir | AnyPath \| None | data_root / "models" | Trained model artefacts directory | n/a |
| preds_dir | AnyPath \| None | data_root / "predictions" | Fold prediction parquets directory | n/a |
| run_dir_base | AnyPath \| None | data_root / "runs" | Root for all timestamped run_dirs | n/a |
| commodity | CommodityConfig | required | Commodity-specific constants (calendar, builders, feature columns) | inline dict or "corn" stem |
| experiment_protocol | ExperimentProtocolConfig | required | Walk-forward CV schedule | inline dict |
| model | ModelConfig | ModelConfig() | Detrend strategy + regression estimator | inline dict |
| reference_data | list[ReferenceYieldSpec] | [] | External benchmark specs (WASDE / CONAB). Each name must be unique | list of dicts |
| postprocess | PostprocessConfig | required | Bias correction + conformal calibration settings | inline dict |
| delivery | DeliveryConfig | required | CI levels, public model name | inline dict |
| forecast | ForecastConfig \| None | None | Forecast-specific paths + residual_mode; None → hindcast mode | inline dict or omit |
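As a rough illustration of two rules in the table, the sketch below validates experiment_name against the quoted pattern and fills unset directory fields from data_root. It is a standard-library approximation, not the project's code: `PathsSketch` and `DERIVED_DIRS` are hypothetical names, `pathlib.Path` stands in for AnyPath, and a dataclass `__post_init__` stands in for the Pydantic validator _fill_defaults_from_data_root.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

SLUG_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")  # pattern quoted in the table

# Field name -> sub-directory under data_root, as listed in the table.
DERIVED_DIRS = {
    "raw_dir": "raw",
    "features_dir": "features",
    "models_dir": "models",
    "preds_dir": "predictions",
    "run_dir_base": "runs",
}


@dataclass
class PathsSketch:
    """Hypothetical stand-in for the path-related fields of ExperimentConfig."""

    data_root: Path
    experiment_name: str
    raw_dir: Optional[Path] = None
    features_dir: Optional[Path] = None
    models_dir: Optional[Path] = None
    preds_dir: Optional[Path] = None
    run_dir_base: Optional[Path] = None

    def __post_init__(self) -> None:
        # Reject names that are not slug-safe.
        if not SLUG_PATTERN.fullmatch(self.experiment_name):
            raise ValueError(f"experiment_name is not slug-safe: {self.experiment_name!r}")
        # Fill only the directories the caller left unset.
        for field_name, subdir in DERIVED_DIRS.items():
            if getattr(self, field_name) is None:
                setattr(self, field_name, self.data_root / subdir)


paths = PathsSketch(
    data_root=Path("/data"),
    experiment_name="corn_yield_prediction",
    models_dir=Path("/scratch/models"),  # explicit value is kept as given
)
```

An explicitly supplied directory such as `models_dir` above is left alone, while the remaining directories are derived from `data_root`.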
Lifecycle
- cli._prepare_config() sets COMMODITY_HINDCAST_CONFIG env var to the resolved YAML path, then calls ExperimentConfig() (no model_validate; the BaseSettings constructor fires settings_customise_sources).
- Pydantic validators run in declaration order: _prepare_commodity (mode=before, resolves nested commodity YAML), then _fill_defaults_from_data_root, _resolve_data_paths, _reference_data_names_unique, ensure_dirs (all mode=after).
- Validated config is passed to stages/run_hindcast._create_run_root, which writes config_resolved.yaml and mutates models_dir/preds_dir in-place to point inside the new run_dir.
- All downstream stages receive the config from config_resolved.yaml via _load_config(run_dir) (cached per process).
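The last two lifecycle steps can be sketched as follows. This is an approximation under stated assumptions, not the project's implementation: the config is a plain dict instead of an ExperimentConfig, JSON stands in for YAML so that only the standard library is needed, and the run_dir naming scheme, the sub-directory names inside run_dir, and the function names `create_run_root` and `load_config` are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from functools import lru_cache
from pathlib import Path


def create_run_root(config: dict, run_dir_base: Path) -> Path:
    """Create a timestamped run_dir, redirect output dirs into it, snapshot the config."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")  # assumed naming scheme
    run_dir = run_dir_base / f"{config['experiment_name']}_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=False)
    # In-place mutation, mirroring what is described for models_dir / preds_dir.
    config["models_dir"] = str(run_dir / "models")
    config["preds_dir"] = str(run_dir / "predictions")
    # JSON snapshot as a stand-in for config_resolved.yaml.
    (run_dir / "config_resolved.json").write_text(json.dumps(config, indent=2))
    return run_dir


@lru_cache(maxsize=None)
def load_config(run_dir: Path) -> dict:
    """Read the snapshot once per process; later calls return the cached object."""
    return json.loads((run_dir / "config_resolved.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    cfg = {"experiment_name": "corn_yield_prediction", "random_seed": 42}
    run_dir = create_run_root(cfg, Path(tmp))
    first = load_config(run_dir)
    second = load_config(run_dir)
    same_object = first is second  # the second call is served from the cache
    models_inside_run = Path(first["models_dir"]).parent == run_dir
```

One consequence of per-process caching in a sketch like this is that every caller shares the same object, so any later mutation would be visible to all stages in that process. That is one reason to treat the loaded config as read-only.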
Relationships
- Root of the ExperimentConfig aggregate: directly contains CommodityConfig, ModelConfig, ExperimentProtocolConfig, PostprocessConfig (→ BiasCorrectorConfig), DeliveryConfig, ForecastConfig | None, list[ReferenceYieldSpec].
- Held by ExperimentResult as its config field: the single in-memory config instance per run.
- Loaded lazily by HindcastSlice and ForecastSlice via _load_config(run_dir) from config_resolved.yaml.
- Consumed by every stage module, all builder functions, and all model factory methods.
Concepts and pipelines that touch this entity
- Pipeline: hindcast. ExperimentConfig is constructed at the CLI entry point and threaded through every stage.
- Pipeline: forecast. The forecast field is non-None; init_date is injected at runtime by build_forecast_features.
- Concept: ResolvablePath safety. All ResolvablePath fields are resolved against data_root by _resolve_data_paths.
- Concept: walk-forward CV. The experiment_protocol field drives fold generation.
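The ResolvablePath concept amounts to walking the nested config tree and anchoring every relative path at data_root. The sketch below shows one way that walk could look, using dataclasses instead of Pydantic models. Everything in it is hypothetical: `ResolvablePathSketch` is a stand-in marker type, and the field names `source` and `weather_dir` are invented for the example and are not taken from ReferenceYieldSpec or ForecastConfig.

```python
from dataclasses import dataclass, fields, is_dataclass
from pathlib import Path
from typing import Optional


class ResolvablePathSketch(str):
    """Hypothetical marker type: a path string that may be relative to data_root."""


def resolve_tree(node: object, data_root: Path) -> None:
    """Recursively rewrite relative ResolvablePathSketch fields as absolute ones."""
    if isinstance(node, list):
        for item in node:
            resolve_tree(item, data_root)
        return
    if not is_dataclass(node) or isinstance(node, type):
        return  # plain values (int, str, Path, None) need no work
    for f in fields(node):
        value = getattr(node, f.name)
        if isinstance(value, ResolvablePathSketch):
            path = Path(value)
            if not path.is_absolute():
                setattr(node, f.name, ResolvablePathSketch(data_root / path))
        else:
            resolve_tree(value, data_root)


@dataclass
class ReferenceSpecSketch:
    name: str
    source: ResolvablePathSketch  # hypothetical field


@dataclass
class ForecastSketch:
    weather_dir: ResolvablePathSketch  # hypothetical field


@dataclass
class ConfigSketch:
    data_root: Path
    reference_data: list
    forecast: Optional[ForecastSketch] = None


cfg = ConfigSketch(
    data_root=Path("/data"),
    reference_data=[ReferenceSpecSketch("wasde", ResolvablePathSketch("reference/wasde.csv"))],
    forecast=ForecastSketch(ResolvablePathSketch("/mnt/forecast/weather")),
)
resolve_tree(cfg, cfg.data_root)
```

After the call, the relative reference path is anchored under `/data`, while the already absolute forecast path is left unchanged.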
PRs and commits
- PR #361 (PR-361.md): added postprocess.conformalise tuple; ForecastConfig gained a residual_mode placeholder.
- PR #372 (PR-372.md): made forecast.residual_mode mandatory; extracted ResidualMode to models/meta_models/types.py to avoid a circular import.
Open questions
- build_detrender() and build_regressor() factory methods live on ExperimentConfig, which a TODO at config.py:719 acknowledges as a misplacement. A future refactor should move them to the model layer.
- training_dropna_subset() and _fill_defaults_from_data_root are also noted as candidates for relocation (config.py:772).