Entity: ModelConfig

Definition

ModelConfig is the frozen Pydantic model that selects the detrend implementation and regression estimator for the FIT stage, supplies their hyperparameter dicts, controls the fit-time geographic aggregation level for weather correction, and optionally enables area-weighted sample weighting. It lives as the model field of ExperimentConfig, one ModelConfig per ExperimentConfig. It sets extra="forbid", so an unknown or misspelled key raises a validation error instead of being silently dropped, which prevents silent config drift as new fields are added.

Kind

Pydantic BaseModel, frozen=True, extra="forbid". Nested inside ExperimentConfig.

Source of truth

market_insights_models/src/commodity_hindcast/config.py:491

Key attributes

| Field | Type | Default | Meaning | YAML example |
| --- | --- | --- | --- | --- |
| detrend | Literal["linear_state","gaussian_state","partial_pooling"] | "partial_pooling" | Selects the AbstractDetrend implementation constructed by ExperimentConfig.build_detrender() | partial_pooling |
| detrend_params | dict[str, Any] | {} | Keyword arguments forwarded to the selected detrender constructor. Corn uses detrend_fixed_slope_bu_ac: 1.6 | {detrend_fixed_slope_bu_ac: 1.6} |
| regression | Literal["ridge","xgboost","pca_ridge"] | "ridge" | Selects the AbstractRegressionImpl constructed by ExperimentConfig.build_regressor() | pca_ridge |
| regression_params | dict[str, Any] | {} | Keyword arguments forwarded to the selected regressor. PCA-Ridge expects n_components, alpha, nan_policy | {n_components: 2, alpha: 10.0, nan_policy: raise} |
| weather_correction_fit_level | Literal["ADM0","ADM1","ADM2"] \| None | None | Geographic aggregation level at which the national weather-correction residual is fitted. None disables weather correction entirely | ADM0 |
| use_sample_weights | bool | False | Whether to pass area_harvested_ha as sample weights to the estimator. Requires weather_correction_fit_level="ADM2" | false |
| weight_column | str | "area_harvested_ha" | Column used for fit-time area weighting and optional sample weights | area_harvested_ha |
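The field list above can be sketched as a Pydantic model to show what frozen=True and extra="forbid" do in practice. This is a minimal illustration assuming Pydantic v2; ModelConfigSketch and the is_rejected helper are hypothetical names, and the real class in config.py may differ in details such as annotations and defaults handling.

```python
from typing import Any, Literal, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class ModelConfigSketch(BaseModel):
    """Hypothetical stand-in for ModelConfig; fields copied from the table."""

    # frozen=True blocks attribute assignment; extra="forbid" rejects unknown keys.
    model_config = ConfigDict(frozen=True, extra="forbid")

    detrend: Literal["linear_state", "gaussian_state", "partial_pooling"] = "partial_pooling"
    detrend_params: dict[str, Any] = Field(default_factory=dict)
    regression: Literal["ridge", "xgboost", "pca_ridge"] = "ridge"
    regression_params: dict[str, Any] = Field(default_factory=dict)
    weather_correction_fit_level: Optional[Literal["ADM0", "ADM1", "ADM2"]] = None
    use_sample_weights: bool = False
    weight_column: str = "area_harvested_ha"


def is_rejected(action) -> bool:
    """Return True if calling `action` raises a Pydantic ValidationError."""
    try:
        action()
    except ValidationError:
        return True
    return False


# Values taken from the YAML example column.
cfg = ModelConfigSketch(
    regression="pca_ridge",
    regression_params={"n_components": 2, "alpha": 10.0, "nan_policy": "raise"},
    weather_correction_fit_level="ADM0",
)

# A misspelled key ("regresion") is an unknown field, so construction fails.
unknown_key_rejected = is_rejected(lambda: ModelConfigSketch(regresion="ridge"))

# Assigning to a field on a frozen model fails as well.
mutation_rejected = is_rejected(lambda: setattr(cfg, "detrend", "linear_state"))
```

Both unknown_key_rejected and mutation_rejected end up True, which is the behaviour the Definition section relies on to catch config drift early.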

Validators

  • _normalize_weather_correction_fit_level (config.py:511, mode=before): strips whitespace and uppercases the string; normalises empty string to None.
  • _validate_sample_weight_usage (config.py:518, mode=after): raises ValueError when use_sample_weights=True but weather_correction_fit_level != "ADM2". The cross-field constraint is documented in AGGREGATES.md as a key ExperimentConfig invariant.
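The two validators described above might look roughly like the following. This is a sketch assuming Pydantic v2 decorators, reduced to the two fields involved; SampleWeightRulesSketch is a hypothetical name, and the error message and exact normalisation order are assumptions rather than the code at config.py:511 and config.py:518.

```python
from typing import Any, Literal, Optional

from pydantic import (
    BaseModel,
    ConfigDict,
    ValidationError,
    field_validator,
    model_validator,
)


class SampleWeightRulesSketch(BaseModel):
    """Hypothetical two-field model illustrating the validators."""

    model_config = ConfigDict(frozen=True, extra="forbid")

    weather_correction_fit_level: Optional[Literal["ADM0", "ADM1", "ADM2"]] = None
    use_sample_weights: bool = False

    @field_validator("weather_correction_fit_level", mode="before")
    @classmethod
    def _normalize_weather_correction_fit_level(cls, value: Any) -> Any:
        # Runs before type validation: strip and uppercase strings,
        # and map an empty string to None (weather correction disabled).
        if isinstance(value, str):
            value = value.strip().upper()
            return value or None
        return value

    @model_validator(mode="after")
    def _validate_sample_weight_usage(self) -> "SampleWeightRulesSketch":
        # Cross-field rule: sample weights are only allowed at ADM2.
        if self.use_sample_weights and self.weather_correction_fit_level != "ADM2":
            raise ValueError(
                "use_sample_weights=True requires weather_correction_fit_level='ADM2'"
            )
        return self


# " adm2 " is normalised to "ADM2", so sample weights are accepted.
normalized = SampleWeightRulesSketch(
    weather_correction_fit_level=" adm2 ", use_sample_weights=True
)

# An empty string is normalised to None.
blank = SampleWeightRulesSketch(weather_correction_fit_level="")

# Sample weights at ADM0 violate the cross-field rule.
try:
    SampleWeightRulesSketch(weather_correction_fit_level="ADM0", use_sample_weights=True)
    conflict_rejected = False
except ValidationError:
    conflict_rejected = True
```

Because the normaliser runs in mode="before", the Literal check and the cross-field check both see the cleaned value.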

Lifecycle

Constructed as part of ExperimentConfig and immutable thereafter. At runtime, the ExperimentConfig.build_detrender() and build_regressor() factory methods read model.detrend, model.detrend_params, model.regression, and model.regression_params to instantiate the concrete model objects. Both use lazy imports to avoid circular imports.

Relationships

  • Owned by ExperimentConfig (config.py:685).
  • Drives Detrender selection: linear_state → LinearStateDetrend; gaussian_state → GaussianWindowStateDetrend; partial_pooling → PartialPoolingDetrend.
  • Drives Regressor selection: ridge → RidgeRegressor; xgboost → XGBRegressor; pca_ridge → PcaRidgeRegressor.
  • Drives FitAggregationPolicy via build_fit_aggregation_policy() which reads weather_correction_fit_level and weight_column.
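The selection mappings above amount to a name-to-class dispatch followed by keyword forwarding. The sketch below illustrates that pattern with stand-in classes that only record their kwargs; the free functions, the registries, and the _KwargsHolder base are hypothetical. The real factories are methods on ExperimentConfig and import the concrete classes lazily inside the method body.

```python
from typing import Any


class _KwargsHolder:
    """Stand-in base: stores constructor kwargs so the example can run."""

    def __init__(self, **params: Any) -> None:
        self.params = params


# Placeholder classes named after the implementations listed above.
class LinearStateDetrend(_KwargsHolder): ...
class GaussianWindowStateDetrend(_KwargsHolder): ...
class PartialPoolingDetrend(_KwargsHolder): ...
class RidgeRegressor(_KwargsHolder): ...
class XGBRegressor(_KwargsHolder): ...
class PcaRidgeRegressor(_KwargsHolder): ...


# Hypothetical registries mirroring the documented selection mappings.
DETRENDERS: dict[str, type] = {
    "linear_state": LinearStateDetrend,
    "gaussian_state": GaussianWindowStateDetrend,
    "partial_pooling": PartialPoolingDetrend,
}
REGRESSORS: dict[str, type] = {
    "ridge": RidgeRegressor,
    "xgboost": XGBRegressor,
    "pca_ridge": PcaRidgeRegressor,
}


def build_detrender(detrend: str, detrend_params: dict[str, Any]) -> Any:
    """Look up the detrender class by name and forward the params as kwargs."""
    return DETRENDERS[detrend](**detrend_params)


def build_regressor(regression: str, regression_params: dict[str, Any]) -> Any:
    """Look up the regressor class by name and forward the params as kwargs."""
    return REGRESSORS[regression](**regression_params)


detrender = build_detrender("partial_pooling", {"detrend_fixed_slope_bu_ac": 1.6})
regressor = build_regressor(
    "pca_ridge", {"n_components": 2, "alpha": 10.0, "nan_policy": "raise"}
)
```

Since the Literal types on detrend and regression already restrict the allowed names at validation time, a lookup like this should not see an unknown key in normal use.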

Concepts and pipelines that touch this entity

  • Pipeline: hindcast FIT stage. build_detrender() and build_regressor() are called once per fold in run_experiment.
  • Concept: detrending. The detrend field and detrend_params control the trend-removal step.
  • Concept: weather correction. weather_correction_fit_level and regression_params.weather_correction_weight / season_doy_weather_weight jointly configure the national weather-correction blend.

PRs and commits

  • PR #340 (PR-340.md) — pca_ridge regression added; weather_correction_fit_level and use_sample_weights cross-field validator introduced.
  • PR #331 (PR-331.md) — partial_pooling detrend option added.

Open questions

  • ModelConfig is the only config class that sets extra="forbid". All others silently drop unknown fields. Aligning the policy across classes is a TODO noted in config.py comments.
  • regression_params carries both generic sklearn kwargs (alpha, nan_policy) and pipeline-specific keys (weather_correction_weight, season_doy_weather_weight, n_components). The mixing of concerns is flagged in DOMAIN_MODEL2.md §4.3.
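To make the mixing of concerns in regression_params concrete, the sketch below partitions one dict into estimator kwargs and pipeline-specific keys, using the grouping given in the bullet above. It is an illustration only: split_regression_params and PIPELINE_KEYS are hypothetical, the 0.5 weight is a made-up value, and nothing here is taken from the codebase or from DOMAIN_MODEL2.md.

```python
from typing import Any

# Keys the bullet above classifies as pipeline-specific rather than estimator kwargs.
PIPELINE_KEYS = frozenset(
    {"weather_correction_weight", "season_doy_weather_weight", "n_components"}
)


def split_regression_params(
    params: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (estimator_kwargs, pipeline_kwargs) without mutating `params`."""
    pipeline = {k: v for k, v in params.items() if k in PIPELINE_KEYS}
    estimator = {k: v for k, v in params.items() if k not in PIPELINE_KEYS}
    return estimator, pipeline


mixed = {
    "alpha": 10.0,
    "nan_policy": "raise",
    "n_components": 2,
    "weather_correction_weight": 0.5,  # illustrative value, not from the source
}
estimator_kwargs, pipeline_kwargs = split_regression_params(mixed)
```

Here estimator_kwargs holds alpha and nan_policy, while pipeline_kwargs holds n_components and weather_correction_weight.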