Entity: CommodityConfig
Definition
CommodityConfig is the frozen Pydantic model that encodes all commodity-specific constants for a single crop × country run. It is a nested field of ExperimentConfig (1:1) and is the sole source of truth for the crop calendar, the builder registry, feature and target column names, delivery unit, and yield plausibility bounds. Every stage and utility that depends on crop-specific behaviour receives cfg.commodity rather than a bare commodity string.
Kind
Pydantic BaseModel, frozen=True. Nested inside ExperimentConfig.
Source of truth
market_insights_models/src/commodity_hindcast/config.py:274
Key attributes
| Field | Type | Default | Meaning | YAML example |
|---|---|---|---|---|
| `commodity` | `str` | required | Internal commodity key (e.g. `"corn"`, `"soybeans"`, `"wheat"`, `"cotton"`) | `corn` |
| `country_code` | `str` | required | ISO-3 code, stripped+uppercased at load. Injected into every `geo_identifier` ADM0 segment | `USA` |
| `season_start` | `MonthDay` | required | Calendar anchor for season DOY 1 (e.g. Apr-1 for US corn) | `{month: 4, day: 1}` |
| `season_start_year_offset` | `int` | required | Shifts `season_start_date(y)` relative to harvest year. 0 = same-year crop; -1 = southern-hemisphere cross-year (Brazil soy) | `0` |
| `harvest_season_doy` | `int` | required | Season-DOY of harvest; used by `harvest_date()` and `freeze_cap_sdoy` | `184` |
| `hindcast_init_season_doys` | `tuple[int, ...]` | required | Weekly init-date grid as season-DOYs; drives `hindcast_init_dates()` | `[1, 8, 15, ...]` |
| `bushel_weight_lbs` | `float` | required | Lbs per bushel for kg/ha ↔ bu/acre conversion (corn 56, soy 60, wheat 60, cotton 1.0) | `56.0` |
| `delivery_unit` | `str` | required | Output unit label in delivery CSVs (`"bu_acre"` or `"lbs_acre"`) | `bu_acre` |
| `yield_range` | `tuple[float, float]` | required | Sanity bounds (lower, upper) in `delivery_unit`; delivery clips and validates against these | `[50.0, 250.0]` |
| `feature_cols` | `list[str]` | required | Ordered column names fed to the regression estimator | `[dry_days_zscore_gstd, ...]` |
| `target_col` | `str` | required | Raw yield column used as training target (typically `yield_kg_ha`) | `yield_kg_ha` |
| `target_detrended_col` | `str` | required | Detrended residual column name | `yield_kg_ha_detrended` |
| `yield_col` | `str` | required | Yield column in feature parquets | `yield_kg_ha` |
| `area_col` | `str` | required | Area column name for weighting | `area_harvested_ha` |
| `auxiliary_cols` | `tuple[str, ...]` | `()` | Columns carried through the feature matrix but excluded from `feature_cols` | `[area_harvested_ha, production_kg]` |
| `climo_windows` | `tuple[SeasonWindow, ...]` | `()` | Named aggregation windows for climatology indices | see below |
| `climo_zscore_vars` | `tuple[str, ...]` | required | Variable names for which z-score features are computed against climo | `[gdd, precip, ...]` |
| `weather_windows` | `tuple[SeasonWindow, ...]` | required | Named aggregation windows for weather indices | see below |
| `weather_vars` | `tuple[str, ...]` | required | Variables accumulated over each `weather_window` | `[edd, precip, ...]` |
| `climo_lag_days` | `int` | `1` | Lag between `init_date` and last included climo observation | `1` |
| `weather_lag_days` | `int` | `1` | Lag between `init_date` and last included weather observation | `1` |
| `freeze_cap_sdoy` | `int \| None` | `None` | Season-DOY beyond which weather/climo features freeze (crop done growing) | `184` |
| `actuals_source_short` | `str` | `"NASS"` | Short label for actuals in metrics tables and chart legends | `IBGE` |
| `actuals_source_label` | `str` | `"NASS Survey Yield (area-weighted)"` | Long label for chart titles and report prose | `IBGE-PAM municipal yield (area-weighted)` |
| `builders` | `dict[str, Builder]` | required | Builder registry; dict key auto-injected as `type` discriminator by `_inject_builder_type_from_key` | `{yields: {...}, weather: {...}}` |
| `ndvi_mean_col` | `str` | `"ndvi_mean"` | NDVI mean column name | `ndvi_mean` |
| `ndvi_p75_col` | `str` | `"ndvi_p75"` | NDVI 75th-percentile column name | `ndvi_p75` |
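How `bushel_weight_lbs`, `delivery_unit`, and `yield_range` interact can be illustrated with a minimal sketch. The helper names `kg_ha_to_delivery` and `clip_to_yield_range` are hypothetical and not taken from the codebase. The sketch assumes that a `bushel_weight_lbs` of 1.0 (cotton) makes the result lbs/acre, and it shows only the clipping half of "clips and validates".

```python
from __future__ import annotations

LBS_PER_KG = 2.20462262185    # pounds per kilogram
ACRES_PER_HA = 2.47105381467  # acres per hectare


def kg_ha_to_delivery(yield_kg_ha: float, bushel_weight_lbs: float) -> float:
    """Convert kg/ha to bu/acre (hypothetical helper).

    With bushel_weight_lbs == 1.0 the result is simply lbs/acre.
    """
    lbs_per_acre = yield_kg_ha * LBS_PER_KG / ACRES_PER_HA
    return lbs_per_acre / bushel_weight_lbs


def clip_to_yield_range(value: float, yield_range: tuple[float, float]) -> float:
    """Clamp a delivery-unit yield into the (lower, upper) sanity bounds."""
    lower, upper = yield_range
    return min(max(value, lower), upper)


# Corn: 10,000 kg/ha at 56 lbs per bushel is roughly 159.3 bu/acre.
corn_bu_acre = kg_ha_to_delivery(10_000.0, bushel_weight_lbs=56.0)
print(round(corn_bu_acre, 1))

# A value above the upper bound is pulled back to it.
print(clip_to_yield_range(300.0, (50.0, 250.0)))
```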
SeasonWindow fields
SeasonWindow is a frozen dataclass (config.py:256) with fields name: str, sdoy_start: int, sdoy_end: int | None. When sdoy_end is None the window runs to the current init date (progressive/growing-season-to-date window).
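The open-ended window behaviour can be sketched as follows. The dataclass mirrors the three fields named above. The `window_end` helper and the window names `season_to_date` and `grain_fill` are hypothetical, introduced only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SeasonWindow:
    name: str
    sdoy_start: int
    sdoy_end: int | None  # None means "runs to the current init date"


def window_end(window: SeasonWindow, init_sdoy: int) -> int:
    """Return the effective end season-DOY of a window (hypothetical helper)."""
    return init_sdoy if window.sdoy_end is None else window.sdoy_end


progressive = SeasonWindow(name="season_to_date", sdoy_start=1, sdoy_end=None)
fixed = SeasonWindow(name="grain_fill", sdoy_start=100, sdoy_end=140)

print(window_end(progressive, init_sdoy=64))  # follows the init date: 64
print(window_end(fixed, init_sdoy=64))        # fixed end: 140
```

The sketch leaves out how `climo_lag_days`, `weather_lag_days`, and `freeze_cap_sdoy` modify the effective end.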
Lifecycle
- `_inject_builder_type_from_key` (`config.py:354`, `mode=before`) copies each `builders` dict key into the builder payload's `type` field so YAML authors need not repeat the type.
- `_normalise_country_code` (`config.py:345`, `mode=after`) strips whitespace, uppercases, and enforces the `[A-Z]{3}` ISO-3 pattern.
- Constructed as part of `ExperimentConfig` via `_prepare_commodity`, which can resolve an inline dict or a path-string reference to a nested YAML.
- Immutable for the remainder of the run; all stage code reads `cfg.commodity.*` directly.
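The two validator steps can be sketched as plain functions, independent of Pydantic. This is an illustration of the described behaviour, not the code in `config.py`: the function signatures are assumptions, and letting the dict key overwrite an existing `type` entry is also an assumption.

```python
from __future__ import annotations

import re

ISO3_PATTERN = re.compile(r"[A-Z]{3}")


def inject_builder_type_from_key(builders: dict[str, dict]) -> dict[str, dict]:
    """Copy each registry key into its payload's ``type`` field.

    Assumption: the key wins if the payload already carries a ``type``.
    """
    return {key: {**payload, "type": key} for key, payload in builders.items()}


def normalise_country_code(raw: str) -> str:
    """Strip whitespace, uppercase, and enforce the three-letter pattern."""
    code = raw.strip().upper()
    if ISO3_PATTERN.fullmatch(code) is None:
        raise ValueError(f"country_code must match [A-Z]{{3}}, got {raw!r}")
    return code


# Payload keys below are placeholders, not real builder options.
raw_builders = {"yields": {"source": "example"}, "weather": {}}
print(inject_builder_type_from_key(raw_builders))
print(normalise_country_code(" usa "))
```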
Relationships
- Owned by `ExperimentConfig` (1:1).
- Contains `dict[str, Builder]` (one `YieldsBuilder`, `WeatherBuilder`, `ClimoBuilder`, `NDVIBuilder`, or `StressBuilder` per source).
- Drives `HindcastSlice.cutoff` via `season_start_date()` and `hindcast_init_dates()`.
- Drives the `experiment_key` property used in every artefact path: `{commodity}_{country_code.lower()}` (e.g. `corn_usa`, `soybeans_bra`).
- Referenced by `FoldSchedule` (dashboard layer, `app/_dashboard_config.py`) by commodity name string.
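As a small illustration of the `experiment_key` pattern above, here it is written as a standalone function rather than the actual model property. The function form is an assumption made for the example.

```python
def experiment_key(commodity: str, country_code: str) -> str:
    """Build the key following the {commodity}_{country_code.lower()} pattern."""
    return f"{commodity}_{country_code.lower()}"


print(experiment_key("corn", "USA"))      # corn_usa
print(experiment_key("soybeans", "BRA"))  # soybeans_bra
```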
Concepts and pipelines that touch this entity
- FeatureBuilderConfig: each builder in `commodity.builders` is a `FeatureBuilderConfig` variant.
- Pipeline (feature build): builders are iterated; `feature_cols` determines the output parquet schema.
- Concept (season DOY): `season_start`, `season_start_year_offset`, and `hindcast_init_season_doys` jointly define the DOY system.
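A minimal sketch of the season-DOY arithmetic, assuming season DOY 1 falls exactly on the season start date and each later DOY adds one calendar day. The standalone functions and their signatures are hypothetical stand-ins for the model's `season_start_date()` and `hindcast_init_dates()`. The Sep-1 start used for the cross-year case is a made-up value, not the real Brazil configuration.

```python
from datetime import date, timedelta


def season_start_date(harvest_year: int, month: int, day: int, year_offset: int) -> date:
    """Calendar date of season DOY 1 for a given harvest year."""
    return date(harvest_year + year_offset, month, day)


def sdoy_to_date(start: date, sdoy: int) -> date:
    """Map a 1-based season DOY onto the calendar."""
    return start + timedelta(days=sdoy - 1)


# US corn: season starts Apr-1 of the harvest year (offset 0).
us_corn_start = season_start_date(2023, month=4, day=1, year_offset=0)
harvest = sdoy_to_date(us_corn_start, 184)
init_dates = [sdoy_to_date(us_corn_start, s) for s in (1, 8, 15)]
print(harvest)     # 2023-10-01
print(init_dates)  # weekly grid: Apr-1, Apr-8, Apr-15

# Cross-year crop: offset -1 places the start in the year before harvest.
cross_year_start = season_start_date(2024, month=9, day=1, year_offset=-1)
print(cross_year_start)  # 2023-09-01
```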
PRs and commits
- PR #353: Brazil soybean config introduced the `season_start_year_offset: -1`, `actuals_source_short`, and `actuals_source_label` fields.
- PR #345 (PR-345.md): `country_code` made mandatory and normalised; `experiment_key` property derived from it.
Open questions
- Wheat sub-types (`WINTER_WHEAT`, `SPRING_DURUM_WHEAT`, etc.) are listed in config comments but the NASS preprocessor only produces `crop_type: WHEAT`. Documented as a known issue in `DOMAIN_MODEL2.md §9` and in the project memory.