Entity: CommodityConfig

Definition

CommodityConfig is the frozen Pydantic model that encodes all commodity-specific constants for a single crop × country run. It is a nested field of ExperimentConfig (1:1) and is the sole source of truth for the crop calendar, the builder registry, feature and target column names, delivery unit, and yield plausibility bounds. Every stage and utility that depends on crop-specific behaviour receives cfg.commodity rather than a bare commodity string.

Kind

Pydantic BaseModel, frozen=True. Nested inside ExperimentConfig.

Source of truth

market_insights_models/src/commodity_hindcast/config.py:274

Key attributes

| Field | Type | Default | Meaning | YAML example |
|---|---|---|---|---|
| commodity | str | required | Internal commodity key (e.g. "corn", "soybeans", "wheat", "cotton") | corn |
| country_code | str | required | ISO-3 code, stripped+uppercased at load. Injected into every geo_identifier ADM0 segment | USA |
| season_start | MonthDay | required | Calendar anchor for season DOY 1 (e.g. Apr-1 for US corn) | {month: 4, day: 1} |
| season_start_year_offset | int | required | Shifts season_start_date(y) relative to harvest year. 0 = same-year crop; -1 = southern-hemisphere cross-year (Brazil soy) | 0 |
| harvest_season_doy | int | required | Season-DOY of harvest; used by harvest_date() and freeze_cap_sdoy | 184 |
| hindcast_init_season_doys | tuple[int, ...] | required | Weekly init-date grid as season-DOYs; drives hindcast_init_dates() | [1, 8, 15, ...] |
| bushel_weight_lbs | float | required | Lbs per bushel for kg/ha ↔ bu/acre conversion (corn 56, soy 60, wheat 60, cotton 1.0) | 56.0 |
| delivery_unit | str | required | Output unit label in delivery CSVs ("bu_acre" or "lbs_acre") | bu_acre |
| yield_range | tuple[float, float] | required | Sanity bounds (lower, upper) in delivery_unit; delivery clips and validates against these | [50.0, 250.0] |
| feature_cols | list[str] | required | Ordered column names fed to the regression estimator | [dry_days_zscore_gstd, ...] |
| target_col | str | required | Raw yield column used as training target (typically yield_kg_ha) | yield_kg_ha |
| target_detrended_col | str | required | Detrended residual column name | yield_kg_ha_detrended |
| yield_col | str | required | Yield column in feature parquets | yield_kg_ha |
| area_col | str | required | Area column name for weighting | area_harvested_ha |
| auxiliary_cols | tuple[str, ...] | () | Columns carried through the feature matrix but excluded from feature_cols | [area_harvested_ha, production_kg] |
| climo_windows | tuple[SeasonWindow, ...] | () | Named aggregation windows for climatology indices | see below |
| climo_zscore_vars | tuple[str, ...] | required | Variable names for which z-score features are computed against climo | [gdd, precip, ...] |
| weather_windows | tuple[SeasonWindow, ...] | required | Named aggregation windows for weather indices | see below |
| weather_vars | tuple[str, ...] | required | Variables accumulated over each weather_window | [edd, precip, ...] |
| climo_lag_days | int | 1 | Lag between init_date and last included climo observation | 1 |
| weather_lag_days | int | 1 | Lag between init_date and last included weather observation | 1 |
| freeze_cap_sdoy | int \| None | None | Season-DOY beyond which weather/climo features freeze (crop done growing) | 184 |
| actuals_source_short | str | "NASS" | Short label for actuals in metrics tables and chart legends | IBGE |
| actuals_source_label | str | "NASS Survey Yield (area-weighted)" | Long label for chart titles and report prose | IBGE-PAM municipal yield (area-weighted) |
| builders | dict[str, Builder] | required | Builder registry; dict key auto-injected as type discriminator by _inject_builder_type_from_key | {yields: {...}, weather: {...}} |
| ndvi_mean_col | str | "ndvi_mean" | NDVI mean column name | ndvi_mean |
| ndvi_p75_col | str | "ndvi_p75" | NDVI 75th-percentile column name | ndvi_p75 |
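
The roles of bushel_weight_lbs and yield_range can be illustrated with a short sketch. The conversion below is the standard mass/area arithmetic, not code copied from the repository; the function names kg_ha_to_delivery and clip_to_yield_range are hypothetical, and the real delivery stage may round or validate differently.

```python
# Illustrative sketch only: function names are hypothetical, not the
# repository's API. Constants are standard unit-conversion factors.
from typing import Tuple

LB_PER_KG = 2.2046226218    # pounds per kilogram
ACRE_PER_HA = 2.4710538147  # acres per hectare


def kg_ha_to_delivery(yield_kg_ha: float, bushel_weight_lbs: float) -> float:
    """Convert kg/ha to bu/acre (or lbs/acre when bushel_weight_lbs is 1.0)."""
    lbs_per_acre = yield_kg_ha * LB_PER_KG / ACRE_PER_HA
    return lbs_per_acre / bushel_weight_lbs


def clip_to_yield_range(value: float, yield_range: Tuple[float, float]) -> float:
    """Clamp a delivery-unit yield to the (lower, upper) sanity bounds."""
    lower, upper = yield_range
    return min(max(value, lower), upper)


corn_bu_acre = kg_ha_to_delivery(10_000.0, bushel_weight_lbs=56.0)  # about 159.3
clipped = clip_to_yield_range(300.0, (50.0, 250.0))                 # 250.0
```

Setting bushel_weight_lbs to 1.0, as the table lists for cotton, makes the same function return lbs/acre, which is presumably why a single field can serve both delivery units.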

SeasonWindow fields

SeasonWindow is a frozen dataclass (config.py:256) with fields name: str, sdoy_start: int, sdoy_end: int | None. When sdoy_end is None the window runs to the current init date (progressive/growing-season-to-date window).
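
A minimal sketch of the shape described above, assuming nothing beyond the three listed fields. The helper resolved_end is hypothetical and only illustrates the open-ended case; how the real pipeline combines windows with the lag fields or freeze_cap_sdoy is not shown here.

```python
# Illustrative sketch: mirrors the three documented fields; the helper
# resolved_end() is a hypothetical name, not part of the source module.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeasonWindow:
    name: str
    sdoy_start: int
    sdoy_end: Optional[int]  # None means "runs to the current init date"


def resolved_end(window: SeasonWindow, init_sdoy: int) -> int:
    """Return the window's end season-DOY for a given init season-DOY.

    Only the open-ended case is resolved; whether fixed windows are also
    truncated at the init date is not stated in this page.
    """
    return init_sdoy if window.sdoy_end is None else window.sdoy_end


to_date = SeasonWindow(name="season_to_date", sdoy_start=1, sdoy_end=None)
fixed = SeasonWindow(name="mid_season", sdoy_start=90, sdoy_end=120)

end_open = resolved_end(to_date, init_sdoy=43)   # 43
end_fixed = resolved_end(fixed, init_sdoy=43)    # 120
```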

Lifecycle

  1. _inject_builder_type_from_key (config.py:354, mode=before) copies each builders-dict key into the builder payload's type field so YAML authors need not repeat the type.
  2. _normalise_country_code (config.py:345, mode=after) strips whitespace, uppercases, and enforces the [A-Z]{3} ISO-3 pattern.
  3. Constructed as part of ExperimentConfig via _prepare_commodity, which accepts either an inline dict or a path string that references a nested YAML file.
  4. Immutable for the remainder of the run; all stage code reads cfg.commodity.* directly.
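
The two validator steps can be sketched without Pydantic as plain functions. This is a dependency-free approximation under stated assumptions: the function names drop the leading underscore to mark them as stand-ins, the payloads are plain dicts, and overwriting an existing type value is an assumption (the source only says the key is copied into the type field).

```python
# Illustrative, dependency-free sketch of the two validation steps.
# These are stand-ins for the Pydantic validators, not their actual code.
import re
from typing import Any, Dict

_ISO3 = re.compile(r"[A-Z]{3}")


def inject_builder_type_from_key(
    builders: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Copy each dict key into its payload's 'type' field.

    Assumption: the key wins if the payload already carries a 'type'.
    """
    return {key: {**payload, "type": key} for key, payload in builders.items()}


def normalise_country_code(raw: str) -> str:
    """Strip whitespace, uppercase, and enforce the [A-Z]{3} pattern."""
    code = raw.strip().upper()
    if not _ISO3.fullmatch(code):
        raise ValueError(f"country_code must be three letters A-Z, got {raw!r}")
    return code


builders = inject_builder_type_from_key({"yields": {}, "weather": {"source": "x"}})
code = normalise_country_code(" usa ")  # "USA"
```

The payload key "source" in the usage lines is a made-up placeholder; real builder payloads have their own fields.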

Relationships

  • Owned by ExperimentConfig (1:1).
  • Contains dict[str, Builder] (one YieldsBuilder, WeatherBuilder, ClimoBuilder, NDVIBuilder, or StressBuilder per source).
  • Drives HindcastSlice.cutoff via season_start_date() and hindcast_init_dates().
  • Drives the experiment_key property used in every artefact path: {commodity}_{country_code.lower()} (e.g. corn_usa, soybeans_bra).
  • Referenced by FoldSchedule (dashboard layer, app/_dashboard_config.py) by commodity name string.
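
As a quick illustration of the experiment_key pattern above, the following standalone function reproduces the documented format string. In the real model it is a property on the config object; the free function here is only a sketch.

```python
# Illustrative sketch: a free function standing in for the property.
def experiment_key(commodity: str, country_code: str) -> str:
    """Build the '{commodity}_{country_code.lower()}' key used in artefact paths."""
    return f"{commodity}_{country_code.lower()}"


key_us = experiment_key("corn", "USA")        # "corn_usa"
key_br = experiment_key("soybeans", "BRA")    # "soybeans_bra"
```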

Concepts and pipelines that touch this entity

  • FeatureBuilderConfig — each builder in commodity.builders is a FeatureBuilderConfig variant.
  • Pipeline: feature build — builders are iterated; feature_cols determine the output parquet schema.
  • Concept: season DOY — season_start, season_start_year_offset, and hindcast_init_season_doys jointly define the DOY system.
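
The season-DOY arithmetic can be sketched as follows. This assumes that y in season_start_date(y) is the harvest year, that the offset is added to it, and that season DOY 1 falls exactly on the season start date; the function signatures are simplified stand-ins, and the September start used for the cross-year case is a made-up value for illustration.

```python
# Illustrative sketch of the season-DOY system under the assumptions in the
# lead-in; signatures are simplified and not the repository's actual API.
from datetime import date, timedelta
from typing import List, Sequence


def season_start_date(harvest_year: int, month: int, day: int, year_offset: int) -> date:
    """Calendar date of season DOY 1 for a given harvest year."""
    return date(harvest_year + year_offset, month, day)


def season_doy_to_date(start: date, sdoy: int) -> date:
    """Season DOY 1 is the start date itself, so subtract one day."""
    return start + timedelta(days=sdoy - 1)


def hindcast_init_dates(start: date, init_sdoys: Sequence[int]) -> List[date]:
    """Map the init-date grid of season-DOYs to calendar dates."""
    return [season_doy_to_date(start, d) for d in init_sdoys]


# US corn: season starts Apr-1 of the harvest year (offset 0).
us_start = season_start_date(2023, month=4, day=1, year_offset=0)
us_harvest = season_doy_to_date(us_start, 184)        # 2023-10-01
us_inits = hindcast_init_dates(us_start, (1, 8, 15))  # Apr 1, Apr 8, Apr 15

# Cross-year crop: offset -1 places the start in the year before harvest.
# The Sep-1 start is a hypothetical value, not taken from the Brazil config.
cross_start = season_start_date(2024, month=9, day=1, year_offset=-1)  # 2023-09-01
```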

PRs and commits

  • PR #353 — Brazil soybean config introduced season_start_year_offset: -1, actuals_source_short, and actuals_source_label fields.
  • PR #345 (PR-345.md) — country_code made mandatory and normalised; experiment_key property derived from it.

Open questions

  • Wheat sub-types (WINTER_WHEAT, SPRING_DURUM_WHEAT, etc.) are listed in config comments, but the NASS preprocessor only produces crop_type: WHEAT. This mismatch is documented as a known issue in DOMAIN_MODEL2.md §9 and in the project memory.