Entity: Commodity

Definition

An agricultural crop being modelled: corn, soybeans, wheat, cotton, or Brazilian soybeans (configured as commodity "soybeans" with country_code "BRA"). Commodity is the root discriminator for all crop-specific constants: crop calendar, feature columns, delivery units, plausibility bounds, and the ADM0 country scope. It is not represented as a standalone class; its identity lives in the CommodityConfig.commodity: str field, and all derived constants are co-located on CommodityConfig.

Kind

Value object (string field on a frozen Pydantic model). The canonical container is CommodityConfig (config.py:274). Commodity itself has no class; it is the name for the identity dimension of CommodityConfig.

Source of truth

market_insights_models/src/commodity_hindcast/config.py:274 (CommodityConfig class declaration). The commodity identity field is at config.py:279. The country_code that pairs with it to form experiment_key is at config.py:289.

Key attributes / structure

| Field | Type | Notes |
| --- | --- | --- |
| commodity | str | Primary identity; lower-case slug, e.g. "corn", "soybeans", "wheat", "cotton" |
| country_code | str | ISO-3 country code (validated, uppercased); default "USA"; pairs with commodity to form experiment_key |
| season_start | MonthDay | First calendar day of the growing season |
| season_start_year_offset | int | 0 for same-year crops; -1 for cross-year crops (winter wheat, Brazil soy) |
| harvest_season_doy | int | Season DOY of harvest; upper bound of the growing window |
| hindcast_init_season_doys | tuple[int, ...] | Season DOYs for the weekly init-date grid |
| yield_range | tuple[float, float] | Plausibility bounds in delivery units; enforced at delivery/ boundary |
| delivery_unit | str | "bu_acre" for grains; "lbs_acre" for cotton |
| bushel_weight_lbs | float | Commodity-specific bushel weight for kg/ha ↔ bu/ac conversion; 1.0 for cotton |
| feature_cols | list[str] | Ordered list of weather/climo/NDVI columns fed to the regressor |
| target_col | str | Training target column (typically "yield_kg_ha") |
| actuals_source_short | str | Label for the truth source (default "NASS"; "CONAB" for Brazil) |

Known instances (5 active commodities):

| YAML config | commodity | country_code | Delivery unit |
| --- | --- | --- | --- |
| configs/corn_usa.yaml | corn | USA | bu_acre |
| configs/soybeans_usa.yaml | soybeans | USA | bu_acre |
| configs/wheat_usa.yaml | wheat | USA | bu_acre |
| configs/cotton_usa.yaml | cotton | USA | lbs_acre |
| configs/soybeans_bra.yaml | soybeans | BRA | bu_acre |
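
Because two of these rows share the commodity string "soybeans", a run is identified uniquely only by the (commodity, country_code) pair. The sketch below illustrates this using the five rows listed above. The `experiment_key` helper and its `commodity_country` string format are assumptions made for illustration; the actual format produced by CommodityConfig.experiment_key is not documented here and may differ.

```python
from collections import Counter

# Rows copied from the known-instances table: (commodity, country_code, delivery_unit).
KNOWN_INSTANCES = [
    ("corn", "USA", "bu_acre"),
    ("soybeans", "USA", "bu_acre"),
    ("wheat", "USA", "bu_acre"),
    ("cotton", "USA", "lbs_acre"),
    ("soybeans", "BRA", "bu_acre"),
]


def experiment_key(commodity: str, country_code: str) -> str:
    """Hypothetical key format; the real experiment_key may be built differently."""
    return f"{commodity}_{country_code.lower()}"


# Commodity strings that appear in more than one config.
commodity_counts = Counter(commodity for commodity, _, _ in KNOWN_INSTANCES)
ambiguous = sorted(c for c, n in commodity_counts.items() if n > 1)

# Keys built from the pair stay unique across all five configs.
keys = [experiment_key(c, cc) for c, cc, _ in KNOWN_INSTANCES]

print("ambiguous commodity strings:", ambiguous)
print("distinct keys:", len(set(keys)))
```

Filtering on the commodity string alone would therefore mix US and Brazil soybean runs; filtering on the pair (or on a key derived from it) would not.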

Lifecycle

Created: Parsed from a commodity YAML config at ExperimentConfig construction time. The _inject_builder_type_from_key validator (config.py:354) auto-wires the builders dict keys. country_code is validated and uppercased at load time.

Consumed: Every pipeline stage receives the parent ExperimentConfig; all commodity-specific dispatch (calendar, unit conversion, feature column selection) keys on CommodityConfig. make_geo_identifier (lib/geo/identifiers.py:207) uses country_code to anchor the ADM0 segment of every GeoIdentifier.

Destroyed: Never destroyed; CommodityConfig is frozen and immutable for the lifetime of a run.
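
The load-time normalisation and the immutability described above can be sketched as follows. This is a minimal illustration that uses a frozen standard-library dataclass as a stand-in for the frozen Pydantic model. `CommodityConfigSketch` and `load_commodity` are hypothetical names, only two fields are modelled, and the ISO-3 check is a simple shape check (three ASCII letters), which may be looser than the real validator.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class CommodityConfigSketch:
    """Stand-in for CommodityConfig; only two of its fields are modelled."""

    commodity: str
    country_code: str = "USA"

    def __post_init__(self) -> None:
        code = self.country_code.strip().upper()
        # Shape check only: three ASCII letters. The real validator may be stricter.
        if not (len(code) == 3 and code.isascii() and code.isalpha()):
            raise ValueError(
                f"country_code must be an ISO-3 code, got {self.country_code!r}"
            )
        # A frozen dataclass blocks normal assignment, so the normalised value
        # is stored via object.__setattr__ during construction.
        object.__setattr__(self, "country_code", code)


def load_commodity(raw: dict) -> CommodityConfigSketch:
    """Build the config from an already-parsed YAML mapping (hypothetical helper)."""
    return CommodityConfigSketch(**raw)


cfg = load_commodity({"commodity": "soybeans", "country_code": "bra"})
print(cfg.country_code)  # normalised to upper case

try:
    cfg.country_code = "USA"  # mutation after construction is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
print("mutated:", mutated)
```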

Relationships to other entities

  • SeasonYear — governs — CommodityConfig.season_start_date(season_year) converts a SeasonYear integer to a calendar anchor
  • InitDate — generates — CommodityConfig.hindcast_init_dates(season_year) returns the full weekly init-date grid for any season year
  • Region — scopes — country_code determines the ADM0 segment of every GeoIdentifier minted for this commodity's features
  • Yield — defines bounds for — yield_range and delivery_unit govern unit conversion and plausibility clamping at delivery
  • Fold — implicitly scoped by — each fold's filesystem path includes CommodityConfig.experiment_key as a directory component
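
The SeasonYear and InitDate relationships above can be sketched as date arithmetic. The sketch rests on several assumptions that the source does not confirm: that MonthDay is a (month, day) pair, that season_start_year_offset is added to the season year to get the calendar year of the season start, and that season DOY 1 is the season start date itself. `CalendarSketch` is a hypothetical stand-in, and the example values are illustrative rather than taken from any real config.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class MonthDay:
    month: int
    day: int


@dataclass(frozen=True)
class CalendarSketch:
    """Hypothetical stand-in for the calendar fields of CommodityConfig."""

    season_start: MonthDay
    season_start_year_offset: int
    hindcast_init_season_doys: tuple[int, ...]

    def season_start_date(self, season_year: int) -> date:
        # Cross-year crops (offset -1) start in the calendar year before season_year.
        return date(
            season_year + self.season_start_year_offset,
            self.season_start.month,
            self.season_start.day,
        )

    def hindcast_init_dates(self, season_year: int) -> list[date]:
        start = self.season_start_date(season_year)
        # Assumption: season DOY 1 is the season start date itself.
        return [start + timedelta(days=doy - 1) for doy in self.hindcast_init_season_doys]


# Illustrative values only, not taken from any real config.
cross_year = CalendarSketch(MonthDay(10, 1), -1, (1, 8, 15))
same_year = CalendarSketch(MonthDay(4, 1), 0, (1, 8, 15))

print(cross_year.season_start_date(2024))  # 2023-10-01
print(same_year.season_start_date(2024))   # 2024-04-01
print(cross_year.hindcast_init_dates(2024))
```

Expressing init dates as season DOYs rather than calendar dates keeps the weekly grid valid for cross-year crops, where a calendar day-of-year would wrap at 31 December.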

Concepts and pipelines that touch this entity

  • Pipeline: hindcast (P5) — commodity is the top-level discriminator for every stage
  • Pipeline: forecast (P5) — ExperimentConfig.init_dates_for dispatches on commodity's hindcast_init_season_doys
  • Concept: unit conversion (P5) — lib/unit_utils.py uses bushel_weight_lbs from CommodityConfig
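
The unit conversion concept can be sketched as follows. This is not the code in lib/unit_utils.py; the function names are hypothetical, and the bushel weights used in the example (56 lb for corn, 60 lb for soybeans) are the standard US values, assumed here rather than read from the configs. With bushel_weight_lbs set to 1.0, as for cotton, the same formula yields lb/ac. The `clamp_to_range` helper illustrates one reading of "plausibility clamping" against yield_range, using made-up bounds.

```python
LBS_PER_KG = 2.20462262185
ACRES_PER_HECTARE = 2.47105381467


def kg_ha_to_delivery(yield_kg_ha: float, bushel_weight_lbs: float) -> float:
    """Convert kg/ha to bu/ac (or to lb/ac when bushel_weight_lbs is 1.0)."""
    lbs_per_acre = yield_kg_ha * LBS_PER_KG / ACRES_PER_HECTARE
    return lbs_per_acre / bushel_weight_lbs


def delivery_to_kg_ha(value: float, bushel_weight_lbs: float) -> float:
    """Inverse of kg_ha_to_delivery."""
    lbs_per_acre = value * bushel_weight_lbs
    return lbs_per_acre * ACRES_PER_HECTARE / LBS_PER_KG


def clamp_to_range(value: float, yield_range: tuple[float, float]) -> float:
    """Clamp a delivery-unit yield into (low, high) plausibility bounds."""
    low, high = yield_range
    return min(max(value, low), high)


corn_bu_ac = kg_ha_to_delivery(10_000.0, bushel_weight_lbs=56.0)
soy_kg_ha_per_bu_ac = delivery_to_kg_ha(1.0, bushel_weight_lbs=60.0)
cotton_lbs_ac = kg_ha_to_delivery(1_000.0, bushel_weight_lbs=1.0)

print(f"corn: {corn_bu_ac:.2f} bu/ac")
print(f"soybeans: 1 bu/ac = {soy_kg_ha_per_bu_ac:.2f} kg/ha")
print(f"cotton: {cotton_lbs_ac:.2f} lb/ac")
print(clamp_to_range(400.0, (0.0, 300.0)))  # hypothetical bounds
```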

PRs and commits

  • PR-360 — Adds configs/soybeans_bra.yaml; introduces country_code as a required field on CommodityConfig to disambiguate US vs Brazil soybean runs; fixes a silent factor-67 unit bug in CONAB yield loading
  • PR-339 — Package restructure that moved CommodityConfig to its canonical location in config.py

Open questions

  • Should Commodity be promoted to a proper NewType or Enum so that downstream code receives type-safe dispatch rather than raw str comparisons?
  • The actuals_source_short / actuals_source_label split is partly cosmetic; is there a cleaner encoding that ties the label to the ReferenceYieldSpec discriminator?
  • Wheat sub-type labels (WINTER_WHEAT, etc.) appear in config notes but are never produced by the NASS preprocessor — this is an open issue documented in DOMAIN_MODEL2.md §9.
  • brazil_soybean vs soybeans + BRA naming: two configs use the commodity string "soybeans" with different country_code values, rather than a separate commodity slug. This may cause confusion when filtering by commodity string alone.
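
For the first open question, one possible shape of an Enum-based encoding is sketched below. It is an illustration of the option being asked about, not a decision or an existing class: `Commodity` and `parse_commodity` are hypothetical, and the member values simply mirror the commodity strings used by the current configs. Mixing in `str` keeps members equal to the raw strings, so existing string comparisons would continue to work.

```python
from enum import Enum


class Commodity(str, Enum):
    """Hypothetical enum mirroring the commodity strings in the current configs."""

    CORN = "corn"
    SOYBEANS = "soybeans"
    WHEAT = "wheat"
    COTTON = "cotton"


def parse_commodity(raw: str) -> Commodity:
    """Convert a raw config string to a Commodity, failing loudly on unknown slugs."""
    try:
        return Commodity(raw)
    except ValueError:
        allowed = ", ".join(c.value for c in Commodity)
        raise ValueError(
            f"unknown commodity {raw!r}; expected one of: {allowed}"
        ) from None


print(parse_commodity("corn") is Commodity.CORN)
print(Commodity.SOYBEANS == "soybeans")  # str mixin keeps raw comparisons working
```

Under this encoding a slug such as "brazil_soybean" would be rejected at parse time, which makes the soybeans + BRA convention explicit instead of leaving it to naming discipline.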