Cotton USA Experiment Config

Top-level fields

| Field | Value | Meaning |
| --- | --- | --- |
| random_seed | 42 | Global RNG seed |
| mlflow_tracking_uri | sqlite:///mlruns.db | Local SQLite MLflow store |
| experiment_name | cotton_yield_prediction | MLflow experiment name |
| feature_start_year | 1980 | Earliest feature year |
| feature_end_year | 2025 | Latest feature year |
| check_data_exists | ["data_tmi/features"] | Preflight check: the features directory must exist (other configs use []) |
| commodity.commodity | cotton | Internal commodity key |
| commodity.country_code | USA | ISO 3166-1 alpha-3 country code |
| commodity.season_start | month 4, day 1 | Season epoch: 1 Apr (same calendar as corn) |
| commodity.season_start_year_offset | 0 | Same-year crop |
| commodity.harvest_season_doy | 184 | Season DOY of harvest |
| commodity.bushel_weight_lbs | 1.0 | Set to 1.0 so the internal bu/acre column stores lbs/acre directly; cotton has no standard bushel |
| commodity.delivery_unit | lbs_acre | Pounds per acre; the only config that departs from bu_acre |
| commodity.yield_range | [400.0, 1200.0] | Sanity bounds in lbs/acre at delivery |
| commodity.freeze_cap_sdoy | 184 | Weather features capped at season DOY 184 |
| experiment_protocol.cv_strategy | expanding | Walk-forward expanding CV |
| experiment_protocol.test_years | 2020–2024 | Hold-out years |
| experiment_protocol.production_cumulative_threshold | 0.95 | Keep the counties that make up the top 95% of cumulative recent production |
| experiment_protocol.production_recent_years | 5 | Number of recent years used for the production ranking |
| model.detrend | partial_pooling | Partial-pooling detrender |
| model.detrend_params | {} | No fixed slope; the trend slope is fitted freely from data |
| model.regression | pca_ridge | PCA(2) → Ridge |
| model.weather_correction_fit_level | ADM0 | National-level weather correction |
| model.regression_params.weather_correction_weight | (commented out) | No explicit ramp weight; a comment notes 0.3 as a reference value, but it is not active |
| model.regression_params.n_components | 2 | PCA components |
| model.regression_params.alpha | 10.0 | Ridge regularisation |
| model.regression_params.nan_policy | raise | Regressors reject NaNs |
| forecast.residual_mode | hindcast_oos_per_init_date | Conformal calibration mode |
| postprocess.bias_corrector.kind | none | No bias correction |
| delivery.model_public_name | TFFS_V0 | Public model label |
| delivery.ci_levels | [0.5, 0.68, 0.80, 0.90, 0.95] | Conformal interval levels |
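To make the expanding walk-forward protocol concrete, the sketch below enumerates the train/test splits implied by test_years 2020–2024. It assumes each hold-out year is trained on every year from feature_start_year up to the year before it; the function name expanding_splits is hypothetical and is not taken from the codebase, whose actual split logic may differ.

```python
from typing import Iterator


def expanding_splits(
    first_year: int, test_years: list[int]
) -> Iterator[tuple[list[int], int]]:
    """Yield (train_years, test_year) pairs for walk-forward expanding CV.

    Assumption: the training window always starts at first_year and grows
    by one year per fold. This is an illustration, not the pipeline's code.
    """
    for test_year in sorted(test_years):
        yield list(range(first_year, test_year)), test_year


# Values taken from the config table: feature_start_year and test_years.
splits = list(expanding_splits(1980, [2020, 2021, 2022, 2023, 2024]))
for train_years, test_year in splits:
    print(
        f"test {test_year}: train {train_years[0]}-{train_years[-1]} "
        f"({len(train_years)} years)"
    )
```

Under this assumption no fold ever trains on its own test year or on any later year, which is the property that distinguishes walk-forward CV from shuffled k-fold.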

Builders

| Builder | Kind | Notable parameters |
| --- | --- | --- |
| yields | YieldsBuilder | filepath: data/nass/preprocessed_cotton.parquet; crop_type: COTTON_UPLAND; edits: panel null-impute area (3-yr trailing median), deductive yield impute; required_for_pred_parquet: false |
| weather | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/indices/conus_adm2_cotton.zarr; required_for_pred_parquet: true |
| weather_stress | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_conus_cotton_ytd_stress.zarr; required_for_pred_parquet: true |
| climo | ClimoBuilder | filepath: data/weather/climo_indices/conus_adm2.zarr (local path); required_for_pred_parquet: true |

Cotton has no stress parquet builder (the legacy NASS-stress assemble path used by corn/wheat is not wired for cotton); the weather_stress builder reads directly from the cotton-specific YTD stress zarr.
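The weather builders use an {env} placeholder in their S3 paths, while the climo builder uses a plain local path. A minimal sketch of how such a template could be resolved follows; it assumes plain str.format substitution, and both the helper name resolve_path and the environment name "dev" are hypothetical.

```python
WEATHER_TEMPLATE = (
    "s3://{env}-treefera-greenprint-data/weather/processed/indices/"
    "conus_adm2_cotton.zarr"
)
CLIMO_PATH = "data/weather/climo_indices/conus_adm2.zarr"


def resolve_path(template: str, env: str) -> str:
    """Fill the {env} placeholder; paths without a placeholder pass through.

    Assumption: the pipeline substitutes {env} like str.format does.
    """
    return template.format(env=env)


# "dev" is an illustrative environment name, not one confirmed by the config.
weather_path = resolve_path(WEATHER_TEMPLATE, "dev")
climo_path = resolve_path(CLIMO_PATH, "dev")
print(weather_path)
print(climo_path)
```

The local climo path comes back unchanged, so the same resolver can be applied to every builder without special-casing.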

Weather and climo windows

Climo windows (cotton_usa.yaml:commodity.climo_windows):

| Name | sdoy_start | sdoy_end | Calendar |
| --- | --- | --- | --- |
| gstd | 1 | null | Full growing season to date |
| apr_jul | 1 | 122 | Apr 1 – Jul 31 (fixed) |

Weather windows (cotton_usa.yaml:commodity.weather_windows) — four phenological phases matching QUBE cotton stages:

| Name | sdoy_start | sdoy_end | Calendar | Phase |
| --- | --- | --- | --- | --- |
| apr_may | 1 | 61 | Apr 1 – May 31 | Establishment |
| jun | 62 | 91 | Jun 1 – Jun 30 | Square/bloom |
| jul | 92 | 122 | Jul 1 – Jul 31 | Continued boll set |
| aug_sep | 123 | 183 | Aug 1 – Sep 30 | Boll fill |

Unlike corn, cotton defines an aug_sep weather window.
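The season-DOY (sdoy) bounds above can be checked against the calendar column with a few lines of Python. The sketch assumes sdoy is 1-based with sdoy 1 falling on the season start (1 Apr) and that window bounds are inclusive, which is what the tables imply; the helper names are hypothetical.

```python
from datetime import date, timedelta

SEASON_START = (4, 1)  # commodity.season_start: month 4, day 1

WEATHER_WINDOWS = {
    "apr_may": (1, 61),
    "jun": (62, 91),
    "jul": (92, 122),
    "aug_sep": (123, 183),
}


def sdoy_to_date(season_year: int, sdoy: int) -> date:
    """Convert a 1-based season DOY to a calendar date (sdoy 1 = season start)."""
    month, day = SEASON_START
    return date(season_year, month, day) + timedelta(days=sdoy - 1)


def windows_are_contiguous(windows: dict[str, tuple[int, int]]) -> bool:
    """True if each window starts the day after the previous one ends."""
    spans = sorted(windows.values())
    return all(
        prev_end + 1 == start
        for (_, prev_end), (start, _) in zip(spans, spans[1:])
    )


for name, (start, end) in WEATHER_WINDOWS.items():
    print(name, sdoy_to_date(2024, start), sdoy_to_date(2024, end))
print("contiguous:", windows_are_contiguous(WEATHER_WINDOWS))
print("harvest / freeze cap:", sdoy_to_date(2024, 184))
```

Because the season starts after February, the mapping is the same in leap and non-leap years. Under these assumptions sdoy 184 (harvest_season_doy and freeze_cap_sdoy) lands on 1 Oct, the day after the aug_sep window closes.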

Feature columns

The config lists 20 features drawn from the QUBE COTTON_EB_NATIONAL_WX_95_EXP_RAMP_NO_FIXED_SLOPE spec (cotton_usa.yaml:commodity.feature_cols). There are no stress-lag features. The set includes edd_aug_sep and precip_aug_sep for the boll-fill phase; both are absent from the corn feature set.

Reference data

| Field | Value |
| --- | --- |
| kind | wasde |
| name | wasde |
| filepath | data/wasde/wasde_cotton_us_yield.csv |
| commodity | cotton |
| geography | united_states |
| cutoff_month_day | month 2, day 1 |
| unit | bu_acre (label only; values are lbs/acre) |

Note: the WASDE reference unit field is listed as bu_acre but the actual cotton WASDE yield series is in lbs/acre. This is a label inconsistency in the config (cotton_usa.yaml:reference_data[0].unit).

Forecast paths

| Field | Value |
| --- | --- |
| raw_obs_filepath | (not specified in this config; no forecast: block) |
| materialised_climo_filepath | (not specified) |

Cotton does not define a forecast: block, indicating operational forecast mode is not yet configured for this commodity.

What makes this config distinctive

Cotton is the only commodity to use delivery_unit: lbs_acre. It correspondingly sets bushel_weight_lbs: 1.0, so the kg/ha ↔ delivery-unit conversion divides by a unit weight and passes lbs/acre straight through rather than performing a true bushel conversion. It also sets detrend_params: {} (no fixed slope). The climo builder reads from a local (non-S3-template) zarr path, whereas the weather and weather_stress builders use S3 templates, which suggests the config was set up at least partly against a local dev/data environment rather than purely for the cloud pipeline. The cotton config also carries the only non-empty check_data_exists list among the US crops (data_tmi/features).
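The pass-through effect of bushel_weight_lbs: 1.0 can be illustrated with a short sketch. It assumes the pipeline converts kg/ha to lbs/acre and then divides by the bushel weight; the function names are hypothetical, and the 56 lbs corn bushel is the standard US figure, used here only for contrast.

```python
KG_PER_LB = 0.45359237       # exact: international avoirdupois pound
HA_PER_ACRE = 0.40468564224  # exact: international acre


def kg_ha_to_delivery(yield_kg_ha: float, bushel_weight_lbs: float) -> float:
    """Convert kg/ha to 'bushels' per acre for a given bushel weight.

    Assumption: the conversion goes via lbs/acre; with a bushel weight of
    1.0 the result is simply lbs/acre.
    """
    lbs_per_acre = yield_kg_ha / KG_PER_LB * HA_PER_ACRE
    return lbs_per_acre / bushel_weight_lbs


def in_yield_range(value: float, bounds: tuple[float, float]) -> bool:
    """Sanity check against commodity.yield_range (inclusive bounds assumed)."""
    low, high = bounds
    return low <= value <= high


cotton = kg_ha_to_delivery(1000.0, bushel_weight_lbs=1.0)   # lbs/acre
corn = kg_ha_to_delivery(1000.0, bushel_weight_lbs=56.0)    # bu/acre
print(f"cotton: {cotton:.1f} lbs/acre, in range: "
      f"{in_yield_range(cotton, (400.0, 1200.0))}")
print(f"corn:   {corn:.2f} bu/acre")
```

With a unit bushel weight, the same code path serves both bushel-based crops and cotton, at the cost of a column that is named for bushels but holds pounds.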

Cross-references