Cotton USA Experiment Config
Top-level fields
| Field | Value | Meaning |
|---|---|---|
| random_seed | 42 | Global RNG seed |
| mlflow_tracking_uri | sqlite:///mlruns.db | Local SQLite MLflow store |
| experiment_name | cotton_yield_prediction | MLflow experiment name |
| feature_start_year | 1980 | Earliest feature year |
| feature_end_year | 2025 | Latest feature year |
| check_data_exists | ["data_tmi/features"] | Preflight check: the features directory must exist (other configs use []) |
| commodity.commodity | cotton | Internal commodity key |
| commodity.country_code | USA | ISO 3166-1 alpha-3 code |
| commodity.season_start | month 4, day 1 | Season epoch: 1 Apr (same calendar as corn) |
| commodity.season_start_year_offset | 0 | Same-year crop |
| commodity.harvest_season_doy | 184 | Season DOY of harvest (Oct 1) |
| commodity.bushel_weight_lbs | 1.0 | Set to 1.0 so the internal bu/acre column stores lbs/acre directly; cotton has no standard bushel |
| commodity.delivery_unit | lbs_acre | Pounds per acre; the only config that departs from bu_acre |
| commodity.yield_range | [400.0, 1200.0] | Sanity bounds in lbs/acre at delivery |
| commodity.freeze_cap_sdoy | 184 | Weather features capped at season DOY 184 |
| experiment_protocol.cv_strategy | expanding | Walk-forward expanding CV |
| experiment_protocol.test_years | 2020–2024 | Hold-out years |
| experiment_protocol.production_cumulative_threshold | 0.95 | Keep the top counties covering 95% of cumulative recent production |
| experiment_protocol.production_recent_years | 5 | Years used for the production ranking |
| model.detrend | partial_pooling | Partial-pooling detrender |
| model.detrend_params | {} | No fixed slope; the trend slope is fitted freely from the data |
| model.regression | pca_ridge | PCA(2) → Ridge |
| model.weather_correction_fit_level | ADM0 | National-level weather correction |
| model.regression_params.weather_correction_weight | (commented out) | No explicit ramp weight; a comment notes 0.3 as a reference value, but it is not active |
| model.regression_params.n_components | 2 | PCA components |
| model.regression_params.alpha | 10.0 | Ridge regularisation |
| model.regression_params.nan_policy | raise | Regressors reject NaNs |
| forecast.residual_mode | hindcast_oos_per_init_date | Conformal calibration mode |
| postprocess.bias_corrector.kind | none | No bias correction |
| delivery.model_public_name | TFFS_V0 | Public model label |
| delivery.ci_levels | [0.5, 0.68, 0.80, 0.90, 0.95] | Conformal interval levels |
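To make the walk-forward protocol concrete, the sketch below enumerates the folds implied by cv_strategy: expanding, feature_start_year: 1980 and test_years 2020–2024. It assumes each hold-out year is predicted from every earlier year back to feature_start_year; the function name expanding_folds is hypothetical, and the real pipeline may build its folds differently.

```python
def expanding_folds(start_year, test_years):
    """Yield (train_years, test_year) pairs for walk-forward expanding CV.

    Assumption: the training set for a test year is every year from
    start_year up to, but not including, that test year.
    """
    for test_year in sorted(test_years):
        train_years = list(range(start_year, test_year))
        yield train_years, test_year


# Values taken from the config table: feature_start_year and test_years.
folds = list(expanding_folds(1980, range(2020, 2025)))

for train_years, test_year in folds:
    print(f"test {test_year}: train {train_years[0]}-{train_years[-1]} "
          f"({len(train_years)} years)")
```

Each successive fold adds one more year to the training set, so no hold-out year ever appears in the data used to predict it.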
Builders
| Builder | Kind | Notable parameters |
|---|---|---|
| yields | YieldsBuilder | filepath: data/nass/preprocessed_cotton.parquet; crop_type: COTTON_UPLAND; edits: panel null-impute area (3-yr trailing median), deductive yield impute; required_for_pred_parquet: false |
| weather | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/indices/conus_adm2_cotton.zarr; required_for_pred_parquet: true |
| weather_stress | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_conus_cotton_ytd_stress.zarr; required_for_pred_parquet: true |
| climo | ClimoBuilder | filepath: data/weather/climo_indices/conus_adm2.zarr (local path); required_for_pred_parquet: true |
Cotton has no stress parquet builder (the legacy NASS-stress assemble path used by corn/wheat is not wired for cotton); the weather_stress builder reads directly from the cotton-specific YTD stress zarr.
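The yields builder's "panel null-impute area (3-yr trailing median)" edit can be illustrated with a small sketch. This is one plausible reading, not the builder's actual code: a missing value is filled with the median of up to three preceding values in the same series, earlier imputed values count as history, and a gap with no history stays missing. The function name impute_trailing_median is hypothetical.

```python
from statistics import median


def impute_trailing_median(values, window=3):
    """Fill None entries with the median of up to `window` preceding values.

    Assumptions (not confirmed by the config): values are ordered by year
    for a single county, imputed values are reused as history, and a gap
    with no preceding values is left as None.
    """
    filled = []
    for value in values:
        if value is None:
            history = [v for v in filled[-window:] if v is not None]
            value = median(history) if history else None
        filled.append(value)
    return filled


# Harvested area for one county, with year 4 missing.
area = [100.0, 120.0, 110.0, None, 130.0]
print(impute_trailing_median(area))
```

In a panel setting the same function would be applied per county, so that one county's history never fills another county's gap.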
Weather and climo windows
Climo windows (cotton_usa.yaml:commodity.climo_windows):
| Name | sdoy_start | sdoy_end | Calendar |
|---|---|---|---|
| gstd | 1 | null | Full growing season to date |
| apr_jul | 1 | 122 | Apr 1 – Jul 31 (fixed) |
Weather windows (cotton_usa.yaml:commodity.weather_windows) — four phenological phases matching QUBE cotton stages:
| Name | sdoy_start | sdoy_end | Calendar | Phase |
|---|---|---|---|---|
| apr_may | 1 | 61 | Apr 1 – May 31 | Establishment |
| jun | 62 | 91 | Jun 1 – Jun 30 | Square/bloom |
| jul | 92 | 122 | Jul 1 – Jul 31 | Continued boll set |
| aug_sep | 123 | 183 | Aug 1 – Sep 30 | Boll fill |
Cotton uniquely has an aug_sep weather window absent from corn.
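The season-DOY bounds in the window tables can be checked against calendar dates with a short sketch. It assumes season DOY is 1-based and counted from commodity.season_start (month 4, day 1) in the same calendar year, consistent with season_start_year_offset: 0; the helper name sdoy_to_date is hypothetical.

```python
from datetime import date, timedelta

SEASON_START = (4, 1)  # commodity.season_start: month 4, day 1


def sdoy_to_date(sdoy, year, season_start=SEASON_START):
    """Map a 1-based season day-of-year to a calendar date.

    Assumption: sdoy 1 is the season start date itself.
    """
    month, day = season_start
    return date(year, month, day) + timedelta(days=sdoy - 1)


# Weather windows as listed in the table: name -> (sdoy_start, sdoy_end).
windows = {
    "apr_may": (1, 61),
    "jun": (62, 91),
    "jul": (92, 122),
    "aug_sep": (123, 183),
}

for name, (start, end) in windows.items():
    print(name, sdoy_to_date(start, 2023), sdoy_to_date(end, 2023))
```

Because the season starts on Apr 1, after any leap day, the mapping is the same in every year. Under the same convention, season DOY 184 (harvest_season_doy and freeze_cap_sdoy) falls on Oct 1, the day after the aug_sep window closes.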
Feature columns
The config uses 20 features drawn from the QUBE COTTON_EB_NATIONAL_WX_95_EXP_RAMP_NO_FIXED_SLOPE spec (cotton_usa.yaml:commodity.feature_cols). There are no stress-lag features. The set includes edd_aug_sep and precip_aug_sep for the boll-fill phase, which are absent from the corn feature set.
Reference data
| Field | Value |
|---|---|
| kind | wasde |
| name | wasde |
| filepath | data/wasde/wasde_cotton_us_yield.csv |
| commodity | cotton |
| geography | united_states |
| cutoff_month_day | month 2, day 1 |
| unit | bu_acre (label only; values are lbs/acre) |
Note: the WASDE reference unit field is listed as bu_acre but the actual cotton WASDE yield series is in lbs/acre. This is a label inconsistency in the config (cotton_usa.yaml:reference_data[0].unit).
Forecast paths
| Field | Value |
|---|---|
| raw_obs_filepath | (not specified in this config) |
| materialised_climo_filepath | (not specified) |
Neither forecast path is specified, although forecast.residual_mode is set in the top-level fields, so operational forecast mode does not appear to be configured yet for this commodity.
What makes this config distinctive
Cotton is the only commodity to use delivery_unit: lbs_acre. It correspondingly sets bushel_weight_lbs: 1.0, so the bushel step in the kg/ha ↔ delivery-unit conversion becomes a unit-weight pass-through rather than a true bushel conversion. It also sets detrend_params: {} (no fixed slope). The climo builder reads from a local (non-S3-template) zarr path, whereas the weather and weather_stress builders use S3 templates; the local climo path suggests the config was at least partly set up against a local dev/data environment rather than the cloud pipeline. Finally, cotton carries the only non-empty check_data_exists list among the US crops (data_tmi/features).
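The effect of bushel_weight_lbs: 1.0 can be sketched numerically. The example assumes yields are held internally in kg/ha (as the kg/ha ↔ delivery-unit wording suggests) and converted by dividing pounds per acre by the bushel weight. The function name kg_ha_to_delivery is hypothetical, and the 56 lb corn bushel is the standard US figure, used here only for contrast.

```python
KG_PER_LB = 0.45359237        # exact: definition of the avoirdupois pound
HA_PER_ACRE = 0.40468564224   # exact: international acre in hectares


def kg_ha_to_delivery(kg_ha, bushel_weight_lbs):
    """Convert kg/ha to bushels per acre for a given bushel weight.

    With bushel_weight_lbs = 1.0 the result is simply lbs/acre.
    """
    lbs_acre = kg_ha / KG_PER_LB * HA_PER_ACRE
    return lbs_acre / bushel_weight_lbs


cotton = kg_ha_to_delivery(1000.0, bushel_weight_lbs=1.0)   # lbs/acre
corn = kg_ha_to_delivery(1000.0, bushel_weight_lbs=56.0)    # bu/acre

print(f"cotton: {cotton:.1f} lbs/acre")
print(f"corn:   {corn:.2f} bu/acre")
```

Setting the bushel weight to 1.0 lets cotton reuse the same conversion code path as the bushel crops while delivering values in lbs/acre.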