Wheat USA Experiment Config
Top-level fields
| Field | Value | Meaning |
|---|---|---|
| `random_seed` | 42 | Global RNG seed |
| `mlflow_tracking_uri` | `sqlite:///mlruns.db` | Local SQLite MLflow store |
| `experiment_name` | `wheat_yield_prediction` | MLflow experiment name |
| `feature_start_year` | 1982 | Earliest feature year (two years later than other crops, because the cross-year window requires prior-year data) |
| `feature_end_year` | 2025 | Latest feature year |
| `check_data_exists` | `["features"]` | Preflight check: features directory must exist |
| `commodity.commodity` | `wheat` | Internal commodity key |
| `commodity.country_code` | `USA` | ISO-3 code |
| `commodity.season_start` | month 10, day 1 | Season epoch: 1 Oct of the prior calendar year |
| `commodity.season_start_year_offset` | -1 | Cross-year crop: season_start_date(2020) = Oct 1, 2019 |
| `commodity.harvest_season_doy` | 305 | Season-DOY of harvest (Jul 31 from Oct-1 epoch) |
| `commodity.bushel_weight_lbs` | 60.0 | Wheat: 60 lb/bushel |
| `commodity.delivery_unit` | `bu_acre` | Standard bushel delivery |
| `commodity.yield_range` | `[0.0, 260.0]` | Wide lower bound (0.0) to accommodate failed-crop years |
| `commodity.freeze_cap_sdoy` | 243 | Weather cap at season DOY 243 (May 31 from Oct-1 epoch) |
| `experiment_protocol.cv_strategy` | `expanding` | Walk-forward expanding CV |
| `experiment_protocol.test_years` | 2019–2023 | Hold-out years (one year earlier than other US crops) |
| `experiment_protocol.production_cumulative_threshold` | 0.9999 | Effectively all counties included (near-100% threshold) |
| `experiment_protocol.production_recent_years` | 5 | Years for production ranking |
| `model.detrend` | `partial_pooling` | Partial-pooling detrender |
| `model.detrend_params` | `{}` | No fixed slope |
| `model.regression` | `pca_ridge` | PCA(2) → Ridge |
| `model.weather_correction_fit_level` | `ADM0` | National-level weather correction |
| `model.regression_params.n_components` | 2 | PCA components |
| `model.regression_params.alpha` | 10.0 | Ridge regularisation |
| `model.regression_params.nan_policy` | `raise` | Regressors reject NaNs |
| `forecast.residual_mode` | `hindcast_oos_per_init_date` | Conformal calibration mode |
| `postprocess.bias_corrector.kind` | `none` | No bias correction |
| `delivery.model_public_name` | `TFFS_V0` | Public model label |
| `delivery.ci_levels` | `[0.5, 0.68, 0.80, 0.90, 0.95]` | Conformal interval levels |
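
The cross-year season fields amount to simple date arithmetic. The sketch below is an illustration only, not the pipeline's code: `season_start_date` is the name used in the table, while `sdoy_to_date` and the plain calendar-day counting are assumptions. The source does not state how leap days are handled, so the actual day-level mapping may differ by a day.

```python
from datetime import date, timedelta

# Values taken from the commodity.* rows of the table.
SEASON_START_MONTH = 10
SEASON_START_DAY = 1
SEASON_START_YEAR_OFFSET = -1


def season_start_date(harvest_year: int) -> date:
    """Season epoch for a harvest year: Oct 1 of the prior calendar year."""
    return date(harvest_year + SEASON_START_YEAR_OFFSET,
                SEASON_START_MONTH, SEASON_START_DAY)


def sdoy_to_date(harvest_year: int, sdoy: int) -> date:
    """Hypothetical helper: map a 1-based season-DOY to a calendar date."""
    return season_start_date(harvest_year) + timedelta(days=sdoy - 1)


print(season_start_date(2020))   # 2019-10-01
print(sdoy_to_date(2021, 243))   # 2021-05-31 (freeze_cap_sdoy, non-leap harvest year)
```

With the offset of -1, every harvest year draws on weather from the preceding calendar year, which is why the first usable feature year is later than for single-year crops.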
Builders
| Builder | Kind | Notable parameters |
|---|---|---|
| `yields` | `YieldsBuilder` | filepath: data/nass/wheat.parquet; crop_type: WINTER_WHEAT; 4 edit rules (see below); required_for_pred_parquet: false |
| `stress` | `StressBuilder` | filepath: data/stress/preprocessed_wheat_stress.parquet; assembled from conus_adm2_wheat.zarr (S3); gs_start_doy 305, gs_end_doy 213 (cross-year window); rename_map adds 4 lagged stress features; required_for_pred_parquet: true |
| `weather` | `WeatherBuilder` | filepath: s3://{env}-treefera-greenprint-data/weather/processed/indices/conus_adm2_wheat.zarr; required_for_pred_parquet: true |
| `weather_stress` | `WeatherBuilder` | filepath: s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_conus_wheat_ytd_stress.zarr; required_for_pred_parquet: true; currently degraded by upstream geoid-mismatch on z_PW_wheat_gdd_cumsum until the upstream zarr is rebuilt |
| `climo` | `ClimoBuilder` | filepath: s3://{env}-treefera-greenprint-data/weather/processed/climo_indices/conus_adm2.zarr; required_for_pred_parquet: true |
Wheat yield edit rules
Wheat has the most extensive edit chain of all US commodities (`wheat_usa.yaml:commodity.builders.yields.edits`):

- `impute_null_area_from_own_history`: panel null-impute with a 5-year lookback (other commodities use 3).
- `impute_null_yield_from_prod_area`: deductive impute, `yield = prod / area`.
- `yield_matches_prod_over_area`: ratio edit that fires when the published yield differs from prod/area by more than 2×; the value is replaced with the deductive impute.
- `yield_within_plausible_range`: range edit that flags (does NOT impute) rows above 15,000 kg/ha (~223 bu/ac).

The range edit uses `on_fail: operation: flag`, which makes it uniquely conservative among all configs.
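
The ~223 bu/ac figure and the behaviour of the yield-related rules can be checked with a small sketch. Only the rule names, the 60 lb/bushel weight, the 15,000 kg/ha bound and the 2× tolerance come from the config. The function names, the units of `prod` and `area`, and the reading of "differs by more than 2×" as a ratio outside [1/2, 2] are assumptions, and the area-imputation rule is left out.

```python
LB_PER_KG = 1 / 0.45359237      # exact definition of the pound
HA_PER_ACRE = 0.40468564224     # exact definition of the acre
BUSHEL_WEIGHT_LBS = 60.0        # commodity.bushel_weight_lbs
MAX_YIELD_KG_HA = 15_000.0      # bound used by yield_within_plausible_range
RATIO_LIMIT = 2.0               # tolerance of yield_matches_prod_over_area


def kg_ha_to_bu_ac(kg_ha: float) -> float:
    """Convert kg/ha to bushels per acre for wheat."""
    return kg_ha * LB_PER_KG * HA_PER_ACRE / BUSHEL_WEIGHT_LBS


def edit_yield(published, prod, area):
    """Return (yield_kg_ha, flagged) after the three yield-related rules.

    Assumed units: published yield in kg/ha, prod in kg, area in ha.
    """
    implied = prod / area
    if published is None:
        # impute_null_yield_from_prod_area: fill a missing yield deductively.
        value = implied
    else:
        # yield_matches_prod_over_area: replace when the ratio is outside [1/2, 2].
        ratio = published / implied
        off = ratio > RATIO_LIMIT or ratio < 1 / RATIO_LIMIT
        value = implied if off else published
    # yield_within_plausible_range: flag only, never overwrite the value.
    flagged = value > MAX_YIELD_KG_HA
    return value, flagged


print(round(kg_ha_to_bu_ac(MAX_YIELD_KG_HA), 1))   # 223.0
```

Returning the flag alongside the unchanged value mirrors the "flag, do not impute" behaviour described for the range edit.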
Wheat stress rename map
stress_score: stress_score_lag1
z_GS_EDD: z_gs_edd_lag1
z_GS_GDD: z_gs_gdd_lag1
z_GS_Ptotal: z_gs_ptotal_lag1
Wheat exposes 4 lagged stress features (vs corn's 2), adding `z_gs_gdd_lag1` and `z_gs_ptotal_lag1`. None of the four appears in the active `feature_cols` list, however: the config comment explains that the `*_zscore_gstd` and lag-1 stress features were removed because of NaN and constant-value issues.
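
As an illustration of what the rename map does, the sketch below applies it to a single record of builder output. Only the four key/value pairs come from the config; the plain-dict representation, the `geoid` column and all values are made up for the example.

```python
# rename_map as listed above (source column -> lagged feature name).
RENAME_MAP = {
    "stress_score": "stress_score_lag1",
    "z_GS_EDD": "z_gs_edd_lag1",
    "z_GS_GDD": "z_gs_gdd_lag1",
    "z_GS_Ptotal": "z_gs_ptotal_lag1",
}


def apply_rename(record: dict) -> dict:
    """Rename mapped keys and leave every other key untouched."""
    return {RENAME_MAP.get(key, key): value for key, value in record.items()}


# Hypothetical record; the numbers are placeholders, not real data.
raw = {"geoid": "20001", "stress_score": 0.4, "z_GS_EDD": -0.2,
       "z_GS_GDD": 0.1, "z_GS_Ptotal": 1.3}
renamed = apply_rename(raw)
print(sorted(renamed))
```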
Weather and climo windows
Climo windows (wheat_usa.yaml:commodity.climo_windows):
| Name | sdoy_start | sdoy_end | Calendar |
|---|---|---|---|
| `gstd` | 1 | null | Full growing season to date (Oct 1 onwards) |
| `apr_jul` | 183 | 273 | Apr 1 – Jun 30 in wheat's Oct-1 epoch |
Note: the wheat `apr_jul` climo window has a non-zero `sdoy_start` (183), so it is a fixed mid-season window, unlike the corn/soybean `apr_jul` window, which always starts at sdoy 1.
Weather windows (wheat_usa.yaml:commodity.weather_windows) — four phenological phases across the cross-year season:
| Name | sdoy_start | sdoy_end | Calendar | Phase |
|---|---|---|---|---|
| `oct_dec` | 1 | 92 | Oct 1 – Dec 31 | Fall establishment |
| `jan_mar` | 93 | 182 | Jan 1 – Mar 31 | Winter dormancy |
| `apr_may` | 183 | 243 | Apr 1 – May 31 | Spring greenup |
| `jun_jul` | 244 | 305 | Jun 1 – Jul 31 | Grain fill through harvest |
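
The four weather windows tile the season without gaps or overlaps, which can be verified directly. A minimal sketch, with the bounds copied from the table and `window_for` being a hypothetical lookup helper:

```python
# (name, sdoy_start, sdoy_end) with inclusive bounds, from the weather windows table.
WEATHER_WINDOWS = [
    ("oct_dec", 1, 92),
    ("jan_mar", 93, 182),
    ("apr_may", 183, 243),
    ("jun_jul", 244, 305),
]


def window_for(sdoy: int) -> str:
    """Return the name of the window that contains a season-DOY."""
    for name, start, end in WEATHER_WINDOWS:
        if start <= sdoy <= end:
            return name
    raise ValueError(f"sdoy {sdoy} is outside the 1..305 season")


# Each window starts the day after the previous one ends.
for (_, _, prev_end), (_, next_start, _) in zip(WEATHER_WINDOWS, WEATHER_WINDOWS[1:]):
    assert next_start == prev_end + 1

print(window_for(100))   # jan_mar
print(window_for(243))   # apr_may
```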
Feature columns
11 features (`wheat_usa.yaml:commodity.feature_cols`), a reduced set compared to the other US crops. Key differences:

- No `*_zscore_gstd` features (the cross-year gstd window wraps the calendar-year boundary, producing all-NaN values in the current climo zarr).
- No stress-lag features (they are constant across init_dates within a harvest year and were excluded to avoid pinning each year to a baseline).
- Phase-specific accumulations cover three of the four phenological windows: `edd_oct_dec`, `precip_oct_dec`, `edd_jan_mar`, `precip_jan_mar`, `edd_apr_may`, `precip_apr_may`. No `jun_jul` accumulations are listed.
- Z-score features are restricted to the `apr_jul` window: `gdd_zscore_apr_jul`, `tavg_zscore_apr_jul`, `dtr_zscore_apr_jul`, `dry_days_zscore_apr_jul`, `precip_zscore_apr_jul`.
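
The count of 11 can be reproduced from the naming pattern. The sketch below rebuilds the list from the feature names given in the bullets; the construction is only an illustration, since the config lists the columns explicitly.

```python
ACCUMULATION_WINDOWS = ["oct_dec", "jan_mar", "apr_may"]
ACCUMULATION_VARS = ["edd", "precip"]
ZSCORE_VARS = ["gdd", "tavg", "dtr", "dry_days", "precip"]

# Six phase-specific accumulations followed by five apr_jul z-scores.
feature_cols = [
    f"{var}_{window}"
    for window in ACCUMULATION_WINDOWS
    for var in ACCUMULATION_VARS
] + [f"{var}_zscore_apr_jul" for var in ZSCORE_VARS]

# Feature families that the config comment says were removed.
excluded = [c for c in feature_cols
            if c.endswith("_zscore_gstd") or c.endswith("_lag1")]

print(len(feature_cols))   # 11
print(excluded)            # []
```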
Season-DOY weather ramp
A 6-point ramp calibrated to phenological milestones (wheat_usa.yaml:model.regression_params.season_doy_weather_weight):
| sdoy | Calendar | Weight |
|---|---|---|
| 1 | Oct 1 | 0.0 |
| 92 | Dec 31 | 0.2 |
| 182 | Mar 31 | 0.4 |
| 243 | May 31 | 0.7 |
| 273 | Jun 30 | 0.9 |
| 305 | Jul 31 | 1.0 |
Apart from the starting knot at sdoy 1, each knot falls on the end of a weather or climo window defined above, which grounds the ramp in the crop calendar.
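
The table gives the weight only at the six knots. The sketch below assumes piecewise-linear interpolation between them, which the source does not confirm; `weather_weight` is a hypothetical helper, and only the knot values come from the config.

```python
from bisect import bisect_right

# (sdoy, weight) knots from season_doy_weather_weight.
RAMP = [(1, 0.0), (92, 0.2), (182, 0.4), (243, 0.7), (273, 0.9), (305, 1.0)]


def weather_weight(sdoy: float) -> float:
    """Linearly interpolate the ramp and clamp outside the knot range."""
    if sdoy <= RAMP[0][0]:
        return RAMP[0][1]
    if sdoy >= RAMP[-1][0]:
        return RAMP[-1][1]
    # Index of the first knot strictly to the right of sdoy.
    i = bisect_right([s for s, _ in RAMP], sdoy)
    (s0, w0), (s1, w1) = RAMP[i - 1], RAMP[i]
    return w0 + (w1 - w0) * (sdoy - s0) / (s1 - s0)


print(weather_weight(243))             # 0.7
print(round(weather_weight(258), 3))   # 0.8
```

Under this assumption the weight rises fastest between sdoy 182 and 273, where 0.5 of the total is added in 91 days.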
Reference data
| Field | Value |
|---|---|
| `kind` | `wasde` |
| `name` | `wasde` |
| `filepath` | `data/wasde/wasde_wheat_us_yield.csv` |
| `commodity` | `wheat` |
| `geography` | `united_states` |
| `cutoff_month_day` | month 2, day 1 |
| `unit` | `bu_acre` |
Forecast paths
| Field | Value |
|---|---|
| `raw_obs_filepath` | `s3://{env}-treefera-greenprint-data/weather/processed/areal_aggregation/conus_adm2.zarr` |
| `materialised_climo_filepath` | `s3://{env}-treefera-greenprint-data/weather/processed/climatology/conus_adm2_baseline_1980_2025_w31_materialised.zarr` |
Wheat reuses the commodity-agnostic CONUS ADM2 zarrs (the same ones corn uses). Both paths are S3 templates with an `{env}` placeholder.
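
The `{env}` placeholder matches Python's `str.format` field syntax, so resolving a path can be sketched as below. The environment name `dev` is a made-up example, and the source does not state whether the pipeline uses `str.format` or another templating mechanism.

```python
RAW_OBS_TEMPLATE = (
    "s3://{env}-treefera-greenprint-data/weather/processed/"
    "areal_aggregation/conus_adm2.zarr"
)


def resolve_path(template: str, env: str) -> str:
    """Fill the {env} placeholder in an S3 path template."""
    return template.format(env=env)


# "dev" is a hypothetical environment name.
print(resolve_path(RAW_OBS_TEMPLATE, "dev"))
# s3://dev-treefera-greenprint-data/weather/processed/areal_aggregation/conus_adm2.zarr
```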
What makes this config distinctive
Winter wheat is the only cross-year US crop (`season_start_year_offset: -1`): its season starts on Oct 1 of the prior calendar year and runs through Jul 31 of the harvest year, roughly 10 months. This forces `feature_start_year: 1982` (two years later than the other crops), changes the climo window definition (the `apr_jul` window has a non-zero `sdoy_start`), and motivates a 4-phase weather window scheme covering fall through harvest. `production_cumulative_threshold: 0.9999` effectively includes all wheat counties, reflecting the more dispersed US winter-wheat geography. The stress builder registers 4 lagged features, but none are active in `feature_cols` because of known data-quality issues (the cross-year climo NaN pathology).
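
To see why a 0.9999 threshold keeps nearly every county, consider one plausible reading of the selection rule: rank counties by recent production and keep them until the cumulative share reaches the threshold. The sketch below implements that assumed rule with invented numbers; the actual selection logic and the `select_counties` name are not taken from the source.

```python
def select_counties(production: dict, threshold: float) -> list:
    """Keep the largest producers until their cumulative share >= threshold."""
    total = sum(production.values())
    selected, cumulative = [], 0.0
    ranked = sorted(production.items(), key=lambda kv: kv[1], reverse=True)
    for county, value in ranked:
        if cumulative >= threshold * total:
            break
        selected.append(county)
        cumulative += value
    return selected


# Invented mean production over the last production_recent_years (5) years.
production = {"A": 700.0, "B": 250.0, "C": 49.0, "D": 1.0}

print(select_counties(production, 0.9))      # ['A', 'B']
print(select_counties(production, 0.9999))   # ['A', 'B', 'C', 'D']
```

In this toy case even a county holding 0.1% of production survives the 0.9999 cut, whereas a 0.9 threshold would drop the two smallest counties.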