Wheat USA Experiment Config

Top-level fields

| Field | Value | Meaning |
| --- | --- | --- |
| random_seed | 42 | Global RNG seed |
| mlflow_tracking_uri | sqlite:///mlruns.db | Local SQLite MLflow store |
| experiment_name | wheat_yield_prediction | MLflow experiment name |
| feature_start_year | 1982 | Earliest feature year (two years later than other crops; the cross-year window requires prior-year data) |
| feature_end_year | 2025 | Latest feature year |
| check_data_exists | ["features"] | Preflight check: features directory must exist |
| commodity.commodity | wheat | Internal commodity key |
| commodity.country_code | USA | ISO-3 code |
| commodity.season_start | month 10, day 1 | Season epoch: 1 Oct of the prior calendar year |
| commodity.season_start_year_offset | -1 | Cross-year crop: season_start_date(2020) = Oct 1, 2019 |
| commodity.harvest_season_doy | 305 | Season-DOY of harvest (Jul 31 from Oct-1 epoch) |
| commodity.bushel_weight_lbs | 60.0 | Wheat: 60 lb/bushel |
| commodity.delivery_unit | bu_acre | Standard bushel delivery |
| commodity.yield_range | [0.0, 260.0] | Wide lower bound (0.0) to accommodate failed-crop years |
| commodity.freeze_cap_sdoy | 243 | Weather cap at season DOY 243 (May 31 from Oct-1 epoch) |
| experiment_protocol.cv_strategy | expanding | Walk-forward expanding CV |
| experiment_protocol.test_years | 2019–2023 | Hold-out years (one year earlier than other US crops) |
| experiment_protocol.production_cumulative_threshold | 0.9999 | Effectively all counties included (near-100% threshold) |
| experiment_protocol.production_recent_years | 5 | Years for production ranking |
| model.detrend | partial_pooling | Partial-pooling detrender |
| model.detrend_params | {} | No fixed slope |
| model.regression | pca_ridge | PCA(2) → Ridge |
| model.weather_correction_fit_level | ADM0 | National-level weather correction |
| model.regression_params.n_components | 2 | PCA components |
| model.regression_params.alpha | 10.0 | Ridge regularisation |
| model.regression_params.nan_policy | raise | Regressors reject NaNs |
| forecast.residual_mode | hindcast_oos_per_init_date | Conformal calibration mode |
| postprocess.bias_corrector.kind | none | No bias correction |
| delivery.model_public_name | TFFS_V0 | Public model label |
| delivery.ci_levels | [0.5, 0.68, 0.80, 0.90, 0.95] | Conformal interval levels |
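
The season-epoch fields can be hard to reason about by eye, so here is a minimal sketch of how a season DOY might map to a calendar date. It assumes sdoy is 1-based with sdoy 1 falling on the season start date; the helper names sdoy_to_date and date_to_sdoy are hypothetical, and the project's own season_start_date may be implemented differently.

```python
from datetime import date, timedelta

# Values copied from the commodity.* fields documented above.
SEASON_START_MONTH = 10
SEASON_START_DAY = 1
SEASON_START_YEAR_OFFSET = -1


def season_start_date(harvest_year: int) -> date:
    """Return the season epoch (sdoy 1) for a given harvest year."""
    return date(
        harvest_year + SEASON_START_YEAR_OFFSET,
        SEASON_START_MONTH,
        SEASON_START_DAY,
    )


def sdoy_to_date(harvest_year: int, sdoy: int) -> date:
    """Convert a 1-based season DOY to a calendar date (assumed convention)."""
    if sdoy < 1:
        raise ValueError("sdoy is assumed to be 1-based")
    return season_start_date(harvest_year) + timedelta(days=sdoy - 1)


def date_to_sdoy(harvest_year: int, day: date) -> int:
    """Inverse of sdoy_to_date."""
    return (day - season_start_date(harvest_year)).days + 1


print(season_start_date(2020))   # season epoch for harvest year 2020
print(sdoy_to_date(2021, 243))   # freeze_cap_sdoy in a non-leap harvest year
print(sdoy_to_date(2020, 305))   # harvest_season_doy in a leap harvest year
```

Under this convention sdoy 243 lands on May 31 in a non-leap harvest year, while sdoy 305 lands on Jul 31 only when the season contains Feb 29 (for example harvest year 2020) and on Aug 1 otherwise. Whether the codebase normalises leap days is not stated here, so treat the calendar labels as approximate to within one day.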

Builders

| Builder | Kind | Notable parameters |
| --- | --- | --- |
| yields | YieldsBuilder | filepath: data/nass/wheat.parquet; crop_type: WINTER_WHEAT; 4 edit rules (see below); required_for_pred_parquet: false |
| stress | StressBuilder | filepath: data/stress/preprocessed_wheat_stress.parquet; assembled from conus_adm2_wheat.zarr (S3); gs_start_doy 305, gs_end_doy 213 (cross-year window); rename_map adds 4 lagged stress features; required_for_pred_parquet: true |
| weather | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/indices/conus_adm2_wheat.zarr; required_for_pred_parquet: true |
| weather_stress | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_conus_wheat_ytd_stress.zarr; required_for_pred_parquet: true; currently degraded by upstream geoid-mismatch on z_PW_wheat_gdd_cumsum until the upstream zarr is rebuilt |
| climo | ClimoBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/climo_indices/conus_adm2.zarr; required_for_pred_parquet: true |

Wheat yield edit rules

Wheat has the most extensive edit chain of all US commodities (wheat_usa.yaml:commodity.builders.yields.edits):

  1. impute_null_area_from_own_history — panel null-impute with 5-year lookback (others use 3).
  2. impute_null_yield_from_prod_area — deductive impute: yield = prod / area.
  3. yield_matches_prod_over_area — ratio edit: fires when published yield differs from prod/area by >2×; replaces with deductive impute.
  4. yield_within_plausible_range — range edit: flags (does NOT impute) rows above 15,000 kg/ha (~223 bu/ac).

The range edit uses on_fail: operation: flag — uniquely conservative among all configs.
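
To make the edit chain concrete, the following is a rough sketch of rules 2 to 4 applied to a single record. It is not the project's implementation: the record layout (prod in kg, area in ha, yield in kg/ha), the function names, and the reading of ">2×" as a ratio above 2 in either direction are all assumptions. Rule 1 needs panel history and is left out.

```python
LB_PER_KG = 2.20462262185
ACRES_PER_HA = 2.47105381467
BUSHEL_WEIGHT_LBS = 60.0      # commodity.bushel_weight_lbs
MAX_YIELD_KG_HA = 15_000.0    # threshold quoted for rule 4
RATIO_TOLERANCE = 2.0         # ">2x" threshold quoted for rule 3


def kg_ha_to_bu_acre(kg_ha, bushel_weight_lbs=BUSHEL_WEIGHT_LBS):
    """Convert a yield in kg/ha to bu/ac."""
    return kg_ha * LB_PER_KG / bushel_weight_lbs / ACRES_PER_HA


def apply_yield_edits(row):
    """Apply rules 2-4 to one record (hypothetical keys: prod, area, yield)."""
    out = dict(row)
    out["flags"] = []
    prod, area, yld = out.get("prod"), out.get("area"), out.get("yield")
    deductive = prod / area if prod is not None and area else None

    if yld is None and deductive is not None:
        # Rule 2: deductive impute, yield = prod / area.
        out["yield"] = deductive
    elif yld and deductive and yld > 0 and deductive > 0:
        # Rule 3: ratio edit, replace when the two disagree by more than 2x.
        if max(yld / deductive, deductive / yld) > RATIO_TOLERANCE:
            out["yield"] = deductive

    # Rule 4: range edit, flag only; the value is left untouched.
    final = out.get("yield")
    if final is not None and final > MAX_YIELD_KG_HA:
        out["flags"].append("yield_within_plausible_range")
    return out


print(round(kg_ha_to_bu_acre(MAX_YIELD_KG_HA), 1))
print(apply_yield_edits({"prod": 6000.0, "area": 2.0, "yield": 9000.0}))
```

The conversion helper reproduces the figure quoted for rule 4: 15,000 kg/ha at 60 lb/bushel is roughly 223 bu/ac.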

Wheat stress rename map

stress_score: stress_score_lag1
z_GS_EDD: z_gs_edd_lag1
z_GS_GDD: z_gs_gdd_lag1
z_GS_Ptotal: z_gs_ptotal_lag1

Wheat exposes 4 lagged stress features (vs corn's 2), adding z_gs_gdd_lag1 and z_gs_ptotal_lag1. However, none of the four is present in the active feature_cols list: the config comment explains that the *_zscore_gstd and lag-1 stress features were removed due to NaN and constant-value issues, respectively.
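
As an illustration of what the rename map does, the sketch below applies it to the keys of a plain dict. The function name rename_stress_columns and the geoid key are hypothetical; the real StressBuilder presumably renames columns of a dataframe or dataset instead.

```python
# Mapping copied from the rename map above.
STRESS_RENAME_MAP = {
    "stress_score": "stress_score_lag1",
    "z_GS_EDD": "z_gs_edd_lag1",
    "z_GS_GDD": "z_gs_gdd_lag1",
    "z_GS_Ptotal": "z_gs_ptotal_lag1",
}


def rename_stress_columns(record):
    """Rename mapped keys; keys not in the map pass through unchanged."""
    return {STRESS_RENAME_MAP.get(key, key): value for key, value in record.items()}


print(rename_stress_columns({"geoid": "20001", "z_GS_EDD": 0.5}))
```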

Weather and climo windows

Climo windows (wheat_usa.yaml:commodity.climo_windows):

| Name | sdoy_start | sdoy_end | Calendar |
| --- | --- | --- | --- |
| gstd | 1 | null | Full growing season to date (Oct 1 onwards) |
| apr_jul | 183 | 273 | Apr 1 – Jun 30 in wheat's Oct-1 epoch |

Note: the wheat apr_jul climo window has an sdoy_start greater than 1 (183), so it is a fixed mid-season window, unlike the corn/soybean apr_jul, which always starts at sdoy 1.

Weather windows (wheat_usa.yaml:commodity.weather_windows) — four phenological phases across the cross-year season:

| Name | sdoy_start | sdoy_end | Calendar | Phase |
| --- | --- | --- | --- | --- |
| oct_dec | 1 | 92 | Oct 1 – Dec 31 | Fall establishment |
| jan_mar | 93 | 182 | Jan 1 – Mar 31 | Winter dormancy |
| apr_may | 183 | 243 | Apr 1 – May 31 | Spring greenup |
| jun_jul | 244 | 305 | Jun 1 – Jul 31 | Grain fill through harvest |
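
The four weather windows tile the season without gaps, which the short sketch below checks. It assumes both bounds are inclusive; the helper names are hypothetical and not taken from the codebase.

```python
# (name, sdoy_start, sdoy_end) copied from the weather windows table above.
WEATHER_WINDOWS = [
    ("oct_dec", 1, 92),
    ("jan_mar", 93, 182),
    ("apr_may", 183, 243),
    ("jun_jul", 244, 305),
]


def window_for_sdoy(sdoy):
    """Return the window containing sdoy, or None if it lies outside the season."""
    for name, start, end in WEATHER_WINDOWS:
        if start <= sdoy <= end:
            return name
    return None


def windows_are_contiguous(windows):
    """True when each window starts the day after the previous one ends."""
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(windows, windows[1:]))


def completed_windows(sdoy):
    """Names of windows that have fully elapsed by the given sdoy."""
    return [name for name, _, end in WEATHER_WINDOWS if end <= sdoy]


print(window_for_sdoy(200))
print(windows_are_contiguous(WEATHER_WINDOWS))
print(completed_windows(243))
```

At the freeze cap (sdoy 243) three of the four windows are complete; jun_jul is the only one still open.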

Feature columns

11 features (wheat_usa.yaml:commodity.feature_cols) — a reduced set compared to the other US crops. Key differences:

  • No *_zscore_gstd features (cross-year gstd window wraps calendar year boundaries, producing all-NaN values in the current climo zarr).
  • No stress-lag features (constant across init_dates within a harvest year; excluded to avoid pinning each year to a baseline).
  • Phase-specific accumulations cover the first three of the four phenological windows (there are no jun_jul features): edd_oct_dec, precip_oct_dec, edd_jan_mar, precip_jan_mar, edd_apr_may, precip_apr_may.
  • Z-score features restricted to apr_jul window only: gdd_zscore_apr_jul, tavg_zscore_apr_jul, dtr_zscore_apr_jul, dry_days_zscore_apr_jul, precip_zscore_apr_jul.
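
A quick way to sanity-check the feature list is to rebuild it from its naming pattern and assert the exclusions described above. This is only a sketch: the column order in wheat_usa.yaml may differ, and validate_feature_cols is a hypothetical helper.

```python
# Names reconstructed from the two feature groups listed above.
PHASES = ["oct_dec", "jan_mar", "apr_may"]
PHASE_FEATURES = [f"{var}_{phase}" for phase in PHASES for var in ("edd", "precip")]
ZSCORE_FEATURES = [
    f"{var}_zscore_apr_jul"
    for var in ("gdd", "tavg", "dtr", "dry_days", "precip")
]
FEATURE_COLS = PHASE_FEATURES + ZSCORE_FEATURES


def validate_feature_cols(cols):
    """Return a list of problems; an empty list means the set looks consistent."""
    problems = []
    if len(set(cols)) != len(cols):
        problems.append("duplicate feature names")
    for col in cols:
        if col.endswith("_zscore_gstd"):
            problems.append(f"{col}: gstd z-scores are excluded for wheat")
        if col.endswith("_lag1"):
            problems.append(f"{col}: lag-1 stress features are excluded for wheat")
    return problems


print(len(FEATURE_COLS))
print(validate_feature_cols(FEATURE_COLS))
```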

Season-DOY weather ramp

A 6-point ramp calibrated to phenological milestones (wheat_usa.yaml:model.regression_params.season_doy_weather_weight):

| sdoy | Calendar | Weight |
| --- | --- | --- |
| 1 | Oct 1 | 0.0 |
| 92 | Dec 31 | 0.2 |
| 182 | Mar 31 | 0.4 |
| 243 | May 31 | 0.7 |
| 273 | Jun 30 | 0.9 |
| 305 | Jul 31 | 1.0 |

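Each breakpoint after sdoy 1 coincides with the end of a window defined above: 92, 182, 243 and 305 close the four weather windows, and 273 closes the apr_jul climo window. The ramp is therefore grounded in the crop calendar rather than in arbitrary dates.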
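
The table only lists breakpoints, so the weight between them depends on how the model interpolates. The sketch below assumes piecewise-linear interpolation with clamping at both ends; that behaviour is an assumption, and the function name weather_weight is hypothetical.

```python
from bisect import bisect_right

# (sdoy, weight) breakpoints copied from the ramp table above.
RAMP = [(1, 0.0), (92, 0.2), (182, 0.4), (243, 0.7), (273, 0.9), (305, 1.0)]


def weather_weight(sdoy):
    """Linearly interpolate the weather weight at sdoy (assumed behaviour)."""
    xs = [x for x, _ in RAMP]
    if sdoy <= xs[0]:
        return RAMP[0][1]
    if sdoy >= xs[-1]:
        return RAMP[-1][1]
    i = bisect_right(xs, sdoy)          # index of the first breakpoint after sdoy
    (x0, y0), (x1, y1) = RAMP[i - 1], RAMP[i]
    return y0 + (y1 - y0) * (sdoy - x0) / (x1 - x0)


for day in (1, 92, 243, 258, 305):
    print(day, round(weather_weight(day), 3))
```

If the model instead holds the weight constant between breakpoints, only the final return statement would change.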

Reference data

| Field | Value |
| --- | --- |
| kind | wasde |
| name | wasde |
| filepath | data/wasde/wasde_wheat_us_yield.csv |
| commodity | wheat |
| geography | united_states |
| cutoff_month_day | month 2, day 1 |
| unit | bu_acre |

Forecast paths

| Field | Value |
| --- | --- |
| raw_obs_filepath | s3://{env}-treefera-greenprint-data/weather/processed/areal_aggregation/conus_adm2.zarr |
| materialised_climo_filepath | s3://{env}-treefera-greenprint-data/weather/processed/climatology/conus_adm2_baseline_1980_2025_w31_materialised.zarr |

Wheat reuses the commodity-agnostic CONUS ADM2 zarrs (same as corn). Both paths are S3 URIs containing an {env} placeholder.
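
How the {env} placeholder is resolved is not described here. One plausible approach, shown purely as a sketch, is plain string formatting; the helper resolve_env and the environment name "dev" are hypothetical.

```python
# Template copied from raw_obs_filepath above.
RAW_OBS_TEMPLATE = (
    "s3://{env}-treefera-greenprint-data/weather/processed/"
    "areal_aggregation/conus_adm2.zarr"
)


def resolve_env(template, env):
    """Substitute the {env} placeholder; reject an empty environment name."""
    if not env:
        raise ValueError("env must be a non-empty string")
    return template.format(env=env)


print(resolve_env(RAW_OBS_TEMPLATE, "dev"))  # "dev" is an illustrative value only
```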

What makes this config distinctive

Winter wheat is the only cross-year US crop (season_start_year_offset: -1): its season starts on 1 Oct of the prior calendar year and runs through Jul 31 of the harvest year, spanning roughly 10 months. This forces feature_start_year: 1982 (two years later than the other crops), changes the climo window definition (the apr_jul window starts at sdoy 183 rather than sdoy 1), and motivates a 4-phase weather window scheme covering fall establishment through harvest. The production_cumulative_threshold of 0.9999 effectively includes all wheat counties, reflecting the more dispersed US winter-wheat geography. The stress builder registers 4 lagged features, but none is active in feature_cols: the lag-1 stress features are constant across init_dates within a harvest year, and the *_zscore_gstd climo features were dropped separately because the cross-year gstd window yields all-NaN values in the current climo zarr.
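
To see why a 0.9999 threshold keeps nearly every county, consider the sketch below. It assumes the threshold works by ranking counties on recent production and keeping them until their cumulative share reaches the threshold; that selection rule, the function name and the toy numbers are assumptions rather than details taken from the codebase.

```python
def select_counties(production_by_county, threshold=0.9999):
    """Keep the largest producers until their cumulative share reaches threshold.

    production_by_county maps a county id to its production over the ranking
    period (for example the mean of the last production_recent_years years).
    """
    total = sum(production_by_county.values())
    if total <= 0:
        return []
    ranked = sorted(production_by_county.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for county, production in ranked:
        if cumulative / total >= threshold:
            break
        selected.append(county)
        cumulative += production
    return selected


toy = {"A": 70.0, "B": 20.0, "C": 9.0, "D": 1.0}  # illustrative values only
print(select_counties(toy, threshold=0.9))
print(select_counties(toy, threshold=0.9999))
```

With the toy data, a 0.9 threshold drops the two smallest producers, while 0.9999 keeps all four.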

Cross-references