
Corn USA Experiment Config

Top-level fields

| Field | Value | Meaning |
| --- | --- | --- |
| random_seed | 42 | Global RNG seed for reproducibility |
| mlflow_tracking_uri | sqlite:///mlruns.db | Local SQLite MLflow store relative to data_root |
| experiment_name | corn_yield_prediction | MLflow experiment name |
| feature_start_year | 1980 | Earliest year for which feature rows are built |
| feature_end_year | 2026 | Latest year (inclusive of current operational year) |
| check_data_exists | [] | No extra preflight checks; reference_data paths are checked automatically via ResolvablePath walker |
| commodity.commodity | corn | Internal commodity key |
| commodity.country_code | USA | ISO-3 code; threaded into every geo_identifier |
| commodity.season_start | month 4, day 1 | Season epoch: 1 Apr |
| commodity.season_start_year_offset | 0 | Same-year crop; season and harvest year are identical |
| commodity.harvest_season_doy | 184 | Season-DOY of harvest (1 Oct from Apr-1 epoch) |
| commodity.bushel_weight_lbs | 56.0 | Corn: 56 lb/bushel for kg/ha ↔ bu/acre conversion |
| commodity.delivery_unit | bu_acre | Unit used in delivery CSVs |
| commodity.yield_range | [50.0, 250.0] | Sanity bounds in bu/acre at delivery |
| commodity.freeze_cap_sdoy | 184 | Weather features capped at 1 Oct (season DOY 184) |
| experiment_protocol.cv_strategy | expanding | Walk-forward expanding-window CV |
| experiment_protocol.test_years | 2020–2024 | Hold-out years for OOS evaluation |
| experiment_protocol.production_cumulative_threshold | 0.95 | Top-95% counties by recent production kept |
| experiment_protocol.production_recent_years | 5 | Years used to rank counties |
| model.detrend | partial_pooling | Partial-pooling detrender |
| model.detrend_params.detrend_fixed_slope_bu_ac | 1.6 | Forced trend slope in bu/acre/yr |
| model.regression | pca_ridge | PCA(2) → Ridge regression |
| model.weather_correction_fit_level | ADM0 | National-level weather correction fit |
| model.regression_params.weather_correction_weight | 0.3 | Blend weight for national weather correction |
| model.regression_params.n_components | 2 | PCA components retained |
| model.regression_params.alpha | 10.0 | Ridge regularisation strength |
| model.regression_params.nan_policy | raise | Regressors reject NaNs; filling happens upstream |
| forecast.residual_mode | hindcast_oos_per_init_date | Conformal calibration against per-init-date OOS residuals |
| postprocess.bias_corrector.kind | none | No bias correction applied (top-95% universe is large enough) |
| delivery.model_public_name | TFFS_V0 | Public model label in delivery artefacts |
| delivery.ci_levels | [0.5, 0.68, 0.80, 0.90, 0.95] | Conformal interval levels exported |
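
Two conventions in the table above are easy to get wrong by a day or a unit: the 1-based season DOY counted from the 1 Apr epoch, and the bu/acre ↔ kg/ha conversion at 56 lb/bushel. The following is a minimal sketch of both, assuming a season DOY of 1 maps to the epoch date itself; the function names are hypothetical and are not taken from the codebase.

```python
from datetime import date, timedelta

# Values copied from the config table above.
SEASON_START_MONTH, SEASON_START_DAY = 4, 1  # commodity.season_start
BUSHEL_WEIGHT_LBS = 56.0                     # commodity.bushel_weight_lbs

# Exact unit definitions.
KG_PER_LB = 0.45359237
HA_PER_ACRE = 0.40468564224


def season_doy_to_date(season_year: int, sdoy: int) -> date:
    """Calendar date of a 1-based season DOY (assumed: sdoy 1 is the epoch)."""
    epoch = date(season_year, SEASON_START_MONTH, SEASON_START_DAY)
    return epoch + timedelta(days=sdoy - 1)


def bu_acre_to_kg_ha(bu_acre: float) -> float:
    """Convert a yield in bu/acre to kg/ha using the corn bushel weight."""
    return bu_acre * BUSHEL_WEIGHT_LBS * KG_PER_LB / HA_PER_ACRE


def kg_ha_to_bu_acre(kg_ha: float) -> float:
    """Inverse of bu_acre_to_kg_ha."""
    return kg_ha * HA_PER_ACRE / (BUSHEL_WEIGHT_LBS * KG_PER_LB)


# Season DOY 184 (harvest_season_doy and freeze_cap_sdoy) falls on 1 Oct.
harvest = season_doy_to_date(2024, 184)
```

Because the epoch falls after February, the mapping from season DOY to calendar date is the same in leap and non-leap years.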

Builders

| Builder | Kind | Notable parameters |
| --- | --- | --- |
| yields | YieldsBuilder | filepath: data/nass/preprocessed_corn.parquet; edits: panel null-impute area (3-yr trailing median), deductive yield impute from prod/area; required_for_pred_parquet: false |
| stress | StressBuilder | filepath: data/stress/preprocessed_corn_stress.parquet; assembled from conus_adm2_corn.zarr (S3); gs DOYs 91–334; baseline 1980–2010; rename_map: stress_score → stress_score_lag1, z_GS_EDD → z_gs_edd_lag1; required_for_pred_parquet: true |
| weather | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/indices/conus_adm2_corn.zarr; format zarr; required_for_pred_parquet: true |
| weather_stress | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_corn_ytd_stress.zarr; required_for_pred_parquet: true |
| climo | ClimoBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/climo_indices/conus_adm2.zarr; required_for_pred_parquet: true |

Weather and climo windows

Climo windows (corn_usa.yaml:commodity.climo_windows):

| Name | sdoy_start | sdoy_end | Calendar |
| --- | --- | --- | --- |
| gstd | 1 | null | Full growing season to date (progressive) |
| apr_jul | 1 | 122 | Apr 1 – Jul 31 (fixed) |

Weather windows (corn_usa.yaml:commodity.weather_windows):

| Name | sdoy_start | sdoy_end | Calendar |
| --- | --- | --- | --- |
| apr_may | 1 | 61 | Apr 1 – May 31 |
| jun | 62 | 91 | Jun 1 – Jun 30 |
| jul | 92 | 122 | Jul 1 – Jul 31 |
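
The window bounds above are inclusive season DOYs, and a null sdoy_end marks a progressive window. A minimal sketch of how such bounds could be resolved follows; it assumes a null end means "up to the current season DOY", and resolve_window is a hypothetical helper rather than a function from the codebase.

```python
from typing import Optional

# (sdoy_start, sdoy_end) pairs copied from the two tables above.
CLIMO_WINDOWS = {"gstd": (1, None), "apr_jul": (1, 122)}
WEATHER_WINDOWS = {"apr_may": (1, 61), "jun": (62, 91), "jul": (92, 122)}


def resolve_window(sdoy_start: int, sdoy_end: Optional[int], as_of_sdoy: int) -> range:
    """Season DOYs covered by a window, with both bounds inclusive.

    Assumption: a null (None) end is read as "to date", i.e. as_of_sdoy.
    Fixed windows are returned as configured, regardless of as_of_sdoy.
    """
    end = as_of_sdoy if sdoy_end is None else sdoy_end
    return range(sdoy_start, end + 1)


# The three weather windows tile the fixed apr_jul climo window exactly.
tiled = [d for name in ("apr_may", "jun", "jul")
         for d in resolve_window(*WEATHER_WINDOWS[name], as_of_sdoy=122)]
apr_jul = list(resolve_window(*CLIMO_WINDOWS["apr_jul"], as_of_sdoy=122))
```

The window lengths that fall out of this (61, 30 and 31 days) match the calendar months listed in the table.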

Feature columns

The 20 features are drawn from the QUBE EB_NATIONAL_WX_95_SBC_EXP_RAMP spec (corn_usa.yaml:commodity.feature_cols):

  • Z-score features: dry_days_zscore_{apr_jul,gstd}, dtr_zscore_{apr_jul,gstd}, edd_zscore_{apr_jul,gstd}, gdd_zscore_{apr_jul,gstd}, precip_zscore_{apr_jul,gstd}, tavg_zscore_{apr_jul,gstd}
  • Windowed accumulations: edd_{apr_may,jun,jul}, precip_{apr_may,jun,jul}
  • Stress lag features: stress_score_lag1, z_gs_edd_lag1

Corn is the only US commodity to include stress-lag features in its feature set.
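
The brace shorthand in the list above expands to 12 z-score columns, 6 windowed accumulations and 2 stress lags. A short sketch of that expansion is given here as a way to check the count; the ordering produced is an assumption and may not match the order in corn_usa.yaml.

```python
from itertools import product

ZSCORE_VARS = ["dry_days", "dtr", "edd", "gdd", "precip", "tavg"]
CLIMO_WINDOWS = ["apr_jul", "gstd"]
ACCUM_VARS = ["edd", "precip"]
WEATHER_WINDOWS = ["apr_may", "jun", "jul"]
STRESS_LAGS = ["stress_score_lag1", "z_gs_edd_lag1"]

feature_cols = (
    # e.g. dry_days_zscore_apr_jul, dry_days_zscore_gstd, ...
    [f"{var}_zscore_{win}" for var, win in product(ZSCORE_VARS, CLIMO_WINDOWS)]
    # e.g. edd_apr_may, edd_jun, edd_jul, ...
    + [f"{var}_{win}" for var, win in product(ACCUM_VARS, WEATHER_WINDOWS)]
    + STRESS_LAGS
)
```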

Season-DOY weather ramp

The season_doy_weather_weight ramp (corn_usa.yaml:model.regression_params.season_doy_weather_weight) interpolates from 0.0 at sdoy 1 to 1.0 at sdoy 77, matching the QUBE eb_national_wx_95_sbc_exp_ramp schedule. This scales the national weather correction relative to the trend prediction as the season progresses.
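
The ramp can be sketched as a clamped interpolation between the two knots. The text above gives only the endpoints, so the linear shape between them is an assumption (the QUBE schedule may use a different curve), and the function below is illustrative rather than the pipeline's implementation.

```python
RAMP_START_SDOY = 1   # weight 0.0 at the season epoch
RAMP_END_SDOY = 77    # weight 1.0 from here onwards


def season_doy_weather_weight(sdoy: int) -> float:
    """Weather-correction weight for a season DOY.

    Assumption: linear interpolation between the knots, clamped to [0, 1]
    outside them.
    """
    if sdoy <= RAMP_START_SDOY:
        return 0.0
    if sdoy >= RAMP_END_SDOY:
        return 1.0
    return (sdoy - RAMP_START_SDOY) / (RAMP_END_SDOY - RAMP_START_SDOY)


weights = {sdoy: season_doy_weather_weight(sdoy) for sdoy in (1, 39, 77, 184)}
```

Under the linear assumption the weight reaches one half at season DOY 39 and stays at 1.0 for the rest of the season.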

Reference data

| Field | Value |
| --- | --- |
| kind | wasde |
| name | wasde |
| filepath | data/wasde/wasde_corn_us_yield.csv |
| commodity | corn |
| geography | united_states |
| cutoff_month_day | month 2, day 1 |
| unit | bu_acre |

The WASDE series provides the in-season national benchmark. The cutoff of 1 Feb prevents look-ahead: WASDE releases from February onwards are excluded when evaluating any init date before that.

Forecast paths

| Field | Value |
| --- | --- |
| raw_obs_filepath | s3://{env}-treefera-greenprint-data/weather/processed/areal_aggregation/conus_adm2.zarr |
| materialised_climo_filepath | s3://{env}-treefera-greenprint-data/weather/processed/climatology/conus_adm2_baseline_1980_2025_w31_materialised.zarr |

Both are S3 paths with {env} injected at runtime.
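
As an illustration of the {env} placeholder, the substitution can be sketched with plain string formatting. The resolve_env helper and the "dev" environment name are hypothetical; the actual resolution mechanism is not described here.

```python
RAW_OBS_TEMPLATE = (
    "s3://{env}-treefera-greenprint-data/weather/processed/"
    "areal_aggregation/conus_adm2.zarr"
)


def resolve_env(template: str, env: str) -> str:
    """Fill the {env} placeholder in a path template (hypothetical helper)."""
    return template.format(env=env)


# "dev" is an assumed environment name used only for this example.
raw_obs_path = resolve_env(RAW_OBS_TEMPLATE, "dev")
```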

What makes this config distinctive

Corn is the only US commodity with a stress builder. It reads two lagged stress features (stress_score_lag1, z_gs_edd_lag1) derived from the prior season, which adds a year-on-year carryover signal absent from the other US crops. The config also sets detrend_fixed_slope_bu_ac: 1.6 explicitly, pinning the national yield trend to a calibrated value rather than fitting it freely; the cotton and soybean configs do not do this. Finally, feature_end_year: 2026 (one year ahead of the other US configs, which use 2025) ensures the current operational season always has a row in pred.parquet.
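
To make the effect of a pinned slope concrete: with the slope fixed, only the intercept is left to estimate, and its least-squares value is the mean of yield minus slope times year. The sketch below shows that for a single series. It deliberately ignores the partial pooling across geographies that the configured detrender performs, and the function names and sample data are hypothetical.

```python
from statistics import fmean

FIXED_SLOPE_BU_AC = 1.6  # model.detrend_params.detrend_fixed_slope_bu_ac


def fit_intercept(years, yields, slope=FIXED_SLOPE_BU_AC):
    """Least-squares intercept when the trend slope is held fixed."""
    return fmean([y - slope * t for t, y in zip(years, yields)])


def detrend(years, yields, slope=FIXED_SLOPE_BU_AC):
    """Residuals around the fixed-slope trend line (bu/acre)."""
    intercept = fit_intercept(years, yields, slope)
    return [y - (intercept + slope * t) for t, y in zip(years, yields)]


# Synthetic series: a 1.6 bu/acre/yr trend plus zero-mean noise.
years = [2000, 2001, 2002, 2003, 2004]
noise = [1.0, -1.0, 0.0, 2.0, -2.0]
yields = [150.0 + 1.6 * (t - 2000) + e for t, e in zip(years, noise)]
residuals = detrend(years, yields)
```

Because the synthetic noise has zero mean and the true slope equals the pinned slope, the residuals recover the noise terms up to floating-point error.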

Cross-references