Corn USA Experiment Config
Top-level fields
| Field | Value | Meaning |
|---|---|---|
| random_seed | 42 | Global RNG seed for reproducibility |
| mlflow_tracking_uri | sqlite:///mlruns.db | Local SQLite MLflow store relative to data_root |
| experiment_name | corn_yield_prediction | MLflow experiment name |
| feature_start_year | 1980 | Earliest year for which feature rows are built |
| feature_end_year | 2026 | Latest year (inclusive of current operational year) |
| check_data_exists | [] | No extra preflight checks; reference_data paths are checked automatically via ResolvablePath walker |
| commodity.commodity | corn | Internal commodity key |
| commodity.country_code | USA | ISO-3 code; threaded into every geo_identifier |
| commodity.season_start | month 4, day 1 | Season epoch: 1 Apr |
| commodity.season_start_year_offset | 0 | Same-year crop; season and harvest year are identical |
| commodity.harvest_season_doy | 184 | Season-DOY of harvest (1 Oct from Apr-1 epoch) |
| commodity.bushel_weight_lbs | 56.0 | Corn: 56 lb/bushel for kg/ha ↔ bu/acre conversion |
| commodity.delivery_unit | bu_acre | Unit used in delivery CSVs |
| commodity.yield_range | [50.0, 250.0] | Sanity bounds in bu/acre at delivery |
| commodity.freeze_cap_sdoy | 184 | Weather features capped at 1 Oct (season DOY 184) |
| experiment_protocol.cv_strategy | expanding | Walk-forward expanding-window CV |
| experiment_protocol.test_years | 2020–2024 | Hold-out years for OOS evaluation |
| experiment_protocol.production_cumulative_threshold | 0.95 | Top-95% counties by recent production kept |
| experiment_protocol.production_recent_years | 5 | Years used to rank counties |
| model.detrend | partial_pooling | Partial-pooling detrender |
| model.detrend_params.detrend_fixed_slope_bu_ac | 1.6 | Forced trend slope in bu/acre/yr |
| model.regression | pca_ridge | PCA(2) → Ridge regression |
| model.weather_correction_fit_level | ADM0 | National-level weather correction fit |
| model.regression_params.weather_correction_weight | 0.3 | Blend weight for national weather correction |
| model.regression_params.n_components | 2 | PCA components retained |
| model.regression_params.alpha | 10.0 | Ridge regularisation strength |
| model.regression_params.nan_policy | raise | Regressors reject NaNs; filling happens upstream |
| forecast.residual_mode | hindcast_oos_per_init_date | Conformal calibration against per-init-date OOS residuals |
| postprocess.bias_corrector.kind | none | No bias correction applied (top-95% universe is large enough) |
| delivery.model_public_name | TFFS_V0 | Public model label in delivery artefacts |
| delivery.ci_levels | [0.5, 0.68, 0.80, 0.90, 0.95] | Conformal interval levels exported |

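
The bushel_weight_lbs and yield_range fields imply a unit conversion and a sanity check at delivery. The following is a minimal sketch of both, not the pipeline's own code: the function names are hypothetical, and only the 56 lb/bushel weight and the [50.0, 250.0] bounds come from the config. The pound and acre factors are the standard exact definitions.

```python
# Exact conversion factors (international pound and acre).
LB_TO_KG = 0.45359237
ACRE_TO_HA = 0.40468564224


def bu_acre_to_kg_ha(yield_bu_acre: float, bushel_weight_lbs: float = 56.0) -> float:
    """Convert a yield in bu/acre to kg/ha for a given bushel weight."""
    return yield_bu_acre * bushel_weight_lbs * LB_TO_KG / ACRE_TO_HA


def kg_ha_to_bu_acre(yield_kg_ha: float, bushel_weight_lbs: float = 56.0) -> float:
    """Inverse of bu_acre_to_kg_ha."""
    return yield_kg_ha * ACRE_TO_HA / (bushel_weight_lbs * LB_TO_KG)


def within_yield_range(yield_bu_acre: float, yield_range=(50.0, 250.0)) -> bool:
    """Hypothetical sanity check against commodity.yield_range (bu/acre)."""
    lower, upper = yield_range
    return lower <= yield_bu_acre <= upper
```

With a 56 lb bushel, 1 bu/acre works out to roughly 62.77 kg/ha, so the [50.0, 250.0] bu/acre bounds correspond to about 3.1 to 15.7 t/ha.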
Builders
| Builder | Kind | Notable parameters |
|---|---|---|
| yields | YieldsBuilder | filepath: data/nass/preprocessed_corn.parquet; edits: panel null-impute area (3-yr trailing median), deductive yield impute from prod/area; required_for_pred_parquet: false |
| stress | StressBuilder | filepath: data/stress/preprocessed_corn_stress.parquet; assembled from conus_adm2_corn.zarr (S3); gs DOYs 91–334; baseline 1980–2010; rename_map: stress_score → stress_score_lag1, z_GS_EDD → z_gs_edd_lag1; required_for_pred_parquet: true |
| weather | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/indices/conus_adm2_corn.zarr; format zarr; required_for_pred_parquet: true |
| weather_stress | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_corn_ytd_stress.zarr; required_for_pred_parquet: true |
| climo | ClimoBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/climo_indices/conus_adm2.zarr; required_for_pred_parquet: true |

Weather and climo windows
Climo windows (corn_usa.yaml:commodity.climo_windows):
| Name | sdoy_start | sdoy_end | Calendar |
|---|---|---|---|
| gstd | 1 | null | Full growing season to date (progressive) |
| apr_jul | 1 | 122 | Apr 1 – Jul 31 (fixed) |

Weather windows (corn_usa.yaml:commodity.weather_windows):
| Name | sdoy_start | sdoy_end | Calendar |
|---|---|---|---|
| apr_may | 1 | 61 | Apr 1 – May 31 |
| jun | 62 | 91 | Jun 1 – Jun 30 |
| jul | 92 | 122 | Jul 1 – Jul 31 |

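
The window boundaries above follow from counting days from the 1 Apr season epoch. As a rough illustration, the sketch below converts a calendar date to a season DOY and looks up its weather window. The helper names are hypothetical. Assigning dates before 1 Apr to the season that started in the previous calendar year is an assumption, not something the config states.

```python
from datetime import date
from typing import Optional

# Weather windows as (sdoy_start, sdoy_end), inclusive, copied from the table above.
WEATHER_WINDOWS = {"apr_may": (1, 61), "jun": (62, 91), "jul": (92, 122)}


def season_doy(d: date, start_month: int = 4, start_day: int = 1) -> int:
    """1-based day count from the season epoch (1 Apr for corn)."""
    start = date(d.year, start_month, start_day)
    if d < start:
        # Assumption: dates before the epoch belong to the previous season.
        start = date(d.year - 1, start_month, start_day)
    return (d - start).days + 1


def weather_window_for(sdoy: int) -> Optional[str]:
    """Name of the weather window containing sdoy, or None if outside all windows."""
    for name, (lo, hi) in WEATHER_WINDOWS.items():
        if lo <= sdoy <= hi:
            return name
    return None
```

For example, `season_doy(date(2024, 10, 1))` gives 184, which matches harvest_season_doy and freeze_cap_sdoy in the top-level fields.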
Feature columns
20 features drawn from QUBE EB_NATIONAL_WX_95_SBC_EXP_RAMP spec (corn_usa.yaml:commodity.feature_cols):
- Z-score features: dry_days_zscore_{apr_jul,gstd}, dtr_zscore_{apr_jul,gstd}, edd_zscore_{apr_jul,gstd}, gdd_zscore_{apr_jul,gstd}, precip_zscore_{apr_jul,gstd}, tavg_zscore_{apr_jul,gstd}
- Windowed accumulations: edd_{apr_may,jun,jul}, precip_{apr_may,jun,jul}
- Stress lag features: stress_score_lag1, z_gs_edd_lag1
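
To check that the brace shorthand above expands to 20 columns, the names can be generated explicitly. This is an illustrative sketch only: the constant and function names are hypothetical, and the actual config presumably lists the columns directly under commodity.feature_cols.

```python
from itertools import product

ZSCORE_VARS = ["dry_days", "dtr", "edd", "gdd", "precip", "tavg"]
CLIMO_WINDOWS = ["apr_jul", "gstd"]
ACCUM_VARS = ["edd", "precip"]
WEATHER_WINDOWS = ["apr_may", "jun", "jul"]
STRESS_LAG_COLS = ["stress_score_lag1", "z_gs_edd_lag1"]


def build_feature_cols() -> list:
    """Expand the shorthand into explicit column names."""
    # 6 variables x 2 climo windows = 12 z-score columns.
    zscore = [f"{var}_zscore_{win}" for var, win in product(ZSCORE_VARS, CLIMO_WINDOWS)]
    # 2 variables x 3 weather windows = 6 accumulation columns.
    accum = [f"{var}_{win}" for var, win in product(ACCUM_VARS, WEATHER_WINDOWS)]
    # Plus the 2 stress-lag columns, for 20 in total.
    return zscore + accum + STRESS_LAG_COLS
```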
Corn is the only US commodity to include stress-lag features in its feature set.
Season-DOY weather ramp
The season_doy_weather_weight ramp (corn_usa.yaml:model.regression_params.season_doy_weather_weight) interpolates from 0.0 at sdoy 1 to 1.0 at sdoy 77, matching the QUBE eb_national_wx_95_sbc_exp_ramp schedule. This scales the national weather correction relative to the trend prediction as the season progresses.
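
A minimal sketch of such a ramp is shown below. It assumes the interpolation is linear between the two anchor points and that the weight is clamped outside them; the source only states the endpoints, so the shape between them is an assumption. The function name mirrors the config key but is otherwise hypothetical.

```python
def season_doy_weather_weight(sdoy: int, ramp_start: int = 1, ramp_end: int = 77) -> float:
    """Weight in [0, 1] for the national weather correction at a given season DOY.

    Assumes linear interpolation from 0.0 at ramp_start to 1.0 at ramp_end,
    clamped outside that range.
    """
    if sdoy <= ramp_start:
        return 0.0
    if sdoy >= ramp_end:
        return 1.0
    return (sdoy - ramp_start) / (ramp_end - ramp_start)
```

Under the Apr-1 epoch, sdoy 77 falls on 16 June, so under this assumption the weather correction would reach full weight in mid-June and stay there for the rest of the season.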
Reference data
| Field | Value |
|---|---|
| kind | wasde |
| name | wasde |
| filepath | data/wasde/wasde_corn_us_yield.csv |
| commodity | corn |
| geography | united_states |
| cutoff_month_day | month 2, day 1 |
| unit | bu_acre |

The WASDE series provides the in-season national benchmark. The cutoff of 1 Feb prevents look-ahead: WASDE releases from February onwards are excluded when evaluating any init date before that.
Forecast paths
| Field | Value |
|---|---|
| raw_obs_filepath | s3://{env}-treefera-greenprint-data/weather/processed/areal_aggregation/conus_adm2.zarr |
| materialised_climo_filepath | s3://{env}-treefera-greenprint-data/weather/processed/climatology/conus_adm2_baseline_1980_2025_w31_materialised.zarr |

Both are S3 paths with {env} injected at runtime.
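
The substitution can be pictured as plain string templating. The sketch below is an assumption about the mechanism, not the project's resolver: resolve_path is a hypothetical helper, and the environment name "dev" is a placeholder.

```python
RAW_OBS_TEMPLATE = (
    "s3://{env}-treefera-greenprint-data/weather/processed/"
    "areal_aggregation/conus_adm2.zarr"
)


def resolve_path(template: str, env: str) -> str:
    """Fill the {env} placeholder in a path template (hypothetical helper)."""
    return template.format(env=env)


# "dev" is a placeholder environment name, not taken from the config.
resolved = resolve_path(RAW_OBS_TEMPLATE, "dev")
```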
What makes this config distinctive
Corn is the only US commodity with a stress builder. It reads lagged stress features (stress_score_lag1, z_gs_edd_lag1) derived from the prior season, adding a year-on-year carryover signal absent from the other US crops. It also explicitly sets detrend_fixed_slope_bu_ac: 1.6, pinning the national yield trend to a calibrated value rather than fitting it freely, which the cotton and soybean configs do not do. Setting feature_end_year: 2026 (one year ahead of the 2025 used by the other US configs) ensures the current operational season always has a row in pred.parquet.