# Soybeans USA Experiment Config
## Top-level fields
| Field | Value | Meaning |
|---|---|---|
| random_seed | 42 | Global RNG seed |
| mlflow_tracking_uri | sqlite:///mlruns.db | Local SQLite MLflow store |
| experiment_name | soybean_yield_prediction | MLflow experiment name |
| feature_start_year | 1980 | Earliest feature year |
| feature_end_year | 2025 | Latest feature year |
| check_data_exists | [] | No extra preflight checks |
| commodity.commodity | soybeans | Internal commodity key |
| commodity.country_code | USA | ISO 3166-1 alpha-3 country code |
| commodity.season_start | month 5, day 1 | Season epoch: 1 May (one month later than corn/cotton) |
| commodity.season_start_year_offset | 0 | Same-year crop |
| commodity.harvest_season_doy | 153 | Season-DOY of harvest (sdoy 153 = 30 Sep from the May-1 epoch, the last day of the aug_sep window) |
| commodity.bushel_weight_lbs | 60.0 | Soybeans: 60 lb/bushel |
| commodity.delivery_unit | bu_acre | Yield delivered in bushels per acre |
| commodity.yield_range | [15.0, 70.0] | Sanity bounds in bu/acre |
| commodity.freeze_cap_sdoy | 184 | Weather cap (sdoy 184 = 31 Oct, one month after the harvest sdoy of 153) |
| experiment_protocol.cv_strategy | expanding | Walk-forward expanding-window CV |
| experiment_protocol.test_years | 2020–2024 | Hold-out years |
| experiment_protocol.production_cumulative_threshold | 0.90 | Keep the top-ranked counties that together account for 90% of cumulative production (stricter than corn's 0.95) |
| experiment_protocol.production_recent_years | 5 | Number of recent years used to rank counties by production |
| model.detrend | partial_pooling | Partial-pooling detrender |
| model.detrend_params | {} | No fixed slope |
| model.regression | pca_ridge | PCA(2) → Ridge |
| model.weather_correction_fit_level | ADM0 | National-level weather correction |
| model.regression_params.n_components | 2 | PCA components |
| model.regression_params.alpha | 10.0 | Ridge regularisation strength |
| model.regression_params.nan_policy | raise | Regressors reject NaNs |
| forecast.residual_mode | hindcast_oos_per_init_date | Conformal calibration mode |
| postprocess.bias_corrector.kind | none | No bias correction |
| delivery.model_public_name | TFFS_V0 | Public model label |
| delivery.ci_levels | [0.5, 0.68, 0.80, 0.90, 0.95] | Conformal interval levels |
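The forecast.residual_mode and delivery.ci_levels fields together describe conformal prediction intervals. Below is a minimal split-conformal sketch, assuming symmetric intervals built from absolute out-of-sample hindcast residuals for a single init date. The helper name `conformal_halfwidth`, the symmetric score, and all numbers are illustrative assumptions, not the pipeline's confirmed method.

```python
import math


def conformal_halfwidth(residuals: list[float], level: float) -> float:
    """Half-width of a symmetric split-conformal interval at `level`.

    Hypothetical helper: assumes `residuals` are out-of-sample hindcast
    errors (observed minus predicted) for one forecast init date.
    """
    n = len(residuals)
    scores = sorted(abs(r) for r in residuals)  # nonconformity scores
    k = math.ceil((n + 1) * level)  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few residuals to certify this level
    return scores[k - 1]


CI_LEVELS = [0.5, 0.68, 0.80, 0.90, 0.95]  # delivery.ci_levels

# Made-up residuals in bu/acre, for illustration only.
residuals = [-1.0, 2.0, -3.0, 4.0, -5.0, 6.0, -7.0, 8.0, -9.0, 10.0]
point = 52.0  # made-up point forecast
for level in CI_LEVELS:
    h = conformal_halfwidth(residuals, level)
    print(f"{level:.2f}: [{point - h}, {point + h}]")
```

With only ten residuals, the 0.95 level needs rank 11 and the sketch returns an unbounded interval, so a per-init-date calibration set has to contain enough hindcast residuals to support the widest requested level.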
## Builders
| Builder | Kind | Notable parameters |
|---|---|---|
| yields | YieldsBuilder | filepath: data/nass/soybeans.parquet; edits: panel null-impute of area (3-yr trailing median), deductive yield impute; required_for_pred_parquet: false |
| weather | WeatherBuilder | filepath: data/weather/indices/conus_adm2_conus_soy.zarr (local path); required_for_pred_parquet: true |
| weather_stress | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_conus_soy_ytd_stress.zarr; required_for_pred_parquet: true |
| climo | ClimoBuilder | filepath: data/weather/climo_indices/conus_adm2.zarr (local path); required_for_pred_parquet: true |
Soybeans has no stress parquet builder: the legacy NASS-stress assemble path used by corn and wheat is not wired up for soybeans. Instead, the weather_stress builder reads directly from the soy-specific YTD stress zarr, the same setup as cotton.
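The yields builder's area edit can be pictured as a per-county gap fill. Here is a minimal sketch, assuming "3-yr trailing median" means the median of whichever observed values exist in the three preceding years, and that imputed values are not reused to fill later gaps. The function name and the year-to-area dictionary layout are hypothetical.

```python
from statistics import median
from typing import Optional


def impute_area(series: dict[int, Optional[float]], window: int = 3) -> dict[int, Optional[float]]:
    """Fill missing area for one county with a trailing median (hypothetical helper)."""
    filled = dict(series)
    for year, value in sorted(series.items()):
        if value is not None:
            continue
        # Only observed (non-null) values from the `window` preceding years count.
        prior = [series[y] for y in range(year - window, year)
                 if series.get(y) is not None]
        filled[year] = median(prior) if prior else None  # stays null with no history
    return filled


county = {2016: 90.0, 2017: 100.0, 2018: 120.0, 2019: None, 2020: 130.0}
print(impute_area(county))
```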
## Weather and climo windows
Climo windows (soybeans_usa.yaml:commodity.climo_windows):
| Name | sdoy_start | sdoy_end | Calendar |
|---|---|---|---|
| gstd | 1 | null | Full growing season to date |
| apr_jul | 1 | 122 | May 1 – Aug 30 (sdoy 122 = Aug 30; fixed early-season reference) |
Note: the climo window is named apr_jul but is anchored to the May-1 epoch, so it spans May–August rather than April–July.
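One way to read a null sdoy_end is "up to the latest season day available". The sketch below resolves both climo windows at a given point in the season, assuming windows with a fixed end are also clipped to the current season day. The helper is hypothetical, not the pipeline's code.

```python
from typing import Optional

CLIMO_WINDOWS = {"gstd": (1, None), "apr_jul": (1, 122)}  # from commodity.climo_windows


def resolve_window(start: int, end: Optional[int], current_sdoy: int) -> Optional[tuple[int, int]]:
    """Inclusive sdoy range usable at `current_sdoy`, or None if the window has not started."""
    stop = current_sdoy if end is None else min(end, current_sdoy)
    return (start, stop) if stop >= start else None


for sdoy in (100, 150):
    print(sdoy, {name: resolve_window(s, e, sdoy) for name, (s, e) in CLIMO_WINDOWS.items()})
```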
Weather windows (soybeans_usa.yaml:commodity.weather_windows):
| Name | sdoy_start | sdoy_end | Calendar | Phase |
|---|---|---|---|---|
| apr_may | 1 | 31 | May 1 – May 31 | Planting establishment |
| jun | 32 | 61 | Jun 1 – Jun 30 | Vegetative growth |
| jul | 62 | 92 | Jul 1 – Jul 31 | Flowering |
| aug_sep | 93 | 153 | Aug 1 – Sep 30 | Pod fill |
The apr_may window is only 31 days (May only), shorter than corn's 61-day April–May window, reflecting the later soybean planting date.
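The season-DOY arithmetic behind these tables can be checked directly. A small standard-library sketch, assuming sdoy 1 is the epoch day itself (1 May), as the weather window table implies; the function names are illustrative.

```python
from datetime import date, timedelta

EPOCH_MONTH, EPOCH_DAY = 5, 1  # commodity.season_start


def sdoy_to_date(sdoy: int, year: int) -> date:
    """Calendar date for a season-DOY, with sdoy 1 == 1 May of `year`."""
    return date(year, EPOCH_MONTH, EPOCH_DAY) + timedelta(days=sdoy - 1)


def date_to_sdoy(d: date) -> int:
    """Season-DOY of a calendar date in the same season year."""
    return (d - date(d.year, EPOCH_MONTH, EPOCH_DAY)).days + 1


WEATHER_WINDOWS = {"apr_may": (1, 31), "jun": (32, 61), "jul": (62, 92), "aug_sep": (93, 153)}

for name, (start, end) in WEATHER_WINDOWS.items():
    print(name, sdoy_to_date(start, 2024), sdoy_to_date(end, 2024), end - start + 1, "days")

print("harvest", sdoy_to_date(153, 2024))     # commodity.harvest_season_doy
print("freeze cap", sdoy_to_date(184, 2024))  # commodity.freeze_cap_sdoy
```

Because the epoch falls after February, leap years do not change this mapping.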
## Feature columns
The 20 features come from the QUBE SOY_EB_NATIONAL_WX spec (soybeans_usa.yaml:commodity.feature_cols). There are no stress-lag features. As with cotton, the set includes edd_aug_sep and precip_aug_sep for the pod-fill phase, reflecting the phenological importance of late-season heat and moisture for soybean yields.
## Season-DOY weather ramp
The season_doy_weather_weight ramp is a simple two-point step, {1: 0.0, 2: 1.0} (soybeans_usa.yaml:model.regression_params.season_doy_weather_weight).
The weather correction has zero weight on sdoy 1 (1 May) and full weight from sdoy 2 (2 May) onwards: rather than ramping up gradually as in corn and wheat, it switches on immediately. Brazil soybean uses the same flat-ramp pattern.
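A knot dictionary like {1: 0.0, 2: 1.0} can be turned into a per-day weight as sketched below, assuming linear interpolation between knots and constant extrapolation beyond them. The interpolation rule and the `ramp_weight` name are assumptions; for integer sdoy values this ramp reduces to the step described above.

```python
from bisect import bisect_right

SOY_RAMP = {1: 0.0, 2: 1.0}  # season_doy_weather_weight


def ramp_weight(sdoy: float, knots: dict[int, float]) -> float:
    """Piecewise-linear weight between knots, held constant outside them (assumed rule)."""
    xs = sorted(knots)
    ys = [knots[x] for x in xs]
    if sdoy <= xs[0]:
        return ys[0]
    if sdoy >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, sdoy)  # xs[i - 1] <= sdoy < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (sdoy - x0) / (x1 - x0)


for sdoy in (1, 2, 30, 153):
    print(sdoy, ramp_weight(sdoy, SOY_RAMP))
```

A gradual corn-style ramp would simply use knots spread further apart; the same interpolation then produces intermediate weights across those days.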
## Reference data
| Field | Value |
|---|---|
| kind | wasde |
| name | wasde |
| filepath | data/wasde/wasde_soybeans_us_yield.csv |
| commodity | soybeans |
| geography | united_states |
| cutoff_month_day | month 2, day 1 |
| unit | bu_acre |
## Forecast paths
| Field | Value |
|---|---|
| raw_obs_filepath | /data/processing/weather/processed/areal_aggregation/conus_adm2.zarr |
| materialised_climo_filepath | /data/processing/weather/processed/climatology/conus_adm2_baseline_1980_2025_w31_materialised.zarr |
Soybeans uses absolute local paths rather than S3 URI templates for forecast zarrs (soybeans_usa.yaml:forecast). This is a notable difference from corn and wheat, which use s3://{env}-... templates.
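The practical consequence is that only templated paths switch between environments. A minimal sketch, assuming {env} is a plain placeholder substituted at load time; the `resolve_path` helper and the "dev"/"prod" values are hypothetical.

```python
PATHS = {
    "weather_stress": "s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_conus_soy_ytd_stress.zarr",
    "raw_obs_filepath": "/data/processing/weather/processed/areal_aggregation/conus_adm2.zarr",
}


def resolve_path(path: str, env: str) -> str:
    """Substitute the {env} placeholder; paths without it are returned unchanged."""
    return path.replace("{env}", env)


def is_env_templated(path: str) -> bool:
    """True if the path changes with the deployment environment."""
    return "{env}" in path


for name, p in PATHS.items():
    print(name, is_env_templated(p), resolve_path(p, "dev"))
```

Changing env moves the stress zarr to a different bucket, while the forecast zarrs keep pointing at the same local directory, which only works on a machine where those /data/processing paths exist.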
## What makes this config distinctive
Soybeans uses a May-1 season epoch (one month later than the April-1 epoch used by corn and cotton), so any calendar date after 1 May has a season-DOY 30 lower than it would under the corn epoch. The production_cumulative_threshold of 0.90 is stricter than corn's 0.95: it keeps only the highest-producing counties that together account for 90% of production, which gives a narrower county universe. The weather ramp (season_doy_weather_weight: {1: 0.0, 2: 1.0}) applies full correction immediately rather than gradually, and the forecast paths are absolute local filesystem paths rather than S3 templates, which suggests this config was set up for a specific EC2 dev environment.
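To see why a lower cumulative threshold yields fewer counties, here is a minimal sketch of a cumulative-production cut. It assumes counties are ranked by mean production over the last production_recent_years values and added until their cumulative share reaches the threshold; both rules, the function name, and the data are made up for illustration.

```python
from statistics import mean

# Made-up yearly production per county; county A's large older value falls outside the recent window.
PRODUCTION = {
    "A": [100, 40, 40, 40, 40, 40],
    "B": [30] * 5,
    "C": [20] * 5,
    "D": [6] * 5,
    "E": [4] * 5,
}


def select_counties(production: dict[str, list[float]], threshold: float,
                    recent_years: int = 5) -> list[str]:
    """Top counties whose cumulative share of recent production reaches `threshold`."""
    recent = {c: mean(v[-recent_years:]) for c, v in production.items()}
    ranked = sorted(recent, key=recent.get, reverse=True)
    total = sum(recent.values())
    kept, cum = [], 0.0
    for county in ranked:
        if cum / total >= threshold:  # threshold already reached: stop adding counties
            break
        kept.append(county)
        cum += recent[county]
    return kept


print(select_counties(PRODUCTION, 0.90))  # soybeans-style cut
print(select_counties(PRODUCTION, 0.95))  # corn-style cut
```

Under the same ranking, the 0.90 cut returns a subset of the 0.95 cut, which is why the soybean county universe is narrower, not broader.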