
Soybeans USA Experiment Config

Top-level fields

| Field | Value | Meaning |
|---|---|---|
| random_seed | 42 | Global RNG seed |
| mlflow_tracking_uri | sqlite:///mlruns.db | Local SQLite MLflow store |
| experiment_name | soybean_yield_prediction | MLflow experiment name |
| feature_start_year | 1980 | Earliest feature year |
| feature_end_year | 2025 | Latest feature year |
| check_data_exists | [] | No extra preflight checks |
| commodity.commodity | soybeans | Internal commodity key |
| commodity.country_code | USA | ISO-3 country code |
| commodity.season_start | month 5, day 1 | Season epoch: 1 May (one month later than corn and cotton) |
| commodity.season_start_year_offset | 0 | Same-year crop |
| commodity.harvest_season_doy | 153 | Season-DOY of harvest: with May 1 = sdoy 1, sdoy 153 is 30 Sep (the last day of the aug_sep window); 1 Oct would be sdoy 154 |
| commodity.bushel_weight_lbs | 60.0 | Soybeans: 60 lb/bushel |
| commodity.delivery_unit | bu_acre | Yields delivered in bushels per acre |
| commodity.yield_range | [15.0, 70.0] | Sanity bounds in bu/acre |
| commodity.freeze_cap_sdoy | 184 | Weather cap: with May 1 = sdoy 1, sdoy 184 is 31 Oct, about a month after harvest_season_doy (153) |
| experiment_protocol.cv_strategy | expanding | Walk-forward expanding-window CV |
| experiment_protocol.test_years | 2020–2024 | Hold-out years |
| experiment_protocol.production_cumulative_threshold | 0.90 | Keep the largest counties until they account for 90% of cumulative production (corn uses 0.95) |
| experiment_protocol.production_recent_years | 5 | Number of recent years used to rank counties by production |
| model.detrend | partial_pooling | Partial-pooling detrender |
| model.detrend_params | {} | No fixed slope |
| model.regression | pca_ridge | PCA(2) → Ridge |
| model.weather_correction_fit_level | ADM0 | Weather correction fitted at national (ADM0) level |
| model.regression_params.n_components | 2 | PCA components |
| model.regression_params.alpha | 10.0 | Ridge regularisation strength |
| model.regression_params.nan_policy | raise | Regressors raise an error on NaN inputs |
| forecast.residual_mode | hindcast_oos_per_init_date | Conformal calibration mode |
| postprocess.bias_corrector.kind | none | No bias correction |
| delivery.model_public_name | TFFS_V0 | Public model label |
| delivery.ci_levels | [0.5, 0.68, 0.80, 0.90, 0.95] | Conformal interval levels |
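The production_cumulative_threshold and production_recent_years fields are easiest to read as a cumulative-share filter. The sketch below assumes that counties are ranked by mean production over the most recent production_recent_years years, then kept largest first until they cover the threshold share. The select_counties function and its input layout are hypothetical, and the real pipeline may rank counties or treat the crossing county differently.

```python
from collections import defaultdict


def select_counties(production, recent_years=5, threshold=0.90):
    """Hypothetical cumulative-production county filter (not the pipeline's API).

    production maps (county, year) -> production. Counties are ranked by
    mean production over the last `recent_years` distinct years in the
    data and kept, largest first, until their running total reaches
    `threshold` of the overall total (the county that crosses it is kept).
    """
    years = sorted({year for (_, year) in production})
    recent = set(years[-recent_years:])

    sums, counts = defaultdict(float), defaultdict(int)
    for (county, year), value in production.items():
        if year in recent:
            sums[county] += value
            counts[county] += 1
    mean_prod = {county: sums[county] / counts[county] for county in sums}

    target = threshold * sum(mean_prod.values())
    selected, running = [], 0.0
    for county in sorted(mean_prod, key=mean_prod.get, reverse=True):
        selected.append(county)
        running += mean_prod[county]
        if running >= target:
            break
    return selected


# Toy panel: constant production 2016-2020, plus an old 2010 spike that
# falls outside the 5-year ranking window and is therefore ignored.
prod = {
    (county, year): value
    for year in range(2016, 2021)
    for county, value in {"A": 50, "B": 30, "C": 12, "D": 8}.items()
}
prod[("D", 2010)] = 1000

print(select_counties(prod, threshold=0.90))  # ['A', 'B', 'C']
print(select_counties(prod, threshold=0.95))  # ['A', 'B', 'C', 'D']
```

Because the loop stops as soon as the running share reaches the threshold, a lower threshold stops earlier. Under this rule, 0.90 always keeps a subset of the counties that 0.95 would keep.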

Builders

| Builder | Kind | Notable parameters |
|---|---|---|
| yields | YieldsBuilder | filepath: data/nass/soybeans.parquet; edits: panel null-impute area (3-yr trailing median), deductive yield impute; required_for_pred_parquet: false |
| weather | WeatherBuilder | filepath: data/weather/indices/conus_adm2_conus_soy.zarr (local path); required_for_pred_parquet: true |
| weather_stress | WeatherBuilder | filepath: s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_conus_soy_ytd_stress.zarr; required_for_pred_parquet: true |
| climo | ClimoBuilder | filepath: data/weather/climo_indices/conus_adm2.zarr (local path); required_for_pred_parquet: true |

Soybeans has no stress parquet builder, because the legacy NASS-stress assemble path used by corn and wheat is not wired up for soybeans. Instead, the weather_stress builder reads directly from the soy-specific YTD stress zarr, which is the same arrangement cotton uses.
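The yields builder's area edit (panel null-impute with a 3-year trailing median) can be sketched for a single county as follows. The sketch rests on three assumptions. First, each gap is filled from the median of the observed values in the three preceding years. Second, previously imputed values are not reused. Third, a gap with no prior observations stays missing. The impute_area_trailing_median function is a hypothetical name, not the builder's API.

```python
from statistics import median


def impute_area_trailing_median(area_by_year, window=3):
    """Hypothetical trailing-median gap filler for one county's area series.

    Each missing (None) value is replaced by the median of the *observed*
    values in the `window` years immediately before it; if none are
    observed, the gap is left as None.
    """
    filled = {}
    for year in sorted(area_by_year):
        value = area_by_year[year]
        if value is None:
            prior = [
                area_by_year[y]
                for y in range(year - window, year)
                if area_by_year.get(y) is not None
            ]
            value = median(prior) if prior else None
        filled[year] = value
    return filled


county_area = {2018: 100.0, 2019: 120.0, 2020: 110.0, 2021: None, 2022: 130.0, 2023: None}
print(impute_area_trailing_median(county_area))
# {2018: 100.0, 2019: 120.0, 2020: 110.0, 2021: 110.0, 2022: 130.0, 2023: 120.0}
```

For 2023, the 2021 gap is skipped because it was never observed. The median is therefore taken over two values, 110.0 and 130.0.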

Weather and climo windows

Climo windows (soybeans_usa.yaml:commodity.climo_windows):

| Name | sdoy_start | sdoy_end | Calendar |
|---|---|---|---|
| gstd | 1 | null | Full growing season to date |
| apr_jul | 1 | 122 | May 1 – Aug 30 (with May 1 = sdoy 1, sdoy 122 is 30 Aug); fixed early-season reference |

Note: the climo window is named apr_jul but is anchored to the May-1 epoch, so it spans May–August rather than April–July.
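The calendar columns follow from simple date arithmetic once the season-DOY convention is fixed. The weather-window table implies a 1-based convention, with May 1 = sdoy 1 and Aug 1 = sdoy 93. The helpers below are hypothetical and assume that convention together with season_start_year_offset 0. Under it, the apr_jul end (sdoy 122) falls on 30 Aug, harvest_season_doy 153 falls on 30 Sep, and freeze_cap_sdoy 184 falls on 31 Oct.

```python
from datetime import date, timedelta

# Assumed convention (hypothetical helpers): 1-based season-DOY with the
# epoch on 1 May of the crop year (season_start month 5, day 1; offset 0).
SEASON_START_MONTH, SEASON_START_DAY = 5, 1


def sdoy_to_date(sdoy, crop_year):
    """Calendar date of a 1-based season-DOY (sdoy 1 = the epoch)."""
    epoch = date(crop_year, SEASON_START_MONTH, SEASON_START_DAY)
    return epoch + timedelta(days=sdoy - 1)


def date_to_sdoy(day):
    """1-based season-DOY of a date in the same crop year (0 or less before the epoch)."""
    epoch = date(day.year, SEASON_START_MONTH, SEASON_START_DAY)
    return (day - epoch).days + 1


for label, sdoy in [("apr_jul end", 122), ("harvest_season_doy", 153), ("freeze_cap_sdoy", 184)]:
    print(label, sdoy, sdoy_to_date(sdoy, 2024).isoformat())
# apr_jul end 122 2024-08-30
# harvest_season_doy 153 2024-09-30
# freeze_cap_sdoy 184 2024-10-31
```

Leap years do not affect these results because the epoch falls after February.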

Weather windows (soybeans_usa.yaml:commodity.weather_windows):

| Name | sdoy_start | sdoy_end | Calendar | Phase |
|---|---|---|---|---|
| apr_may | 1 | 31 | May 1 – May 31 | Planting establishment |
| jun | 32 | 61 | Jun 1 – Jun 30 | Vegetative growth |
| jul | 62 | 92 | Jul 1 – Jul 31 | Flowering |
| aug_sep | 93 | 153 | Aug 1 – Sep 30 | Pod fill |

The apr_may window is only 31 days (May only), shorter than corn's 61-day April–May window, reflecting the later soybean planting date.
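Each window is an inclusive sdoy range, so assigning daily values to windows is a range filter. The sketch below sums a daily series per window. It treats a null sdoy_end, as in the gstd climo window, as "through the latest day available". Summation is an assumption for illustration: totals such as EDD or precipitation sum naturally, but the real builders may aggregate other variables differently. The window_sum helper is hypothetical.

```python
# Weather windows from the table above: name -> (sdoy_start, sdoy_end), inclusive.
WEATHER_WINDOWS = {
    "apr_may": (1, 31),
    "jun": (32, 61),
    "jul": (62, 92),
    "aug_sep": (93, 153),
}


def window_sum(daily, start, end=None):
    """Hypothetical helper: sum daily values (dict sdoy -> value) over [start, end].

    end=None means "season to date", i.e. every available day from start onwards.
    """
    return sum(
        value
        for sdoy, value in daily.items()
        if sdoy >= start and (end is None or sdoy <= end)
    )


# A constant series of 1.0 per day makes each sum equal the window length.
daily = {sdoy: 1.0 for sdoy in range(1, 154)}
lengths = {name: window_sum(daily, *bounds) for name, bounds in WEATHER_WINDOWS.items()}
print(lengths)  # {'apr_may': 31.0, 'jun': 30.0, 'jul': 31.0, 'aug_sep': 61.0}
print(window_sum(daily, 1, None))  # 153.0 (gstd-style, season to date)
```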

Feature columns

20 features from the QUBE SOY_EB_NATIONAL_WX spec (soybeans_usa.yaml:commodity.feature_cols), with no stress-lag features. The set includes edd_aug_sep and precip_aug_sep for the pod-fill phase (as in cotton), reflecting the phenological importance of late-season heat and moisture for soybean yields.

Season-DOY weather ramp

The season_doy_weather_weight ramp is a simple two-point step (soybeans_usa.yaml:model.regression_params.season_doy_weather_weight):

1: 0.0
2: 1.0

This gives the weather correction zero weight at sdoy 1 and full weight from sdoy 2 onwards. Rather than ramping up gradually as in corn and wheat, the correction switches on immediately. Brazil soybean uses the same step pattern.
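To see how a two-point step differs from a gradual ramp, the sketch below evaluates a ramp as a piecewise-linear function of sdoy, clamped to the first and last breakpoints. Linear interpolation and clamping are assumptions about how season_doy_weather_weight is evaluated. The ramp_weight helper is hypothetical, and the gradual ramp in the example is an invented illustration, not corn's or wheat's actual setting.

```python
from bisect import bisect_right


def ramp_weight(sdoy, ramp):
    """Hypothetical evaluator: piecewise-linear weight from {sdoy: weight} breakpoints.

    Values before the first breakpoint or after the last are clamped.
    """
    points = sorted(ramp.items())
    xs = [x for x, _ in points]
    if sdoy <= xs[0]:
        return points[0][1]
    if sdoy >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, sdoy)  # index of the first breakpoint strictly right of sdoy
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (sdoy - x0) / (x1 - x0)


soy_ramp = {1: 0.0, 2: 1.0}                  # soybeans_usa.yaml
gradual_ramp = {1: 0.0, 61: 0.5, 121: 1.0}   # hypothetical gradual ramp for comparison

print([ramp_weight(s, soy_ramp) for s in (1, 2, 30, 153)])       # [0.0, 1.0, 1.0, 1.0]
print([ramp_weight(s, gradual_ramp) for s in (1, 31, 91, 153)])  # [0.0, 0.25, 0.75, 1.0]
```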

Reference data

| Field | Value |
|---|---|
| kind | wasde |
| name | wasde |
| filepath | data/wasde/wasde_soybeans_us_yield.csv |
| commodity | soybeans |
| geography | united_states |
| cutoff_month_day | month 2, day 1 |
| unit | bu_acre |

Forecast paths

| Field | Value |
|---|---|
| raw_obs_filepath | /data/processing/weather/processed/areal_aggregation/conus_adm2.zarr |
| materialised_climo_filepath | /data/processing/weather/processed/climatology/conus_adm2_baseline_1980_2025_w31_materialised.zarr |

Soybeans uses absolute local paths rather than S3 URI templates for forecast zarrs (soybeans_usa.yaml:forecast). This is a notable difference from corn and wheat, which use s3://{env}-... templates.
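The practical difference is that an S3 template carries an {env} placeholder that is filled at run time. An absolute local path, by contrast, points to the same location wherever the config runs. The sketch below assumes plain string substitution of {env}. The resolve_path helper and the environment names "dev" and "prod" are hypothetical, and the real config loader may resolve templates differently.

```python
def resolve_path(template, env):
    """Hypothetical resolver: substitute the {env} placeholder.

    Paths without a placeholder, such as absolute local paths, come back unchanged.
    """
    return template.replace("{env}", env)


stress_zarr = "s3://{env}-treefera-greenprint-data/weather/processed/stress/conus_adm2_conus_soy_ytd_stress.zarr"
raw_obs = "/data/processing/weather/processed/areal_aggregation/conus_adm2.zarr"

print(resolve_path(stress_zarr, "dev"))
# s3://dev-treefera-greenprint-data/weather/processed/stress/conus_adm2_conus_soy_ytd_stress.zarr
print(resolve_path(raw_obs, "dev") == raw_obs)  # True: nothing to substitute
```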

What makes this config distinctive

Soybeans uses a May-1 season epoch (one month later than corn and cotton). As a result, any calendar date from May onwards has a season-DOY 30 lower than it would under an April-1 epoch, because April has 30 days. The production_cumulative_threshold of 0.90 is lower than corn's 0.95, which trims the county universe more aggressively: fewer counties are needed to reach 90% of cumulative production than to reach 95%. The weather ramp (season_doy_weather_weight: {1: 0.0, 2: 1.0}) applies full correction immediately rather than gradually. Finally, the forecast paths are absolute local filesystem paths rather than S3 templates, which suggests this config was set up for a specific EC2 dev environment.

Cross-references