Concept: Weather Correction
What it is
weather_correction_bu_ac is a delivery-CSV column that communicates how much of
the model's current yield estimate is attributable to weather, expressed in bu/ac.
It is exactly zero at the first init_date of a season and moves away from zero, in either
direction, as observed weather accumulates through the season.
The quantity is the detrended component of the model's point estimate, unit-converted to bu/ac:

weather_correction_bu_ac = sim_yield_kg_ha_detrended × scale

where scale = kg_ha_to_bu_acre(1.0, bushel_weight_lbs) (delivery/conversions.py:208).
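The conversion can be sketched in Python as follows. This is a minimal sketch under an assumption: that kg_ha_to_bu_acre uses the standard kilogram-to-pound and hectare-to-acre constants and divides by the bushel weight. The real helper in delivery/conversions.py may use different constants or rounding, and the 60 lb/bu wheat weight below is the conventional USDA value rather than one read from the project's configs.

```python
# Minimal sketch. ASSUMPTION: kg_ha_to_bu_acre uses standard unit constants;
# the project's helper in delivery/conversions.py may differ.

LB_PER_KG = 2.20462262185  # pounds per kilogram
AC_PER_HA = 2.47105381467  # acres per hectare


def kg_ha_to_bu_acre(value_kg_ha: float, bushel_weight_lbs: float) -> float:
    """Convert a yield in kg/ha to bu/ac for a crop with the given bushel weight."""
    lb_per_ac = value_kg_ha * LB_PER_KG / AC_PER_HA
    return lb_per_ac / bushel_weight_lbs


def weather_correction_bu_ac(sim_yield_kg_ha_detrended: float, bushel_weight_lbs: float) -> float:
    """Structural weather correction: detrended kg/ha component times a fixed scale."""
    # The conversion is linear, so one scale factor computed from 1.0 kg/ha suffices.
    scale = kg_ha_to_bu_acre(1.0, bushel_weight_lbs)
    return sim_yield_kg_ha_detrended * scale


# Example: a -100 kg/ha weather effect on wheat (60 lb/bu, conventional value).
print(round(weather_correction_bu_ac(-100.0, 60.0), 3))  # → -1.487
```

Because the conversion is linear, a zero detrended component converts to exactly 0.0 bu/ac, which is why the first-init_date identity survives the unit change.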
The model decomposes yield additively:

predicted yield (kg/ha) = secular trend + sim_yield_kg_ha_detrended

sim_yield_kg_ha_detrended is exactly the weather attribution: the part of the
prediction not explained by the secular trend. It is present in
walk_forward_preds.parquet at all ADM levels and must be included in the
area-weighted aggregation value_cols before the delivery alias is applied.
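A minimal sketch of the additive split, using hypothetical field names (trend_kg_ha is not a column named in this document). The weather attribution is whatever part of the point estimate the trend does not explain, so when the estimate equals the trend, as at the first init_date, it is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YieldDecomposition:
    """Additive split of a point estimate in kg/ha. Field names are hypothetical."""

    trend_kg_ha: float                # secular-trend component
    sim_yield_kg_ha_detrended: float  # weather attribution

    @property
    def point_estimate_kg_ha(self) -> float:
        # Additive model: prediction = trend + detrended component.
        return self.trend_kg_ha + self.sim_yield_kg_ha_detrended

    @classmethod
    def from_estimate(cls, point_estimate_kg_ha: float, trend_kg_ha: float) -> "YieldDecomposition":
        # The weather attribution is the residual the trend does not explain.
        return cls(trend_kg_ha, point_estimate_kg_ha - trend_kg_ha)


season_start = YieldDecomposition.from_estimate(3400.0, trend_kg_ha=3400.0)
mid_season = YieldDecomposition.from_estimate(3300.0, trend_kg_ha=3400.0)
print(season_start.sim_yield_kg_ha_detrended, mid_season.sim_yield_kg_ha_detrended)  # → 0.0 -100.0
```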
Where it lives in the code
The delivery expression at delivery/conversions.py:334 multiplies sim_yield_kg_ha_detrended by scale and aliases the result to weather_correction_bu_ac.

sim_yield_kg_ha_detrended is included in value_cols at delivery/conversions.py:219.

This inclusion is load-bearing: if the column is omitted from value_cols, it is never
passed to the area-weighted aggregator and arrives at delivery as NaN. This was
the root cause of the PR #331 bug: the column was always null from initial shipping until the fix.
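The failure mode can be reproduced with a toy aggregator. This is a stdlib-only sketch, not the project's aggregator: area_weighted_aggregate, to_delivery, and the row keys (adm1, area_ha, sim_yield_kg_ha) are hypothetical. It shows the mechanism described in this section: a column absent from value_cols never reaches the aggregated output, so the delivery step can only fill it with NaN.

```python
import math
from collections import defaultdict


def area_weighted_aggregate(rows, group_key, value_cols, weight_col="area_ha"):
    """Area-weighted mean of each column in value_cols, per group (hypothetical)."""
    weighted_sums = defaultdict(lambda: defaultdict(float))
    total_weight = defaultdict(float)
    for row in rows:
        key, w = row[group_key], row[weight_col]
        total_weight[key] += w
        for col in value_cols:  # only listed columns are carried forward
            weighted_sums[key][col] += w * row[col]
    return {
        key: {col: weighted_sums[key][col] / total_weight[key] for col in value_cols}
        for key in total_weight
    }


def to_delivery(aggregated, delivery_cols):
    """Project aggregated groups onto the delivery columns; missing ones become NaN."""
    return {
        key: {col: values.get(col, math.nan) for col in delivery_cols}
        for key, values in aggregated.items()
    }


rows = [
    {"adm1": "A", "area_ha": 300.0, "sim_yield_kg_ha": 3000.0, "sim_yield_kg_ha_detrended": -60.0},
    {"adm1": "A", "area_ha": 100.0, "sim_yield_kg_ha": 3400.0, "sim_yield_kg_ha_detrended": 20.0},
]
cols = ["sim_yield_kg_ha", "sim_yield_kg_ha_detrended"]

fixed = to_delivery(area_weighted_aggregate(rows, "adm1", cols), cols)
buggy = to_delivery(area_weighted_aggregate(rows, "adm1", cols[:1]), cols)
print(fixed["A"]["sim_yield_kg_ha_detrended"], buggy["A"]["sim_yield_kg_ha_detrended"])  # → -40.0 nan
```

Nothing errors in the buggy path; the NaN only becomes visible once a strict schema refuses it.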
Key invariants
- weather_correction_bu_ac is ALWAYS populated. The pydantic field on DeliveryRow was tightened from float | None = None to float in PR #331, turning future regressions into validation errors at the schema boundary.
- At the first init_date of each season, weather_correction_bu_ac = 0.0 exactly. This is a structural identity: no weather has accumulated, so sim_yield_kg_ha_detrended = 0. PR #331 proof: first_vintage_max = 0.000000 across all 225 ADM0 rows.
- sim_yield_kg_ha_detrended is produced by the PREDICT stage as the raw detrended output of the four-step inverse pipeline (run_predict.py:255–279) and carried through to walk_forward_preds.parquet.
- The column is present at ADM0 / ADM1 / ADM2. Only ADM2 skips area-weighted aggregation and uses the county-level value directly.
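The first two invariants can be checked mechanically. A minimal sketch, assuming delivery rows are plain dicts with hypothetical keys region, year (standing in for the season), init_date as an ISO string, and weather_correction_bu_ac. The returned value is analogous to the first_vintage_max figure quoted from PR #331, though the PR's actual check is not reproduced here. It treats the earliest init_date present as the season's first, which is only valid for a hindcast that includes the season's opening vintage.

```python
import math


def check_weather_invariants(rows):
    """Raise on any null weather correction; return the largest |value| at each
    (region, year) group's earliest init_date (should be 0.0).

    ASSUMPTION: rows are dicts with hypothetical keys 'region', 'year',
    'init_date' (ISO 'YYYY-MM-DD' string) and 'weather_correction_bu_ac', and the
    earliest init_date present is the season's first (true for full hindcasts).
    """
    first_vintage = {}
    for row in rows:
        value = row["weather_correction_bu_ac"]
        if value is None or math.isnan(value):  # invariant 1: always populated
            raise ValueError(f"null weather_correction_bu_ac in {row}")
        key = (row["region"], row["year"])
        # ISO date strings sort chronologically, so '<' finds the earliest vintage.
        if key not in first_vintage or row["init_date"] < first_vintage[key]["init_date"]:
            first_vintage[key] = row
    # Invariant 2: the first-vintage correction is exactly zero.
    return max((abs(r["weather_correction_bu_ac"]) for r in first_vintage.values()), default=0.0)


# Values from the PR #331 wheat ADM0 proof table (2022 season).
rows = [
    {"region": "ADM0", "year": 2022, "init_date": "2021-10-01", "weather_correction_bu_ac": 0.000},
    {"region": "ADM0", "year": 2022, "init_date": "2022-04-08", "weather_correction_bu_ac": -1.133},
    {"region": "ADM0", "year": 2022, "init_date": "2022-08-01", "weather_correction_bu_ac": -2.948},
]
print(check_weather_invariants(rows))  # → 0.0
```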
Why structural beats vintage-delta
Before PR #331, a dead function add_weather_columns in transforms.py computed the weather correction as a vintage delta: each row's mean minus the mean at the season's first init_date.
This is wrong for a single-vintage forecast: if only one init_date has been issued, the delta is trivially zero. PR #331 deleted the dead code (−36 lines) and replaced it with the structural identity.
From the PR #331 body:
| Property | Vintage-delta (mean − first_init) | Structural (detrended × scale) |
|---|---|---|
| Multi-vintage hindcast | Meaningful | Numerically identical |
| Single-vintage forecast | Trivially 0 | Meaningful |
| Needs cross-row plumbing | Yes (nat_sims) | No |
| Truthful to model | Approximate | Exact identity |
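The two columns of the table can be compared directly. In this minimal sketch, vintage_delta and structural are hypothetical helpers, not functions from transforms.py or conversions.py, and the season values (a flat 50 bu/ac trend and a 0.0149 bu/ac-per-kg/ha scale) are illustrative. The structural form is computed row by row, while the vintage delta has to look up the season's first vintage, which is the cross-row state the old code needed.

```python
def vintage_delta(means_by_init):
    """Old approach: each vintage's mean minus the mean at the earliest init_date."""
    first_mean = means_by_init[min(means_by_init)]  # cross-row lookup
    return {d: m - first_mean for d, m in means_by_init.items()}


def structural(detrended_by_init, scale):
    """New approach: detrended kg/ha component times the bu/ac scale, per row."""
    return {d: x * scale for d, x in detrended_by_init.items()}


SCALE = 0.0149        # illustrative kg/ha -> bu/ac factor
TREND_BU_AC = 50.0    # illustrative trend, constant within the season

# Multi-vintage hindcast: the first vintage has no accumulated weather.
detrended = {"2021-10-01": 0.0, "2022-04-08": -80.0, "2022-08-01": -200.0}
means = {d: TREND_BU_AC + x * SCALE for d, x in detrended.items()}
hindcast_old = vintage_delta(means)
hindcast_new = structural(detrended, SCALE)

# Single-vintage forecast issued mid-season: only one init_date exists.
forecast_old = vintage_delta({"2022-04-08": means["2022-04-08"]})
forecast_new = structural({"2022-04-08": -80.0}, SCALE)
print(forecast_old, forecast_new)
```

Both approaches agree on the hindcast because the first vintage's detrended value is zero; only the structural form still carries information when a single mid-season vintage exists.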
PR #331 proof on real data (wheat ADM0, 2022 season)
| init_date | year | mean | weather_correction_bu_ac | lower_90 | upper_90 |
|---|---|---|---|---|---|
| 2021-10-01 | 2022 | 49.519 | 0.000 | 45.019 | 54.018 |
| 2022-04-08 | 2022 | 48.386 | -1.133 | 43.886 | 52.885 |
| 2022-08-01 | 2022 | 46.571 | -2.948 | 42.072 | 51.070 |
Bug-fix verification (all ADM levels): wx_nulls = 0 everywhere; lo90_nulls = up90_nulls = 0.
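The additive decomposition can be sanity-checked against the proof rows: if mean = trend + weather_correction_bu_ac and the trend is fixed for the season, then mean − weather_correction_bu_ac should be the same at every init_date. A small check using only the numbers from the PR #331 table (implied_trend is a hypothetical helper):

```python
# (init_date, mean, weather_correction_bu_ac) from the PR #331 wheat ADM0 proof table.
proof_rows = [
    ("2021-10-01", 49.519, 0.000),
    ("2022-04-08", 48.386, -1.133),
    ("2022-08-01", 46.571, -2.948),
]


def implied_trend(rows):
    """Recover the trend component per row, assuming mean = trend + weather correction."""
    # Round to the table's 3 decimals to absorb floating-point noise.
    return [round(mean - wx, 3) for _, mean, wx in rows]


print(implied_trend(proof_rows))  # → [49.519, 49.519, 49.519]
```

Every row recovers the first-vintage mean of 49.519 bu/ac, so within this season the weather correction accounts for the entire movement of the mean.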
P90 bands (also fixed in PR #331)
Wheat, cotton, and soybean configs were missing 0.90 from ci_levels. The fix
adds 0.90 to three YAMLs. The CI band machinery is fully dynamic: add a level
to ci_levels and the lower_{pct} / upper_{pct} columns appear automatically;
no code changes required.
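The naming rule can be sketched as a pure function of ci_levels. This illustrates the lower_{pct} / upper_{pct} convention and is not the project's band code; ci_band_columns is a hypothetical name, and the 0.80 level in the example is made up for illustration.

```python
def ci_band_columns(ci_levels):
    """Derive lower_{pct} / upper_{pct} column names from ci_levels (hypothetical helper)."""
    columns = []
    for level in ci_levels:
        pct = round(level * 100)  # 0.90 -> 90; round() guards against float noise
        columns += [f"lower_{pct}", f"upper_{pct}"]
    return columns


before = ci_band_columns([0.80])        # config without the P90 level
after = ci_band_columns([0.80, 0.90])   # after adding 0.90 to ci_levels
print(after)  # → ['lower_80', 'upper_80', 'lower_90', 'upper_90']
```

Because the column set is derived from the config, adding a level is a YAML-only change.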
How it interacts with the pipeline
sim_yield_kg_ha_detrended is written by the PREDICT stage (step 2 of the
four-step inverse pipeline at run_predict.py:255–279). At the DELIVER stage,
walk_forward_preds_to_delivery_rows in delivery/conversions.py includes it in
value_cols and aliases it to weather_correction_bu_ac at conversions.py:334.
See the delivery pipeline for the full DELIVER stage walkthrough.
Pitfalls and historical bugs
- Always-null bug (PR #331, 2026-04-27): sim_yield_kg_ha_detrended was not in value_cols, and the dead add_weather_columns was never called, so every row was null.
- DeliveryRow permissive type: float | None = None allowed nulls to pass schema validation silently. Tightened to float in PR #331.
- add_weather_columns was dead code: any attempt to revive the vintage-delta approach would require threading nat_sims cross-row state through the pipeline.
Related entities and concepts
- HindcastSlice: the walk_forward_preds.parquet schema carries sim_yield_kg_ha_detrended
- ForecastSlice: forecast delivery uses the same column
- conformal_modes.md: the P90 band fix shares the same PR
PRs and commits
| PR | Relevance |
|---|---|
| PR-331 | Fixed always-null weather_correction_bu_ac; deleted dead vintage-delta code; added P90 bands to wheat/cotton/soybean; tightened DeliveryRow schema |
Open questions
- weather_correction_bu_ac is expressed in bu/ac only. Whether the raw sim_yield_kg_ha_detrended (kg/ha) should be exposed for non-US consumers is undocumented.
- The ADM2 range of weather_correction_bu_ac (−40 to +22 bu/ac) versus the ADM0 range (−3 to +2 bu/ac) is expected (spatial heterogeneity: ADM2 values are county-level and not averaged by area-weighted aggregation) but is not documented in the delivery schema.