Concept: Weather Correction

What it is

weather_correction_bu_ac is a delivery-CSV column that communicates how much of the model's current yield estimate is attributable to weather, expressed in bu/ac. It starts at zero for the first init_date of a season and grows as observed weather accumulates through the season.

The quantity is the detrended component of the model's point estimate, unit-converted to bu/ac:

weather_correction_bu_ac = sim_yield_kg_ha_detrended × scale

where scale = kg_ha_to_bu_acre(1.0, bushel_weight_lbs) (delivery/conversions.py:208).
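
For orientation, the conversion factor can be sketched as follows. This is a stand-in, not the code at delivery/conversions.py:208: the function body, the two unit constants, and the 60 lb bushel weight (the standard US figure for wheat) are assumptions based on ordinary unit definitions, and the detrended value is made up.

```python
# Hypothetical re-implementation for illustration only; the real helper lives
# in delivery/conversions.py and may differ in its details.
LBS_PER_KG = 2.20462262      # pounds per kilogram
ACRES_PER_HA = 2.47105381    # acres per hectare


def kg_ha_to_bu_acre(kg_ha: float, bushel_weight_lbs: float) -> float:
    """Convert a yield in kg/ha to bushels per acre."""
    lbs_per_acre = kg_ha * LBS_PER_KG / ACRES_PER_HA
    return lbs_per_acre / bushel_weight_lbs


# Assumed bushel weight: 60 lb (wheat). Other crops use other weights.
scale = kg_ha_to_bu_acre(1.0, 60.0)

# The conversion is linear, so multiplying the detrended kg/ha value by
# `scale` gives the same result as converting that value directly.
detrended_kg_ha = -100.0  # illustrative value, not real model output
weather_correction_bu_ac = detrended_kg_ha * scale
```

Because the conversion is a pure multiplication, passing 1.0 to obtain a reusable factor loses nothing, and the same factor can be applied to every yield column.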

The model decomposes yield additively:

sim_yield_kg_ha  =  trend(state, year)  +  sim_yield_kg_ha_detrended

sim_yield_kg_ha_detrended is exactly the weather attribution — the part of the prediction not explained by the secular trend. It is present in walk_forward_preds.parquet at all ADM levels and must be included in the area-weighted aggregation value_cols before the delivery alias is applied.

Where it lives in the code

The delivery expression is at delivery/conversions.py:334:

(pl.col("sim_yield_kg_ha_detrended") * scale).alias("weather_correction_bu_ac"),

sim_yield_kg_ha_detrended is included in value_cols at delivery/conversions.py:219:

value_cols = ["sim_yield_kg_ha", "sim_yield_kg_ha_detrended"]

This inclusion is load-bearing: if sim_yield_kg_ha_detrended is omitted from value_cols, it is never passed to the area-weighted aggregator, and weather_correction_bu_ac arrives at delivery as null. This was the root cause of the bug fixed in PR #331: the column was null in every row from its initial release until the fix.
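
The effect of value_cols can be illustrated with a minimal pure-Python sketch of area-weighted aggregation. The function name, the `area` weight column, and the county numbers are all hypothetical; the real aggregator in delivery/conversions.py works on Polars frames and is not reproduced here.

```python
def area_weighted_mean(rows, value_cols, weight_col="area"):
    """Aggregate child rows into one parent row, weighting each value by area.

    Only the columns named in value_cols are aggregated; everything else
    is dropped from the result.
    """
    total_weight = sum(r[weight_col] for r in rows)
    return {
        col: sum(r[col] * r[weight_col] for r in rows) / total_weight
        for col in value_cols
    }


# Illustrative county rows (made-up numbers).
counties = [
    {"area": 100.0, "sim_yield_kg_ha": 3000.0, "sim_yield_kg_ha_detrended": -200.0},
    {"area": 300.0, "sim_yield_kg_ha": 3400.0, "sim_yield_kg_ha_detrended": 40.0},
]

fixed = area_weighted_mean(counties, ["sim_yield_kg_ha", "sim_yield_kg_ha_detrended"])
buggy = area_weighted_mean(counties, ["sim_yield_kg_ha"])

# With the detrended column omitted, the aggregate carries no value for it,
# so there is nothing for the delivery alias to pick up downstream.
missing = buggy.get("sim_yield_kg_ha_detrended")
```

In this sketch `fixed` carries both aggregated columns, while `missing` is None, which mirrors how the omission surfaced as an all-null delivery column.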

Key invariants

  • weather_correction_bu_ac is ALWAYS populated. The pydantic field on DeliveryRow was tightened from float | None = None to float in PR #331, so any future regression fails validation at the schema boundary instead of passing silently.
  • At the first init_date of each season weather_correction_bu_ac = 0.0 exactly. This is a structural identity: no weather has accumulated, so sim_yield_kg_ha_detrended = 0. PR #331 proof: first_vintage_max = 0.000000 across all 225 ADM0 rows.
  • sim_yield_kg_ha_detrended is produced by the PREDICT stage as the raw detrended output of the four-step inverse pipeline (run_predict.py:255–279) and carried through to walk_forward_preds.parquet.
  • The column is present at ADM0 / ADM1 / ADM2. Only ADM2 skips area-weighted aggregation and uses the county-level value directly.
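
The first two invariants lend themselves to a small regression check. The sketch below is one way to write it, not an existing test in the repository: the function name and the dict-based row shape are assumptions, and it expects rows for a single region and crop. The sample rows reuse the wheat ADM0 2022 values reported in PR #331.

```python
from collections import defaultdict


def check_weather_invariants(rows):
    """Return a list of violation messages for the two delivery invariants."""
    problems = []
    by_year = defaultdict(list)
    for row in rows:
        # Invariant 1: the column is always populated.
        if row["weather_correction_bu_ac"] is None:
            problems.append(f"null weather correction at {row['init_date']}")
        by_year[row["year"]].append(row)
    for year, season in by_year.items():
        # Invariant 2: exactly zero at the first init_date of each season.
        # ISO-formatted dates sort correctly as plain strings.
        first = min(season, key=lambda r: r["init_date"])
        if first["weather_correction_bu_ac"] != 0.0:
            problems.append(f"non-zero first vintage for {year}")
    return problems


rows = [
    {"init_date": "2021-10-01", "year": 2022, "weather_correction_bu_ac": 0.0},
    {"init_date": "2022-04-08", "year": 2022, "weather_correction_bu_ac": -1.133},
    {"init_date": "2022-08-01", "year": 2022, "weather_correction_bu_ac": -2.948},
]
violations = check_weather_invariants(rows)
```

An empty `violations` list means both invariants hold for the supplied rows.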

Why structural beats vintage-delta

Before PR #331, a dead function add_weather_columns in transforms.py computed:

weather_correction = mean − first_init_date_mean(year)   # vintage delta

This is wrong for a single-vintage forecast: if only one init_date has been issued, the delta is trivially zero. PR #331 deleted the dead code (−36 lines) and replaced it with the structural identity.

From the PR #331 body:

Property                  Vintage-delta (mean − first_init)  Structural (detrended × scale)
Multi-vintage hindcast    Meaningful                         Numerically identical
Single-vintage forecast   Trivially 0                        Meaningful
Needs cross-row plumbing  Yes (nat_sims)                     No
Truthful to model         Approximate                        Exact identity

PR #331 proof on real data (wheat ADM0, 2022 season)

init_date   year  mean    weather_correction_bu_ac  lower_90  upper_90
2021-10-01  2022  49.519                     0.000    45.019    54.018
2022-04-08  2022  48.386                    -1.133    43.886    52.885
2022-08-01  2022  46.571                    -2.948    42.072    51.070

Bug-fix verification (all ADM levels): wx_nulls = 0 everywhere; lo90_nulls = up90_nulls = 0.
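
The three wheat rows above also make the vintage-delta versus structural comparison concrete. The sketch below recomputes the old definition from the mean column; `vintage_delta` is a hypothetical helper written for this illustration, not the deleted add_weather_columns.

```python
# Wheat ADM0 rows for the 2022 season: (init_date, mean, weather_correction_bu_ac).
rows = [
    ("2021-10-01", 49.519, 0.000),
    ("2022-04-08", 48.386, -1.133),
    ("2022-08-01", 46.571, -2.948),
]


def vintage_delta(season_rows):
    """Old definition: mean minus the mean at the earliest init_date present."""
    first_mean = min(season_rows)[1]  # tuples sort by init_date first
    return [round(mean - first_mean, 3) for _, mean, _ in season_rows]


# Structural values as delivered (detrended × scale).
structural = [wx for _, _, wx in rows]

# Multi-vintage hindcast: both definitions give the same numbers.
hindcast_delta = vintage_delta(rows)

# Single-vintage forecast: if only the latest row has been issued, the delta
# collapses to zero, while the structural value for that row is unchanged.
forecast_delta = vintage_delta(rows[-1:])
```

Here `hindcast_delta` matches `structural` row for row, whereas `forecast_delta` is `[0.0]` even though the structural value for the same row is -2.948.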

P90 bands (also fixed in PR #331)

Wheat, cotton, and soybean configs were missing 0.90 from ci_levels. The fix adds 0.90 to three YAMLs. The CI band machinery is fully dynamic: add a level to ci_levels and the lower_{pct} / upper_{pct} columns appear automatically; no code changes required.
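
A minimal sketch of how band column names can follow from ci_levels. The function name and the 0.50 / 0.80 levels are hypothetical; only the lower_{pct} / upper_{pct} naming and the 0.90 level come from this page, and the real band machinery is not reproduced.

```python
def band_columns(ci_levels):
    """Return the lower_/upper_ column names implied by a list of CI levels."""
    cols = []
    for level in ci_levels:
        # round() rather than int(): 0.29 * 100 evaluates to 28.999... in floats.
        pct = round(level * 100)
        cols += [f"lower_{pct}", f"upper_{pct}"]
    return cols


# Hypothetical config before and after adding the 0.90 level.
before = band_columns([0.50, 0.80])
after = band_columns([0.50, 0.80, 0.90])
new_columns = [c for c in after if c not in before]
```

Deriving the names from the configured levels is what lets a YAML-only change introduce lower_90 and upper_90.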

How it interacts with the pipeline

sim_yield_kg_ha_detrended is written by the PREDICT stage (step 2 of the four-step inverse pipeline at run_predict.py:255–279). At the DELIVER stage, walk_forward_preds_to_delivery_rows in delivery/conversions.py includes it in value_cols and aliases it to weather_correction_bu_ac at conversions.py:334.

See the delivery pipeline for the full DELIVER stage walkthrough.

Pitfalls and historical bugs

  • Always-null bug (PR #331, 2026-04-27): sim_yield_kg_ha_detrended was not in value_cols. The dead add_weather_columns was never called. Every row was null.
  • DeliveryRow permissive type: float | None = None allowed nulls to pass schema validation silently. Tightened to float in PR #331.
  • add_weather_columns was dead code: Any attempt to revive the vintage-delta approach would require threading nat_sims cross-row state through the pipeline.

PRs and commits

PR       Relevance
PR #331  Fixed always-null weather_correction_bu_ac; deleted dead vintage-delta code; added P90 bands to wheat/cotton/soybean; tightened DeliveryRow schema

Open questions

  • weather_correction_bu_ac is expressed in bu/ac only. Whether the raw sim_yield_kg_ha_detrended (kg/ha) should be exposed for non-US consumers is undocumented.
  • The ADM2 range (−40 to +22 bu/ac) is much wider than the ADM0 range (−3 to +2 bu/ac). This is expected, since area-weighted aggregation averages out county-level spatial heterogeneity, but it is not documented in the delivery schema.