PR #331: Populate weather_correction_bu_ac and add P90 bands
At a glance
- Author: ai-tommytf
- Merged: 2026-04-27
- Branch: tl/fix-delivery-rows-weather-and-P90
- Net effect: Six files, +6/-41 lines. Fixes two bugs in the delivery stage: (1) weather_correction_bu_ac was always null; fixed by including sim_yield_kg_ha_detrended in the value-column aggregation and aliasing it at delivery time. (2) lower_90/upper_90 were absent for wheat, cotton and soybean because 0.90 was missing from their configs; fixed by adding 0.90 to their ci_levels lists.
- Why this matters: weather_correction_bu_ac is the column clients use to understand how much of a prediction is due to weather versus trend, and it had been empty on every row since the system shipped.
PR body (faithful extract)
The two bugs
Bug 1 — weather_correction_bu_ac was always null.
The column existed in the schema but had never been populated. The delivery pipeline had dead code (add_weather_columns in transforms.py) that was never called.
Bug 2 — Wheat / cotton / soybean missing 90% confidence band.
Corn had 0.90 in its ci_levels; the other three commodities had [0.5, 0.68, 0.80, 0.95] — the 0.90 entry was simply forgotten.
The structural identity (why Bug 1 was easy to fix)
The model's prediction decomposes as:

$$
\text{sim\_yield\_kg\_ha} = \underbrace{\text{trend}(\text{state}, \text{year})}_{\text{slow drift due to seeds/fertiliser/tech}} + \underbrace{\text{sim\_yield\_kg\_ha\_detrended}}_{\text{year-to-year weather noise}}
$$
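The identity holds by construction: whatever the trend model is, the detrended series is defined as the residual, so trend plus residual reproduces the simulated yield exactly. The sketch below assumes a per-state ordinary least-squares straight-line trend in year, which the PR does not specify; `fit_linear_trend` and `decompose` are hypothetical names and the yields are toy values.

```python
from statistics import fmean


def fit_linear_trend(years: list[int], yields: list[float]) -> tuple[float, float]:
    """OLS fit of yield = intercept + slope * year.

    Assumption: the real trend(state, year) model is not described in the PR;
    a straight line per state is used here purely for illustration.
    """
    mean_year = fmean(years)
    mean_yield = fmean(yields)
    sxy = sum((x - mean_year) * (y - mean_yield) for x, y in zip(years, yields))
    sxx = sum((x - mean_year) ** 2 for x in years)
    slope = sxy / sxx
    intercept = mean_yield - slope * mean_year
    return intercept, slope


def decompose(years: list[int], sim_yield_kg_ha: list[float]) -> tuple[list[float], list[float]]:
    """Split simulated yield into (trend, detrended) so trend + detrended == yield."""
    intercept, slope = fit_linear_trend(years, sim_yield_kg_ha)
    trend = [intercept + slope * y for y in years]
    # The detrended series is the residual; in the PR this is the weather attribution.
    detrended = [y - t for y, t in zip(sim_yield_kg_ha, trend)]
    return trend, detrended


years = [2018, 2019, 2020, 2021, 2022]
sim = [3100.0, 3250.0, 3180.0, 3400.0, 3150.0]  # toy kg/ha values
trend, detrended = decompose(years, sim)
```

Because the residual is computed from the same trend that is added back, no approximation enters the split; only the choice of trend model changes how yield is apportioned.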
sim_yield_kg_ha_detrended is exactly the weather attribution, so the fix in convert.py is small: add the detrended series to the value-column aggregation, then alias it to the delivery column:
# value_cols now includes the detrended series
value_cols = ["sim_yield_kg_ha", "sim_yield_kg_ha_detrended"]
# At alias time:
(pl.col("sim_yield_kg_ha_detrended") * scale).alias("weather_correction_bu_ac"),
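Why leaving the column out of value_cols produced nulls: the aggregator only carries forward the columns it is told about, so a column that is never aggregated has nothing to alias. The sketch below mimics that flow with plain Python dicts instead of polars; `area_weighted_mean`, `to_delivery_row`, the `area_ha` weight column and the `KG_HA_TO_BU_AC` factor (which assumes a 60 lb bushel, as for wheat and soybeans) are illustrative assumptions, not the pipeline's code.

```python
# Assumption: a 60 lb bushel; the real `scale` is defined in convert.py and not shown here.
LB_IN_KG = 0.45359237
ACRE_IN_HA = 0.40468564224
KG_HA_TO_BU_AC = ACRE_IN_HA / (60 * LB_IN_KG)


def area_weighted_mean(rows, value_cols, weight_col="area_ha"):
    """Aggregate only the columns listed in value_cols (hypothetical helper)."""
    total_weight = sum(r[weight_col] for r in rows)
    return {
        col: sum(r[col] * r[weight_col] for r in rows) / total_weight
        for col in value_cols
    }


def to_delivery_row(agg, scale=KG_HA_TO_BU_AC):
    """Alias aggregated columns to delivery names; a column never aggregated becomes None."""
    detrended = agg.get("sim_yield_kg_ha_detrended")
    return {
        "mean": agg["sim_yield_kg_ha"] * scale,
        # Mirrors the new alias line: detrended component times scale.
        "weather_correction_bu_ac": None if detrended is None else detrended * scale,
    }


rows = [  # toy sub-regional rows
    {"sim_yield_kg_ha": 3000.0, "sim_yield_kg_ha_detrended": -150.0, "area_ha": 2.0},
    {"sim_yield_kg_ha": 3600.0, "sim_yield_kg_ha_detrended": 60.0, "area_ha": 1.0},
]

# Pre-fix: detrended series not aggregated, so the correction is null.
before = to_delivery_row(area_weighted_mean(rows, ["sim_yield_kg_ha"]))
# Post-fix: both series aggregated, so the correction is populated.
after = to_delivery_row(
    area_weighted_mean(rows, ["sim_yield_kg_ha", "sim_yield_kg_ha_detrended"])
)
```

Since the detrended series is area-weighted with the same weights as the yield itself, the identity survives aggregation to any ADM level.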
The old add_weather_columns function computed weather_correction = mean − first_init_date_mean(year) (a vintage delta) — a wrong approximation that required cross-row plumbing and was always zero for single-vintage forecasts. It was dead code and is now deleted (−36 lines from transforms.py).
Why structural is better than vintage-delta
| Property | Vintage-delta (mean − first_init) | Structural (detrended × scale) |
|---|---|---|
| Multi-vintage hindcast | Meaningful | Numerically identical |
| Single-vintage forecast | Trivially 0 | Meaningful |
| Needs cross-row plumbing | Yes (nat_sims) | No |
| Truthful to model | Approximate | Exact identity |
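The "Single-vintage forecast" row is the decisive difference. The vintage delta compares each row with the earliest init_date of the same season, so when a forecast has only one vintage every row is compared with itself and the result is zero; the structural value needs only the row's own detrended component. A minimal sketch with hypothetical helper names, field names and toy values; the 0.0148696 factor is an approximate kg/ha to bu/ac conversion for a 60 lb bushel, assumed for illustration.

```python
def vintage_delta(rows):
    """Old approach (deleted in this PR): mean minus the first-init-date mean of the same year."""
    first_mean = {}
    # Sorting ISO-8601 init_date strings puts the earliest vintage first.
    for r in sorted(rows, key=lambda r: r["init_date"]):
        first_mean.setdefault(r["year"], r["mean"])
    # Cross-row plumbing: every row needs the first vintage's mean.
    return [r["mean"] - first_mean[r["year"]] for r in rows]


def structural(rows, scale):
    """New approach: each row's detrended component times the unit scale; no cross-row lookup."""
    return [r["detrended_kg_ha"] * scale for r in rows]


single = [{"init_date": "2022-06-01", "year": 2022, "mean": 46.0, "detrended_kg_ha": -120.0}]
print(vintage_delta(single))          # [0.0]
print(structural(single, 0.0148696))  # negative: a below-trend season
```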
Schema tightened
# before
weather_correction_bu_ac: float | None = None
# after
weather_correction_bu_ac: float # always populated; pydantic validates
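The practical effect of dropping `| None = None` is that a null value fails loudly at construction time instead of flowing silently into the CSV. The sketch below uses a standard-library dataclass with a `__post_init__` check as a stand-in for pydantic validation; the real DeliveryRow is a pydantic model whose other fields are omitted, and `DeliveryRowSketch` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class DeliveryRowSketch:
    """Stand-in for the tightened DeliveryRow field (not the real pydantic model)."""

    # Required, no default: omitting it raises TypeError here
    # (pydantic would report a missing field instead).
    weather_correction_bu_ac: float

    def __post_init__(self) -> None:
        value = self.weather_correction_bu_ac
        # Stricter than pydantic's lax mode (which coerces numeric strings),
        # but None is rejected in both; TypeError stands in for ValidationError.
        if not isinstance(value, (int, float)):
            raise TypeError(f"weather_correction_bu_ac must be a number, got {value!r}")


ok = DeliveryRowSketch(weather_correction_bu_ac=-2.948)
```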
Config fix: three YAML files
# before (wheat/cotton/soybean)
ci_levels: [0.5, 0.68, 0.80, 0.95]
# after
ci_levels: [0.5, 0.68, 0.80, 0.90, 0.95]
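The config change is enough on its own because band columns are derived from whatever ci_levels lists. The sketch below assumes a lower_NN/upper_NN naming rule (inferred from lower_90/upper_90) and computes each band as a central normal interval around the mean; the PR does not say how the pipeline actually computes bands, so `band_columns` and its `sigma` input are hypothetical.

```python
from statistics import NormalDist


def band_columns(mean: float, sigma: float, ci_levels: list[float]) -> dict[str, float]:
    """Build lower_NN/upper_NN columns for every level listed in ci_levels.

    Assumption: a central normal interval; the pipeline's real band method is not shown.
    """
    out: dict[str, float] = {}
    for level in ci_levels:
        pct = round(level * 100)  # 0.90 -> 90, 0.68 -> 68
        z = NormalDist().inv_cdf(0.5 + level / 2)  # (1 - level) / 2 in each tail
        out[f"lower_{pct}"] = mean - z * sigma
        out[f"upper_{pct}"] = mean + z * sigma
    return out


old_levels = [0.5, 0.68, 0.80, 0.95]        # wheat/cotton/soybean before the fix
new_levels = [0.5, 0.68, 0.80, 0.90, 0.95]  # after the fix
before = band_columns(49.5, 2.7, old_levels)
after = band_columns(49.5, 2.7, new_levels)
print(sorted(set(after) - set(before)))  # ['lower_90', 'upper_90']
```

Driving the columns from the config list means no code path names a specific level, which is why adding or removing an entry is the whole change.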
Proof on real data: wheat ADM0, 2022 season
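| init_date | year | mean | weather_correction_bu_ac | lower_90 | upper_90 |
|---|---|---|---|---|---|
| 2021-10-01 | 2022 | 49.519 | 0.000 | 45.019 | 54.018 |
| 2021-10-08 | 2022 | 49.457 | -0.062 | 44.958 | 53.956 |
| ... | ... | ... | ... | ... | ... |
| 2022-04-08 | 2022 | 48.386 | -1.133 | 43.886 | 52.885 |
| ... | ... | ... | ... | ... | ... |
| 2022-08-01 | 2022 | 46.571 | -2.948 | 42.072 | 51.070 |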
The trajectory starts at zero (no weather data yet) and walks to −2.948 bu/ac at harvest: the 2022 US wheat season ran below trend, which is exactly the signal the column was designed to communicate.
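The rows in this table can be checked by hand. Because trend(state, year) is fixed within a season and the first vintage carries no weather signal, each delivered correction should equal that vintage's mean minus the first vintage's mean, which is the "numerically identical" multi-vintage case in the comparison table. A short check using only the four rows shown (the elided rows are not reproduced):

```python
# (init_date, mean, weather_correction_bu_ac) copied from the wheat ADM0 2022 table.
rows = [
    ("2021-10-01", 49.519, 0.000),
    ("2021-10-08", 49.457, -0.062),
    ("2022-04-08", 48.386, -1.133),
    ("2022-08-01", 46.571, -2.948),
]

first_mean = rows[0][1]
for init_date, mean, wx in rows:
    implied = mean - first_mean
    # Each of the three published numbers is rounded to 3 dp, so allow up to 1.5e-3.
    assert abs(implied - wx) <= 0.0015, (init_date, implied, wx)
    print(f"{init_date}: mean - first = {implied:+.3f}, delivered = {wx:+.3f}")
```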
Bug-fix verification across all ADM levels
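| level | rows | wx_nulls | lo90_nulls | up90_nulls | wx_min | wx_max | first_vintage_max |
|---|---|---|---|---|---|---|---|
| ADM0 | 225 | 0 | 0 | 0 | -2.948 | 1.854 | 0.000000 |
| ADM1 | 10750 | 0 | 0 | 0 | -20.428 | 14.616 | 0.000000 |
| ADM2 | 414250 | 0 | 0 | 0 | -40.665 | 22.122 | 0.000000 |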
- wx_nulls = 0 everywhere → Bug 1 fixed.
- lo90_nulls = up90_nulls = 0 everywhere → Bug 2 fixed.
- first_vintage_max = 0.000000 exactly → structural identity holds.
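A verification summary of this shape can be produced with one per-level pass over the delivery rows. This sketch assumes rows are dicts keyed by delivery column names plus a hypothetical `level` key, and it interprets first_vintage_max as the largest absolute weather correction on each season's earliest init_date; that interpretation, the `summarize` helper and the toy rows are all assumptions.

```python
def summarize(rows):
    """Per-level null counts and weather-correction range (hypothetical helper)."""
    by_level = {}
    for r in rows:
        by_level.setdefault(r["level"], []).append(r)

    summary = {}
    for level, group in by_level.items():
        wx = [r.get("weather_correction_bu_ac") for r in group]
        present = [v for v in wx if v is not None]
        # Earliest init_date per season; ISO-8601 strings sort chronologically.
        first_init = {}
        for r in group:
            if r["year"] not in first_init or r["init_date"] < first_init[r["year"]]:
                first_init[r["year"]] = r["init_date"]
        # Assumption: first_vintage_max = max |weather correction| on the first vintage.
        first_vals = [
            abs(r["weather_correction_bu_ac"])
            for r in group
            if r["init_date"] == first_init[r["year"]]
            and r.get("weather_correction_bu_ac") is not None
        ]
        summary[level] = {
            "rows": len(group),
            # A missing key counts as null, which is how an absent band column shows up.
            "wx_nulls": sum(v is None for v in wx),
            "lo90_nulls": sum(r.get("lower_90") is None for r in group),
            "up90_nulls": sum(r.get("upper_90") is None for r in group),
            "wx_min": min(present, default=None),
            "wx_max": max(present, default=None),
            "first_vintage_max": max(first_vals, default=None),
        }
    return summary


toy = [  # illustrative rows, not pipeline output
    {"level": "ADM0", "year": 2022, "init_date": "2021-10-01",
     "weather_correction_bu_ac": 0.0, "lower_90": 45.0, "upper_90": 54.0},
    {"level": "ADM0", "year": 2022, "init_date": "2022-08-01",
     "weather_correction_bu_ac": -2.9, "lower_90": 42.1, "upper_90": 51.1},
    # Pre-fix style row: null correction and no 90% band columns at all.
    {"level": "ADM1", "year": 2022, "init_date": "2021-10-01",
     "weather_correction_bu_ac": None},
]
report = summarize(toy)
```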
Files / lines touched
| Additions | Deletions | File |
|---|---|---|
| +0 | -36 | market_insights_models/src/commodity_hindcast/src/steps/delivery/transforms.py |
| +3 | -1 | market_insights_models/src/commodity_hindcast/src/steps/delivery/convert.py |
| +1 | -1 | market_insights_models/src/commodity_hindcast/configs/cotton_experiment.yaml |
| +1 | -1 | market_insights_models/src/commodity_hindcast/configs/soybean_experiment.yaml |
| +1 | -1 | market_insights_models/src/commodity_hindcast/configs/wheat_experiment.yaml |
| +1 | -1 | market_insights_models/src/commodity_hindcast/src/steps/delivery/schemas.py |
Cross-references
- Related entity pages: DeliveryRow
- Related concept pages: unit conventions, delivery transforms
- Related PR: PR-340 (dashboard work that built on the nass_actual_area_weighted_all column, also populated in this era)
Lessons captured
- weather_correction_bu_ac is sim_yield_kg_ha_detrended * scale: the exact detrended component of the model, not a vintage delta. It starts at zero each season and evolves as observed weather accumulates.
- sim_yield_kg_ha_detrended must be included in the value_cols list passed to the area-weighted aggregator in convert.py; omitting it silently leaves the column null.
- add_weather_columns in transforms.py was dead code (never called, and it never computed the correct value); it was deleted in this PR.
- weather_correction_bu_ac: float | None = None on DeliveryRow was a permissive type that let null columns go undetected; tightening it to float turns future regressions into pydantic validation errors.
- ci_levels in each commodity YAML is the complete list of confidence bands produced; if 0.90 is absent, lower_90/upper_90 will be absent from the delivery CSV.
- The CI band machinery is dynamic: add a level to ci_levels and the columns appear automatically; remove it and they disappear. No code changes are required.