PR #331 — Populate weather_correction_bu_ac and add P90 bands

At a glance

  • Author: ai-tommytf
  • Merged: 2026-04-27
  • Branch: tl/fix-delivery-rows-weather-and-P90
  • Net effect: Six files, +7/-41 lines. Fixes two bugs in the delivery stage. (1) weather_correction_bu_ac was always null; fixed by including sim_yield_kg_ha_detrended in the value-column aggregation and aliasing it at delivery time. (2) lower_90/upper_90 were absent from the wheat/cotton/soybean configs; fixed by adding 0.90 to their ci_levels lists.
  • Why this matters: weather_correction_bu_ac is the column clients use to understand how much of a prediction is due to weather vs trend; every row had been empty since the system shipped.

PR body (faithful extract)

The two bugs

Bug 1 — weather_correction_bu_ac was always null. The column existed in the schema but had never been populated. The delivery pipeline had dead code (add_weather_columns in transforms.py) that was never called.

Bug 2 — Wheat / cotton / soybean missing 90% confidence band. Corn had 0.90 in its ci_levels; the other three commodities had [0.5, 0.68, 0.80, 0.95] — the 0.90 entry was simply forgotten.

The structural identity (why Bug 1 was easy to fix)

The model's prediction decomposes as:

sim_yield_kg_ha  =  trend(state, year)  +  sim_yield_kg_ha_detrended
                    └─────────────────┘   └──────────────────────────┘
                     slow drift due to       year-to-year weather noise
                     seeds/fertiliser/tech

Because sim_yield_kg_ha_detrended is exactly the weather attribution, the fix is a small change in convert.py (+3/-1 lines):

# value_cols now includes the detrended series
value_cols = ["sim_yield_kg_ha", "sim_yield_kg_ha_detrended"]

# At alias time:
(pl.col("sim_yield_kg_ha_detrended") * scale).alias("weather_correction_bu_ac"),
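A minimal pure-Python sketch of what the alias step computes, assuming `scale` is the kg/ha to bu/ac factor for a 60 lb bushel (the standard US bushel weight for wheat and soybeans). The PR does not show how `scale` is defined, so `kg_ha_to_bu_ac_scale`, `alias_delivery_columns`, and the mapping of sim_yield_kg_ha to the `mean` column are hypothetical; the real code does this with the polars expression above.

```python
"""Sketch: turn the detrended model component into weather_correction_bu_ac.

Assumptions (not from the PR): scale is derived from a 60 lb bushel, and the
helper names below are hypothetical stand-ins for the polars code in convert.py.
"""

KG_PER_LB = 0.45359237          # exact international definition of the pound
HA_PER_ACRE = 0.40468564224     # exact: 1 acre = 4046.8564224 m^2


def kg_ha_to_bu_ac_scale(lb_per_bushel: float) -> float:
    """Return the factor that converts kg/ha into bu/ac for the given bushel weight."""
    kg_per_bushel = lb_per_bushel * KG_PER_LB
    # (kg/ha) * (ha/acre) / (kg/bu) = bu/acre
    return HA_PER_ACRE / kg_per_bushel


def alias_delivery_columns(row: dict, scale: float) -> dict:
    """Scale both value columns; the detrended one becomes the weather correction."""
    return {
        "mean": row["sim_yield_kg_ha"] * scale,  # assumed mapping to the delivery 'mean'
        "weather_correction_bu_ac": row["sim_yield_kg_ha_detrended"] * scale,
    }


if __name__ == "__main__":
    scale = kg_ha_to_bu_ac_scale(60.0)
    row = {"sim_yield_kg_ha": 3200.0, "sim_yield_kg_ha_detrended": -200.0}
    print(alias_delivery_columns(row, scale))
```

Because both columns share one scale factor, `mean - weather_correction_bu_ac` is just the trend term converted to bu/ac, which mirrors the decomposition above.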

The old add_weather_columns function computed weather_correction = mean − first_init_date_mean(year), a vintage delta. That approximation required cross-row plumbing and was always zero for single-vintage forecasts. Because it was never called, it was dead code, and it is now deleted (−36 lines from transforms.py).

Why structural is better than vintage-delta

Property                  Vintage-delta (mean − first_init)  Structural (detrended × scale)
Multi-vintage hindcast    Meaningful                         Meaningful (numerically identical)
Single-vintage forecast   Trivially 0                        Meaningful
Needs cross-row plumbing  Yes (nat_sims)                     No
Truthful to model         Approximate                        Exact identity
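The first two rows of the table follow from the decomposition: within one season year, trend(state, year) is the same for every init_date, so mean − first_init_mean reduces to the change in the detrended term, which equals the detrended term itself whenever the first vintage carries no weather signal. The sketch below checks this with the wheat ADM0 2022 values from the proof section (bu/ac); the function names are hypothetical and the trend value is read off the first-vintage mean.

```python
from __future__ import annotations


def vintage_delta(means: list[float]) -> list[float]:
    """Old definition: each vintage's mean minus the first vintage's mean."""
    first = means[0]
    return [m - first for m in means]


def structural(trend: float, detrended: list[float]) -> tuple[list[float], list[float]]:
    """New definition: the detrended term is the weather correction itself."""
    means = [trend + d for d in detrended]
    return means, list(detrended)


# Wheat ADM0, 2022 season (bu/ac): first-vintage mean and four weather corrections.
trend_2022 = 49.519
detrended = [0.000, -0.062, -1.133, -2.948]
means, weather = structural(trend_2022, detrended)

# Multi-vintage hindcast: both definitions agree because detrended[0] == 0.
for old, new in zip(vintage_delta(means), weather):
    assert abs(old - new) < 1e-9

# Single-vintage forecast: the old definition collapses to 0, the new one does not.
single_means, single_weather = structural(trend_2022, [-1.133])
print(vintage_delta(single_means), single_weather)  # [0.0] [-1.133]
```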

Schema tightened

# before
weather_correction_bu_ac: float | None = None
# after
weather_correction_bu_ac: float   # always populated; pydantic validates
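A minimal sketch of what the tightened annotation buys, assuming a pydantic model shaped like DeliveryRow; the two-field `DeliveryRowSketch` and the `accepts` helper are hypothetical cut-downs, not the real schema.

```python
from pydantic import BaseModel, ValidationError


class DeliveryRowSketch(BaseModel):
    """Hypothetical two-field stand-in for DeliveryRow."""
    mean: float
    weather_correction_bu_ac: float  # no Optional, no default: a value is required


def accepts(payload: dict) -> bool:
    """Return True if pydantic validates the payload, False if it raises."""
    try:
        DeliveryRowSketch(**payload)
    except ValidationError:
        return False
    return True


print(accepts({"mean": 49.519, "weather_correction_bu_ac": 0.0}))   # True
print(accepts({"mean": 49.519, "weather_correction_bu_ac": None}))  # False
print(accepts({"mean": 49.519}))                                   # False
```

One caveat worth keeping in mind: a plain `float` field still accepts NaN by default, so this guards against null or missing values, not against NaN-filled columns.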

Config fix — three YAML files

# before (wheat/cotton/soybean)
ci_levels: [0.5, 0.68, 0.80, 0.95]
# after
ci_levels: [0.5, 0.68, 0.80, 0.90, 0.95]
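A sketch of how a ci_levels list could expand into band columns, assuming each level c is a central interval with lower quantile (1 − c)/2 and upper quantile (1 + c)/2, and that columns are named lower_<pct>/upper_<pct> as in the delivery CSV. `band_columns` is a hypothetical helper, not the pipeline's code.

```python
from __future__ import annotations


def band_columns(ci_levels: list[float]) -> dict[str, float]:
    """Map each confidence level to its lower/upper column name and quantile."""
    columns: dict[str, float] = {}
    for level in ci_levels:
        pct = round(level * 100)  # 0.90 -> 90, 0.68 -> 68 (round absorbs float noise)
        columns[f"lower_{pct}"] = (1 - level) / 2
        columns[f"upper_{pct}"] = (1 + level) / 2
    return columns


before = band_columns([0.5, 0.68, 0.80, 0.95])
after = band_columns([0.5, 0.68, 0.80, 0.90, 0.95])
print("lower_90" in before, "lower_90" in after)  # False True
print(sorted(set(after) - set(before)))           # ['lower_90', 'upper_90']
```

Because the columns are derived from the list, the one-line YAML change is enough for lower_90/upper_90 to appear.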

Proof on real data — wheat ADM0, 2022 season

init_date   year  mean    weather_correction_bu_ac  lower_90  upper_90
2021-10-01  2022  49.519   0.000                    45.019    54.018
2021-10-08  2022  49.457  -0.062                    44.958    53.956
...
2022-04-08  2022  48.386  -1.133                    43.886    52.885
...
2022-08-01  2022  46.571  -2.948                    42.072    51.070

The trajectory starts at zero (no weather data yet at the first init date) and walks to −2.948 bu/ac by harvest: the 2022 US wheat season ran below trend. That is exactly the signal the column was designed to communicate.

Bug-fix verification across all ADM levels

level      rows  wx_nulls  lo90_nulls  up90_nulls    wx_min    wx_max  first_vintage_max
 ADM0       225         0           0           0    -2.948     1.854           0.000000
 ADM1     10750         0           0           0   -20.428    14.616           0.000000
 ADM2    414250         0           0           0   -40.665    22.122           0.000000
  • wx_nulls = 0 everywhere → Bug 1 fixed.
  • lo90_nulls = up90_nulls = 0 everywhere → Bug 2 fixed.
  • first_vintage_max = 0.000000 exactly → the weather correction is zero at the first vintage, as the structural identity requires.
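A pure-Python sketch of a check that could produce the verification table, assuming rows are dicts with level, region, year, init_date and the delivery columns, and that first_vintage_max is the largest absolute weather correction at each region-year's earliest init_date. The PR does not show the actual query, so the `verify` helper, the `region` key, and these column semantics are assumptions; the rows are toy data.

```python
from __future__ import annotations

from collections import defaultdict


def verify(rows: list[dict]) -> dict[str, dict]:
    """Per ADM level: row count, null counts, weather-correction range, first-vintage max."""
    by_level: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_level[row["level"]].append(row)

    summary = {}
    for level, level_rows in by_level.items():
        wx = [r["weather_correction_bu_ac"] for r in level_rows]
        present = [v for v in wx if v is not None]
        # The earliest init_date per (region, year) is that series' first vintage.
        first_init: dict[tuple, str] = {}
        for r in level_rows:
            key = (r["region"], r["year"])
            if key not in first_init or r["init_date"] < first_init[key]:
                first_init[key] = r["init_date"]
        first_vals = [
            abs(r["weather_correction_bu_ac"])
            for r in level_rows
            if r["init_date"] == first_init[(r["region"], r["year"])]
            and r["weather_correction_bu_ac"] is not None
        ]
        summary[level] = {
            "rows": len(level_rows),
            "wx_nulls": len(wx) - len(present),
            "lo90_nulls": sum(r.get("lower_90") is None for r in level_rows),
            "up90_nulls": sum(r.get("upper_90") is None for r in level_rows),
            "wx_min": min(present, default=None),
            "wx_max": max(present, default=None),
            "first_vintage_max": max(first_vals, default=None),
        }
    return summary


# Toy rows (not PR data); the third row deliberately carries nulls.
toy = [
    {"level": "ADM0", "region": "US", "year": 2022, "init_date": "2021-10-01",
     "weather_correction_bu_ac": 0.0, "lower_90": 45.0, "upper_90": 54.0},
    {"level": "ADM0", "region": "US", "year": 2022, "init_date": "2022-08-01",
     "weather_correction_bu_ac": -2.9, "lower_90": 42.1, "upper_90": 51.1},
    {"level": "ADM0", "region": "US", "year": 2022, "init_date": "2022-04-08",
     "weather_correction_bu_ac": None, "lower_90": None, "upper_90": 52.9},
]
print(verify(toy)["ADM0"])
```

ISO-formatted date strings sort chronologically, which is why plain string comparison finds the first vintage here.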

Files / lines touched

Additions  Deletions  File
+0         -36        market_insights_models/src/commodity_hindcast/src/steps/delivery/transforms.py
+3         -1         market_insights_models/src/commodity_hindcast/src/steps/delivery/convert.py
+1         -1         market_insights_models/src/commodity_hindcast/configs/cotton_experiment.yaml
+1         -1         market_insights_models/src/commodity_hindcast/configs/soybean_experiment.yaml
+1         -1         market_insights_models/src/commodity_hindcast/configs/wheat_experiment.yaml
+1         -1         market_insights_models/src/commodity_hindcast/src/steps/delivery/schemas.py

Cross-references

  • Related entity pages: DeliveryRow
  • Related concept pages: unit conventions, delivery transforms
  • Related PR: PR-340 (dashboard work built on the nass_actual_area_weighted_all column, which was also populated during this period)

Lessons captured

  • weather_correction_bu_ac is sim_yield_kg_ha_detrended * scale, the exact detrended component of the model, not a vintage delta. It starts at zero each season and moves away from zero (up or down) as observed weather accumulates.
  • sim_yield_kg_ha_detrended must be included in the value_cols list passed to the area-weighted aggregator in convert.py; omitting it silently leaves the column null.
  • add_weather_columns in transforms.py was dead code (never called, never computed the correct value); it was deleted in this PR.
  • weather_correction_bu_ac: float | None = None on DeliveryRow was a permissive type that let null columns go undetected; tightening to float turns future regressions into pydantic validation errors.
  • ci_levels in each commodity YAML is the complete list of confidence bands produced; if 0.90 is absent, lower_90 / upper_90 will be absent from the delivery CSV.
  • The CI band machinery is dynamic: add a level to ci_levels and the columns appear automatically; remove it and they disappear. No code changes required.
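To make the value_cols lesson concrete: an aggregator that only averages the columns it is told about carries nothing else forward. In this pure-Python sketch the alias step falls back to None when its source column is absent, which reproduces the symptom described; the real failure path inside convert.py may differ. `area_weighted_mean`, `alias_weather`, the `area` weight column, and the scale value (60 lb bushel) are all hypothetical.

```python
from __future__ import annotations


def area_weighted_mean(rows: list[dict], value_cols: list[str], weight_col: str) -> dict:
    """Area-weighted mean of each column in value_cols; other columns are not carried over."""
    total_weight = sum(r[weight_col] for r in rows)
    return {
        col: sum(r[col] * r[weight_col] for r in rows) / total_weight
        for col in value_cols
    }


def alias_weather(aggregated: dict, scale: float) -> float | None:
    """Sketch of the alias step: a missing source column yields None (null)."""
    detrended = aggregated.get("sim_yield_kg_ha_detrended")
    return None if detrended is None else detrended * scale


# Toy sub-regions (not PR data).
counties = [
    {"area": 100.0, "sim_yield_kg_ha": 3000.0, "sim_yield_kg_ha_detrended": -100.0},
    {"area": 300.0, "sim_yield_kg_ha": 3400.0, "sim_yield_kg_ha_detrended": 60.0},
]
scale = 0.0148697  # assumed kg/ha -> bu/ac factor for a 60 lb bushel

old = area_weighted_mean(counties, ["sim_yield_kg_ha"], "area")
new = area_weighted_mean(counties, ["sim_yield_kg_ha", "sim_yield_kg_ha_detrended"], "area")
print(alias_weather(old, scale))  # None
print(alias_weather(new, scale))
```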