Risk register: commodity_hindcast

Last updated: 2026-05-08. Total risks tracked: 23 (R1..R14 from the original register; R15..R23 merged in from risks_addendum.md on 2026-05-08; 7 refinements applied to existing rows; a new "Fixed in flight" section records 18 historical lessons).

Severity legend

Critical: would corrupt a production delivery silently OR block a release
High: likely to cause an incident / needs intervention this quarter
Medium: known footgun; workarounds exist
Low: minor / cosmetic

Risks

#	Title	Severity	Likelihood	Description	Mitigation	Owner	Sources
1	TMI selection bias correction wrong sign	Critical	High	2020 corn flips from QUBE +2.23 to TMI -8.31 bu/ac under an identical SBC formula: the wrong sign and ~3.7x the magnitude (NOT the inherited "~17x smaller" figure, which came from a now-deleted `gen_report.py::convert_metrics_to_bu_acre` converting `selection_bias_kg_ha` to bu/ac without renaming the column). The historical mechanism ("weather features drop core counties via global dropna") is no longer dominant: `training_dropna_subset` at `config.py:812-822` now returns ONLY `[com.target_col, com.target_detrended_col]` and no longer splats `commodity.feature_cols` into the subset. The root cause is therefore REOPENED, because dropna on the target alone cannot explain a 287-vs-972 county collapse.	Two-track mitigation. (B) Quick unblock: make dropna column-aware per consuming estimator. This has largely landed at `config.py:812-822` (subset narrowed to the target and detrended target only); add a regression test that pins `training_dropna_subset` to exactly that pair so a future widening cannot reintroduce the bug. (A) Root cause: fix the weather feature builder so EDD z-scores cover all 935 top-95% counties, and re-investigate the 287-vs-972 collapse now that the dropna splat is ruled out (the 1980 component is fixed at `features/builders/yields.py:185-199`; Wayne County NC climo issue [PLACEHOLDER: not located in current tree]). Live dropna call sites: `stages/run_fit.py:94` and `run/experiment_protocol.py:54`. Switch evaluation to the full prediction universe; the `included_geo_identifiers` contract is codified at DESIGN.md:114.	[PLACEHOLDER: owner needed]	`MEMORY.md/project_sbc_tmi_bug.md`, `config.py:812-822`, `stages/run_fit.py:94`, `run/experiment_protocol.py:54`, `features/builders/yields.py:185-199`, DESIGN.md:114, sessions A2-A5
2	DESIGN.md vs code drift on `postprocessed/national.parquet`	High	High	`DESIGN.md:73` specifies `postprocessed/{experiment_key}_national.parquet`, but the code writes `postprocessed/national.parquet` (no infix). Any consumer that treats DESIGN.md as a code contract will look for a file that does not exist; the wiki page `pipelines/postprocess.md` documents the discrepancy.	Either update `DESIGN.md:73` to drop the `{experiment_key}_` infix, or update the writer to include it. Writer code paths: `lib/results/results_slice.py:368`, `lib/results/run_result.py:155`.	[PLACEHOLDER: owner needed]	`DESIGN.md:73`, `lib/results/results_slice.py:368`, `lib/results/run_result.py:155`, `LINT_REPORT.md§<section>`
3	S3 path anchoring on local-only sinks	High	High	`INPUT_DATA_DIR` is `s3://...` in QA. Code that calls `.as_posix()`, wraps with `pathlib.Path(AnyPath(...))`, or anchors sqlite/lockfiles under `data_root` either crashes (`AttributeError: 'S3Path' object has no attribute 'as_posix'`) or produces `sqlite:///s3://...`. This is a recurring class; the last incident, `f66f4ac9` (2026-04-29), broke wheat/corn/soy/cotton in QA.	Branch on `isinstance(x, CloudPath)` for any URI sink; use `str(path)`, not `.as_posix()`; route local-only sinks (sqlite, FileLock) to `/tmp/<run-id>/...`; use `AnyPathParam` (PR-345) for click args; add parametrised tests covering both `Path` and `S3Path` anchors.	[PLACEHOLDER: owner needed]	`MEMORY.md/feedback_s3_path_anchoring.md`, `wiki/commodity_hindcast/sources/prs/PR-345.md`, commit `f66f4ac9`, fix `13117727`
4	MLflow SQLite locking on parallel same-commodity runs	High	Medium	Running two pipelines for the same commodity in parallel hits SQLAlchemy `OperationalError` because two writers contend for the same MLflow SQLite DB. Blocks concurrent backfills.	Run same-commodity pipelines sequentially; longer term, move MLflow tracking to a server backend (Postgres/MySQL) or per-run isolated DB files.	[PLACEHOLDER: owner needed]	`MEMORY.md` Known Issues section, `mlruns.db` artefacts in repo root
5	`_OOS_MODES` frozenset hand-maintained vs `ResidualMode` Literal	High	Low	`stages/run_forecast.py:82` keeps a hand-maintained frozenset of OOS modes used by `validate_residual_mode`. Any new value added to `ResidualMode` in `models/meta_models/types.py` that is not also added to `_OOS_MODES` would silently allow a non-OOS mode through the validation gate when no conformal sidecar exists.	Replace frozenset with `set(get_args(ResidualMode)) - {"in_sample_pooled"}`; add a unit test that pins the derivation.	[PLACEHOLDER: owner needed]	`LINT_REPORT.md§<section>`, `stages/run_forecast.py:82`, `models/meta_models/types.py`, `wiki/commodity_hindcast/sources/prs/PR-372.md`
6	Wheat sub-types listed in config but not produced	High	Medium	The wheat preprocessor emits only crop_type `WHEAT`. The 2026-04-17 unified-builder commit `c7041b68` ("refactor(commodity_hindcast/features): unify feature builders on one season-DOY path") consolidated all wheat configs onto a single season-DOY path; `WINTER_WHEAT` survives only as a NASS `crop_type` filter at `configs/wheat_usa.yaml:214`, and `SPRING_DURUM_WHEAT` and `SPRING_EXCL_DURUM_WHEAT` are not represented in current configs. Reintroducing sub-types would now require a separate `season_start_doy`/`freeze_cap_sdoy`/`season_length` triple per sub-type, and is untested under the unified path.	Either implement the sub-type split in the wheat preprocessor (with the per-sub-type season-DOY triples) or remove the unsupported entries from configs and document `WHEAT` as the only produced type. Treat sub-type reintroduction as orthogonal to unification.	[PLACEHOLDER: owner needed]	`MEMORY.md` Known Issues: Wheat sub-types, commit `c7041b68`, `configs/wheat_usa.yaml:214`, session A6
7	Sugar / non-US preprocessing missing geometries	High	Medium	The default `CROP_YIELD_GEOBOUNDARIES_FILE=/data/processing/yield_forecast/raw/boundaries/geometry.parquet` is US-only and has no IND/GHA/BRA rows; non-US runs need the 52k-row `all_geoboundaries_processed.parquet`. If the env var is not overridden, geometry joins come back silently empty.	Document the override; add a config-time check that asserts geometry rows exist for the configured commodity; the default Brazil/India/Ghana configs should set the path.	[PLACEHOLDER: owner needed]	`MEMORY.md` Known Issues: Sugar/non-US preprocessing
8	Two upward-import layering violations in `delivery/`	Medium	Medium	Two modules in `delivery/` import from the `stages/` layer rather than from the shared `lib/` layer, breaking the single-direction import DAG (DESIGN.md Clause 19). Risks reintroducing the cycle that PR-339 spent nine phases to break.	Move shared helpers (conformal apply, unit conversion) into `lib/`; have both `delivery/` and `stages/run_meta_models.py` import from there. Tracked in `BOUNDED_CONTEXTS.md`.	[PLACEHOLDER: owner needed]	`LINT_REPORT.md§<section>`, `wiki/commodity_hindcast/sources/prs/PR-339.md`, `wiki/commodity_hindcast/sources/prs/PR-353.md` open question §9
9	Long-range climo stub silently falls back to trend-only beyond zarr coverage	Medium	Medium	`forecast_long_range_stub.py` (PR-369) fills missing z-score features for any `season_year` beyond the materialised climo zarr's coverage with per-county trailing-3-year medians. Long-range forecasts collapse to trend-only output by design (`season_doy_weather_weight = 0`); this is correct only as long as the schedule is honoured. The filename is deliberately temporary; removal is blocked on extending the climo zarr.	Extend the materialised climo zarr horizon; remove the stub once no caller imports from it. Until then, keep the three `WARNING` log lines and the docstring's removal criteria visible. Use the centralised `materialise_for_forecast(...)` rather than an ad-hoc DOY collapse.	[PLACEHOLDER: owner needed]	`wiki/commodity_hindcast/sources/prs/PR-369.md`, `features/forecast_long_range_stub.py`, `MEMORY.md/feedback_centralised_climo.md`
10	Forecast walk-forward has no per-fold checkpoint	Medium	Low	Restarting walk-forward CV redoes every earlier fold, so an interrupted multi-year hindcast wastes hours of compute. Tracked as a §9 Open Question in DOMAIN_MODEL2.md.	Add per-fold completion sentinels under `run_dir/`; on resume, skip folds whose artefacts already exist and pass schema validation.	[PLACEHOLDER: owner needed]	`wiki/.../sources/prs/PR-353.md` §9 Open Questions, DOMAIN_MODEL2.md §9
11	Streamlit dashboard regressions invisible until manual launch	Medium	Medium	The `app/` package has no automated test that exercises `streamlit run` startup. PR-360's reference-data refactor silently broke dashboard imports until PR-363 patched three independent bugs by hand. The launch also depends on a specific `PYTHONPATH` plus two env vars, which is fragile because `streamlit run` bypasses uv's package discovery.	Add a smoke test that imports `app/app.py` under the same `PYTHONPATH` shape Streamlit uses; document the launch incantation in the runbook.	[PLACEHOLDER: owner needed]	`wiki/commodity_hindcast/sources/prs/PR-363.md`, `wiki/commodity_hindcast/sources/prs/PR-360.md`, `MEMORY.md/project_streamlit_app_launch.md`
12	QA RDS unreachable from dev EC2 forces local_data symlink	Medium	High	Fresh worktrees that run `dev_tools/run_model_local.py` time out fetching geometry from QA Postgres because the security group / routing blocks the dev box. The script only queries the DB when `local_data/<model_id>/geometry.parquet` is missing, so every fresh worktree needs the symlink ritual.	Mirror the cached `geometry.parquet` files into a documented shared location; or open the SG route from dev EC2; or make the script fail fast with the symlink instruction.	[PLACEHOLDER: owner needed]	`MEMORY.md/project_local_data_symlink.md`
13	Wiki broken-link backlog (84 broken refs across 47 files)	Medium	High	Lint pass found 84 broken relative links and 6 high-value missing pages (notably `pipelines/hindcast.md`, referenced by 19 entity pages). New onboarding engineers will hit dead-ends across the entity catalogue.	Write `pipelines/hindcast.md` first (highest fan-in); fix PR-345/361/369/372 cross-ref paths; backfill `concepts/experiment_protocol.md` and `concepts/season_doy.md` (17 and 25 inbound refs respectively).	[PLACEHOLDER: owner needed]	`LINT_REPORT.md§Broken Links`, `LINT_REPORT.md§Recommended Next Pass`
14	`entities/EditRule.md` vs `entities/EditRuleConfig.md` naming ambiguity	Low	Low	PR-353.md references `EditRule.md` while only `EditRuleConfig.md` exists. May be one missing page (the rule operation, distinct from the rule config) or a stale name.	Decide whether `EditRule` (operation) is a separate entity from `EditRuleConfig` (declarative config); write the missing page or update the cross-ref.	[PLACEHOLDER: owner needed]	`LINT_REPORT.md§<section>`, `wiki/commodity_hindcast/sources/prs/PR-353.md`
15	tau2 floor 24 orders of magnitude lower than QUBE's	High	Medium	TMI sets `tau2 = max(..., 1e-30)` in partial-pooling EB shrinkage; the in-file comment at `models/detrend/partial_pooling_detrend.py:326-330` explicitly notes that QUBE uses `1e-6` and that the TMI floor produces λ ≈ 0 (pure national prior) where QUBE produces λ ≈ 1e-6/SE². On full production panels (~927 counties) τ² ≈ 0.44, so the floor is never active; on small, synthetic or degenerate-slope panels, however, the TMI floor is effectively zero and shrinkage collapses to a single county's slope. The risk is acknowledged in the source comment, but the floor has not been raised.	Raise the floor to `1e-6` to match QUBE; add a regression test that fits EB shrinkage on a degenerate-slope panel and asserts the shrinkage does not collapse to a single county. Pin a synthetic-panel snapshot of the resulting `eb_lambdas` so the QUBE-equivalent regime is testable in CI.	[PLACEHOLDER: owner needed]	`models/detrend/partial_pooling_detrend.py:334` (floor), `:326-330` (comment), `:138, :465, :493` (`self._tau2`), session A2 (`3804f325`)
16	Per-commodity yield-range bounds are the only delivery-side defensive net	Medium	Medium	`config.py:324` defines `yield_range: tuple[float, float]` on `CommodityConfig` ("A new commodity MUST declare its own range; this crashes early at config load if omitted"). The canonical clip helper `clip_yield_to_delivery_range(df, yield_range, value_cols, log_tag=...)` lives at `lib/unit_utils.py:93-130` and is called once at the delivery boundary by `delivery/conversions.py:396-398` (import at `:47`). Per-commodity values: corn `[50.0, 250.0]` (`configs/corn_usa.yaml:88`), wheat `[0.0, 260.0]` (`configs/wheat_usa.yaml:95`), cotton `[400.0, 1200.0]` (`configs/cotton_usa.yaml:90`). Wheat's `0.0` lower bound is intentionally permissive (county panels include failed-crop years), but it admits the 94 bu/ac wheat-mean-from-misaligned-climo defect that A6 originally hit; the clip would not have caught it. `run/preflight.py` performs file-existence checks only — no value-quality / null-rate / z-score-std checks. The dashboard at `app/_dashboard_config.py:143, 161, 181, 204, 219` re-uses the same `yield_range` for axis bounds, so loosening the YAML silently widens the dashboard charts too.	Document `yield_range` as a load-bearing guardrail in DESIGN.md; tighten ranges where defensible (e.g. wheat lower bound could be `5.0` if the failed-crop-year edge case is handled upstream); add upstream feature-quality assertions (climo null-rate ceiling, z-score std ceiling) so the clip is not the sole net; add a unit test that exercises `clip_yield_to_delivery_range` on out-of-range delivery rows and asserts both the clip behaviour and the warn-on-clip log line via `log_tag`.	[PLACEHOLDER: owner needed]	`config.py:324`, `lib/unit_utils.py:93-130`, `delivery/conversions.py:47, :396-398`, `configs/corn_usa.yaml:88`, `configs/wheat_usa.yaml:91-95`, `configs/cotton_usa.yaml:90`, `app/_dashboard_config.py`, session A6 (`6f8c9256`)
17	climo_lag_days regression watch: unification shifted coefficients across all commodities	High	Low	Pre-fix, `climo_lag_days = 30` was applied to harvest-init training rows as well as at inference. For corn, the training row at harvest sdoy=184 saw the window `[1..154]`, i.e. the last 30 days of end-of-season weather were missing from training, contradicting DESIGN.md ("SHALL fit on harvest-time data"). Post-fix (Option B): harvest rows use lag=0; other rows use lag=1. Reported coefficient shifts are 10-95% on most features (`gdd_zscore_gstd` -16.65 → -32.12, +93%; `tavg_zscore_gstd` -42.21 → -57.54, +36%; `stress_score_lag1` -4.87 → +2.00, a sign flip). This affects every commodity's forecasts. Live default at `config.py:350` (`climo_lag_days: int = 1`); call site at `features/builders/climo.py:123`.	Add a regression test asserting that harvest-init training rows use lag=0 and other rows use lag=1; pin a coefficient-sign baseline on a fixture; assert `config.commodity.climo_lag_days >= 1` at config load (negatives are already guarded by `df7ea52f`, but untested). This is a material change and warrants a release-note callout.	[PLACEHOLDER: owner needed]	`config.py:350`, `features/builders/climo.py:123`, `domain-modelling/schema.yaml:219`, fixed in commit `c7041b68` (2026-04-17), guard added in `df7ea52f`, session A6
18	Leap-year off-by-one in legacy season-array slicer	Medium	Low	Pre-fix legacy code did `zarr[year=Y, dayofyear=start..366]` regardless of leap-year state. For non-leap years (e.g. 2022 Oct 1-Dec 31 = 92 days) the slice gave 93 positions; the legacy path silently let the season array be one slot longer than `season_length`. Caught by the unified prototype's shape assertion. The `_legacy` variant has been removed; all callers go through `features/builders/climo.py:34` (`_build_season_array`), `features/builders/ndvi.py:97`, or `features/builders/weather.py:55` (`build_season_array_from_daily_zarr`). `features/builders/weather_stress.py:29` imports the same shared engine.	Verify no consumer outside the unified path still calls a calendar-DOY zarr slice without checking `is_leap_year(Y)`. Add a property-based test (hypothesis) that for any season window `[start, end]` and year `Y`, the returned array has length `end - start + 1` regardless of leap-year state.	[PLACEHOLDER: owner needed]	Removed in `c7041b68`; current call sites in `features/builders/{weather,climo,ndvi,weather_stress}.py`, session A6
19	Silent config drift between TMI and QUBE-parity baseline	High	Medium	Three corn config knobs were silently misaligned with QUBE per A5: (a) `correction_shrinkage` defaulted to `1.0` vs QUBE `0.3` (3.3x larger weather corrections); (b) `season_doy_weather_weight` ramp commented out, effectively 1.0 everywhere vs QUBE's active ramp; combined effect ~45x larger weather weight in early season; (c) `weather_correction_fit_level` was at one point ADM2 vs QUBE ADM0. 2023/2024 symptom: opposite-sign weather corrections vs QUBE. Fix collapsed max wx-correction diff from 6.36 bu/ac (with sign flips) to 0.071 bu/ac. Live `configs/corn_usa.yaml:283` pins `weather_correction_fit_level: ADM0`; `season_doy_weather_weight:` is present at `:326` (block-mapping form, not exhaustively verified — could still be a no-op stub). `correction_shrinkage` no longer appears in any `configs/*.yaml` — [PLACEHOLDER: knob may have been renamed, absorbed into the per-row weight ramp, or deleted; lineage needs `git log -- configs/corn_usa.yaml`]. Cross-commodity coverage (soy, cotton, wheat) is unverified by this round.	Add a config-parity regression test pinning `weather_correction_fit_level=ADM0` and the active `season_doy_weather_weight` ramp against a snapshot YAML; trace what happened to `correction_shrinkage` and document the rename / removal in DESIGN.md; add an equivalent parity check for soy, cotton, wheat configs.	[PLACEHOLDER: owner needed]	`configs/corn_usa.yaml:283, :326`, `issues/20260415_tmi_qube_weather_correction_and_trend_alignment.md`, session A5 (`8f327031`)
20	1980 row preservation: fix landed but no test	Medium	Low	Pre-fix, `if not prior_mask.any(): continue` dropped 1980 entirely (972 county-rows); A5 attributed ~89% of the trend drift between TMI and QUBE to this. The detrender uses raw NASS yield, not lagged features, so 1980 was always usable. Live (verified): `features/builders/yields.py:185-199` emits 1980 with NaN lagged features (`yield_last`, `yield_avg_3`, `yield_avg_5` set to `np.full(n, np.nan)`) instead of skipping it. The comment at `:191-193` codifies the rationale.	Add a unit test asserting that 1980 rows survive `_compute_yield_features` with NaN lags rather than being dropped; assert that the panel's row count equals `n_geos × n_years`.	[PLACEHOLDER: owner needed]	`features/builders/yields.py:165, :185-199`, session A5
21	`union_fit_pred_for_production_ranking` sweeps unpopulated pred years	Medium	Medium	`stages/run_hindcast.py:135` calls `prod_panel = union_fit_pred_for_production_ranking(fit_data, pred_data)`, followed by `select_by_production(prod_panel, ..., max_year=max(config.experiment_protocol.test_years))` at `:139-145`. The helper is defined at `lib/geo/selection.py:10`. The union sweeps in the unpopulated prediction year, so `max_year` references a year whose production is all NaN, which biases the top-95% production ranking by ~13 counties. A2/A3's claim that the "primary worktree at `src/main.py:87-92` already passes `fit_data` only" does NOT hold in the live tree on `tl/bra-soy-update`, which still passes the union.	Filter out all-NaN-production years inside `union_fit_pred_for_production_ranking` before ranking, or change the call site to pass `fit_data` only and explicitly set `max_year=int(fit["year"].max())`; add a unit test on the helper that builds a union frame with a trailing all-NaN year and asserts the ranking is unchanged versus a fit-only frame.	[PLACEHOLDER: owner needed]	`stages/run_hindcast.py:135, :139-145`, `lib/geo/selection.py:10`, sessions A2, A3
22	Multi-worktree drift on shared files (process risk)	Medium	Medium	Sibling worktrees still exist on disk: `treefera-market-insights-commodity-hindcast/`, `treefera-market-insights-commodity-hindcast-minim-impl-model-update/`, `treefera-market-insights-corn-yield-productionisation-v2/`, `treefera-market-insights-mergediag/`, `treefera-market-insights-forecast-wt/`, `treefera-market-insights.wt-validation-reports/`. The drift pattern persists in principle: any fix landed in one worktree but not the others silently keeps the bug. A2 also documented a concurrent "TREND_AXIS refactor" session editing the same files mid-orchestration; the TrendAxis machinery now lives at `models/detrend/time_axis.py:12`, so the axis refactor did land — but cross-worktree consistency is unverified.	Consolidate to a single worktree; if multiple worktrees are required, document them and add a CI check that diffs critical files (`config.py`, `models/detrend/partial_pooling_detrend.py`, `stages/run_hindcast.py`, `stages/run_fit.py`) across worktrees and fails the build on divergence outside an explicit allowlist.	[PLACEHOLDER: owner needed]	parallel copies of `stages/run_.py`, `models/detrend/.py`, `config.py` across worktrees, sessions A2, A3
23	Test coverage gap at tier-1 ADR surfaces (walk-forward driver, conformal modes)	Medium	High	The test suite at `tests/unit/commodity_hindcast/` (83 `.py` files) and `tests/integration/commodity_hindcast/` exists and is healthy in aggregate, but two tier-1 surfaces are unexercised: (1) Walk-forward driver (ADR-001) — `run_walk_forward` (`run/runner.py:27`) and `_predict_fold_rolling` (`run/runner.py:86`) have zero direct test coverage (`grep -rn "run_walk_forward\\|_predict_fold_rolling" tests/` returns no matches). These are the rolling-fold entry points the hindcast pipeline drives through, so a regression here would only surface in end-to-end runs. (2) Conformal residual modes (ADR-002) — only 2 of the 4 supported modes appear in any test; `out_of_sample_per_year` and `hindcast_oos_pooled` are completely uncovered. ADR-003 (`validate_residual_mode`) IS covered by `tests/integration/commodity_hindcast/test_forecast_residual_mode_validation.py`.	Add a unit test for `run_walk_forward` over a small synthetic panel asserting fold-by-fold prediction shape and that `_predict_fold_rolling` advances training years monotonically. Parametrise the existing conformal test over all four `ResidualMode` values so `out_of_sample_per_year` and `hindcast_oos_pooled` are exercised. Cross-link to R15 (EB shrinkage path needs its own regression test) and R16 (delivery-clip helper needs a test).	[PLACEHOLDER: owner needed]	`run/runner.py:27, :86`, `tests/unit/commodity_hindcast/test_apply_conformal_experiment.py`, `tests/unit/commodity_hindcast/test_postprocess.py`, ADR-001/002/003 cross-check, verification 2026-05-08
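
For R1's quick-unblock track, the pinning regression test could look like the sketch below. It assumes `training_dropna_subset` takes the commodity config and reads `target_col`, `target_detrended_col` and `feature_cols` from it; the real signature at `config.py:812-822` is not reproduced in this register, so `CommodityStub`, its column names and the stand-in function are hypothetical, and a real test would import the production function instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommodityStub:
    """Hypothetical stand-in for the commodity section of the real config."""

    target_col: str = "yield_kg_ha"  # hypothetical column names
    target_detrended_col: str = "yield_detrended_kg_ha"
    feature_cols: tuple[str, ...] = ("edd_zscore", "gdd_zscore_gstd")


def training_dropna_subset(com: CommodityStub) -> list[str]:
    """Stand-in mirroring the live behaviour: target and detrended target only."""
    return [com.target_col, com.target_detrended_col]


def test_training_dropna_subset_is_target_pair_only() -> None:
    com = CommodityStub()
    subset = training_dropna_subset(com)
    # Exact equality, not a superset check: any widening must fail this test.
    assert subset == [com.target_col, com.target_detrended_col]
    # Guard the historical failure mode: no feature column may be splatted back in.
    assert not set(subset) & set(com.feature_cols)
```

Only the import changes in the real test; exact equality (rather than a subset check) is what makes any future widening of the dropna subset fail loudly.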
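
A minimal sketch of the R3 mitigation, assuming `cloudpathlib` is the source of `CloudPath`/`S3Path` (it is imported optionally here so the sketch also runs without it). The helper names `is_remote`, `as_uri_string`, `local_sink_path` and `sqlite_url` are hypothetical. `tempfile.gettempdir()` resolves to `/tmp` on a default Linux box but honours `TMPDIR`, so the run-scoped directory is `/tmp/<run-id>/` only by default.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse

try:  # optional: the sketch falls back to URI-scheme detection without cloudpathlib
    from cloudpathlib import CloudPath
except ImportError:
    CloudPath = None


def is_remote(path: object) -> bool:
    """True for cloudpathlib objects and for any URI with a non-file scheme (s3://, gs://)."""
    if CloudPath is not None and isinstance(path, CloudPath):
        return True
    return urlparse(str(path)).scheme not in ("", "file")


def as_uri_string(path: object) -> str:
    """Use str(), never .as_posix(): S3Path has no as_posix()."""
    return str(path)


def local_sink_path(run_id: str, filename: str) -> Path:
    """Local-only sinks (sqlite DBs, FileLock files) never live under data_root."""
    sink_dir = Path(tempfile.gettempdir()) / run_id
    sink_dir.mkdir(parents=True, exist_ok=True)
    return sink_dir / filename


def sqlite_url(run_id: str, filename: str) -> str:
    db_path = local_sink_path(run_id, filename)
    if is_remote(db_path):  # defensive: impossible by construction above
        raise ValueError(f"refusing remote sqlite path: {db_path}")
    return f"sqlite:///{db_path}"
```

Prefixing an absolute path with `sqlite:///` yields SQLAlchemy's four-slash absolute form (`sqlite:////tmp/...`). The parametrised tests the mitigation calls for would feed both a `Path` and an `S3Path` as `data_root` and assert that no sink URL ever contains `s3://`.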
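
For R8, a static import-direction check is one cheap way to stop new upward imports while the shared helpers move into `lib/`. This sketch uses the standard-library `ast` module and assumes the stage layer is a package literally named `stages`; the two offending `delivery/` modules are not named in this register, so the example checks source strings rather than real files, and `upward_imports` is a hypothetical name.

```python
import ast


def upward_imports(source: str, forbidden_layer: str = "stages") -> list[str]:
    """Dotted names imported from `forbidden_layer` anywhere in one module's source."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for `from .. import x`; join each imported name
            # onto the module so that form is caught as well.
            base = node.module or ""
            names = [f"{base}.{alias.name}".lstrip(".") for alias in node.names]
        else:
            continue
        hits.extend(name for name in names if forbidden_layer in name.split("."))
    return hits
```

A CI job could run `upward_imports(path.read_text())` over every `delivery/*.py` file and fail on any non-empty result outside a shrinking allowlist.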
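
R9's fill rule, sketched in pandas to make the trend-only fallback concrete. It assumes "trailing 3 years" means the last three years the zarr covers and that the panel has `geo_identifier` and `year` columns; `fill_beyond_coverage` is a hypothetical name, and unlike the real stub it emits none of the three `WARNING` lines.

```python
import pandas as pd


def fill_beyond_coverage(
    panel: pd.DataFrame,
    feature_cols: list[str],
    last_covered_year: int,
    n_trailing: int = 3,
) -> pd.DataFrame:
    """Fill features for years after the zarr horizon with each county's trailing median."""
    out = panel.copy()
    # Inclusive window: the last `n_trailing` years the zarr actually covers.
    in_window = out["year"].between(last_covered_year - n_trailing + 1, last_covered_year)
    medians = out.loc[in_window].groupby("geo_identifier")[feature_cols].median()
    beyond = out["year"] > last_covered_year
    for col in feature_cols:
        fill = out.loc[beyond, "geo_identifier"].map(medians[col])
        # Only missing values are filled; anything already present is kept.
        out.loc[beyond, col] = out.loc[beyond, col].fillna(fill)
    return out
```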
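
A sketch of R10's per-fold sentinels using only the standard library. The `fold_<year>/_DONE.json` layout and both function names are hypothetical, and schema validation of fold artefacts is reduced here to the presence of a sentinel that is written last and atomically (temp file, then `Path.replace`).

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def fold_sentinel(run_dir: Path, test_year: int) -> Path:
    """Hypothetical sentinel location for one walk-forward fold."""
    return run_dir / f"fold_{test_year}" / "_DONE.json"


def run_folds(
    run_dir: Path,
    test_years: Iterable[int],
    fit_and_predict: Callable[[int], dict],
) -> list[int]:
    """Run each fold at most once; return the test years computed in this call."""
    computed: list[int] = []
    for year in test_years:
        sentinel = fold_sentinel(run_dir, year)
        if sentinel.exists():
            continue  # resume: this fold finished in an earlier, interrupted run
        summary = fit_and_predict(year)
        sentinel.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp name, then rename: a crash mid-write leaves no sentinel.
        tmp = sentinel.with_suffix(".tmp")
        tmp.write_text(json.dumps(summary))
        tmp.replace(sentinel)
        computed.append(year)
    return computed
```

A real resume path would also validate the fold's artefacts against their schema before trusting the sentinel, as the mitigation requires.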
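
To make R15's floor arithmetic concrete: the sketch assumes the usual normal-normal empirical-Bayes weight λ = τ²/(τ² + SE²) on the county's own slope (λ = 0 is the pure national prior), which matches the λ ≈ 1e-6/SE² quoted from the source comment when τ² sits at the QUBE floor. The function names are hypothetical, and estimating τ² itself is out of scope.

```python
TMI_TAU2_FLOOR = 1e-30
QUBE_TAU2_FLOOR = 1e-6


def eb_lambda(tau2_hat: float, se2: float, floor: float) -> float:
    """Weight on the county's own slope: tau2 / (tau2 + se2), with tau2 floored.

    lambda = 1 keeps the county slope; lambda = 0 is the pure national prior.
    """
    tau2 = max(tau2_hat, floor)
    return tau2 / (tau2 + se2)


def shrink_slope(county_slope: float, national_slope: float, lam: float) -> float:
    """Convex combination of the county slope and the national prior."""
    return lam * county_slope + (1.0 - lam) * national_slope
```

With τ² ≈ 0.44, as on the full production panel, both floors give the same λ, which is why the gap only shows on small or degenerate panels.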
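
A stand-in for the R16 unit test. It re-implements the described contract of `clip_yield_to_delivery_range(df, yield_range, value_cols, log_tag=...)` in pandas so the test shape runs on its own; the body, the `delivery` logger name and the log message format are assumptions, and the real test should import the helper from `lib/unit_utils.py` and assert on its actual log line.

```python
import logging

import pandas as pd

logger = logging.getLogger("delivery")  # hypothetical logger name


def clip_yield_to_delivery_range(
    df: pd.DataFrame,
    yield_range: tuple[float, float],
    value_cols: list[str],
    log_tag: str = "",
) -> pd.DataFrame:
    """Stand-in: clip each value column to yield_range and warn when rows were clipped."""
    lo, hi = yield_range
    out = df.copy()
    for col in value_cols:
        # NaN compares False on both sides, so missing values are neither counted nor changed.
        n_out = int(((out[col] < lo) | (out[col] > hi)).sum())
        if n_out:
            logger.warning("%s clipped %d rows of %s to [%s, %s]", log_tag, n_out, col, lo, hi)
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out
```

The wheat case shows the gap the row describes: a 94 bu/ac county mean passes a `[0.0, 260.0]` clip unchanged, so upstream feature-quality assertions are still needed.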
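
The lag rule that R17's regression test should pin, as a sketch. The window end `sdoy - lag` is inferred from the worked example (sdoy=184 with a 30-day lag gives `[1..154]`); the real computation at `features/builders/climo.py:123` may differ, and `climo_window` and `validate_climo_lag_days` are hypothetical names.

```python
def climo_window(sdoy: int, is_harvest_row: bool, climo_lag_days: int) -> tuple[int, int]:
    """Inclusive season-DOY window [1, end] of weather visible to one training row."""
    # Option B: harvest-init rows see the full season; other rows honour the lag.
    lag = 0 if is_harvest_row else climo_lag_days
    return (1, sdoy - lag)


def validate_climo_lag_days(value: int) -> int:
    """Config-load guard: the non-harvest lag must be at least one day."""
    if value < 1:
        raise ValueError(f"climo_lag_days must be >= 1, got {value}")
    return value
```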
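
For R18, a stand-in for the hypothesis property test, written as an exhaustive loop so it runs without `hypothesis` installed. `season_slice` is a hypothetical calendar-DOY slicer, not the shared engine in `features/builders/weather.py`; `legacy_slice` reproduces the removed bug for comparison.

```python
import calendar
from typing import Optional


def days_in_year(year: int) -> int:
    return 366 if calendar.isleap(year) else 365


def legacy_slice(year: int, start_doy: int) -> range:
    """The removed bug: `year` is ignored and the slice always runs through DOY 366."""
    return range(start_doy, 366 + 1)


def season_slice(year: int, start_doy: int, end_doy: Optional[int] = None) -> range:
    """Calendar-DOY positions for [start_doy, end_doy]; end defaults to the year's last day."""
    last = days_in_year(year)
    end = last if end_doy is None else end_doy
    if not 1 <= start_doy <= end <= last:
        raise ValueError(f"window [{start_doy}, {end}] is invalid for {year} ({last} days)")
    return range(start_doy, end + 1)


def check_length_property(years: range) -> None:
    """Exhaustive stand-in for the property: length == end - start + 1 in every year."""
    for year in years:
        last = days_in_year(year)
        for start in range(1, last + 1):
            for end in (start, (start + last) // 2, last):
                assert len(season_slice(year, start, end)) == end - start + 1
            assert len(season_slice(year, start)) == last - start + 1
```

With `hypothesis` available, the same assertion can be driven by `@given` over years and window bounds instead of the nested loops.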
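
A pandas sketch of R20's behaviour and test: lagged yield features are computed per geo, and the first year keeps its row with NaN lags. Feature names follow the register (`yield_last`, `yield_avg_3`), but the function name, the `geo_identifier`/`year`/`yield` panel schema and `min_periods=1` on the rolling mean are assumptions; `yield_avg_5` would follow the same pattern.

```python
import pandas as pd


def add_lagged_yield_features(panel: pd.DataFrame) -> pd.DataFrame:
    """Add lagged yield features per geo WITHOUT dropping years that lack history.

    The first year of each geo (e.g. 1980) keeps its row with NaN lags instead of
    being skipped, because the detrender only needs the raw yield column.
    """
    out = panel.sort_values(["geo_identifier", "year"])
    prior = out.groupby("geo_identifier")["yield"].shift(1)
    out["yield_last"] = prior
    # min_periods=1 is an assumption; the real builder may require a full window.
    out["yield_avg_3"] = prior.groupby(out["geo_identifier"]).transform(
        lambda s: s.rolling(3, min_periods=1).mean()
    )
    return out
```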
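
A sketch of the R21 fix and its test in pandas. `top_share_geos` is a hypothetical stand-in for `select_by_production` that ranks geos by production in a single year and keeps them until 95% of the total is covered; the real ranking rule may differ, but the test shape (union with a trailing all-NaN year versus fit-only) carries over.

```python
import pandas as pd


def drop_unpopulated_years(panel: pd.DataFrame, value_col: str = "production") -> pd.DataFrame:
    """Remove years in which every `value_col` entry is NaN (e.g. the pred year)."""
    non_null = panel.groupby("year")[value_col].count()  # count() skips NaN
    populated_years = non_null[non_null > 0].index
    return panel[panel["year"].isin(populated_years)]


def top_share_geos(
    panel: pd.DataFrame, year: int, share: float = 0.95, value_col: str = "production"
) -> list[str]:
    """Geos covering `share` of production in `year`, largest first (crossing geo included)."""
    prod = panel.loc[panel["year"] == year].set_index("geo_identifier")[value_col]
    prod = prod.sort_values(ascending=False)
    share_before = prod.cumsum().shift(fill_value=0.0) / prod.sum()
    return share_before[share_before < share].index.tolist()
```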
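
A standard-library sketch of R22's cross-worktree check: hash each critical file in every worktree and report the ones whose digests disagree. Paths are assumed to be relative to each worktree's package root, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> Optional[str]:
    """SHA-256 of a file, or None if it is missing in this worktree."""
    return hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None


def divergent_files(
    worktrees: list[Path],
    critical: list[str],
    allowlist: frozenset = frozenset(),
) -> dict[str, dict[str, Optional[str]]]:
    """Map each non-allowlisted critical path that differs across worktrees to its digests."""
    report: dict[str, dict[str, Optional[str]]] = {}
    for rel in critical:
        if rel in allowlist:
            continue
        digests = {str(wt): file_digest(wt / rel) for wt in worktrees}
        if len(set(digests.values())) > 1:
            report[rel] = digests
    return report
```

In CI, `sys.exit(1 if divergent_files(worktrees, critical, allowlist) else 0)` turns the report into a build failure; a file missing from one worktree counts as divergence because its digest is `None`.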
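
For R23, the monotonic-advance assertions can be written against any fold splitter. `walk_forward_folds` is a hypothetical stand-in, not `run_walk_forward` or `_predict_fold_rolling`, and it supports both an expanding and a fixed-length rolling window because this register does not say which one the driver uses.

```python
from typing import Iterator, Optional

# With pytest, the conformal test can cover every ResidualMode value, e.g.
#   @pytest.mark.parametrize("mode", typing.get_args(ResidualMode))


def walk_forward_folds(
    years: list[int], min_train_years: int, window: Optional[int] = None
) -> Iterator[tuple[list[int], int]]:
    """Hypothetical splitter: train on the years before each test year.

    window=None gives an expanding window; an int keeps only the last `window` years.
    """
    ordered = sorted(years)
    for i in range(min_train_years, len(ordered)):
        start = 0 if window is None else max(0, i - window)
        yield ordered[start:i], ordered[i]


def check_folds_advance(folds: list[tuple[list[int], int]]) -> None:
    """Assert training windows move forward and never include the test year."""
    for (prev_train, prev_test), (train, test) in zip(folds, folds[1:]):
        assert train[-1] > prev_train[-1]  # training window moves forward
        assert train[0] >= prev_train[0]  # and its start never moves back
        assert test > prev_test
    for train, test in folds:
        assert max(train) < test  # no test-year leakage into training
```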

Risks deferred

Code-style feedback items (feedback_fstrings.md, feedback_no_backwards_compat.md, feedback_no_claude_attribution.md) — these are review-time conventions, not production risks; covered in the contributor guide.
feedback_qa_leave_conab_columns.md — agent-behaviour guidance for QA reports, not a pipeline risk.
DESIGN.md "TODO: need to define the Delivery module job" (DESIGN.md:110) — editorial gap rather than runtime risk; rolled into wiki backlog (Risk 13).
Custom exception hierarchy / marketing_year collapse / forecast.md See Also (DOMAIN_MODEL2.md §9 Open Questions) — design open questions with no current incident pressure.
delivery/conversions.py aliases obs-yield to nass_actual regardless of geography (PR-360 follow-up) — values correct, label cosmetic; tracked there.
WASDE/commodity_ prefix path drift fixed by PR-361 — historical, no longer a live risk.

Fixed in flight

These items were diagnosed AND fixed within their session; the underlying files have since been restructured, but the design intent survives. Citations have been re-anchored where possible. The items are kept here for institutional memory and are CLOSED, not open risks.

gen_report.py silent unit double-labelling (A4 NEW-1) — convert_metrics_to_bu_acre converted 7+ metric columns (including selection_bias_kg_ha, mae, rmse) to bu/ac while leaving the _kg_ha suffix in place. Fixed in commit 6f7132cf, which now appends _bu_ac. The file gen_report.py no longer exists anywhere in the tree; DESIGN.md:117 still references gen_report.py:convert_metrics_to_bu_acre as the canonical converter, but the implementation has moved. [PLACEHOLDER: locate the current renamer in the post-restructure layout.] Only a transition risk remains: any consumer that joins metrics_table.csv on the old column names will silently miss columns after the rename.
included_geos defaulting to single test-fold (A4 NEW-2) — Pre-fix eval.py:173 built included_geos from test_data["geo_identifier"] (one fold's split). Fixed by 6f7132cf (derive from fit_data_full) and reinforced by 2b5545fa (required kwarg, no fallback). The file eval.py no longer exists; the contract survives at DESIGN.md:114 ("required keyword argument", "no default", "no fallback"). Runtime enforcement path needs re-verification post-restructure to confirm the kwarg is still threaded through evaluate_model → gen_metrics → estimate_walk_forward_selection_bias_kg_ha → compute_selection_bias_for_year_kg_ha.
DESIGN.md unit-discipline contract (A4 NEW-3) — kg/ha as canonical internal unit; included_geo_identifiers as the single required parameter name. Verified at DESIGN.md:114-117.
PcaRidgeRegressor national-mode fillna(0) made explicit (A2 Risk 8) — class lives at models/regression/pca_ridge_regressor.py:65. [PLACEHOLDER: re-anchor "fillna(0) made explicit" comment to specific line.]
Imputer re-export shim removed and extract_sample_weight inlined (A5 finding 12) — imputation utilities live at lib/edit_and_imputation/imputation.py; partition_groups_by_valid_obs at :146 is consumed directly by models/detrend/partial_pooling_detrend.py:25. Re-export shim absence is consistent with the "no backwards compatibility patterns" rule.
TMI PartialPoolingDetrend and PcaRidgeRegressor proven QUBE-equivalent (A5 finding 7) — historical equivalence claim. No current code change required.
aggregate_weighted_frame() ↔ QUBE _aggregate_national() cross-test bit-for-bit identical (A5 finding 8) — historical, settled.
QUBE stale feature cache identified (A5 finding 9) — TMI is correct; QUBE has the data bug. Documented as a known TMI-vs-QUBE divergence.
QUBE MultiStageEstimator.predict() silent county drop (A5 finding 10) — TMI is arguably more correct; documented divergence.
Wheat dim-order crash (A6 N1) — pre-fix ds[var_name].values[:, mask] assumed (geoid, time) but conus_adm2_wheat.zarr is (time, geoid). Fix verified live at features/builders/weather.py:75-76: var_da = var_da.transpose(geo_id_col, time_dim); raw = var_da.values[:, mask]. Transpose-to-canonical in place; landed in c7041b68.
QUBE wheat climo silently wrong (A6 N3) — gstd.sdoy_start = 91 was reinterpreted by QUBE as calendar DOY 91 (April 1 in a non-leap year), missing the autumn vegetative phase entirely (74% null rate, z-score std of 7, 94 bu/ac county mean). Fixed by unification in c7041b68 (build_climo over season-DOY for all commodities); the redundant weather_builder config field was deleted. Wheat now uses the same season-DOY path as corn and soy.
Redundant july_*_county features (A6 N10) — july_edd_county / july_precip_county removed. Live configs use edd_jul/precip_jul only.
Imputer audit confirmed no yield imputation exists (A5 finding 5) — partial_pooling_detrend.py:233, 268 use partition_groups_by_valid_obs (row filter, not impute). The Imputer plumbing fills only the trend line; all regressors enforce nan_policy: raise (verified at models/regression/pca_ridge_regressor.py:79, 97, 114, 138). [PLACEHOLDER: _assert_no_raw_yield_in_features claimed as guardrail in runtime.py could not be located via grep; the guard may live elsewhere, have been renamed, or been removed entirely.]
edd_zscore_apr_jul zero-fill discussion (A2 Risk 2) — historical design discussion.
XGBoost native-NaN handling routed through median imputer (A2 Risk 10) — historical design trade-off.
regression_params: dict[str, Any] typed-schema gap (A2 Risk 9) — partial protection only; full mitigation out of scope.
Pre-commit hook silently rewrote files (A2 Risk 7) — process lesson.
Stop-hook E902 ruff cwd issue (A6 N11) — ~/.claude/hooks/lint-check.py runs ruff from the session cwd, not from the git toplevel. Patch sketched in A6 but not applied as of the verification round. Developer-experience drag, not a code regression.

Heatmap

Risk severity × likelihood heatmap

The current PNG renders only R1..R14 from the original register; rows R15..R23 are not yet plotted. [NOTE: heatmap re-render required after this merge to include R15..R23.]
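
A minimal sketch for the re-render: severity and likelihood for all 23 rows are transcribed from the Risks table into a count grid. How the existing PNG was drawn is not recorded here, so plotting is left out; the grid can be passed to any heatmap routine, for example matplotlib's `imshow`.

```python
from collections import Counter

SEVERITIES = ["Critical", "High", "Medium", "Low"]  # heatmap rows, top to bottom
LIKELIHOODS = ["Low", "Medium", "High"]  # heatmap columns, left to right

# (risk id, severity, likelihood), transcribed from the Risks table.
REGISTER = [
    (1, "Critical", "High"), (2, "High", "High"), (3, "High", "High"),
    (4, "High", "Medium"), (5, "High", "Low"), (6, "High", "Medium"),
    (7, "High", "Medium"), (8, "Medium", "Medium"), (9, "Medium", "Medium"),
    (10, "Medium", "Low"), (11, "Medium", "Medium"), (12, "Medium", "High"),
    (13, "Medium", "High"), (14, "Low", "Low"), (15, "High", "Medium"),
    (16, "Medium", "Medium"), (17, "High", "Low"), (18, "Medium", "Low"),
    (19, "High", "Medium"), (20, "Medium", "Low"), (21, "Medium", "Medium"),
    (22, "Medium", "Medium"), (23, "Medium", "High"),
]


def heatmap_counts(rows: list[tuple[int, str, str]]) -> list[list[int]]:
    """Number of risks in each severity x likelihood cell."""
    counts = Counter((severity, likelihood) for _, severity, likelihood in rows)
    return [[counts[(sev, lik)] for lik in LIKELIHOODS] for sev in SEVERITIES]
```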