Risk register — commodity_hindcast¶
Last updated: 2026-05-08
Total risks tracked: 23 (R1..R14 from the original register; R15..R23 merged in from risks_addendum.md on 2026-05-08; 7 refinements applied to existing rows; new "Fixed in flight" section added with 18 historical lessons)
Severity legend¶
- Critical: would corrupt a production delivery silently OR block a release
- High: likely to cause an incident / needs intervention this quarter
- Medium: known footgun; works around exist
- Low: minor / cosmetic
Risks¶
| # | Title | Severity | Likelihood | Description | Mitigation | Owner | Sources |
|---|---|---|---|---|---|---|---|
| 1 | TMI selection bias correction wrong sign | Critical | High | 2020 corn flips from QUBE +2.23 to TMI -8.31 bu/ac under an identical SBC formula — wrong sign with ~3.7x different magnitude (NOT the inherited "~17x smaller" figure, which came from a now-deleted gen_report.py::convert_metrics_to_bu_acre re-unitting selection_bias_kg_ha to bu/ac without renaming). The historical "weather features drop core counties via global dropna" mechanism is no longer dominant: training_dropna_subset at config.py:812-822 now returns ONLY [com.target_col, com.target_detrended_col] and no longer splats commodity.feature_cols. Root-cause status is therefore REOPEN — dropna-on-target alone cannot explain a 287-vs-972 county collapse. |
Two-track mitigation. (B) Quick unblock — make dropna column-aware per consuming estimator: largely landed at config.py:812-822 (subset narrowed to target/detrended target only); add a regression test that pins training_dropna_subset returns exactly that pair so a future widening cannot re-introduce the bug. (A) Root-cause — fix the weather feature builder so EDD z-scores cover all 935 top-95% counties; re-investigate the 287-vs-972 collapse now that dropna-splat is ruled out (1980 component fixed at features/builders/yields.py:185-199; Wayne County NC climo issue [PLACEHOLDER: not located in current tree]). Live dropna call sites: stages/run_fit.py:94 and run/experiment_protocol.py:54. Switch evaluation to the full prediction universe; the included_geo_identifiers contract is codified at DESIGN.md:114. |
[PLACEHOLDER: owner needed] | MEMORY.md/project_sbc_tmi_bug.md, config.py:812-822, stages/run_fit.py:94, run/experiment_protocol.py:54, features/builders/yields.py:185-199, DESIGN.md:114, sessions A2-A5 |
| 2 | DESIGN.md vs code drift on postprocessed/national.parquet |
High | High | DESIGN.md clause specifies postprocessed/{experiment_key}_national.parquet but code writes postprocessed/national.parquet (no infix). Any consumer that reads DESIGN.md as a code contract will fail, and the wiki pipelines/postprocess.md documents the discrepancy. |
Either update DESIGN.md:73 to drop the {experiment_key}_ infix, or update the writer to include it. Code path: lib/results/results_slice.py:368, lib/results/run_result.py:155. |
[PLACEHOLDER: owner needed] | DESIGN.md:73, lib/results/results_slice.py:368, lib/results/run_result.py:155, LINT_REPORT.md§<section> |
| 3 | S3 path anchoring on local-only sinks | High | High | INPUT_DATA_DIR is s3://... in QA; code that calls .as_posix(), pathlib.Path(AnyPath(...)), or anchors sqlite/lockfiles under data_root crashes (AttributeError: 'S3Path' object has no attribute 'as_posix') or produces sqlite:///s3://.... Recurring class — last incident f66f4ac9 (2026-04-29) broke wheat/corn/soy/cotton in QA. |
Branch on isinstance(x, CloudPath) for any URI sink; use str(path) not .as_posix(); route local-only sinks (sqlite, FileLock) to /tmp/<run-id>/...; use AnyPathParam (PR-345) for click args; add parametrised tests covering both Path and S3Path anchors. |
[PLACEHOLDER: owner needed] | MEMORY.md/feedback_s3_path_anchoring.md, wiki/commodity_hindcast/sources/prs/PR-345.md, commit f66f4ac9, fix 13117727 |
| 4 | MLflow SQLite locking on parallel same-commodity runs | High | Medium | Running two pipelines for the same commodity in parallel hits SQLAlchemy OperationalError because two writers contend for the same MLflow SQLite DB. Blocks concurrent backfills. |
Run same-commodity pipelines sequentially; longer term, move MLflow tracking to a server backend (Postgres/MySQL) or per-run isolated DB files. | [PLACEHOLDER: owner needed] | MEMORY.md Known Issues section, mlruns.db artefacts in repo root |
| 5 | _OOS_MODES frozenset hand-maintained vs ResidualMode Literal |
High | Low | stages/run_forecast.py:82 keeps a hand-maintained frozenset of OOS modes used by validate_residual_mode. Any new value added to ResidualMode in models/meta_models/types.py that is not also added to _OOS_MODES would silently allow a non-OOS mode through the validation gate when no conformal sidecar exists. |
Replace frozenset with set(get_args(ResidualMode)) - {"in_sample_pooled"}; add a unit test that pins the derivation. |
[PLACEHOLDER: owner needed] | LINT_REPORT.md§<section>, stages/run_forecast.py:82, models/meta_models/types.py, wiki/commodity_hindcast/sources/prs/PR-372.md |
| 6 | Wheat sub-types listed in config but not produced | High | Medium | Wheat preprocessor only emits crop_type WHEAT. The 2026-04-17 unified-builder commit c7041b68 ("refactor(commodity_hindcast/features): unify feature builders on one season-DOY path") consolidated all wheat configs onto a single season-DOY path; WINTER_WHEAT survives only as a NASS crop_type filter at configs/wheat_usa.yaml:214, while SPRING_DURUM_WHEAT and SPRING_EXCL_DURUM_WHEAT are not represented in current configs. Sub-type reintroduction would now require its own season_start_doy/freeze_cap_sdoy/season_length triple per sub-type and is untested under the unified path. |
Either implement the sub-type split in the wheat preprocessor (with the per-sub-type season-DOY triples) or remove the unsupported entries from configs and document WHEAT as the only produced type. Treat sub-type reintroduction as orthogonal to unification. |
[PLACEHOLDER: owner needed] | MEMORY.md Known Issues — Wheat sub-types, commit c7041b68, configs/wheat_usa.yaml:214, session A6 |
| 7 | Sugar / non-US preprocessing missing geometries | High | Medium | Default CROP_YIELD_GEOBOUNDARIES_FILE=/data/processing/yield_forecast/raw/boundaries/geometry.parquet is US-only and has no IND/GHA/BRA rows; non-US runs need the 52 k-row all_geoboundaries_processed.parquet. Wrong env var ⇒ silently empty geometry joins. |
Document the override; add a config-time check that asserts geography rows exist for the configured commodity; default Brazil/India/Ghana configs should set the path. | [PLACEHOLDER: owner needed] | MEMORY.md Known Issues — Sugar/non-US preprocessing |
| 8 | Two upward-import layering violations in delivery/ |
Medium | Medium | Two modules in delivery/ import from the stages/ layer rather than from the shared lib/ layer, breaking the single-direction import DAG (DESIGN.md Clause 19). Risks reintroducing the cycle that PR-339 spent nine phases to break. |
Move shared helpers (conformal apply, unit conversion) into lib/; have both delivery/ and stages/run_meta_models.py import from there. Tracked in BOUNDED_CONTEXTS.md. |
[PLACEHOLDER: owner needed] | LINT_REPORT.md§<section>, wiki/commodity_hindcast/sources/prs/PR-339.md, wiki/commodity_hindcast/sources/prs/PR-353.md open question §9 |
| 9 | Long-range climo stub silently retrendless beyond zarr | Medium | Medium | forecast_long_range_stub.py (PR-369) fills missing z-score features for season_year beyond the materialised climo zarr's coverage with per-county trailing-3-year medians. Long-range forecasts collapse to trend-only output by design (season_doy_weather_weight = 0); this is correct only as long as the schedule is honoured. The filename is deliberately temporary — removal blocked on extending the climo zarr. |
Extend the materialised climo zarr horizon; remove the stub when no caller imports from it. Until then, keep the three WARNING log lines and the docstring removal criteria visible. Use materialise_for_forecast(...) (centralised) rather than ad-hoc DOY collapse. |
[PLACEHOLDER: owner needed] | wiki/commodity_hindcast/sources/prs/PR-369.md, features/forecast_long_range_stub.py, MEMORY.md/feedback_centralised_climo.md |
| 10 | Forecast walk-forward has no per-fold checkpoint | Medium | Low | Walk-forward CV restart re-does all earlier folds; an interrupted multi-year hindcast wastes hours of compute. Tracked as a §9 Open Question in DOMAIN_MODEL2.md. | Add per-fold completion sentinels under run_dir/; resume by skipping folds whose artefacts already exist and pass schema validation. |
[PLACEHOLDER: owner needed] | wiki/.../sources/prs/PR-353.md §9 Open Questions, DOMAIN_MODEL2.md §9 |
| 11 | Streamlit dashboard regressions invisible until manual launch | Medium | Medium | The app/ package has no automated test that exercises streamlit run startup. PR-360's reference-data refactor silently broke dashboard imports until PR-363 patched three independent bugs by hand. The required PYTHONPATH + two env vars are also fragile — streamlit run bypasses uv's package discovery. |
Add a smoke test that imports app/app.py under the same PYTHONPATH shape Streamlit uses; document the launch incantation in the runbook. |
[PLACEHOLDER: owner needed] | wiki/commodity_hindcast/sources/prs/PR-363.md, wiki/commodity_hindcast/sources/prs/PR-360.md, MEMORY.md/project_streamlit_app_launch.md |
| 12 | QA RDS unreachable from dev EC2 forces local_data symlink | Medium | High | Fresh worktrees that run dev_tools/run_model_local.py time out fetching geometry from QA Postgres because the SG/route blocks the dev box. The script only hits the DB when local_data/<model_id>/geometry.parquet is missing — every worktree needs the symlink ritual. |
Mirror the cached geometry.parquet files into a documented shared location; or open the SG route from dev EC2; or make the script fail fast with the symlink instruction. |
[PLACEHOLDER: owner needed] | MEMORY.md/project_local_data_symlink.md |
| 13 | Wiki broken-link backlog (84 broken refs across 47 files) | Medium | High | Lint pass found 84 broken relative links and 6 high-value missing pages (notably pipelines/hindcast.md, referenced by 19 entity pages). New onboarding engineers will hit dead-ends across the entity catalogue. |
Write pipelines/hindcast.md first (highest fan-in); fix PR-345/361/369/372 cross-ref paths; backfill concepts/experiment_protocol.md and concepts/season_doy.md (17 and 25 inbound refs respectively). |
[PLACEHOLDER: owner needed] | LINT_REPORT.md§Broken Links, LINT_REPORT.md§Recommended Next Pass |
| 14 | entities/EditRule.md vs entities/EditRuleConfig.md naming ambiguity |
Low | Low | PR-353.md references EditRule.md while only EditRuleConfig.md exists. May be one missing page (the rule operation, distinct from the rule config) or a stale name. |
Decide whether EditRule (operation) is a separate entity from EditRuleConfig (declarative config); write the missing page or update the cross-ref. |
[PLACEHOLDER: owner needed] | LINT_REPORT.md§<section>, wiki/commodity_hindcast/sources/prs/PR-353.md |
| 15 | tau2 floor 24 orders of magnitude tighter than QUBE | High | Medium | TMI sets tau2 = max(..., 1e-30) in partial-pooling EB shrinkage; the in-file comment at models/detrend/partial_pooling_detrend.py:326-330 explicitly notes QUBE uses 1e-6 and that the TMI floor produces λ ≈ 0 (pure national prior) where QUBE produces λ ≈ 1e-6/SE². On full production panels (~927 counties) τ² ≈ 0.44 so the floor is never active, but on small/synthetic/degenerate-slope panels the TMI floor is effectively zero and shrinkage collapses to a single county's slope. The risk is acknowledged in the source comment but the floor has not been raised. |
Raise floor to 1e-6 to match QUBE; add a regression test that fits an EB shrinkage on a degenerate slope panel and asserts the shrinkage does not collapse to a single county. Pin a synthetic-panel snapshot of the resulting eb_lambdas so the QUBE-equivalent regime is testable in CI. |
[PLACEHOLDER: owner needed] | models/detrend/partial_pooling_detrend.py:334 (floor), :326-330 (comment), :138, :465, :493 (self._tau2), session A2 (3804f325) |
| 16 | Per-commodity yield-range bounds are the only delivery-side defensive net | Medium | Medium | config.py:324 defines yield_range: tuple[float, float] on CommodityConfig ("A new commodity MUST declare its own range; this crashes early at config load if omitted"). The canonical clip helper clip_yield_to_delivery_range(df, yield_range, value_cols, log_tag=...) lives at lib/unit_utils.py:93-130 and is called once at the delivery boundary by delivery/conversions.py:396-398 (import at :47). Per-commodity values: corn [50.0, 250.0] (configs/corn_usa.yaml:88), wheat [0.0, 260.0] (configs/wheat_usa.yaml:95), cotton [400.0, 1200.0] (configs/cotton_usa.yaml:90). Wheat's 0.0 lower bound is intentionally permissive (county panels include failed-crop years), but it admits the 94 bu/ac wheat-mean-from-misaligned-climo defect that A6 originally hit; the clip would not have caught it. run/preflight.py performs file-existence checks only — no value-quality / null-rate / z-score-std checks. The dashboard at app/_dashboard_config.py:143, 161, 181, 204, 219 re-uses the same yield_range for axis bounds, so loosening the YAML silently widens the dashboard charts too. |
Document yield_range as a load-bearing guardrail in DESIGN.md; tighten ranges where defensible (e.g. wheat lower bound could be 5.0 if the failed-crop-year edge case is handled upstream); add upstream feature-quality assertions (climo null-rate ceiling, z-score std ceiling) so the clip is not the sole net; add a unit test that exercises clip_yield_to_delivery_range on out-of-range delivery rows and asserts both the clip behaviour and the warn-on-clip log line via log_tag. |
[PLACEHOLDER: owner needed] | config.py:324, lib/unit_utils.py:93-130, delivery/conversions.py:47, :396-398, configs/corn_usa.yaml:88, configs/wheat_usa.yaml:91-95, configs/cotton_usa.yaml:90, app/_dashboard_config.py, session A6 (6f8c9256) |
| 17 | climo_lag_days regression watch — unification shifted coefficients across all commodities | High | Low | Pre-fix, climo_lag_days = 30 was applied to harvest-init training rows, not just inference. The training row at harvest sdoy=184 saw a window [1..154] for corn — i.e. 30 days of end-of-season weather missing from training, contradicting DESIGN.md ("SHALL fit on harvest-time data"). Post-fix (Option B): harvest-row uses lag=0; other rows use lag=1. Reported coefficient shifts: 10-95% on most features (gdd_zscore_gstd -16.65 → -32.12, +93%; tavg_zscore_gstd -42.21 → -57.54, +36%; stress_score_lag1 -4.87 → +2.00 sign flip). Affects every commodity's forecasts. Live default at config.py:350 (climo_lag_days: int = 1); call site at features/builders/climo.py:123. |
Add a regression test that asserts harvest-init training rows use lag=0 and other rows use lag=1; pin a coefficient-sign baseline on a fixture; assert config.commodity.climo_lag_days >= 1 at config load (negatives already guarded by df7ea52f, but no test). Material change; warrants a release-note callout. |
[PLACEHOLDER: owner needed] | config.py:350, features/builders/climo.py:123, domain-modelling/schema.yaml:219, fixed in commit c7041b68 (2026-04-17), guard added in df7ea52f, session A6 |
| 18 | Leap-year off-by-one in legacy season-array slicer | Medium | Low | Pre-fix legacy code did zarr[year=Y, dayofyear=start..366] regardless of leap-year state. For non-leap years (e.g. 2022 Oct 1-Dec 31 = 92 days) the slice gave 93 positions; the legacy path silently let the season array be one slot longer than season_length. Caught by the unified prototype's shape assertion. The _legacy variant has been removed; all callers go through features/builders/climo.py:34 (_build_season_array), features/builders/ndvi.py:97, or features/builders/weather.py:55 (build_season_array_from_daily_zarr). features/builders/weather_stress.py:29 imports the same shared engine. |
Verify no consumer outside the unified path still calls a calendar-DOY zarr slice without checking is_leap_year(Y). Add a property-based test (hypothesis) that for any season window [start, end] and year Y, the returned array has length end - start + 1 regardless of leap-year state. |
[PLACEHOLDER: owner needed] | Removed in c7041b68; current call sites in features/builders/{weather,climo,ndvi,weather_stress}.py, session A6 |
| 19 | Silent config drift between TMI and QUBE-parity baseline | High | Medium | Three corn config knobs were silently misaligned with QUBE per A5: (a) correction_shrinkage defaulted to 1.0 vs QUBE 0.3 (3.3x larger weather corrections); (b) season_doy_weather_weight ramp commented out, effectively 1.0 everywhere vs QUBE's active ramp; combined effect ~45x larger weather weight in early season; (c) weather_correction_fit_level was at one point ADM2 vs QUBE ADM0. 2023/2024 symptom: opposite-sign weather corrections vs QUBE. Fix collapsed max wx-correction diff from 6.36 bu/ac (with sign flips) to 0.071 bu/ac. Live configs/corn_usa.yaml:283 pins weather_correction_fit_level: ADM0; season_doy_weather_weight: is present at :326 (block-mapping form, not exhaustively verified — could still be a no-op stub). correction_shrinkage no longer appears in any configs/*.yaml — [PLACEHOLDER: knob may have been renamed, absorbed into the per-row weight ramp, or deleted; lineage needs git log -- configs/corn_usa.yaml]. Cross-commodity coverage (soy, cotton, wheat) is unverified by this round. |
Add a config-parity regression test pinning weather_correction_fit_level=ADM0 and the active season_doy_weather_weight ramp against a snapshot YAML; trace what happened to correction_shrinkage and document the rename / removal in DESIGN.md; add an equivalent parity check for soy, cotton, wheat configs. |
[PLACEHOLDER: owner needed] | configs/corn_usa.yaml:283, :326, issues/20260415_tmi_qube_weather_correction_and_trend_alignment.md, session A5 (8f327031) |
| 20 | 1980 row preservation — fix landed but no test | Medium | Low | Pre-fix if not prior_mask.any(): continue dropped 1980 entirely (972 county-rows), per A5 accounting for ~89% of the trend drift between TMI and QUBE. The detrender uses raw NASS yield not lagged features, so 1980 was always usable. Live (verified): features/builders/yields.py:185-199 emits 1980 with NaN lagged features (yield_last, yield_avg_3, yield_avg_5 set to np.full(n, np.nan)) instead of skipping. Comment at :191-193 codifies the rationale. |
Add a unit test asserting 1980 rows survive _compute_yield_features with NaN lags rather than being dropped; assert the row count of the panel equals n_geos × n_years. |
[PLACEHOLDER: owner needed] | features/builders/yields.py:165, :185-199, session A5 |
| 21 | union_fit_pred_for_production_ranking sweeps unpopulated pred years |
Medium | Medium | stages/run_hindcast.py:135 calls prod_panel = union_fit_pred_for_production_ranking(fit_data, pred_data) followed by select_by_production(prod_panel, ..., max_year=max(config.experiment_protocol.test_years)) at :139-145. Helper defined at lib/geo/selection.py:10. The bug shape: the union sweeps in the unpopulated pred year, and max_year references a year with all-NaN production, biasing the top-95% production ranking by ~13 counties. A2/A3's claim that "primary worktree at src/main.py:87-92 already passes fit_data only" does NOT hold in the live tree on tl/bra-soy-update; the live tree still passes the union. |
Filter all-NaN-production years before ranking inside union_fit_pred_for_production_ranking, or change the call site to pass fit_data only and explicitly set max_year=int(fit["year"].max()); add a unit test on the helper that constructs a union frame with a trailing all-NaN year and asserts the ranking is unchanged versus a fit-only frame. |
[PLACEHOLDER: owner needed] | stages/run_hindcast.py:135, :139-145, lib/geo/selection.py:10, sessions A2, A3 |
| 22 | Multi-worktree drift on shared files (process risk) | Medium | Medium | Sibling worktrees still exist on disk: treefera-market-insights-commodity-hindcast/, treefera-market-insights-commodity-hindcast-minim-impl-model-update/, treefera-market-insights-corn-yield-productionisation-v2/, treefera-market-insights-mergediag/, treefera-market-insights-forecast-wt/, treefera-market-insights.wt-validation-reports/. The drift pattern persists in principle: any fix landed in one worktree but not the others silently keeps the bug. A2 also documented a concurrent "TREND_AXIS refactor" session editing the same files mid-orchestration; the TrendAxis machinery now lives at models/detrend/time_axis.py:12, so the axis refactor did land — but cross-worktree consistency is unverified. |
Consolidate to a single worktree; if multiple worktrees are required, document them and add a CI check that diffs critical files (config.py, models/detrend/partial_pooling_detrend.py, stages/run_hindcast.py, stages/run_fit.py) across worktrees and fails the build on divergence outside an explicit allowlist. |
[PLACEHOLDER: owner needed] | parallel copies of stages/run_*.py, models/detrend/*.py, config.py across worktrees, sessions A2, A3 |
| 23 | Test coverage gap at tier-1 ADR surfaces (walk-forward driver, conformal modes) | Medium | High | The test suite at tests/unit/commodity_hindcast/ (83 .py files) and tests/integration/commodity_hindcast/ exists and is healthy in aggregate, but two tier-1 surfaces are unexercised: (1) Walk-forward driver (ADR-001) — run_walk_forward (run/runner.py:27) and _predict_fold_rolling (run/runner.py:86) have zero direct test coverage (grep -rn "run_walk_forward\|_predict_fold_rolling" tests/ returns no matches). These are the rolling-fold entry points the hindcast pipeline drives through, so a regression here would only surface in end-to-end runs. (2) Conformal residual modes (ADR-002) — only 2 of the 4 supported modes appear in any test; out_of_sample_per_year and hindcast_oos_pooled are completely uncovered. ADR-003 (validate_residual_mode) IS covered by tests/integration/commodity_hindcast/test_forecast_residual_mode_validation.py. |
Add a unit test for run_walk_forward over a small synthetic panel asserting fold-by-fold prediction shape and that _predict_fold_rolling advances training years monotonically. Parametrise the existing conformal test over all four ResidualMode values so out_of_sample_per_year and hindcast_oos_pooled are exercised. Cross-link to R15 (EB shrinkage path needs its own regression test) and R16 (delivery-clip helper needs a test). |
[PLACEHOLDER: owner needed] | run/runner.py:27, :86, tests/unit/commodity_hindcast/test_apply_conformal_experiment.py, tests/unit/commodity_hindcast/test_postprocess.py, ADR-001/002/003 cross-check, verification 2026-05-08 |
Risks deferred¶
- Code-style feedback items (
feedback_fstrings.md,feedback_no_backwards_compat.md,feedback_no_claude_attribution.md) — these are review-time conventions, not production risks; covered in the contributor guide. feedback_qa_leave_conab_columns.md— agent-behaviour guidance for QA reports, not a pipeline risk.- DESIGN.md "TODO: need to define the Delivery module job" (
DESIGN.md:110) — editorial gap rather than runtime risk; rolled into wiki backlog (Risk 13). - Custom exception hierarchy /
marketing_yearcollapse /forecast.mdSee Also (DOMAIN_MODEL2.md §9 Open Questions) — design open questions with no current incident pressure. delivery/conversions.pyaliases obs-yield tonass_actualregardless of geography (PR-360 follow-up) — values correct, label cosmetic; tracked there.- WASDE/
commodity_prefix path drift fixed by PR-361 — historical, no longer a live risk.
Fixed in flight¶
These items were diagnosed AND fixed within their session; the underlying files have since been restructured but the design intent survives. Citations re-anchored where possible. Kept here for institutional memory; CLOSED, not open risks.
gen_report.pysilent unit double-labelling (A4 NEW-1) —convert_metrics_to_bu_acrere-units 7+ metric columns includingselection_bias_kg_ha,mae,rmsewhile leaving the_kg_hasuffix in place. Fixed in commit6f7132cf(now appends_bu_ac). The filegen_report.pyno longer exists anywhere in the tree; DESIGN.md:117 still referencesgen_report.py:convert_metrics_to_bu_acreas the canonical converter, but the implementation has moved. [PLACEHOLDER: locate the current renamer in the post-restructure layout.] Residual transition risk only — any consumer joiningmetrics_table.csvby old column names will silently miss columns post-rename.included_geosdefaulting to single test-fold (A4 NEW-2) — Pre-fixeval.py:173builtincluded_geosfromtest_data["geo_identifier"](one fold's split). Fixed by6f7132cf(derive fromfit_data_full) and reinforced by2b5545fa(required kwarg, no fallback). The fileeval.pyno longer exists; the contract survives at DESIGN.md:114 ("required keyword argument", "no default", "no fallback"). Runtime enforcement path needs re-verification post-restructure to confirm the kwarg is still threaded throughevaluate_model→gen_metrics→estimate_walk_forward_selection_bias_kg_ha→compute_selection_bias_for_year_kg_ha.- DESIGN.md unit-discipline contract (A4 NEW-3) — kg/ha as canonical internal unit;
included_geo_identifiersas the single required parameter name. Verified at DESIGN.md:114-117. PcaRidgeRegressornational-modefillna(0)made explicit (A2 Risk 8) — class lives atmodels/regression/pca_ridge_regressor.py:65. [PLACEHOLDER: re-anchor "fillna(0) made explicit" comment to specific line.]- Imputer re-export shim removed and
extract_sample_weightinlined (A5 finding 12) — imputation utilities live atlib/edit_and_imputation/imputation.py;partition_groups_by_valid_obsat:146is consumed directly bymodels/detrend/partial_pooling_detrend.py:25. Re-export shim absence is consistent with the "no backwards compatibility patterns" rule. - TMI
PartialPoolingDetrendandPcaRidgeRegressorproven QUBE-equivalent (A5 finding 7) — historical equivalence claim. No current code change required. aggregate_weighted_frame()↔ QUBE_aggregate_national()cross-test bit-for-bit identical (A5 finding 8) — historical, settled.- QUBE stale feature cache identified (A5 finding 9) — TMI is correct; QUBE has the data bug. Documented as a known TMI-vs-QUBE divergence.
- QUBE
MultiStageEstimator.predict()silent county drop (A5 finding 10) — TMI is arguably more correct; documented divergence. - Wheat dim-order crash (A6 N1) — pre-fix
ds[var_name].values[:, mask]assumed(geoid, time)butconus_adm2_wheat.zarris(time, geoid). Fix verified live atfeatures/builders/weather.py:75-76:var_da = var_da.transpose(geo_id_col, time_dim); raw = var_da.values[:, mask]. Transpose-to-canonical in place; landed inc7041b68. - QUBE wheat climo silently wrong (A6 N3) —
gstd.sdoy_start = 91was re-interpreted by QUBE as calendar DOY 91 = April 1, missing the autumn vegetative phase entirely (74% null rate, std=7 z-scores, 94 bu/ac county mean). Fixed by unification inc7041b68(build_climoover season-DOY for all commodities); the redundantweather_builderconfig field was deleted. Wheat now uses the same season-DOY path as corn and soy. - Redundant
july_*_countyfeatures (A6 N10) —july_edd_county/july_precip_countyremoved. Live configs useedd_jul/precip_julonly. - Imputer audit confirmed no yield imputation exists (A5 finding 5) —
partial_pooling_detrend.py:233, 268usepartition_groups_by_valid_obs(row filter, not impute). TheImputerplumbing fills only the trend line; all regressors enforcenan_policy: raise(verified atmodels/regression/pca_ridge_regressor.py:79, 97, 114, 138). [PLACEHOLDER:_assert_no_raw_yield_in_featuresclaimed as guardrail inruntime.pycould not be located via grep; the guard may live elsewhere, have been renamed, or been removed entirely.] edd_zscore_apr_julzero-fill discussion (A2 Risk 2) — historical design discussion.- XGBoost native-NaN handling routed through median imputer (A2 Risk 10) — historical design trade-off.
regression_params: dict[str, Any]typed-schema gap (A2 Risk 9) — partial protection only; full mitigation out of scope.- Pre-commit hook silently rewrote files (A2 Risk 7) — process lesson.
- Stop-hook
E902ruff cwd issue (A6 N11) —~/.claude/hooks/lint-check.pyruns ruff from the session cwd, not from the git toplevel. Patch sketched in A6 but not applied as of the verification round. Developer-experience drag, not a code regression.
Heatmap¶

The current PNG renders only R1..R14 from the original register; rows R15..R23 are not yet plotted. [NOTE: heatmap re-render required after this merge to include R15..R23.]