Diagnostic Plots

Overview

The diagnostic plots subsystem implements a two-layer, declarative architecture under market_insights_models/src/commodity_hindcast/diagnostics/plots/.

Layer 1 — Registry. PlotGroup and PlotSpec dataclasses (defined in registry.py) are the atomic units. A PlotGroup groups one or more PlotSpec entries that share a common data-preparation callable, so the I/O-heavy prepare_data call is made once per group rather than once per plot. Each group has a scope of either "cross_fold" (runs once on the full ExperimentResult) or "per_fold" (runs separately for every HindcastSlice). The full registry of eight PlotGroup instances is assembled lazily by build_plot_registry() (registry.py:77), which defers all function imports to avoid circular dependencies. On first call, get_plot_registry() (registry.py:356) caches the result in the module-level _PLOT_REGISTRY singleton.
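
The dataclass pair and the lazy singleton pattern can be sketched as follows. The field names mirror those documented here, but the bodies and the demo group are illustrative stand-ins, not the real implementation:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class PlotSpec:
    name: str
    plot_fn: Callable[..., Any]
    filename_template: str

@dataclass(frozen=True)
class PlotGroup:
    name: str
    prepare_data: Callable[..., Any]
    scope: str                      # "cross_fold" or "per_fold"
    plots: tuple[PlotSpec, ...]

_PLOT_REGISTRY: Optional[tuple[PlotGroup, ...]] = None

def build_plot_registry() -> tuple[PlotGroup, ...]:
    # The real function defers imports of plot/prep callables to call time
    # to break circular dependencies; trivial lambdas stand in here.
    demo = PlotSpec("demo", plot_fn=lambda df: None, filename_template="demo.png")
    return (
        PlotGroup("demo", prepare_data=lambda result: None,
                  scope="cross_fold", plots=(demo,)),
    )

def get_plot_registry() -> tuple[PlotGroup, ...]:
    global _PLOT_REGISTRY
    if _PLOT_REGISTRY is None:      # built once, then cached as a module singleton
        _PLOT_REGISTRY = build_plot_registry()
    return _PLOT_REGISTRY
```

Because groups share one prepare_data callable, the expensive I/O runs once per group no matter how many PlotSpec entries it carries.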

Layer 2 — Runner. PlotRunner (runner.py:59) owns every I/O concern. Its run() method iterates the registry, calls prepare_data for each group, then delegates to _run_spec() per PlotSpec. _run_spec resolves dynamic kwargs via spec.resolve_kwargs(result), calls the pure plot function spec.plot_fn(df, **kwargs), and passes the returned Figure (or list[Figure]) to _save_figures(). Figures are persisted as PNG via _save_png() (runner.py:37).
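
The per-spec dispatch reduces to a few lines. This is a hedged sketch: run_spec is a free function standing in for PlotRunner._run_spec, and the normalisation step is assumed from the Figure-or-list[Figure] contract:

```python
def run_spec(spec, df, result):
    """Sketch of the PlotRunner._run_spec flow described above."""
    # 1. Resolve dynamic kwargs from the ExperimentResult, if the spec has any.
    kwargs = spec.resolve_kwargs(result) if spec.resolve_kwargs else {}
    # 2. Call the pure plot function: DataFrame + kwargs in, Figure(s) out.
    figs = spec.plot_fn(df, **kwargs)
    # 3. Normalise: plot_fn may return one Figure or a list[Figure].
    return figs if isinstance(figs, list) else [figs]
```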

Persistence strategy. Local destinations use an atomic write — a .tmp.png is written first, then renamed — to avoid partial files. Cloud destinations (s3://, gs://, etc.) are handled by encoding the PNG to bytes in memory and calling CloudPath.write_bytes(), because matplotlib cannot write directly to remote URIs and rename would trigger superfluous HEAD requests. DPI is fixed at 150.
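
A minimal sketch of the two persistence branches, assuming the scheme list and the push_bytes placeholder (the real helper is _save_png and would call CloudPath.write_bytes directly):

```python
import io
from pathlib import Path

def push_bytes(uri: str, data: bytes) -> None:
    """Stand-in for CloudPath(uri).write_bytes(data); replace with a real client."""
    raise NotImplementedError(uri)

def save_png(fig, dest: str, dpi: int = 150) -> None:
    """Sketch of the two branches in _save_png (scheme list is assumed)."""
    if str(dest).startswith(("s3://", "gs://")):
        # Cloud: matplotlib cannot write to remote URIs, so render the PNG
        # into an in-memory buffer and push the bytes in a single call.
        buf = io.BytesIO()
        fig.savefig(buf, format="png", dpi=dpi)
        push_bytes(dest, buf.getvalue())
        return
    # Local: write to <name>.tmp.png first, then rename. The rename is atomic
    # on POSIX filesystems, so readers never observe a half-written file.
    tmp = Path(dest).with_suffix(".tmp.png")
    fig.savefig(tmp, format="png", dpi=dpi)
    tmp.rename(dest)
```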

Output layout. Per-fold plots ({fold_label}_*.png) are saved under run_dir/reports/hindcast/. Cross-fold plots (rolling_forecast.png, improvement_heatmap.png, stage7*_*.png, etc.) are saved directly under run_dir/reports/. The public entry point generate_plots() (__init__.py:21) loads an ExperimentResult from disk, constructs PlotRunner, calls runner.run(), and returns the list of saved paths.

Pure-function contract. Plot functions in fns/ never touch disk, never load config objects, and never access ExperimentResult directly. They accept a pd.DataFrame plus typed keyword arguments and return a matplotlib.figure.Figure (or list[Figure] for chunked plots such as PDP feature-space). All I/O, ExperimentResult access, and kwargs resolution live in the runner.
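
The contract makes plot functions trivially testable. An illustrative example of the shape (not an actual function from fns/; column and kwarg names are invented):

```python
import pandas as pd
from matplotlib.figure import Figure

def plot_forecast_series(df: pd.DataFrame, *, title: str,
                         ylabel: str = "bu/ac") -> Figure:
    """Illustrative pure plot function: tidy frame + typed kwargs in, Figure out.
    It never touches disk, config objects, or ExperimentResult."""
    fig = Figure()                  # no pyplot state, no backend side effects
    ax = fig.subplots()
    ax.plot(df["init"], df["forecast"], marker="o", label="forecast")
    ax.set_title(title)
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig
```

Constructing Figure directly (rather than through pyplot) keeps the function free of global figure-manager state, which fits the pure-function contract.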

Spine

__init__.py

Exposes generate_plots(run_dir, *, only, skip) as the sole public entry point (__init__.py:21). Internally constructs ExperimentResult.from_run_dir(run_dir), canonicalises output_dir = result.run_dir / "reports", and delegates entirely to PlotRunner. Individual plot groups can be included or excluded by name via only/skip.

runner.py — PlotRunner

| Symbol | Location | Role |
| --- | --- | --- |
| PlotRunner.__init__ | runner.py:62 | Stores ExperimentResult and AnyPath output directory |
| PlotRunner.run | runner.py:66 | Iterates registry; dispatches per_fold / cross_fold branches |
| PlotRunner._run_spec | runner.py:111 | Resolves kwargs, calls plot fn, delegates to _save_figures |
| PlotRunner._save_figures | runner.py:131 | Derives filename from template; calls _save_png |
| _save_png | runner.py:37 | Atomic local write or CloudPath.write_bytes for remote URIs |

registry.py — PlotSpec, PlotGroup, get_plot_registry

| Symbol | Location | Role |
| --- | --- | --- |
| PlotSpec | registry.py:29 | Frozen dataclass: name, plot_fn, filename_template, resolve_kwargs |
| PlotGroup | registry.py:39 | Frozen dataclass: name, prepare_data, scope, plots: tuple[PlotSpec] |
| build_plot_registry | registry.py:77 | Assembles the eight PlotGroup instances with deferred imports |
| get_plot_registry | registry.py:356 | Lazily builds and caches _PLOT_REGISTRY; called by PlotRunner.run |
| _build_truth_obs_by_fold | registry.py:53 | Extracts per-fold NASS survey yield (bu/ac) from ExperimentResult |
| _primary_reference_name | registry.py:65 | Returns first cfg.reference_data spec name (default "wasde") |

Adding a new plot. Create a fns/<name>.py and optionally prep/<name>.py. Define a PlotGroup in build_plot_registry() with one or more PlotSpec entries pointing to the new functions. No other registration step is required — get_plot_registry() picks it up on next call.
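
Concretely, the two new modules could look like this (fold_mae and all names below are placeholders; the constant frame stands in for the real artefact reads, and the registration snippet assumes the PlotSpec/PlotGroup fields documented above):

```python
import pandas as pd
from matplotlib.figure import Figure

# fns/fold_mae.py -- pure: DataFrame in, Figure out.
def plot_fold_mae(df: pd.DataFrame) -> Figure:
    fig = Figure()
    ax = fig.subplots()
    ax.bar(df["fold"], df["mae"])
    ax.set_ylabel("MAE (bu/ac)")
    return fig

# prep/fold_mae.py -- reads persisted artefacts and returns a tidy frame;
# here a constant frame replaces the real parquet/CSV reads.
def prepare_fold_mae_df(result) -> pd.DataFrame:
    return pd.DataFrame({"fold": ["2021", "2022"], "mae": [3.1, 2.7]})

# Finally, append a group inside build_plot_registry():
#
# PlotGroup(
#     name="fold_mae",
#     prepare_data=prepare_fold_mae_df,
#     scope="cross_fold",
#     plots=(PlotSpec(name="fold_mae", plot_fn=plot_fold_mae,
#                     filename_template="fold_mae.png"),),
# )
```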

Plot inventory

| fn module | Plot fn(s) | Prep module | Scope | What it shows |
| --- | --- | --- | --- | --- |
| fns/rolling_forecast.py | plot_rolling_forecast_v_reference | prep/rolling_forecast.py | cross_fold | One subplot per fold: Treefera forecast vs reference series (e.g. WASDE) vs truth (bu/acre), with planting/harvest date markers and MAE verdict per fold |
| fns/improvement_heatmap.py | plot_improvement_heatmap | prep/rolling_forecast.py | cross_fold | Year × init heatmap of (|e_REF| − |e_Treefera|) / truth × 100; green = Treefera closer, red = reference closer |
| fns/information_advantage.py | plot_information_advantage | prep/rolling_forecast.py | cross_fold | Mean signed advantage vs reference per init, overlaid per-year traces, green/red fill; shows how early in the season Treefera diverges favourably |
| fns/benchmark_grid.py | plot_benchmark_grid | prep/rolling_forecast.py | cross_fold | 4 × 3 MAE bar grid (timing columns: Planting/Mid-season/Harvest/End-of-season; rows: vs NASS area-weighted / vs NASS prod÷area / vs reference final); three bars per cell: Treefera, reference, Persistence |
| fns/detrend.py | plot_detrend_quality | prep/detrend.py | per_fold | Multi-panel detrend quality: national residual trend, per-county residual-slope histogram, residual-vs-trend scatter, plus per-state raw-yield/fitted-trend/residual rows for the top-N states by production |
| fns/trend_evolution.py | plot_trend_fit_grid, plot_trend_fit_detrended | prep/trend_evolution.py | per_fold | Small-multiples grid of 12 sampled counties + national panel showing observed yield + fitted trend (_grid) or detrended residuals (_detrended); in-sample vs OOS periods shaded |
| fns/residual_predictability.py | plot_residual_predictability | prep/residual_predictability.py | per_fold | Three-panel residual predictability audit at ADM2 or ADM0 level: LOO-by-year Ridge actual-vs-predicted, per-year residual boxplots, top-feature correlation scatters |
| fns/scatter.py | plot_train_test_scatter | prep/scatter.py | cross_fold | Per-fold train/test predicted-vs-observed scatter (county level, area-weighted); MAE, RMSE, R² annotated |
| fns/detrended_scatter.py | plot_detrended_forecast_feature_scatter | prep/detrended_scatter.py | cross_fold | Pooled feature-vs-detrended-forecast scatters ordered by |
| fns/pdp.py | plot_pdp_feature_space (returns list[Figure]) | prep/pdp.py | per_fold | Feature-space partial dependence curves chunked into groups of 5; per-fold traces + mean line |
| fns/pdp.py | plot_pdp_pc_space | prep/pdp.py | cross_fold | PC-space PDP curves for PCA–Ridge regressors; one panel per PC, fold-coloured lines; placeholder if non-PCA regressor |
| fns/delivery.py | plot_delivery_forecast (stage7a) | prep/delivery.py | cross_fold | Forecast + 50% CI band + reference in-season traces + truth actual, one subplot per delivery year |
| fns/delivery.py | plot_delivery_uncertainty (stage7b) | prep/delivery.py | cross_fold | CI halfwidths centred on zero per year, plus an overlay panel comparing all years at the narrowest CI band |
| fns/delivery.py | plot_delivery_reference_comparison (stage7c) | prep/delivery.py | cross_fold | CI bands + reference vs Treefera with MAE badge per year; supports multiple *_in_season reference columns |
| fns/delivery.py | plot_delivery_weather_correction (stage7d) | prep/delivery.py | cross_fold | Two-row grid: forecast evolution + CI bands (row 0) and weather_correction_bu_ac coloured bars with changepoint annotations (row 1) |

Note: plot_pdp_feature_space is bound to the "pdp" group (per_fold) and plot_pdp_pc_space to the "pdp_pc_space" group (cross_fold); both share prep/pdp.py but are separate PlotGroup instances because their prep functions differ (prepare_pdp_df vs prepare_pdp_pc_df).
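
The chunking behind plot_pdp_feature_space's list[Figure] return can be sketched as follows (chunk size of 5 from the inventory above; the helper name is hypothetical):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def chunk_features(features: Sequence[T], size: int = 5) -> List[Sequence[T]]:
    """Split the feature list into groups of `size`; the plot function then
    returns one Figure per chunk, which the runner saves as separate PNGs."""
    return [features[i:i + size] for i in range(0, len(features), size)]
```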

Prep module roles

Each prep module is a pure consumer of ExperimentResult (or ExperimentResult plus HindcastSlice for per-fold groups). It reads persisted artefacts from disk — parquet prediction files, fitted detrender states, delivery CSVs — and produces the tidy pd.DataFrame that the corresponding plot function expects. No models are re-fitted and no predictions are recomputed; the prep layer is read-only (DESIGN.md §71).

Key prep patterns:

  • prep/rolling_forecast.py — Aggregates per-fold walk_forward_preds.parquet to ADM0 area-weighted means via lib.geo.aggregation, joins per-init reference release values from lib.reference_data.loader, and appends NASS benchmark constants. Shared by four PlotSpec entries in the rolling_forecast group.

  • prep/delivery.py — Reads the persisted ADM0 hindcast delivery CSV (run_dir/delivery/Treefera_{experiment_key}_ADM0_Hindcast_*.csv) whose schema mirrors the client-facing file exactly.

  • prep/pdp.py — Calls each fold's regressor .predict() over a grid sweep per feature (marginal PDP) or per PC axis. Uses prepare_model_input and MedianImputer from the production inference path to ensure the same feature preparation as the live model.

  • prep/residual_predictability.py — Loads train + test predictions for a fold, extracts fitted-trend values from the detrender, emits both ADM2 county rows and an area-weighted ADM0 row distinguished by a level column.
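
Several of the patterns above lean on area-weighted ADM2-to-ADM0 aggregation (provided by lib.geo.aggregation). The core computation, with assumed column names, reduces to:

```python
import pandas as pd

def area_weighted_adm0(df: pd.DataFrame,
                       value_col: str = "yield_bu_ac",
                       weight_col: str = "harvested_area_ac") -> float:
    """Illustrative area-weighted national mean of county (ADM2) values.
    Column names are assumptions, not the real schema."""
    w = df[weight_col]
    return float((df[value_col] * w).sum() / w.sum())
```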

Cross-references

Domain entities consumed:

  • ExperimentResult — the top-level result object passed to PlotRunner; carries run_dir, config, and hindcast_slices.
  • HindcastSlice — one per-fold slice iterated in per_fold groups; exposes fold_label and per-fold artefact paths.
  • CommodityConfig — sourced via result.config; provides bushel_weight_lbs, actuals_source_label, reference_data, regression_feature_columns().

Concepts:

  • Walk-forward cross-validation — the fold structure that drives per_fold scoping in the registry.
  • Reference data (WASDE / CONAB) — the per-init release series joined by prep/rolling_forecast.py and visualised by the four rolling_forecast group plots.
  • Delivery CSV schema — the client-facing columnar format read directly by prep/delivery.py.

Sibling subsystems:

  • diagnostics/runners.py — provides yield_asof_array_from_releases consumed by prep/rolling_forecast.py.
  • lib/geo/aggregation.py — area-weighted ADM0 aggregation used in rolling-forecast and residual-predictability prep.
  • lib/unit_utils.py — kg_ha_to_bu_acre* helpers used extensively in plot functions.
  • lib/reference_data/ — NASS benchmark lookups and reference loaders consumed in prep and registry kwargs resolvers.

Relationships

generate_plots()          (__init__.py:21)
  └─ ExperimentResult.from_run_dir()
  └─ PlotRunner(result, output_dir)
       └─ run(only, skip)
            └─ get_plot_registry()          (registry.py:356)
                 └─ build_plot_registry()   (registry.py:77)
                      └─ PlotGroup(name, prepare_data, scope, plots)
                           └─ PlotSpec(name, plot_fn, filename_template, resolve_kwargs)
            └─ [per_fold] for fold in result.hindcast_slices:
                 group.prepare_data(result, fold) -> DataFrame
                 _run_spec(spec, df, fold_label)
            └─ [cross_fold]
                 group.prepare_data(result) -> DataFrame
                 _run_spec(spec, df)
            └─ _save_png(fig, dest)   (runner.py:37)
                 local:  .tmp.png -> rename (atomic)
                 cloud:  CloudPath.write_bytes(bytes)