Diagnostic Plots

Overview

The diagnostic plots subsystem implements a two-layer, declarative architecture under market_insights_models/src/commodity_hindcast/diagnostics/plots/.

Layer 1 — Registry. PlotGroup and PlotSpec dataclasses (defined in registry.py) are the atomic units. A PlotGroup groups one or more PlotSpec entries that share a common data-preparation callable, so the I/O-heavy prepare_data call is made once per group rather than once per plot. Each group has a scope of either "cross_fold" (runs once on the full ExperimentResult) or "per_fold" (runs separately for every HindcastSlice). The full registry of eight PlotGroup instances is assembled lazily by build_plot_registry() (registry.py:77), which defers all function imports to avoid circular dependencies. On first call, get_plot_registry() (registry.py:356) caches the result in the module-level _PLOT_REGISTRY singleton.
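
The dataclass pair and the lazy singleton pattern can be sketched as follows. The field names mirror those documented here, but the bodies and the demo group are illustrative stand-ins, not the real implementation:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class PlotSpec:
    name: str
    plot_fn: Callable[..., Any]
    filename_template: str

@dataclass(frozen=True)
class PlotGroup:
    name: str
    prepare_data: Callable[..., Any]
    scope: str                      # "cross_fold" or "per_fold"
    plots: tuple[PlotSpec, ...]

_PLOT_REGISTRY: Optional[tuple[PlotGroup, ...]] = None

def build_plot_registry() -> tuple[PlotGroup, ...]:
    # The real function defers imports of plot/prep callables to call time
    # to break circular dependencies; trivial lambdas stand in here.
    demo = PlotSpec("demo", plot_fn=lambda df: None, filename_template="demo.png")
    return (
        PlotGroup("demo", prepare_data=lambda result: None,
                  scope="cross_fold", plots=(demo,)),
    )

def get_plot_registry() -> tuple[PlotGroup, ...]:
    global _PLOT_REGISTRY
    if _PLOT_REGISTRY is None:      # built once, then cached as a module singleton
        _PLOT_REGISTRY = build_plot_registry()
    return _PLOT_REGISTRY
```

Because groups share one prepare_data callable, the expensive I/O runs once per group no matter how many PlotSpec entries it carries.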

Layer 2 — Runner. PlotRunner (runner.py:59) owns every I/O concern. Its run() method iterates the registry, calls prepare_data for each group, then delegates to _run_spec() per PlotSpec. _run_spec resolves dynamic kwargs via spec.resolve_kwargs(result), calls the pure plot function spec.plot_fn(df, **kwargs), and passes the returned Figure (or list[Figure]) to _save_figures(). Figures are persisted as PNG via _save_png() (runner.py:37).
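
The per-spec dispatch reduces to a few lines. This is a hedged sketch: run_spec is a free function standing in for PlotRunner._run_spec, and the normalisation step is assumed from the Figure-or-list[Figure] contract:

```python
def run_spec(spec, df, result):
    """Sketch of the PlotRunner._run_spec flow described above."""
    # 1. Resolve dynamic kwargs from the ExperimentResult, if the spec has any.
    kwargs = spec.resolve_kwargs(result) if spec.resolve_kwargs else {}
    # 2. Call the pure plot function: DataFrame + kwargs in, Figure(s) out.
    figs = spec.plot_fn(df, **kwargs)
    # 3. Normalise: plot_fn may return one Figure or a list[Figure].
    return figs if isinstance(figs, list) else [figs]
```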

Persistence strategy. Local destinations use an atomic write — a .tmp.png is written first, then renamed — to avoid partial files. Cloud destinations (s3://, gs://, etc.) are handled by encoding the PNG to bytes in memory and calling CloudPath.write_bytes(), because matplotlib cannot write directly to remote URIs and rename would trigger superfluous HEAD requests. DPI is fixed at 150.
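
A minimal sketch of the two persistence branches, assuming the scheme list and the push_bytes placeholder (the real helper is _save_png and would call CloudPath.write_bytes directly):

```python
import io
from pathlib import Path

def push_bytes(uri: str, data: bytes) -> None:
    """Stand-in for CloudPath(uri).write_bytes(data); replace with a real client."""
    raise NotImplementedError(uri)

def save_png(fig, dest: str, dpi: int = 150) -> None:
    """Sketch of the two branches in _save_png (scheme list is assumed)."""
    if str(dest).startswith(("s3://", "gs://")):
        # Cloud: matplotlib cannot write to remote URIs, so render the PNG
        # into an in-memory buffer and push the bytes in a single call.
        buf = io.BytesIO()
        fig.savefig(buf, format="png", dpi=dpi)
        push_bytes(dest, buf.getvalue())
        return
    # Local: write to <name>.tmp.png first, then rename. The rename is atomic
    # on POSIX filesystems, so readers never observe a half-written file.
    tmp = Path(dest).with_suffix(".tmp.png")
    fig.savefig(tmp, format="png", dpi=dpi)
    tmp.rename(dest)
```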

Output layout. Per-fold plots ({fold_label}_*.png) are saved under run_dir/reports/hindcast/. Cross-fold plots (rolling_forecast.png, improvement_heatmap.png, stage7*_*.png, etc.) are saved directly under run_dir/reports/. The public entry point generate_plots() (__init__.py:21) loads an ExperimentResult from disk, constructs PlotRunner, calls runner.run(), and returns the list of saved paths.

Pure-function contract. Plot functions in fns/ never touch disk, never load config objects, and never access ExperimentResult directly. They accept a pd.DataFrame plus typed keyword arguments and return a matplotlib.figure.Figure (or list[Figure] for chunked plots such as PDP feature-space). All I/O, ExperimentResult access, and kwargs resolution live in the runner.
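
The contract makes plot functions trivially testable. An illustrative example of the shape (not an actual function from fns/; column and kwarg names are invented):

```python
import pandas as pd
from matplotlib.figure import Figure

def plot_forecast_series(df: pd.DataFrame, *, title: str,
                         ylabel: str = "bu/ac") -> Figure:
    """Illustrative pure plot function: tidy frame + typed kwargs in, Figure out.
    It never touches disk, config objects, or ExperimentResult."""
    fig = Figure()                  # no pyplot state, no backend side effects
    ax = fig.subplots()
    ax.plot(df["init"], df["forecast"], marker="o", label="forecast")
    ax.set_title(title)
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig
```

Constructing Figure directly (rather than through pyplot) keeps the function free of global figure-manager state, which fits the pure-function contract.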

Spine

__init__.py

Exposes generate_plots(run_dir, *, only, skip) as the sole public entry point (__init__.py:21). Internally constructs ExperimentResult.from_run_dir(run_dir), canonicalises output_dir = result.run_dir / "reports", and delegates entirely to PlotRunner. Individual plot groups can be included or excluded by name via only/skip.

runner.py — PlotRunner

| Symbol | Location | Role |
| --- | --- | --- |
| PlotRunner.__init__ | runner.py:62 | Stores ExperimentResult and AnyPath output directory |
| PlotRunner.run | runner.py:66 | Iterates registry; dispatches per_fold / cross_fold branches |
| PlotRunner._run_spec | runner.py:111 | Resolves kwargs, calls plot fn, delegates to _save_figures |
| PlotRunner._save_figures | runner.py:131 | Derives filename from template; calls _save_png |
| _save_png | runner.py:37 | Atomic local write or CloudPath.write_bytes for remote URIs |

registry.py — PlotSpec, PlotGroup, get_plot_registry

| Symbol | Location | Role |
| --- | --- | --- |
| PlotSpec | registry.py:29 | Frozen dataclass: name, plot_fn, filename_template, resolve_kwargs |
| PlotGroup | registry.py:39 | Frozen dataclass: name, prepare_data, scope, plots: tuple[PlotSpec] |
| build_plot_registry | registry.py:77 | Assembles the eight PlotGroup instances with deferred imports |
| get_plot_registry | registry.py:356 | Lazily builds and caches _PLOT_REGISTRY; called by PlotRunner.run |
| _build_truth_obs_by_fold | registry.py:53 | Extracts per-fold NASS survey yield (bu/ac) from ExperimentResult |
| _primary_reference_name | registry.py:65 | Returns first cfg.reference_data spec name (default "wasde") |

Adding a new plot. Create a fns/<name>.py and optionally prep/<name>.py. Define a PlotGroup in build_plot_registry() with one or more PlotSpec entries pointing to the new functions. No other registration step is required — get_plot_registry() picks it up on next call.
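
Concretely, the two new modules could look like this (fold_mae and all names below are placeholders; the constant frame stands in for the real artefact reads, and the registration snippet assumes the PlotSpec/PlotGroup fields documented above):

```python
import pandas as pd
from matplotlib.figure import Figure

# fns/fold_mae.py -- pure: DataFrame in, Figure out.
def plot_fold_mae(df: pd.DataFrame) -> Figure:
    fig = Figure()
    ax = fig.subplots()
    ax.bar(df["fold"], df["mae"])
    ax.set_ylabel("MAE (bu/ac)")
    return fig

# prep/fold_mae.py -- reads persisted artefacts and returns a tidy frame;
# here a constant frame replaces the real parquet/CSV reads.
def prepare_fold_mae_df(result) -> pd.DataFrame:
    return pd.DataFrame({"fold": ["2021", "2022"], "mae": [3.1, 2.7]})

# Finally, append a group inside build_plot_registry():
#
# PlotGroup(
#     name="fold_mae",
#     prepare_data=prepare_fold_mae_df,
#     scope="cross_fold",
#     plots=(PlotSpec(name="fold_mae", plot_fn=plot_fold_mae,
#                     filename_template="fold_mae.png"),),
# )
```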

Plot inventory

| fn module | Plot fn(s) | Prep module | Scope | What it shows |
| --- | --- | --- | --- | --- |
| fns/rolling_forecast.py | plot_rolling_forecast_v_reference | prep/rolling_forecast.py | cross_fold | One subplot per fold: Treefera forecast vs reference series (e.g. WASDE) vs truth (bu/acre), with planting/harvest date markers and MAE verdict per fold |
| fns/improvement_heatmap.py | plot_improvement_heatmap | prep/rolling_forecast.py | cross_fold | Year × init heatmap of (|e_REF| − |e_Treefera|) / truth × 100; green = Treefera closer, red = reference closer |
| fns/information_advantage.py | plot_information_advantage | prep/rolling_forecast.py | cross_fold | Mean signed advantage vs reference per init, overlaid per-year traces, green/red fill; shows how early in the season Treefera diverges favourably |
| fns/benchmark_grid.py | plot_benchmark_grid | prep/rolling_forecast.py | cross_fold | 4 × 3 MAE bar grid (timing columns: Planting/Mid-season/Harvest/End-of-season; rows: vs NASS area-weighted / vs NASS prod÷area / vs reference final); three bars per cell: Treefera, reference, Persistence |
| fns/detrend.py | plot_detrend_quality | prep/detrend.py | per_fold | Multi-panel detrend quality: national residual trend, per-county residual-slope histogram, residual-vs-trend scatter, plus per-state raw-yield/fitted-trend/residual rows for the top-N states by production |
| fns/trend_evolution.py | plot_trend_fit_grid, plot_trend_fit_detrended | prep/trend_evolution.py | per_fold | Small-multiples grid of 12 sampled counties + national panel showing observed yield + fitted trend (_grid) or detrended residuals (_detrended); in-sample vs OOS periods shaded |
| fns/residual_predictability.py | plot_residual_predictability | prep/residual_predictability.py | per_fold | Three-panel residual predictability audit at ADM2 or ADM0 level: LOO-by-year Ridge actual-vs-predicted, per-year residual boxplots, top-feature correlation scatters |
| fns/scatter.py | plot_train_test_scatter | prep/scatter.py | cross_fold | Per-fold train/test predicted-vs-observed scatter (county level, area-weighted); MAE, RMSE, R² annotated |
| fns/detrended_scatter.py | plot_detrended_forecast_feature_scatter | prep/detrended_scatter.py | cross_fold | Pooled feature-vs-detrended-forecast scatters ordered by |
| fns/pdp.py | plot_pdp_feature_space (returns list[Figure]) | prep/pdp.py | per_fold | Feature-space partial dependence curves chunked into groups of 5; per-fold traces + mean line |
| fns/pdp.py | plot_pdp_pc_space | prep/pdp.py | cross_fold | PC-space PDP curves for PCA–Ridge regressors; one panel per PC, fold-coloured lines; placeholder if non-PCA regressor |
| fns/delivery.py | plot_delivery_forecast (stage7a) | prep/delivery.py | cross_fold | Forecast + 50% CI band + reference in-season traces + truth actual, one subplot per delivery year |
| fns/delivery.py | plot_delivery_uncertainty (stage7b) | prep/delivery.py | cross_fold | CI halfwidths centred on zero per year, plus an overlay panel comparing all years at the narrowest CI band |
| fns/delivery.py | plot_delivery_reference_comparison (stage7c) | prep/delivery.py | cross_fold | CI bands + reference vs Treefera with MAE badge per year; supports multiple *_in_season reference columns |
| fns/delivery.py | plot_delivery_weather_correction (stage7d) | prep/delivery.py | cross_fold | Two-row grid: forecast evolution + CI bands (row 0) and weather_correction_bu_ac coloured bars with changepoint annotations (row 1) |

Note: plot_pdp_feature_space is bound to the "pdp" group (per_fold) and plot_pdp_pc_space to the "pdp_pc_space" group (cross_fold); both share prep/pdp.py but are separate PlotGroup instances because their prep functions differ (prepare_pdp_df vs prepare_pdp_pc_df).
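
The chunking behind plot_pdp_feature_space's list[Figure] return can be sketched as follows (chunk size of 5 from the inventory above; the helper name is hypothetical):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def chunk_features(features: Sequence[T], size: int = 5) -> List[Sequence[T]]:
    """Split the feature list into groups of `size`; the plot function then
    returns one Figure per chunk, which the runner saves as separate PNGs."""
    return [features[i:i + size] for i in range(0, len(features), size)]
```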

Prep module roles

Each prep module is a pure consumer of ExperimentResult (or ExperimentResult plus HindcastSlice for per-fold groups). It reads persisted artefacts from disk — parquet prediction files, fitted detrender states, delivery CSVs — and produces the tidy pd.DataFrame that the corresponding plot function expects. No models are re-fitted and no predictions are recomputed; the prep layer is read-only (DESIGN.md §71).

Key prep patterns:

  • prep/rolling_forecast.py — Aggregates per-fold walk_forward_preds.parquet to ADM0 area-weighted means via lib.geo.aggregation, joins per-init reference release values from lib.reference_data.loader, and appends NASS benchmark constants. Shared by four PlotSpec entries in the rolling_forecast group.

  • prep/delivery.py — Reads the persisted ADM0 hindcast delivery CSV (run_dir/delivery/Treefera_{experiment_key}_ADM0_Hindcast_*.csv) whose schema mirrors the client-facing file exactly.

  • prep/pdp.py — Calls each fold's regressor .predict() over a grid sweep per feature (marginal PDP) or per PC axis. Uses prepare_model_input and MedianImputer from the production inference path to ensure the same feature preparation as the live model.

  • prep/residual_predictability.py — Loads train + test predictions for a fold, extracts fitted-trend values from the detrender, emits both ADM2 county rows and an area-weighted ADM0 row distinguished by a level column.
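
Several of the patterns above lean on area-weighted ADM2-to-ADM0 aggregation (provided by lib.geo.aggregation). The core computation, with assumed column names, reduces to:

```python
import pandas as pd

def area_weighted_adm0(df: pd.DataFrame,
                       value_col: str = "yield_bu_ac",
                       weight_col: str = "harvested_area_ac") -> float:
    """Illustrative area-weighted national mean of county (ADM2) values.
    Column names are assumptions, not the real schema."""
    w = df[weight_col]
    return float((df[value_col] * w).sum() / w.sum())
```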

Cross-references

Domain entities consumed:

  • ExperimentResult — the top-level result object passed to PlotRunner; carries run_dir, config, and hindcast_slices.
  • HindcastSlice — one per-fold slice iterated in per_fold groups; exposes fold_label and per-fold artefact paths.
  • CommodityConfig — sourced via result.config; provides bushel_weight_lbs, actuals_source_label, reference_data, regression_feature_columns().

Concepts:

  • Walk-forward cross-validation — the fold structure that drives per_fold scoping in the registry.
  • Reference data (WASDE / CONAB) — the per-init release series joined by prep/rolling_forecast.py and visualised by the four rolling_forecast group plots.
  • Delivery CSV schema — the client-facing columnar format read directly by prep/delivery.py.

Sibling subsystems:

  • diagnostics/runners.py — provides yield_asof_array_from_releases consumed by prep/rolling_forecast.py.
  • lib/geo/aggregation.py — area-weighted ADM0 aggregation used in rolling-forecast and residual-predictability prep.
  • lib/unit_utils.py — kg_ha_to_bu_acre* helpers used extensively in plot functions.
  • lib/reference_data/ — NASS benchmark lookups and reference loaders consumed in prep and registry kwargs resolvers.

Relationships

generate_plots()          (__init__.py:21)
  └─ ExperimentResult.from_run_dir()
  └─ PlotRunner(result, output_dir)
       └─ run(only, skip)
            └─ get_plot_registry()          (registry.py:356)
                 └─ build_plot_registry()   (registry.py:77)
                      └─ PlotGroup(name, prepare_data, scope, plots)
                           └─ PlotSpec(name, plot_fn, filename_template, resolve_kwargs)
            └─ [per_fold] for fold in result.hindcast_slices:
                 group.prepare_data(result, fold) -> DataFrame
                 _run_spec(spec, df, fold_label)
            └─ [cross_fold]
                 group.prepare_data(result) -> DataFrame
                 _run_spec(spec, df)
            └─ _save_png(fig, dest)   (runner.py:37)
                 local:  .tmp.png -> rename (atomic)
                 cloud:  CloudPath.write_bytes(bytes)