Diagnostic Plots¶
Overview¶
The diagnostic plots subsystem implements a two-layer, declarative architecture under
market_insights_models/src/commodity_hindcast/diagnostics/plots/.
Layer 1 — Registry. PlotGroup and PlotSpec dataclasses (defined in registry.py)
are the atomic units. A PlotGroup groups one or more PlotSpec entries that share a
common data-preparation callable, so the I/O-heavy prepare_data call is made once per
group rather than once per plot. Each group has a scope of either "cross_fold" (runs
once on the full ExperimentResult) or "per_fold" (runs separately for every
HindcastSlice). The full registry of eight PlotGroup instances is assembled lazily by
build_plot_registry() (registry.py:77), which defers all function imports to avoid
circular dependencies. On first call, get_plot_registry() (registry.py:356) caches the
result in the module-level _PLOT_REGISTRY singleton.
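A minimal sketch of the two dataclasses and the lazy-singleton caching pattern described above (field names are taken from this page; the exact definitions in registry.py may differ, and the sketch returns an empty registry instead of the real eight groups):

```python
from dataclasses import dataclass
from typing import Any, Callable, Literal, Optional


@dataclass(frozen=True)
class PlotSpec:
    """One plot: a pure function plus naming and kwargs resolution."""
    name: str
    plot_fn: Callable[..., Any]            # (df, **kwargs) -> Figure | list[Figure]
    filename_template: str
    resolve_kwargs: Callable[[Any], dict] = lambda result: {}


@dataclass(frozen=True)
class PlotGroup:
    """Plots sharing one I/O-heavy prepare_data call and one scope."""
    name: str
    prepare_data: Callable[..., Any]       # (result[, fold]) -> DataFrame
    scope: Literal["cross_fold", "per_fold"]
    plots: tuple = ()


_PLOT_REGISTRY: Optional[tuple] = None


def build_plot_registry() -> tuple:
    # The real function defers all fns/prep imports to this point (avoiding
    # circular dependencies) and returns eight PlotGroup instances; this
    # sketch returns an empty placeholder registry.
    return ()


def get_plot_registry() -> tuple:
    """Build the registry once, then serve the cached module-level singleton."""
    global _PLOT_REGISTRY
    if _PLOT_REGISTRY is None:
        _PLOT_REGISTRY = build_plot_registry()
    return _PLOT_REGISTRY
```

Freezing both dataclasses keeps registry entries immutable after construction, so the cached singleton can be shared safely across calls.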
Layer 2 — Runner. PlotRunner (runner.py:59) owns every I/O concern. Its run()
method iterates the registry, calls prepare_data for each group, then delegates to
_run_spec() per PlotSpec. _run_spec resolves dynamic kwargs via
spec.resolve_kwargs(result), calls the pure plot function spec.plot_fn(df, **kwargs),
and passes the returned Figure (or list[Figure]) to _save_figures(). Figures are
persisted as PNG via _save_png() (runner.py:37).
Persistence strategy. Local destinations use an atomic write — a .tmp.png is written
first, then renamed — to avoid partial files. Cloud destinations (s3://, gs://, etc.)
are handled by encoding the PNG to bytes in memory and calling
CloudPath.write_bytes(), because matplotlib cannot write directly to remote URIs and
rename would trigger superfluous HEAD requests. DPI is fixed at 150.
Output layout. Per-fold plots ({fold_label}_*.png) are saved under
run_dir/reports/hindcast/. Cross-fold plots (rolling_forecast.png,
improvement_heatmap.png, stage7*_*.png, etc.) are saved directly under
run_dir/reports/. The public entry point generate_plots() (__init__.py:21) loads an
ExperimentResult from disk, constructs PlotRunner, calls runner.run(), and returns the
list of saved paths.
Pure-function contract. Plot functions in fns/ never touch disk, never load config
objects, and never access ExperimentResult directly. They accept a pd.DataFrame plus
typed keyword arguments and return a matplotlib.figure.Figure (or
list[Figure] for chunked plots such as PDP feature-space). All I/O,
ExperimentResult access, and kwargs resolution live in the runner.
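A toy plot function in the fns/ style, showing the contract (the function name and columns are illustrative, not a real fns/ module):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.figure import Figure


def plot_example_series(df: pd.DataFrame, *, title: str, ylabel: str) -> Figure:
    """Pure plot function: DataFrame in, Figure out.

    No disk access, no config objects, no ExperimentResult -- everything
    it needs arrives as the tidy frame and typed keyword arguments.
    """
    fig, ax = plt.subplots(figsize=(8, 4))
    for name, sub in df.groupby("series"):
        ax.plot(sub["date"], sub["value"], label=name)
    ax.set_title(title)
    ax.set_ylabel(ylabel)
    ax.legend()
    fig.tight_layout()
    return fig
```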
Spine¶
__init__.py¶
Exposes generate_plots(run_dir, *, only, skip) as the sole public entry point
(__init__.py:21). Internally constructs ExperimentResult.from_run_dir(run_dir),
canonicalises output_dir = result.run_dir / "reports", and delegates entirely to
PlotRunner. Individual plot groups can be included or excluded by name via only/skip.
runner.py — PlotRunner¶
| Symbol | Location | Role |
|---|---|---|
| `PlotRunner.__init__` | runner.py:62 | Stores `ExperimentResult` and `AnyPath` output directory |
| `PlotRunner.run` | runner.py:66 | Iterates registry; dispatches `per_fold` / `cross_fold` branches |
| `PlotRunner._run_spec` | runner.py:111 | Resolves kwargs, calls plot fn, delegates to `_save_figures` |
| `PlotRunner._save_figures` | runner.py:131 | Derives filename from template; calls `_save_png` |
| `_save_png` | runner.py:37 | Atomic local write or `CloudPath.write_bytes` for remote URIs |
registry.py — PlotSpec, PlotGroup, get_plot_registry¶
| Symbol | Location | Role |
|---|---|---|
| `PlotSpec` | registry.py:29 | Frozen dataclass: `name`, `plot_fn`, `filename_template`, `resolve_kwargs` |
| `PlotGroup` | registry.py:39 | Frozen dataclass: `name`, `prepare_data`, `scope`, `plots: tuple[PlotSpec]` |
| `build_plot_registry` | registry.py:77 | Assembles the eight `PlotGroup` instances with deferred imports |
| `get_plot_registry` | registry.py:356 | Lazily builds and caches `_PLOT_REGISTRY`; called by `PlotRunner.run` |
| `_build_truth_obs_by_fold` | registry.py:53 | Extracts per-fold NASS survey yield (bu/ac) from `ExperimentResult` |
| `_primary_reference_name` | registry.py:65 | Returns first `cfg.reference_data` spec name (default `"wasde"`) |
Adding a new plot. Create a fns/<name>.py and optionally prep/<name>.py. Define a
PlotGroup in build_plot_registry() with one or more PlotSpec entries pointing to the
new functions. No other registration step is required — get_plot_registry() picks it up
on next call.
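Under the field names given earlier on this page, the new entry inside `build_plot_registry()` might look like the following (all `my_metric` names are hypothetical; minimal stand-in dataclasses are included so the sketch runs on its own, whereas the real frozen dataclasses live in registry.py):

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class PlotSpec:
    name: str
    plot_fn: Callable[..., Any]
    filename_template: str


@dataclass(frozen=True)
class PlotGroup:
    name: str
    prepare_data: Callable[..., Any]
    scope: str
    plots: tuple


def prepare_my_metric_df(result):
    """Hypothetical prep/my_metric.py: ExperimentResult -> tidy DataFrame."""
    raise NotImplementedError


def plot_my_metric(df, *, title):
    """Hypothetical fns/my_metric.py: pure DataFrame -> Figure."""
    raise NotImplementedError


# The group you would add inside build_plot_registry(); in the real
# function the two imports above are deferred to call time.
MY_METRIC_GROUP = PlotGroup(
    name="my_metric",
    prepare_data=prepare_my_metric_df,
    scope="cross_fold",
    plots=(
        PlotSpec(
            name="my_metric",
            plot_fn=plot_my_metric,
            filename_template="my_metric.png",
        ),
    ),
)
```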
Plot inventory¶
| fn module | Plot fn(s) | Prep module | Scope | What it shows |
|---|---|---|---|---|
| `fns/rolling_forecast.py` | `plot_rolling_forecast_v_reference` | `prep/rolling_forecast.py` | cross_fold | One subplot per fold: Treefera forecast vs reference series (e.g. WASDE) vs truth (bu/acre), with planting/harvest date markers and MAE verdict per fold |
| `fns/improvement_heatmap.py` | `plot_improvement_heatmap` | `prep/rolling_forecast.py` | cross_fold | Year × init heatmap of (\|e_REF\| − \|e_Treefera\|) / truth × 100; green = Treefera closer, red = reference closer |
| `fns/information_advantage.py` | `plot_information_advantage` | `prep/rolling_forecast.py` | cross_fold | Mean signed advantage vs reference per init, overlaid per-year traces, green/red fill; shows how early in the season Treefera diverges favourably |
| `fns/benchmark_grid.py` | `plot_benchmark_grid` | `prep/rolling_forecast.py` | cross_fold | 4 × 3 MAE bar grid (timing columns: Planting/Mid-season/Harvest/End-of-season; rows: vs NASS area-weighted / vs NASS prod÷area / vs reference final); three bars per cell: Treefera, reference, Persistence |
| `fns/detrend.py` | `plot_detrend_quality` | `prep/detrend.py` | per_fold | Multi-panel detrend quality: national residual trend, per-county residual-slope histogram, residual-vs-trend scatter, plus per-state raw-yield/fitted-trend/residual rows for the top-N states by production |
| `fns/trend_evolution.py` | `plot_trend_fit_grid`, `plot_trend_fit_detrended` | `prep/trend_evolution.py` | per_fold | Small-multiples grid of 12 sampled counties + national panel showing observed yield + fitted trend (`_grid`) or detrended residuals (`_detrended`); in-sample vs OOS periods shaded |
| `fns/residual_predictability.py` | `plot_residual_predictability` | `prep/residual_predictability.py` | per_fold | Three-panel residual predictability audit at ADM2 or ADM0 level: LOO-by-year Ridge actual-vs-predicted, per-year residual boxplots, top-feature correlation scatters |
| `fns/scatter.py` | `plot_train_test_scatter` | `prep/scatter.py` | cross_fold | Per-fold train/test predicted-vs-observed scatter (county level, area-weighted); MAE, RMSE, R² annotated |
| `fns/detrended_scatter.py` | `plot_detrended_forecast_feature_scatter` | `prep/detrended_scatter.py` | cross_fold | Pooled feature-vs-detrended-forecast scatters ordered by \| |
| `fns/pdp.py` | `plot_pdp_feature_space` (returns `list[Figure]`) | `prep/pdp.py` | per_fold | Feature-space partial dependence curves chunked into groups of 5; per-fold traces + mean line |
| `fns/pdp.py` | `plot_pdp_pc_space` | `prep/pdp.py` | cross_fold | PC-space PDP curves for PCA–Ridge regressors; one panel per PC, fold-coloured lines; placeholder if non-PCA regressor |
| `fns/delivery.py` | `plot_delivery_forecast` (stage7a) | `prep/delivery.py` | cross_fold | Forecast + 50% CI band + reference in-season traces + truth actual, one subplot per delivery year |
| `fns/delivery.py` | `plot_delivery_uncertainty` (stage7b) | `prep/delivery.py` | cross_fold | CI halfwidths centred on zero per year, plus an overlay panel comparing all years at the narrowest CI band |
| `fns/delivery.py` | `plot_delivery_reference_comparison` (stage7c) | `prep/delivery.py` | cross_fold | CI bands + reference vs Treefera with MAE badge per year; supports multiple `*_in_season` reference columns |
| `fns/delivery.py` | `plot_delivery_weather_correction` (stage7d) | `prep/delivery.py` | cross_fold | Two-row grid: forecast evolution + CI bands (row 0) and `weather_correction_bu_ac` coloured bars with changepoint annotations (row 1) |
Note: plot_pdp_feature_space is bound to the "pdp" group (per_fold) and
plot_pdp_pc_space to the "pdp_pc_space" group (cross_fold); both share prep/pdp.py
but are separate PlotGroup instances because their prep functions differ
(prepare_pdp_df vs prepare_pdp_pc_df).
Prep module roles¶
Each prep module is a pure consumer of ExperimentResult (or ExperimentResult +
HindcastSlice for per-fold groups). They read persisted artefacts from disk — parquet
prediction files, fitted detrender states, delivery CSVs — and produce a tidy
pd.DataFrame that the corresponding plot function expects. No models are re-fitted, no
predictions are recomputed; the prep layer is read-only (DESIGN.md §71).
Key prep patterns:

- `prep/rolling_forecast.py` — Aggregates per-fold `walk_forward_preds.parquet` to ADM0 area-weighted means via `lib.geo.aggregation`, joins per-init reference release values from `lib.reference_data.loader`, and appends NASS benchmark constants. Shared by four `PlotSpec` entries in the `rolling_forecast` group.
- `prep/delivery.py` — Reads the persisted ADM0 hindcast delivery CSV (`run_dir/delivery/Treefera_{experiment_key}_ADM0_Hindcast_*.csv`) whose schema mirrors the client-facing file exactly.
- `prep/pdp.py` — Calls each fold's `regressor.predict()` over a grid sweep per feature (marginal PDP) or per PC axis. Uses `prepare_model_input` and `MedianImputer` from the production inference path to ensure the same feature preparation as the live model.
- `prep/residual_predictability.py` — Loads train + test predictions for a fold, extracts fitted-trend values from the detrender, emits both ADM2 county rows and an area-weighted ADM0 row distinguished by a `level` column.
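The area-weighted ADM0 aggregation that several prep modules delegate to `lib.geo.aggregation` for can be sketched as follows (column names `init_date`, `yield_bu_ac`, and `area_ac` are hypothetical; the real helper's interface may differ):

```python
import pandas as pd


def area_weighted_adm0(preds: pd.DataFrame) -> pd.DataFrame:
    """Collapse county-level predictions to a national (ADM0) series,
    weighting each county's yield by its harvested area."""
    # Weighted numerator per row, then sum numerator and weights per init.
    df = preds.assign(w=preds["yield_bu_ac"] * preds["area_ac"])
    g = df.groupby("init_date")[["w", "area_ac"]].sum()
    return (g["w"] / g["area_ac"]).rename("yield_bu_ac").reset_index()
```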
Cross-references¶
Domain entities consumed:
- `ExperimentResult` — the top-level result object passed to `PlotRunner`; carries `run_dir`, `config`, and `hindcast_slices`.
- `HindcastSlice` — one per-fold slice iterated in `per_fold` groups; exposes `fold_label` and per-fold artefact paths.
- `CommodityConfig` — sourced via `result.config`; provides `bushel_weight_lbs`, `actuals_source_label`, `reference_data`, `regression_feature_columns()`.
Concepts:
- Walk-forward cross-validation — the fold structure that drives `per_fold` scoping in the registry.
- Reference data (WASDE / CONAB) — the per-init release series joined by `prep/rolling_forecast.py` and visualised by the four `rolling_forecast` group plots.
- Delivery CSV schema — the client-facing columnar format read directly by `prep/delivery.py`.
Sibling subsystems:
- `diagnostics/runners.py` — provides `yield_asof_array_from_releases`, consumed by `prep/rolling_forecast.py`.
- `lib/geo/aggregation.py` — area-weighted ADM0 aggregation used in rolling-forecast and residual-predictability prep.
- `lib/unit_utils.py` — `kg_ha_to_bu_acre*` helpers used extensively in plot functions.
- `lib/reference_data/` — NASS benchmark lookups and reference loaders consumed in prep and registry kwargs resolvers.
Relationships¶
```
generate_plots() (__init__.py:21)
└─ ExperimentResult.from_run_dir()
   └─ PlotRunner(result, output_dir)
      └─ run(only, skip)
         ├─ get_plot_registry() (registry.py:356)
         │  └─ build_plot_registry() (registry.py:77)
         │     └─ PlotGroup(name, prepare_data, scope, plots)
         │        └─ PlotSpec(name, plot_fn, filename_template, resolve_kwargs)
         ├─ [per_fold] for fold in result.hindcast_slices:
         │     group.prepare_data(result, fold) -> DataFrame
         │     _run_spec(spec, df, fold_label)
         ├─ [cross_fold]
         │     group.prepare_data(result) -> DataFrame
         │     _run_spec(spec, df)
         └─ _save_png(fig, dest) (runner.py:37)
               local: .tmp.png -> rename (atomic)
               cloud: CloudPath.write_bytes(bytes)
```
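Under the names in the diagram above, the `per_fold` / `cross_fold` dispatch can be sketched as a plain function (stand-in bodies, not the real `PlotRunner` implementation; `saved` records spec names in place of written paths):

```python
saved = []  # (spec_name, fold_label) pairs, standing in for saved file paths


def _run_spec(spec, df, fold_label=None):
    """Stand-in: the real _run_spec resolves kwargs, calls spec.plot_fn,
    and hands the resulting Figure(s) to _save_figures / _save_png."""
    saved.append((spec.name, fold_label))


def run(result, registry, only=None, skip=None):
    """Illustrative version of PlotRunner.run's dispatch loop."""
    for group in registry:
        if only is not None and group.name not in only:
            continue
        if skip is not None and group.name in skip:
            continue
        if group.scope == "per_fold":
            for fold in result.hindcast_slices:
                df = group.prepare_data(result, fold)  # one prep call per group+fold
                for spec in group.plots:
                    _run_spec(spec, df, fold.fold_label)
        else:  # cross_fold
            df = group.prepare_data(result)            # one prep call per group
            for spec in group.plots:
                _run_spec(spec, df)
```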