Source: Streamlit Dashboard (`app/`)

Overview

The commodity-hindcast dashboard is a Streamlit web application that reads saved hindcast run artefacts directly from disk (no API layer) and presents five interactive chart sections for comparing Treefera seasonal-yield forecasts against USDA NASS actuals and WASDE in-season estimates.

The app is launched by hand from a developer machine:

uv run streamlit run market_insights_models/src/commodity_hindcast/app/app.py

The app is commodity-agnostic: it discovers available runs by scanning the runs directory, populates the sidebar selectors, and adapts every chart, metric card, and table to the selected commodity's phenology calendar. Four commodities are supported: corn, wheat, soybeans, and cotton.

Key design facts:

Reads RunDir artefacts (delivery/Treefera_<commodity>_ADM0_Hindcast_*.csv) directly — no database or API intermediary.
PR #363 fixed three startup bugs (wrong _CONFIGS_DIR depth, renamed WasdeLoader API, missing data/ prefix in WASDE path).
PR #340 added window-aware MAPE cards that restrict the WASDE comparison to folds where WASDE actively forecasts (May+ for all four commodities), plus a configurable truth source selector and generic vintage-subset decision-point table.

Modules

`app.py` (~905 lines) — main entrypoint

Entry function: the module-level code is the Streamlit entrypoint; there is no main() guard.

Startup sequence:

st.set_page_config then inject_font_styles() (app.py:53–59).
_discover_available_commodities() (app.py:64–73) calls list_runs() and collects the set of commodity strings; returns [] on FileNotFoundError so the sidebar shows a graceful error rather than a traceback.
Sidebar: commodity selectbox (wheat default), truth-source selectbox, model filter text input + model selectbox (app.py:94–173).
_load_predictions(commodity) (app.py:79–89, @st.cache_data(ttl=60)) calls load_all_deliveries; returns None on FileNotFoundError.
load_results(selected_model, commodity) (app.py:181–211, @st.cache_data(ttl=60)) slices the concatenated frame to one model, strips retired folds, renames columns to canonical names (mean → treefera_forecast, wasde_in_season → wasde_estimate), looks up wasde_jan from load_wasde_jan_actuals, then calls score_hindcast.

Truth-source recomputation (app.py:237–253): score_hindcast scores against NASS by default. When the user selects a different truth source (e.g. wasde_jan), treefera_error and wasde_error are recomputed in-place against results["truth"] so every downstream metric and chart reflects the chosen benchmark.
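The recomputation step can be pictured with a small standard-library sketch. It is not the code in app.py: the real app works on a DataFrame in place, while this version returns copies of plain dicts, and the choice of signed errors (forecast minus truth) is an assumption. The helper name `rescore` is hypothetical; the column names are the ones listed above.

```python
def rescore(rows, truth_col):
    """Return copies of `rows` with errors recomputed against `truth_col`.

    Hypothetical helper: assumes signed errors (forecast minus truth).
    """
    rescored = []
    for row in rows:
        truth = row[truth_col]
        updated = dict(row)
        updated["truth"] = truth
        # Positive error means the forecast overshot the chosen benchmark.
        updated["treefera_error"] = row["treefera_forecast"] - truth
        updated["wasde_error"] = row["wasde_estimate"] - truth
        rescored.append(updated)
    return rescored


example = [{
    "treefera_forecast": 180.0,
    "wasde_estimate": 176.0,
    "nass_actual": 177.0,
    "wasde_jan": 178.0,
}]
against_wasde_jan = rescore(example, "wasde_jan")
```

Switching `truth_col` between `nass_actual` and `wasde_jan` changes both error columns together, which is why every downstream metric follows the selector.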

WASDE-comparable fold filtering (app.py:289–320): only folds where WASDE actively publishes a new-crop forecast are included in MAPE headline cards and in WASDE-side bar charts. For all four commodities wasde_comparison_start_month = 5 (May). Fold positions are compared via schedule.fold_order rather than by calendar month, so cross-year crops (winter wheat) correctly exclude the dormancy folds that precede May in season order, including the November fold, whose calendar month is ≥ 5.
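The difference between a position-based filter and a naive calendar-month filter is easiest to see on a toy cross-year season. The sketch below is illustrative only: `wasde_comparable_folds` and the fold dates are hypothetical, and the real app compares positions in schedule.fold_order rather than working from a list of dated folds.

```python
import datetime as dt


def wasde_comparable_folds(fold_dates, season_year, start_month=5):
    """Return fold labels from the first fold on/after the WASDE start month.

    `fold_dates` is a season-ordered list of (label, date) pairs for one
    season; `season_year` is the harvest year. Hypothetical helper.
    """
    threshold = dt.date(season_year, start_month, 1)
    labels = [label for label, _ in fold_dates]
    for position, (_, fold_date) in enumerate(fold_dates):
        if fold_date >= threshold:
            # Everything from this position onwards is WASDE-comparable.
            return labels[position:]
    return []


# Made-up winter-wheat season harvested in 2024. The November fold sits in
# the previous calendar year even though its month number (11) is >= 5.
WHEAT_2024 = [
    ("11-01", dt.date(2023, 11, 1)),
    ("03-01", dt.date(2024, 3, 1)),
    ("05-01", dt.date(2024, 5, 1)),
    ("06-15", dt.date(2024, 6, 15)),
]

comparable = wasde_comparable_folds(WHEAT_2024, season_year=2024)
naive = [label for label, d in WHEAT_2024 if d.month >= 5]
```

Here `naive` wrongly keeps the November fold, while the position-based result starts at the May fold.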

Pre/post-survey split (app.py:323–333): folds with cutoff_doy ≤ wasde_survey_doy are "pre-survey" (before the USDA field survey). The dashboard renders two additional metric cards — Pre-Survey Edge (% error reduction vs WASDE) and Pre-Survey Wins (forecast windows where Treefera is closer to truth) — only when pre-survey folds exist in the data.
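A minimal sketch of the split and the two extra cards follows. The `cutoff_doy` comparison comes from the description above; the exact formulas are assumptions, in particular that "edge" is the relative reduction in mean absolute error versus WASDE and that a "win" is a strictly smaller absolute error. Function names are hypothetical.

```python
def split_by_survey(rows, wasde_survey_doy):
    """Partition rows into (pre_survey, post_survey) by cutoff_doy."""
    pre = [r for r in rows if r["cutoff_doy"] <= wasde_survey_doy]
    post = [r for r in rows if r["cutoff_doy"] > wasde_survey_doy]
    return pre, post


def pre_survey_summary(pre_rows):
    """Return (edge_pct, wins, n_windows), or None when there are no rows.

    Assumed definitions: edge_pct = (WASDE MAE - Treefera MAE) / WASDE MAE.
    """
    if not pre_rows:
        return None  # the dashboard hides both cards in this case
    n = len(pre_rows)
    treefera_mae = sum(abs(r["treefera_error"]) for r in pre_rows) / n
    wasde_mae = sum(abs(r["wasde_error"]) for r in pre_rows) / n
    edge_pct = (wasde_mae - treefera_mae) / wasde_mae * 100.0 if wasde_mae else 0.0
    wins = sum(
        1 for r in pre_rows
        if abs(r["treefera_error"]) < abs(r["wasde_error"])
    )
    return edge_pct, wins, n


rows = [
    {"cutoff_doy": 200, "treefera_error": 2.0, "wasde_error": -4.0},
    {"cutoff_doy": 220, "treefera_error": -3.0, "wasde_error": 2.0},
    {"cutoff_doy": 230, "treefera_error": 1.0, "wasde_error": 1.5},
]
pre, post = split_by_survey(rows, wasde_survey_doy=224)
summary = pre_survey_summary(pre)
```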

Chart sections rendered (in order):

| Section | Builder | Description |
| --- | --- | --- |
| Performance Summary | `metric_card` | Treefera MAPE, WASDE MAPE, Pre-Survey Edge, Pre-Survey Wins |
| Seasonal Accuracy | `build_fold_breakdown_chart` | Per-fold MAE grouped bars |
| Forecast Edge | `build_improvement_heatmap` | Year × fold heatmap (% of yield) |
| Advantage Over WASDE | `build_information_advantage_chart` | Signed advantage per fold, mean line |
| Forecast Evolution | `build_forecast_evolution_chart` | Continuous timeline with stars |
| Direction Correctness | `build_direction_correctness_chart` | Pre→harvest trajectory check |

Expanders: Detailed Performance Comparison, Vintage Accuracy Table (Subset / Full tabs), Per-Year Accuracy Breakdown, Leave-One-Out Stability, Detailed Results.

Vintage Accuracy Table (app.py:527–632): derives eight "decision points" anchored to the commodity's growing season (pre-planting −30 d, planting, 20/40/60/80% through season, pre-harvest −14 d, harvest). Each anchor snaps to the nearest configured fold by absolute day distance using schedule.fold_to_init_date. Works correctly for cross-year crops by detecting gs_start_doy > gs_end_doy and offsetting the planting year by −1.
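The anchor derivation and the nearest-fold snap can be sketched with the standard library alone. This is an illustration under stated assumptions, not the app's code: percentage anchors are measured from planting to harvest, `season_year` is taken to be the harvest year, and the fold dates are passed in as a plain dict instead of being produced by schedule.fold_to_init_date. Both function names are hypothetical.

```python
import datetime as dt


def decision_anchors(gs_start_doy, gs_end_doy, season_year):
    """Return the eight anchor dates for one season, keyed by name."""
    # A start DOY later than the end DOY marks a cross-year crop, so
    # planting happened in the previous calendar year.
    plant_year = season_year - 1 if gs_start_doy > gs_end_doy else season_year
    planting = dt.date(plant_year, 1, 1) + dt.timedelta(days=gs_start_doy - 1)
    harvest = dt.date(season_year, 1, 1) + dt.timedelta(days=gs_end_doy - 1)
    season_days = (harvest - planting).days

    anchors = {
        "pre-planting": planting - dt.timedelta(days=30),
        "planting": planting,
    }
    for pct in (20, 40, 60, 80):
        offset = round(season_days * pct / 100)
        anchors[f"{pct}% of season"] = planting + dt.timedelta(days=offset)
    anchors["pre-harvest"] = harvest - dt.timedelta(days=14)
    anchors["harvest"] = harvest
    return anchors


def snap_to_fold(anchor, fold_dates):
    """Return the fold label whose date is nearest to `anchor` in days."""
    return min(fold_dates, key=lambda fold: abs((fold_dates[fold] - anchor).days))


same_year = decision_anchors(100, 280, 2023)
cross_year = decision_anchors(280, 180, 2024)
nearest = snap_to_fold(
    same_year["planting"],
    {"04-01": dt.date(2023, 4, 1), "05-01": dt.date(2023, 5, 1)},
)
```

Because each anchor snaps independently, two anchors can land on the same fold when folds are sparse; how the real table handles that case is not covered here.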

Leave-One-Out Stability (app.py:818–855): for each year, removes it from the WASDE-comparable rows and reports the change in overall MAE, labelling years as "hardest" (removing improves MAE) or "best" (removing raises MAE).
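A compact way to picture the leave-one-out calculation, assuming the MAE in question is Treefera's and that rows carry a `year` and a `treefera_error` field (the helper names are hypothetical):

```python
def leave_one_out_mae(rows):
    """Return {year: change in MAE when that year is removed}."""
    def mae(subset):
        return sum(abs(r["treefera_error"]) for r in subset) / len(subset)

    overall = mae(rows)
    deltas = {}
    for year in sorted({r["year"] for r in rows}):
        remaining = [r for r in rows if r["year"] != year]
        if remaining:  # skip the degenerate single-year case
            deltas[year] = mae(remaining) - overall
    return deltas


def stability_label(delta):
    """Negative delta: removing the year improves MAE, so it was 'hardest'."""
    if delta < 0:
        return "hardest"
    if delta > 0:
        return "best"
    return "neutral"


loo_rows = [
    {"year": 2020, "treefera_error": 1.0},
    {"year": 2021, "treefera_error": -5.0},
    {"year": 2022, "treefera_error": 3.0},
]
deltas = leave_one_out_mae(loo_rows)
```

The "neutral" label for a zero change is an addition for completeness; the source only names "hardest" and "best".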

`run_loader.py` (~155 lines) — run discovery

RunDescriptor (run_loader.py:56–64): frozen dataclass holding commodity, timestamp, model, csv_path (Path), and run_dir (Path).

list_runs(runs_dir) (run_loader.py:102–137): scans HINDCAST_RUNS_DIR for directories matching the regex ^(\d{8}_\d{6})_([a-z]+)_yield_prediction$, locates the ADM0 delivery CSV in <run_dir>/delivery/, reads the model name from the first row, and returns a list of RunDescriptor sorted newest-first by timestamp. Directories with no matching CSV are skipped with a warning.
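The directory-name pattern quoted above can be exercised on its own. The sketch below uses only that regex; `parse_run_dir` and `sort_newest_first` are hypothetical helpers, and the real list_runs additionally locates the delivery CSV and reads the model name.

```python
import re
from datetime import datetime

RUN_DIR_RE = re.compile(r"^(\d{8}_\d{6})_([a-z]+)_yield_prediction$")


def parse_run_dir(name):
    """Return (timestamp, commodity) for a run directory name, else None."""
    match = RUN_DIR_RE.match(name)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    return timestamp, match.group(2)


def sort_newest_first(names):
    """Drop non-matching names and order the rest newest-first."""
    parsed = [(parse_run_dir(name), name) for name in names]
    valid = [(info[0], name) for info, name in parsed if info is not None]
    return [name for _, name in sorted(valid, reverse=True)]


ordered = sort_newest_first([
    "20260101_090000_wheat_yield_prediction",
    "notes",
    "20260422_170936_corn_yield_prediction",
])
```

Note that the `[a-z]+` group means a commodity name containing digits, underscores, or uppercase letters would not match.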

load_all_deliveries(commodity) (run_loader.py:140–155): concatenates every run for a commodity into one DataFrame, overriding the model column with the run directory name (e.g. 20260422_170936_corn_yield_prediction) so individual runs populate the "Select Model" picker.

load_delivery_csv(csv_path, commodity) (run_loader.py:26–48): parses init_date as datetime, optionally filters by commodity (case-insensitive), then derives a fold column via FoldSchedule.init_date_to_fold. The fold derivation uses season-DOY arithmetic rather than strftime("%m-%d") to avoid off-by-one drift on seasons that cross 29 February.

HINDCAST_RUNS_DIR (_dashboard_config.py:42–49): resolved at module import from $HINDCAST_RUNS_DIR (override) or require_input_data_dir() / "runs". Uses AnyPath so S3 paths are supported.
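The override-then-default resolution might look roughly like this. The sketch substitutes `pathlib.Path` for AnyPath, so unlike the real module it does not handle S3 URIs, and it takes the input data directory as an argument instead of calling require_input_data_dir(). The function name is hypothetical.

```python
import os
from pathlib import Path


def resolve_runs_dir(input_data_dir, environ=None):
    """Return the runs directory: env override first, else <input>/runs."""
    env = os.environ if environ is None else environ
    override = env.get("HINDCAST_RUNS_DIR")
    if override:
        return Path(override)
    return Path(input_data_dir) / "runs"


default_dir = resolve_runs_dir("/data/input", environ={})
override_dir = resolve_runs_dir(
    "/data/input", environ={"HINDCAST_RUNS_DIR": "/tmp/alt_runs"}
)
```

Because the real value is resolved at module import, changing the environment variable after the app has started would presumably require a restart to take effect.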

`_dashboard_config.py` (~319 lines) — display configuration

COMMODITY_CONFIG (_dashboard_config.py:127–192): dict keyed by lowercase commodity name. Per-commodity fields:

| Field | Purpose |
| --- | --- |
| `display_name` | Human-readable name shown in UI titles |
| `region_label` | Spatial coverage label (e.g. "US Corn Belt") |
| `wasde_comparison_start_month` | First month WASDE publishes a new-crop forecast (May = 5 for all four) |
| `wasde_survey_doy` | DOY of the first NASS field survey (224 for corn/soybeans/cotton, 152 for wheat) |
| `first_survey_month` | Calendar month of first survey |
| `retired_folds` | Fold names excluded from comparison tables (corn: `{"03-01", "01-10"}`) |
| `yield_range` | `(lo, hi)` sourced from `CommodityConfig.yield_range` (canonical YAML, not duplicated) |
| `phenology_labels` | `{"MM-DD": "Mon — Stage"}` used for chart axis labels |
| `gs_start_doy`, `gs_end_doy` | Growing season bounds; `gs_end_doy < gs_start_doy` signals a cross-year crop |
| `wasde_final_at_reveal` | `True` for wheat: pins the WASDE-Jan gold star at the season reveal date, not mid-January of year+1 |

FoldSchedule (_dashboard_config.py:198–251): frozen dataclass. Provides:

fold_to_init_date(fold, season_year) -> str — ISO date for (fold, year), delegates to CommodityConfig.to_date(sdoy, season_year) for correct cross-year arithmetic.
init_date_to_fold(init_date, season_year) -> str — inverse lookup via season-DOY; raises KeyError if the date does not correspond to any configured fold.

build_fold_schedule(commodity) (_dashboard_config.py:261–298): constructs a FoldSchedule from the commodity's CommodityConfig.hindcast_init_season_doys. Fold labels are "MM-DD" derived from a fixed reference year (2024). Duplicate MM-DDs (rare) are collapsed to the first occurrence in season order.
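The label derivation and duplicate collapsing can be sketched as follows. The definition of a season-DOY is not given here, so this example assumes that season-DOY 1 falls on a known calendar day-of-year of the reference year; the real code delegates that arithmetic to CommodityConfig. `fold_labels` and `season_start_doy` are hypothetical names.

```python
import datetime as dt

REFERENCE_YEAR = 2024  # fixed reference year named in the description above


def fold_labels(season_doys, season_start_doy):
    """Return de-duplicated 'MM-DD' labels in season order.

    Assumes season-DOY 1 is calendar day `season_start_doy` of the
    reference year (an assumption made for this sketch only).
    """
    start = dt.date(REFERENCE_YEAR, 1, 1) + dt.timedelta(days=season_start_doy - 1)
    labels = []
    for sdoy in season_doys:
        label = (start + dt.timedelta(days=sdoy - 1)).strftime("%m-%d")
        if label not in labels:  # keep only the first occurrence
            labels.append(label)
    return labels


spring_crop = fold_labels([1, 31, 61], season_start_doy=92)
wrapped = fold_labels([1, 366], season_start_doy=92)
```

In the second call the two season-DOYs are exactly one year apart, so they map to the same "MM-DD" and collapse to a single label.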

_load_commodity_config(commodity) (_dashboard_config.py:64–75): @functools.lru_cache(maxsize=8). Parses only the commodity subtree of the experiment YAML, bypassing ExperimentConfig's BaseSettings machinery; the dashboard needs only calendar and metadata fields.

inject_font_styles() (_dashboard_config.py:304–319): injects Inter font CSS via st.markdown. Self-contained replacement for the previously required streamlit_applets_common sibling-repo dependency (removed in PR #363).

`charts.py` (~446 lines) — accuracy and comparison charts

All functions are pure: they accept a results-shaped DataFrame and keyword arguments, and return a go.Figure.

build_fold_breakdown_chart (charts.py:196–313): grouped bar chart of per-fold MAE (Treefera vs WASDE). Adapts to fold count: >12 folds → date x-axis with monthly ticks; ≤12 folds → categorical x-axis with phenology labels. WASDE bars are blanked (set to NaN) for folds outside wasde_comparable_folds.

build_improvement_heatmap (charts.py:316–446): year × season-stage heatmap. Cell value is (wasde_error − treefera_error) / truth × 100 (% of yield). Green = Treefera closer; red = WASDE closer. Uses explicit categorical axes to prevent Plotly from interpreting year labels as dates. WASDE milestone vline positioned by _nearest_fold_index for categorical axes, or by _WASDE_VLINE_OFFSET-adjusted index for date axes.
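The cell formula can be checked by hand with a small sketch. It takes absolute values of both errors before subtracting, which is an assumption: if the error columns are already stored as magnitudes, the `abs` calls change nothing. The helper names and the nested-dict grid are hypothetical stand-ins for the DataFrame pivot the chart would use.

```python
def edge_pct_of_yield(wasde_error, treefera_error, truth):
    """Positive when Treefera is closer to truth, as a % of the true yield."""
    return (abs(wasde_error) - abs(treefera_error)) / truth * 100.0


def heatmap_grid(rows):
    """Return {year: {fold: cell value}} from per-fold result rows."""
    grid = {}
    for r in rows:
        cell = edge_pct_of_yield(r["wasde_error"], r["treefera_error"], r["truth"])
        # Year keys are stored as strings so a plotting layer treats the
        # axis as categorical, never as dates.
        grid.setdefault(str(r["year"]), {})[r["fold"]] = cell
    return grid


grid = heatmap_grid([
    {"year": 2022, "fold": "07-01", "wasde_error": 4.0, "treefera_error": -2.0, "truth": 200.0},
    {"year": 2022, "fold": "08-01", "wasde_error": 1.0, "treefera_error": 3.0, "truth": 200.0},
])
```

A positive cell would be drawn green and a negative one red under the colour convention described above.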

build_information_advantage_chart (charts.py:32–193): signed advantage (WASDE |error| − Treefera |error|) per fold. Individual year traces in light grey behind a thick mean line. Green/red fill between mean and y=0, with interpolated zero-crossings computed by _split_fill.

`charts_evolution.py` (~409 lines) — evolution and direction charts

build_forecast_evolution_chart (charts_evolution.py:29–242): continuous multi-year timeline. Per-year traces:

Treefera forecast — solid green line.
WASDE in-season estimate — grey dotted diamonds.
WASDE-Jan final — gold star. For same-year crops, a trailing segment extends to mid-January of year+1. For cross-year crops (wasde_final_at_reveal = True, e.g. wheat), the star is pinned at the season's last fold date.
Truth — blue star at the last fold, drawn from the user-selected truth_col.

A "WASDE starts" dotted vline and annotation appear at the first non-NaN WASDE point when fold count >12.

build_direction_correctness_chart (charts_evolution.py:245–409): checks whether Treefera's forecast moves towards truth over the season. For each year: dashed black line from WASDE planting value to truth (the "required direction"), solid green Treefera trajectory, truth star. A year is "correct" when |t_last − truth| < |t_first − truth|. Returns (fig, n_correct, n_years).
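The correctness rule reduces to a one-line comparison per year. The sketch below implements only that rule and the two counts; it does not build the figure, and the helper names and input shape are hypothetical.

```python
def direction_correct(forecasts, truth):
    """True when the last forecast is closer to truth than the first.

    `forecasts` is the season-ordered list of Treefera forecasts for a year.
    """
    return abs(forecasts[-1] - truth) < abs(forecasts[0] - truth)


def count_correct(by_year):
    """Return (n_correct, n_years) for {year: (forecasts, truth)}."""
    n_correct = sum(
        1 for forecasts, truth in by_year.values()
        if direction_correct(forecasts, truth)
    )
    return n_correct, len(by_year)


counts = count_correct({
    2021: ([170.0, 174.0, 176.0], 177.0),  # moves towards truth
    2022: ([180.0, 182.0, 185.0], 178.0),  # drifts away from truth
})
```

The strict inequality means a year whose first and last forecasts are equally far from truth counts as incorrect.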

`app_utils.py` (~176 lines) — UI helpers and re-exports

metric_card(label, value, status, risk_level) (app_utils.py:123–144): renders a self-contained HTML card with a large numeric value and a traffic-light dot (green/yellow/red). No external CSS dependency.

text_box(text) (app_utils.py:147–153): rounded grey panel for inline HTML narrative; used for pre/post-survey interpretation text.

apply_plotly_fonts(fig) (app_utils.py:159–170): applies Inter font at consistent sizes to all Plotly figure axes and legend.

MODEL_INFO (app_utils.py:36–114): static dict keyed by model name (matching the model column in delivery CSVs). Each entry has label, description, and change_from_b fields rendered in the info box above the metric cards. Covers Model A through Model B+ v6 and the 2F delivery variant.

Re-exports all chart builders from charts.py and charts_evolution.py, and constants/helpers from _chart_helpers.py, as a single import surface for app.py.

`_chart_helpers.py` (~272 lines) — shared chart utilities

wasde_milestones_doy(commodity) (_chart_helpers.py:62–65): returns a list of (doy, label, y_paper) tuples for WASDE milestone vlines. Currently one vline per commodity at wasde_survey_doy − _WASDE_VLINE_OFFSET. _WASDE_VLINE_OFFSET = 4 (roughly half the width of a weekly bar) places the line between the last pre-survey bar and the first post-survey bar.

fold_labels_for_data(folds, schedule, commodity) (_chart_helpers.py:123–141): returns (fold→label map, ordered labels, ordered fold names). For cross-year crops (wheat), delegates to _display_fold_order which rotates the fold list so the active growing season (March onwards) comes first in chart display order, with dormancy/final folds at the right edge.

fold_to_date(fold, year, schedule) (_chart_helpers.py:147–164): converts a fold name to dt.datetime for chart x-axes via FoldSchedule.fold_to_init_date. Returns None for unknown folds.

_split_fill(xs, ys) (_chart_helpers.py:240–272): splits a line series into positive and negative segments with interpolated zero-crossings, enabling Plotly's fill="tonexty" to produce correct green/red shading in the information-advantage chart.
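One way to implement the zero-crossing split is linear interpolation between consecutive points of opposite sign. The sketch below is a generic version of that technique with numeric x values, not the body of _split_fill: the real helper may return a different shape and has to cope with date or categorical x-axes.

```python
def split_fill(xs, ys):
    """Return (x, positive_y, negative_y) with zero-crossings inserted.

    positive_y clamps the series at zero from below and negative_y from
    above, so each can be filled towards y=0 in its own colour.
    """
    out_x, out_y = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i > 0:
            x0, y0 = xs[i - 1], ys[i - 1]
            if y0 * y < 0:  # sign change between consecutive points
                t = y0 / (y0 - y)  # fraction of the segment where y hits 0
                out_x.append(x0 + t * (x - x0))
                out_y.append(0.0)
        out_x.append(x)
        out_y.append(y)
    positive = [max(v, 0.0) for v in out_y]
    negative = [min(v, 0.0) for v in out_y]
    return out_x, positive, negative


fill_x, fill_pos, fill_neg = split_fill([0.0, 1.0, 2.0], [1.0, -1.0, -1.0])
```

Without the inserted crossing point, the green and red fills would overlap in a wedge between the last positive and first negative sample.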

_monthly_tick_config(data_folds, label_order, schedule) (_chart_helpers.py:209–237): computes (tick_vals, tick_text) for weekly-fold heatmaps, placing one month abbreviation label at the central fold of each month.
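The "central fold of each month" rule can be sketched by grouping "MM-DD" labels by month and picking the middle entry. The signature is simplified to a single list of season-ordered labels, and the tie-break for even-sized groups (lower middle) is an assumption; `monthly_ticks` is a hypothetical name.

```python
import calendar


def monthly_ticks(fold_labels):
    """Return (tick_vals, tick_text) with one tick per month.

    `fold_labels` are season-ordered 'MM-DD' strings.
    """
    by_month = {}
    for label in fold_labels:
        by_month.setdefault(label[:2], []).append(label)

    tick_vals, tick_text = [], []
    for month, labels in by_month.items():
        # Lower-middle element, so even-sized groups lean early in the month.
        tick_vals.append(labels[(len(labels) - 1) // 2])
        tick_text.append(calendar.month_abbr[int(month)])
    return tick_vals, tick_text


tick_vals, tick_text = monthly_ticks(
    ["04-01", "04-08", "04-15", "05-01", "05-08"]
)
```

Dict insertion order keeps the months in season order, so a cross-year season would list its autumn months before its spring months.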

`_eval_shim.py` (~99 lines) — dashboard-side evaluation adapter

load_wasde_jan_actuals(commodity) (_eval_shim.py:49–72): loads the WASDE yield CSV via WasdeLoader(WasdeRefSpec(...)), filters to releases strictly before Feb 1 of harvest_year + 1, converts from kg/ha to bu/ac via kg_ha_to_bu_acre_series, and returns a pd.Series indexed by harvest year. Path: require_input_data_dir() / "data" / "wasde" / "wasde_{commodity}_us_yield.csv" (PR #363 added the data/ prefix).
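The release cutoff and the unit conversion can be illustrated without the pipeline's loader. In the sketch below the bushel weights are the standard US values (56 lb for corn, 60 lb for soybeans and wheat), cotton is left out, and picking the latest eligible release is an assumption about what happens after the filter. Neither function is the project's kg_ha_to_bu_acre_series or WasdeLoader.

```python
import datetime as dt

LB_PER_KG = 2.2046226218
ACRES_PER_HECTARE = 2.4710538147
# Standard US bushel weights in pounds; cotton is intentionally omitted.
BUSHEL_LB = {"corn": 56.0, "soybeans": 60.0, "wheat": 60.0}


def kg_ha_to_bu_acre(kg_ha, commodity):
    """Convert a yield from kg/ha to bushels per acre."""
    return kg_ha * LB_PER_KG / BUSHEL_LB[commodity] / ACRES_PER_HECTARE


def wasde_jan_value(releases, harvest_year):
    """Return the latest yield released strictly before Feb 1 of year+1.

    `releases` is a list of (release_date, yield_kg_ha) pairs; returns
    None when no release qualifies.
    """
    cutoff = dt.date(harvest_year + 1, 2, 1)
    eligible = [(d, y) for d, y in releases if d < cutoff]
    if not eligible:
        return None
    return max(eligible)[1]


releases = [
    (dt.date(2023, 11, 9), 10900.0),
    (dt.date(2024, 1, 12), 11100.0),
    (dt.date(2024, 2, 8), 11150.0),  # after the cutoff, so ignored
]
jan_kg_ha = wasde_jan_value(releases, harvest_year=2023)
jan_bu_ac = kg_ha_to_bu_acre(jan_kg_ha, "corn")
```

As a sanity check on the conversion, one bushel of corn per acre is about 62.77 kg/ha, so 6277 kg/ha is close to 100 bu/ac.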

score_hindcast(df) (_eval_shim.py:75–99): adds treefera_error, wasde_error, and improvement_pct columns (all vs nass_actual). The caller may overwrite treefera_error/wasde_error when a different truth source is selected; improvement_pct remains fixed against NASS as a stable reference in the Detailed Results table.

Cross-references

orchestration — CLI that writes the RunDir artefacts this dashboard reads
regression — models whose predictions populate delivery CSVs

Relationships

app.py
  ├── run_loader.py          (discovers RunDescriptors, loads delivery CSVs)
  ├── _dashboard_config.py   (COMMODITY_CONFIG, FoldSchedule, build_fold_schedule)
  ├── _eval_shim.py          (load_wasde_jan_actuals, score_hindcast)
  └── app_utils.py           (metric_card, text_box, apply_plotly_fonts, MODEL_INFO)
        ├── charts.py        (build_fold_breakdown_chart, build_improvement_heatmap,
        │                     build_information_advantage_chart)
        └── charts_evolution.py  (build_forecast_evolution_chart,
                                  build_direction_correctness_chart)
              └── _chart_helpers.py  (fold_to_date, wasde_milestones_doy,
                                      fold_labels_for_data, _split_fill, …)

External pipeline dependencies (read-only):

commodity_hindcast.config.CommodityConfig — yield range, init-date season-DOYs, bushel weight.
commodity_hindcast.lib.reference_data.wasde.WasdeLoader — WASDE CSV reader (ported in PR #363 from a now-deleted free function).
commodity_hindcast.lib.unit_utils.kg_ha_to_bu_acre_series — unit conversion.
$INPUT_DATA_DIR env var (via require_input_data_dir()) — locates WASDE CSVs and the runs directory.
$HINDCAST_RUNS_DIR env var (optional override) — points at an alternative runs tree.
