Pipeline: Dashboard

Purpose

The commodity-hindcast dashboard is a Streamlit web application that reads saved hindcast run artefacts directly from disk (no API layer, no database) and renders six interactive chart sections comparing Treefera seasonal-yield forecasts against USDA NASS actuals and WASDE/CONAB in-season estimates. It supports four commodities (corn, wheat, soybeans, cotton) and adapts all charts, metric cards, and fold labels to the selected commodity's phenology calendar. It is a read-only consumer of the delivery pipeline outputs.

Launch command:

uv run streamlit run market_insights_models/src/commodity_hindcast/app/app.py

Inputs

| Artefact | Source | Reader |
| --- | --- | --- |
| `delivery/Treefera_{commodity}_ADM0_Hindcast_*.csv` | Each run_dir | `run_loader.load_all_deliveries` |
| `config_resolved.yaml` | Each run_dir | `_dashboard_config._load_commodity_config` (only `commodity` subtree) |
| WASDE CSV | `$INPUT_DATA_DIR/data/wasde/wasde_{commodity}_us_yield.csv` | `_eval_shim.load_wasde_jan_actuals` |
| `$HINDCAST_RUNS_DIR` or `$INPUT_DATA_DIR/runs/` | env var | `_dashboard_config.py:42–49` |

No write paths. The dashboard never mutates run_dir artefacts.
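
The path resolution implied by the table can be sketched as follows. This is a minimal illustration, not the code at `_dashboard_config.py:42–49`: the function names are hypothetical, and both the precedence of `$HINDCAST_RUNS_DIR` over `$INPUT_DATA_DIR/runs/` and the error raised when neither is set are assumptions.

```python
import os
from pathlib import Path


def resolve_runs_dir(env=None):
    """Return the runs root (hypothetical helper).

    Assumption: $HINDCAST_RUNS_DIR wins when set, otherwise
    $INPUT_DATA_DIR/runs is used.
    """
    env = os.environ if env is None else env
    explicit = env.get("HINDCAST_RUNS_DIR")
    if explicit:
        return Path(explicit)
    input_root = env.get("INPUT_DATA_DIR")
    if input_root:
        return Path(input_root) / "runs"
    # Assumption: the real module may handle this case differently.
    raise RuntimeError("Set HINDCAST_RUNS_DIR or INPUT_DATA_DIR")


def wasde_csv_path(commodity, env=None):
    """Build the WASDE CSV path listed in the Inputs table."""
    env = os.environ if env is None else env
    root = Path(env["INPUT_DATA_DIR"])
    return root / "data" / "wasde" / f"wasde_{commodity}_us_yield.csv"
```

Passing `env` explicitly keeps the helpers testable without touching the real environment.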

Key Modules

| Module | Lines | Responsibility |
| --- | --- | --- |
| `app.py` | ~905 | Main Streamlit entrypoint; sidebar, chart dispatch, metric cards |
| `run_loader.py` | ~155 | Run discovery (`list_runs`), CSV loading, `RunDescriptor` |
| `_dashboard_config.py` | ~319 | `COMMODITY_CONFIG`, `FoldSchedule`, font injection |
| `_eval_shim.py` | ~99 | `load_wasde_jan_actuals`, `score_hindcast` |
| `charts.py` | ~446 | Accuracy, heatmap, information-advantage Plotly figures |
| `charts_evolution.py` | ~409 | Forecast evolution timeline, direction-correctness chart |
| `app_utils.py` | ~176 | `metric_card`, `text_box`, `apply_plotly_fonts`, `MODEL_INFO` |
| `_chart_helpers.py` | ~272 | Shared fold-label, tick, fill utilities |

Step-by-step

1. Startup

st.set_page_config + inject_font_styles() (app.py:53–59): the Inter font is injected via inline CSS; there is no external streamlit_applets_common dependency (removed in PR #363).
_discover_available_commodities() (app.py:64–73): calls list_runs(HINDCAST_RUNS_DIR) and collects the commodity strings. It returns [] on FileNotFoundError so the sidebar shows a graceful error rather than a traceback.
The sidebar is then rendered: commodity selectbox (wheat by default), truth-source selectbox, and a model filter text input plus model selectbox.
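
The graceful fallback in the discovery step might look roughly like the sketch below. `list_run_commodities` is a hypothetical stand-in for `list_runs`, and returning a sorted, de-duplicated list is an assumption; only the "return [] on FileNotFoundError" behaviour comes from the description above.

```python
import os


def list_run_commodities(runs_dir):
    """Hypothetical stand-in for list_runs: one commodity string per run directory."""
    # os.listdir raises FileNotFoundError when runs_dir does not exist.
    names = os.listdir(runs_dir)
    # Directory names look like <YYYYMMDD>_<HHMMSS>_<commodity>_yield_prediction.
    return [name.split("_")[2] for name in names if name.endswith("_yield_prediction")]


def discover_available_commodities(runs_dir):
    try:
        commodities = list_run_commodities(runs_dir)
    except FileNotFoundError:
        # The caller can show a sidebar error instead of a traceback.
        return []
    # Assumption: duplicates are collapsed and the result is sorted.
    return sorted(set(commodities))
```

An empty list lets the sidebar code branch on `if not commodities:` and render an error message there.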

2. Run discovery: `run_loader.py`

RunDescriptor (run_loader.py:56–64): frozen dataclass holding commodity, timestamp, model, csv_path (Path), and run_dir (Path).

list_runs(runs_dir) (run_loader.py:102–137) scans HINDCAST_RUNS_DIR for directories matching ^(\d{8}_\d{6})_([a-z]+)_yield_prediction$, locates the ADM0 delivery CSV in <run_dir>/delivery/, reads the model name from the first row, and returns list[RunDescriptor] sorted newest-first by timestamp.
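
As an illustration of the directory matching and ordering, the sketch below applies the same regular expression to a list of names. `RunStub` is a hypothetical, trimmed-down version of `RunDescriptor` (no `model` or `csv_path`), and the CSV lookup is left out.

```python
import re
from dataclasses import dataclass
from pathlib import Path

RUN_DIR_RE = re.compile(r"^(\d{8}_\d{6})_([a-z]+)_yield_prediction$")


@dataclass(frozen=True)
class RunStub:
    """Hypothetical subset of RunDescriptor used only for this example."""
    commodity: str
    timestamp: str
    run_dir: Path


def parse_run_dirs(names, root=Path(".")):
    runs = []
    for name in names:
        match = RUN_DIR_RE.match(name)
        if match is None:
            continue  # not a hindcast run directory
        timestamp, commodity = match.groups()
        runs.append(RunStub(commodity, timestamp, root / name))
    # YYYYMMDD_HHMMSS strings sort chronologically, so reverse=True is newest-first.
    return sorted(runs, key=lambda run: run.timestamp, reverse=True)
```

Because the pattern only accepts lowercase letters for the commodity, a directory such as `20240101_090000_Corn_yield_prediction` is skipped.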

load_all_deliveries(commodity) (run_loader.py:140–155) concatenates every run for a commodity into one DataFrame, overriding the model column with the run directory name so individual runs populate the "Select Model" picker.
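
The model-column override can be illustrated with the standard library alone. This sketch is not the pandas-based loader: it takes CSV text keyed by run directory name instead of reading files, and the column names other than `model` and `init_date` are hypothetical.

```python
import csv
import io


def concat_deliveries(runs):
    """Concatenate rows from several runs.

    runs: mapping of run directory name -> CSV text (a stand-in for the
    ADM0 delivery files on disk).
    """
    combined = []
    for run_name, csv_text in runs.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            # Override the model column so each run is selectable on its own.
            row["model"] = run_name
            combined.append(row)
    return combined
```

After the override, two runs of the same underlying model remain distinguishable in the picker because their directory names differ.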

load_delivery_csv(csv_path, commodity) (run_loader.py:26–48) parses init_date as datetime and derives a fold column via FoldSchedule.init_date_to_fold. The fold derivation uses season-DOY arithmetic, not strftime, to avoid off-by-one drift on seasons that cross 29 February.

3. `COMMODITY_CONFIG` and `FoldSchedule`

COMMODITY_CONFIG (_dashboard_config.py:127–192) is a dict keyed by lowercase commodity name. Per-commodity fields include display_name, region_label, wasde_comparison_start_month (May = 5 for all four), wasde_survey_doy, retired_folds, phenology_labels, gs_start_doy, gs_end_doy, and wasde_final_at_reveal. The yield_range field is sourced from the canonical YAML CommodityConfig.yield_range and not duplicated.

FoldSchedule (_dashboard_config.py:198–251) provides fold_to_init_date and init_date_to_fold using CommodityConfig.to_date(sdoy, season_year) for correct cross-year arithmetic. build_fold_schedule(commodity) constructs a FoldSchedule from CommodityConfig.hindcast_init_season_doys, labelling folds as "MM-DD" derived from a fixed reference year (2024).
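
The sketch below shows one way the two mappings could avoid 29 February drift: fold labels are fixed from the 2024 reference year, and `init_date_to_fold` inverts the forward mapping rather than formatting the date. It is an illustration under stated assumptions, not the real class. `to_date` is a hypothetical stand-in for `CommodityConfig.to_date` that treats season DOY 1 as 1 January of the planting year, and the `season_year` argument on `init_date_to_fold` is also an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REFERENCE_YEAR = 2024  # fixed reference year used for the "MM-DD" labels


def to_date(sdoy, season_year, cross_year=False):
    """Hypothetical stand-in for CommodityConfig.to_date.

    Assumption: season DOY 1 is 1 January of the planting year, which is
    season_year - 1 for crops that cross a calendar-year boundary.
    """
    start_year = season_year - 1 if cross_year else season_year
    return date(start_year, 1, 1) + timedelta(days=sdoy - 1)


@dataclass(frozen=True)
class FoldScheduleSketch:
    sdoys: tuple
    cross_year: bool = False

    @property
    def fold_order(self):
        return [
            to_date(s, REFERENCE_YEAR, self.cross_year).strftime("%m-%d")
            for s in self.sdoys
        ]

    def fold_to_init_date(self, fold, season_year):
        sdoy = self.sdoys[self.fold_order.index(fold)]
        return to_date(sdoy, season_year, self.cross_year)

    def init_date_to_fold(self, init_date, season_year):
        # Invert the forward mapping instead of formatting init_date directly.
        for fold in self.fold_order:
            if self.fold_to_init_date(fold, season_year) == init_date:
                return fold
        raise KeyError(init_date)
```

With season DOYs `(305, 426, 487)` on a cross-year crop, the labels are `11-01`, `03-01`, and `05-01`. In the 2023 season the `03-01` fold falls on 2 March, so formatting that date with `strftime("%m-%d")` would produce a label that matches no fold, whereas the inverted mapping still returns `03-01`.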

_load_commodity_config(commodity) (_dashboard_config.py:64–75) is @functools.lru_cache(maxsize=8). It parses only the commodity subtree of the experiment YAML, bypassing ExperimentConfig's BaseSettings machinery — the dashboard needs only calendar and metadata fields, not the full pipeline configuration.
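
The caching behaviour can be sketched with `functools.lru_cache` alone. Here an in-memory dict stands in for the parsed experiment YAML, and all field values are hypothetical; the point is only that the subtree is extracted once per commodity and reused.

```python
import functools

# Stand-in for parsed experiment YAML files; values are hypothetical.
_EXPERIMENTS = {
    "corn": {
        "commodity": {"name": "corn", "hindcast_init_season_doys": [121, 152, 182]},
        "training": {"epochs": 50},
    },
    "wheat": {
        "commodity": {"name": "wheat", "hindcast_init_season_doys": [305, 426, 487]},
        "training": {"epochs": 50},
    },
}
parse_calls = {"n": 0}


@functools.lru_cache(maxsize=8)
def load_commodity_subtree(commodity):
    """Return only the commodity subtree, ignoring the rest of the config."""
    parse_calls["n"] += 1  # counts real (uncached) loads
    return _EXPERIMENTS[commodity]["commodity"]
```

One caveat of this pattern: `lru_cache` hands every caller the same object, so callers should treat the returned mapping as read-only.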

4. Window-aware MAPE (PR #340)

score_hindcast(df) (_eval_shim.py:75–99) adds treefera_error, wasde_error, and improvement_pct columns. treefera_error and wasde_error are recomputed against results["truth"] when the user selects a different truth source.
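
A row-level sketch of the truth-source switch is shown below. The input keys (`treefera`, `wasde`, `nass`, `wasde_jan`) and the exact formulas are assumptions: errors are taken as absolute differences, and `improvement_pct` as the WASDE-minus-Treefera error gap against NASS, expressed as a percentage of the NASS yield. The real `score_hindcast` operates on a DataFrame and may differ.

```python
def score_row(row, truth_key="nass"):
    """Add error columns to one forecast row (illustrative only)."""
    truth = row[truth_key]
    scored = dict(row)
    # These two follow the user-selected truth source.
    scored["treefera_error"] = abs(row["treefera"] - truth)
    scored["wasde_error"] = abs(row["wasde"] - truth)
    # Assumption: improvement is always measured against NASS.
    nass = row["nass"]
    gap = abs(row["wasde"] - nass) - abs(row["treefera"] - nass)
    scored["improvement_pct"] = 100.0 * gap / nass
    return scored
```

Switching `truth_key` changes the two error columns but leaves `improvement_pct` untouched, which mirrors the benchmark invariant stated later in this document.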

WASDE-comparable fold filtering (app.py:289–320): only folds where WASDE actively publishes a new-crop forecast are included in MAPE headline cards. For all four commodities wasde_comparison_start_month = 5 (May). Fold positions are compared via schedule.fold_order — cross-year crops (winter wheat) correctly exclude November/January dormancy folds even though their calendar months are ≥ 5.
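
To make the position-based filter concrete, the sketch below selects WASDE-comparable folds from a season-ordered label list and computes MAPE over them. The rollover detection is an assumed heuristic, not the logic at `app.py:289–320`, and the row keys are hypothetical.

```python
def wasde_comparable_folds(fold_order, start_month=5):
    """Return folds at or after the season's start_month, by position."""
    months = [int(label.split("-")[0]) for label in fold_order]
    # Calendar-year rollover: first fold whose month is lower than its
    # predecessor's. Single-year crops have none, so start from index 0.
    wrap = next((i for i in range(1, len(months)) if months[i] < months[i - 1]), 0)
    start = next(
        (i for i in range(wrap, len(months)) if months[i] >= start_month),
        len(months),
    )
    return fold_order[start:]


def mape(rows, forecast_key, folds):
    """Mean absolute percentage error over rows whose fold is in folds."""
    kept = [r for r in rows if r["fold"] in folds]
    return 100.0 * sum(abs(r[forecast_key] - r["truth"]) / r["truth"] for r in kept) / len(kept)
```

For a winter-wheat style order such as `11-01, 01-01, 03-01, 05-01, 06-01, 07-01`, a plain `month >= 5` test would keep the November fold; the position-based version keeps only `05-01` onwards.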

Pre/post-survey split (app.py:323–333): folds with cutoff_doy ≤ wasde_survey_doy are "pre-survey". Two additional metric cards appear — Pre-Survey Edge (% error reduction vs WASDE) and Pre-Survey Wins (forecast windows where Treefera is closer to truth) — only when pre-survey folds exist in the data.
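
The two pre-survey cards can be approximated as below. The split rule (`cutoff_doy ≤ wasde_survey_doy`) comes from the text; the formula for the edge, a percentage reduction in mean absolute error relative to WASDE, is an assumption, as are the returned key names.

```python
from statistics import mean


def pre_survey_cards(rows, wasde_survey_doy):
    """Compute Pre-Survey Edge and Pre-Survey Wins (illustrative formulas)."""
    pre = [r for r in rows if r["cutoff_doy"] <= wasde_survey_doy]
    if not pre:
        return None  # no pre-survey folds: the two extra cards are not shown
    treefera_mae = mean(r["treefera_error"] for r in pre)
    wasde_mae = mean(r["wasde_error"] for r in pre)
    # Assumption: edge = % error reduction vs WASDE; undefined if wasde_mae is 0.
    edge_pct = 100.0 * (wasde_mae - treefera_mae) / wasde_mae
    wins = sum(r["treefera_error"] < r["wasde_error"] for r in pre)
    return {"edge_pct": edge_pct, "wins": wins, "windows": len(pre)}
```

Returning `None` when no row qualifies gives the caller a single condition for hiding both cards.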

5. Chart sections

| Section | Builder | Description |
| --- | --- | --- |
| Performance Summary | `metric_card` (`app_utils.py:123`) | Treefera MAPE, WASDE MAPE, Pre-Survey Edge, Pre-Survey Wins |
| Seasonal Accuracy | `build_fold_breakdown_chart` (`charts.py:196`) | Per-fold MAE grouped bars |
| Forecast Edge | `build_improvement_heatmap` (`charts.py:316`) | Year × fold heatmap (% of yield); green = Treefera closer |
| Advantage Over WASDE | `build_information_advantage_chart` (`charts.py:32`) | Signed advantage per fold, mean line, green/red fill |
| Forecast Evolution | `build_forecast_evolution_chart` (`charts_evolution.py:29`) | Continuous timeline with truth stars and WASDE-Jan gold star |
| Direction Correctness | `build_direction_correctness_chart` (`charts_evolution.py:245`) | Pre→harvest trajectory: does Treefera move towards truth? |

Expanders (not in the main render path): Detailed Performance Comparison, Vintage Accuracy Table (Subset / Full tabs), Per-Year Accuracy Breakdown, Leave-One-Out Stability, Detailed Results.

Vintage Accuracy Table (app.py:527–632): derives eight decision-point anchors (pre-planting −30 d, planting, 20/40/60/80% through season, pre-harvest −14 d, harvest), each snapped to the nearest configured fold via schedule.fold_to_init_date. Cross-year crops are handled by detecting gs_start_doy > gs_end_doy and offsetting the planting year by −1.
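
The anchor derivation and snapping can be sketched as follows. This is an approximation under stated assumptions: `gs_start_doy` and `gs_end_doy` are interpreted as calendar days of year, percentages are rounded to whole days, and `snap_to_fold` takes a plain mapping of fold label to init date rather than a `FoldSchedule`.

```python
from datetime import date, timedelta


def decision_anchors(gs_start_doy, gs_end_doy, harvest_year):
    """Return the eight decision-point dates for one season."""
    cross_year = gs_start_doy > gs_end_doy
    # Cross-year crops are planted in the year before harvest.
    planting_year = harvest_year - 1 if cross_year else harvest_year
    planting = date(planting_year, 1, 1) + timedelta(days=gs_start_doy - 1)
    harvest = date(harvest_year, 1, 1) + timedelta(days=gs_end_doy - 1)
    season_days = (harvest - planting).days

    anchors = {"pre-planting": planting - timedelta(days=30), "planting": planting}
    for pct in (20, 40, 60, 80):
        anchors[f"{pct}%"] = planting + timedelta(days=round(season_days * pct / 100))
    anchors["pre-harvest"] = harvest - timedelta(days=14)
    anchors["harvest"] = harvest
    return anchors


def snap_to_fold(anchor, fold_dates):
    """Pick the fold whose init date is closest to the anchor date."""
    return min(fold_dates, key=lambda fold: abs((fold_dates[fold] - anchor).days))
```

Ties in `snap_to_fold` resolve to whichever fold appears first in the mapping, which may or may not match the dashboard's behaviour.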

Leave-One-Out Stability (app.py:818–855): removes each year from the WASDE-comparable rows and reports the change in overall MAE, labelling years as "hardest" or "best".
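
A minimal sketch of the leave-one-out calculation is given below. The input shape (absolute errors grouped by year) is hypothetical, and the reading of the labels is an assumption: the "hardest" year is taken to be the one whose removal lowers overall MAE the most, and the "best" year the one whose removal raises it the most.

```python
from statistics import mean


def leave_one_out_mae(errors_by_year):
    """Return overall MAE and the change in MAE when each year is dropped.

    errors_by_year: mapping year -> list of absolute errors; needs at
    least two years so that something remains after removal.
    """
    overall = mean(e for errs in errors_by_year.values() for e in errs)
    deltas = {}
    for year in errors_by_year:
        rest = [e for y, errs in errors_by_year.items() if y != year for e in errs]
        deltas[year] = mean(rest) - overall
    return overall, deltas


def label_years(deltas):
    # Assumed semantics for the "hardest" and "best" labels.
    return {"hardest": min(deltas, key=deltas.get), "best": max(deltas, key=deltas.get)}
```

A large negative delta for a single year suggests the headline MAE is dominated by that year.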

Mermaid Flow

flowchart TD
    ENV["$HINDCAST_RUNS_DIR\nor $INPUT_DATA_DIR/runs/"]
    LR["list_runs(runs_dir)\nrun_loader.py:102\n→ list[RunDescriptor]"]
    LAD["load_all_deliveries(commodity)\nrun_loader.py:140\n→ concat DataFrame"]
    LCC["_load_commodity_config(commodity)\n_dashboard_config.py:64\n(lru_cached, YAML commodity subtree only)"]
    FS["build_fold_schedule(commodity)\n_dashboard_config.py:261\n→ FoldSchedule"]
    SCORE["score_hindcast(df)\n_eval_shim.py:75\ntreefera_error, wasde_error, improvement_pct"]
    WFILT["WASDE-comparable fold filter\napp.py:289\nwasde_comparison_start_month=5"]
    CARDS["metric_card()\napp_utils.py:123\nTreefera MAPE / WASDE MAPE\nPre-Survey Edge / Wins"]
    CHARTS["Chart sections × 6\ncharts.py + charts_evolution.py\nfold_breakdown | heatmap |\ninformation_advantage |\nevolution | direction"]
    EXP["Expanders\nVintage Accuracy Table\nLeave-One-Out Stability\nDetailed Results"]
    RUNS["run_dir/delivery/\nTreefera_*_ADM0_Hindcast_*.csv\n(read-only)"]

    ENV --> LR
    RUNS --> LR
    LR --> LAD
    LCC --> FS
    LAD --> SCORE
    FS --> SCORE
    SCORE --> WFILT
    WFILT --> CARDS
    WFILT --> CHARTS
    CHARTS --> EXP

Invariants

The dashboard never writes to run_dir. All paths are read-only.
_load_commodity_config is cached; it never reloads the YAML within a session unless the cache is evicted (maxsize=8 covers all four commodities).
load_all_deliveries reads only ADM0 CSVs. ADM1 and ADM2 files are present in delivery/ but ignored by the dashboard.
MAPE headline cards are restricted to folds at or after wasde_comparison_start_month=5 for all four commodities. Including pre-May folds, for which WASDE publishes no estimate, would inflate the WASDE MAPE and make the comparison meaningless.
fold_to_init_date and init_date_to_fold use season-DOY arithmetic, not strftime, to avoid February-29 drift on crops that span a calendar year boundary.
weather_correction_bu_ac and improvement_pct reference NASS as the stable benchmark regardless of the user's truth-source selection. The user-selectable truth source only affects treefera_error and wasde_error.

Failure Modes

FileNotFoundError on startup (app.py:65–73): HINDCAST_RUNS_DIR does not exist or is empty. The sidebar shows a graceful error rather than a traceback. Fix: set $HINDCAST_RUNS_DIR to the runs root.
No ADM0 CSV in delivery/: list_runs skips the run with a warning log. The run will not appear in the model picker.
WASDE CSV missing (_eval_shim.py:49–72): load_wasde_jan_actuals raises FileNotFoundError. The gold-star WASDE-Jan point will be absent from the forecast evolution chart. Fix: ensure $INPUT_DATA_DIR/data/wasde/wasde_{commodity}_us_yield.csv exists.
Cross-year crop fold ordering: _display_fold_order in _chart_helpers.py rotates the fold list so the active growing season (March+) comes first in chart display order. If gs_start_doy > gs_end_doy is misconfigured, dormancy folds will appear in the wrong chart position.
PR #363 startup regression: three startup bugs were fixed in PR #363 (wrong _CONFIGS_DIR depth, renamed WasdeLoader API, missing data/ prefix in WASDE path). Reverting any of these three changes will break startup.
@st.cache_data(ttl=60) staleness: _load_predictions and load_results are cached with a 60-second TTL. A newly completed hindcast run will not appear in the model picker until the cache expires.
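
For the cross-year fold ordering failure mode above, the rotation can be pictured with the sketch below. It is not the code of `_display_fold_order`: the rollover detection and the choice to leave single-year crops untouched are assumptions, and only the "March+ first" behaviour comes from the description.

```python
def display_fold_order(fold_order, first_month=3):
    """Rotate a season-ordered fold list so the active season comes first."""
    months = [int(label[:2]) for label in fold_order]
    # Locate the calendar-year rollover, if the crop has one.
    wrap = next((i for i in range(1, len(months)) if months[i] < months[i - 1]), None)
    if wrap is None:
        return list(fold_order)  # single-year crop: nothing to rotate
    # First fold at or after first_month once the year has rolled over.
    pivot = next((i for i in range(wrap, len(months)) if months[i] >= first_month), wrap)
    return list(fold_order[pivot:]) + list(fold_order[:pivot])
```

If a cross-year crop were misconfigured so that no rollover is detected, the list would be returned unrotated and dormancy folds would stay at the front, which is the symptom described above.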

Cross-references

dashboard.md — full source-level detail for all app/ modules
deliver.md — writes the Treefera_*_ADM0_Hindcast_*.csv files this dashboard reads
evaluate.md — produces metrics_table.csv and text reports; dashboard does not read these directly

PRs

PR #363 — fixed three startup bugs; removed streamlit_applets_common dependency; ported WasdeLoader API.
PR #340 — added window-aware MAPE cards restricting WASDE comparison to May+ folds; added configurable truth-source selector; added generic vintage-subset decision-point table.