Source: TODO.md (Backlog)

What it is

The in-package TODO.md is a mixed-format backlog tracking both completed ([x]) and open ([ ]) work items. It covers structural refactoring, cross-pipeline dependency violations, naming conventions, area-imputation consolidation, and new design decisions that emerged from the large agentic refactoring session (claude --resume 66016a63-22c3-4573-bd68-fc402782fd7b). Several items are verbatim design notes written during live sessions and have not yet been cleaned up.

In-progress items

Area imputation

[ ] Review whether area imputation at the delivery step is needed if it is already done at the feature step.
[ ] Perform the edit and imputation steps in ONE location (they are currently duplicated).

Refactoring session (large orchestrator)

[ ] Session ID of the large refactoring orchestrator: claude --resume 66016a63-22c3-4573-bd68-fc402782fd7b (resume from /data/processing/github/tmi-tl-refactor-commodity-hindcast).

Conformal intervals

[ ] Fit conformal intervals with MAPIE instead of our hand-rolled implementation.

Naming and organisation

[ ] Rename reference_data/ to benchmarks/.
[ ] Rename rolling_preds, rolling_forecasts, etc. to names that describe the point-in-time / cross-validation exercise more precisely (136 occurrences remain untouched).
[ ] Investigate why benchmark_grid is NaN for Treefera forecasts.
[ ] Move delivery.py into delivery/.
[ ] Resolve the structural TODOs in tfds/config.py:413,454 (why do builders and dropna live in config?).

ExperimentResult / slice design

[ ] Why can a slice not also hold POINTERS/loaders for its feature matrices? Slices are the interface for finding, in one place, the artefacts a run created. A run_dir holds one ExperimentResult, which can contain hindcasts and their artefacts (trained models, predictions, plots, etc.) and forecasts (predictions, plots, etc.).
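A minimal sketch of what a slice carrying loaders could look like. Slice, feature_matrix_path, load_feature_matrix and the injectable reader are hypothetical names, and the sketch assumes feature matrices are stored as parquet. The slice stores only paths; nothing is read until a caller asks, so building an ExperimentResult stays cheap.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable


def _read_parquet(path: Path) -> Any:
    # Deferred import keeps the sketch importable without pandas installed.
    import pandas as pd

    return pd.read_parquet(path)


@dataclass(frozen=True)
class Slice:
    """Hypothetical slice: a fold label plus pointers to the artefacts it owns."""

    fold_label: str
    feature_matrix_path: Path
    rolling_preds_path: Path
    # Injectable reader, e.g. for tests or for an S3-aware implementation.
    reader: Callable[[Path], Any] = field(default=_read_parquet, repr=False)

    def load_feature_matrix(self) -> Any:
        # Nothing is read from disk until a caller asks for the matrix.
        return self.reader(self.feature_matrix_path)

    def load_rolling_preds(self) -> Any:
        return self.reader(self.rolling_preds_path)
```

Under this design an ExperimentResult could expose hindcast slices and the production slice through the same type, so callers locate artefacts without knowing the run_dir layout.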

Plots consuming pipeline logic

[ ] plots/prep/* still re-runs pipeline logic: plots/prep/rolling_forecast.py and plots/prep/delivery.py remain open. Plots should consume artefacts, not recompute them.

Cross-pipeline dependency violations (STILL PRESENT, verbatim)

[ ] CROSS PIPELINE dependencies delivery/convert.py → postprocess (conformal_half_widths) — STILL PRESENT at convert.py:37-38.
[ ] CROSS PIPELINE dependencies delivery/convert.py → metrics (nass_national_* benchmarks) — STILL PRESENT at convert.py:33-36.
[ ] reporting/metrics_tables.py → metrics (gen_metrics) — STILL PRESENT at metrics_tables.py:23-24.
[ ] plots/prep/rolling_forecast.py → metrics, reporting (nass_*, aggregate_rolling_forecast_data_from_experiment) — STILL PRESENT at rolling_forecast.py:19-24.
[ ] plots/prep/delivery.py → delivery (HindcastDelivery, *_to_dataframe, schemas, transforms) — STILL PRESENT at delivery.py:16-24.
[ ] plots/registry.py → metrics (nass_national_survey_yield_area_weighted_kg_ha) — STILL PRESENT as late import at registry.py:49-50 inside _build_truth_obs_by_fold().
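The remaining violations could be tracked mechanically. A minimal sketch using the standard-library ast module, assuming the installed package is treefera_market_insights and using a hypothetical FORBIDDEN map transcribed from the items above (package names may have moved since, e.g. reporting → diagnostics). It resolves relative imports against the file's own package and, because ast.walk descends into function bodies, also catches late imports like the one in registry.py. A dedicated tool such as import-linter could enforce the same contracts.

```python
import ast
from pathlib import Path

# Assumed name of the installed top-level package.
ROOT_PACKAGE = "treefera_market_insights"

# Hypothetical layering rules: top-level subpackage -> subpackages it must not import.
FORBIDDEN: dict[str, set[str]] = {
    "delivery": {"postprocess", "metrics"},
    "reporting": {"metrics"},
    "plots": {"metrics", "reporting", "delivery"},
}


def _resolve(level: int, module: str, package_parts: tuple[str, ...]) -> list[str]:
    """Turn an import target into path components relative to ROOT_PACKAGE."""
    if level:
        # A relative import with N dots climbs N - 1 packages from the current one.
        base = list(package_parts[: max(0, len(package_parts) - (level - 1))])
    else:
        base = []
    parts = base + [p for p in module.split(".") if p]
    if parts and parts[0] == ROOT_PACKAGE:
        parts = parts[1:]
    return parts


def imported_top_level(source: str, package_parts: tuple[str, ...]) -> list[tuple[int, str]]:
    """(line, top-level subpackage) for every import, including late imports in functions."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [_resolve(0, alias.name, package_parts) for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            parts = _resolve(node.level, node.module or "", package_parts)
            # Nothing left after resolution (e.g. "from .. import metrics"):
            # the imported names are themselves the modules.
            targets = [parts] if parts else [[alias.name] for alias in node.names]
        else:
            continue
        found.extend((node.lineno, target[0]) for target in targets if target)
    return found


def violations(source: str, package_parts: tuple[str, ...]) -> list[tuple[int, str]]:
    banned = FORBIDDEN.get(package_parts[0], set()) if package_parts else set()
    return [hit for hit in imported_top_level(source, package_parts) if hit[1] in banned]


def scan(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Map each offending file under root (the package directory) to its forbidden imports."""
    report: dict[str, list[tuple[int, str]]] = {}
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        hits = violations(path.read_text(), rel.parent.parts)
        if hits:
            report[rel.as_posix()] = hits
    return report
```

Running scan() in CI and failing on a non-empty report would stop new violations while the listed ones are worked off.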

Deliver stage unification

[ ] T7 unified DAG: make deliver_experiment (steps/deliver.py:34) a single loop over hindcast_slices that includes production. It currently branches on modes=("hindcast","forecast"): convert.py filters on fold_label != "production" for hindcast and reads result.production separately for forecast.
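A minimal sketch of the target shape, assuming a hypothetical FoldSlice type and a deliver_slice callable that writes one slice's outputs and returns where they went; neither is the real deliver_experiment signature. Production is just another entry in the list, so there is no modes branch and no fold_label filter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FoldSlice:
    """Hypothetical stand-in for a hindcast slice; production is just another fold_label."""

    fold_label: str


def deliver_experiment(
    slices: list[FoldSlice],
    deliver_slice: Callable[[FoldSlice], str],
) -> dict[str, str]:
    """Deliver every slice, production included, through the same code path."""
    delivered: dict[str, str] = {}
    for s in slices:
        if s.fold_label in delivered:
            raise ValueError(f"duplicate fold_label: {s.fold_label!r}")
        # Anything fold-specific is read from the slice itself, not from a mode flag.
        delivered[s.fold_label] = deliver_slice(s)
    return delivered
```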

Shared utilities / `AnyPath`

[ ] SHARING imports/info from .shared.utils #4: treefera_market_insights.shared.utils is unused; there are 15+ pd.read_parquet / .to_parquet call sites and zero vfs_open usage.
[ ] SHARING imports/info from .shared.utils #5 — Bare pathlib.Path in 27+ files where AnyPath is needed for S3 support.
[ ] SHARING imports/info from .shared.utils #7 — main.py:114,117-120,141-142 mutates ExperimentConfig after construction and re-wraps AnyPath in Path. Root cause of #18.
[ ] SHARING imports/info from .shared.utils #8 — Business logic in __init__.py — steps/metrics/__init__.py (90 lines), steps/regression/__init__.py (83 lines), steps/plots/__init__.py (49 lines). Violates DESIGN.md line 51.
[ ] SHARING imports/info from .shared.utils #9 — Commodity-keyed dicts bypass CommodityConfig: YIELD_RANGE, COMMODITY_STRESS_VARIABLES, _KEY_TIMING_MONTHS. Scaffold (CommodityConfig + four YAMLs) is already in place; just needs 3 fields + rewire.
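A minimal sketch of the #9 rewire. It assumes the three new CommodityConfig fields are named yield_range, stress_variables and key_timing_months, that each commodity YAML has already been parsed into a mapping (for example with yaml.safe_load), and that timing months are integers 1 to 12. The real CommodityConfig scaffold may look different. Freezing the dataclass also shows one way to address #7: with frozen=True, assignment after construction raises dataclasses.FrozenInstanceError, so overrides go through dataclasses.replace instead. The same pattern would apply to ExperimentConfig.

```python
from dataclasses import dataclass, replace
from typing import Any, Mapping


@dataclass(frozen=True)
class CommodityConfig:
    name: str
    # Hypothetical fields replacing YIELD_RANGE, COMMODITY_STRESS_VARIABLES and
    # _KEY_TIMING_MONTHS; tuples keep the frozen instance fully immutable.
    yield_range: tuple[float, float]
    stress_variables: tuple[str, ...]
    key_timing_months: tuple[int, ...]

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "CommodityConfig":
        """Build from one parsed commodity YAML mapping."""
        low, high = (float(v) for v in raw["yield_range"])
        if not low < high:
            raise ValueError(f"{raw['name']}: yield_range must satisfy low < high")
        months = tuple(int(m) for m in raw["key_timing_months"])
        if not all(1 <= m <= 12 for m in months):
            raise ValueError(f"{raw['name']}: key_timing_months must be in 1..12")
        return cls(
            name=str(raw["name"]),
            yield_range=(low, high),
            stress_variables=tuple(raw["stress_variables"]),
            key_timing_months=months,
        )

    def with_overrides(self, **changes: Any) -> "CommodityConfig":
        # Returns a new object; the original is never mutated after construction.
        return replace(self, **changes)
```

Call sites that currently index YIELD_RANGE[commodity] would instead read config.yield_range from the CommodityConfig they already hold.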

New design decisions (from live sessions)

[ ] run_predict(fold_label=...) doesn't eliminate branching — it relocates it from three call sites into _load_prediction_inputs. Worth a one-line acknowledgement in ruling 13.
[ ] Generalise run_predict for hindcast to take season_year and init_date (the same arguments as forecast) instead of fold_label.

Completed items

Data shape / tidy vs wide

[x] CONFIRM cross-module dependencies: modules should be independent except for well-defined INTERFACES (e.g. ExperimentResult).
[x] Convert the canonical form to wide format everywhere except the delivery layer. Wide is canonical at the fold level because downstream consumers naturally work in wide form: sim_yield_kg_ha * area_harvested_ha, then groupby init_date.
[x] Rename rolling_preds_tidy_path → rolling_preds_path on FoldResult (experiment_result.py:39).
[x] Save wide-format parquet in experiment_runner.py (experiment_runner.py:80).
[x] load_rolling_preds() returns a wide DataFrame: a single loader, with the format kept out of its name (experiment_result.py:101).
[x] Move predictions_to_tidy(...) to the delivery step — resolved by deleting predictions_to_tidy entirely; delivery stays wide through to CSV.
[x] _tidy_to_wide/_wide_to_tidy — MOOT: both helpers deleted entirely, no tidy conversion anywhere, nothing left to centralise.
[x] Delete the predict_rolling re-call from reporting/rolling_forecast_metrics.py:82 — now consumes rolling_preds.parquet only.
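To illustrate why wide is the natural shape for these consumers: with one row per geography and init_date and one column per quantity, an area-weighted yield is a column product followed by a groupby, with no pivot first. A minimal pandas sketch, assuming columns named sim_yield_kg_ha, area_harvested_ha and init_date; national_yield_kg_ha is a hypothetical helper, and the real rolling_preds.parquet layout may carry additional key columns.

```python
import pandas as pd


def national_yield_kg_ha(wide: pd.DataFrame) -> pd.Series:
    """Area-weighted yield per init_date from a wide, one-row-per-geography frame."""
    # Production (kg) per row: yield (kg/ha) times harvested area (ha).
    production_kg = wide["sim_yield_kg_ha"] * wide["area_harvested_ha"]
    totals = (
        wide.assign(production_kg=production_kg)
        .groupby("init_date")[["production_kg", "area_harvested_ha"]]
        .sum()
    )
    # Total production over total area is the area-weighted mean yield.
    return (totals["production_kg"] / totals["area_harvested_ha"]).rename("sim_yield_kg_ha")
```

In tidy form the same computation would need a pivot (or a filtered self-join) to put yield and area side by side before multiplying.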

Cross-pipeline dependency removals

[x] conformal_half_widths moved to postprocess/conformalise.py — import removed from predict/experiment.py.
[x] reporting/rolling_forecast_metrics.py is no longer an orchestrator — all three orchestration imports (predict, postprocess, regression.runtime) gone.
[x] metrics → postprocess and postprocess → predict directional inversions — both removed.
[x] metrics/benchmarks.py → postprocess (estimate_walk_forward_selection_bias_kg_ha) — import removed.
[x] postprocess/orchestrator.py → predict (postprocessed_to_tidy) — import removed.

Fold and stage artefacts

[x] T2: production fold writes full fold contract: models/{experiment_key}/production/{detrender.pkl, feature_fill_values.parquet, model.*}, preds/{experiment_key}/production/{rolling_preds.parquet, year_data.parquet}.
[x] T6: CLI has required --season-year INT and --init-date YYYY-MM-DD, no asof resolution (cli.py:178-188).
[x] T7: delete county_forecast.parquet / national_forecast.parquet as persisted artefacts — gone (only in-memory DataFrame field remains on ForecastResult).
[x] T7: predict_experiment runs PREDICT on the production fold writing rolling_preds.parquet + year_data.parquet (predict/experiment.py:326).

Plot organisation¶

[x] Nest per-fold plots under runs/{rundir}/reports/hindcast/; metrics_table.csv and stage5_metrics*.txt remain at reports/. PlotRunner._save_figures routes writes by fold_label; MLflow artifact_path aligns automatically via the recursive log_artifacts(reports, artifact_path="reports").

Location / naming cleanup

[x] Find a shared code location for aggregation.py, geo_selection.py, missing_data_utils.py, units.py.
[x] convert.py _aggregate_to_level should come from aggregation.py.
[x] plots/detrended_scatter.py → predict import removed.
[x] Confirm cross dependencies between modules — they should be independent except for well-defined interfaces.
[x] Rename steps/ → stages/.
[x] Create stages/experiment/ with experiment.py, experiment_result.py, experiment_runner.py.
[x] Move detrend/ and regression/ into models/detrend/ and models/regression/.
[x] Rename reporting → diagnostics.
[x] require_input_data_dir() → config.py (documented home post-tfds/ deletion).
[x] Phase 12 import-cycle risk resolved with the pattern if TYPE_CHECKING: from ...meta_models.bias_correction import AbstractBiasCorrector.
[x] save_included_geo_identifiers/load_included_geo_identifiers → methods on RunResult in lib/results/run_result.py.
[x] The forecast/ module was entirely thin wrappers; these have been replaced with substantive logic.
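The TYPE_CHECKING pattern from the Phase 12 item, sketched as a standalone module. The guarded import is read by static type checkers but never executed at runtime (typing.TYPE_CHECKING is False), so it cannot form an import cycle. The quoted annotation is likewise never evaluated; from __future__ import annotations is the common alternative to quoting. corrector_name is a hypothetical helper, and the relative import only resolves inside the real package.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by static type checkers only; never executed at runtime, so no cycle.
    from ...meta_models.bias_correction import AbstractBiasCorrector


def corrector_name(corrector: "AbstractBiasCorrector") -> str:
    # Hypothetical helper. The quoted annotation is not evaluated at runtime,
    # so AbstractBiasCorrector need not be importable when this module loads.
    return type(corrector).__name__
```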

Cross-pipeline dependency notes (verbatim)

These verbatim notes from the TODO are the most load-bearing signals for where layering violations remain:

delivery/conversions.py:23 imports diagnostics.metrics
delivery/convert.py → postprocess (conformal_half_widths) — STILL PRESENT at convert.py:37-38
delivery/convert.py → metrics (nass_national_* benchmarks) — STILL PRESENT at convert.py:33-36
reporting/metrics_tables.py → metrics (gen_metrics) — STILL PRESENT at metrics_tables.py:23-24
plots/prep/rolling_forecast.py → metrics, reporting — STILL PRESENT at rolling_forecast.py:19-24
plots/prep/delivery.py → delivery — STILL PRESENT at delivery.py:16-24
plots/registry.py → metrics — STILL PRESENT as late import at registry.py:49-50

What this document is NOT

TODO.md is not a design specification (DESIGN.md is) and not a feature roadmap (experiments.md is). It also does not describe the domain model (see in_package_DOMAIN_MODEL.md).

Cross-references

DESIGN.md — the rules that the open items violate or implement
experiments.md — model experiment ideas (separate from structural backlog)
in_package_DOMAIN_MODEL.md — entity model that drives the refactoring targets