Source: TODO.md — Backlog

What it is

The in-package TODO.md is a mixed-format backlog tracking both completed ([x]) and open ([ ]) work items. It covers structural refactoring, cross-pipeline dependency violations, naming conventions, area-imputation consolidation, and new design decisions that emerged from the large agentic refactoring session (claude --resume 66016a63-22c3-4573-bd68-fc402782fd7b). Several items are verbatim design notes written during live sessions and have not yet been cleaned up.

In-progress items

Area imputation

  • [ ] Review whether area imputation at the delivery step is needed if it is already done at the feature step.
  • [ ] There should be ONE location that does the edit and imputation steps (currently duplicated).

Refactoring session (large orchestrator)

  • [ ] The refactoring big orchestrator session id: claude --resume 66016a63-22c3-4573-bd68-fc402782fd7b from /data/processing/github/tmi-tl-refactor-commodity-hindcast.

Conformal intervals

  • [ ] FIT conformal intervals using MAPIE not our hand-rolled implementation.
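
For context, a hand-rolled split-conformal half-width usually reduces to a finite-sample-corrected quantile of absolute calibration residuals. The sketch below illustrates that idea only; it is not the package's conformal_half_widths, and the function name, the residual values, and the alpha are assumptions made for the example. A library such as MAPIE would replace this calculation rather than wrap it.

```python
import math


def split_conformal_half_width(residuals, alpha=0.1):
    """Half-width q so that [pred - q, pred + q] targets (1 - alpha) coverage.

    Hypothetical helper: uses absolute residuals from a held-out calibration set.
    """
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to guarantee coverage at this alpha.
        return math.inf
    return scores[rank - 1]


# Made-up calibration residuals in kg/ha, for illustration only.
calibration_residuals = [-120.0, 45.0, 300.0, -60.0, 10.0, 220.0, -15.0, 90.0, -180.0]
half_width = split_conformal_half_width(calibration_residuals, alpha=0.2)
```

The infinite return value makes the small-sample case explicit instead of silently under-covering, which is one of the edge cases a maintained library handles for you.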

Naming and organisation

  • [ ] Rename reference_data/ to benchmarks/.
  • [ ] RENAME "rolling_preds or rolling_forecasts" etc. to something more precisely talking to the point-in-time / cross-validation exercise being undertaken.
  • [ ] Rename rolling_preds/rolling_forecasts — 136 occurrences untouched (duplicate of previous item).
  • [ ] Why is benchmark_grid nan for Treefera forecasts?
  • [ ] Move delivery.py into delivery/.
  • [ ] Structural TODOs in tfds/config.py:413,454 (why builders and dropna are in config).

ExperimentResult / slice design

  • [ ] Why can the slice not also contain POINTERS/loaders for its feature matrices? Slices are usable interfaces for finding artefacts created from a run in one place. A run_dir holds one ExperimentResult, which can contain hindcasts and their artefacts (trained models, predictions, plots, etc.) and forecasts (predictions, plots, etc.).

Plots consuming pipeline logic

  • [ ] Issues: plots/prep/* still re-runs pipeline logic — plots/prep/rolling_forecast.py and plots/prep/delivery.py remain open. Plots should consume artefacts, not recompute them.

Cross-pipeline dependency violations (STILL PRESENT — verbatim)

  • [ ] CROSS PIPELINE dependencies delivery/convert.py → postprocess (conformal_half_widths) — STILL PRESENT at convert.py:37-38.
  • [ ] CROSS PIPELINE dependencies delivery/convert.py → metrics (nass_national_* benchmarks) — STILL PRESENT at convert.py:33-36.
  • [ ] reporting/metrics_tables.py → metrics (gen_metrics) — STILL PRESENT at metrics_tables.py:23-24.
  • [ ] plots/prep/rolling_forecast.py → metrics, reporting (nass_*, aggregate_rolling_forecast_data_from_experiment) — STILL PRESENT at rolling_forecast.py:19-24.
  • [ ] plots/prep/delivery.py → delivery (HindcastDelivery, *_to_dataframe, schemas, transforms) — STILL PRESENT at delivery.py:16-24.
  • [ ] plots/registry.py → metrics (nass_national_survey_yield_area_weighted_kg_ha) — STILL PRESENT as late import at registry.py:49-50 inside _build_truth_obs_by_fold().

Deliver stage unification

  • [ ] T7 unified DAG: make deliver_experiment (steps/deliver.py:34) a single loop over hindcast_slices that includes production — currently branches on modes=("hindcast","forecast"); convert.py filters fold_label != "production" for hindcast and reads result.production separately for forecast.
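
A minimal sketch of what the unified loop could look like, assuming the production fold exposes the same artefact contract as the hindcast folds. HindcastSlice, deliver_slice, the "exp" experiment key, and the fold labels are hypothetical stand-ins, not the package's real types or values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HindcastSlice:
    """Hypothetical stand-in for one entry of hindcast_slices."""

    fold_label: str          # a test-year label, or "production"
    rolling_preds_path: str  # every fold is assumed to write the same contract


def deliver_slice(s: HindcastSlice) -> dict:
    """Placeholder per-slice delivery step; identical for every fold."""
    return {"fold_label": s.fold_label, "source": s.rolling_preds_path}


def deliver_experiment(slices: list[HindcastSlice]) -> list[dict]:
    # One loop over all slices: no modes=("hindcast", "forecast") branch
    # and no fold_label != "production" filter.
    return [deliver_slice(s) for s in slices]


slices = [
    HindcastSlice(label, f"preds/exp/{label}/rolling_preds.parquet")
    for label in ("2022", "2023", "production")
]
delivered = deliver_experiment(slices)
```

The design point is that production stops being a special case: anything that distinguishes it is carried by fold_label as data, not by a separate code path.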

Shared utilities / AnyPath

  • [ ] SHARING imports/info from .shared.utils #4 — treefera_market_insights.shared.utils unused — 15+ pd.read_parquet / .to_parquet sites, zero vfs_open usage.
  • [ ] SHARING imports/info from .shared.utils #5 — Bare pathlib.Path in 27+ files where AnyPath is needed for S3 support.
  • [ ] SHARING imports/info from .shared.utils #7 — main.py:114,117-120,141-142 mutates ExperimentConfig after construction and re-wraps AnyPath in Path. Root cause of #18.
  • [ ] SHARING imports/info from .shared.utils #8 — Business logic in __init__.py files: steps/metrics/__init__.py (90 lines), steps/regression/__init__.py (83 lines), steps/plots/__init__.py (49 lines). Violates DESIGN.md line 51.
  • [ ] SHARING imports/info from .shared.utils #9 — Commodity-keyed dicts bypass CommodityConfig: YIELD_RANGE, COMMODITY_STRESS_VARIABLES, _KEY_TIMING_MONTHS. Scaffold (CommodityConfig + four YAMLs) is already in place; just needs 3 fields + rewire.
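
Item #9 could be sketched roughly as follows, assuming the three module-level dicts become fields on a per-commodity config loaded from YAML. The class below is a hypothetical subset, not the real CommodityConfig; the field names, the loader function, and all values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommodityConfig:
    """Hypothetical subset of a commodity config: only the three new fields."""

    name: str
    yield_range: tuple[float, float]    # would replace YIELD_RANGE[name]
    stress_variables: tuple[str, ...]   # would replace COMMODITY_STRESS_VARIABLES[name]
    key_timing_months: tuple[int, ...]  # would replace _KEY_TIMING_MONTHS[name]


def commodity_config_from_mapping(name: str, raw: dict) -> CommodityConfig:
    """Build a config from an already-parsed YAML mapping (a plain dict here)."""
    low, high = raw["yield_range"]
    if low >= high:
        raise ValueError(f"{name}: yield_range must be increasing")
    months = tuple(int(m) for m in raw["key_timing_months"])
    if any(m < 1 or m > 12 for m in months):
        raise ValueError(f"{name}: key_timing_months must be in 1..12")
    return CommodityConfig(
        name=name,
        yield_range=(float(low), float(high)),
        stress_variables=tuple(raw["stress_variables"]),
        key_timing_months=months,
    )


# Placeholder values for illustration only; they are not taken from the package.
example = commodity_config_from_mapping(
    "example_commodity",
    {
        "yield_range": [0, 20000],
        "stress_variables": ["heat_days"],
        "key_timing_months": [6, 7, 8],
    },
)
```

Validating at load time means a bad YAML entry fails once at startup instead of surfacing later as a missing dict key.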

New design decisions (from live sessions)

  • [ ] run_predict(fold_label=...) doesn't eliminate branching — it relocates it from three call sites into _load_prediction_inputs. Worth a one-line acknowledgement in ruling 13.
  • [ ] Generalise run_predict for hindcast to not take fold_label but instead have season_year and init_date (same args as forecast).

Completed items

Data shape / tidy vs wide

  • [x] CONFIRM cross-module dependencies — they should be independent except for well-defined INTERFACES (e.g. ExperimentResults).
  • [x] CONVERT CANONICAL FORM TO WIDE FORMAT EXCEPT IN DELIVERY LAYER. Wide is canonical at the fold level because downstream consumers naturally work in wide: sim_yield_kg_ha * area_harvested_ha, groupby init_date.
  • [x] Rename rolling_preds_tidy_path → rolling_preds_path on FoldResult (experiment_result.py:39).
  • [x] Save wide-format parquet in experiment_runner.py (experiment_runner.py:80).
  • [x] load_rolling_preds() → wide DataFrame. Single loader, format not in the name (experiment_result.py:101).
  • [x] Move predictions_to_tidy(...) to the delivery step — resolved by deleting predictions_to_tidy entirely; delivery stays wide through to CSV.
  • [x] _tidy_to_wide/_wide_to_tidy — MOOT: both helpers deleted entirely, no tidy conversion anywhere, nothing left to centralise.
  • [x] Delete the predict_rolling re-call from reporting/rolling_forecast_metrics.py:82 — now consumes rolling_preds.parquet only.

Cross-pipeline dependency removals

  • [x] conformal_half_widths moved to postprocess/conformalise.py — import removed from predict/experiment.py.
  • [x] reporting/rolling_forecast_metrics.py is no longer an orchestrator — all three orchestration imports (predict, postprocess, regression.runtime) gone.
  • [x] metrics → postprocess and postprocess → predict directional inversions — both removed.
  • [x] metrics/benchmarks.py → postprocess (estimate_walk_forward_selection_bias_kg_ha) — import removed.
  • [x] postprocess/orchestrator.py → predict (postprocessed_to_tidy) — import removed.

Fold and stage artefacts

  • [x] T2: production fold writes full fold contract: models/{experiment_key}/production/{detrender.pkl, feature_fill_values.parquet, model.*}, preds/{experiment_key}/production/{rolling_preds.parquet, year_data.parquet}.
  • [x] T6: CLI has required --season-year INT and --init-date YYYY-MM-DD, no asof resolution (cli.py:178-188).
  • [x] T7: delete county_forecast.parquet / national_forecast.parquet as persisted artefacts — gone (only in-memory DataFrame field remains on ForecastResult).
  • [x] T7: predict_experiment runs PREDICT on the production fold writing rolling_preds.parquet + year_data.parquet (predict/experiment.py:326).

Plot organisation

  • [x] Nest per-fold plots under runs/{rundir}/reports/hindcast/; metrics_table.csv and stage5_metrics*.txt remain at reports/. PlotRunner._save_figures routes writes by fold_label; MLflow artifact_path aligns automatically via the recursive log_artifacts(reports, artifact_path="reports").

Location / naming cleanup

  • [x] Find shared code location for aggregation.py, geo_selection.py, missing_data_utils.py, units.py.
  • [x] convert.py _aggregate_to_level should come from aggregation.py.
  • [x] plots/detrended_scatter.py → predict import removed.
  • [x] Confirm cross dependencies between modules — they should be independent except for well-defined interfaces.
  • [x] Rename steps/ → stages/.
  • [x] Create stages/experiment/ with experiment.py, experiment_result.py, experiment_runner.py.
  • [x] Move detrend/ and regression/ into models/detrend/ and models/regression/.
  • [x] Rename reporting → diagnostics.
  • [x] require_input_data_dir() → config.py (documented home post-tfds/ deletion).
  • [x] Phase 12 cycle risk resolved: if TYPE_CHECKING: from ...meta_models.bias_correction import AbstractBiasCorrector pattern.
  • [x] save_included_geo_identifiers/load_included_geo_identifiers → methods on RunResult in lib/results/run_result.py.
  • [x] The forecast/ module is entirely thin wrappers — replaced with substantive logic.

Cross-pipeline dependency notes (verbatim)

These verbatim notes from the TODO are the most load-bearing signals for where layering violations remain:

delivery/conversions.py:23 imports diagnostics.metrics
delivery/convert.py → postprocess (conformal_half_widths) — STILL PRESENT at convert.py:37-38
delivery/convert.py → metrics (nass_national_* benchmarks) — STILL PRESENT at convert.py:33-36
reporting/metrics_tables.py → metrics (gen_metrics) — STILL PRESENT at metrics_tables.py:23-24
plots/prep/rolling_forecast.py → metrics, reporting — STILL PRESENT at rolling_forecast.py:19-24
plots/prep/delivery.py → delivery — STILL PRESENT at delivery.py:16-24
plots/registry.py → metrics — STILL PRESENT as late import at registry.py:49-50

What this document is NOT

TODO.md is not a design specification; that role belongs to DESIGN.md. Nor is it a feature roadmap, which lives in experiments.md. It does not describe the domain model.

Cross-references