Source: TODO.md (Backlog)
What it is
The in-package TODO.md is a mixed-format backlog tracking both completed ([x]) and open ([ ]) work items. It covers structural refactoring, cross-pipeline dependency violations, naming conventions, area-imputation consolidation, and new design decisions that emerged from the large agentic refactoring session (claude --resume 66016a63-22c3-4573-bd68-fc402782fd7b). Several items are verbatim design notes written during live sessions and have not yet been cleaned up.
In-progress items
Area imputation
- [ ] Review whether area imputation at the delivery step is needed if it is already done at the feature step.
- [ ] There should be ONE location that does the edit and imputation steps (currently duplicated).
Refactoring session (large orchestrator)
- [ ] The refactoring big orchestrator session id: claude --resume 66016a63-22c3-4573-bd68-fc402782fd7b from /data/processing/github/tmi-tl-refactor-commodity-hindcast.
Conformal intervals
- [ ] FIT conformal intervals using MAPIE, not our hand-rolled implementation.
Naming and organisation
- [ ] Rename reference_data/ to benchmarks/.
- [ ] RENAME "rolling_preds or rolling_forecasts" etc. to something that speaks more precisely to the point-in-time / cross-validation exercise being undertaken.
- [ ] Rename rolling_preds / rolling_forecasts: 136 occurrences untouched (duplicate of previous item).
- [ ] Why is benchmark_grid nan for Treefera forecasts?
- [ ] Move delivery.py into delivery/.
- [ ] Structural TODOs in tfds/config.py:413,454 (why builders and dropna are in config).
ExperimentResult / slice design
- [ ] Why can the slice not also contain POINTERS/loaders for their feature matrices? Slices are usable interfaces for finding artefacts created from a run in one place. A run_dir holds one ExperimentResult which can contain: hindcasts and artefacts (trained models, predictions, plots etc.), forecasts (predictions, plots etc.).
Plots consuming pipeline logic
- [ ] Issues: plots/prep/* still re-runs pipeline logic; plots/prep/rolling_forecast.py and plots/prep/delivery.py remain open. Plots should consume artefacts, not recompute them.
Cross-pipeline dependency violations (STILL PRESENT, verbatim)
- [ ] CROSS PIPELINE dependencies: delivery/convert.py → postprocess (conformal_half_widths); STILL PRESENT at convert.py:37-38.
- [ ] CROSS PIPELINE dependencies: delivery/convert.py → metrics (nass_national_* benchmarks); STILL PRESENT at convert.py:33-36.
- [ ] reporting/metrics_tables.py → metrics (gen_metrics); STILL PRESENT at metrics_tables.py:23-24.
- [ ] plots/prep/rolling_forecast.py → metrics, reporting (nass_*, aggregate_rolling_forecast_data_from_experiment); STILL PRESENT at rolling_forecast.py:19-24.
- [ ] plots/prep/delivery.py → delivery (HindcastDelivery, *_to_dataframe, schemas, transforms); STILL PRESENT at delivery.py:16-24.
- [ ] plots/registry.py → metrics (nass_national_survey_yield_area_weighted_kg_ha); STILL PRESENT as late import at registry.py:49-50 inside _build_truth_obs_by_fold().
Deliver stage unification
- [ ] T7 unified DAG: make deliver_experiment (steps/deliver.py:34) a single loop over hindcast_slices that includes production. It currently branches on modes=("hindcast","forecast"); convert.py filters fold_label != "production" for hindcast and reads result.production separately for forecast.
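The shape of the unified loop described in the T7 item can be sketched as follows. This is a minimal illustration, not the package's code: `Slice`, `Result`, and `convert` are hypothetical stand-ins, and the only idea taken from the backlog is that "production" is treated as one more entry in `hindcast_slices` rather than a separate branch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Slice:
    """Hypothetical stand-in for one fold's artefacts."""
    fold_label: str
    rows: tuple


@dataclass
class Result:
    """Hypothetical stand-in for ExperimentResult."""
    hindcast_slices: list = field(default_factory=list)


def convert(slice_: Slice) -> dict:
    # Placeholder for the per-slice conversion; the real logic is not shown here.
    return {"fold_label": slice_.fold_label, "n_rows": len(slice_.rows)}


def deliver_experiment(result: Result) -> list:
    # One loop over every slice: no modes=(...) branch and no
    # `fold_label != "production"` filter, because production is just a slice.
    return [convert(s) for s in result.hindcast_slices]


result = Result(hindcast_slices=[
    Slice("2021", rows=(1, 2, 3)),
    Slice("2022", rows=(4, 5)),
    Slice("production", rows=(6,)),
])
deliveries = deliver_experiment(result)
print([d["fold_label"] for d in deliveries])
```

Under this assumption, any forecast-specific handling would have to live inside the per-slice conversion or be expressed as slice data, which is the trade-off the item leaves open.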
Shared utilities / AnyPath
- [ ] SHARING imports/info from .shared.utils #4: treefera_market_insights.shared.utils unused; 15+ pd.read_parquet / .to_parquet sites, zero vfs_open usage.
- [ ] SHARING imports/info from .shared.utils #5: Bare pathlib.Path in 27+ files where AnyPath is needed for S3 support.
- [ ] SHARING imports/info from .shared.utils #7: main.py:114,117-120,141-142 mutates ExperimentConfig after construction and re-wraps AnyPath in Path. Root cause of #18.
- [ ] SHARING imports/info from .shared.utils #8: Business logic in __init__.py: steps/metrics/__init__.py (90 lines), steps/regression/__init__.py (83 lines), steps/plots/__init__.py (49 lines). Violates DESIGN.md line 51.
- [ ] SHARING imports/info from .shared.utils #9: Commodity-keyed dicts bypass CommodityConfig: YIELD_RANGE, COMMODITY_STRESS_VARIABLES, _KEY_TIMING_MONTHS. Scaffold (CommodityConfig + four YAMLs) is already in place; just needs 3 fields + rewire.
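Items #5 and #7 are easier to see with a small standard-library demonstration. The sketch below shows, first, what happens to an S3-style URI when it is wrapped in a pathlib path and, second, how a frozen dataclass rejects mutation after construction. `Config` is a hypothetical stand-in for ExperimentConfig, and the URI is invented for illustration; nothing here uses the real AnyPath class.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from pathlib import PurePosixPath

# Wrapping a URI in a pathlib path collapses the "//" after the scheme,
# so the result is no longer a valid S3 URI.
uri = "s3://bucket/runs/2024"  # invented example URI
mangled = str(PurePosixPath(uri))
print(mangled)


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for ExperimentConfig."""
    run_dir: str
    season_year: int


cfg = Config(run_dir=uri, season_year=2024)

# A frozen config refuses in-place mutation after construction.
try:
    cfg.run_dir = mangled
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# Changes are expressed as a new object instead of an edit to the old one.
cfg_next = replace(cfg, season_year=2025)
print(mutation_blocked, cfg.season_year, cfg_next.season_year)
```

Whether ExperimentConfig should actually be frozen is a design choice the backlog does not settle; the sketch only shows that doing so would surface the main.py mutations as errors at the point where they happen.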
New design decisions (from live sessions)
- [ ] run_predict(fold_label=...) doesn't eliminate branching; it relocates it from three call sites into _load_prediction_inputs. Worth a one-line acknowledgement in ruling 13.
- [ ] Generalise run_predict for hindcast to not take fold_label but instead have season_year and init_date (same args as forecast).
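One way to read the second item is that the label-to-dates lookup moves to the caller, so the prediction function takes the same two arguments for hindcast and forecast. The sketch below assumes that reading; the `FOLDS` table, the dates in it, the return value, and `run_predict_for_fold` are all hypothetical and exist only to make the example runnable.

```python
from datetime import date

# Hypothetical fold table: fold_label -> (season_year, init_date).
FOLDS = {
    "2022": (2022, date(2022, 6, 1)),
    "production": (2024, date(2024, 6, 1)),
}


def run_predict(season_year: int, init_date: date) -> dict:
    # Same signature for hindcast and forecast; no fold_label branch inside.
    return {"season_year": season_year, "init_date": init_date.isoformat()}


def run_predict_for_fold(fold_label: str) -> dict:
    # The lookup still exists, but it sits at the edge as plain data
    # rather than as branching inside the prediction path.
    season_year, init_date = FOLDS[fold_label]
    return run_predict(season_year, init_date)


hindcast = run_predict_for_fold("2022")
forecast = run_predict(2024, date(2024, 6, 1))
print(hindcast, forecast)
```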
Completed items
Data shape / tidy vs wide
- [x] CONFIRM cross-module dependencies: they should be independent except for well-defined INTERFACES (e.g. ExperimentResults).
- [x] CONVERT CANONICAL FORM TO WIDE FORMAT EXCEPT IN DELIVERY LAYER. Wide is canonical at the fold level because downstream consumers naturally work in wide: sim_yield_kg_ha * area_harvested_ha, groupby init_date.
- [x] Rename rolling_preds_tidy_path → rolling_preds_path on FoldResult (experiment_result.py:39).
- [x] Save wide-format parquet in experiment_runner.py (experiment_runner.py:80).
- [x] load_rolling_preds() → wide DataFrame. Single loader, format not in the name (experiment_result.py:101).
- [x] Move predictions_to_tidy(...) to the delivery step: resolved by deleting predictions_to_tidy entirely; delivery stays wide through to CSV.
- [x] _tidy_to_wide / _wide_to_tidy: MOOT. Both helpers deleted entirely, no tidy conversion anywhere, nothing left to centralise.
- [x] Delete the predict_rolling re-call from reporting/rolling_forecast_metrics.py:82; it now consumes rolling_preds.parquet only.
Cross-pipeline dependency removals
- [x] conformal_half_widths moved to postprocess/conformalise.py; import removed from predict/experiment.py.
- [x] reporting/rolling_forecast_metrics.py is no longer an orchestrator: all three orchestration imports (predict, postprocess, regression.runtime) gone.
- [x] metrics → postprocess and postprocess → predict directional inversions: both removed.
- [x] metrics/benchmarks.py → postprocess (estimate_walk_forward_selection_bias_kg_ha): import removed.
- [x] postprocess/orchestrator.py → predict (postprocessed_to_tidy): import removed.
Fold and stage artefacts
- [x] T2: production fold writes full fold contract: models/{experiment_key}/production/{detrender.pkl, feature_fill_values.parquet, model.*}, preds/{experiment_key}/production/{rolling_preds.parquet, year_data.parquet}.
- [x] T6: CLI has required --season-year INT and --init-date YYYY-MM-DD, no asof resolution (cli.py:178-188).
- [x] T7: delete county_forecast.parquet / national_forecast.parquet as persisted artefacts: gone (only in-memory DataFrame field remains on ForecastResult).
- [x] T7: predict_experiment runs PREDICT on the production fold, writing rolling_preds.parquet + year_data.parquet (predict/experiment.py:326).
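The T6 item describes two required CLI options with no as-of fallback. A minimal sketch of that contract using argparse is shown below; the backlog does not say which CLI library cli.py uses, so argparse, the program name, and `build_parser` are assumptions made only for illustration.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hindcast")  # hypothetical program name
    # Both options are required: there is no default and no as-of resolution.
    parser.add_argument("--season-year", type=int, required=True, metavar="INT")
    parser.add_argument(
        "--init-date",
        type=date.fromisoformat,  # rejects anything that is not an ISO date
        required=True,
        metavar="YYYY-MM-DD",
    )
    return parser


args = build_parser().parse_args(
    ["--season-year", "2024", "--init-date", "2024-06-01"]
)
print(args.season_year, args.init_date)

# Omitting a required option makes argparse exit instead of guessing a date.
try:
    build_parser().parse_args(["--season-year", "2024"])
except SystemExit as exc:
    missing_exit_code = exc.code
```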
Plot organisation
- [x] Nest per-fold plots under runs/{rundir}/reports/hindcast/; metrics_table.csv and stage5_metrics*.txt remain at reports/. PlotRunner._save_figures routes writes by fold_label; MLflow artifact_path aligns automatically via the recursive log_artifacts(reports, artifact_path="reports").
Location / naming cleanup
- [x] Find shared code location for aggregation.py, geo_selection.py, missing_data_utils.py, units.py.
- [x] convert.py _aggregate_to_level should come from aggregation.py.
- [x] plots/detrended_scatter.py → predict import removed.
- [x] Confirm cross dependencies between modules: they should be independent except for well-defined interfaces.
- [x] Rename steps/ → stages/.
- [x] Create stages/experiment/ with experiment.py, experiment_result.py, experiment_runner.py.
- [x] Move detrend/ and regression/ into models/detrend/ and models/regression/.
- [x] Rename reporting → diagnostics.
- [x] require_input_data_dir() → config.py (documented home post-tfds/ deletion).
- [x] Phase 12 cycle risk resolved: if TYPE_CHECKING: from ...meta_models.bias_correction import AbstractBiasCorrector pattern.
- [x] save_included_geo_identifiers / load_included_geo_identifiers → methods on RunResult in lib/results/run_result.py.
- [x] The forecast/ module is entirely thin wrappers: replaced with substantive logic.
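The TYPE_CHECKING pattern named in the Phase 12 item can be sketched in isolation. The guarded import below is never executed at runtime, which is why it cannot take part in an import cycle. The absolute module path, the `correct` method, and `ConstantShift` are hypothetical; only the class name AbstractBiasCorrector comes from the backlog.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by static type checkers only. TYPE_CHECKING is False at runtime,
    # so this import never runs. The absolute path here is hypothetical.
    from meta_models.bias_correction import AbstractBiasCorrector


def apply_bias_correction(corrector: "AbstractBiasCorrector", values: list) -> list:
    # The quoted annotation is not resolved at runtime, so the missing
    # runtime import is harmless.
    return [corrector.correct(v) for v in values]


class ConstantShift:
    """Hypothetical duck-typed corrector used only for this demonstration."""

    def __init__(self, shift: float) -> None:
        self.shift = shift

    def correct(self, value: float) -> float:
        return value + self.shift


corrected = apply_bias_correction(ConstantShift(1.5), [10.0, 20.0])
print(corrected)
```

The cost of the pattern is that the annotated name is unavailable for runtime checks such as isinstance, which is acceptable when the type is needed for annotations only.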
Cross-pipeline dependency notes (verbatim)
These verbatim notes from the TODO are the most load-bearing signals for where layering violations remain:
- delivery/conversions.py:23 imports diagnostics.metrics
- delivery/convert.py → postprocess (conformal_half_widths); STILL PRESENT at convert.py:37-38
- delivery/convert.py → metrics (nass_national_* benchmarks); STILL PRESENT at convert.py:33-36
- reporting/metrics_tables.py → metrics (gen_metrics); STILL PRESENT at metrics_tables.py:23-24
- plots/prep/rolling_forecast.py → metrics, reporting; STILL PRESENT at rolling_forecast.py:19-24
- plots/prep/delivery.py → delivery; STILL PRESENT at delivery.py:16-24
- plots/registry.py → metrics; STILL PRESENT as late import at registry.py:49-50
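Notes like these are recorded by hand with file and line numbers, and the same kind of check could in principle be automated. The sketch below uses the standard-library ast module to flag imports that cross a forbidden boundary, including late imports inside functions. The `FORBIDDEN` rule table, the `pkg` package name, and the sample source are hypothetical; the actual layering rules live in DESIGN.md and are not reproduced here.

```python
import ast

# Hypothetical rule table: top-level package -> packages it must not import.
FORBIDDEN = {
    "delivery": {"postprocess", "metrics"},
    "plots": {"metrics", "delivery", "reporting"},
}


def find_violations(package: str, source: str) -> list:
    """Return sorted (line number, imported module) pairs that break FORBIDDEN."""
    banned = FORBIDDEN.get(package, set())
    hits = []
    # ast.walk visits nested nodes too, so imports inside functions are found.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            if any(part in banned for part in module.split(".")):
                hits.append((node.lineno, module))
    return sorted(hits)


# Invented source text standing in for a module inside the delivery package.
sample = '''\
from pkg.postprocess.conformalise import conformal_half_widths
import pkg.aggregation

def build():
    from pkg.metrics import benchmarks
    return benchmarks
'''
violations = find_violations("delivery", sample)
print(violations)
```

Matching on any dotted component is deliberately coarse and would need tightening for a real package layout, for example by resolving relative imports against the importing module's location.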
What this document is NOT
TODO.md is not a design specification; that is DESIGN.md. It is not a feature roadmap; that is experiments.md. It does not describe the domain model; in_package_DOMAIN_MODEL.md does.
Cross-references
- DESIGN.md — the rules that the open items violate or implement
- experiments.md — model experiment ideas (separate from structural backlog)
- in_package_DOMAIN_MODEL.md — entity model that drives the refactoring targets