ADR-005: fit_production is a separate orchestrator entrypoint
Status: Accepted (retroactively documented 2026-05-08)
Date: 2026-04-28 (introduced in commit 643d548e, "Phase 7.1 — create hindcast.py orchestrator")
Authors: [PLACEHOLDER]
Context
The forecast pipeline needs a freshly trained production model on demand,
for example when an existing hindcast run_dir is reused but the production
fold must be retrained against newer feature parquets. Running the full
run() orchestrator (stages/run_hindcast.py:197) to obtain only that
artefact is wasteful: walk-forward CV across past seasons, scenarios,
postprocess, deliver, and evaluate can all be skipped when only the
production-fold model is required (wiki/commodity_hindcast/sources/code/stages.md:284).
The shared inner helper _run_production_fit_phase(config, fit_data)
(stages/run_hindcast.py:191) is private — it does not mint a run_root,
does not wrap the call in MLflow, and does not run preflight. Exposing it
directly to forecast callers would force them to duplicate that scaffolding.
Naive alternatives:
- a production_only=True flag on run() would gate large branches with
a boolean and obscure intent;
- a forecast-side reimplementation would diverge from the hindcast layout
contract that downstream consumers (e.g. ExperimentResult.from_run_dir
at stages/run_hindcast.py:161) depend on.
Decision
fit_production(config_path) is its own top-level orchestrator at
stages/run_hindcast.py:239, sibling to run() at stages/run_hindcast.py:197.
It reuses the same scaffolding helpers as run():
- _prepare_config (stages/run_hindcast.py:68) — config + workdir resolution;
- _create_run_root (stages/run_hindcast.py:81) — fresh timestamped run_root;
- run_preflight(preflight_paths_for_hindcast(config))
(run/preflight.py:59) — same fit/pred parquet preconditions;
- prepare_hindcast_mlflow + hindcast_mlflow_run for tracking;
- _run_production_fit_phase (stages/run_hindcast.py:191) — the shared
production fit;
- _persist_included (stages/run_hindcast.py:158) — the included-geo set
consumed downstream.
It deliberately omits _run_walk_forward_phase, investigate_experiment,
postprocess_experiment, deliver_experiment, and evaluate_experiment
(compare the run() body at stages/run_hindcast.py:223-234 with the
fit_production body at stages/run_hindcast.py:262-265).
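The composition described above can be sketched with stand-in helpers. This is a minimal, runnable sketch under stated assumptions, not the code in stages/run_hindcast.py: every helper body is a hypothetical stub that only appends its name to CALLS, all signatures other than _run_production_fit_phase(config, fit_data) are assumed, the phase order inside the MLflow run is illustrative, and _run_downstream_phases is a hypothetical stand-in for the four downstream *_experiment calls.

```python
from __future__ import annotations

import contextlib
from pathlib import Path
from typing import Iterator

# Records the order in which the stand-in phases run.
CALLS: list[str] = []


# --- Shared scaffolding (hypothetical stand-ins for the real helpers) ---
def _prepare_config(config_path: str) -> dict:
    CALLS.append("prepare_config")
    return {"config_path": config_path}


def _create_run_root(config: dict) -> Path:
    CALLS.append("create_run_root")
    return Path("runs") / "example-run"  # the real helper mints a timestamped dir


def preflight_paths_for_hindcast(config: dict) -> list[Path]:
    # Hypothetical paths; the real function derives them from the config.
    return [Path("features/fit.parquet"), Path("features/pred.parquet")]


def run_preflight(paths: list[Path]) -> None:
    CALLS.append("preflight")  # the real check validates these preconditions


def prepare_hindcast_mlflow(config: dict) -> None:
    CALLS.append("prepare_mlflow")


@contextlib.contextmanager
def hindcast_mlflow_run(config: dict) -> Iterator[None]:
    CALLS.append("mlflow_start")
    try:
        yield
    finally:
        CALLS.append("mlflow_end")


def _load_and_preprocess(config: dict, *, load_references: bool = True) -> dict:
    CALLS.append(f"load(references={load_references})")
    return {"fit": "frame"}  # stand-in for the loaded fit data


def _run_production_fit_phase(config: dict, fit_data: dict) -> None:
    CALLS.append("production_fit")


def _persist_included(run_root: Path) -> None:
    CALLS.append("persist_included")


# --- Phases only the full hindcast runs (stand-ins) ---
def _run_walk_forward_phase(config: dict, fit_data: dict) -> None:
    CALLS.append("walk_forward")


def _run_downstream_phases(run_root: Path) -> None:
    # Stands in for investigate/postprocess/deliver/evaluate_experiment.
    CALLS.extend(["investigate", "postprocess", "deliver", "evaluate"])


def run(config_path: str) -> Path:
    """Full orchestrator: walk-forward CV, production fit, downstream phases."""
    config = _prepare_config(config_path)
    run_root = _create_run_root(config)
    run_preflight(preflight_paths_for_hindcast(config))
    prepare_hindcast_mlflow(config)
    with hindcast_mlflow_run(config):
        fit_data = _load_and_preprocess(config)
        _run_walk_forward_phase(config, fit_data)
        _run_production_fit_phase(config, fit_data)
        _persist_included(run_root)
        _run_downstream_phases(run_root)
    return run_root


def fit_production(config_path: str) -> Path:
    """Production-only entrypoint: same scaffolding, no CV or downstream phases."""
    config = _prepare_config(config_path)
    run_root = _create_run_root(config)
    run_preflight(preflight_paths_for_hindcast(config))
    prepare_hindcast_mlflow(config)
    with hindcast_mlflow_run(config):
        fit_data = _load_and_preprocess(config, load_references=False)
        _run_production_fit_phase(config, fit_data)
        _persist_included(run_root)
    return run_root
```

Keeping the scaffolding calls spelled out in both functions, rather than folding them into a shared wrapper, mirrors the trade-off this ADR accepts: the omitted phases are visible at a glance, at the cost of two call sequences to keep in sync.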
_load_and_preprocess accepts load_references=False
(stages/run_hindcast.py:108) so fit_production skips the reference
dispatch tables built by build_references_by_harvest_year — those are
evaluation-only inputs that the production fit never consumes
(stages/run_hindcast.py:117-119, stages/run_hindcast.py:154).
The CLI surface mirrors the split: cli run hindcast calls run(), and
cli run fit-production calls fit_production (cli.py:334,
cli.py:347).
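The two-command split can be sketched with the standard library's argparse. This is an assumption: the ADR does not say which CLI framework cli.py uses, and the positional config_path argument and the string-returning stand-in entrypoints are hypothetical.

```python
from __future__ import annotations

import argparse
from typing import Callable, Sequence


def run(config_path: str) -> str:
    """Hypothetical stand-in for the full hindcast orchestrator."""
    return f"hindcast:{config_path}"


def fit_production(config_path: str) -> str:
    """Hypothetical stand-in for the production-only entrypoint."""
    return f"fit-production:{config_path}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli")
    groups = parser.add_subparsers(dest="group", required=True)
    run_group = groups.add_parser("run", help="run a pipeline")
    commands = run_group.add_subparsers(dest="command", required=True)

    # Each subcommand binds exactly one orchestrator entrypoint.
    hindcast = commands.add_parser("hindcast", help="full hindcast orchestrator")
    hindcast.add_argument("config_path")
    hindcast.set_defaults(entrypoint=run)

    fit = commands.add_parser("fit-production", help="production fit only")
    fit.add_argument("config_path")
    fit.set_defaults(entrypoint=fit_production)
    return parser


def main(argv: Sequence[str] | None = None) -> str:
    args = build_parser().parse_args(argv)
    entrypoint: Callable[[str], str] = args.entrypoint
    return entrypoint(args.config_path)
```

Binding each subcommand to a single entrypoint keeps the CLI free of a production_only switch, matching the rejected-flag reasoning below.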
Consequences
Positive
- Forecast callers can mint a fresh production model without paying for
walk-forward, postprocess, evaluate, or deliver
(wiki/commodity_hindcast/sources/code/stages.md:284).
- The shared internal helper _run_production_fit_phase stays private:
external callers go through an entrypoint that mints run_root, runs
preflight, and wraps the call in MLflow.
- Skipping reference loaders (load_references=False) saves I/O on the
forecast critical path.
- Both entrypoints return the same run_root shape, so downstream code
(forecast.run, ExperimentResult.from_run_dir) does not need to special-case
which one produced the directory.
Negative
- Two orchestrator entrypoints to keep in sync. Any change to preflight,
run_root layout, or MLflow wrapping must be applied to both.
- A reader has to know which entrypoint they invoked: run_roots minted by
fit_production contain only models/<key>/production/ artefacts (no fold
predictions, no postprocessed national frame, no delivery CSVs, no
evaluation reports).
- The load_references=False branch is a quiet optimisation that future
contributors may overlook when adding new production-only paths.
Alternatives considered
- Single run() with a production_only=True flag: rejected. Flag-driven
branches across seven phases mask intent and grow conditional debt at every
stage call site.
- Expose _run_production_fit_phase directly: rejected. It leaks a private
helper and forces every caller to re-implement preflight, run_root
creation, and MLflow wrapping.
- Make the forecast pipeline mint its own run_root and call the inner
helper itself: rejected. Forecast must reuse the hindcast directory layout
contract that ExperimentResult and downstream consumers rely on;
duplicating it courts drift.
Verification
- cli run fit-production smoke test: invoke against a corn/soy config and
assert that run_root contains models/<experiment_key>/production/ but no
preds/<experiment_key>/walk_forward_preds.parquet, no postprocessed/, no
delivery/, and no reports/.
- The two entrypoints share preflight_paths_for_hindcast
(run/preflight.py:59); preflight's docstring already names both consumers,
so any new precondition added there propagates to both by construction.
- [PLACEHOLDER: cite tests/; no targeted regression test for the
fit_production entrypoint exists at the time of writing.]
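Until a regression test lands, the smoke-test assertions above could be expressed as a small helper. This is a sketch: production_only_violations is a hypothetical name, the "corn" experiment key in the usage is illustrative, and the directory names are taken directly from the smoke-test bullet.

```python
from __future__ import annotations

from pathlib import Path


def production_only_violations(run_root: Path, experiment_key: str) -> list[str]:
    """Return layout problems for a run_root expected to hold only a production fit."""
    problems: list[str] = []
    if not (run_root / "models" / experiment_key / "production").is_dir():
        problems.append(f"missing models/{experiment_key}/production/")
    # Artefacts that only the full run() orchestrator should produce.
    forbidden = [
        Path("preds") / experiment_key / "walk_forward_preds.parquet",
        Path("postprocessed"),
        Path("delivery"),
        Path("reports"),
    ]
    for rel in forbidden:
        if (run_root / rel).exists():
            problems.append(f"unexpected {rel.as_posix()}")
    return problems
```

Returning a list of problems rather than asserting inside the helper lets a test report every layout violation at once.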
References
- market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:197 (run(), full orchestrator)
- market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:239 (fit_production(), production-only entrypoint)
- market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:191 (_run_production_fit_phase, shared helper)
- market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:105 (_load_and_preprocess(..., load_references=False))
- market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:158 (_persist_included)
- market_insights_models/src/commodity_hindcast/run/preflight.py:59 (preflight_paths_for_hindcast, shared)
- market_insights_models/src/commodity_hindcast/cli.py:334 (cli run fit-production command)
- wiki/commodity_hindcast/pipelines/hindcast.md (production fit phase narrative)
- wiki/commodity_hindcast/sources/code/stages.md:282 (fit_production fast-path description)
- Commit 643d548e (2026-04-28), "Phase 7.1 — create hindcast.py orchestrator", introduces fit_production