ADR-005: fit_production is a separate orchestrator entrypoint

Status: Accepted (retroactively documented 2026-05-08)
Date: 2026-04-28 (introduced in commit 643d548e, "Phase 7.1 — create hindcast.py orchestrator")
Authors: [PLACEHOLDER]

Context

The forecast pipeline needs a freshly trained production model on demand — e.g. when an existing hindcast run_dir is reused but the production fold must be retrained against newer feature parquets. Running the full stages/run_hindcast.py:197 run() orchestrator to obtain only that artefact is wasteful: walk-forward CV across past seasons, scenarios, postprocess, deliver, and evaluate are all skippable when only the production-fold model is required (wiki/commodity_hindcast/sources/code/stages.md:284).

The shared inner helper _run_production_fit_phase(config, fit_data) (stages/run_hindcast.py:191) is private — it does not mint a run_root, does not wrap the call in MLflow, and does not run preflight. Exposing it directly to forecast callers would force them to duplicate that scaffolding.

Naive alternatives:

  • a production_only=True flag on run() would gate large branches behind a boolean and obscure intent;
  • a forecast-side reimplementation would diverge from the hindcast layout contract that downstream consumers (e.g. ExperimentResult.from_run_dir at stages/run_hindcast.py:161) depend on.

Decision

fit_production(config_path) is its own top-level orchestrator at stages/run_hindcast.py:239, sibling to run() at stages/run_hindcast.py:197.

It reuses the same scaffolding helpers as run():

  • _prepare_config (stages/run_hindcast.py:68): config and workdir resolution;
  • _create_run_root (stages/run_hindcast.py:81): a fresh timestamped run_root;
  • run_preflight(preflight_paths_for_hindcast(config)) (run/preflight.py:59): the same fit/pred parquet preconditions;
  • prepare_hindcast_mlflow + hindcast_mlflow_run: tracking;
  • _run_production_fit_phase (stages/run_hindcast.py:191): the shared production fit;
  • _persist_included (stages/run_hindcast.py:158): the included-geo set consumed downstream.

It deliberately omits _run_walk_forward_phase, investigate_experiment, postprocess_experiment, deliver_experiment, and evaluate_experiment. Compare the phase calls in run() at stages/run_hindcast.py:223-234 with the fit_production body at stages/run_hindcast.py:262-265.
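The sketch below shows how these helpers might compose into fit_production. It is a minimal, hypothetical sketch, not the code at stages/run_hindcast.py:239:

  • Every helper is a stub that records that it ran. Their signatures and return shapes are assumptions.
  • A dict stands in for the real config object.
  • A plain context manager replaces the MLflow wrapping, so the example runs without MLflow.
  • Loading data inside the tracked block is an assumed ordering.

The omitted phases show up simply as calls that never happen.

```python
import json
import tempfile
from contextlib import contextmanager
from datetime import datetime, timezone
from pathlib import Path

# Order in which the stubbed helpers run, so the sequence can be inspected.
CALLS: list[str] = []


def _prepare_config(config_path: str) -> dict:
    # Stub: the real helper resolves config + workdir; a dict stands in here.
    CALLS.append("prepare_config")
    return {"config_path": config_path, "workdir": tempfile.mkdtemp(), "experiment_key": "corn"}


def _create_run_root(config: dict) -> Path:
    # Stub: mint a fresh, timestamped run_root under the workdir.
    CALLS.append("create_run_root")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    run_root = Path(config["workdir"]) / stamp
    run_root.mkdir(parents=True)
    return run_root


def preflight_paths_for_hindcast(config: dict) -> list[Path]:
    # Stub: the real function lists the fit/pred parquets that must already exist.
    return []


def run_preflight(paths: list[Path]) -> None:
    # Fail fast before any fitting or tracking starts.
    CALLS.append("preflight")
    missing = [str(p) for p in paths if not p.exists()]
    if missing:
        raise FileNotFoundError(f"preflight failed, missing inputs: {missing}")


@contextmanager
def hindcast_mlflow_run(config: dict):
    # Stand-in for prepare_hindcast_mlflow + hindcast_mlflow_run; no MLflow needed.
    CALLS.append("mlflow_start")
    try:
        yield
    finally:
        CALLS.append("mlflow_end")


def _load_and_preprocess(config: dict, load_references: bool = True):
    # Stub: returns fit data and an illustrative included-geo set (assumed shape).
    CALLS.append(f"load(load_references={load_references})")
    return {"rows": 3}, {"geo_a", "geo_b"}


def _run_production_fit_phase(config: dict, fit_data: dict) -> None:
    # Stub: writes only production-fold artefacts under models/<key>/production/.
    CALLS.append("production_fit")
    model_dir = config["run_root"] / "models" / config["experiment_key"] / "production"
    model_dir.mkdir(parents=True)
    (model_dir / "model.json").write_text(json.dumps(fit_data))


def _persist_included(config: dict, included: set[str]) -> None:
    # Stub: persist the included-geo set consumed downstream.
    CALLS.append("persist_included")
    (config["run_root"] / "included.json").write_text(json.dumps(sorted(included)))


def fit_production(config_path: str) -> Path:
    config = _prepare_config(config_path)
    run_root = _create_run_root(config)
    config["run_root"] = run_root  # assumption: helpers locate run_root via config
    run_preflight(preflight_paths_for_hindcast(config))
    with hindcast_mlflow_run(config):
        fit_data, included = _load_and_preprocess(config, load_references=False)
        _run_production_fit_phase(config, fit_data)
        _persist_included(config, included)
    # Deliberately no walk-forward, investigate, postprocess, deliver or evaluate.
    return run_root


run_root = fit_production("configs/corn.yaml")  # hypothetical config path
```

Because the scaffolding lives in shared helpers, the only difference from run() in this sketch is the body of the tracked block. That is what keeps the two entrypoints from needing their own copies of the preflight, run_root and tracking logic.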

_load_and_preprocess accepts load_references=False (stages/run_hindcast.py:108), so fit_production skips the reference dispatch tables built by build_references_by_harvest_year. Those are evaluation-only inputs that the production fit never consumes (stages/run_hindcast.py:117-119, stages/run_hindcast.py:154).
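The load_references gate can be pictured as one early branch in the loader. This is a hypothetical sketch, not the real loader:

  • The real _load_and_preprocess takes a config, not a list of years.
  • build_references_by_harvest_year is stubbed with an assumed signature.
  • The Loaded return type is invented for illustration.
  • Defaulting load_references to True, so the full run() path is untouched, is also an assumption.

```python
from dataclasses import dataclass, field

# Harvest years for which the (stubbed) reference builder actually ran.
BUILD_CALLS: list[int] = []


@dataclass
class Loaded:
    fit_data: dict
    # Evaluation-only reference tables, keyed by harvest year.
    references_by_harvest_year: dict = field(default_factory=dict)


def build_references_by_harvest_year(years: list[int]) -> dict:
    # Stub with an assumed signature; the real builder does the expensive I/O.
    BUILD_CALLS.extend(years)
    return {year: f"reference-table-{year}" for year in years}


def _load_and_preprocess(years: list[int], load_references: bool = True) -> Loaded:
    fit_data = {"years": list(years)}  # stand-in for the loaded feature parquets
    if not load_references:
        # fit_production path: the production fit never reads the references,
        # so the builder (and its I/O) is skipped entirely.
        return Loaded(fit_data=fit_data)
    return Loaded(fit_data, build_references_by_harvest_year(years))


production = _load_and_preprocess([2022, 2023], load_references=False)
full = _load_and_preprocess([2022, 2023])
```

In this sketch only the second call touches the builder. Any new production-only path that needs references would have to opt back in explicitly.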

The CLI surface mirrors the split: cli run hindcast calls run() and cli run fit-production calls fit_production (cli.py:334, cli.py:347).
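This ADR does not say which CLI framework cli.py uses. The sketch below uses argparse only to illustrate the one-command-per-entrypoint mapping. Assumptions:

  • run and fit_production are stand-ins that return a tag instead of running anything.
  • The single config_path positional argument is assumed.

```python
import argparse
from typing import Callable, Optional, Sequence


def run(config_path: str) -> str:
    # Stand-in for the full hindcast orchestrator.
    return f"run:{config_path}"


def fit_production(config_path: str) -> str:
    # Stand-in for the production-only entrypoint.
    return f"fit_production:{config_path}"


# One CLI command per orchestrator entrypoint; no mode flags.
COMMANDS: dict[str, Callable[[str], str]] = {
    "hindcast": run,
    "fit-production": fit_production,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli")
    groups = parser.add_subparsers(dest="group", required=True)
    run_group = groups.add_parser("run", help="run a pipeline entrypoint")
    commands = run_group.add_subparsers(dest="command", required=True)
    for name, entrypoint in COMMANDS.items():
        sub = commands.add_parser(name)
        sub.add_argument("config_path")
        sub.set_defaults(entrypoint=entrypoint)  # dispatch target for this command
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return args.entrypoint(args.config_path)
```

For example, main(["run", "fit-production", "corn.yaml"]) dispatches straight to fit_production, with no production_only flag to thread through run().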

Consequences

Positive

  • Forecast callers can mint a fresh production model without paying for walk-forward, postprocess, evaluate, or deliver (wiki/commodity_hindcast/sources/code/stages.md:284).
  • The shared internal helper _run_production_fit_phase stays private — external callers go through an entrypoint that mints run_root, runs preflight, and wraps the call in MLflow.
  • Skipping reference loaders (load_references=False) saves I/O on the forecast critical path.
  • Both entrypoints mint run_roots with the same layout, so downstream code (forecast.run, ExperimentResult.from_run_dir) need not know which one produced the directory.

Negative

  • Two orchestrator entrypoints to keep in sync. Any change to preflight, run_root layout, or MLflow wrapping must be applied to both.
  • A reader has to know which entrypoint they invoked: run_roots minted by fit_production contain only models/<key>/production/ artefacts (no fold predictions, no postprocessed national frame, no delivery CSVs, no evaluation reports).
  • The load_references=False branch is a quiet optimisation that future contributors may overlook when adding new production-only paths.

Alternatives considered

  • Single run() with a production_only=True flag: rejected — flag-driven branches across seven phases mask intent and grow conditional debt at every stage call site.
  • Expose _run_production_fit_phase directly: rejected — leaks a private helper, forces every caller to re-implement preflight, run_root creation, and MLflow wrapping.
  • Make the forecast pipeline mint its own run_root and call the inner helper itself: rejected — forecast must reuse the hindcast directory layout contract that ExperimentResult and downstream consumers rely on; duplicating it courts drift.

Verification

  • cli run fit-production smoke test: invoke against a corn/soy config and assert that run_root contains models/<experiment_key>/production/ but no preds/<experiment_key>/walk_forward_preds.parquet, no postprocessed/, no delivery/, no reports/.
  • The two entrypoints share preflight_paths_for_hindcast (run/preflight.py:59); preflight's docstring already names both consumers, so any new precondition added there propagates by construction.
  • [PLACEHOLDER: cite tests/ — no targeted regression test for the fit_production entrypoint at the time of writing.]
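The assertions in the smoke-test bullet can be expressed as a reusable layout check. This is a sketch only:

  • It inspects an already-produced run_root rather than invoking the CLI.
  • The helper name fit_production_layout_problems is hypothetical.
  • The directory names come verbatim from the bullet.
  • The experiment key "corn" is illustrative.

```python
import tempfile
from pathlib import Path


def fit_production_layout_problems(run_root: Path, experiment_key: str) -> list[str]:
    """Return layout violations; an empty list means run_root looks production-only."""
    problems = []
    production_dir = run_root / "models" / experiment_key / "production"
    if not production_dir.is_dir():
        problems.append(f"missing models/{experiment_key}/production/")
    # Outputs that only the full run() orchestrator should produce.
    full_run_only = [
        Path("preds") / experiment_key / "walk_forward_preds.parquet",
        Path("postprocessed"),
        Path("delivery"),
        Path("reports"),
    ]
    for rel in full_run_only:
        if (run_root / rel).exists():
            problems.append(f"unexpected {rel.as_posix()}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "models" / "corn" / "production").mkdir(parents=True)
    clean = fit_production_layout_problems(root, "corn")
    (root / "reports").mkdir()  # simulate a leaked evaluation output
    leaked = fit_production_layout_problems(root, "corn")
```

Returning a list of problems instead of asserting inside the helper lets a test report every violation at once.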

References

  • market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:197 (run(), full orchestrator)
  • market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:239 (fit_production(), production-only entrypoint)
  • market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:191 (_run_production_fit_phase, shared helper)
  • market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:105 (_load_and_preprocess(..., load_references=False))
  • market_insights_models/src/commodity_hindcast/stages/run_hindcast.py:158 (_persist_included)
  • market_insights_models/src/commodity_hindcast/run/preflight.py:59 (preflight_paths_for_hindcast, shared)
  • market_insights_models/src/commodity_hindcast/cli.py:334 (cli run fit-production command)
  • wiki/commodity_hindcast/pipelines/hindcast.md — production fit phase narrative
  • wiki/commodity_hindcast/sources/code/stages.md:282 (fit_production fast-path description)
  • Commit 643d548e (2026-04-28) — "Phase 7.1 — create hindcast.py orchestrator" introduces fit_production