MLflow Tracking

What it is

Every run hindcast, run fit-production, and run forecast invocation opens an MLflow run. The tracking layer is implemented as a set of helpers in lib/tracking/ that isolate stage code from direct MLflow API calls. The key components are:

  • tracking_uri_anchored — pins the SQLite database path next to runs/ and features/
  • configure_tracking — sets the MLflow tracking URI and experiment name
  • prepare_hindcast_mlflow — seeds RNG, writes run YAMLs, sets up tracking
  • hindcast_mlflow_run — context manager that wraps mlflow.start_run with tags, params, and initial artefact uploads
  • log_artifact / log_artifacts — S3-aware wrappers that stage cloud paths to a temp directory before calling mlflow.log_artifact*

Design authority: DESIGN.md Clause 7.

Where it lives

Symbol                           File                         Line
tracking_uri_anchored            lib/tracking/decorators.py     43
configure_tracking               lib/tracking/decorators.py     81
mlflow_fold_report_logger        lib/tracking/decorators.py     87
prepare_hindcast_mlflow          lib/tracking/decorators.py    102
hindcast_mlflow_run              lib/tracking/decorators.py    129
log_artifact                     lib/tracking/log.py            48
log_artifacts                    lib/tracking/log.py            75
data_file_sha256_prefix          lib/tracking/log.py            96
bounded_hindcast_params          lib/tracking/log.py           112
log_hindcast_dataset_artifacts   lib/tracking/log.py           129
capture_git                      lib/tracking/log.py           151
capture_environment              lib/tracking/log.py           175
seed_everything                  lib/tracking/log.py           204

SQLite default backend

The default mlflow_tracking_uri in ExperimentConfig is sqlite:///mlruns.db. tracking_uri_anchored (decorators.py:43) detects that this is a relative SQLite URI and anchors it at config.data_root (i.e. INPUT_DATA_DIR):

# inside tracking_uri_anchored: `p` is the path parsed out of the
# sqlite:/// URI, `anchor` is config.data_root
if not p.is_absolute():
    resolved = (anchor / p).resolve()
    return f"sqlite:///{resolved.as_posix()}"

This keeps mlruns.db next to runs/ and features/ rather than resolving against whichever directory the CLI was invoked from (a historical source of scattered files).

Exception: when data_root is a CloudPath (QA environment with INPUT_DATA_DIR=s3://...), SQLite cannot live on object storage. The function detects isinstance(anchor, CloudPath), logs a warning, and leaves the URI cwd-relative. The warning text tells operators to set an absolute local path (e.g. sqlite:////tmp/mlruns.db) or an HTTP(S) MLflow tracking server.
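The two branches above can be sketched as a small pure function. This is a minimal illustration, not the real helper: `anchor_sqlite_uri` and the `anchor_is_cloud` flag are stand-ins (the real code checks `isinstance(anchor, CloudPath)` and logs a warning), and `.resolve()` is omitted so the example is deterministic.

```python
from pathlib import Path, PurePosixPath

def anchor_sqlite_uri(uri: str, anchor: Path, anchor_is_cloud: bool = False) -> str:
    """Sketch of the anchoring rule: a relative sqlite:/// URI is pinned
    under `anchor`; absolute URIs, non-SQLite URIs, and cloud anchors
    pass through unchanged."""
    prefix = "sqlite:///"
    if not uri.startswith(prefix):
        return uri  # e.g. an http(s):// tracking server URI
    p = PurePosixPath(uri[len(prefix):])
    if p.is_absolute() or anchor_is_cloud:
        return uri  # already absolute, or SQLite cannot live on object storage
    return f"{prefix}{(anchor / p).as_posix()}"
```

For example, `anchor_sqlite_uri("sqlite:///mlruns.db", Path("/data/root"))` yields a URI rooted under the data root rather than the process cwd.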

To inspect runs locally: uv run mlflow ui --backend-store-uri sqlite:///mlruns.db.

hindcast_mlflow_run — the main context manager

decorators.py:129 is a @contextmanager that opens one MLflow run per pipeline invocation:

@contextmanager
def hindcast_mlflow_run(*, run_name, config, run_root, config_path, git_meta, env_meta):
    tags = {"stage": "hindcast", "run_dir": str(run_root), "config_path": str(config_path)}
    tags.update({k: str(v) for k, v in git_meta.items()})
    tags.update({k: str(v) for k, v in env_meta.items()})
    with mlflow.start_run(run_name=run_name):
        mlflow.set_tags(tags)
        mlflow.log_params(bounded_hindcast_params(config))
        log_artifact(str(run_root / "config_resolved.yaml"))
        log_artifact(str(run_root / "metadata.yaml"))
        yield

Tags set on each run:

  • stage — always "hindcast"
  • run_dir — the absolute path (or S3 URI) of the run directory
  • config_path — the YAML config file path used for this invocation
  • git_commit, git_short, git_dirty — from capture_git()
  • Python version, platform, and key package versions — from capture_environment()

bounded_hindcast_params — the params logged

log.py:112 produces the small per-run param dict logged alongside the full YAML artefact:

{
    "random_seed": ...,
    "data_root": ...,
    "feature_start_year": ...,
    "feature_end_year": ...,
    "experiment_name": ...,
    "experiment_key": ...,
    "detrend": ...,
    "regression": ...,
    "production_cumulative_threshold": ...,
}

The full resolved config is always logged as config_resolved.yaml alongside these summary params (DESIGN.md Clause 12: dual persistence).
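A minimal sketch of how such a bounded param dict can be built. The `ExperimentConfig` dataclass here is an illustrative stand-in for the real config class, and the field tuple mirrors the keys listed above; MLflow stores params as strings, so everything is stringified.

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfig:  # minimal stand-in for the real config class
    random_seed: int
    data_root: str
    feature_start_year: int
    feature_end_year: int
    experiment_name: str
    experiment_key: str
    detrend: str
    regression: str
    production_cumulative_threshold: float

def bounded_hindcast_params(config: ExperimentConfig) -> dict:
    # Keep the dict small and stable so the run-comparison view in the
    # MLflow UI stays useful across experiments; the full config goes to
    # config_resolved.yaml as an artefact.
    fields = (
        "random_seed", "data_root", "feature_start_year", "feature_end_year",
        "experiment_name", "experiment_key", "detrend", "regression",
        "production_cumulative_threshold",
    )
    return {name: str(getattr(config, name)) for name in fields}
```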

Artefact tagging convention

Artefacts are uploaded under namespaced artifact_path prefixes:

Path                        Contents                              Logged by
(root)                      config_resolved.yaml, metadata.yaml   hindcast_mlflow_run
reports/folds/{fold_key}/   Per-fold detrend/report PNGs          mlflow_fold_report_logger
datasets/                   fit.parquet, pred.parquet             log_hindcast_dataset_artifacts
datasets/{spec.name}/       Per-spec reference data files         log_hindcast_dataset_artifacts

Multiple reference_data specs (e.g. CONAB-final + CONAB-LEV for Brazil soy) are separated by spec name so they do not collide on artefact path (log.py:139–148).
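The namespacing rule amounts to a one-line path builder; `dataset_artifact_path` is a hypothetical name used for illustration, not the actual helper in log.py.

```python
from typing import Optional

def dataset_artifact_path(spec_name: Optional[str] = None) -> str:
    """Reference-data files for each spec land under datasets/<spec name>/
    so multiple specs (e.g. CONAB-final vs CONAB-LEV) never collide;
    the shared fit/pred parquet files go to datasets/ directly."""
    return f"datasets/{spec_name}" if spec_name else "datasets"
```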

mlflow_fold_report_logger (decorators.py:87) returns an on_saved callback for per-fold PNG writes. If no MLflow run is active when on_saved fires, it logs a warning and skips the upload — this prevents crashes when plots are generated outside a tracking context.
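The warn-and-skip behaviour can be sketched as a callback factory. This is an illustrative reconstruction, not the real code: the active-run check and the upload call are injected so the sketch runs without MLflow (in the real helper they would be `mlflow.active_run() is not None` and an `mlflow.log_artifact` wrapper).

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)

def fold_report_logger(is_run_active: Callable[[], bool],
                       upload: Callable[[str, str], None]) -> Callable[[Path, str], None]:
    """Return an on_saved callback for per-fold PNG writes."""
    def on_saved(png_path: Path, fold_key: str) -> None:
        if not is_run_active():
            # no tracking context (e.g. plots generated ad hoc): warn, don't crash
            logger.warning("No active MLflow run; skipping upload of %s", png_path)
            return
        upload(str(png_path), f"reports/folds/{fold_key}")
    return on_saved
```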

S3-aware log_artifact / log_artifacts

MLflow's own log_artifact* functions only accept local paths. When run_dir is an S3 URI, all artefact paths are S3 paths. log.py:48 and log.py:75 handle this by staging the S3 object or directory to a tempfile.TemporaryDirectory before calling mlflow.log_artifact*:

with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp) / cp.name          # cp is the CloudPath to the artefact
    cp.download_to(dest)                # stage the S3 object locally
    mlflow.log_artifact(str(dest), artifact_path=artifact_path)

DESIGN.md Clause 7 — key requirements

"WHEN an experiment runs, the system SHALL track it with MLflow (mlflow>=3, hard dependency). In create mode, a new MLflow run is started; in resume mode, the existing mlflow_run_id from metadata_<stage>.yaml is used to resume the same MLflow run."

Additional requirements from Clause 7:

  • MLflow params are prefixed by stage name (e.g. train/random_seed) to avoid write-once collisions on resume.
  • Training scripts should call mlflow.autolog(log_models=False) — models are not logged via autolog; callers use mlflow.<flavour>.log_model() directly.
  • run forecast --run-dir D resumes the MLflow run identified by the mlflow_run_id recorded in metadata_<stage>.yaml — no new run per init date.
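The stage-prefixing rule exists because MLflow params are write-once per key: resuming a run from a later stage must never re-log a key the earlier stage already wrote. A minimal sketch (helper name is illustrative):

```python
def prefix_params(stage: str, params: dict) -> dict:
    """Namespace param keys by stage, e.g. random_seed -> train/random_seed,
    so each stage writes a disjoint key set on the shared MLflow run."""
    return {f"{stage}/{k}": v for k, v in params.items()}
```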

Parallel-run DB-locking issue

Concurrent pipeline runs for the same commodity that share one mlruns.db file can fail with a SQLAlchemy OperationalError: SQLite holds a single-writer lock, so it cannot accept concurrent writes from multiple processes. The symptom is a failed MLflow write in one of the processes, which terminates that pipeline invocation early.

The technical mitigation is straightforward: run same-commodity pipelines sequentially rather than in parallel. Different-commodity pipelines sharing the same mlruns.db are less likely to conflict because SQLite uses file-level locking and acquisitions are brief, but concurrent hindcast runs for the same experiment key should be avoided.

Key invariants

  • tracking_uri_anchored is the only place that resolves a relative SQLite URI. Stage code never calls mlflow.set_tracking_uri directly.
  • hindcast_mlflow_run is a context manager; the yield point is where stage code runs. If the stage raises, MLflow marks the run as FAILED automatically.
  • bounded_hindcast_params produces a small, stable dict; the full config is always logged as a YAML artefact separately. This follows Clause 12 (dual persistence).
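The second invariant (the stage runs at the yield point; a raised exception marks the run FAILED) can be demonstrated with a toy analogue of hindcast_mlflow_run. The `status` dict stands in for MLflow's run state; the real tags, params, and artefact uploads are elided.

```python
from contextlib import contextmanager

@contextmanager
def hindcast_run(status: dict):
    """Toy analogue of hindcast_mlflow_run: the stage body executes at the
    yield point; mlflow.start_run ends the run FAILED if the body raises,
    FINISHED otherwise -- mirrored here with a status dict."""
    try:
        yield
    except Exception:
        status["state"] = "FAILED"
        raise
    else:
        status["state"] = "FINISHED"
```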

How it interacts with the pipeline

prepare_hindcast_mlflow is called once at the start of run_hindcast.run(). It seeds the RNG, writes config_resolved.yaml and metadata.yaml, and configures the MLflow tracking URI and experiment name. hindcast_mlflow_run is then entered and wraps the entire stage execution. Per-fold detrend plots are uploaded incrementally via the mlflow_fold_report_logger callback passed down to the fit stage.

Pitfalls

  • In QA (S3 data_root), the SQLite URI is left cwd-relative and mlruns.db resolves against the process cwd. This can scatter databases if the CLI is invoked from different directories across runs.
  • mlflow.autolog(log_models=False) is a recommendation in DESIGN.md, not enforced; if a stage calls mlflow.autolog() without log_models=False, large model artefacts may be uploaded unintentionally.

Open questions

  • Resume mode (mlflow_run_id in metadata_<stage>.yaml) is specified in DESIGN.md Clause 7 but the current hindcast_mlflow_run always starts a fresh run; resume appears to be implemented only for the forecast sub-pipeline.
  • There is no test asserting that bounded_hindcast_params keys are stable across config schema changes; a field rename could silently break the run comparison view in the MLflow UI.