
commodity_hindcast handover — IMPLEMENTATION_PLAN.md

Status: COMPLETE. All 19 atomic tasks accepted on 2026-05-08 (see section 8, Final completion).

Master reference: this file, updated by the Orchestrator at every accept/retry boundary.

Scope (HARD): market_insights_models/src/commodity_hindcast/ only. No crop_yield, no area_forecast, no winter_wheat_area, no csb_pipeline. Cross-references to other models are forbidden in the deliverable.

Last updated: 2026-05-08


1. Decisions captured

| # | Decision | Value |
|---|---|---|
| D1 | Output staging | /data/processing/tmp/tmi-handover/ only. Nothing in the repo until the user migrates. |
| D2 | Unknown facts | `[PLACEHOLDER: <description>]` markers; Critic enforces presence; Orchestrator collects all into placeholders.md for the user to fill. |
| D3 | Critic strictness | STRICT. Every factual claim must cite file.py:line, a wiki/...md path, or a 2026-05 source URL. Interpretive prose without grounding is rejected. |
| D4 | Diagrams | 5 runbook failure-mode flowcharts + 1 risk severity × likelihood heatmap, all D2 → PNG via the explainer-grid toolchain. |
| D5 | Spawning | Nested Claudes via `unset CLAUDECODE && claude -p ...`. Actor and Critic are separate processes per doc. |
| D6 | Concurrency | Up to 3 actor/critic pairs in parallel where there are no dependencies (wave-based). |
| D7 | Scope | commodity_hindcast ONLY. Other models out of scope. |
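D2 makes the placeholders something the Orchestrator can harvest mechanically; this is the aggregation T19 performs. Below is a minimal Python sketch. It assumes drafts are UTF-8 Markdown files under drafts/ and that a description never contains a closing square bracket. `collect_placeholders` and `render_placeholders_md` are hypothetical names, not existing tooling.

```python
import re
from collections import defaultdict
from pathlib import Path

# D2 marker format: [PLACEHOLDER: <description>]. Assumes descriptions never
# contain "]" (a nested bracket would truncate the match).
PLACEHOLDER_RE = re.compile(r"\[PLACEHOLDER:\s*([^\]]+?)\s*\]")


def collect_placeholders(drafts_dir: Path) -> dict[str, list[str]]:
    """Map each draft (POSIX path relative to drafts_dir) to its marker descriptions."""
    found: dict[str, list[str]] = defaultdict(list)
    for path in sorted(drafts_dir.rglob("*.md")):
        if path.name == "placeholders.md":
            continue  # never re-harvest the aggregate itself
        text = path.read_text(encoding="utf-8")
        for match in PLACEHOLDER_RE.finditer(text):
            found[path.relative_to(drafts_dir).as_posix()].append(match.group(1))
    return dict(found)


def render_placeholders_md(found: dict[str, list[str]]) -> str:
    """Render one checklist section per source file for the user to fill."""
    total = sum(len(items) for items in found.values())
    lines = ["# Placeholders", "", f"{total} markers across {len(found)} files."]
    for rel_path, items in found.items():
        lines += ["", f"## {rel_path}", ""]
        lines += [f"- [ ] {item}" for item in items]
    return "\n".join(lines) + "\n"
```

The sketch groups markers by source file, so each doc's owner can fill their section in one pass. A categorised layout would need a category tag inside each marker, and D2 does not define one.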

2. Deliverable tree

[Figure: Actor/Critic orchestration plan]

```
/data/processing/tmp/tmi-handover/
  IMPLEMENTATION_PLAN.md              # this file (master)
  orchestration.d2 + output/orchestration.png
  placeholders.md                     # collected [PLACEHOLDER] items for user to fill
  drafts/
    HANDOVER.md                       # top-level navigation
    data_lineage.md                   # source / owner / cadence / freshness / failure
    risk_register.md                  # consolidated risks + severity heatmap link
    decisions/
      ADR-001-walk-forward-bypass.md
      ADR-002-calibration-result-persistence.md
      ADR-003-mandatory-residual-mode.md
      ADR-004-forecast-path-restructure.md
      ADR-005-fit-production-endpoint.md
    runbook/
      daily_incremental.md
      full_hindcast_rerun.md
      forecast_per_init.md
      mlflow_db_recovery.md
      qa_to_prod_sync.md
    contacts.md                       # RACI scaffold w/ placeholders
    access.md                         # AWS, S3, MLflow, env vars, dashboards
    signoff.md                        # acceptance checklist
  diagrams/
    runbook_daily_incremental.d2 + .png
    runbook_full_hindcast_rerun.d2 + .png
    runbook_forecast_per_init.d2 + .png
    runbook_mlflow_db_recovery.d2 + .png
    runbook_qa_to_prod_sync.d2 + .png
    risk_heatmap.d2 + .png
  critique/
    <doc>_round_<N>.md                # critic verdicts; retained for audit
```

3. Atomic task list

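Numbered for direct mapping to TaskCreate IDs. Each task lists its Actor goal, Critic check, dependencies and wave (parallelisability). "code" under Depends on means the task needs only the shared codebase, not another task's output.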

| ID | Task | Actor goal | Critic check | Depends on | Wave |
|---|---|---|---|---|---|
| T1 | data_lineage.md | Walk every ResolvablePath field + config.reference_data spec; produce a table of source / owner / refresh cadence / freshness check / failure mode; ground every row in code or commodity YAML | Verify each row maps to a real config field; reject any row with fabricated cadence/owner; placeholders OK | none | A |
| T2 | risk_register.md | Consolidate MEMORY.md known issues + LINT_REPORT.md backlog + DESIGN.md drift items + wiki/sources/prs/ open concerns; severity × likelihood × mitigation owner; cite each row | Reject duplicates of existing wiki entity pages; sanity-check severity ordering | none | A |
| T3 | ADR-001 walk-forward bypass | Retro-document why run_walk_forward bypasses stages.run_predict; cite the runner.py:38-49 blind-overwrite comment and the PR that introduced the kernel | Verify cited PR + line refs exist; reject if rationale invents motivation absent from code | none | A |
| T4 | ADR-002 CalibrationResult persistence | Retro-document the save/load design; cite conformalise.py:215, :225, PR-361, and the entities/CalibrationResult.md page | Verify all citations resolve | none | A |
| T5 | ADR-003 mandatory residual_mode | Retro-document PR-372 (forecast.residual_mode made required); cite the validator at validate_residual_mode lines 91-140 | Verify citations | none | A |
| T6 | ADR-004 forecast path restructure | Retro-document PR-369 (forecast/{season_year}/{init_date}/); cite ForecastSlice line refs | Verify | none | A |
| T7 | ADR-005 fit_production endpoint | Retro-document why fit_production is its own entrypoint; cite run_hindcast.py:239 | Verify | none | A |
| T8 | runbook/daily_incremental.md | Author with a 2026 metadata block; commands grounded in CLI + Makefile; failure modes from MEMORY.md + DESIGN.md | Verify every command is real; the last-tested field is left honestly blank | code | B |
| T9 | runbook/full_hindcast_rerun.md | Walk-forward + production fit + postprocess + deliver + evaluate; the make target / CLI sequence; resource expectations | Verify | code | B |
| T10 | runbook/forecast_per_init.md | cli run forecast --season-year ... --init-date ...; multi-init loop pattern; failure recovery | Verify | code | B |
| T11 | runbook/mlflow_db_recovery.md | The known SQLite locking issue (parallel same-commodity runs); recovery procedure; hindcast vs prevention | Verify it references the real bug from MEMORY.md | code | B |
| T12 | runbook/qa_to_prod_sync.md | The aws s3 sync ... --quiet pattern (recent commit 0e302410); destination buckets; ECR/ECS implications | Verify against the recent commit | code | B |
| T13 | 5 runbook failure-mode flowcharts | One D2 per runbook showing decision branches; render PNG | Verify each branch corresponds to a documented failure | T8-T12 | C |
| T14 | risk_heatmap.d2 | Severity × likelihood heatmap of risk_register entries; D2 + PNG | Verify it visualises T2 entries faithfully | T2 | C |
| T15 | access.md | AWS roles + S3 buckets + MLflow + ECR/ECS + env vars + dashboards; ground every entry in code or grep | Verify env vars + bucket names appear in code; flag anything it cannot ground | code | A |
| T16 | contacts.md | RACI scaffold with placeholders for: business owner, model owner, on-call, MLflow admin, AWS contact, customer comms; structure only | Verify structure matches incident-response patterns | none | A |
| T17 | signoff.md | Acceptance checklist: fresh hindcast run; forecast run; deliver CSV regen; one drill of each runbook | Each item demonstrable in under 1 day; verify by tracing inputs | T8-T12 | D |
| T18 | HANDOVER.md | Top-level navigation: 3 audience tracks (User Support / QA / Technical); link to all other docs; one-line summary each | Verify every link resolves; tracks make sense | T1-T17 | D |
| T19 | placeholders.md | Aggregate every [PLACEHOLDER] marker across all accepted drafts | Verify completeness | T1-T18 | D |
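T13 renders each runbook flowchart from D2 to PNG. The explainer-grid toolchain named in D4 is not described here, so this sketch assumes only the stock d2 CLI, invoked as `d2 <source> <target>`, which picks PNG output from the target extension. PNG export may need d2's headless-browser dependency installed. The sketch writes to diagrams/output/, matching the T13 status note, and globs only runbook_*.d2 because T14's heatmap was ultimately drawn with matplotlib. Both helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def d2_render_commands(diagrams_dir: Path) -> list[list[str]]:
    """Build one `d2 <src> <dst>` command per runbook flowchart source."""
    out_dir = diagrams_dir / "output"
    return [
        ["d2", str(src), str(out_dir / f"{src.stem}.png")]
        for src in sorted(diagrams_dir.glob("runbook_*.d2"))
    ]


def render_all(diagrams_dir: Path) -> None:
    """Run each render; check=True stops at the first failing diagram."""
    (diagrams_dir / "output").mkdir(parents=True, exist_ok=True)
    for cmd in d2_render_commands(diagrams_dir):
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it lets the list be inspected (or dry-run) without d2 installed.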

Wave plan:

- Wave A (T1-T7, T15-T16): 9 docs, no dependencies. Run 3 actor/critic pairs in parallel, in three batches.
- Wave B (T8-T12): 5 runbooks, depending only on shared codebase knowledge. Run 3 in parallel.
- Wave C (T13-T14): 6 diagrams. Rendering is fast, so all can go in one batch.
- Wave D (T17-T19): 3 finishing docs, run sequentially (T17, then T18, then T19).
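The wave plan can be checked mechanically against the Depends on column. This is a sketch under stated assumptions:

- Dependencies are transcribed by hand from the task table.
- Waves A and B are split into batches of at most three pairs (D6).
- Wave C runs as one batch.
- Wave D runs one task at a time.

Every name below is hypothetical.

```python
from itertools import islice

MAX_PAIRS = 3  # D6: at most three actor/critic pairs at once


def task_ids(first: int, last: int) -> set[str]:
    return {f"T{i}" for i in range(first, last + 1)}


# Transcribed from the task table. "none" and "code" both mean no dependency
# on another task, so they become empty sets.
DEPENDS_ON: dict[str, set[str]] = {
    **{task: set() for task in task_ids(1, 12) | task_ids(15, 16)},
    "T13": task_ids(8, 12),
    "T14": {"T2"},
    "T17": task_ids(8, 12),
    "T18": task_ids(1, 17),
    "T19": task_ids(1, 18),
}

# (tasks, mode) per wave, in execution order.
WAVES = [
    (["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T15", "T16"], "parallel"),
    (["T8", "T9", "T10", "T11", "T12"], "parallel"),
    (["T13", "T14"], "one_batch"),
    (["T17", "T18", "T19"], "sequential"),
]


def to_steps(waves) -> list[list[str]]:
    """Flatten waves into ordered steps; tasks within one step run concurrently."""
    steps: list[list[str]] = []
    for tasks, mode in waves:
        if mode == "parallel":
            it = iter(tasks)
            while batch := list(islice(it, MAX_PAIRS)):
                steps.append(batch)
        elif mode == "one_batch":
            steps.append(list(tasks))
        else:  # sequential: one task per step
            steps.extend([task] for task in tasks)
    return steps


def schedule_errors(steps, depends_on) -> list[str]:
    """Report any task whose dependencies have not finished in an earlier step."""
    done: set[str] = set()
    errors = []
    for step in steps:
        for task in step:
            missing = depends_on[task] - done
            if missing:
                errors.append(f"{task} before {sorted(missing)}")
        done.update(step)
    return errors
```

The check only asserts that each dependency finishes in an earlier step; it does not demand the tightest ordering. A strict topological layering would pull T8-T12 into Wave A, since their only dependency is the codebase.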

4. Actor/Critic protocol

```
ORCHESTRATOR (this session):
  for task in plan:
    spawn ACTOR (nested Claude, unset CLAUDECODE):
      prompt: task description + 'cite file:line or URL'
      tools: Read, Bash, Explore, Grep, Glob, Write (output to drafts/)
      output: drafts/<doc>.md
    spawn CRITIC (nested Claude, unset CLAUDECODE):
      prompt: critique drafts/<doc>.md against codebase
      tools: read-only
      output: critique/<doc>_round_N.md with verdict (ACCEPT|RETRY) + recommendations
    if verdict == ACCEPT:
      mark T# completed in IMPLEMENTATION_PLAN.md
    else:
      respawn ACTOR with critique attached; loop
    max retries: 3 per doc; if exceeded, escalate to user with context
```
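The protocol loop can be sketched in Python. This is not the orchestrator's code:

- The actor and critic are injected as callables, so the retry logic can be exercised without spawning processes.
- Tool restrictions (Write for actors, read-only critics) are not modelled.
- `spawn_nested_claude` only mirrors the D5 command line.
- Reading "max retries: 3" as three respawns after round 1 is an assumption.

All names are hypothetical.

```python
import os
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: "max retries: 3" counts respawns after round 1, so a doc gets
# at most four rounds before escalation.
MAX_RETRIES = 3


@dataclass
class Verdict:
    accepted: bool  # ACCEPT -> True, RETRY -> False
    critique: str   # recommendations handed to the next actor round


class Escalate(Exception):
    """Raised when a doc exhausts its retries; context goes to the user."""


def spawn_nested_claude(prompt: str) -> str:
    """D5: run `claude -p` with CLAUDECODE removed from the child environment."""
    env = {k: v for k, v in os.environ.items() if k != "CLAUDECODE"}
    result = subprocess.run(
        ["claude", "-p", prompt], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


def run_task(
    task: str,
    actor: Callable[[str, Optional[str]], str],
    critic: Callable[[str, int], Verdict],
) -> int:
    """Actor/critic loop for one doc; returns the round number that was accepted."""
    critique: Optional[str] = None
    for round_no in range(1, MAX_RETRIES + 2):
        draft = actor(task, critique)  # round 1 receives no critique
        verdict = critic(draft, round_no)
        if verdict.accepted:
            return round_no  # caller then marks T# completed in the plan
        critique = verdict.critique  # attached to the next actor respawn
    raise Escalate(f"{task}: no ACCEPT after {MAX_RETRIES} retries; last critique: {critique}")
```

In production the two callables would wrap `spawn_nested_claude` with the actor and critic prompts. In a test they can be plain functions, as with the stubs used to check the retry path.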

5. Quality bar

  • Strict grounding — every fact must cite file.py:line, wiki/...md, or 2026-05 source URL.
  • British English throughout.
  • No emojis unless user requests.
  • Concise — runbooks ≤ 300 lines; ADRs ≤ 150 lines; HANDOVER.md ≤ 100 lines.
  • No fabricated facts — placeholders are the escape hatch. Critics reject inventions.
  • Same scope discipline — commodity_hindcast only; cross-model references blocked.
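The critic's core check under this bar (and D3) is that each file.py:line citation resolves. A minimal sketch follows. It assumes citations appear as name.py:N or name.py:N-M, and that bare filenames are searched for anywhere under the repository root. Continuation forms such as the ":225" following a previous citation are not handled. `unresolved_citations` is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches e.g. "runner.py:38-49", "run_hindcast.py:239", "src/x/y.py:12".
CITATION_RE = re.compile(r"([\w./-]+\.py):(\d+)(?:-(\d+))?")


def unresolved_citations(text: str, repo_root: Path) -> list[str]:
    """Return a problem description for every citation that does not resolve."""
    problems = []
    for m in CITATION_RE.finditer(text):
        rel, start = m.group(1), int(m.group(2))
        end = int(m.group(3)) if m.group(3) else start
        # Paths containing a directory are taken literally; bare names are searched.
        candidates = [repo_root / rel] if "/" in rel else sorted(repo_root.rglob(rel))
        files = [p for p in candidates if p.is_file()]
        if not files:
            problems.append(f"{m.group(0)}: file not found")
        elif start < 1 or end < start or not any(
            end <= len(p.read_text(encoding="utf-8").splitlines()) for p in files
        ):
            problems.append(f"{m.group(0)}: line out of range")
    return problems
```

A bare filename that matches several files passes if any one of them is long enough. A stricter critic would reject such a citation as ambiguous instead.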

6. Status board

| ID | Status | Owner | Critique round | Notes |
|---|---|---|---|---|
| T1 | ACCEPTED | actor-1 / critic-1 | r1 | data_lineage: 53 lines, 15 placeholders |
| T2 | ACCEPTED | actor-2 / critic-2 | r2 | risk_register: 14 risks, 15 placeholders; SBC stale file refs purged in r2 |
| T3 | ACCEPTED | actor-3 / critic-3 | r1 | ADR walk-forward bypass: 132 lines, 21 citations |
| T4 | ACCEPTED | actor-4 / critic-4 | r1 | ADR CalibrationResult: 150 lines, 1 placeholder |
| T5 | ACCEPTED | actor-5 / critic-5 | r1 | ADR mandatory residual_mode: 146 lines, 1 placeholder |
| T6 | ACCEPTED | actor-6 / critic-6 | r1 | ADR forecast path restructure: 137 lines, 3 placeholders |
| T7 | ACCEPTED | actor-7 / critic-7 | r1 | ADR fit_production: 121 lines, 2 placeholders |
| T15 | ACCEPTED | actor-15 / critic-15 | r1 | access.md: 100 lines, 11 placeholders |
| T16 | ACCEPTED | actor-16 / critic-16 | r1 | contacts.md scaffold: 93 lines, 117 placeholders |
| T8 | ACCEPTED | actor-8 / critic-8 | r1 | runbook multi_year_forecast: 300 lines, 8 placeholders (revised from daily_incremental) |
| T9 | ACCEPTED | actor-9 / critic-9 | r1 | runbook full_hindcast_rerun: 213 lines, 7 placeholders |
| T10 | ACCEPTED | actor-10 / critic-10 | r3 | runbook forecast_per_init: 241 lines, 6 placeholders; r2→r3 fix added ADR-005 to References |
| T11 | ACCEPTED | actor-11 / critic-11 | r1 | runbook mlflow_db_recovery: 247 lines, 7 placeholders |
| T12 | ACCEPTED | actor-12 / critic-12 | r1 | runbook qa_to_prod_sync: 261 lines, 13 placeholders |
| T13 | ACCEPTED | orchestrator | r1 | 5 D2 flowcharts rendered to diagrams/output/ |
| T14 | ACCEPTED | orchestrator | r1 | matplotlib 4×3 grid heatmap; D2 attempt abandoned (elk doesn't grid) |
| T17 | ACCEPTED | actor-17 / orchestrator | r1 | signoff.md: 83 lines, 9 placeholders |
| T18 | ACCEPTED | actor-18 / orchestrator | r1 | HANDOVER.md: 70 lines, 0 placeholders; all linked docs verified |
| T19 | ACCEPTED | orchestrator | r1 | placeholders.md: aggregator over 117 markers across 13 files |
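The quality bar's length caps can be checked against the line counts recorded on the status board. A minimal sketch follows. The counts are transcribed from the Notes column rather than measured from the files. Docs without a cap in the quality bar (data_lineage, risk_register, access, contacts, signoff) are omitted. `over_budget` and `headroom` are hypothetical helpers.

```python
# Line caps from the quality bar.
BUDGETS = {"runbook": 300, "ADR": 150, "HANDOVER.md": 100}

# (task, doc kind, accepted line count) transcribed from the status board.
ACCEPTED = [
    ("T3", "ADR", 132), ("T4", "ADR", 150), ("T5", "ADR", 146),
    ("T6", "ADR", 137), ("T7", "ADR", 121),
    ("T8", "runbook", 300), ("T9", "runbook", 213), ("T10", "runbook", 241),
    ("T11", "runbook", 247), ("T12", "runbook", 261),
    ("T18", "HANDOVER.md", 70),
]


def over_budget(drafts):
    """Return (task, lines, cap) for every draft that exceeds its cap."""
    return [(task, lines, BUDGETS[kind]) for task, kind, lines in drafts if lines > BUDGETS[kind]]


def headroom(drafts):
    """Lines still available under each draft's cap."""
    return {task: BUDGETS[kind] - lines for task, kind, lines in drafts}
```

All recorded drafts are within budget. T4 and T8 sit exactly on their caps, so any addition to either would need an equal cut elsewhere in that doc.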

7. Open questions for the user (block start)

None at this point — D1–D7 are settled. Awaiting GO to start Wave A.

8. Final completion (2026-05-08)

All 19 atomic tasks ACCEPTED. Round counts:

- Round-1 ACCEPT: T1, T3, T4, T5, T6, T7, T8, T9, T11, T12, T15, T16, T17, T18, T19 (15 tasks)
- Round-2 ACCEPT: T2 (risk_register fix to SBC stale citations)
- Round-3 ACCEPT: T10 (forecast_per_init: round-1 critic flagged a missing References cross-ref; fix landed in round 3)
- T13, T14 (diagrams): orchestrator-direct, with no actor/critic loop; visual artefacts verified by render success and visual inspection

Total nested Claude spawns: 19 actors + 14 critics = 33. Total wall-clock: roughly one extended session.

Deliverable inventory:

- 13 markdown drafts (2,488 lines) under drafts/
- 6 PNGs under diagrams/output/
- 14 critique files under critique/
- This master plan.

Outstanding for the user: 117 placeholder markers across 13 files. See drafts/placeholders.md for the categorised fill workflow.