Wiki Maintenance Schema

You are an LLM acting on the commodity_hindcast knowledge base. This file defines how the wiki is organised, what the conventions are, and what workflows you follow when ingesting sources, writing pages, or answering queries.

Architecture

Three layers:

  1. Raw sources (immutable) — Python modules, configs, existing docs, git history, PR bodies inside market_insights_models/src/commodity_hindcast/ and the GitHub repo. You read from these. You never modify them.

  2. The wiki (this directory) — LLM-generated markdown. You own it.

  3. The schema — this file. Co-evolves with the user.

Page types

| Type | Directory | Purpose |
| --- | --- | --- |
| source | sources/code/, sources/configs/, sources/docs/, sources/prs/, sources/commits/ | Faithful summaries of an immutable source. Quote rather than paraphrase where claims matter. |
| entity | entities/ | A domain entity (Commodity, SeasonYear, ExperimentResult, …). Synthesised from source pages with citations. |
| concept | concepts/ | A cross-cutting idea (walk-forward CV, conformal modes, S3 path safety, …). |
| pipeline | pipelines/ | A stage-by-stage walkthrough of one workflow (feature build, hindcast, forecast, …). |
| synthesis | synthesis/ | Top-level overview, thesis, open questions. |

Page format

Every page begins with YAML frontmatter:

---
name: <PageName>
description: <one-line hook used when the LLM scans the index for relevance>
type: <source|entity|concept|pipeline|synthesis|schema|plan>
last_updated: YYYY-MM-DD
sources:
  - relative/path/to/source-1.md
  - relative/path/to/source-2.md
---

Then the markdown body.
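
The frontmatter contract above can be checked mechanically. The following is a minimal sketch, not part of the existing tooling: split_frontmatter and validate_frontmatter are hypothetical helper names, and the parser is deliberately tiny (flat key: value pairs plus one level of list items), so a real checker would more likely rely on a YAML library.

```python
import re

REQUIRED_KEYS = ("name", "description", "type", "last_updated", "sources")
PAGE_TYPES = {"source", "entity", "concept", "pipeline", "synthesis", "schema", "plan"}


def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a page into (frontmatter dict, body). Hypothetical helper."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("page must begin with a '---' frontmatter fence")
    try:
        end = lines.index("---", 1)  # closing fence
    except ValueError:
        raise ValueError("frontmatter is never closed") from None

    meta: dict = {}
    current_list = None
    for line in lines[1:end]:
        if line.startswith("  - ") and current_list is not None:
            # List item belonging to the most recent list-valued key.
            current_list.append(line[4:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value in ("", "[]"):
                # 'sources:' followed by items, or the explicit empty list.
                current_list = meta[key.strip()] = []
            else:
                meta[key.strip()] = value
                current_list = None
    return meta, "\n".join(lines[end + 1:])


def validate_frontmatter(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the frontmatter passes."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]
    if meta.get("type") not in PAGE_TYPES:
        problems.append(f"unknown type: {meta.get('type')!r}")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", str(meta.get("last_updated", ""))):
        problems.append("last_updated must be YYYY-MM-DD")
    return problems
```

Run both helpers over each page during the lint pass and record any non-empty problem list in the report.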

sources field semantics:

  • For type: source pages (sources/code/, sources/configs/, sources/docs/, sources/prs/, sources/commits/): list the immutable upstream artefact(s) that the page summarises. Acceptable forms include repo-relative paths to Python or YAML files (e.g. market_insights_models/src/commodity_hindcast/configs/soybeans_bra.yaml), GitHub PR URLs (e.g. https://github.com/treefera/treefera-market-insights/pull/372), and external URLs. Use sources: [] only when the page is a thematic synthesis of many upstream artefacts with no single source (e.g. commits/timeline.md).
  • For all other page types (entity, concept, pipeline, synthesis, schema, plan): list relative paths to wiki source pages under sources/. These pages should not reference raw repo paths directly — go through the source-page wrapper.
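
The two rules above can be approximated with a small heuristic. This is a sketch under stated assumptions: check_sources is a hypothetical name, and the test for "wiki source page" (a .md path with a sources directory component) is an assumption chosen for illustration, since resolving a relative path properly needs the location of the citing page.

```python
from pathlib import PurePosixPath


def check_sources(page_type: str, sources: list[str]) -> list[str]:
    """Flag sources entries that break the rule for the given page type."""
    if page_type == "source":
        # Source pages may cite repo paths, PR URLs, or external URLs directly.
        return []

    problems = []
    for entry in sources:
        if entry.startswith(("http://", "https://")):
            problems.append(f"URL cited directly; wrap it in a source page: {entry}")
            continue
        path = PurePosixPath(entry)
        # Heuristic: a wiki source page is a markdown file under sources/.
        if path.suffix != ".md" or "sources" not in path.parts:
            problems.append(f"not a wiki source page: {entry}")
    return problems
```

An entity page citing ../sources/code/example.md (a hypothetical page name) passes, while the same page citing a raw YAML config path is flagged.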

Hard formatting rules:

  • No --- standalone-line dividers in the body. Frontmatter --- at top and bottom is fine.
  • British English (programme, behaviour, organise, modelling, optimise).
  • Code blocks tagged with language. Use text for ASCII trees, formula blocks, and tabular layouts.
  • Cross-references use relative markdown links: [Commodity](../entities/Commodity.md).
  • File-and-line citations to Python source use path/to/file.py:LINE when the file is referenced for the first time on a page or the directory context is ambiguous. A short basename form (run_fit.py:55) is acceptable for repeat references on the same page once the full path has been established, or inside a section that anchors the directory in prose (e.g. "in stages/, run_fit.py:55 …"). Prefer the full path when in doubt.
  • YAML config citations may use either the line-number form (configs/wheat_usa.yaml:42) or a key-path form (configs/wheat_usa.yaml:commodity.builders.yields.filepath). Key paths are preferred when the value being cited is itself a config key whose location may shift across edits, since key paths are line-number-stable.
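
Two of the rules above lend themselves to automated checks: the citation forms and the ban on standalone dividers. The sketch below is illustrative only. The regular expressions are assumptions about what counts as a path or a key, and treating a divider inside a fenced code block as acceptable is also an assumption rather than something this schema states.

```python
import re
from typing import Optional

# path/to/file.py:LINE or configs/name.yaml:LINE
LINE_CITATION = re.compile(r"[\w./-]+\.(?:py|ya?ml):\d+")
# configs/name.yaml:dotted.key.path
KEY_PATH_CITATION = re.compile(r"[\w./-]+\.ya?ml:[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*")


def classify_citation(text: str) -> Optional[str]:
    """Return 'line', 'key-path', or None if the text is not a citation."""
    if LINE_CITATION.fullmatch(text):
        return "line"
    if KEY_PATH_CITATION.fullmatch(text):
        return "key-path"
    return None


def has_body_divider(body: str) -> bool:
    """Report whether the body contains a standalone '---' outside code fences."""
    in_fence = False
    for line in body.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced block
        elif not in_fence and line.strip() == "---":
            return True
    return False
```

Pass has_body_divider the page body only, with the frontmatter already removed, so that the frontmatter fences are not reported.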

index.md

Catalogue of every page in the wiki. Organised by category. Two acceptable layouts:

  • Bullet form (default): - [Title](relative/path.md) — one-line hook (≤150 chars).
  • Table form: a markdown table with at minimum Page and Description columns, used when extra structured columns (e.g. Date, Theme, Status) genuinely aid navigation. Tables must still link each page in the first column and keep the description ≤150 chars.

Pick whichever shape makes the index more useful for the LLM scanning it; do not mix forms within a single category. Update the index on every ingest and after each lint pass.
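
A bullet-form entry can be checked against the shape and the 150-character limit. This is a rough sketch: check_index_bullet is a hypothetical helper, and the pattern assumes exactly one space on either side of the dash, which is written as the escape \u2014 in the code.

```python
import re

MAX_HOOK_CHARS = 150

# "- [Title](relative/path.md) <dash> hook"; \u2014 is the dash used by the bullet form.
BULLET = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<path>[^)\s]+\.md)\) \u2014 (?P<hook>.+)$"
)


def check_index_bullet(line: str) -> list[str]:
    """Return problems found in one bullet-form index entry."""
    match = BULLET.match(line)
    if match is None:
        return ["not in bullet form"]
    hook = match.group("hook")
    if len(hook) > MAX_HOOK_CHARS:
        return [f"hook is {len(hook)} chars (limit {MAX_HOOK_CHARS})"]
    return []
```

Table-form categories need a separate check; this sketch covers the default bullet form only.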

log.md

Append-only chronological record. Each entry begins with a line of the form:

## [YYYY-MM-DD] {ingest|query|lint} | <short title>

This makes the log greppable: grep "^## \[" log.md | tail -n 10.
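
The heading format can be produced and parsed programmatically. The following is a minimal sketch with hypothetical helper names (format_log_heading, recent_headings); the second function mirrors the grep pipeline in plain Python.

```python
import datetime as dt
import re

LOG_KINDS = ("ingest", "query", "lint")
LOG_HEADING = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (ingest|query|lint) \| (.+)$")


def format_log_heading(kind: str, title: str, day: dt.date) -> str:
    """Build the first line of a log entry in the required format."""
    if kind not in LOG_KINDS:
        raise ValueError(f"kind must be one of {LOG_KINDS}, got {kind!r}")
    return f"## [{day.isoformat()}] {kind} | {title.strip()}"


def recent_headings(log_text: str, n: int = 10) -> list[str]:
    """Return the last n entry headings, like the grep and tail pipeline."""
    headings = [line for line in log_text.splitlines() if line.startswith("## [")]
    return headings[-n:]
```

Matching a line against LOG_HEADING during the lint pass is one way to catch entries whose heading has drifted from the format.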

Workflows

Ingest a new source

  1. Read the source.
  2. Write a sources/<category>/<name>.md page with frontmatter + faithful summary + key quotes.
  3. Update entity / concept / pipeline pages that the source touches; add sources entries to their frontmatter.
  4. Add the new page to index.md under its category and append an entry to log.md.

Answer a query

  1. Read index.md to locate relevant pages.
  2. Read the relevant pages.
  3. Synthesise an answer with citations.
  4. If the answer is novel and useful, file it back into the wiki.

Lint pass

  1. Find orphans (pages with no inbound link).
  2. Find broken relative links.
  3. Find contradictions between pages.
  4. Find concepts mentioned without their own page.
  5. Produce LINT_REPORT.md. Apply fixes.
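
Steps 1 and 2 of the lint pass can be sketched over an in-memory mapping of wiki-relative page paths to page text. This is an illustration under assumptions, not a prescribed implementation: lint_links is a hypothetical name, only markdown links ending in .md are considered, and index.md is treated as the root and so is never reported as an orphan.

```python
import posixpath
import re

# Captures the target of [text](target.md) or [text](target.md#anchor).
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def link_targets(page_path: str, text: str) -> set[str]:
    """Resolve every relative .md link in a page to a wiki-relative path."""
    base = posixpath.dirname(page_path)
    return {
        posixpath.normpath(posixpath.join(base, href))
        for href in LINK.findall(text)
        if "://" not in href  # skip absolute URLs
    }


def lint_links(pages: dict[str, str]) -> tuple[set[str], set[tuple[str, str]]]:
    """Return (orphans, broken), where broken holds (page, missing target) pairs."""
    linked: set[str] = set()
    broken: set[tuple[str, str]] = set()
    for path, text in pages.items():
        for target in link_targets(path, text):
            if target not in pages:
                broken.add((path, target))
            elif target != path:  # a self-link does not count as inbound
                linked.add(target)
    orphans = set(pages) - linked - {"index.md"}
    return orphans, broken
```

Contradictions between pages and concepts that lack their own page (steps 3 and 4) need reading comprehension rather than pattern matching, so they are left out of this sketch.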

Constraints

  • No edits to source code or to the existing in-package domain model.
  • No commits, pushes, or PR creation by any agent reading this file.
  • No GitHub releases.
  • No Claude attribution.