Source: features/builders/README.md — Builder Protocol

What it is

A short (15-line) specification of the builder protocol that every build_X.py module under features/builders/ must satisfy. It defines the calling contract, the required output schema, the validation hook, and the supporting infrastructure (shared core, registry).

Section-by-section summary

Protocol contract

Every build_X.py must:

  • Accept the arguments (path, cfg, years).
  • Return a DataFrame.
  • Include the join-key columns ("year", "geo_identifier", "init_date") in that DataFrame.
  • Return no duplicate join keys: each (year, geo_identifier, init_date) combination appears at most once.
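A minimal sketch of a builder that satisfies this contract is shown here. The name build_example, the feature column example_feature, and the shape of cfg (a dict with "init_date" and "geo_identifiers" entries) are assumptions made for illustration; the README only fixes the argument list and the three join-key columns.

```python
import pandas as pd

# The three join-key columns named by the protocol.
JOIN_KEYS = ("year", "geo_identifier", "init_date")


def build_example(path, cfg, years):
    """Hypothetical builder: one row per (year, geo_identifier, init_date).

    A real builder would load data from `path`; this sketch generates
    placeholder rows in memory so that it runs without any input files.
    `cfg` is assumed to be a dict; its real type is not stated in the README.
    """
    rows = [
        {
            "year": year,
            "geo_identifier": geo,
            "init_date": cfg["init_date"],
            "example_feature": float(year % 10),  # placeholder feature value
        }
        for year in years
        for geo in cfg["geo_identifiers"]
    ]
    # Fix the column order so the join keys come first.
    return pd.DataFrame(rows, columns=[*JOIN_KEYS, "example_feature"])
```

Because each (year, geo_identifier) pair is generated exactly once and init_date is constant, the result cannot contain duplicate join keys as long as years and cfg["geo_identifiers"] themselves contain no repeats.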

Validation

validate_builder_output() checks every builder's output. It is called inside run_builder() in src/features/builders/registry.py, and run_builder() is in turn called from src/features/run.py.
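The checks implied by the protocol contract could look roughly like the sketch below. Only the function name comes from the README; the signature, the name parameter, the exception types, and the error messages are assumptions, and the real implementation may check more than this.

```python
import pandas as pd

JOIN_KEYS = ["year", "geo_identifier", "init_date"]


def validate_builder_output(df, name="builder"):
    """Sketch of the validation hook (signature is assumed, not confirmed)."""
    # The protocol requires a DataFrame.
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"{name}: expected a DataFrame, got {type(df).__name__}")

    # All three join-key columns must be present.
    missing = [key for key in JOIN_KEYS if key not in df.columns]
    if missing:
        raise ValueError(f"{name}: missing join-key columns {missing}")

    # No two rows may share the same join key.
    dupes = df.duplicated(subset=JOIN_KEYS, keep=False)
    if dupes.any():
        raise ValueError(f"{name}: {int(dupes.sum())} rows share a join key")

    return df
```

Returning the DataFrame unchanged lets a caller such as run_builder() wrap the builder call in the validator without an extra variable.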

Shared functionality

Shared builder utilities live in core.py.

Registry

A registry of all builders is defined in registry.py. It maps builder names (plain strings) to the builder functions they dispatch to. This implements the DESIGN.md clause requiring registry-based dispatch rather than if builder_type == "..." chains.
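Registry-based dispatch of this kind can be sketched as a dictionary plus a lookup. The names BUILDERS and register are hypothetical, and the actual layout of registry.py may differ; run_builder is named in the README, but its signature here is assumed and the validation call it makes is omitted to keep the sketch dependency-free.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: builder name -> builder function.
BUILDERS: Dict[str, Callable[..., Any]] = {}


def register(name):
    """Decorator that adds a builder function to the registry under `name`."""
    def decorator(fn):
        if name in BUILDERS:
            raise ValueError(f"builder {name!r} is already registered")
        BUILDERS[name] = fn
        return fn
    return decorator


@register("example")
def build_example(path, cfg, years):
    # Placeholder rows; a real builder returns a DataFrame.
    return [(year, "geo-0", "init-0") for year in years]


def run_builder(name, path, cfg, years):
    """Look the builder up by name instead of branching on builder_type."""
    try:
        builder = BUILDERS[name]
    except KeyError:
        raise KeyError(
            f"unknown builder {name!r}; registered: {sorted(BUILDERS)}"
        ) from None
    return builder(path, cfg, years)
```

With this shape, adding a builder is a single registration and no existing dispatch code changes, which is the property the if/elif chain lacks.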

Status note

The README includes an honest status caveat:

"TODO: this is currently agentically written / extracted from QUBE-sprint. we need to go through this cleanly and clearly to ensure we are happy with this logic."

This indicates the builder protocol documentation and possibly implementation were generated during an agentic session and have not been fully reviewed.

Notable claims (the load-bearing ones)

  • The three columns (year, geo_identifier, init_date) form the canonical join key for all builder outputs. Key values must match exactly across builders for the inner join in assemble() to produce correct results.
  • No duplicate keys are permitted — this is validated by validate_builder_output() before the builder result is used.
  • registry.py is the single dispatch point: adding a new builder means registering it in the registry, not adding a branch elsewhere.
  • interface.py (referenced by DESIGN.md Clause 18) defines the formal AbstractBuilder protocol that build_X.py modules satisfy.
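To illustrate why the join keys must match exactly, the sketch below inner-joins two small builder outputs with pandas. It is not the project's assemble() implementation, which this README does not describe; the frames and the feature_a and feature_b columns are invented for the example.

```python
import pandas as pd

JOIN_KEYS = ["year", "geo_identifier", "init_date"]

left = pd.DataFrame({
    "year": [2020, 2021],
    "geo_identifier": ["A", "A"],
    "init_date": ["06-01", "06-01"],
    "feature_a": [1.0, 2.0],
})
right = pd.DataFrame({
    "year": [2020, 2021],
    "geo_identifier": ["A", "a"],  # case mismatch on the 2021 row
    "init_date": ["06-01", "06-01"],
    "feature_b": [10.0, 20.0],
})

# An inner join keeps only keys present in both frames, so the mismatched
# 2021 row is dropped without any error. validate="one_to_one" makes pandas
# raise a MergeError if either side has duplicate join keys.
joined = left.merge(right, on=JOIN_KEYS, how="inner", validate="one_to_one")
```

Here joined contains only the 2020 row. A key that differs in case, dtype, or formatting between builders removes rows silently, which is why the key columns need to agree exactly and why duplicates are rejected before the join.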

What this document is NOT

This README does not describe the assemble step; that is covered by features/README.md. It also does not specify the column schemas of each individual builder's output beyond the three join keys.

Cross-references