Source: features/builders/README.md — Builder Protocol¶
What it is¶
A short (15-line) specification of the builder protocol that every build_X.py module under features/builders/ must satisfy. It defines the calling contract, the required output schema, the validation hook, and the supporting infrastructure (shared core, registry).
Section-by-section summary¶
Protocol contract¶
Every build_X.py must:
- Accept arguments (path, cfg, years).
- Return a DataFrame.
- The DataFrame must have the following columns as join keys: ("year", "geo_identifier", "init_date").
- The DataFrame must not have duplicate keys.
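To make the contract concrete, here is a minimal sketch of a module that would satisfy it. The README does not specify the types of path, cfg, or years, so the function name build_example, the cfg keys, and the example_feature column are all hypothetical, and the sketch assumes the DataFrame is a pandas DataFrame.

```python
from pathlib import Path

import pandas as pd

# The three join keys named by the protocol.
JOIN_KEYS = ("year", "geo_identifier", "init_date")


def build_example(path, cfg, years):
    """Hypothetical builder: returns one row per (year, geo_identifier, init_date).

    `path` is accepted to match the calling contract but is not read here;
    a real builder would presumably load its raw inputs from it.
    The cfg keys used below are assumptions made for this sketch.
    """
    rows = [
        {
            "year": year,
            "geo_identifier": geo,
            "init_date": cfg["init_date"],
            "example_feature": float(year % 100),
        }
        for year in years
        for geo in cfg["geo_identifiers"]
    ]
    # Passing columns explicitly keeps the schema stable even when `years` is empty.
    return pd.DataFrame(rows, columns=[*JOIN_KEYS, "example_feature"])


df = build_example(
    Path("data/raw"),
    {"init_date": "06-01", "geo_identifiers": ["A", "B"]},
    [2020, 2021],
)
```

Each combination of year and geo_identifier appears exactly once for the single init_date, so the output has no duplicate keys by construction.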
Validation¶
validate_builder_output() validates all builder outputs. It is called inside run_builder() in src/features/builders/registry.py, which itself is called from src/features/run.py.
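The README does not show the body of validate_builder_output(), so the following is only a sketch of the checks the protocol implies: the return type, the presence of the three join keys, and the absence of duplicate keys. The function name validate_builder_output_sketch, its signature, and its error messages are assumptions, not the project's actual code.

```python
import pandas as pd

JOIN_KEYS = ["year", "geo_identifier", "init_date"]


def validate_builder_output_sketch(df, builder_name):
    """Hypothetical stand-in for validate_builder_output().

    Raises if the builder output breaks the protocol; returns df unchanged otherwise.
    """
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"{builder_name}: expected a DataFrame, got {type(df).__name__}")

    missing = [key for key in JOIN_KEYS if key not in df.columns]
    if missing:
        raise ValueError(f"{builder_name}: missing join key columns {missing}")

    # keep=False marks every row that takes part in a duplicate, not just the repeats.
    duplicated = df.duplicated(subset=JOIN_KEYS, keep=False)
    if duplicated.any():
        raise ValueError(
            f"{builder_name}: {int(duplicated.sum())} rows share a join key"
        )
    return df


good = pd.DataFrame(
    {
        "year": [2020, 2021],
        "geo_identifier": ["A", "A"],
        "init_date": ["06-01", "06-01"],
        "feature": [1.0, 2.0],
    }
)
validate_builder_output_sketch(good, "example")
```

Failing loudly at this point, before the result is joined with other builders, keeps a malformed output from silently corrupting the assembled table.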
Shared functionality¶
Shared builder utilities live in core.py.
Registry¶
A registry of all builders is defined in registry.py, mapping simple strings to the functions they call. This implements the DESIGN.md clause requiring registry-based dispatch rather than if builder_type == "..." chains.
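Registry-based dispatch can be sketched as a plain dictionary lookup. The builder names ("weather", "soil"), the BUILDERS variable, and run_builder_sketch below are hypothetical; the real registry.py may be organised differently. The stand-in builders return lists instead of DataFrames to keep the sketch dependency-free.

```python
from typing import Any, Callable, Dict


def build_weather(path, cfg, years):
    # Stand-in body: a real builder would return a DataFrame.
    return [("weather", year) for year in years]


def build_soil(path, cfg, years):
    # Stand-in body: a real builder would return a DataFrame.
    return [("soil", year) for year in years]


# Hypothetical registry: simple string names mapped to the functions they call.
BUILDERS: Dict[str, Callable[..., Any]] = {
    "weather": build_weather,
    "soil": build_soil,
}


def run_builder_sketch(name, path, cfg, years):
    """Look the builder up by name instead of branching on builder_type."""
    try:
        builder = BUILDERS[name]
    except KeyError:
        known = sorted(BUILDERS)
        raise KeyError(f"unknown builder {name!r}; registered: {known}") from None
    return builder(path, cfg, years)


result = run_builder_sketch("soil", None, {}, [2020, 2021])
```

With this shape, adding a builder is a one-line change to the dictionary, and the dispatch function never needs to be edited.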
Status note¶
The README includes an honest status caveat:
"TODO: this is currently agentically written / extracted from QUBE-sprint. we need to go through this cleanly and clearly to ensure we are happy with this logic."
This indicates that the builder protocol documentation, and possibly the implementation, was generated during an agentic session and has not been fully reviewed.
Notable claims (the load-bearing ones)¶
- The three-column join key (year, geo_identifier, init_date) is the canonical join key for all builder outputs: it must match exactly for the inner join in assemble() to produce correct results.
- No duplicate keys are permitted: this is validated by validate_builder_output() before the builder result is used.
- registry.py is the single dispatch point: adding a new builder means registering it in the registry, not adding a branch elsewhere.
- interface.py (referenced by DESIGN.md Clause 18) defines the formal AbstractBuilder protocol that build_X.py modules satisfy.
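The first two claims are easier to see with a small example. The sketch below assumes assemble() performs a pandas inner merge on the three keys (the README only says it is an inner join); the frames and feature columns are invented for illustration. It shows that a duplicate key inflates the row count and that a mismatched key silently drops a row.

```python
import pandas as pd

KEYS = ["year", "geo_identifier", "init_date"]

a = pd.DataFrame(
    {
        "year": [2020, 2021],
        "geo_identifier": ["A", "A"],
        "init_date": ["06-01", "06-01"],
        "feat_a": [1.0, 2.0],
    }
)

# Second builder output with a duplicated key for 2020.
b_dup = pd.DataFrame(
    {
        "year": [2020, 2020, 2021],
        "geo_identifier": ["A", "A", "A"],
        "init_date": ["06-01", "06-01", "06-01"],
        "feat_b": [10.0, 11.0, 20.0],
    }
)

# Second builder output whose init_date disagrees for 2021.
b_mismatch = pd.DataFrame(
    {
        "year": [2020, 2021],
        "geo_identifier": ["A", "A"],
        "init_date": ["06-01", "06-15"],
        "feat_b": [10.0, 20.0],
    }
)

inflated = a.merge(b_dup, on=KEYS, how="inner")       # 3 rows: the 2020 key matched twice
dropped = a.merge(b_mismatch, on=KEYS, how="inner")   # 1 row: the 2021 key did not match

# pandas can also enforce key uniqueness at merge time.
try:
    a.merge(b_dup, on=KEYS, how="inner", validate="one_to_one")
    merge_rejected = False
except pd.errors.MergeError:
    merge_rejected = True
```

Neither failure raises an error in a plain inner merge, which is why the uniqueness check has to happen before the join.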
What this document is NOT¶
This README does not describe the assemble step, which is covered in features/README.md. It also does not specify the column schemas of each individual builder's output beyond the three join keys.
Cross-references¶
- features_README.md: build_features orchestrator and assemble finaliser
- DESIGN.md: Clause 20 (registry-based dispatch) and Clause 18 (interface.py contract)
- in_package_DOMAIN_MODEL.md: Builder value-object union in the entity catalogue