DeliveryRow

Definition

DeliveryRow is a frozen Pydantic model that represents one row in the client-facing hindcast or forecast delivery CSV. It is the typed contract at the delivery boundary: every value written to a Treefera_*_Hindcast_*.csv or Treefera_*_Forecast_*.csv must pass through a DeliveryRow constructor. All yield values are in delivery units (bu/ac for grains; lbs/ac for cotton) — the conversion from internal kg/ha has already occurred before construction.
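
As an illustration of the conversion that happens before construction, here is a minimal sketch. The physical constants and bushel weights are standard US values, but the function name, the lookup table, and the commodity spellings are hypothetical; whether the pipeline uses exactly these constants is an assumption.

```python
# Illustrative kg/ha -> delivery-unit conversion (not the pipeline's own code).
LB_PER_KG = 2.20462262
ACRES_PER_HECTARE = 2.47105381

# Hypothetical lookup: pounds per bushel for bushel-denominated crops.
LB_PER_BUSHEL = {"CORN": 56.0, "SOYBEANS": 60.0, "WHEAT": 60.0}


def kg_ha_to_delivery_units(commodity: str, kg_per_ha: float) -> float:
    """Return bu/ac for grains and lbs/ac for cotton."""
    name = commodity.upper()
    lb_per_acre = kg_per_ha * LB_PER_KG / ACRES_PER_HECTARE
    if name == "COTTON":
        # Cotton is delivered in lbs/ac, so no bushel weight applies.
        return lb_per_acre
    return lb_per_acre / LB_PER_BUSHEL[name]


corn_bu_ac = kg_ha_to_delivery_units("corn", 10_000.0)      # roughly 159.3 bu/ac
cotton_lb_ac = kg_ha_to_delivery_units("cotton", 1_000.0)   # roughly 892.2 lbs/ac
```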

Kind

Value object (frozen Pydantic BaseModel, extra="forbid"). There is one class for all ADM levels. ADM0, ADM1, and ADM2 rows share the same schema; the geo_identifier field carries the level information (e.g. "adm0:usa" vs "adm1:usa/Iowa" vs "adm2:usa/Iowa/Story"). An earlier design considered separate ADM0Row, ADM1Row, ADM2Row classes; the code never implemented them.
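
Because the level lives in the identifier rather than in the class, consumers have to parse it. The helper below is a hypothetical sketch, not part of the codebase; the rule that an ADMn identifier carries n + 1 path segments is inferred only from the three examples above.

```python
def adm_level(geo_identifier: str) -> int:
    """Return 0, 1 or 2 from an identifier such as 'adm1:usa/Iowa'."""
    prefix, sep, path = geo_identifier.partition(":")
    if not sep or prefix not in {"adm0", "adm1", "adm2"} or not path:
        raise ValueError(f"unrecognised geo_identifier: {geo_identifier!r}")
    level = int(prefix[-1])
    # Assumption: country, then state, then county, one segment per level.
    if len(path.split("/")) != level + 1:
        raise ValueError(f"{geo_identifier!r} does not look like an ADM{level} path")
    return level


levels = [adm_level(g) for g in ("adm0:usa", "adm1:usa/Iowa", "adm2:usa/Iowa/Story")]
```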

Source of truth

delivery/schemas.py:109 (class DeliveryRow).

Key attributes

Field table

| Field | Type | Optional | Unit | Semantics |
| --- | --- | --- | --- | --- |
| commodity | str | No | | Upper-cased commodity name (e.g. "CORN") |
| year | int | No | | Harvest season year |
| init_date | str | No | | ISO-8601 date YYYY-MM-DD; the forecast issue date |
| geo_identifier | str | No (default "adm0:usa") | | Canonical ADM identifier; encodes level by prefix |
| variable | str | No (default "yield_bu_acre") | | Output variable name |
| model | str | No (default "commodity_hindcast") | | Model identifier for the warehouse |
| mean | float | No | bu/ac (or lbs/ac for cotton) | Area-weighted predicted yield in delivery units |
| weather_correction_bu_ac | float | No | bu/ac always | Detrended component sim_yield_kg_ha_detrended × scale; never converted back to kg/ha in export (see PR-331) |
| nass_actual | float \| None | Yes | bu/ac | Area-weighted observed NASS yield from predicted counties (model-subset) |
| nass_actual_area_weighted_all | float \| None | Yes | bu/ac | Full-universe NASS area-weighted yield (closest to USDA headline figure; populated at ADM0 only) |
| nass_actual_prod_div_area_all | float \| None | Yes | bu/ac | Full-universe production ÷ area (ADM0 only) |
| wasde_in_season | float \| None | Yes | bu/ac | As-of-init_date WASDE in-season estimate |
| conab_final_in_season | float \| None | Yes | bu/ac | As-of-init_date CONAB final estimate (Brazil soybean) |
| conab_lev_in_season | float \| None | Yes | bu/ac | As-of-init_date CONAB levantamento estimate (Brazil soybean) |
| lower_50 … lower_95 | float \| None | Yes | bu/ac | Conformal lower CI bands at the 50/68/80/90/95 % levels |
| upper_50 … upper_95 | float \| None | Yes | bu/ac | Conformal upper CI bands at the 50/68/80/90/95 % levels |

Field count: 24 (6 identity + 1 mean + 1 weather correction + 3 NASS benchmarks + 3 in-season references + 10 CI band columns: 5 lower + 5 upper).

extra="forbid" strictness

ConfigDict(frozen=True, extra="forbid") is set at schemas.py:126. This is load-bearing: walk_forward_preds_to_delivery_rows materialises one column per cfg.reference_data spec named f"{spec.name}_in_season". Only wasde_in_season, conab_final_in_season, and conab_lev_in_season are declared. Without forbid, Pydantic's default extra="ignore" would silently discard any benchmark from a spec whose YAML name does not match a declared field. With forbid, the same situation raises a ValidationError immediately.
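
The difference between the two modes can be seen in a small standalone sketch. It assumes Pydantic v2; LenientRow, StrictRow, and the column name usda_in_season are made up for illustration and do not exist in the codebase.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class LenientRow(BaseModel):
    # No `extra` setting: Pydantic falls back to extra="ignore".
    model_config = ConfigDict(frozen=True)
    mean: float
    wasde_in_season: Optional[float] = None


class StrictRow(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid")
    mean: float
    wasde_in_season: Optional[float] = None


# "usda_in_season" stands in for a reference-data spec whose YAML name
# does not match any declared field.
payload = {"mean": 180.0, "usda_in_season": 179.5}

lenient = LenientRow(**payload)
silently_dropped = not hasattr(lenient, "usda_in_season")  # benchmark is gone

try:
    StrictRow(**payload)
    rejected = False
except ValidationError:
    rejected = True  # the misnamed column is caught at construction
```

With the lenient model the benchmark value disappears without any signal; with the strict model the mismatch surfaces on the first row.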

Lifecycle

  1. Assembly: walk_forward_preds_to_delivery_rows (delivery/conversions.py) aggregates the walk-forward predictions to the requested ADM level, converts units, attaches conformal half-widths, and constructs DeliveryRow objects. Field validators and model validators fire at construction.
  2. Validation: Pydantic runs _validate_init_date_format (field validator), then _validate_ci_ordering and _validate_init_date_year (model validators) on every row.
  3. Collection: Validated rows are collected into a list[DeliveryRow] passed to HindcastDelivery.
  4. Serialisation: delivery_to_dataframe (delivery/conversions.py) converts the list to a Polars DataFrame in canonical column order from build_delivery_column_order.
  5. Persistence: Written to a CSV under run_dir/delivery/ (hindcast) or run_dir/forecast/{season_year}/{init_date}/delivery/ (forecast).
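
The two persistence locations in the last step can be sketched with pathlib. The helper name delivery_dir and its signature are hypothetical; only the directory layout comes from the description above.

```python
from pathlib import Path
from typing import Optional


def delivery_dir(
    run_dir: Path,
    season_year: Optional[int] = None,
    init_date: Optional[str] = None,
) -> Path:
    """Return the delivery directory for a hindcast or a forecast run."""
    if season_year is None and init_date is None:
        # Hindcast: run_dir/delivery/
        return run_dir / "delivery"
    if season_year is None or init_date is None:
        raise ValueError("a forecast needs both season_year and init_date")
    # Forecast: run_dir/forecast/{season_year}/{init_date}/delivery/
    return run_dir / "forecast" / str(season_year) / init_date / "delivery"


hindcast_dir = delivery_dir(Path("run"))
forecast_dir = delivery_dir(Path("run"), season_year=2024, init_date="2024-06-01")
```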

CI ordering invariant

The model validator at schemas.py:176 enforces that all present (non-None) CI bands nest correctly around the mean:

```python
@model_validator(mode="after")
def _validate_ci_ordering(self) -> DeliveryRow:
    """Enforce lower_95 <= ... <= mean <= ... <= upper_95 for present bands."""
    chain: list[tuple[str, float]] = []
    for field_name in _CI_LOWER_FIELDS:
        val = getattr(self, field_name)
        if val is not None:
            chain.append((field_name, val))
    chain.append(("mean", self.mean))
    for field_name in _CI_UPPER_FIELDS:
        val = getattr(self, field_name)
        if val is not None:
            chain.append((field_name, val))

    for i in range(len(chain) - 1):
        name_a, val_a = chain[i]
        name_b, val_b = chain[i + 1]
        if val_a > val_b:
            msg = f"CI ordering violation: {name_a}={val_a:.3f} > {name_b}={val_b:.3f}"
            raise ValueError(msg)
    return self
```

_CI_LOWER_FIELDS is ordered outermost-to-innermost (lower_95, lower_90, …, lower_50) and _CI_UPPER_FIELDS is ordered innermost-to-outermost (upper_50, …, upper_95), so the assembled chain is non-decreasing from left to right when the bands nest correctly. Absent bands are skipped, so partial CI subsets (e.g. only the 90/95 levels) still validate. A violation raises ValueError inside the validator, which Pydantic reports as a ValidationError when the row is constructed, so the invariant cannot be silently violated.
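
To show how the chain behaves with partial bands, here is a standalone restatement of the same check outside Pydantic. The function name check_ci_ordering and the dict-based arguments are illustrative assumptions; only the ordering logic and the error message format mirror the validator above.

```python
from typing import Dict, List, Optional, Tuple

CI_LEVELS = (50, 68, 80, 90, 95)


def check_ci_ordering(
    mean: float,
    lower: Dict[int, Optional[float]],
    upper: Dict[int, Optional[float]],
) -> None:
    """Raise ValueError unless present bands nest around the mean."""
    chain: List[Tuple[str, float]] = []
    for level in reversed(CI_LEVELS):  # lower_95 first, lower_50 last
        val = lower.get(level)
        if val is not None:
            chain.append((f"lower_{level}", val))
    chain.append(("mean", mean))
    for level in CI_LEVELS:  # upper_50 first, upper_95 last
        val = upper.get(level)
        if val is not None:
            chain.append((f"upper_{level}", val))
    # Every adjacent pair in the chain must be non-decreasing.
    for (name_a, val_a), (name_b, val_b) in zip(chain, chain[1:]):
        if val_a > val_b:
            raise ValueError(
                f"CI ordering violation: {name_a}={val_a:.3f} > {name_b}={val_b:.3f}"
            )


# Only the 90/95 levels are present: this passes.
check_ci_ordering(180.0, {90: 170.0, 95: 165.0}, {90: 190.0, 95: 196.0})

# lower_90 sits below lower_95, so the bands do not nest.
try:
    check_ci_ordering(180.0, {90: 160.0, 95: 165.0}, {})
except ValueError as exc:
    message = str(exc)
```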

Validators summary

| Validator | Kind | Location | Invariant |
| --- | --- | --- | --- |
| _validate_init_date_format | field validator | schemas.py:164 | init_date matches ^\d{4}-\d{2}-\d{2}$ |
| _validate_ci_ordering | model validator | schemas.py:176 | lower_95 ≤ … ≤ lower_50 ≤ mean ≤ upper_50 ≤ … ≤ upper_95 for present bands |
| _validate_init_date_year | model validator | schemas.py:197 | init_date calendar year in [year − 10, year + 1]; long-range horizon constant LONG_RANGE_HORIZON_YEARS = 10 at schemas.py:106 |
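
The two init_date invariants can be sketched together as a plain function. The name check_init_date and the error messages are hypothetical; the regex and the [year − 10, year + 1] window come from the table. Note that the pattern checks shape only, so a string such as "2024-13-45" would match it; whether the real validator also parses the date is not stated here.

```python
import re

LONG_RANGE_HORIZON_YEARS = 10
_ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_init_date(init_date: str, year: int) -> None:
    """Raise ValueError if init_date is malformed or outside the year window."""
    if not _ISO_DATE.match(init_date):
        raise ValueError(f"init_date must be YYYY-MM-DD, got {init_date!r}")
    init_year = int(init_date[:4])
    earliest = year - LONG_RANGE_HORIZON_YEARS
    latest = year + 1
    if not earliest <= init_year <= latest:
        raise ValueError(
            f"init_date year {init_year} outside [{earliest}, {latest}] for season {year}"
        )


check_init_date("2024-06-01", 2024)  # same-season forecast
check_init_date("2014-06-01", 2024)  # oldest year the window allows
```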

Relationships

  • Contained by: HindcastDelivery — holds list[DeliveryRow]
  • Produced by: delivery/conversions.py:walk_forward_preds_to_delivery_rows
  • Serialised by: delivery/conversions.py:delivery_to_dataframe
  • Column order governed by: schemas.py:build_delivery_column_order (prefix + CI columns + suffix)

Concepts and pipelines

  • Delivery pipeline — end-to-end CSV production
  • Unit conventions: all delivery-boundary fields in bu/ac; weather_correction_bu_ac stays in bu/ac even during warehouse re-ingestion (export.py:56)
  • nass_actual_area_weighted_all was added alongside weather_correction_bu_ac to expose the full-universe NASS figure (see PR-340)

PRs and commits

  • PR-331 — introduced weather_correction_bu_ac as a required float (previously float | None, always null); added P90 CI bands to wheat, cotton, and soybean configs
  • PR-340 — dashboard changes that surfaced nass_actual_area_weighted_all side-by-side with nass_actual

Open questions

  • nass_actual_area_weighted_all and nass_actual_prod_div_area_all are only meaningful at ADM0 level but the schema permits them at ADM1/ADM2. No validator currently enforces that they are None below national level.
  • variable defaults to "yield_bu_acre" even for cotton (which uses lbs/ac). Whether the default should be overridden per commodity is unresolved in the codebase.