Entity: InitDate

Definition

A specific calendar date on which a within-season crop forecast is issued, stored as an ISO YYYY-MM-DD string in delivery CSVs and as a column in parquet feature tables. Features included in pred.parquet are known up to init_date − lag_days (default lag_days = 1). In hindcast mode, init dates form a weekly grid derived from CommodityConfig.hindcast_init_season_doys; in forecast mode a single runtime-injected date overrides the grid.
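The visibility rule above reduces to simple date arithmetic. The following is a minimal sketch, not the pipeline's code: the helper name last_visible_day is hypothetical, and only the init_date − lag_days rule and the default of 1 come from this page.

```python
from datetime import date, timedelta


def last_visible_day(init_date: date, lag_days: int = 1) -> date:
    """Return the last observation date visible to a forecast issued on init_date.

    Hypothetical helper: it only illustrates the init_date - lag_days rule.
    """
    if lag_days < 0:
        raise ValueError("lag_days must be non-negative")
    return init_date - timedelta(days=lag_days)


# With the default lag, the init date itself is excluded.
cutoff = last_visible_day(date(2024, 7, 15))
# With lag_days=0 the init date itself is the last visible day.
same_day = last_visible_day(date(2024, 7, 15), lag_days=0)
```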

Kind

Conceptual identifier (Python datetime.date in memory; str in persisted artefacts). No dedicated class or NewType. Represented as init_date: date on ForecastSlice and as the init_date column (str) in every feature parquet and delivery CSV row.
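Because there is no dedicated type, the in-memory and persisted forms are linked only by convention. A minimal sketch of that round trip, using the standard library; the function names to_persisted and from_persisted are hypothetical and do not exist in the codebase.

```python
from datetime import date


def to_persisted(init_date: date) -> str:
    """Serialise an in-memory init date to the ISO YYYY-MM-DD string form."""
    return init_date.isoformat()


def from_persisted(value: str) -> date:
    """Parse a persisted ISO YYYY-MM-DD string back into a date."""
    return date.fromisoformat(value)


original = date(2024, 7, 5)
text = to_persisted(original)      # zero-padded month and day
restored = from_persisted(text)    # equal to the original date
```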

Source of truth

  • market_insights_models/src/commodity_hindcast/config.py:384, CommodityConfig.hindcast_init_dates(season_year): returns the full weekly grid as a list[date].
  • config.py:381, CommodityConfig.to_date(season_doy, season_year): converts a season DOY to a calendar date.
  • config.py:706, ExperimentConfig.init_dates_for(season_year): the single dispatch point; returns [forecast.init_date] in forecast mode, or the full weekly grid in hindcast mode.
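The relationship between these three entry points can be sketched as follows. This is an illustration under stated assumptions, not the code in config.py: SketchCommodityConfig, its season_start_month and season_start_day fields, and the rule that season DOY 1 falls on the season start date within season_year are all hypothetical. Only the method names and hindcast_init_season_doys come from this page.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class SketchCommodityConfig:
    """Hypothetical stand-in for CommodityConfig."""

    season_start_month: int  # assumption: not a documented field
    season_start_day: int    # assumption: not a documented field
    hindcast_init_season_doys: tuple[int, ...]

    def _season_start(self, season_year: int) -> date:
        # Assumption: season DOY 1 is this calendar date in season_year.
        return date(season_year, self.season_start_month, self.season_start_day)

    def to_date(self, season_doy: int, season_year: int) -> date:
        return self._season_start(season_year) + timedelta(days=season_doy - 1)

    def to_season_doy(self, init_date: date, season_year: int) -> int:
        return (init_date - self._season_start(season_year)).days + 1

    def hindcast_init_dates(self, season_year: int) -> list[date]:
        return [self.to_date(doy, season_year) for doy in self.hindcast_init_season_doys]


def init_dates_for(
    config: SketchCommodityConfig,
    season_year: int,
    forecast_init_date: Optional[date] = None,
) -> list[date]:
    """Single dispatch point: one injected date in forecast mode, else the grid."""
    if forecast_init_date is not None:
        return [forecast_init_date]
    return config.hindcast_init_dates(season_year)


cfg = SketchCommodityConfig(4, 1, (1, 8, 15))
hindcast_grid = init_dates_for(cfg, 2024)
forecast_only = init_dates_for(cfg, 2024, forecast_init_date=date(2024, 6, 3))
```

Routing both modes through one function means downstream loops never need to know which mode they are running in.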

Key attributes / structure

| Attribute | Type | Notes |
|---|---|---|
| Calendar date value | date (in memory) / str (persisted) | ISO YYYY-MM-DD |
| lag_days | int | Default 1; harvest-init training rows override to 0 |
| Season DOY | int | CommodityConfig.to_season_doy(init_date, season_year) is the inverse of to_date |
| Filesystem role | path component | run_dir/forecast/{season_year}/{init_date}/ (PR-369 structure) |

Cardinality: Multiple per season year in hindcast mode — typically ~30 per crop season, matching the length of hindcast_init_season_doys. One per ForecastSlice in forecast mode. In forecast mode the same init_date can coexist with multiple season_year values under the same run_dir (introduced in PR-369).
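The PR-369 directory layout explains why one init_date can coexist with several season_year values. A minimal sketch using pathlib; the helper name forecast_slice_dir and the example run directory are hypothetical.

```python
from datetime import date
from pathlib import Path


def forecast_slice_dir(run_dir: Path, season_year: int, init_date: date) -> Path:
    """Build run_dir/forecast/{season_year}/{init_date}/ for one forecast slice."""
    return run_dir / "forecast" / str(season_year) / init_date.isoformat()


run_dir = Path("runs/demo")  # hypothetical run directory
init = date(2024, 7, 15)

# The same init date under two season years resolves to two distinct directories.
near_season = forecast_slice_dir(run_dir, 2024, init)
far_season = forecast_slice_dir(run_dir, 2025, init)
```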

Lifecycle

Created: In hindcast mode, enumerated at feature-build time from CommodityConfig.hindcast_init_season_doys via init_dates_for(season_year). In forecast mode, injected at runtime via the --init-date CLI flag; build_forecast_features sets ForecastConfig.init_date.

Consumed:

  • Feature assembly: the init_date column in fit.parquet / pred.parquet controls which weather and climo observations are visible (up to init_date − lag_days).
  • Walk-forward prediction: _predict_fold_rolling iterates over init dates to accumulate predictions (run/runner.py:86).
  • Delivery: DeliveryRow.init_date (ISO string) is an identity column in every delivery CSV; validated by _validate_init_date_format and _validate_init_date_year.
  • Conformal calibration: CalibrationResult with residual_mode = "hindcast_oos_per_init_date" keys half-widths by (month, day) of the init date.
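The two delivery checks can be illustrated with a rough sketch. This is not the body of _validate_init_date_format or _validate_init_date_year: the function check_init_date is hypothetical, and the upper bound (init year no later than the season year) is an assumption. Only the ISO format and the LONG_RANGE_HORIZON_YEARS = 10 value come from this page.

```python
import re
from datetime import date

LONG_RANGE_HORIZON_YEARS = 10  # value quoted on this page
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_init_date(init_date: str, season_year: int) -> date:
    """Hypothetical combined format and year check for a delivery row."""
    if not _ISO_DATE.fullmatch(init_date):
        raise ValueError(f"init_date {init_date!r} is not ISO YYYY-MM-DD")
    parsed = date.fromisoformat(init_date)  # rejects impossible dates such as 02-30
    earliest_year = season_year - LONG_RANGE_HORIZON_YEARS
    # Assumption: an init date may not fall in a year after the season year.
    if not earliest_year <= parsed.year <= season_year:
        raise ValueError(
            f"init_date year {parsed.year} outside [{earliest_year}, {season_year}]"
        )
    return parsed


valid = check_init_date("2024-07-15", 2025)
```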

Destroyed: Never destroyed; immutable once written to a parquet or CSV.

Relationships to other entities

  • SeasonYear — partitioned by — a single season year has many init dates; CommodityConfig.hindcast_init_dates(season_year) returns the grid
  • Commodity — generated by — hindcast_init_season_doys on CommodityConfig defines the weekly grid; to_date() converts season DOY to calendar date
  • Yield — indexes — every yield prediction is keyed by (geo_identifier, season_year, init_date); later init dates carry more complete weather information
  • Fold — scoped within — all init dates for a season year fall within one fold's test window

Concepts and pipelines that touch this entity

  • Pipeline: forecast (P5) — ForecastSlice is identified by (season_year, init_date); path lives at run_dir/forecast/{season_year}/{init_date}/
  • Pipeline: hindcast (P5) — weekly init-date grid drives the feature assembly loop
  • Concept: conformal calibration (P5) — hindcast_oos_per_init_date mode keys conformal half-widths by (month, day) of init_date

PRs and commits

  • PR-369 — Restructures forecast path from run_dir/forecast/{init_date}/ to run_dir/forecast/{season_year}/{init_date}/ to support multiple season years per init date; introduces long-range climo stub for distant future seasons
  • PR-372 — Makes ForecastConfig.residual_mode mandatory; adds validate_residual_mode gate that inspects CalibrationResult availability for the given init date before any feature compute

Open questions

  • lag_days = 1 means features at init_date itself are excluded; should this default be surfaced more prominently in configs, as it silently affects which weather day is the last visible observation?
  • The _validate_init_date_year validator on DeliveryRow allows init dates up to LONG_RANGE_HORIZON_YEARS = 10 before the target season year — is this horizon documented and understood by consumers of the delivery CSV?
  • There is no validation that an InitDate falls within the crop's growing season window; an init date after harvest is accepted by the config but produces meaningless features.
  • The conformal mode hindcast_oos_per_init_date keys on (month, day). If a forecast init_date does not match any hindcast grid date exactly, the nearest key is used; the nearest-key matching rule should be documented.
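To make the last open question concrete, here is one candidate nearest-key rule. It is a sketch only: the actual matching logic is not documented on this page, and the function nearest_key, the circular day-of-year distance, and the tie-break towards the earlier (month, day) are all assumptions.

```python
from datetime import date

_REFERENCE_LEAP_YEAR = 2000  # leap year, so a (2, 29) key is representable
_DAYS_IN_REFERENCE_YEAR = 366


def _day_of_year(month: int, day: int) -> int:
    return date(_REFERENCE_LEAP_YEAR, month, day).timetuple().tm_yday


def nearest_key(init_date: date, keys: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the (month, day) key closest to init_date on a circular calendar.

    Assumed rule: ties resolve to the earlier (month, day) tuple.
    """
    target = _day_of_year(init_date.month, init_date.day)

    def distance(key: tuple[int, int]) -> int:
        gap = abs(_day_of_year(*key) - target)
        # Wrap around the year end so early January is close to late December.
        return min(gap, _DAYS_IN_REFERENCE_YEAR - gap)

    return min(keys, key=lambda key: (distance(key), key))


grid_keys = [(6, 1), (6, 8), (6, 15)]
matched = nearest_key(date(2024, 6, 10), grid_keys)
```

Writing the rule down in this form forces the tie-break and year-boundary behaviour to be explicit, which are the details that question asks to have documented.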