Entity: SeasonYear
Definition
The crop-year label: an integer identifying the harvest year. For same-year crops (corn, soybeans, cotton), SeasonYear = 2023 means the season whose harvest falls in calendar year 2023. For cross-year crops (winter wheat, Brazil soybean), the season spans two calendar years; SeasonYear still names the harvest year, and season_start_year_offset = -1 shifts the season-start anchor back by one calendar year. SeasonYear is not a class; it is a plain int used as a join key throughout the pipeline.
Kind
Conceptual identifier (plain int). No dedicated class or NewType alias exists in the codebase. It is carried as year: int in every feature parquet index column, fold directory name, and delivery CSV row.
Source of truth
- market_insights_models/src/commodity_hindcast/config.py:375, where CommodityConfig.season_start_date(season_year: int) -> date is the canonical conversion from an integer year to a calendar anchor.
- config.py:285, where season_start_year_offset controls cross-year season handling.
- config.py:483, where ExperimentProtocolConfig.test_years: list[int] enumerates the season years used as fold labels.
Key attributes / structure
| Attribute | Type | Notes |
|---|---|---|
| Integer year value | int | E.g. 2023; no lower / upper bound enforced in code |
| season_start_date(y) | date | CommodityConfig.season_start.in_year(y + season_start_year_offset) |
| harvest_date(y) | date | CommodityConfig.to_date(harvest_season_doy, y) |
| Feature parquet column | year (int) | Second dimension of the (year, geo_identifier, init_date) index triple |
| Fold label | str(season_year) | Numeric folds encode season_year as a string, e.g. "2020" |
Cardinality: One integer per harvest year modelled. Hindcast runs span feature_start_year to feature_end_year (fields on ExperimentConfig, defaulting 1980–2025). Forecast runs target one or more future season years per init_date (PR-369 introduced multi-year outlooks).
Cross-year note: season_doy (int, days since season start, can exceed 365) is distinct from calendar_doy (1–366). A SeasonYear maps to the half-open interval [season_start_date(y), harvest_date(y)) on the calendar.
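The offset arithmetic and the season_doy / calendar_doy distinction can be sketched as follows. This is a minimal illustration, not the package's implementation: MonthDay and SeasonSketch are hypothetical stand-ins for CommodityConfig and its season_start field, the month/day values are made-up examples, and season_doy is assumed to be zero-based (0 on the season start date).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MonthDay:
    """Hypothetical stand-in for the type behind CommodityConfig.season_start."""
    month: int
    day: int

    def in_year(self, year: int) -> date:
        return date(year, self.month, self.day)


@dataclass(frozen=True)
class SeasonSketch:
    """Illustrative subset of CommodityConfig; not the real class."""
    season_start: MonthDay
    season_start_year_offset: int = 0  # -1 for cross-year crops

    def season_start_date(self, season_year: int) -> date:
        # The season year names the harvest year; the offset moves the start anchor.
        return self.season_start.in_year(season_year + self.season_start_year_offset)

    def season_doy(self, season_year: int, day: date) -> int:
        # Days elapsed since the season start (zero-based here, an assumption).
        return (day - self.season_start_date(season_year)).days


# Example start dates are assumptions chosen for illustration only.
corn = SeasonSketch(MonthDay(4, 1))
winter_wheat = SeasonSketch(MonthDay(9, 1), season_start_year_offset=-1)

print(corn.season_start_date(2023))          # same calendar year as the harvest
print(winter_wheat.season_start_date(2023))  # prior calendar year
print(winter_wheat.season_doy(2023, date(2023, 7, 1)))
```

For the cross-year crop, the same SeasonYear value anchors to a start date in the previous calendar year, which is why a calendar-year filter and a season-year filter can select different rows.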
Lifecycle
Created: Enumerated by ExpandingFoldGenerator from ExperimentProtocolConfig.test_years during walk-forward hindcast (run/experiment_protocol.py:138). In forecast mode, one or more season_year values are passed as CLI arguments (--season-year) and injected into ForecastSlice.
Consumed: Used as a join key in every feature parquet, prediction parquet, and delivery CSV row. CommodityConfig.hindcast_init_dates(season_year) converts it to a list of InitDate objects. HindcastSlice.cutoff derives a date(int(fold_label), 1, 1) from the numeric fold label.
Destroyed: Never destroyed; once written into a parquet or CSV, the integer year is an immutable data attribute.
Relationships to other entities
- Commodity (parameterises): CommodityConfig.season_start_date(season_year) converts the integer to a commodity-specific calendar date
- InitDate (partitions): a single season year has many in-season init dates; CommodityConfig.hindcast_init_dates(season_year) returns the full weekly grid
- Fold (encoded in): numeric fold labels are str(season_year); HindcastSlice.cutoff decodes back to date(season_year, 1, 1)
- Yield (indexes): every yield measurement or prediction is keyed by (season_year, geo_identifier) or (season_year, geo_identifier, init_date)
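The Fold relationship amounts to a string round trip. The sketch below shows it with two hypothetical helper functions (fold_label and cutoff_from_label are names invented here; in the codebase the decoding is described as living on HindcastSlice.cutoff).

```python
from datetime import date


def fold_label(season_year: int) -> str:
    # Numeric folds are labelled with the season year rendered as a string.
    return str(season_year)


def cutoff_from_label(label: str) -> date:
    # Mirrors the documented decoding: date(int(fold_label), 1, 1).
    return date(int(label), 1, 1)


label = fold_label(2020)
print(label, cutoff_from_label(label))
```

Because the label is a plain decimal string, int(label) recovers the original season year exactly; non-numeric fold labels, if any exist, would need separate handling.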
Concepts and pipelines that touch this entity
- Pipeline: hindcast (P5). ExpandingFoldGenerator iterates test_years; each iteration defines one fold's training window
- Pipeline: forecast (P5). ForecastSlice is identified by (season_year, init_date); PR-369 enabled multiple season years per init date
- Concept: walk-forward CV (P5). The expanding-window cut always falls on date(season_year, 1, 1)
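A rough sketch of how an expanding window over season years could be enumerated is shown below. It is not the ExpandingFoldGenerator implementation: FoldSketch and expanding_folds are hypothetical names, and the rule that a fold trains on every season year strictly before its test year is an assumption made for illustration.

```python
from datetime import date
from typing import Iterator, NamedTuple


class FoldSketch(NamedTuple):
    label: str              # str(season_year)
    cutoff: date            # date(season_year, 1, 1)
    train_years: list[int]  # assumed: all season years before the test year
    test_year: int


def expanding_folds(first_year: int, test_years: list[int]) -> Iterator[FoldSketch]:
    # Walk forward through the test years; the training window grows each step.
    for test_year in sorted(test_years):
        yield FoldSketch(
            label=str(test_year),
            cutoff=date(test_year, 1, 1),
            train_years=list(range(first_year, test_year)),
            test_year=test_year,
        )


for fold in expanding_folds(2015, [2018, 2019]):
    print(fold.label, fold.cutoff, fold.train_years)
```

Each later fold's training years are a superset of the previous fold's, which is the defining property of the expanding window.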
PRs and commits
- PR-369: Extends run_forecast to accept multiple season_year values per init_date; restructures run_dir/forecast/ from {init_date}/ to {season_year}/{init_date}/ to avoid path collisions
- PR-339: Structural refactor that canonicalised year as the parquet index column name throughout the package
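The directory layout described for PR-369 can be illustrated with a small path builder. forecast_dir is a hypothetical helper, and rendering init_date in ISO format is an assumption; the source only states the {season_year}/{init_date}/ nesting.

```python
from datetime import date
from pathlib import PurePosixPath


def forecast_dir(run_dir: str, season_year: int, init_date: date) -> PurePosixPath:
    # Layout described for PR-369: run_dir/forecast/{season_year}/{init_date}/
    return PurePosixPath(run_dir) / "forecast" / str(season_year) / init_date.isoformat()


init = date(2025, 6, 1)
for season_year in (2025, 2026):
    # Same init_date, different season years: the paths no longer collide.
    print(forecast_dir("runs/example", season_year, init))
```

Under the older {init_date}/ layout, both iterations would have written to the same directory, which is the collision the restructuring avoids.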
Open questions
- Should SeasonYear be promoted to a NewType("SeasonYear", int) for type-safety, similar to GeoIdentifier?
- The feature_start_year / feature_end_year range on ExperimentConfig is not validated against test_years; a misconfigured protocol could request folds outside the feature panel.
- Long-range forecast mode (PR-369) can target season_year values many years ahead of the current calendar year; the plausibility horizon is currently unconstrained at the config level.
- Cross-year crops store the harvest year as SeasonYear but the season starts in the prior calendar year; this can be surprising when filtering raw data by calendar year before the DOY pivot.
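The first two questions could be explored with something like the sketch below. Neither piece exists in the codebase: the SeasonYear alias and the out_of_panel_years helper are hypothetical, and treating feature_end_year as inclusive is an assumption.

```python
from typing import NewType

# Hypothetical alias; the codebase currently uses a plain int.
SeasonYear = NewType("SeasonYear", int)


def out_of_panel_years(
    test_years: list[int], feature_start_year: int, feature_end_year: int
) -> list[SeasonYear]:
    """Return the test years that fall outside the feature panel (bounds inclusive)."""
    return [
        SeasonYear(y)
        for y in test_years
        if not feature_start_year <= y <= feature_end_year
    ]


bad = out_of_panel_years([1979, 2000, 2026], 1980, 2025)
if bad:
    print(f"test_years outside feature panel: {bad}")
```

NewType adds no runtime cost (SeasonYear(y) returns the int unchanged), so the alias would only help static type checkers distinguish season years from other integers.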