Entity: SeasonYear

Definition

The crop-year label: an integer identifying the harvest year. For same-year crops (corn, soybeans, cotton), SeasonYear = 2023 means the season whose harvest falls in calendar year 2023. For cross-year crops (winter wheat, Brazil soybean), the season spans two calendar years; SeasonYear still names the harvest year, and season_start_year_offset = -1 shifts the season-start anchor back by one calendar year. SeasonYear is not a class; it is a plain int used as a join key throughout the pipeline.
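
The effect of season_start_year_offset can be sketched as below. This is a minimal illustration rather than the code in config.py: the SeasonSpec dataclass, the start_month and start_day fields, and the example calendars are assumptions made for this sketch. Only season_start_year_offset and the season_start_date name come from the text above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SeasonSpec:
    """Hypothetical stand-in for the season fields on CommodityConfig."""

    start_month: int
    start_day: int
    season_start_year_offset: int = 0  # -1 for cross-year crops

    def season_start_date(self, season_year: int) -> date:
        # SeasonYear names the harvest year; the offset moves the start anchor.
        return date(
            season_year + self.season_start_year_offset,
            self.start_month,
            self.start_day,
        )


# Illustrative calendars only; real values live in the commodity configs.
corn = SeasonSpec(start_month=4, start_day=1)
winter_wheat = SeasonSpec(start_month=9, start_day=1, season_start_year_offset=-1)

print(corn.season_start_date(2023))          # 2023-04-01
print(winter_wheat.season_start_date(2023))  # 2022-09-01
```

Both calls receive the same SeasonYear, but the cross-year crop's anchor lands in the previous calendar year.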

Kind

Conceptual identifier (plain int). No dedicated class or NewType alias exists in the codebase. It is carried as year: int in every feature parquet index column, fold directory name, and delivery CSV row.

Source of truth

All three anchors are in market_insights_models/src/commodity_hindcast/config.py. CommodityConfig.season_start_date(season_year: int) -> date (config.py:375) is the canonical conversion from an integer year to a calendar anchor. season_start_year_offset (config.py:285) controls cross-year season handling. ExperimentProtocolConfig.test_years: list[int] (config.py:483) enumerates the season years used as fold labels.

Key attributes / structure

| Attribute | Type | Notes |
|---|---|---|
| Integer year value | int | E.g. 2023; no lower or upper bound enforced in code |
| season_start_date(y) | date | CommodityConfig.season_start.in_year(y + season_start_year_offset) |
| harvest_date(y) | date | CommodityConfig.to_date(harvest_season_doy, y) |
| Feature parquet column | year (int) | First component of the (year, geo_identifier, init_date) index triple |
| Fold label | str(season_year) | Numeric folds encode season_year as a string, e.g. "2020" |

Cardinality: One integer per harvest year modelled. Hindcast runs span feature_start_year to feature_end_year (fields on ExperimentConfig, defaulting to 1980 and 2025 respectively). Forecast runs target one or more future season years per init_date (PR-369 introduced multi-year outlooks).

Cross-year note: season_doy (int, days since season start, can exceed 365) is distinct from calendar_doy (1–366). A SeasonYear maps to the half-open interval [season_start_date(y), harvest_date(y)) on the calendar.
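
The difference between the two day-of-year notions can be sketched as follows. The helper names and the September 1 season start are assumptions for illustration, as is the convention that the season-start day itself has season_doy 0; the codebase may count from 1.

```python
from datetime import date


def calendar_doy(d: date) -> int:
    """Day of the calendar year, 1 to 366."""
    return d.timetuple().tm_yday


def season_doy(d: date, season_start: date) -> int:
    """Days elapsed since the season start (assumed 0 on the start day)."""
    return (d - season_start).days


# Assumed start anchor for a cross-year crop with SeasonYear 2023.
season_start = date(2022, 9, 1)

for d in (date(2022, 12, 31), date(2023, 1, 1), date(2023, 10, 15)):
    print(d, calendar_doy(d), season_doy(d, season_start))
# 2022-12-31 365 121
# 2023-01-01 1 122
# 2023-10-15 288 409
```

calendar_doy resets at the turn of the year while season_doy keeps increasing, which is why the latter can exceed 365.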

Lifecycle

Created: Enumerated by ExpandingFoldGenerator from ExperimentProtocolConfig.test_years during walk-forward hindcast (run/experiment_protocol.py:138). In forecast mode, one or more season_year values are passed as CLI arguments (--season-year) and injected into ForecastSlice.

Consumed: Used as a join key in every feature parquet, prediction parquet, and delivery CSV row. CommodityConfig.hindcast_init_dates(season_year) converts it to a list of InitDate objects. HindcastSlice.cutoff derives a date(int(fold_label), 1, 1) from the numeric fold label.
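
The round trip between a season year, its numeric fold label, and the cutoff date described above might look like this. The function names are hypothetical; only the str(season_year) encoding and the date(int(fold_label), 1, 1) decoding come from the text.

```python
from datetime import date


def fold_label(season_year: int) -> str:
    """Encode a season year as a numeric fold label (hypothetical helper)."""
    return str(season_year)


def cutoff_from_label(label: str) -> date:
    """Decode a numeric fold label to January 1 of that year."""
    return date(int(label), 1, 1)


label = fold_label(2020)
print(label)                     # 2020
print(cutoff_from_label(label))  # 2020-01-01
```

A non-numeric label would raise ValueError in int(); how the real HindcastSlice treats such labels is not covered here.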

Destroyed: Never destroyed; once written into a parquet or CSV, the integer year is an immutable data attribute.

Relationships to other entities

  • Commodity — parameterises — CommodityConfig.season_start_date(season_year) converts the integer to a commodity-specific calendar date
  • InitDate — partitions — a single season year has many in-season init dates; CommodityConfig.hindcast_init_dates(season_year) returns the full weekly grid
  • Fold — encoded in — numeric fold labels are str(season_year); HindcastSlice.cutoff decodes back to date(season_year, 1, 1)
  • Yield — indexes — every yield measurement or prediction is keyed by (season_year, geo_identifier) or (season_year, geo_identifier, init_date)

Concepts and pipelines that touch this entity

  • Pipeline: hindcast (P5) — ExpandingFoldGenerator iterates test_years; each iteration defines one fold's training window
  • Pipeline: forecast (P5) — ForecastSlice is identified by (season_year, init_date); PR-369 enabled multiple season years per init date
  • Concept: walk-forward CV (P5) — the expanding-window cut always falls on date(season_year, 1, 1)
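
An expanding-window walk over test_years can be sketched as below. This is not the ExpandingFoldGenerator implementation: the function names, the toy observations, and the rule that training data is strictly earlier than the cutoff are all assumptions for illustration.

```python
from datetime import date
from typing import Iterator


def expanding_folds(test_years: list[int]) -> Iterator[tuple[str, date]]:
    """Yield (fold_label, cutoff) pairs in ascending season-year order."""
    for season_year in sorted(test_years):
        yield str(season_year), date(season_year, 1, 1)


def training_rows(
    rows: list[tuple[date, float]], cutoff: date
) -> list[tuple[date, float]]:
    """Keep observations dated strictly before the cutoff (assumed convention)."""
    return [row for row in rows if row[0] < cutoff]


# Toy observations: one value per year, dated mid-year.
observations = [(date(y, 7, 1), float(y)) for y in range(2015, 2023)]

fold_sizes = {
    label: len(training_rows(observations, cutoff))
    for label, cutoff in expanding_folds([2020, 2021, 2022])
}
print(fold_sizes)  # {'2020': 5, '2021': 6, '2022': 7}
```

Each later fold sees one more year of training data, which is the expanding-window property.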

PRs and commits

  • PR-369 — Extends run_forecast to accept multiple season_year values per init_date; restructures run_dir/forecast/ from {init_date}/ to {season_year}/{init_date}/ to avoid path collisions
  • PR-339 — Structural refactor that canonicalised year as the parquet index column name throughout the package
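
The directory layout change from PR-369 can be illustrated with a small path helper. The helper name, the example run directory, and the ISO formatting of init_date are assumptions; only the {season_year}/{init_date}/ nesting under run_dir/forecast/ comes from the PR description above.

```python
from datetime import date
from pathlib import PurePosixPath


def forecast_dir(run_dir: str, season_year: int, init_date: date) -> PurePosixPath:
    """Build run_dir/forecast/{season_year}/{init_date}/ (hypothetical helper)."""
    return (
        PurePosixPath(run_dir)
        / "forecast"
        / str(season_year)
        / init_date.isoformat()  # assumed directory-name format
    )


init = date(2024, 6, 1)
print(forecast_dir("runs/demo", 2024, init))  # runs/demo/forecast/2024/2024-06-01
print(forecast_dir("runs/demo", 2025, init))  # runs/demo/forecast/2025/2024-06-01
```

With the old {init_date}/ layout, both calls would have resolved to the same directory; nesting under season_year keeps the two outlooks apart.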

Open questions

  • Should SeasonYear be promoted to a NewType("SeasonYear", int) for type-safety, similar to GeoIdentifier?
  • The feature_start_year / feature_end_year range on ExperimentConfig is not validated against test_years; a misconfigured protocol could request folds outside the feature panel.
  • Long-range forecast mode (PR-369) can target season_year values many years ahead of the current calendar year; the plausibility horizon is currently unconstrained at the config level.
  • Cross-year crops store the harvest year as SeasonYear but the season starts in the prior calendar year — this can be surprising when filtering raw data by calendar year before the DOY pivot.
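
Two of the questions above could be explored with a sketch like the following. Nothing here exists in the codebase: the SeasonYear alias and the out_of_range_test_years helper are hypothetical, and treating the feature range as inclusive at both ends is an assumption.

```python
from typing import NewType

# Hypothetical alias; a static type checker would distinguish it from a bare int.
SeasonYear = NewType("SeasonYear", int)


def out_of_range_test_years(
    test_years: list[int], feature_start_year: int, feature_end_year: int
) -> list[int]:
    """Return test years outside the (assumed inclusive) feature range."""
    return [
        y for y in test_years
        if not feature_start_year <= y <= feature_end_year
    ]


year = SeasonYear(2023)
print(year == 2023)  # True: NewType adds no runtime wrapper

print(out_of_range_test_years([2019, 2024, 2026], 1980, 2025))  # [2026]
```

A check of this kind could run at config load time so that a protocol requesting folds outside the feature panel fails early rather than producing empty folds.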