Detrender

Definition

A Detrender removes the long-run yield trend from a county-level feature table before regression, then re-applies it to convert detrended model predictions back to absolute yield scale. Every detrender inherits from AbstractDetrend (models/detrend/base.py:21) and operates in two phases:

  1. Fit phase: fit_transform(features) estimates per-state or per-county trend parameters on the training panel and writes a target_detrended_col column to the returned DataFrame.
  2. Apply phase: transform(features) applies the fitted trend at predict time; inverse_transform(features, y_detrended) converts the regressor's detrended output back to kg/ha.

Fitted state is serialised to detrender.pkl under models/{commodity}/{fold_label}/ so downstream stages can reconstruct the detrender without re-fitting.
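The two phases can be illustrated with a deliberately simplified sketch. ToyLinearDetrender is hypothetical and not part of the codebase: it fits a single least-squares line over plain Python lists instead of the DataFrame interface, and it assumes the trend is removed by subtraction. It only shows that inverse_transform undoes what fit_transform did.

```python
from statistics import mean


class ToyLinearDetrender:
    """Hypothetical stand-in for a detrender; not the project's implementation."""

    def __init__(self):
        self.slope = None
        self.intercept = None

    def fit_transform(self, years, yields):
        # Ordinary least squares for one line: yield = intercept + slope * year.
        x_bar, y_bar = mean(years), mean(yields)
        sxx = sum((x - x_bar) ** 2 for x in years)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, yields))
        self.slope = sxy / sxx
        self.intercept = y_bar - self.slope * x_bar
        return self.transform(years, yields)

    def trend(self, years):
        if self.slope is None:
            raise RuntimeError("detrender is not fitted")
        return [self.intercept + self.slope * x for x in years]

    def transform(self, years, yields):
        # Assumption: detrending is additive (yield minus trend level).
        return [y - t for y, t in zip(yields, self.trend(years))]

    def inverse_transform(self, years, y_detrended):
        # Add the trend level back to return to the absolute yield scale.
        return [d + t for d, t in zip(y_detrended, self.trend(years))]


years = [2000, 2001, 2002, 2003]
yields = [5000.0, 5120.0, 5180.0, 5330.0]

detrender = ToyLinearDetrender()
residuals = detrender.fit_transform(years, yields)
restored = detrender.inverse_transform(years, residuals)
```

In this toy, restored matches yields up to floating-point error, which is the round-trip property the predict stage relies on.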

Kind

ABC (AbstractDetrend at models/detrend/base.py:21).

Source of truth

market_insights_models/src/commodity_hindcast/models/detrend/base.py:21

Trend axis convention

All three detrenders share a single project-wide singleton TREND_AXIS (models/detrend/time_axis.py:91):

  • Epoch: 1980-01-01
  • Unit: year (fractional calendar years; day counts are converted at 365.25 days/year, which averages over the four-year leap cycle)

TREND_AXIS.to_x(years) (time_axis.py:69) converts integer calendar years to the x-axis values used in OLS fits. Slopes are always reported in yield-per-calendar-year units, matching the QUBE-sprint TrendFitter convention. Conversion helpers slope_per_year_to_native / slope_native_to_per_year (time_axis.py:79–85) are exact inverses.
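As a rough illustration of the convention, the sketch below builds a hypothetical ToyTrendAxis. The class name, the unit_in_years field, and the choice to measure each year from 1 January are assumptions made for the example; only the epoch, the 365.25 days/year factor, and the three method names come from the description above.

```python
from dataclasses import dataclass
from datetime import date

DAYS_PER_YEAR = 365.25


@dataclass(frozen=True)
class ToyTrendAxis:
    """Hypothetical axis: an epoch plus a native unit expressed in years."""

    epoch: date = date(1980, 1, 1)
    unit_in_years: float = 1.0

    def to_x(self, years):
        # Assumption: each integer year is placed at 1 January of that year.
        return [
            ((date(int(y), 1, 1) - self.epoch).days / DAYS_PER_YEAR)
            / self.unit_in_years
            for y in years
        ]

    def slope_per_year_to_native(self, slope):
        # A slope per year becomes a slope per native unit.
        return slope * self.unit_in_years

    def slope_native_to_per_year(self, slope):
        return slope / self.unit_in_years


axis = ToyTrendAxis()
xs = axis.to_x([1980, 1981, 1984])

# With a non-year unit the two slope helpers still cancel each other out.
decade_axis = ToyTrendAxis(unit_in_years=10.0)
round_trip = decade_axis.slope_native_to_per_year(
    decade_axis.slope_per_year_to_native(12.5)
)
```

With the unit fixed to one year, as in the project, the two slope helpers reduce to the identity, which is consistent with slopes always being reported per calendar year.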

Required interface

| Abstract member | Signature | Purpose |
| --- | --- | --- |
| fit_transform | (features: DataFrame) → DataFrame | Fit on training data; add target_detrended_col to a copy. |
| transform | (features: DataFrame) → DataFrame | Apply fitted trend; raises RuntimeError if not fitted. |
| inverse_transform | (features: DataFrame, y_detrended: ndarray) → Series | Re-trend detrended predictions to yield scale. |
| fitted_yield_series | (features: DataFrame) → Series | Model-implied trend level per row (for diagnostics). |
| is_fitted | @property → bool | Guards transform and save. |
| target_yield_column | @property → str | Raw yield column name in feature tables. |
| save | (path: Path) → None | Serialise fitted state; path supports s3:// via vfs.local_context. |
| load | classmethod (path, config) → Self | Reconstruct fitted detrender from disk. |

Non-abstract concrete helpers on the base:

  • target_detrended_column (base.py:44) — delegates to config.commodity.target_detrended_col.
  • state_column (base.py:48) — returns the fixed string "state_name".

validate_detrender (base.py:77) checks that an object is an AbstractDetrend instance and raises TypeError otherwise.
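The shape of this contract can be sketched with Python's abc module. SketchAbstractDetrend and sketch_validate_detrender are hypothetical mirrors of the interface described above, with argument types simplified to Any so the example runs without pandas; the real base class may differ in detail.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class SketchAbstractDetrend(ABC):
    """Hypothetical mirror of the required interface; signatures simplified."""

    @abstractmethod
    def fit_transform(self, features: Any) -> Any: ...

    @abstractmethod
    def transform(self, features: Any) -> Any: ...

    @abstractmethod
    def inverse_transform(self, features: Any, y_detrended: Any) -> Any: ...

    @abstractmethod
    def fitted_yield_series(self, features: Any) -> Any: ...

    @property
    @abstractmethod
    def is_fitted(self) -> bool: ...

    @property
    @abstractmethod
    def target_yield_column(self) -> str: ...

    @abstractmethod
    def save(self, path: Path) -> None: ...

    @classmethod
    @abstractmethod
    def load(cls, path: Path, config: Any) -> "SketchAbstractDetrend": ...

    @property
    def state_column(self) -> str:
        # Concrete helper: a fixed column name shared by all subclasses.
        return "state_name"


def sketch_validate_detrender(obj: Any) -> None:
    if not isinstance(obj, SketchAbstractDetrend):
        raise TypeError(f"expected a detrender, got {type(obj).__name__}")


class IncompleteDetrend(SketchAbstractDetrend):
    # Implements only one abstract member, so it cannot be instantiated.
    def fit_transform(self, features):
        return features


try:
    IncompleteDetrend()
    instantiated = True
except TypeError:
    instantiated = False

try:
    sketch_validate_detrender(object())
    rejected = False
except TypeError:
    rejected = True
```

Declaring every member abstract means a subclass that forgets one fails at construction time rather than at predict time.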

Concrete implementations

| Class | Config key | File | Granularity | Trend model | Config-driven params | When to use |
| --- | --- | --- | --- | --- | --- | --- |
| LinearStateDetrend | linear_state | linear_state_detrend.py:35 | State | Per-state area-weighted OLS line (numpy.polyfit) over TREND_AXIS.to_x | None | Fast training baseline; appropriate where a linear trajectory is a reasonable approximation |
| GaussianWindowStateDetrend | gaussian_state | gaussian_window_state_detrend.py:117 | State | Per-state Gaussian-smoothed area-weighted annual mean; optional causal (half_left) kernel | gaussian_sigma=4.0, gaussian_truncate=8.0, gaussian_mode="nearest", gaussian_kernel="symmetric", mean_yield_q_lo=0.25, mean_yield_q_hi=0.75 | Non-linear technology trajectories; use half_left kernel near train/test boundary |
| PartialPoolingDetrend | partial_pooling | partial_pooling_detrend.py:206 | County | Empirical-Bayes (James-Stein) hierarchical OLS; national prior shrinks noisy counties | min_obs=3, detrend_fixed_slope_bu_ac=None, mad_national_modified_z=None | Recommended default; counties with few observations borrow strength from the national prior |
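To make the area-weighted OLS line used by LinearStateDetrend concrete, here is a minimal sketch for one state's rows. The data values and variable names are invented for illustration, and x is taken as whole years since the 1980 epoch. One detail worth noting is that numpy.polyfit applies its w argument to the unsquared residuals, so passing the square root of the area gives a fit weighted by area; whether the project passes area or its square root is not stated in this page.

```python
import numpy as np

# Invented example rows for a single state.
years = np.array([2000, 2001, 2002, 2003, 2004, 2005])
x = years - 1980.0  # assumption: whole years since the 1980 epoch
yields = np.array([4000.0, 4080.0, 4030.0, 4190.0, 4150.0, 4260.0])
area = np.array([100.0, 400.0, 250.0, 900.0, 50.0, 300.0])

# polyfit minimises sum((w_i * residual_i) ** 2), so w = sqrt(area)
# weights each squared residual by area.
slope, intercept = np.polyfit(x, yields, deg=1, w=np.sqrt(area))

# Closed-form weighted least squares for comparison.
w = area / area.sum()
x_bar = np.sum(w * x)
y_bar = np.sum(w * yields)
slope_closed = np.sum(w * (x - x_bar) * (yields - y_bar)) / np.sum(
    w * (x - x_bar) ** 2
)
intercept_closed = y_bar - slope_closed * x_bar
```

The polyfit result agrees with the closed-form weighted estimate, which is a quick way to check that the weights are being applied as intended.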

Factory: build_detrender(config) at models/detrend/build.py:54 reads config.model.detrend and dispatches on it. Only partial_pooling accepts detrend_params from config; the allowlisted keys are min_obs, detrend_fixed_slope_bu_ac, and mad_national_modified_z, and unknown keys emit a warning. A config.model.detrend value other than the three config keys listed above raises ValueError.
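The dispatch and allowlist behaviour can be sketched as follows. sketch_build_detrender is hypothetical: it takes the key and params directly instead of a config object and returns the resolved key and kwargs instead of a detrender instance. It also assumes that an unknown params key is dropped after the warning, which this page does not confirm.

```python
import warnings

KNOWN_DETRENDERS = {"linear_state", "gaussian_state", "partial_pooling"}
ALLOWED_PARTIAL_POOLING_KEYS = {
    "min_obs",
    "detrend_fixed_slope_bu_ac",
    "mad_national_modified_z",
}


def sketch_build_detrender(detrend_key, detrend_params=None):
    """Hypothetical factory: returns (key, constructor kwargs)."""
    if detrend_key not in KNOWN_DETRENDERS:
        raise ValueError(f"unknown detrend key: {detrend_key!r}")

    kwargs = {}
    if detrend_key == "partial_pooling":
        for key, value in (detrend_params or {}).items():
            if key in ALLOWED_PARTIAL_POOLING_KEYS:
                kwargs[key] = value
            else:
                # Assumption: unknown keys are ignored after warning.
                warnings.warn(f"ignoring unknown detrend_params key: {key!r}")
    return detrend_key, kwargs


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    name, kwargs = sketch_build_detrender(
        "partial_pooling", {"min_obs": 5, "typo_key": 1}
    )

try:
    sketch_build_detrender("cubic_state")
    bad_key_rejected = False
except ValueError:
    bad_key_rejected = True
```

Warning on a mistyped params key while failing hard on a mistyped dispatch key keeps experiments running when an optional setting is wrong, but stops them when the wrong model would be built.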

National fallback: All three implementations use NationalFallbackTrendImputer (lib/edit_and_imputation/imputation.py:63) for rows whose state or county was not seen during training.

PartialPoolingDetrend diagnostics: The .parameters property (partial_pooling_detrend.py:159) returns a cached DataFrame with one row per county: slope_per_year, intercept, national_slope_per_year, raw_slope_per_year, eb_lambda, train_years, n_train_years, train_year_min, train_year_max. All slopes are expressed via TREND_AXIS.slope_native_to_per_year.
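For readers unfamiliar with empirical-Bayes shrinkage, the textbook form is sketched below. This is not taken from partial_pooling_detrend.py: the function name, the variance inputs, and the convention that the weight applies to the county's own slope are all assumptions, and the project's eb_lambda may be defined the other way round.

```python
def shrink_slope(raw_slope, raw_var, national_slope, between_var):
    """Textbook shrinkage of a county slope towards a national prior.

    raw_var      sampling variance of the county's own OLS slope
    between_var  variance of true slopes across counties
    """
    # Weight on the county's own estimate: near 1 when it is precise,
    # near 0 when it is noisy.
    lam = between_var / (between_var + raw_var)
    shrunk = lam * raw_slope + (1.0 - lam) * national_slope
    return shrunk, lam


# Two counties with the same raw slope but different precision.
noisy_slope, noisy_lambda = shrink_slope(
    raw_slope=120.0, raw_var=400.0, national_slope=50.0, between_var=100.0
)
precise_slope, precise_lambda = shrink_slope(
    raw_slope=120.0, raw_var=25.0, national_slope=50.0, between_var=100.0
)
```

The noisy county ends up much closer to the national slope than the precise one, which is the "borrowing strength" behaviour described for counties with few observations.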

Lifecycle

Instantiation: build_detrender(config) at models/detrend/build.py:54, called from ExperimentConfig.build_detrender() (config.py:682) or directly by stage code.

Fit: run_fit calls fit_transform(train_features) on the training panel, producing a copy of the feature table with target_detrended_col added. The regressor then trains on this detrended column.

Persist: detrender.save(slice.detrender_path) writes detrender.pkl to models/{commodity}/{fold_label}/. S3 paths are handled transparently via vfs.local_context.

Rehydrate: HindcastSlice.load_detrender(config) dispatches on config.model.detrend key to call the correct class's load classmethod (results_slice.py:190).

Apply at predict: models/regression/runtime.predict calls detrender.inverse_transform(data_regression, sim_detrended) (runtime.py:176) to convert the regressor's output back to absolute kg/ha.

Tear-down: None — fitted state lives only in the persisted pickle.
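The persist and rehydrate steps amount to a pickle round trip guarded by is_fitted. The sketch below uses a hypothetical ToyPersistedDetrender and a local temporary directory; the commodity and fold labels in the path are invented, the load signature omits the config argument, and S3 handling through vfs.local_context is left out.

```python
import pickle
import tempfile
from pathlib import Path


class ToyPersistedDetrender:
    """Hypothetical detrender that stores its fitted state as a dict."""

    def __init__(self):
        self.params = None

    @property
    def is_fitted(self):
        return self.params is not None

    def fit(self, slope, intercept):
        self.params = {"slope": slope, "intercept": intercept}

    def save(self, path):
        if not self.is_fitted:
            raise RuntimeError("cannot save an unfitted detrender")
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(self.params, fh)

    @classmethod
    def load(cls, path):
        obj = cls()
        with path.open("rb") as fh:
            obj.params = pickle.load(fh)
        return obj


try:
    ToyPersistedDetrender().save(Path("unused.pkl"))
    unfitted_save_blocked = False
except RuntimeError:
    unfitted_save_blocked = True

with tempfile.TemporaryDirectory() as tmp:
    # Invented labels standing in for {commodity} and {fold_label}.
    path = Path(tmp) / "models" / "corn" / "fold_2020" / "detrender.pkl"
    original = ToyPersistedDetrender()
    original.fit(slope=105.0, intercept=-205000.0)
    original.save(path)
    restored_params = ToyPersistedDetrender.load(path).params
```

Because the rehydrated object carries the same fitted parameters, downstream stages can call inverse_transform without access to the training panel.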

Relationships

  • AbstractRegressionImpl (Regressor) — receives the detrended feature table produced by fit_transform; its output passes through inverse_transform on the way back.
  • ExperimentConfig — carries model.detrend (dispatch key), model.detrend_params (constructor kwargs for partial_pooling), and commodity.target_detrended_col.
  • HindcastSlice — owns detrender_path; provides load_detrender(config).
  • NationalFallbackTrendImputer (lib/edit_and_imputation/imputation.py:63) — shared fallback for unseen geographies across all three implementations.
  • TREND_AXIS (models/detrend/time_axis.py:91) — global singleton owning the epoch and unit; all OLS fits reference it.

Concepts and pipelines

  • source_detrend — detailed module-level breakdown.
  • Walk-forward cross-validation — the detrender is re-fitted on each fold's training data; a new detrender.pkl is written per fold.
  • target_detrended_col — the column written by fit_transform and read by the regressor; name comes from config.commodity.target_detrended_col.
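The per-fold re-fit can be sketched as a loop over expanding training windows. walk_forward_folds and the fold label format are hypothetical, and an expanding (rather than rolling) window is assumed; the point is only that each fold's fitted state is derived from years strictly before its test year.

```python
def walk_forward_folds(years, first_test_year):
    """Yield (fold_label, train_years, test_year) with expanding windows."""
    for test_year in sorted(y for y in set(years) if y >= first_test_year):
        train_years = [y for y in years if y < test_year]
        yield f"fold_{test_year}", train_years, test_year


panel_years = list(range(2015, 2021))
folds = list(walk_forward_folds(panel_years, first_test_year=2018))

# Stand-in for "fit a fresh detrender and write detrender.pkl per fold":
# record which years each fold's fit was allowed to see.
fitted_per_fold = {
    label: {"train_year_max": max(train), "n_train_years": len(train)}
    for label, train, _ in folds
}
```

Keeping one fitted state per fold prevents trend parameters estimated with later years from leaking into an earlier fold's predictions.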

PRs and commits

No detrender-specific PRs identified in the recent log. The GaussianWindowStateDetrend design is intentionally aligned with the legacy WASDE FittedYieldDetrendStateMA (gaussian_window_state_detrend.py:117–131).

Open questions

  • LinearStateDetrend and GaussianWindowStateDetrend operate at state granularity while PartialPoolingDetrend operates at county granularity. There is no mixed-granularity mode.
  • GaussianWindowStateDetrend indexes smoothed series by integer year directly, not via TREND_AXIS.to_x — the trend is looked up by year rather than interpolated at a fractional x value.
  • detrend_fixed_slope_bu_ac in PartialPoolingDetrend overrides the EB prior entirely; this hard-pins the national trend regardless of observed data and should be used with caution.
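The year-indexed lookup and the causal kernel mentioned for GaussianWindowStateDetrend can be illustrated with a small pure-Python smoother. gaussian_smooth_by_year is hypothetical: it renormalises the kernel at the edges instead of padding with the nearest value, and it interprets half_left as "use the current and earlier years only". The defaults echo gaussian_sigma and gaussian_truncate from the implementations table.

```python
import math


def gaussian_smooth_by_year(annual_mean, sigma=4.0, truncate=8.0,
                            kernel="symmetric"):
    """Return {year: smoothed value} for a {year: value} mapping."""
    radius = int(truncate * sigma + 0.5)
    years = sorted(annual_mean)
    smoothed = {}
    for t in years:
        num = 0.0
        den = 0.0
        for s in years:
            d = s - t
            if abs(d) > radius:
                continue
            if kernel == "half_left" and d > 0:
                continue  # causal: skip years after t
            weight = math.exp(-0.5 * (d / sigma) ** 2)
            num += weight * annual_mean[s]
            den += weight
        # Assumption: renormalise over the years actually available.
        smoothed[t] = num / den
    return smoothed


series = {2000 + i: float(i) for i in range(10)}
bumped = dict(series)
bumped[2009] = 1000.0  # change only the final year

causal = gaussian_smooth_by_year(series, kernel="half_left")
causal_bumped = gaussian_smooth_by_year(bumped, kernel="half_left")
symmetric = gaussian_smooth_by_year(series)
symmetric_bumped = gaussian_smooth_by_year(bumped)
```

In this sketch, changing the last year leaves the causal trend for 2005 untouched but shifts the symmetric one, which is why a half_left kernel is the safer choice near a train/test boundary. The result is keyed by integer year, so the trend is looked up per year and never evaluated at a fractional x.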