Detrender¶
Definition¶
A Detrender removes the long-run yield trend from a county-level feature table before regression, then re-applies it to convert detrended model predictions back to absolute yield scale. Every detrender inherits from AbstractDetrend (models/detrend/base.py:21) and operates in two phases:
- Fit phase: fit_transform(features) estimates per-state or per-county trend parameters on the training panel and writes a target_detrended_col column to the returned DataFrame.
- Apply phase: transform(features) applies the fitted trend at predict time; inverse_transform(features, y_detrended) converts the regressor's detrended output back to kg/ha.
Fitted state is serialised to detrender.pkl under models/{commodity}/{fold_label}/ so downstream stages can reconstruct the detrender without re-fitting.
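The two phases can be illustrated with a toy round trip. This is a minimal sketch, not the project's code: ToyLinearDetrend, its single global trend line, the additive "yield minus trend" definition of the detrended value, and the plain-list inputs are all assumptions made for illustration. The real detrenders take DataFrames and fit per state or per county.

```python
from statistics import mean


class ToyLinearDetrend:
    """Hypothetical stand-in: one global OLS line instead of per-state trends."""

    def __init__(self):
        self.slope = None
        self.intercept = None

    def fit_transform(self, years, yields):
        # Fit phase: estimate the trend, then return yield minus trend.
        x_bar, y_bar = mean(years), mean(yields)
        sxx = sum((x - x_bar) ** 2 for x in years)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, yields))
        self.slope = sxy / sxx
        self.intercept = y_bar - self.slope * x_bar
        return self.transform(years, yields)

    def _trend(self, years):
        if self.slope is None:
            raise RuntimeError("detrender is not fitted")
        return [self.intercept + self.slope * x for x in years]

    def transform(self, years, yields):
        # Apply phase: subtract the already-fitted trend (assumed additive).
        return [y - t for y, t in zip(yields, self._trend(years))]

    def inverse_transform(self, years, y_detrended):
        # Add the trend back to return to the absolute yield scale.
        return [d + t for d, t in zip(y_detrended, self._trend(years))]


years = [2000, 2001, 2002, 2003]
yields = [5000.0, 5150.0, 5180.0, 5400.0]  # kg/ha, made-up values

detrender = ToyLinearDetrend()
residuals = detrender.fit_transform(years, yields)
restored = detrender.inverse_transform(years, residuals)
```

In this sketch, inverse_transform applied to the fit-phase residuals reproduces the original yields up to floating-point error, which is the property the regressor relies on when its detrended predictions are converted back to kg/ha.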
Kind¶
ABC (AbstractDetrend at models/detrend/base.py:21).
Source of truth¶
market_insights_models/src/commodity_hindcast/models/detrend/base.py:21
Trend axis convention¶
All three detrenders share one project-wide singleton, TREND_AXIS (models/detrend/time_axis.py:91):
- Epoch: 1980-01-01
- Unit: year (fractional calendar years, using 365.25 days/year for leap-year-exact conversion)
TREND_AXIS.to_x(years) (time_axis.py:69) converts integer calendar years to the x-axis values used in OLS fits. Slopes are always reported in yield-per-calendar-year units, matching the QUBE-sprint TrendFitter convention. Conversion helpers slope_per_year_to_native / slope_native_to_per_year (time_axis.py:79–85) are exact inverses.
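As a rough illustration of the convention, the sketch below builds a hypothetical axis with the same epoch and a 365.25-day year. ToyTrendAxis, the choice to represent each calendar year by its 1 January, and the extra "day" unit (added only to make the slope conversions non-trivial) are assumptions, not the contents of time_axis.py.

```python
from dataclasses import dataclass
from datetime import date

DAYS_PER_YEAR = 365.25


@dataclass(frozen=True)
class ToyTrendAxis:
    """Hypothetical axis: x is time since `epoch`, measured in `unit`."""

    epoch: date = date(1980, 1, 1)
    unit: str = "year"  # "year" or "day" in this sketch

    @property
    def _units_per_year(self) -> float:
        return {"year": 1.0, "day": DAYS_PER_YEAR}[self.unit]

    def to_x(self, years):
        # Assumption: each calendar year is represented by its 1 January.
        out = []
        for year in years:
            days = (date(year, 1, 1) - self.epoch).days
            out.append(days / DAYS_PER_YEAR * self._units_per_year)
        return out

    def slope_per_year_to_native(self, slope: float) -> float:
        # Yield per year -> yield per native x unit.
        return slope / self._units_per_year

    def slope_native_to_per_year(self, slope: float) -> float:
        # Yield per native x unit -> yield per year.
        return slope * self._units_per_year


axis = ToyTrendAxis()
x = axis.to_x([1980, 1981, 2020])
```

Under these assumptions 1980 maps to 0.0 and 2020 to 40.0, while 1981 maps to 366/365.25 (slightly above 1) because 1980 is a leap year. With the unit set to year, both slope helpers reduce to the identity.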
Required interface¶
| Abstract member | Signature | Purpose |
|---|---|---|
| fit_transform | (features: DataFrame) → DataFrame | Fit on training data; add target_detrended_col to a copy. |
| transform | (features: DataFrame) → DataFrame | Apply fitted trend; raises RuntimeError if not fitted. |
| inverse_transform | (features: DataFrame, y_detrended: ndarray) → Series | Re-trend detrended predictions to yield scale. |
| fitted_yield_series | (features: DataFrame) → Series | Model-implied trend level per row (for diagnostics). |
| is_fitted | @property → bool | Guards transform and save. |
| target_yield_column | @property → str | Raw yield column name in feature tables. |
| save | (path: Path) → None | Serialise fitted state; path supports s3:// via vfs.local_context. |
| load | classmethod (path, config) → Self | Reconstruct fitted detrender from disk. |
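The interface in the table could be declared roughly as follows. This is a sketch reconstructed from the table alone: the class name AbstractDetrendSketch is hypothetical, the config type is left as Any, and the real AbstractDetrend in base.py may differ in detail.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only needed by type checkers, not at runtime
    import numpy as np
    import pandas as pd


class AbstractDetrendSketch(ABC):
    """Hypothetical mirror of the abstract members listed in the table."""

    @abstractmethod
    def fit_transform(self, features: pd.DataFrame) -> pd.DataFrame: ...

    @abstractmethod
    def transform(self, features: pd.DataFrame) -> pd.DataFrame: ...

    @abstractmethod
    def inverse_transform(
        self, features: pd.DataFrame, y_detrended: np.ndarray
    ) -> pd.Series: ...

    @abstractmethod
    def fitted_yield_series(self, features: pd.DataFrame) -> pd.Series: ...

    @property
    @abstractmethod
    def is_fitted(self) -> bool: ...

    @property
    @abstractmethod
    def target_yield_column(self) -> str: ...

    @abstractmethod
    def save(self, path: Path) -> None: ...

    @classmethod
    @abstractmethod
    def load(cls, path: Path, config: Any) -> AbstractDetrendSketch: ...


def validate_detrender_sketch(obj: object) -> None:
    # Raise TypeError for anything that is not a detrender instance.
    if not isinstance(obj, AbstractDetrendSketch):
        raise TypeError(f"expected a detrender, got {type(obj).__name__}")
```

Because every member is abstract, a subclass that omits any of them cannot be instantiated, so a missing save or load surfaces at construction time rather than at the persist stage.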
Concrete (non-abstract) helpers on the base:
- target_detrended_column (base.py:44): delegates to config.commodity.target_detrended_col.
- state_column (base.py:48): returns the fixed string "state_name".
validate_detrender (base.py:77) checks that an object is an AbstractDetrend instance and raises TypeError otherwise.
Concrete implementations¶
| Class | Config key | File | Granularity | Trend model | Config-driven params | When to use |
|---|---|---|---|---|---|---|
| LinearStateDetrend | linear_state | linear_state_detrend.py:35 | State | Per-state area-weighted OLS line (numpy.polyfit) over TREND_AXIS.to_x | None | Fast training baseline; appropriate where a linear trajectory is a reasonable approximation |
| GaussianWindowStateDetrend | gaussian_state | gaussian_window_state_detrend.py:117 | State | Per-state Gaussian-smoothed area-weighted annual mean; optional causal (half_left) kernel | gaussian_sigma=4.0, gaussian_truncate=8.0, gaussian_mode="nearest", gaussian_kernel="symmetric", mean_yield_q_lo=0.25, mean_yield_q_hi=0.75 | Non-linear technology trajectories; use half_left kernel near train/test boundary |
| PartialPoolingDetrend | partial_pooling | partial_pooling_detrend.py:206 | County | Empirical-Bayes (James-Stein) hierarchical OLS; national prior shrinks noisy counties | min_obs=3, detrend_fixed_slope_bu_ac=None, mad_national_modified_z=None | Recommended default; counties with few observations borrow strength from the national prior |
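To make "borrow strength from the national prior" concrete, the sketch below applies a generic normal-normal shrinkage rule to two counties with the same raw slope but different precision. It is not the estimator in partial_pooling_detrend.py: the function shrink_slope, the variance inputs, the numbers, and the reading of lam as the weight on the national slope are assumptions for illustration (the project's eb_lambda may be defined differently).

```python
def shrink_slope(raw_slope, sampling_var, national_slope, between_var):
    """Return (shrunk_slope, lam), where lam is the weight on the national slope."""
    # The noisier the county estimate, the larger the pull toward the prior.
    lam = sampling_var / (sampling_var + between_var)
    shrunk = lam * national_slope + (1.0 - lam) * raw_slope
    return shrunk, lam


national_slope = 50.0  # made-up national trend, yield units per year
between_var = 100.0    # made-up variance of true slopes across counties

# (raw OLS slope, sampling variance of that slope); values are illustrative
counties = {
    "long_record": (80.0, 25.0),    # many years, precise slope
    "short_record": (80.0, 900.0),  # few years, noisy slope
}

shrunk = {
    name: shrink_slope(raw, var, national_slope, between_var)
    for name, (raw, var) in counties.items()
}
```

With these made-up inputs the precisely estimated county keeps most of its own slope (lam = 0.2, shrunk slope 74), while the noisy county is pulled most of the way to the national value (lam = 0.9, shrunk slope 53).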
Factory: build_detrender(config) at models/detrend/build.py:54 reads config.model.detrend and dispatches on it. Only partial_pooling accepts detrend_params from config; the allowlisted keys are min_obs, detrend_fixed_slope_bu_ac, and mad_national_modified_z, and unknown keys emit a warning. Any config.model.detrend value other than the three config keys above raises ValueError.
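The dispatch rules can be sketched as a small resolver. This is an illustration rather than the code in build.py: resolve_detrender is a hypothetical name, it returns the key and filtered kwargs instead of constructing a detrender, and dropping unknown keys after the warning is an assumption (the source only says a warning is emitted).

```python
from __future__ import annotations

import warnings

KNOWN_DETRENDERS = {"linear_state", "gaussian_state", "partial_pooling"}
PARTIAL_POOLING_KEYS = {
    "min_obs",
    "detrend_fixed_slope_bu_ac",
    "mad_national_modified_z",
}


def resolve_detrender(detrend: str, detrend_params: dict | None = None):
    """Return (key, kwargs) in place of a constructed detrender."""
    if detrend not in KNOWN_DETRENDERS:
        raise ValueError(f"unknown detrender {detrend!r}")
    if detrend != "partial_pooling":
        # Assumption: detrend_params is ignored for the other detrenders.
        return detrend, {}
    kwargs = {}
    for key, value in (detrend_params or {}).items():
        if key in PARTIAL_POOLING_KEYS:
            kwargs[key] = value
        else:
            # Assumption: the unknown key is dropped after the warning.
            warnings.warn(f"ignoring unknown detrend_params key {key!r}")
    return detrend, kwargs


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    key, kwargs = resolve_detrender(
        "partial_pooling", {"min_obs": 5, "bogus": 1}
    )
```

Here kwargs ends up containing only min_obs, and caught holds one warning for the bogus key.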
National fallback: All three implementations use NationalFallbackTrendImputer (lib/edit_and_imputation/imputation.py:63) for rows whose state or county was not seen during training.
PartialPoolingDetrend diagnostics: The .parameters property (partial_pooling_detrend.py:159) returns a cached DataFrame with one row per county: slope_per_year, intercept, national_slope_per_year, raw_slope_per_year, eb_lambda, train_years, n_train_years, train_year_min, train_year_max. All slopes are converted to per-year units with TREND_AXIS.slope_native_to_per_year.
Lifecycle¶
Instantiation: build_detrender(config) at models/detrend/build.py:54, called from ExperimentConfig.build_detrender() (config.py:682) or directly by stage code.
Fit: run_fit calls fit_transform(train_features) on the training panel, producing a copy of the feature table with target_detrended_col added. The regressor then trains on this detrended column.
Persist: detrender.save(slice.detrender_path) writes detrender.pkl to models/{commodity}/{fold_label}/. S3 paths are handled transparently via vfs.local_context.
Rehydrate: HindcastSlice.load_detrender(config) dispatches on the config.model.detrend key and calls the matching class's load classmethod (results_slice.py:190).
Apply at predict: models/regression/runtime.predict calls detrender.inverse_transform(data_regression, sim_detrended) (runtime.py:176) to convert the regressor's output back to absolute kg/ha.
Tear-down: None — fitted state lives only in the persisted pickle.
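The persist and rehydrate steps amount to a pickle round trip. The sketch below is a simplified stand-in: ToyDetrend, the choice to pickle a plain dict of slopes, and the "corn" and "fold_2020" path segments are hypothetical, and the toy load omits the config argument and the s3:// handling that the real classmethod has.

```python
import pickle
import tempfile
from pathlib import Path


class ToyDetrend:
    """Hypothetical detrender whose fitted state is a dict of slopes."""

    def __init__(self):
        self.slopes = {}  # fitted state: state name -> slope per year

    @property
    def is_fitted(self):
        return bool(self.slopes)

    def save(self, path: Path) -> None:
        # is_fitted guards save, as in the interface table.
        if not self.is_fitted:
            raise RuntimeError("cannot save an unfitted detrender")
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(self.slopes, fh)

    @classmethod
    def load(cls, path: Path):
        # Rebuild the object from disk without re-fitting.
        obj = cls()
        with path.open("rb") as fh:
            obj.slopes = pickle.load(fh)
        return obj


with tempfile.TemporaryDirectory() as root:
    # Mirrors models/{commodity}/{fold_label}/detrender.pkl with made-up names.
    path = Path(root) / "models" / "corn" / "fold_2020" / "detrender.pkl"
    fitted = ToyDetrend()
    fitted.slopes = {"Iowa": 120.0}
    fitted.save(path)
    reloaded = ToyDetrend.load(path)
```

Pickles should only be loaded from trusted locations, since unpickling can execute arbitrary code.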
Relationships¶
- AbstractRegressionImpl (Regressor): receives the detrended feature table produced by fit_transform; its output passes through inverse_transform on the way back.
- ExperimentConfig: carries model.detrend (dispatch key), model.detrend_params (constructor kwargs for partial_pooling), and commodity.target_detrended_col.
- HindcastSlice: owns detrender_path; provides load_detrender(config).
- NationalFallbackTrendImputer (lib/edit_and_imputation/imputation.py:63): shared fallback for unseen geographies across all three implementations.
- TREND_AXIS (models/detrend/time_axis.py:91): global singleton owning the epoch and unit; all OLS fits reference it.
Concepts and pipelines¶
- source_detrend: detailed module-level breakdown.
- Walk-forward cross-validation: the detrender is re-fitted on each fold's training data; a new detrender.pkl is written per fold.
- target_detrended_col: the column written by fit_transform and read by the regressor; name comes from config.commodity.target_detrended_col.
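The per-fold refit can be pictured with a short loop. The sketch assumes an expanding training window with one test year per fold and a hypothetical fold_{year} label format; none of these details are confirmed by this page, and the "fit" is replaced by recording which years each fold's detrender would see.

```python
def walk_forward_folds(years, first_test_year):
    """Yield (fold_label, train_years, test_year) with an expanding window."""
    ordered = sorted(set(years))
    for test_year in ordered:
        if test_year < first_test_year:
            continue
        # Only years strictly before the test year are available for fitting.
        train_years = [y for y in ordered if y < test_year]
        yield f"fold_{test_year}", train_years, test_year


fitted_per_fold = {}
for label, train_years, test_year in walk_forward_folds(range(2015, 2021), 2018):
    # Stand-in for fitting: record only the span the fold's detrender saw.
    fitted_per_fold[label] = {
        "train_year_min": min(train_years),
        "train_year_max": max(train_years),
    }
```

Each fold's entry stands for a separately fitted detrender, so no trend parameter in a given fold depends on the year being predicted.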
PRs and commits¶
No detrender-specific PRs identified in the recent log. The GaussianWindowStateDetrend design is intentionally aligned with the legacy WASDE FittedYieldDetrendStateMA (gaussian_window_state_detrend.py:117–131).
Open questions¶
- LinearStateDetrend and GaussianWindowStateDetrend operate at state granularity while PartialPoolingDetrend operates at county granularity. There is no mixed-granularity mode.
- GaussianWindowStateDetrend indexes smoothed series by integer year directly, not via TREND_AXIS.to_x: the trend is looked up by year rather than interpolated at a fractional x value.
- detrend_fixed_slope_bu_ac in PartialPoolingDetrend overrides the EB prior entirely; this hard-pins the national trend regardless of observed data and should be used with caution.