domain-modelling/schema.yaml — LinkML Domain Schema
Purpose
Auto-generated from market_insights_models.src.commodity_hindcast/. Captures the structural shape of every config and runtime class in the package as LinkML classes and enums. As stated in the file header: "Structural only. Editorial information (role, bounded context, identifier) belongs in human-maintained Markdown."
Schema URI: https://treefera.com/schema/commodity-hindcast
Config classes
These classes map directly to YAML config stanzas:
| Class | Source module | Key attributes |
|---|---|---|
| ExperimentConfig | config | data_root, random_seed, mlflow_tracking_uri, experiment_name, commodity, feature_start_year, feature_end_year, experiment_protocol, model, postprocess, delivery, forecast (optional) |
| CommodityConfig | config | commodity, season_start, season_start_year_offset, harvest_season_doy, hindcast_init_season_doys, bushel_weight_lbs, delivery_unit, yield_range, feature_cols, climo_windows, weather_windows, builders |
| ExperimentProtocolConfig | config | cv_strategy, test_years, production_cumulative_threshold, production_recent_years |
| ModelConfig | config | detrend, detrend_params, regression, regression_params, weather_correction_fit_level, use_sample_weights, weight_column |
| PostprocessConfig | config | bias_corrector (→ BiasCorrectorConfig) |
| BiasCorrectorConfig | config | kind (coverage or none), n_lookback_years, reduction_method |
| DeliveryConfig | config | model_public_name, ci_levels |
| ForecastConfig | config | raw_obs_filepath, materialised_climo_filepath, init_date (optional, runtime-injected) |
| EvaluationConfig | config | wasde_path |
| SeasonWindow | config | name, sdoy_start, sdoy_end (optional) |
| MonthDay | config | month, day |
| AssembleStressConfig | config | indices_zarr, gs_start_doy, gs_end_doy, baseline_start, baseline_end, overwrite |
Builder classes
| Class | Source module | Description |
|---|---|---|
| YieldsBuilder | config | Per-commodity yields parquet reader; county_col, state_col, production_col, optional crop_type, edits list |
| WeatherBuilder | config | Weather-indices zarr reader; filepath, geo_id_col, time_dim |
| ClimoBuilder | config | Climatology-indices zarr reader; optional geo-id remap via geo_lookup_* fields |
| StressBuilder | config | Stress parquet reader; can regenerate from indices zarr via assemble_stress_from_indices; rename_map |
| NDVIBuilder | config | NDVI reader; county_col, statefp_col |
Edit and imputation classes
| Class | Source module | Description |
|---|---|---|
| NullImputeRule | edit | Fires when target is null |
| PanelNullImputeRule | edit | Null-impute at panel level (group_by, order_by) |
| RatioEditRule | edit | Fires when target / derive outside [1/tol, tol] |
| RangeEditRule | edit | Fires when target outside [min, max] |
| DeductiveImpute | edit | Replace target via pandas.DataFrame.eval(source) |
| PanelTrailingMedian | edit | Impute via trailing-median helper |
| Clip | edit | Winsorise to [min, max] |
| Drop | edit | Remove firing rows |
| Flag | edit | Record fire in report; leave value untouched |
| Fail | edit | Raise ValueError if any row fires |
| EditReport | edit | Per-rule fire counts and boolean flag frame |
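
The firing conditions for RatioEditRule and RangeEditRule, and the Clip action, can be illustrated with a minimal pure-Python sketch. The class names `RangeRule` and `RatioRule`, the `fires` method, the `clip` helper, and the treatment of null or zero inputs are all assumptions made for illustration; the real classes live in the edit module and may differ (for example, they probably operate on whole data frames rather than scalars). The sketch also assumes `tol > 1`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeRule:
    """Hypothetical stand-in for RangeEditRule: fires outside [min, max]."""
    min_value: float
    max_value: float

    def fires(self, target: Optional[float]) -> bool:
        # Assumption: nulls are left to a null-impute rule, so they do not fire here.
        if target is None:
            return False
        return not (self.min_value <= target <= self.max_value)


@dataclass(frozen=True)
class RatioRule:
    """Hypothetical stand-in for RatioEditRule: fires when
    target / derive falls outside [1/tol, tol]."""
    tol: float  # assumed to be > 1

    def fires(self, target: Optional[float], derive: Optional[float]) -> bool:
        # Assumption: a missing or zero denominator cannot be evaluated, so no fire.
        if target is None or derive is None or derive == 0:
            return False
        ratio = target / derive
        return not (1 / self.tol <= ratio <= self.tol)


def clip(value: float, min_value: float, max_value: float) -> float:
    """Winsorise a single value to [min_value, max_value]."""
    return max(min_value, min(max_value, value))


range_rule = RangeRule(min_value=0.0, max_value=300.0)
ratio_rule = RatioRule(tol=2.0)

out_of_range = range_rule.fires(350.0)    # 350 is above the upper bound
ratio_too_high = ratio_rule.fires(10, 4)  # 10 / 4 = 2.5, above tol
ratio_too_low = ratio_rule.fires(1, 4)    # 1 / 4 = 0.25, below 1 / tol
clipped = clip(350.0, 0.0, 300.0)
```

In this sketch a rule only decides whether a row fires; what happens next (Clip, Drop, Flag, Fail, or an imputation) is a separate action, which mirrors the split between rule and action classes in the table above.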
Runtime/result classes
| Class | Source module | Description |
|---|---|---|
| ExperimentResult | lib.results.run_result | Loaded from a run directory; handover between pipeline stages |
| HindcastSlice | lib.results.results_slice | Lazy handle to one hindcast fold's artefacts on disk |
| ForecastSlice | lib.results.results_slice | Canonical path + loader for one (season_year, init_date) slice |
| PredictionInputs | stages.run_predict | Pure data struct for one prediction run (no I/O handles) |
| HindcastDelivery | delivery.schemas | Container for a complete delivery dataset |
| DeliveryRow | delivery.schemas | Single row in the hindcast delivery CSV; 10 CI bound columns |
| Check | run.preflight | Result of one preflight check (name, passed, message, critical) |
Other domain classes
| Class | Source module | Description |
|---|---|---|
| TrendAxis | models.detrend.time_axis | x-axis convention for yield-trend detrenders (epoch, unit) |
| FitAggregationPolicy | lib.geo.aggregation | Weather correction fit level + weight/geo columns |
| StressConfig | features.builders.stress_compute | min_std, z_clip, scaling_factor |
| StressVariable | features.builders.stress_compute | Name, invert flag, weight for composite stress score |
| PlotGroup | diagnostics.plots.registry | Plots sharing a data-preparation step |
| PlotSpec | diagnostics.plots.registry | One plot function, filename template, kwargs resolver |
| GroupPartition | lib.edit_and_imputation.imputation | Groups split by valid-observation count |
Enumerations
| Enum | Permissible values |
|---|---|
| Enum_ADM0_ADM1_ADM2 | ADM0, ADM1, ADM2 |
| Enum_coverage_none | coverage, none |
| Enum_gaussian_state_linear_state_partial_pooling | gaussian_state, linear_state, partial_pooling |
| Enum_pca_ridge_ridge_xgboost | pca_ridge, ridge, xgboost |
| Enum_day_year | day, year |
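
The five enum names listed here are all consistent with one naming convention: the prefix `Enum_` followed by the permissible values, sorted and joined with underscores. This is an inference from the names alone, not something the schema states, and the helper `enum_name` below is hypothetical rather than part of the generator.

```python
from typing import Iterable


def enum_name(values: Iterable[str]) -> str:
    """Build an enum name from its permissible values.

    Assumed convention (inferred from the listed names, not documented):
    "Enum_" + the values in sorted order, joined with "_".
    """
    return "Enum_" + "_".join(sorted(values))


# Input order does not matter because the values are sorted first.
regression_enum = enum_name(["ridge", "xgboost", "pca_ridge"])
level_enum = enum_name(["ADM2", "ADM0", "ADM1"])
```

One consequence of such a convention, if it holds, is that adding or removing a permissible value renames the enum, so references to the enum by name would need regenerating along with the schema.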
Notable observations
- ForecastConfig.init_date is marked required: false because the CLI injects it at runtime; it is not present in the static YAML configs.
- ExperimentConfig has an evaluation field pointing to EvaluationConfig (containing only wasde_path), but the actual YAML configs use a reference_data list with typed specs (kind: wasde, kind: conab_levantamento, etc.). The schema lags the config evolution here.
- The DeliveryRow class exposes 10 optional CI-bound columns (lower_95 through upper_95), matching the 5 ci_levels in DeliveryConfig: one lower and one upper bound per level.
- xgboost appears as a valid regression enum value, though no current commodity config uses it.
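
The relationship between the 5 ci_levels and the 10 CI-bound columns can be sketched as follows. Only the names lower_95 and upper_95 come from the schema; the function `ci_bound_columns`, the other four level values, the integer-percent representation, and the column ordering are assumptions chosen for illustration.

```python
from typing import List, Sequence


def ci_bound_columns(ci_levels: Sequence[int]) -> List[str]:
    """Return one lower_<level> and one upper_<level> column per CI level.

    Assumed ordering: lower bounds from the widest interval inwards,
    then upper bounds from the narrowest outwards, so the list runs
    from lower_95 through upper_95 when 95 is the widest level.
    """
    levels = sorted(ci_levels)
    lowers = [f"lower_{level}" for level in reversed(levels)]
    uppers = [f"upper_{level}" for level in levels]
    return lowers + uppers


# Hypothetical levels: only 95 is confirmed by the column names in the schema.
example_levels = [50, 60, 70, 80, 95]
columns = ci_bound_columns(example_levels)
```

With five levels the helper yields ten names, which is the count the schema reports for DeliveryRow.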
Cross-references