
domain-modelling/schema.yaml — LinkML Domain Schema

Purpose

Auto-generated from market_insights_models.src.commodity_hindcast/. Captures the structural shape of every config and runtime class in the package as LinkML classes and enums. As stated in the file header: "Structural only. Editorial information (role, bounded context, identifier) belongs in human-maintained Markdown."

Schema URI: https://treefera.com/schema/commodity-hindcast

Config classes

These classes map directly to YAML config stanzas:

| Class | Source module | Key attributes |
| --- | --- | --- |
| ExperimentConfig | config | data_root, random_seed, mlflow_tracking_uri, experiment_name, commodity, feature_start_year, feature_end_year, experiment_protocol, model, postprocess, delivery, forecast (optional) |
| CommodityConfig | config | commodity, season_start, season_start_year_offset, harvest_season_doy, hindcast_init_season_doys, bushel_weight_lbs, delivery_unit, yield_range, feature_cols, climo_windows, weather_windows, builders |
| ExperimentProtocolConfig | config | cv_strategy, test_years, production_cumulative_threshold, production_recent_years |
| ModelConfig | config | detrend, detrend_params, regression, regression_params, weather_correction_fit_level, use_sample_weights, weight_column |
| PostprocessConfig | config | bias_corrector (→ BiasCorrectorConfig) |
| BiasCorrectorConfig | config | kind (coverage or none), n_lookback_years, reduction_method |
| DeliveryConfig | config | model_public_name, ci_levels |
| ForecastConfig | config | raw_obs_filepath, materialised_climo_filepath, init_date (optional, runtime-injected) |
| EvaluationConfig | config | wasde_path |
| SeasonWindow | config | name, sdoy_start, sdoy_end (optional) |
| MonthDay | config | month, day |
| AssembleStressConfig | config | indices_zarr, gs_start_doy, gs_end_doy, baseline_start, baseline_end, overwrite |
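As a rough illustration of what "optional, runtime-injected" means for ForecastConfig.init_date, the sketch below models the class as a frozen dataclass and fills the field in after loading. The field names come from the table; the field types, the placeholder paths, and the helper `inject_init_date` are assumptions made for this example and are not taken from the package.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ForecastConfig:
    """Illustrative stand-in: field names follow the table, types are assumed."""

    raw_obs_filepath: str
    materialised_climo_filepath: str
    init_date: Optional[date] = None  # not present in the static YAML


def inject_init_date(cfg: ForecastConfig, init_date: date) -> ForecastConfig:
    """Hypothetical helper: return a copy of cfg with init_date filled in."""
    return replace(cfg, init_date=init_date)


# What a loader might produce from a static YAML stanza (placeholder paths).
static_cfg = ForecastConfig(
    raw_obs_filepath="obs.zarr",
    materialised_climo_filepath="climo.zarr",
)

# What the CLI might hand to the forecast stage once the date is known.
runtime_cfg = inject_init_date(static_cfg, date(2024, 6, 1))
```

Keeping the static config immutable and producing a new object at injection time means the original YAML-derived values cannot be altered by accident; whether the package does this is not stated in the schema.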

Builder classes

| Class | Source module | Description |
| --- | --- | --- |
| YieldsBuilder | config | Per-commodity yields parquet reader; county_col, state_col, production_col, optional crop_type, edits list |
| WeatherBuilder | config | Weather-indices zarr reader; filepath, geo_id_col, time_dim |
| ClimoBuilder | config | Climatology-indices zarr reader; optional geo-id remap via geo_lookup_* fields |
| StressBuilder | config | Stress parquet reader; can regenerate from indices zarr via assemble_stress_from_indices; rename_map |
| NDVIBuilder | config | NDVI reader; county_col, statefp_col |

Edit and imputation classes

| Class | Source module | Description |
| --- | --- | --- |
| NullImputeRule | edit | Fires when target is null |
| PanelNullImputeRule | edit | Null-impute at panel level (group_by, order_by) |
| RatioEditRule | edit | Fires when target / derive outside [1/tol, tol] |
| RangeEditRule | edit | Fires when target outside [min, max] |
| DeductiveImpute | edit | Replace target via pandas.DataFrame.eval(source) |
| PanelTrailingMedian | edit | Impute via trailing-median helper |
| Clip | edit | Winsorise to [min, max] |
| Drop | edit | Remove firing rows |
| Flag | edit | Record fire in report; leave value untouched |
| Fail | edit | Raise ValueError if any row fires |
| EditReport | edit | Per-rule fire counts and boolean flag frame |
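The firing conditions for RangeEditRule and RatioEditRule, and the Clip action, can be sketched in plain Python. This is a minimal sketch of the conditions as described in the table, not the package's implementation: the function names are hypothetical, values are plain lists rather than DataFrame columns, and the treatment of nulls and zero denominators (they do not fire) is an assumption.

```python
from typing import List, Optional, Sequence

Number = Optional[float]


def range_fires(values: Sequence[Number], lo: float, hi: float) -> List[bool]:
    """True where a non-null value lies outside [lo, hi]."""
    return [v is not None and not (lo <= v <= hi) for v in values]


def ratio_fires(
    target: Sequence[Number], derive: Sequence[Number], tol: float
) -> List[bool]:
    """True where target / derive lies outside [1/tol, tol]."""
    fires = []
    for t, d in zip(target, derive):
        if t is None or d is None or d == 0:
            # Assumption: an undefined ratio does not fire this rule.
            fires.append(False)
            continue
        ratio = t / d
        fires.append(not (1 / tol <= ratio <= tol))
    return fires


def clip(values: Sequence[Number], lo: float, hi: float) -> List[Number]:
    """Winsorise non-null values to [lo, hi]; nulls pass through unchanged."""
    return [None if v is None else min(max(v, lo), hi) for v in values]


yields = [150.0, 900.0, None, 20.0]
range_flags = range_fires(yields, lo=50.0, hi=300.0)
clipped = clip(yields, lo=50.0, hi=300.0)
ratio_flags = ratio_fires([100.0, 300.0, 40.0], [100.0, 100.0, 100.0], tol=2.0)
```

With these inputs, `range_flags` is `[False, True, False, True]`, `clipped` is `[150.0, 300.0, None, 50.0]`, and `ratio_flags` is `[False, True, True]` (the ratios are 1.0, 3.0 and 0.4 against the interval [0.5, 2.0]).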

Runtime/result classes

| Class | Source module | Description |
| --- | --- | --- |
| ExperimentResult | lib.results.run_result | Loaded from a run directory; handover between pipeline stages |
| HindcastSlice | lib.results.results_slice | Lazy handle to one hindcast fold's artefacts on disk |
| ForecastSlice | lib.results.results_slice | Canonical path + loader for one (season_year, init_date) slice |
| PredictionInputs | stages.run_predict | Pure data struct for one prediction run (no I/O handles) |
| HindcastDelivery | delivery.schemas | Container for a complete delivery dataset |
| DeliveryRow | delivery.schemas | Single row in the hindcast delivery CSV; 10 CI bound columns |
| Check | run.preflight | Result of one preflight check (name, passed, message, critical) |
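To make the Check record concrete, here is a small sketch of a class with the four listed fields and one way a caller might use the critical flag. The field types, the helper `blocking_failures`, the check names, and the interpretation that only failed critical checks should stop a run are all assumptions; the schema records the fields only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Check:
    """Illustrative stand-in for one preflight result (types assumed)."""

    name: str
    passed: bool
    message: str
    critical: bool


def blocking_failures(checks: Iterable[Check]) -> List[Check]:
    """Hypothetical helper: failed checks that are marked critical."""
    return [c for c in checks if c.critical and not c.passed]


# Invented check names, for illustration only.
results = [
    Check("data_root_exists", passed=True, message="ok", critical=True),
    Check("tracking_uri_reachable", passed=False, message="timeout", critical=False),
    Check("yields_file_readable", passed=False, message="not found", critical=True),
]
blockers = blocking_failures(results)
```

Here `blockers` contains only the `yields_file_readable` result: the tracking check also failed, but it is not marked critical.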

Other domain classes

| Class | Source module | Description |
| --- | --- | --- |
| TrendAxis | models.detrend.time_axis | x-axis convention for yield-trend detrenders (epoch, unit) |
| FitAggregationPolicy | lib.geo.aggregation | Weather correction fit level + weight/geo columns |
| StressConfig | features.builders.stress_compute | min_std, z_clip, scaling_factor |
| StressVariable | features.builders.stress_compute | Name, invert flag, weight for composite stress score |
| PlotGroup | diagnostics.plots.registry | Plots sharing a data-preparation step |
| PlotSpec | diagnostics.plots.registry | One plot function, filename template, kwargs resolver |
| GroupPartition | lib.edit_and_imputation.imputation | Groups split by valid-observation count |

Enumerations

| Enum | Permissible values |
| --- | --- |
| Enum_ADM0_ADM1_ADM2 | ADM0, ADM1, ADM2 |
| Enum_coverage_none | coverage, none |
| Enum_gaussian_state_linear_state_partial_pooling | gaussian_state, linear_state, partial_pooling |
| Enum_pca_ridge_ridge_xgboost | pca_ridge, ridge, xgboost |
| Enum_day_year | day, year |
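The enum names above look as if they were built mechanically from their permissible values. The sketch below reproduces all five names by joining the sorted values with underscores behind an `Enum_` prefix. This is an inference from the listed names, not a description of the generator: `enum_name` is a hypothetical function, and the generator may use declaration order rather than sorting, which would give the same result for these five enums.

```python
from typing import Iterable


def enum_name(values: Iterable[str]) -> str:
    """Hypothetical reconstruction: 'Enum_' plus the sorted values joined by '_'."""
    return "Enum_" + "_".join(sorted(values))


names = [
    enum_name(["ADM0", "ADM1", "ADM2"]),
    enum_name(["none", "coverage"]),
    enum_name(["partial_pooling", "gaussian_state", "linear_state"]),
    enum_name(["xgboost", "ridge", "pca_ridge"]),
    enum_name(["year", "day"]),
]
```

One consequence of this naming style is that adding or removing a permissible value changes the enum's name, so any hand-written reference to it would need updating after regeneration.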

Notable observations

  • ForecastConfig.init_date is required: false — it is injected at runtime by the CLI, not present in the static YAML configs.
  • ExperimentConfig has an evaluation field pointing to EvaluationConfig (containing only wasde_path), but the actual YAML configs use a reference_data list with typed specs (kind: wasde, kind: conab_levantamento, etc.). The schema lags the config evolution here.
  • The DeliveryRow class exposes 10 optional CI-bound columns (lower_95 through upper_95): one lower and one upper bound for each of the 5 ci_levels in DeliveryConfig.
  • xgboost appears as a valid regression enum value, though no current commodity config uses it.
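The arithmetic behind the DeliveryRow observation (5 levels, 2 bounds each, 10 columns) can be sketched as follows. Only the names lower_95 and upper_95 are attested in the notes above; the other four level values, the `lower_<level>` / `upper_<level>` pattern for them, the column ordering, and the function `ci_bound_columns` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def ci_bound_columns(ci_levels: Sequence[int]) -> List[str]:
    """Hypothetical helper: lower bounds widest-first, then upper bounds narrowest-first."""
    widest_first = sorted(ci_levels, reverse=True)
    lowers = [f"lower_{level}" for level in widest_first]
    uppers = [f"upper_{level}" for level in reversed(widest_first)]
    return lowers + uppers


# Placeholder levels: only 95 is confirmed by the schema notes.
ASSUMED_CI_LEVELS = [50, 68, 80, 90, 95]
columns = ci_bound_columns(ASSUMED_CI_LEVELS)
```

With five assumed levels, `columns` has 10 entries, starting at `lower_95` and ending at `upper_95`, which matches the range quoted for DeliveryRow.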

Cross-references