---

# **Evidence Appendix: Why Smoothing Models and Chronos2 Form the Forecast Anchor in FreshNet**

---

## **A. Portfolio-Level Evidence**

All models were evaluated SKU-wise using the bias-aware scoring function:

```
Score = MAE + |Bias|
```

This penalizes models that appear accurate but drift directionally, a critical failure mode in fresh categories where bias inflates waste or drives stockouts.

### **Observed portfolio stability patterns (↓ = more stable)**

**Tier A: Lower-Noise Forecast Models**

| Model Family                            | Mean Stability Score (↓ better) |
| --------------------------------------- | ------------------------------- |
| **DynamicOptimizedTheta**               | 66.89                           |
| **SimpleExponentialSmoothingOptimized** | 67.31                           |
| **Chronos2**                            | 67.65                           |
| **Theta**                               | 67.68                           |
| **DynamicTheta**                        | 67.69                           |
| **CrostonOptimized / CrostonClassic**   | 67.88–68.36                     |

**Tier B: Acceptable Secondary Models**

| Model         | Score |
| ------------- | ----- |
| WindowAverage | 68.59 |
| HoltWinters   | 71.40 |
| Holt          | 71.84 |

**Tier C: High-Noise / High-Drift Models**

| Model               | Score |
| ------------------- | ----- |
| SeasonalNaive       | 76.74 |
| **LightGBM**        | 83.91 |
| HistoricAverage     | 84.07 |
| Naive               | 88.83 |
| RandomWalkWithDrift | 92.74 |

### **Interpretation**

* Tier-A models produce **lower bias and reduced noise** at the portfolio level.
* ML (LightGBM), without drivers such as discount, weather, or stockout hours, becomes **unstable**, overreacting to recent noise.
* Naive and drift models exaggerate noise and create planning churn.

**Conclusion:** FreshNet dynamics favor **noise-dampening methods over signal chasing**, particularly when demand structure is heterogeneous.

---

## **B. SKU-Level Model Decisions**

Winner share across all evaluated SKUs:

| Tier       | Model Families                                                     | Share     |
| ---------- | ------------------------------------------------------------------ | --------- |
| **Tier A** | **Theta-family**, **SES/Holt**, **Chronos2**, **Croston variants** | **~65%+** |
| Tier B     | WindowAverage, HistoricAverage                                     | ~20%      |
| **Tier C** | LightGBM, Naive, Drift                                             | ~15%      |

### **Interpretation**

* Winners did **not** cluster around ML models.
* The distribution is **skewed toward smoothing-based approaches**, particularly in volatile and intermittent SKUs.
* LightGBM wins primarily where behavior is quasi-linear **and** no external drivers are required.

These patterns reflect **model–structure alignment**, not algorithmic preference.

---

## **C. Behavioral Regime Analysis**

FreshNet SKUs were segmented into three behavioral regimes. Below are **frequently observed stability winners** within each regime.

---

### **1) High-High Regime**

*(unstable timing + unstable magnitude)*

| Winning Families                                   |
| -------------------------------------------------- |
| **Theta-family models**                            |
| **SES/Holt smoothing**                             |
| **Chronos2**                                       |
| Croston variants (for sparse high-volatility SKUs) |

**Observed behavior**

* These models dampen volatility without flattening structure.
* They avoid overreacting after spikes.
* Chronos2 handles mixed signal patterns without strong oscillation.

LightGBM frequently overfit recent bursts, leading to poor forward stability.

---

### **2) Low-High Regime**

*(regular recurrence, unstable amplitude)*

| Winning Families |
| ---------------- |
| **Holt-Winters** |
| **Theta**        |
| **Chronos2**     |
| Croston variants |

**Observed behavior**

* Seasonal regularity supports Holt-Winters performance.
* Amplitude spikes are absorbed more effectively by smoothing models than ML.
* Chronos2 adapts without repeatedly resetting level after shocks.
---

### **3) Low-Low Regime**

*(stable, low-variance items)*

| Winning Families             |
| ---------------------------- |
| **SES/Holt/Theta**           |
| Historic Average (some SKUs) |
| Croston (intermittent)       |

**Observed behavior**

* Model choice has lower impact in this regime.
* Smoothing models converge to similar baselines.
* Chronos2 is neutral: neither dominant nor harmful.

---

## **D. Example SKU-Level Decisions (Traceable)**

| SKU Identifier    | Stable Winner             |
| ----------------- | ------------------------- |
| CID0_SID0_PID104… | **DynamicOptimizedTheta** |
| CID0_SID0_PID118… | **Chronos2**              |
| CID0_SID0_PID127… | **SES/Holt**              |
| CID0_SID0_PID319… | **CrostonSBA**            |
| CID0_SID0_PID229… | **Holt-Winters**          |

Purpose:

* guarantees reproducibility
* shows evidence of regime-matched decisions
* prevents subjective reinterpretation

---

# **What the Evidence Resolves**

---

## **Technically**

The evidence demonstrates that:

* Theta/SES models **reduce directional drift**, a critical failure mode.
* Chronos2 accommodates mixed structure without aggressive overreaction.
* Croston preserves stability for zero-heavy SKUs.
* LightGBM is unsuitable for fresh categories **without driver data**.

### Stability, when matched to structure, dominates complexity

---

## **Operationally**

A stable, structure-aligned anchor model reduces:

* excessive overrides
* store–planner misalignment
* week-to-week forecast resets
* spiraling exception handling

And enables:

* consistent ordering
* predictable labor and waste planning
* cleaner exception signals

---

## **Economically**

Structure-aligned stability reduces:

* re-forecasting cycles
* waste from positive bias
* stockouts from negative bias
* planning churn and meeting load

These are material cost centers in fresh operations.
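
For reference, the bias-aware score from Section A (`Score = MAE + |Bias|`) and the per-SKU winner selection it drives can be sketched as follows. This is a minimal illustration, not the evaluation pipeline itself: the function names `bias_aware_score` and `pick_stable_winner` are hypothetical, and Bias is assumed to be the mean signed error (forecast minus actual), which matches the appendix's reading of positive bias as waste and negative bias as stockouts.

```python
from statistics import mean


def bias_aware_score(actuals, forecasts):
    """Return (score, mae, bias) with score = MAE + |Bias|.

    Assumption: Bias is the mean signed error (forecast - actual),
    so a positive value means systematic over-forecasting.
    """
    if not actuals or len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must be non-empty and equal length")
    errors = [f - a for a, f in zip(actuals, forecasts)]
    mae = mean([abs(e) for e in errors])
    bias = mean(errors)
    return mae + abs(bias), mae, bias


def pick_stable_winner(actuals, forecasts_by_model):
    """Return (winner_name, scores) for one SKU; lowest score wins."""
    scores = {
        name: bias_aware_score(actuals, fc)[0]
        for name, fc in forecasts_by_model.items()
    }
    return min(scores, key=scores.get), scores


# Two illustrative forecasts with the same MAE but different drift.
actuals = [10, 12, 8, 11]
candidates = {
    "drifting": [12, 14, 10, 13],       # always 2 units too high
    "noisy_unbiased": [12, 10, 10, 9],  # errors alternate +2 / -2
}
winner, scores = pick_stable_winner(actuals, candidates)
```

Both candidates have an MAE of 2, but the drifting forecast also carries a bias of 2 and scores 4, while the unbiased one scores 2 and wins. This is the directional-drift penalty the appendix relies on.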
---

# **Deployment Decision**

> **Use Theta-family smoothing and SES/Holt as the default signal where structure is stable.**
> **Use Croston methods for intermittent SKUs.**
> **Use Chronos2 when demand structure is mixed or uncertain.**
> **Introduce LightGBM only once driver data (discounts, stockout hours, weather) is integrated.**

Fallbacks are allowed **only** when:

1. a SKU is structurally deterministic (e.g., controlled replenishment)
2. the category is end-of-life
3. required signals are missing
4. governance mandates a deterministic forecast

All fallback choices must be recorded in the model selection ledger.

---

# **Closing Position**

This evidence shows **consistent, structure-conditional patterns**, not a single universally dominant model.

**Theta/SES, Croston, and Chronos2 remain operationally stable across FreshNet’s volatile, mixed-pattern, and intermittent regimes when applied appropriately.**

They produce forecasts that are not only accurate, but **steady enough to support durable planning decisions**.

That is why they form the **anchor set for FreshNet forecasting**, under a regime-aware deployment standard.

---
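
The deployment rules and fallback conditions stated under Deployment Decision could be encoded roughly as below. This is a sketch under stated assumptions, not an existing FreshNet component: `SkuProfile`, `FallbackReason`, `select_models`, and the plain-list ledger are all hypothetical names, and the precedence (fallback first, then intermittency, then structure) is an assumption, since the appendix does not specify an order.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FallbackReason(Enum):
    """The four conditions under which a fallback is permitted."""
    STRUCTURALLY_DETERMINISTIC = "structurally deterministic SKU"
    END_OF_LIFE = "end-of-life category"
    MISSING_SIGNALS = "required signals missing"
    GOVERNANCE_MANDATE = "governance mandates a deterministic forecast"


@dataclass
class SkuProfile:
    sku_id: str
    intermittent: bool                # zero-heavy demand
    structure: str                    # "stable", "mixed", or "uncertain"
    drivers_integrated: bool = False  # discounts, stockout hours, weather
    fallback_reason: Optional[FallbackReason] = None
    fallback_model: Optional[str] = None


def select_models(profile, ledger):
    """Return (anchor, extra_candidates); record any fallback in the ledger."""
    if profile.fallback_reason is not None:
        if profile.fallback_model is None:
            raise ValueError("a fallback requires an explicit fallback_model")
        # Every fallback choice is written to the model selection ledger.
        ledger.append({
            "sku_id": profile.sku_id,
            "model": profile.fallback_model,
            "reason": profile.fallback_reason.value,
        })
        return profile.fallback_model, []

    if profile.intermittent:
        anchor = "Croston"
    elif profile.structure == "stable":
        anchor = "Theta/SES/Holt"
    else:
        # Mixed or uncertain demand structure.
        anchor = "Chronos2"

    # LightGBM becomes a candidate only once driver data is integrated.
    extras = ["LightGBM"] if profile.drivers_integrated else []
    return anchor, extras
```

Keeping the fallback reason as an enum means a fallback cannot be requested without naming one of the four permitted conditions, and the ledger entry is written in the same call that returns the fallback model.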