← Back to learnwithForhad

Why XGBoost in R Outperforms ARIMA for Retail Demand Forecasting

A practical comparison using modeltime and real retail data — when to use statistical methods vs. gradient-boosted trees in the tidymodels framework.


The Problem

Retail demand forecasting is messy. You've got seasonality, promotions, holidays, stockouts, and dozens of external signals that ARIMA simply can't handle natively. In this article, we'll walk through a real-world comparison using R's modeltime ecosystem.

Setup

We'll use tidymodels + modeltime for a unified workflow:

library(tidymodels)
library(modeltime)
library(timetk)

# Load retail data
retail_data <- walmart_sales_weekly %>%
  filter(id == "1_1") %>%
  select(date = Date, value = Weekly_Sales)

ARIMA Baseline

ARIMA (auto_arima via modeltime) is the classic go-to. It handles trend and seasonality through differencing and seasonal components. But it treats forecasting as a univariate problem — one series, one model, no external features.

model_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training_data)

XGBoost with Feature Engineering

XGBoost, on the other hand, thrives on features. Calendar features (day of week, month, week of year), lag features, rolling means, and external regressors like promotions or weather — all of these are trivially added with timetk.

recipe_xgb <- recipe(value ~ date, data = training_data) %>%
  step_timeseries_signature(date) %>%
  step_rm(date) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())

model_xgb <- boost_tree(trees = 500, learn_rate = 0.05) %>%
  set_engine("xgboost") %>%
  fit(value ~ ., data = juice(prep(recipe_xgb)))

Results

On our Walmart weekly sales test set, XGBoost achieved a 23% lower RMSE than ARIMA. The biggest gains came from:

When ARIMA Still Wins

ARIMA isn't dead. For clean, low-noise, univariate series with strong seasonal patterns (e.g., monthly electricity consumption), ARIMA is simpler, faster, and often good enough. The key insight: use ARIMA as your baseline, then beat it with ML when you have features.

Takeaway

The modeltime framework makes it trivially easy to compare both approaches in a single workflow. Start with ARIMA, engineer features, train XGBoost, compare on the same test set. Let the data decide.