
Feature Engineering for Time Series in R with timetk

A recipe-based approach to creating calendar features, lags, rolling windows, and Fourier terms for ML-based time series forecasting in R.


Why Feature Engineering Matters

Statistical models like ARIMA handle time series natively — they understand lags and seasonality internally. But ML models (XGBoost, Random Forest, neural nets) see your data as a flat table. They have no idea what "last Tuesday" or "same month last year" means unless you tell them.

That's where timetk comes in. It's the feature engineering backbone of the modeltime ecosystem, and it integrates beautifully with tidymodels recipes.

Calendar Features

The simplest win: extract date components. step_timeseries_signature() does this automatically:

library(tidymodels)
library(timetk)

recipe_ts <- recipe(sales ~ date, data = train) %>%
  step_timeseries_signature(date)

# This creates 25+ features automatically:
# date_year, date_quarter, date_month, date_wday,
# date_week, date_mday, date_hour, etc.
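To see what the signature contains without building a recipe, timetk also exposes the same logic as a plain function, tk_get_timeseries_signature(), which takes a date vector directly (inside a recipe, the generated columns are prefixed with the date column's name, e.g. date_year):

```r
library(timetk)

# Inspect the raw signature for a vector of dates
dates <- seq(as.Date("2023-01-01"), by = "day", length.out = 10)
sig   <- tk_get_timeseries_signature(dates)

names(sig)  # index, index.num, year, quarter, month, wday, week, ...
```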

Not all of these are useful. Remove zero-variance and highly correlated ones:

recipe_ts <- recipe_ts %>%
  step_rm(date) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())
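To sanity-check what survives the cleanup, prep() the recipe on the training set and bake() it back — with new_data = NULL, bake() returns the processed training data (this assumes the recipe_ts and train objects defined above):

```r
# Inspect the engineered training features
prepped  <- prep(recipe_ts, training = train)
features <- bake(prepped, new_data = NULL)
glimpse(features)
```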

Lag Features

The most powerful time series features. "What was the value 7 days ago? 14 days ago? 365 days ago?"

recipe_lags <- recipe(sales ~ date, data = train) %>%
  step_lag(sales, lag = c(7, 14, 21, 28, 365)) %>%
  step_naomit(all_predictors())

⚠️ Watch out for data leakage! Lags of the outcome mean two things: the first rows of the training set have NAs (there is no "365 days ago" yet), and at prediction time the new data must carry the actual sales history needed to fill those lag columns. Handle the leading NAs with step_naomit() — by default it runs on the training data but is skipped at bake time (skip = TRUE) — and always split time series chronologically, never randomly.
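An alternative to computing lags inside the recipe is to build them on the full history first with tk_augment_lags() and then split chronologically, so test rows keep lag values drawn only from their own past (a sketch — sales_data is a placeholder for your full tibble with date and sales columns):

```r
library(timetk)
library(dplyr)

# Add lag columns on the complete series before splitting
lagged <- sales_data %>%
  tk_augment_lags(sales, .lags = c(7, 14, 28))

# Chronological split: the most recent 3 months become the test set
splits <- time_series_split(lagged, date_var = date,
                            assess = "3 months", cumulative = TRUE)
```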

Rolling Window Statistics

Rolling means and standard deviations capture momentum and volatility:

recipe_rolling <- recipe(sales ~ date, data = train) %>%
  step_slidify_augment(
    sales,
    period = c(7, 28),
    .f     = mean,
    align  = "right",    # use only past data
    prefix = "rolling_mean_"
  ) %>%
  step_slidify_augment(
    sales,
    period = 28,
    .f     = sd,
    align  = "right",
    prefix = "rolling_sd_"
  )

Note the _augment variant: plain step_slidify() smooths the selected column in place, while step_slidify_augment() adds new columns (one per period), which is what you want for features.
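What "align = right" buys you is worth spelling out: the window at position t sees only observations t − 6 through t, never the future. A base-R sketch of the computation (roll_mean_right is a hypothetical helper, not timetk's implementation):

```r
# Right-aligned rolling mean: position t uses x[t - window + 1] .. x[t]
roll_mean_right <- function(x, window) {
  n   <- length(x)
  out <- rep(NA_real_, n)
  for (t in seq_len(n)) {
    if (t >= window) out[t] <- mean(x[(t - window + 1):t])
  }
  out
}

x <- c(10, 20, 30, 40, 50)
roll_mean_right(x, window = 3)
# NA NA 20 30 40  -- the first window - 1 positions have no full past window
```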

Fourier Terms

For capturing complex seasonality (daily + weekly + yearly), Fourier terms are more efficient than one-hot encoding every possible seasonal period:

recipe_fourier <- recipe(sales ~ date, data = train) %>%
  step_fourier(date, period = c(7, 365), K = c(3, 5))

# This creates sin/cos pairs:
# date_sin7_K1, date_cos7_K1, ... date_sin365_K5, date_cos365_K5

Higher K values capture more complex seasonality patterns but risk overfitting. Start with K = 3 for weekly, K = 5 for yearly.
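Each Fourier term is just a sine or cosine of the time index: for order K and period P, the pair is sin(2πKt/P) and cos(2πKt/P). A base-R sketch of the math (fourier_pair is a hypothetical helper, not timetk's exact implementation, which scales by a calendar-aware index):

```r
# Sine/cosine pair of order K for period P at integer time index t
fourier_pair <- function(t, period, K) {
  cbind(sin = sin(2 * pi * K * t / period),
        cos = cos(2 * pi * K * t / period))
}

t         <- 1:14
weekly_k1 <- fourier_pair(t, period = 7, K = 1)

# The K = 1 weekly pair repeats exactly every 7 steps
all.equal(weekly_k1[1:7, ], weekly_k1[8:14, ])  # TRUE
```

Higher orders (K = 2, 3, ...) add faster oscillations within the same period, letting the model fit seasonal shapes that a single sine wave can't.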

Putting It All Together

Combine everything into one recipe:

full_recipe <- recipe(sales ~ ., data = train) %>%
  step_timeseries_signature(date) %>%
  step_fourier(date, period = c(7, 365), K = c(3, 5)) %>%
  step_lag(sales, lag = c(7, 14, 28)) %>%
  step_slidify_augment(sales, period = 7, .f = mean, align = "right") %>%
  step_rm(date) %>%
  step_naomit(all_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())

# Plug into any tidymodels workflow
wf <- workflow() %>%
  add_recipe(full_recipe) %>%
  add_model(boost_tree(mode = "regression") %>% set_engine("xgboost")) %>%
  fit(data = train)
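Forecasting then follows the usual tidymodels pattern. Because the recipe lags the outcome, the new data must carry the actual sales history those lag columns need (a sketch, assuming a test tibble shaped like train):

```r
# Predict on the hold-out period and line up actuals vs. forecasts
preds <- predict(wf, new_data = test)

bind_cols(test, preds) %>%
  select(date, sales, .pred)
```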

Takeaway

Feature engineering is where ML models win or lose in time series. The timetk + recipes combo gives you a declarative, reproducible pipeline. Calendar features are free, lags are essential, rolling stats add context, and Fourier terms handle multi-seasonality elegantly.