Feature Engineering for Time Series in R with timetk
A recipe-based approach to creating calendar features, lags, rolling windows, and Fourier terms for ML-based time series forecasting in R.
Why Feature Engineering Matters
Statistical models like ARIMA handle time series natively — they understand lags and seasonality internally. But ML models (XGBoost, Random Forest, neural nets) see your data as a flat table. They have no idea what "last Tuesday" or "same month last year" means unless you tell them.
That's where timetk comes in. It's the feature engineering backbone of the
modeltime ecosystem, and it integrates beautifully with tidymodels recipes.
Calendar Features
The simplest win: extract date components. step_timeseries_signature() does
this automatically:
library(tidymodels)
library(timetk)
recipe_ts <- recipe(sales ~ date, data = train) %>%
step_timeseries_signature(date)
# This creates 25+ features automatically:
# date_year, date_quarter, date_month, date_wday,
# date_week, date_mday, date_hour, etc.
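To see what the signature contains before committing to a recipe, timetk also exposes it as a plain tibble via tk_get_timeseries_signature() — a quick sketch on a toy date vector:

```r
library(timetk)

# Toy daily index (illustrative)
dates <- seq(as.Date("2023-01-01"), as.Date("2023-03-31"), by = "day")

# One row per date: index.num, year, quarter, month, wday, week,
# mday, plus label/ISO variants
tk_get_timeseries_signature(dates)
```

Skimming this output makes it obvious which columns are dead weight for your frequency (e.g. hour and minute for daily data).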
Not all of these are useful. Remove the raw date column, zero-variance features, and highly correlated ones:
recipe_ts <- recipe_ts %>%
step_rm(date) %>%
step_zv(all_predictors()) %>%
step_corr(all_numeric_predictors(), threshold = 0.9) %>%
step_normalize(all_numeric_predictors())
Lag Features
The most powerful time series features. "What was the value 7 days ago? 14 days ago? 365 days ago?"
recipe_lags <- recipe(sales ~ date, data = train) %>%
step_lag(sales, lag = c(7, 14, 21, 28, 365)) %>%
step_naomit(all_predictors())
Lags create NA values in the first rows of the series (up to the longest lag — a full year here). Drop them with step_naomit(), or make sure your resampling scheme skips that window.
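Outside of a recipe, the same lags can be sanity-checked with timetk's tk_augment_lags() — a sketch on a toy tibble (data is illustrative):

```r
library(dplyr)
library(timetk)

toy <- tibble(
  date  = seq(as.Date("2023-01-01"), by = "day", length.out = 10),
  sales = 1:10
)

# Adds a sales_lag7 column; the first 7 rows are NA
toy %>% tk_augment_lags(sales, .lags = 7)
```

Those leading NAs are exactly what step_naomit() removes inside the recipe.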
Rolling Window Statistics
Rolling means and standard deviations capture momentum and volatility:
recipe_rolling <- recipe(sales ~ date, data = train) %>%
step_slidify(
sales,
period = 7,
.f = mean,
align = "right", # use only current and past values
names = "rolling_mean_7"
) %>%
step_slidify(
sales,
period = 28,
.f = mean,
align = "right",
names = "rolling_mean_28"
) %>%
step_slidify(
sales,
period = 28,
.f = sd,
align = "right",
names = "rolling_sd_28"
)
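The vectorized counterpart, slidify_vec(), is handy for checking window alignment on its own — a minimal sketch:

```r
library(timetk)

x <- 1:10

# With .align = "right", each value is the mean of the current point and
# the (period - 1) points before it, so the first two results are NA
slidify_vec(x, .f = mean, .period = 3, .align = "right")
```

Right alignment is what prevents leakage: a centered window would fold future values into each feature.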
Fourier Terms
For capturing complex seasonality (daily + weekly + yearly), Fourier terms are more efficient than one-hot encoding every possible seasonal period:
recipe_fourier <- recipe(sales ~ date, data = train) %>%
step_fourier(date, period = c(7, 365), K = c(3, 5))
# This creates sin/cos pairs:
# date_sin7_K1, date_cos7_K1, ... date_sin365_K5, date_cos365_K5
Higher K values capture more complex seasonality patterns but risk overfitting.
Start with K = 3 for weekly, K = 5 for yearly.
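These terms are just sine/cosine transforms of the numeric date index; timetk's fourier_vec() computes one at a time, which makes the mechanics easy to inspect — a sketch on toy dates:

```r
library(timetk)

dates <- seq(as.Date("2023-01-01"), by = "day", length.out = 14)

# A single sine wave completing one full cycle every 7 days;
# pair with type = "cos", and raise K for higher harmonics
fourier_vec(dates, period = 7, K = 1, type = "sin")
```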
Putting It All Together
Combine everything into one recipe:
full_recipe <- recipe(sales ~ ., data = train) %>%
step_timeseries_signature(date) %>%
step_fourier(date, period = c(7, 365), K = c(3, 5)) %>%
step_lag(sales, lag = c(7, 14, 28)) %>%
step_slidify(sales, period = 7, .f = mean, align = "right") %>%
step_rm(date) %>%
step_naomit(all_predictors()) %>%
step_zv(all_predictors()) %>%
step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
step_normalize(all_numeric_predictors())
Order matters here: NA rows from the lags are dropped before normalization so they can't distort the estimated means and SDs, and the label columns from the signature are one-hot encoded with step_dummy() because xgboost requires an all-numeric input matrix.
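Before fitting, it's worth materializing the recipe to confirm the design matrix looks sane — a sketch assuming the train tibble from above:

```r
# prep() estimates each step on train; bake(new_data = NULL) returns
# the processed training set
baked <- full_recipe %>% prep() %>% bake(new_data = NULL)

dplyr::glimpse(baked) # check feature names and the row count after NA removal
```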
# Plug into any tidymodels workflow
wf <- workflow() %>%
add_recipe(full_recipe) %>%
add_model(boost_tree() %>% set_engine("xgboost") %>% set_mode("regression")) %>%
fit(data = train)
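Prediction then follows the standard tidymodels pattern — a sketch assuming a held-out test tibble with the same columns as train:

```r
preds <- predict(wf, new_data = test)

# Caveat: lag and rolling features require the test rows to carry real
# historical sales values. For genuine multi-step forecasts where those
# values don't exist yet, use recursive prediction (see modeltime's
# recursive()) rather than this one-shot predict().
```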
Takeaway
Feature engineering is where ML models win or lose in time series. The timetk +
recipes combo gives you a declarative, reproducible pipeline. Calendar features
are free, lags are essential, rolling stats add context, and Fourier terms handle multi-seasonality
elegantly.