Exploring ARIMA Models: A Comparative Analysis of Forecasting Electricity Usage with R

The field of time series forecasting has seen significant advancements with the advent of sophisticated statistical modeling techniques. In this context, I would like to share insights from a recent analysis in which I compared different ARIMA models fit to a dataset of electricity usage in Belgium, using the R programming language and the forecast and fable packages. This exploration builds upon my earlier post, A First Look at TimeGPT using nixtlar, where I assessed the performance of ARIMA forecasts against those generated by the TimeGPT model.
In the previous article, I used the auto.arima() function from the forecast package to fit an ARIMA model to electricity usage data, with the objective of comparing the ARIMA forecast against that of TimeGPT. While troubleshooting that analysis, I also experimented with the newer fable package, and the outcomes were unexpectedly revealing. This article delves into those surprising findings, describes my investigative process, and discusses the practical implications for the issue of ARIMA model identifiability.
To begin, let's introduce the necessary libraries and the dataset we will be focusing on. The libraries are:

tidyverse
forecast
fable
tsibble
nixtlar (for the electricity data)
feasts
Metrics
For the analysis, I will employ the Belgian electricity usage dataset from the nixtlar package. By visualizing this dataset, it becomes apparent that the time series exhibits cyclic patterns interspersed with periods of significant volatility.
The initial code snippet prepares the dataset by filtering it to the Belgian region and plotting the series:

# Load the electricity data and restrict it to the Belgian series
df <- nixtlar::electricity

df2 <- df |>
  mutate(time = as.POSIXct(ds, format = "%Y-%m-%d %H:%M:%S")) |>
  filter(unique_id == "BE") |>
  select(-unique_id, -ds)

p <- df2 |>
  ggplot(aes(x = time, y = y)) +
  geom_line(color = "darkblue") +
  ggtitle("BE Electricity Usage Data")
This piece of code generates a time series plot that visually represents electricity consumption over time.
Following this step, I split the dataset into training and testing sets. Here, I set aside the last 24 observations for testing purposes:
NF <- 24

# Pivot to wide format, keep the Belgian series, and rename it y
# so the column name matches the code that follows
BE_df_wide <- df |>
  pivot_wider(names_from = unique_id, values_from = y) |>
  select(ds, y = BE) |>
  drop_na()

BE_train_df <- BE_df_wide |> filter(row_number() <= n() - NF)
BE_test_df  <- tail(BE_df_wide, NF)
The training dataset, BE_train_df, is prepared for input into the auto.arima() function, which requires the data to be structured as a time series object:

# Replace the timestamp with an integer index and build an hourly
# ts object with a daily (24-hour) seasonal period
train <- BE_train_df |>
  mutate(time = 1:length(ds)) |>
  select(-ds)

elec_ts <- ts(train$y, frequency = 24)
Upon executing the ARIMA fitting process, the auto.arima() function from the forecast package selected an ARIMA(2,1,1)(1,0,1)[24] model. This model accounts for non-stationarity through first differencing and incorporates seasonal components to reflect the daily cycle in the data, alongside both autoregressive and moving average terms. The validity and reliability of this model, along with its forecasts, is a theme I will revisit shortly.
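The fitting step itself is not shown above; a minimal sketch of what it would look like, assuming the forecast object is named forecast_fit to match the code that follows:

# Fit an ARIMA model to the hourly series; auto.arima() searches over
# candidate orders and here settles on ARIMA(2,1,1)(1,0,1)[24]
arima_fit <- auto.arima(elec_ts)

# Forecast the NF = 24 held-out hours; $mean holds the point forecasts
forecast_fit <- forecast(arima_fit, h = NF)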
Next, I extracted the forecast and organized it into a comparative dataframe:
arima_fcst_df <- BE_test_df |>
  mutate(
    time = ds,
    BE_actual = y,
    f_211101 = as.vector(forecast_fit$mean)
  ) |>
  select(-ds, -y)
Transitioning to the fable package, I replicated the ARIMA fitting process. This package is relatively new and offers a more streamlined, reproducible workflow than its predecessor:

auto_train <- BE_train_df |>
  mutate(time = as.POSIXct(ds, format = "%Y-%m-%d %H:%M:%S")) |>
  select(-ds)

# A tsibble needs an explicit time index; fill_gaps() makes any
# implicit missing hours explicit so the series is regular
elec_ts_2 <- auto_train |>
  as_tsibble(index = time) |>
  fill_gaps(.full = start())
To my astonishment, the fable package fitted an ARIMA(0,1,4)(0,0,2)[24] model to the data, a stark contrast to the ARIMA(2,1,1)(1,0,1)[24] model fitted by the forecast package.
fable_fit_1 <- elec_ts_2 |>
  model(arima_fable = ARIMA(y)) |>
  report()
The two models differ substantially in structure, yet both appear to be valid descriptions of the data. The underlying question arises: how can two distinct models generate forecasts that closely align? The answer lies in the well-known issue of ARIMA model identifiability: distinct combinations of autoregressive and moving average terms can imply nearly identical autocovariance structures, for instance when AR and MA roots nearly cancel, so the data alone cannot distinguish between them.
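One way to see this concretely is to expand each non-seasonal specification into its MA(∞) representation and compare the psi-weights; nearly equivalent models produce nearly identical weights. A minimal sketch using stats::ARMAtoMA(), with placeholder coefficients rather than the actual fitted values (which would come from the report() output), and ignoring the seasonal terms for simplicity:

# Psi-weights of the implied MA(infinity) representation for each
# non-seasonal part; the coefficients below are illustrative
# placeholders, not the fitted values
psi_arma21 <- ARMAtoMA(ar = c(0.6, 0.2), ma = -0.4, lag.max = 8)   # ARIMA(2,1,1) part
psi_ma4    <- ARMAtoMA(ma = c(0.2, 0.1, 0.05, 0.02), lag.max = 8)  # ARIMA(0,1,4) part

# If the two sets of weights are close, the models are nearly
# observationally equivalent
round(rbind(psi_arma21, psi_ma4), 3)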
In an attempt to investigate further, I used the fable package to re-fit the ARIMA(2,1,1)(1,0,1)[24] model, confirming that this specification also produces a valid fit under fable. The resulting forecasts were then added to the comparative dataframe:

fable_fit_2 <- elec_ts_2 |>
  model(fable_ARIMA_fcst_2 = ARIMA(y ~ 0 + pdq(2, 1, 1) + PDQ(1, 0, 1))) |>
  report()
The forecasts from both models appeared remarkably similar when plotted together, reinforcing the notion that both approaches yield comparable outcomes.
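As a sketch of how this overlay might be produced, both specifications can be fit in a single model() call and forecast from the resulting mable (the object and model names here are my own):

# Fit both specifications side by side, forecast the 24-hour hold-out,
# and overlay the point forecasts on the last week of training data
both_fits <- elec_ts_2 |>
  model(
    arima_014_002 = ARIMA(y),
    arima_211_101 = ARIMA(y ~ 0 + pdq(2, 1, 1) + PDQ(1, 0, 1))
  )

both_fits |>
  forecast(h = NF) |>
  autoplot(tail(elec_ts_2, 24 * 7), level = NULL)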
To validate the robustness of the forecasts, I then checked the residuals of both models. The residuals demonstrated a high correlation, suggesting that both models captured the underlying data structure effectively:
fable_fit_2_aug <- fable_fit_2 |> augment()
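Building on the both_fits mable from the earlier sketch, the correlation between the two models' innovation residuals can be computed along these lines:

# Compare the innovation residuals from the two fits
both_fits |>
  augment() |>
  as_tibble() |>
  select(time, .model, .innov) |>
  pivot_wider(names_from = .model, values_from = .innov) |>
  summarise(residual_cor = cor(arima_014_002, arima_211_101, use = "complete.obs"))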
Analysis of the residuals indicated that while much of the remaining noise could be characterized as random, some structure persisted, hinting at room for improvement. Even so, the Ljung-Box test for residual autocorrelation produced a satisfactory result, indicating that the forecast models provide a reasonable fit.
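A minimal sketch of the Ljung-Box check using the features() interface from feasts (the lag and degrees-of-freedom choices here are my assumptions):

# Ljung-Box test on the innovation residuals; dof = 5 reflects the five
# estimated ARMA coefficients in ARIMA(2,1,1)(1,0,1)[24]
fable_fit_2_aug |>
  features(.innov, ljung_box, lag = 48, dof = 5)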
In summary, I uncovered five distinct ARIMA models fitting the Belgian electricity usage time series. Each model demonstrated a good fit when evaluated by the root mean square error (RMSE) of its forecasts against actual values over the hold-out test period, as sketched below. Yet the question lingers: are there alternative models that might better capture the underlying data structure? Future work could explore this dimension further, potentially addressing the remaining patterns in the residuals to refine forecasting accuracy.
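For reference, a sketch of how the RMSE comparison might be computed with the Metrics package, using the comparative dataframe built earlier:

# RMSE of the ARIMA(2,1,1)(1,0,1)[24] point forecasts over the
# 24-hour hold-out period
Metrics::rmse(arima_fcst_df$BE_actual, arima_fcst_df$f_211101)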