Why do Transformers suck at Time Series Forecasting?
When Transformers were first gaining attention, the time series forecasting world lost its mind over them. Unfortunately, they never quite lived up to the hype. So, what went wrong?
“The natural intuition is that multivariate models, such as those based on Transformer architectures, should be more effective than univariate models due to their ability to leverage cross-variate information. However, Zeng et al. (2023) revealed that this is not always the case — Transformer-based models can indeed be significantly worse than simple univariate temporal linear models on many commonly used forecasting benchmarks. The multivariate models seem to suffer from overfitting especially when the target time series is not correlated with other covariates.”
The problems for Transformers don't end here. The authors of 'Are Transformers Effective for Time Series Forecasting?' (Zeng et al., 2023) demonstrated that Transformer models could be beaten by a very simple linear model: a single linear layer mapping the lookback window directly to the forecast horizon. When analyzing why Transformers failed, they pointed to multi-headed self-attention as a potential culprit, arguing that its permutation-invariant nature tends to discard the temporal ordering that forecasting depends on.
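To make concrete just how simple such a linear baseline is, here is a minimal sketch in the spirit of the paper's linear models: a single linear map (fit by least squares rather than gradient descent, for brevity) from the last `lookback` observations to the next `horizon` steps. The data, window sizes, and helper names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (input window, target window) pairs."""
    X, Y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i : i + lookback])
        Y.append(series[i + lookback : i + lookback + horizon])
    return np.array(X), np.array(Y)

# Toy data: a daily-seasonal sine wave plus noise (purely illustrative).
rng = np.random.default_rng(0)
t = np.arange(500)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(500)

lookback, horizon = 48, 12
X, Y = make_windows(series, lookback, horizon)

# One linear layer with bias: Y ≈ [X, 1] @ W, solved in closed form.
Xb = np.hstack([X, np.ones((len(X), 1))])
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

pred = Xb @ W
mse = np.mean((pred - Y) ** 2)
```

Despite having no attention, no embeddings, and only `(lookback + 1) * horizon` parameters, this kind of model proved a surprisingly strong baseline on standard long-horizon forecasting benchmarks.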