Kaggle forecasting competitions: An overlooked learning opportunity

Casper Solheim Bojer; Jens Peder Meldgaard

doi:10.1016/j.ijforecast.2020.07.007

Publication: Contribution to journal › Journal article › Research › peer-review

106 Citations (Scopus)

157 Downloads (Pure)

Abstract

We review the results of six forecasting competitions based on the online data science platform Kaggle, which have been largely overlooked by the forecasting community. In contrast to the M competitions, the competitions reviewed in this study feature daily and weekly time series with exogenous variables, business hierarchy information, or both. Furthermore, the Kaggle data sets all exhibit higher entropy than the M3 and M4 competitions, and they are intermittent.

In this review, we confirm the conclusion of the M4 competition that ensemble models using cross-learning tend to outperform local time series models and that gradient boosted decision trees and neural networks are strong forecast methods. Moreover, we present insights regarding the use of external information and validation strategies, and discuss the impacts of data characteristics on the choice of statistics or machine learning methods. Based on these insights, we construct nine ex-ante hypotheses for the outcome of the M5 competition to allow empirical validation of our findings.

Original language	English
Journal	International Journal of Forecasting
Volume	37
Issue number	2
Pages (from-to)	587-603
Number of pages	17
ISSN	0169-2070
DOI	https://doi.org/10.1016/j.ijforecast.2020.07.007
Status	Published - 2021

Access to document

10.1016/j.ijforecast.2020.07.007

Learning from Kaggle Competitions (Accepted manuscript, 239 KB; License: CC BY-NC-ND 4.0)

https://arxiv.org/ftp/arxiv/papers/2009/2009.07701.pdf

Other files and links

Link to publication in Scopus

Citation formats

@article{98f8c43fc48b45c9b9da7accf396e0f0,

title = "Kaggle forecasting competitions: An overlooked learning opportunity",

abstract = "We review the results of six forecasting competitions based on the online data science platform Kaggle, which have been largely overlooked by the forecasting community. In contrast to the M competitions, the competitions reviewed in this study feature daily and weekly time series with exogenous variables, business hierarchy information, or both. Furthermore, the Kaggle data sets all exhibit higher entropy than the M3 and M4 competitions, and they are intermittent. In this review, we confirm the conclusion of the M4 competition that ensemble models using cross-learning tend to outperform local time series models and that gradient boosted decision trees and neural networks are strong forecast methods. Moreover, we present insights regarding the use of external information and validation strategies, and discuss the impacts of data characteristics on the choice of statistics or machine learning methods. Based on these insights, we construct nine ex-ante hypotheses for the outcome of the M5 competition to allow empirical validation of our findings.",

keywords = "Benchmarking, Business forecasting, Forecast accuracy, Forecasting competition review, M competitions, Machine learning methods, Time series methods, Time series visualization",

author = "Bojer, {Casper Solheim} and Meldgaard, {Jens Peder}",

year = "2021",

doi = "10.1016/j.ijforecast.2020.07.007",

language = "English",

volume = "37",

pages = "587--603",

journal = "International Journal of Forecasting",

issn = "0169-2070",

publisher = "Elsevier",

number = "2",

}

TY - JOUR

T1 - Kaggle forecasting competitions

T2 - An overlooked learning opportunity

AU - Bojer, Casper Solheim

AU - Meldgaard, Jens Peder

PY - 2021

Y1 - 2021

N2 - We review the results of six forecasting competitions based on the online data science platform Kaggle, which have been largely overlooked by the forecasting community. In contrast to the M competitions, the competitions reviewed in this study feature daily and weekly time series with exogenous variables, business hierarchy information, or both. Furthermore, the Kaggle data sets all exhibit higher entropy than the M3 and M4 competitions, and they are intermittent. In this review, we confirm the conclusion of the M4 competition that ensemble models using cross-learning tend to outperform local time series models and that gradient boosted decision trees and neural networks are strong forecast methods. Moreover, we present insights regarding the use of external information and validation strategies, and discuss the impacts of data characteristics on the choice of statistics or machine learning methods. Based on these insights, we construct nine ex-ante hypotheses for the outcome of the M5 competition to allow empirical validation of our findings.

AB - We review the results of six forecasting competitions based on the online data science platform Kaggle, which have been largely overlooked by the forecasting community. In contrast to the M competitions, the competitions reviewed in this study feature daily and weekly time series with exogenous variables, business hierarchy information, or both. Furthermore, the Kaggle data sets all exhibit higher entropy than the M3 and M4 competitions, and they are intermittent. In this review, we confirm the conclusion of the M4 competition that ensemble models using cross-learning tend to outperform local time series models and that gradient boosted decision trees and neural networks are strong forecast methods. Moreover, we present insights regarding the use of external information and validation strategies, and discuss the impacts of data characteristics on the choice of statistics or machine learning methods. Based on these insights, we construct nine ex-ante hypotheses for the outcome of the M5 competition to allow empirical validation of our findings.

KW - Benchmarking

KW - Business forecasting

KW - Forecast accuracy

KW - Forecasting competition review

KW - M competitions

KW - Machine learning methods

KW - Time series methods

KW - Time series visualization

UR - http://www.scopus.com/inward/record.url?scp=85090060507&partnerID=8YFLogxK

U2 - 10.1016/j.ijforecast.2020.07.007

DO - 10.1016/j.ijforecast.2020.07.007

M3 - Journal article

SN - 0169-2070

VL - 37

SP - 587

EP - 603

JO - International Journal of Forecasting

JF - International Journal of Forecasting

IS - 2

ER -