Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform

Zhen Xu; Sergio Escalera; Adrien Pavão; Magali Richard; Wei Wei Tu; Quanming Yao; Huan Zhao; Isabelle Guyon

doi:10.1016/j.patter.2022.100543

Corresponding author: Zhen Xu

Publication: Contribution to journal › Journal article › Research › peer-review

4 Citations (Scopus)

17 Downloads (Pure)

Abstract

Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.

Original language	English
Article number	100543
Journal	Patterns
Volume	3
Issue number	7
DOI	https://doi.org/10.1016/j.patter.2022.100543
Status	Published - 8 Jul 2022
Published externally	Yes

Bibliographical note

Publisher Copyright:
© 2022 The Authors


Access to document

10.1016/j.patter.2022.100543 (License: CC BY 4.0)

Open Access article, publisher's published version, 1.16 MB (License: CC BY 4.0)

Other files and links

Link to publication in Scopus

Citation formats

@article{5d8328c754a3435fae5544278142c6e2,

title = "Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform",

abstract = "Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.",

keywords = "benchmark platform, competitions, data science, DSML3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems, machine learning, reproducibility",

author = "Zhen Xu and Sergio Escalera and Adrien Pav{\~a}o and Magali Richard and Tu, {Wei Wei} and Quanming Yao and Huan Zhao and Isabelle Guyon",

note = "Funding Information: The Codabench project shares the same community governance as CodaLab Competitions. The openness of Codabench is total: an Apache 2.0 license is used, the source code is on GitHub, and the development framework and all the used components are open sourced. Codabench has received important contributions from many people who did not co-author this paper, and we would like to thank their efforts in making Codabench what it is today, including early CodaLab Competitions developers and contributors (listed alphabetically): Pujun Bhatnagar, Justin Carden, Richard Caruana, Francis Cleary, Xiawei Guo, Ivan Judson, Lori Ada Kilty, Shaunak Kishore, Stephen Koo, Percy Liang, Zhengying Liu, Pragnya Maduskar, Simon Mercer, Arthur Pesah, Christophe Poulain, Lukasz Romaszko, Laurent Senta, Lisheng Sun, Sebastien Treguer, Cedric Vachaudez, Evelyne Viegas, Paul Viola, Erick Watson, Tony Yang, Flavio Zhingri, and Michael Zyskowski. We would like to particularly thank the people who contributed to the design, development, and testing of Codabench including (listed alphabetically) Alexis Arnaud, Xavier Bar{\'o}, Feng Bin, Yuna Blum, Eric Carmichael, Laurent Darr{\'e}, Hugo Jair Escalante, Sergio Escalera, Eric Frichot, Yuxuan He, James Keith, Anne-Catherine Letournel, Shouxiang Liu, Zhenwu Liu, Adrien Pavao, Magali Richard, Tyler Thomas, Nic Threfts, Bailey Trefts, Catherine Wallez, and Lanning Wei. The Universit{\'e} Paris-Saclay is hosting the main instance of Codabench. Funding and support have been received via several research grants, including Big Data Chair of Excellence FDS Paris-Saclay, Paris R{\'e}gion Ile-de-France, EU EIT projects HADACA and COMETH, United Health Foundation INCITE project, ANR Chair of Artificial Intelligence HUMANIA ANR-19-CHIA-0022, the Spanish project PID2019-105093GB-I00, ICREA under the ICREA Academia program, INSERM Cancer project ACACIA 232717, MIAI @Grenoble Alpes (ANR-19-P3IA-0003), 4Paradigm, ChaLearn, Microsoft, and Google. We also appreciate the following people and institutes for their open-source datasets that are used in our use cases: Andrew McCallum, C. Lee Giles, Ken Lang, Tom Mitchell, William L. Hamilton, Maximilian Mumme, Oleksandr Shchur, David D. Lewis, William Hersh, Just Research and Carnegie Mellon University, NEC Research Institute, Carnegie Mellon University, Stanford University, Technical University of Munich, AT&T Labs, and Oregon Health Sciences University. We are also very grateful to Joaquin Vanschoren for fruitful discussions. Conceptualization, Z.X., S.E., A.P., and I.G.; methodology, Z.X. and I.G.; validation and investigation, all authors; resources and data curation, Z.X., M.R., W.-W.T., and I.G.; writing – original draft, all authors; writing – review & editing, Z.X., Q.Y., M.R., and I.G.; visualization, Z.X., Q.Y., and I.G.; supervision and project administration, I.G.; funding acquisition, W.-W.T. and I.G. Z.X., W.-W.T., and H.Z. are employed by 4Paradigm, China. I.G. is president of ChaLearn, a not-for-profit organization dedicated to running challenges in machine learning. ChaLearn is a tax-exempt not-for-profit organization under section 501(c)(3) of the US IRS code. It derived no profit from sponsoring this research. Publisher Copyright: {\textcopyright} 2022 The Authors.",

year = "2022",

month = jul,

day = "8",

doi = "10.1016/j.patter.2022.100543",

language = "English",

volume = "3",

journal = "Patterns",

issn = "2666-3899",

publisher = "Cell Press",

number = "7",

}

TY - JOUR

T1 - Codabench

T2 - Flexible, easy-to-use, and reproducible meta-benchmark platform

AU - Xu, Zhen

AU - Escalera, Sergio

AU - Pavão, Adrien

AU - Richard, Magali

AU - Tu, Wei Wei

AU - Yao, Quanming

AU - Zhao, Huan

AU - Guyon, Isabelle

N1 - Funding Information: The Codabench project shares the same community governance as CodaLab Competitions. The openness of Codabench is total: an Apache 2.0 license is used, the source code is on GitHub, and the development framework and all the used components are open sourced. Codabench has received important contributions from many people who did not co-author this paper, and we would like to thank their efforts in making Codabench what it is today, including early CodaLab Competitions developers and contributors (listed alphabetically): Pujun Bhatnagar, Justin Carden, Richard Caruana, Francis Cleary, Xiawei Guo, Ivan Judson, Lori Ada Kilty, Shaunak Kishore, Stephen Koo, Percy Liang, Zhengying Liu, Pragnya Maduskar, Simon Mercer, Arthur Pesah, Christophe Poulain, Lukasz Romaszko, Laurent Senta, Lisheng Sun, Sebastien Treguer, Cedric Vachaudez, Evelyne Viegas, Paul Viola, Erick Watson, Tony Yang, Flavio Zhingri, and Michael Zyskowski. We would like to particularly thank the people who contributed to the design, development, and testing of Codabench including (listed alphabetically) Alexis Arnaud, Xavier Baró, Feng Bin, Yuna Blum, Eric Carmichael, Laurent Darré, Hugo Jair Escalante, Sergio Escalera, Eric Frichot, Yuxuan He, James Keith, Anne-Catherine Letournel, Shouxiang Liu, Zhenwu Liu, Adrien Pavao, Magali Richard, Tyler Thomas, Nic Threfts, Bailey Trefts, Catherine Wallez, and Lanning Wei. The Université Paris-Saclay is hosting the main instance of Codabench. Funding and support have been received via several research grants, including Big Data Chair of Excellence FDS Paris-Saclay, Paris Région Ile-de-France, EU EIT projects HADACA and COMETH, United Health Foundation INCITE project, ANR Chair of Artificial Intelligence HUMANIA ANR-19-CHIA-0022, the Spanish project PID2019-105093GB-I00, ICREA under the ICREA Academia program, INSERM Cancer project ACACIA 232717, MIAI @Grenoble Alpes (ANR-19-P3IA-0003), 4Paradigm, ChaLearn, Microsoft, and Google. We also appreciate the following people and institutes for their open-source datasets that are used in our use cases: Andrew McCallum, C. Lee Giles, Ken Lang, Tom Mitchell, William L. Hamilton, Maximilian Mumme, Oleksandr Shchur, David D. Lewis, William Hersh, Just Research and Carnegie Mellon University, NEC Research Institute, Carnegie Mellon University, Stanford University, Technical University of Munich, AT&T Labs, and Oregon Health Sciences University. We are also very grateful to Joaquin Vanschoren for fruitful discussions. Conceptualization, Z.X., S.E., A.P., and I.G.; methodology, Z.X. and I.G.; validation and investigation, all authors; resources and data curation, Z.X., M.R., W.-W.T., and I.G.; writing – original draft, all authors; writing – review & editing, Z.X., Q.Y., M.R., and I.G.; visualization, Z.X., Q.Y., and I.G.; supervision and project administration, I.G.; funding acquisition, W.-W.T. and I.G. Z.X., W.-W.T., and H.Z. are employed by 4Paradigm, China. I.G. is president of ChaLearn, a not-for-profit organization dedicated to running challenges in machine learning. ChaLearn is a tax-exempt not-for-profit organization under section 501(c)(3) of the US IRS code. It derived no profit from sponsoring this research. Publisher Copyright: © 2022 The Authors.

PY - 2022/7/8

Y1 - 2022/7/8

N2 - Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.

KW - benchmark platform

KW - competitions

KW - data science

KW - DSML3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems

KW - machine learning

KW - reproducibility

UR - http://www.scopus.com/inward/record.url?scp=85133508400&partnerID=8YFLogxK

U2 - 10.1016/j.patter.2022.100543

DO - 10.1016/j.patter.2022.100543

M3 - Journal article

C2 - 35845844

AN - SCOPUS:85133508400

SN - 2666-3899

VL - 3

JO - Patterns

JF - Patterns

IS - 7

M1 - 100543

ER -
