Principles of Proper Validation: use and abuse of re-sampling for validation

Kim Esbensen, Paul Geladi

Publication: Contribution to journal › Conference article in journal › Research › peer-review

155 Citations (Scopus)

Abstract

Validation in chemometrics is presented using the exemplar context of multivariate calibration/prediction. A phenomenological analysis of common validation practices in data analysis and chemometrics leads to the formulation of a set of generic Principles of Proper Validation (PPV), which is based on a set of characterizing distinctions: (i) Validation cannot be understood by focusing on the methods of validation only; validation must be based on full knowledge of the underlying definitions, objectives, methods, effects and consequences, all of which are outlined and discussed here. (ii) Analysis of proper validation objectives implies that there is only one valid paradigm: test set validation. (iii) Contrary to many contemporary chemometric practices (and validation myths), cross-validation is shown to be unjustified when applied monolithically as a one-for-all procedure (segmented cross-validation) to all data sets. Within its own design and scope, cross-validation is in reality a sub-optimal simulation of test set validation, crippled by a critical sampling variance omission, as it is manifestly based on one data set only (the training data set). Other re-sampling validation methods are shown to suffer from the same deficiencies. The PPV are universal and can be applied to all situations in which an assessment of performance is desired: prediction, classification, time series forecasting and modeling validation. The key element of PPV is the Theory of Sampling (TOS), which allows insight into all variance-generating factors, especially the so-called incorrect sampling errors, which, if not properly eliminated, are responsible for a fatal, inconstant sampling bias for which no statistical correction is possible. In the light of TOS it is shown how a second data set (test set, validation set) is critically necessary for the inclusion of the sampling errors incurred in all 'future' situations in which the validated model must perform.
Logically, therefore, all one-data-set re-sampling approaches to validation, especially cross-validation and leverage-corrected validation, should be terminated, or at the very least used only with full scientific understanding and disclosure of their detrimental variance omissions and consequences. Regarding PLS regression, an emphatic call is made for a stringent commitment to test set validation based on graphical inspection of the pertinent t-u plots, both for optimal understanding of the X-Y interrelationships and for validation guidance. QSAR/QSPR constitutes a partial exception to the present test set imperative, albeit one with no generalization potential.
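The abstract's central contrast can be sketched numerically. The toy example below (not from the paper; synthetic data and a plain least-squares model stand in for the multivariate calibration setting, and the function and variable names are the author's own) shows why segmented cross-validation, which re-samples the training set only, reports a lower error than an independent test set drawn under harsher 'future' sampling conditions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise):
    # Synthetic linear calibration problem: y = X b + noise.
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=noise, size=n)
    return X, y

def fit(X, y):
    # Ordinary least squares.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def cv_rmse(X, y, k=5):
    # Segmented (k-fold) cross-validation: every estimate of error
    # comes from re-sampling the single training data set.
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for fold in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False
        b = fit(X[mask], y[mask])
        errs.append(rmse(y[fold], X[fold] @ b))
    return float(np.mean(errs))

# One training set, drawn under one set of sampling conditions...
X_train, y_train = make_data(100, noise=0.5)
# ...and an independent test set drawn separately, here with a larger
# sampling error, mimicking the 'future' conditions the model must face.
X_test, y_test = make_data(100, noise=1.0)

b = fit(X_train, y_train)
print(f"cross-validation RMSE: {cv_rmse(X_train, y_train):.2f}")
print(f"test set RMSE:         {rmse(y_test, X_test @ b):.2f}")
```

Because cross-validation never leaves the training set, it cannot see the extra sampling variance built into the test set; the test set RMSE therefore comes out markedly higher, which is the variance omission the PPV argument turns on.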
Original language: English
Journal: Journal of Chemometrics
Volume: 24
Issue number: 3-4
Pages (from-to): 168-187
Number of pages: 20
ISSN: 0886-9383
DOI: 10.1002/cem.1310
Status: Published - Mar 2010
Event: Conferentia Chemometrica 2009 - Siófok, Hungary
Duration: 27 Sep 2009 - 30 Sep 2009

Conference

Conference: Conferentia Chemometrica 2009
Country: Hungary
City: Siófok
Period: 27/09/2009 - 30/09/2009

Cite this

@inproceedings{33a3a518e7c24d68bf81481e6d3c8dd0,
title = "Principles of Proper Validation: use and abuse of re-sampling for validation",
abstract = "Validation in chemometrics is presented using the exemplar context of multivariate calibration/prediction. A phenomenological analysis of common validation practices in data analysis and chemometrics leads to the formulation of a set of generic Principles of Proper Validation (PPV), which is based on a set of characterizing distinctions: (i) Validation cannot be understood by focusing on the methods of validation only; validation must be based on full knowledge of the underlying definitions, objectives, methods, effects and consequences, all of which are outlined and discussed here. (ii) Analysis of proper validation objectives implies that there is only one valid paradigm: test set validation. (iii) Contrary to many contemporary chemometric practices (and validation myths), cross-validation is shown to be unjustified when applied monolithically as a one-for-all procedure (segmented cross-validation) to all data sets. Within its own design and scope, cross-validation is in reality a sub-optimal simulation of test set validation, crippled by a critical sampling variance omission, as it is manifestly based on one data set only (the training data set). Other re-sampling validation methods are shown to suffer from the same deficiencies. The PPV are universal and can be applied to all situations in which an assessment of performance is desired: prediction, classification, time series forecasting and modeling validation. The key element of PPV is the Theory of Sampling (TOS), which allows insight into all variance-generating factors, especially the so-called incorrect sampling errors, which, if not properly eliminated, are responsible for a fatal, inconstant sampling bias for which no statistical correction is possible. In the light of TOS it is shown how a second data set (test set, validation set) is critically necessary for the inclusion of the sampling errors incurred in all 'future' situations in which the validated model must perform. Logically, therefore, all one-data-set re-sampling approaches to validation, especially cross-validation and leverage-corrected validation, should be terminated, or at the very least used only with full scientific understanding and disclosure of their detrimental variance omissions and consequences. Regarding PLS regression, an emphatic call is made for a stringent commitment to test set validation based on graphical inspection of the pertinent t-u plots, both for optimal understanding of the X-Y interrelationships and for validation guidance. QSAR/QSPR constitutes a partial exception to the present test set imperative, albeit one with no generalization potential.",
author = "Kim Esbensen and Paul Geladi",
year = "2010",
month = "3",
doi = "10.1002/cem.1310",
language = "English",
volume = "24",
pages = "168--187",
journal = "Journal of Chemometrics",
issn = "0886-9383",
publisher = "Wiley",
number = "3-4",

}

