Principles of Proper Validation: use and abuse of re-sampling for validation

Kim Esbensen; Paul Geladi

doi:10.1002/cem.1310

Department of Chemistry and Bioscience

Research output: Contribution to journal › Conference article in Journal › Research › peer-review

223 Citations (Scopus)

Abstract

Validation in chemometrics is presented using the exemplar context of multivariate calibration/prediction. A phenomenological analysis of common validation practices in data analysis and chemometrics leads to formulation of a set of generic Principles of Proper Validation (PPV), which is based on a set of characterizing distinctions: (i) Validation cannot be understood by focusing on the methods of validation only; validation must be based on full knowledge of the underlying definitions, objectives, methods, effects and consequences, which are all outlined and discussed here. (ii) Analysis of proper validation objectives implies that there is one valid paradigm only: test set validation. (iii) Contrary to much contemporary chemometric practice (and validation myths), cross-validation is shown to be unjustified in the form of monolithic application of a one-for-all procedure (segmented cross-validation) on all data sets. Within its own design and scope, cross-validation is in reality a sub-optimal simulation of test set validation, crippled by a critical sampling variance omission, as it manifestly is based on one data set only (the training data set). Other re-sampling validation methods are shown to suffer from the same deficiencies. The PPV are universal and can be applied to all situations in which the assessment of performance is desired: prediction, classification, time series forecasting and modeling validation. The key element of PPV is the Theory of Sampling (TOS), which allows insight into all variance-generating factors, especially the so-called incorrect sampling errors, which, if not properly eliminated, are responsible for a fatal inconstant sampling bias, for which no statistical correction is possible. In the light of TOS it is shown how a second data set (test set, validation set) is critically necessary for the inclusion of the sampling errors incurred in all 'future' situations in which the validated model must perform.
Logically, therefore, all one-data-set re-sampling approaches for validation, especially cross-validation and leverage-corrected validation, should be terminated, or at the very least used only with full scientific understanding and disclosure of their detrimental variance omissions and consequences. Regarding PLS-regression, an emphatic call is made for stringent commitment to test set validation based on graphical inspection of pertinent t-u plots for optimal understanding of the X-Y interrelationships and for validation guidance. QSAR/QSAP forms a partial exemption from the present test set imperative with no generalization potential.
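
The contrast the abstract draws between cross-validation and test set validation can be illustrated with a toy simulation. The sketch below is not the authors' procedure: it assumes a univariate straight-line calibration instead of PLS, and it stands in for between-campaign sampling error with a single hypothetical constant offset (`offset`). All function and variable names are made up for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (univariate, for illustration only)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(model, xs, ys):
    """Root mean squared error of a fitted line on the given data."""
    a, b = model
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


def segmented_cv_rmse(xs, ys, n_segments=5):
    """Segmented cross-validation: every held-out segment comes from the SAME data set."""
    n = len(xs)
    sq_err = 0.0
    for k in range(n_segments):
        held = set(range(k, n, n_segments))
        tr_x = [x for i, x in enumerate(xs) if i not in held]
        tr_y = [y for i, y in enumerate(ys) if i not in held]
        a, b = fit_line(tr_x, tr_y)
        sq_err += sum((ys[i] - (a + b * xs[i])) ** 2 for i in held)
    return math.sqrt(sq_err / n)


def sample_campaign(rng, n, offset, noise_sd):
    """One hypothetical sampling campaign.

    `offset` is an assumed stand-in for a campaign-specific sampling bias;
    `noise_sd` is ordinary measurement noise.
    """
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + offset + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


rng = random.Random(0)

# Training data: one sampling campaign. Its offset is absorbed into the intercept.
train_x, train_y = sample_campaign(rng, 50, offset=0.0, noise_sd=0.1)

# Test data: a second, independent campaign with a different (assumed) offset.
test_x, test_y = sample_campaign(rng, 50, offset=0.5, noise_sd=0.1)

rmsecv = segmented_cv_rmse(train_x, train_y)   # sees only within-campaign noise
model = fit_line(train_x, train_y)
rmsep = rmse(model, test_x, test_y)            # also sees the between-campaign shift

print(f"RMSECV (one data set):      {rmsecv:.3f}")
print(f"RMSEP  (independent test):  {rmsep:.3f}")
```

Under these assumptions the cross-validated error reflects only the measurement noise inside the training campaign, while the test set error also picks up the shift between campaigns. This is the variance component that, in the abstract's argument, re-sampling a single data set cannot capture.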

Original language	English
Journal	Journal of Chemometrics
Volume	24
Issue number	3-4
Pages (from-to)	168-187
Number of pages	20
ISSN	0886-9383
DOIs	https://doi.org/10.1002/cem.1310
Publication status	Published - Mar 2010
Event	Conferentia Chemometrica 2009, Siófok, Hungary (27 Sept 2009 → 30 Sept 2009)

Conference

Conference	Conferentia Chemometrica 2009
Country/Territory	Hungary
City	Siófok
Period	27/09/2009 → 30/09/2009

Cite this

@inproceedings{33a3a518e7c24d68bf81481e6d3c8dd0,

title = "Principles of Proper Validation: use and abuse of re-sampling for validation",

author = "Kim Esbensen and Paul Geladi",

year = "2010",

month = mar,

doi = "10.1002/cem.1310",

language = "English",

volume = "24",

pages = "168--187",

journal = "Journal of Chemometrics",

issn = "0886-9383",

publisher = "Wiley",

number = "3-4",

note = "Conferentia Chemometrica 2009 ; Conference date: 27-09-2009 Through 30-09-2009",

}

TY - GEN

T1 - Principles of Proper Validation

T2 - Conferentia Chemometrica 2009

AU - Esbensen, Kim

AU - Geladi, Paul

PY - 2010/3

Y1 - 2010/3

U2 - 10.1002/cem.1310

DO - 10.1002/cem.1310

M3 - Conference article in Journal

SN - 0886-9383

VL - 24

SP - 168

EP - 187

JO - Journal of Chemometrics

JF - Journal of Chemometrics

IS - 3-4

Y2 - 27 September 2009 through 30 September 2009

ER -