Abstract
Procrustes Cross-Validation (PCV) is a recently proposed method for validating a wide range of chemometric models, including PCA/SIMCA, PCR and PLS [1, 2]. PCV uses conventional cross-validation (CCV) to estimate the sampling error and then adds this error to the calibration set, producing a new dataset: the Procrustes validation set (PV-set). The PV-set can then be used to validate global models in the same way as an independent validation set. PCV also provides diagnostic tools for assessing dataset quality and optimizing the splitting strategy.
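The mechanism can be illustrated with a deliberately simplified, PCA-only sketch. It is not the published PCV algorithm of [1, 2]. The following are all assumptions made for this illustration:

- the function names `pca_loadings` and `pv_set_pca`;
- the random segment assignment;
- the orthogonal Procrustes alignment of local to global loadings;
- the residual transfer, done by projecting onto the global residual space and then rescaling.

The sketch shows the core idea. Every held-out row is projected onto the local model fitted without its segment. Its local scores and residuals are then carried over to the global model, so that each PV-row keeps the length of its score vector and its residual norm.

```python
import numpy as np


def pca_loadings(Xc: np.ndarray, ncomp: int) -> np.ndarray:
    """First `ncomp` PCA loadings (as columns) of an already mean-centred matrix."""
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:ncomp].T


def pv_set_pca(X: np.ndarray, ncomp: int, nseg: int, rng=None) -> np.ndarray:
    """Simplified PCV-style PV-set for a PCA model (illustrative sketch only)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mean = X.mean(axis=0)
    P = pca_loadings(X - mean, ncomp)            # global loadings, p x A
    Q = np.eye(p) - P @ P.T                      # projector onto the global residual space
    segments = rng.permutation(n) % nseg         # random split into nseg segments
    X_pv = np.empty_like(X)
    for k in range(nseg):
        test = segments == k
        m_k = X[~test].mean(axis=0)
        P_k = pca_loadings(X[~test] - m_k, ncomp)  # local loadings without segment k
        # Orthogonal Procrustes: rotation R minimising ||P_k R - P||_F
        u, _, vt = np.linalg.svd(P_k.T @ P)
        R = u @ vt
        Xk = X[test] - m_k
        T_k = Xk @ P_k                           # local scores of the held-out rows
        E_k = Xk - T_k @ P_k.T                   # local residuals
        # Move residuals into the global residual space and restore their norms
        # (rows whose projected residual vanishes get a zero residual instead)
        E_pv = E_k @ Q
        old = np.linalg.norm(E_k, axis=1, keepdims=True)
        new = np.linalg.norm(E_pv, axis=1, keepdims=True)
        scale = np.divide(old, new, out=np.zeros_like(old), where=new > 0)
        X_pv[test] = mean + (T_k @ R) @ P.T + E_pv * scale
    return X_pv
```

Because `R` is orthogonal and `E_pv` is orthogonal to the global loadings, projecting a PV-row onto the global model returns rotated local scores of the same length. Its residual keeps the norm it had in the local model.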
In this presentation, we will show a new application of PCV: data augmentation [3]. Data augmentation artificially enlarges the calibration set by generating new points from the existing data, which is exactly what PCV does. By using random splits, it is possible to create a very large number of unique PV-sets. When merged with the original dataset, these can significantly improve the performance of complex machine learning models with many hyperparameters, such as artificial neural networks (ANNs).
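The augmentation loop itself does not depend on how each PV-set is computed. The sketch below uses the following assumptions:

- `augment`, `n_splits` and the generator signature `make_pv(X, y, rng)` are hypothetical names.
- The generator is assumed to draw a fresh random split on every call.
- It is also assumed to return a PV-set together with response values, for example the original `y` reused unchanged. The abstract does not state this.

`n_splits` counts the distinct ways of dividing n rows into k equal, unordered segments, n! / ((n/k)!^k · k!). This number grows very quickly, which is why random splitting can yield a very large number of distinct PV-sets.

```python
import math
from typing import Callable, Tuple

import numpy as np

# Hypothetical generator signature: returns one PV-set (X_pv, y_pv) built from
# a fresh random split of the rows, drawn with the supplied random generator.
PVGenerator = Callable[[np.ndarray, np.ndarray, np.random.Generator],
                       Tuple[np.ndarray, np.ndarray]]


def n_splits(n: int, k: int) -> int:
    """Number of ways to split n rows into k unordered segments of equal size."""
    if n % k:
        raise ValueError("n must be divisible by k")
    m = n // k
    return math.factorial(n) // (math.factorial(m) ** k * math.factorial(k))


def augment(X: np.ndarray, y: np.ndarray, make_pv: PVGenerator,
            n_sets: int, seed=None) -> Tuple[np.ndarray, np.ndarray]:
    """Merge the original data with `n_sets` PV-sets, each from a new random split."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(n_sets):
        X_pv, y_pv = make_pv(X, y, rng)  # a new random split on every call
        X_parts.append(X_pv)
        y_parts.append(y_pv)
    return np.vstack(X_parts), np.concatenate(y_parts)
```

The original rows are kept first in the merged arrays, so the augmented training set always contains the real calibration data.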
One advantage of PCV over other data augmentation methods is that the PV-set has a variance-covariance structure similar to that of the calibration set, as if both were drawn from the same population. This makes PCV particularly efficient for augmenting datasets with a high degree of collinearity, such as spectral data. Preliminary tests have shown that PCV-based augmentation can reduce the root mean squared error of prediction (RMSEP) of ANN regression models several-fold, with RMSEP computed on an independent test set.
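The two quantities in this paragraph can be made concrete in a short sketch:

- `rmsep` is the standard root mean squared error of prediction, RMSEP = sqrt((1/n) Σ (y_i − ŷ_i)²), evaluated on an independent test set.
- `fold_reduction` expresses a "several times" improvement as a ratio of two RMSEP values.
- `covariance_mismatch` is one hypothetical way to quantify how close the variance-covariance structure of a PV-set is to that of the calibration set. It is an assumption of this sketch, not a PCV diagnostic.

```python
import numpy as np


def rmsep(y_true, y_pred) -> float:
    """Root mean squared error of prediction on an independent test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fold_reduction(rmsep_baseline: float, rmsep_augmented: float) -> float:
    """How many times smaller the error is after augmentation (>1 means better)."""
    return rmsep_baseline / rmsep_augmented


def covariance_mismatch(X_a, X_b) -> float:
    """Relative Frobenius distance between the covariance matrices of two datasets."""
    Ca = np.cov(X_a, rowvar=False)   # variables are columns
    Cb = np.cov(X_b, rowvar=False)
    return float(np.linalg.norm(Ca - Cb) / np.linalg.norm(Ca))
```

A mismatch close to zero means the two datasets share nearly the same covariance structure. A fold reduction of, say, 3 means the augmented model's RMSEP is a third of the baseline's.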
References
[1] Kucheryavskiy S, Zhilin S, Rodionova O, Pomerantsev A. Anal. Chem. 92 (2020) 11842–11850
[2] Kucheryavskiy S, Rodionova O, Pomerantsev A. Anal. Chim. Acta 1255 (2023)
[3] Kucheryavskiy S, Zhilin S. arXiv preprint, DOI: 10.48550/arXiv.2312.04911
Original language | English |
---|---|
Title | XIX CAC 2024 Chemometrics in Analytical Chemistry : Book of Abstracts |
Number of pages | 1 |
Publication date | 2024 |
ISBN (Electronic) | 9789876924085 |
Status | Published - 2024 |
Event | Chemometrics in Analytical Chemistry - UNIVERSIDAD NACIONAL DEL LITORAL, Santa Fe, Argentina. Duration: 9 Sep 2024 → 12 Sep 2024. Conference number: XIX. https://www.fbcb.unl.edu.ar/cac2024/ |
Conference
Conference | Chemometrics in Analytical Chemistry |
---|---|
Number | XIX |
Location | UNIVERSIDAD NACIONAL DEL LITORAL |
Country/Region | Argentina |
City | Santa Fe |
Period | 09/09/2024 → 12/09/2024 |
Internet address | https://www.fbcb.unl.edu.ar/cac2024/ |
Activities
- 1 conference presentation
- Collinear datasets augmentation using Procrustes validation sets
Kucheryavskiy, S. (Presenter)
10 Sep 2024. Activity: Talks and oral contributions › Conference presentation