TY - JOUR
T1 - Data reduction by randomization subsampling for the study of large hyperspectral datasets
AU - Cruz-Tirado, J. P.
AU - Amigo, José Manuel
AU - Barbin, Douglas Fernandes
AU - Kucheryavskiy, Sergey
N1 - Publisher Copyright: © 2022 The Author(s)
PY - 2022/5/29
Y1 - 2022/5/29
AB - The large amount of information in hyperspectral images (HSI) generally makes their analysis (e.g., principal component analysis, PCA) time-consuming and often demands a lot of random access memory (RAM) and high computing power. This is particularly problematic for the analysis of large images containing millions of pixels, which can be created by augmenting series of single images (e.g., in time series analysis). This tutorial explores how data reduction can be used to analyze time series hyperspectral images much faster without losing crucial analytical information. Two of the most common data reduction methods have been chosen from recent research. The first uses a simple randomization method called randomized sub-sampling PCA (RSPCA). The second uses a more robust randomization method based on local-rank approximations (rPCA). This manuscript presents the major benefits and drawbacks of both methods, aiming to be as didactic as possible for the reader. A comprehensive comparison is made considering the amount of information retained by the PCA models at different degrees of compression and the computation time. Extrapolation is also made to the case where the effects of time and any other factor are to be studied simultaneously.
KW - Data reduction
KW - Hyperspectral imaging
KW - Principal component analysis
KW - Randomization
KW - Sub-sampling
KW - Time series
UR - http://www.scopus.com/inward/record.url?scp=85127499635&partnerID=8YFLogxK
U2 - 10.1016/j.aca.2022.339793
DO - 10.1016/j.aca.2022.339793
M3 - Journal article
AN - SCOPUS:85127499635
SN - 0003-2670
VL - 1209
JO - Analytica Chimica Acta
JF - Analytica Chimica Acta
M1 - 339793
ER -