### Abstract

Principal component analysis (PCA) is a widespread method for multivariate data exploration. In many cases, PCA is the first tool an analyst uses to get a feeling for the structure of the data. Analysis of hyperspectral images is no exception, and many image analysis packages provide, among other tools, the possibility of exploring images with PCA.

Most multivariate datasets used in chemometrics have a number of variables (significantly) larger than the number of objects (observations), and many algorithms are optimized for this case. When PCA is applied to a hyperspectral image, however, each pixel of the image is an individual observation (spectrum). Nowadays many laboratory hyperspectral cameras can obtain an image with hundreds of thousands of pixels, and in remote sensing the images can easily have a resolution of several million pixels. Working with such images can quickly lead to memory and computational issues if standard algorithms for the estimation of principal components are employed.
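To give a sense of scale (the sizes below are hypothetical, chosen for illustration and not taken from the abstract), even a modest laboratory image, unfolded into the pixels × wavelengths matrix that PCA operates on, already occupies hundreds of megabytes:

```r
# Hypothetical sizes, for illustration only: a 512 x 512 image
# recorded at 200 wavelengths, unfolded to a pixels x wavelengths matrix.
pixels <- 512 * 512                        # ~262,000 spectra (rows)
wavelengths <- 200                         # variables (columns)
gb <- pixels * wavelengths * 8 / 1024^3    # double precision, 8 bytes/value
gb                                         # 0.390625 GB for the raw data alone
```

Any decomposition that allocates intermediate copies of this matrix (or score matrices with as many rows) multiplies that footprint, which is where the memory issues mentioned above come from.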

In the present work the author shares his experience of creating an interactive tool for exploratory analysis of relatively large hyperspectral images with PCA, using R and a JavaScript-based user interface. The main goal was to make the implementation of the PCA decomposition as fast as possible without substituting the standard R tools (e.g. the library used for linear algebra operations), so that the package remains easily accessible to non-experienced users. For the same reason it was decided to write all code in R, without resorting to C/C++ at the current stage.

Three state-of-the-art PCA algorithms have been considered: singular value decomposition (SVD), eigendecomposition of the variance–covariance matrix, and non-linear iterative partial least squares (NIPALS). Each algorithm was investigated in order to find the steps critical for working with datasets containing a huge number of rows. These steps were then optimized (both algorithmically and in their R implementation) to reduce the computational time. The results of the optimization will be discussed.
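The three routes can be sketched in base R as follows (a minimal illustration under invented matrix sizes, not the author's optimized code). The relevant observation for tall matrices is that in the covariance route only `crossprod()` touches all n rows; the eigendecomposition itself runs on a small p × p matrix, whereas `svd()` decomposes the full n × p matrix:

```r
# Minimal sketch of the three PCA routes on a tall, mean-centered
# matrix X (rows = pixels, columns = wavelengths); sizes are invented.
set.seed(1)
n <- 5000; p <- 20
X <- matrix(rnorm(n * p), n, p)
X[, 1] <- X[, 1] * 5                      # give the data a dominant direction
X <- scale(X, center = TRUE, scale = FALSE)

# 1) SVD of X itself: decomposes the full n x p matrix.
P_svd <- svd(X, nu = 0, nv = 1)$v

# 2) Eigendecomposition of the p x p covariance matrix: only
#    crossprod(X) scales with n; eigen() sees a p x p matrix.
P_eig <- eigen(crossprod(X) / (n - 1), symmetric = TRUE)$vectors[, 1, drop = FALSE]

# 3) NIPALS: iterative, extracts one component at a time.
nipals_pc1 <- function(X, tol = 1e-12, maxit = 500) {
  t <- X[, 1]                             # initial score vector
  for (i in seq_len(maxit)) {
    w <- crossprod(X, t)
    w <- w / sqrt(sum(w^2))               # normalized loading vector
    t_new <- X %*% w
    if (sum((t_new - t)^2) / sum(t_new^2) < tol) break
    t <- t_new
  }
  drop(w)
}
P_nip <- nipals_pc1(X)

# All three yield the same first loading vector, up to sign.
max(abs(abs(P_svd) - abs(P_eig)))
max(abs(abs(P_svd) - abs(P_nip)))
```

For n much larger than p, route 2 is typically the fastest in plain R: the O(np²) `crossprod()` is a single optimized BLAS call, and the decomposition cost no longer depends on the number of rows.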

Original language | English
---|---
Publication date | 2016
Number of pages | 1
Publication status | Published - 2016
Event | Winter Symposium on Chemometrics (no. 10), Samara, Russian Federation, 29 Feb 2016 → 4 Mar 2016, http://wsc.chemometrics.ru/wsc10/

### Conference

Conference | Winter Symposium on Chemometrics
---|---
Number | 10
Country | Russian Federation
City | Samara
Period | 29/02/2016 → 04/03/2016
Internet address | http://wsc.chemometrics.ru/wsc10/

### Cite this

Kucheryavskiy, S. V. (2016). *Speeding up PCA in R*. Abstract from Winter Symposium on Chemometrics, Samara, Russian Federation.

Research output: Contribution to conference without publisher/journal › Conference abstract for conference › Research
