### Abstract

Principal component analysis (PCA) is a widespread method for multivariate data exploration. In many cases, PCA is the first tool an analyst uses to get a feeling for the structure of the data. Analysis of hyperspectral images is not an exception here, and many image analysis packages provide, among other tools, the possibility of exploring images with PCA.

Most multivariate datasets used in chemometrics have a number of variables (significantly) larger than the number of objects (observations), and many algorithms are optimized for this case. When PCA is applied to a hyperspectral image, each pixel of the image is an individual observation (spectrum). Nowadays many laboratory hyperspectral cameras can obtain an image with hundreds of thousands of pixels. In remote sensing, images can easily have a resolution of up to several million pixels. Working with such images can easily lead to memory and computational issues if standard algorithms for estimation of principal components are employed.
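To get a feeling for the scale, a short back-of-the-envelope calculation in R helps. The image dimensions below are illustrative assumptions, not numbers from the abstract:

```r
# Illustrative sizes (assumed, not from the abstract): a 512 x 512 pixel
# image recorded at 200 wavelengths, unfolded into a pixels-by-wavelengths
# matrix with one observation (spectrum) per pixel.
npix  <- 512 * 512                # rows: one observation per pixel
nwave <- 200                      # columns: one variable per wavelength
bytes <- npix * nwave * 8         # doubles take 8 bytes each in R
mem_gb <- bytes / 1024^3
round(mem_gb, 2)                  # ~0.39 GB for a single copy of the data
```

Every intermediate copy a decomposition routine makes (centered data, scores, reconstructions) multiplies that footprint, which is where the memory issues come from.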

In the present work, the author shares his experience of creating an interactive tool for exploratory analysis of relatively large hyperspectral images with PCA, using R and a JavaScript-based user interface. The main goal was to make the implementation of the PCA decomposition as fast as possible without replacing standard R tools (e.g. the library for linear algebra operations), so that the package remains easily available for non-experienced users. For the same reason it was decided to write all code in R, without using C/C++ at the current stage.

Three state-of-the-art PCA algorithms have been considered: singular value decomposition (SVD), eigendecomposition of the variance-covariance matrix, and nonlinear iterative partial least squares (NIPALS). Each algorithm was investigated in order to find the steps critical for working with datasets containing a huge number of rows. These steps were then optimized (both algorithmically and in the R implementation) to reduce the computational time. The results of the optimization will be discussed.
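The three approaches can be sketched on simulated "tall" data (many more rows than columns). This is an illustration of the general techniques, not the author's optimized code; the simulated matrix and the one-component `nipals1` helper are assumptions:

```r
# Simulated tall matrix: n rows (pixels) >> p columns (wavelengths)
set.seed(1)
n <- 10000; p <- 50
X <- matrix(rnorm(n * p), n, p)
X[, 1] <- 5 * X[, 1]                  # a dominant variable, so PC1 is clear
X <- scale(X, center = TRUE, scale = FALSE)   # mean-center columns

## 1. SVD of the full n x p matrix: right singular vectors are the loadings
P_svd <- svd(X, nu = 0, nv = 1)$v[, 1]

## 2. Eigenvectors of the p x p variance-covariance matrix: only a small
##    p x p decomposition is needed, which is cheap when n is huge
C <- crossprod(X) / (n - 1)
P_eig <- eigen(C, symmetric = TRUE)$vectors[, 1]

## 3. NIPALS (nonlinear iterative partial least squares), one component
nipals1 <- function(X, tol = 1e-12, maxit = 100) {
  t <- X[, which.max(colSums(X^2))]   # start from the most energetic column
  for (i in seq_len(maxit)) {
    v <- crossprod(X, t)
    v <- v / sqrt(sum(v^2))           # normalized loading vector
    t_new <- X %*% v                  # updated score vector
    if (sum((t_new - t)^2) < tol * sum(t^2)) break
    t <- t_new
  }
  drop(v)
}
P_nip <- nipals1(X)

# All three return the same first loading vector (up to sign)
max(abs(abs(P_svd) - abs(P_eig)))     # essentially zero
max(abs(abs(P_svd) - abs(P_nip)))     # small (iterative tolerance)
```

The sketch also shows where the row count bites: approach 1 decomposes the full n x p matrix, while approach 2 touches the n rows only once (in `crossprod`) and then works with a p x p matrix, and NIPALS touches them once per iteration.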

Original language | English |
---|---|

Publication date | 2016 |

Number of pages | 1 |

Status | Published - 2016 |

Event | Winter Symposium on Chemometrics - Samara, Russia. Duration: 29 Feb 2016 → 4 Mar 2016. Conference number: 10. http://wsc.chemometrics.ru/wsc10/ |

### Conference

Conference | Winter Symposium on Chemometrics |
---|---|

Number | 10 |

Country | Russia |

City | Samara |

Period | 29/02/2016 → 04/03/2016 |

Internet address | http://wsc.chemometrics.ru/wsc10/ |


### Cite this

*Speeding up PCA in R*. Abstract from Winter Symposium on Chemometrics, Samara, Russia.


**Speeding up PCA in R.** / Kucheryavskiy, Sergey V.

Publication: Conference contribution without publisher/journal › Conference abstract for conference › Research

TY - ABST

T1 - Speeding up PCA in R

AU - Kucheryavskiy, Sergey V.

PY - 2016

Y1 - 2016

N2 - Principal component analysis (PCA) is a widespread method for multivariate data exploration. In many cases, PCA is the first tool an analyst uses to get a feeling for the structure of the data. Analysis of hyperspectral images is not an exception here, and many image analysis packages provide, among other tools, the possibility of exploring images with PCA. Most multivariate datasets used in chemometrics have a number of variables (significantly) larger than the number of objects (observations), and many algorithms are optimized for this case. When PCA is applied to a hyperspectral image, each pixel of the image is an individual observation (spectrum). Nowadays many laboratory hyperspectral cameras can obtain an image with hundreds of thousands of pixels. In remote sensing, images can easily have a resolution of up to several million pixels. Working with such images can easily lead to memory and computational issues if standard algorithms for estimation of principal components are employed. In the present work, the author shares his experience of creating an interactive tool for exploratory analysis of relatively large hyperspectral images with PCA, using R and a JavaScript-based user interface. The main goal was to make the implementation of the PCA decomposition as fast as possible without replacing standard R tools (e.g. the library for linear algebra operations), so that the package remains easily available for non-experienced users. For the same reason it was decided to write all code in R, without using C/C++ at the current stage. Three state-of-the-art PCA algorithms have been considered: singular value decomposition (SVD), eigendecomposition of the variance-covariance matrix, and nonlinear iterative partial least squares (NIPALS). Each algorithm was investigated in order to find the steps critical for working with datasets containing a huge number of rows. These steps were then optimized (both algorithmically and in the R implementation) to reduce the computational time. The results of the optimization will be discussed.


M3 - Conference abstract for conference

ER -