Speeding up PCA in R

Publication: Contribution to conference without publisher/journal > Conference abstract for conference > Research

Abstract

Principal component analysis (PCA) is a widespread method for multivariate data exploration. In many cases, PCA is the first tool an analyst uses to get a feel for the structure of the data. Analysis of hyperspectral images is no exception, and many image analysis packages provide, among other tools, the possibility of exploring images with PCA.

Most multivariate datasets used in chemometrics have (significantly) more variables than objects (observations), and many algorithms are optimized for this case. When PCA is applied to a hyperspectral image, each pixel of the image is an individual observation (spectrum), so the situation is reversed: the number of rows far exceeds the number of columns. Nowadays many laboratory hyperspectral cameras can obtain an image with hundreds of thousands of pixels. In remote sensing, images can easily have a resolution of up to several million pixels. Working with such images can easily lead to memory and computational issues if standard algorithms for estimation of principal components are employed.
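
To make the scale concrete, the sketch below estimates the memory needed for one hypothetical unfolded image. The image size (2,000,000 pixels, 200 spectral bands) is an illustrative assumption, not a figure from this work; the 8 bytes per value reflects the double-precision storage R uses for numeric matrices.

```python
# Rough memory estimate for PCA on an unfolded hyperspectral image.
# All dimensions below are hypothetical and chosen only for illustration.

BYTES_PER_DOUBLE = 8  # R stores numeric matrices as 8-byte doubles


def matrix_megabytes(n_rows: int, n_cols: int) -> float:
    """Size of a dense double-precision matrix in megabytes (10**6 bytes)."""
    return n_rows * n_cols * BYTES_PER_DOUBLE / 1e6


n_pixels = 2_000_000  # rows: one spectrum per pixel (assumed)
n_bands = 200         # columns: spectral channels (assumed)

# The unfolded image; a mean-centered copy costs the same amount again.
data_mb = matrix_megabytes(n_pixels, n_bands)

# The bands-by-bands cross-product matrix does not grow with the pixel count.
crossprod_mb = matrix_megabytes(n_bands, n_bands)

ratio = data_mb / crossprod_mb

print(f"data matrix:          {data_mb:.2f} MB")
print(f"cross-product matrix: {crossprod_mb:.2f} MB")
print(f"size ratio:           {ratio:.0f}")
```

Under these assumptions, every step that creates another pixels-by-bands matrix is expensive, whereas any step that works only on a bands-by-bands matrix is negligible in comparison.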

In the present work, the author shares his experience of creating an interactive tool for exploratory analysis of relatively large hyperspectral images with PCA, using R and a JavaScript-based user interface. The main goal was to make the implementation of PCA decomposition as fast as possible without substituting standard R tools (e.g. the library for linear algebra operations), so that the package remains easy to install and use for inexperienced users. For the same reason, it was decided to write all code in R, without using C/C++ at the current stage.

Three state-of-the-art PCA algorithms have been considered: singular value decomposition (SVD), eigenvectors of the variance-covariance matrix, and nonlinear iterative partial least squares (NIPALS). Each algorithm was investigated in order to find the steps critical for working with datasets containing a huge number of rows. These steps were then optimized, both algorithmically and in their R implementation, to reduce the computational time. The results of the optimization will be discussed.
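
As an illustration of the second route (eigenvectors of the variance-covariance matrix), here is a minimal sketch. It is not the author's implementation, which is written in R; Python with only the standard library is used purely for illustration. The function names (`covariance`, `leading_eigenpair`, `pca_cov`) and the toy data are hypothetical, and power iteration with deflation stands in for a library eigensolver such as R's `eigen()`. The point it shows is that the rows are only ever visited one at a time, while all matrix work happens on a small bands-by-bands matrix.

```python
import math


def covariance(rows):
    """Column means and variance-covariance matrix (p x p).

    Two passes over the rows; only a p x p accumulator is kept in memory.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        c = [r[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(i, p):          # upper triangle only
                cov[i][j] += c[i] * c[j]
    for i in range(p):
        for j in range(i, p):
            cov[i][j] /= n - 1
            cov[j][i] = cov[i][j]          # mirror to the lower triangle
    return means, cov


def leading_eigenpair(a, n_iter=500, tol=1e-12):
    """Dominant eigenpair of a symmetric positive semi-definite matrix.

    Plain power iteration; assumes the start vector is not orthogonal
    to the leading eigenvector.
    """
    p = len(a)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                    # nothing left to extract
            break
        w = [x / norm for x in w]
        delta = max(abs(w[i] - v[i]) for i in range(p))
        v = w
        if delta < tol:
            break
    # Rayleigh quotient v' A v gives the eigenvalue for the unit vector v.
    value = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
    return value, v


def pca_cov(rows, n_components):
    """PCA via eigenvectors of the covariance matrix, with deflation."""
    means, cov = covariance(rows)
    p = len(cov)
    loadings, variances = [], []
    for _ in range(n_components):
        value, v = leading_eigenpair(cov)
        variances.append(value)
        loadings.append(v)
        # Remove the extracted component from the covariance matrix.
        cov = [[cov[i][j] - value * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    # Scores: project each centered row onto the loadings.
    scores = [[sum((r[j] - means[j]) * v[j] for j in range(p))
               for v in loadings] for r in rows]
    return scores, loadings, variances


# Toy data: four "pixels" with two "bands" (hypothetical values).
pixels = [[14.0, 7.0], [6.0, 3.0], [9.0, 7.0], [11.0, 3.0]]
scores, loadings, variances = pca_cov(pixels, 2)
print(variances)
print(loadings)
```

The signs of the loadings and scores are arbitrary, as with any PCA implementation. In R, the cross-product step would presumably be delegated to optimized linear algebra routines rather than written as explicit loops; the loops here only make the access pattern visible.
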
Original language: English
Publication date: 2016
Number of pages: 1
Status: Published - 2016
Event: Winter Symposium on Chemometrics - Samara, Russia
Duration: 29 Feb 2016 to 4 Mar 2016
Conference number: 10
http://wsc.chemometrics.ru/wsc10/

Conference

Conference: Winter Symposium on Chemometrics
Number: 10
Country/Territory: Russia
City: Samara
Period: 29/02/2016 to 04/03/2016