Speeding up PCA in R

Publication: Conference contribution without publisher/journal; Conference abstract for conference; Research

Abstract

Principal component analysis (PCA) is a widely used method for multivariate data exploration. In many cases, PCA is the first tool an analyst uses to get a feel for the data structure. Analysis of hyperspectral images is no exception, and many image analysis packages offer PCA among their tools for exploring images.

Most multivariate datasets used in chemometrics have (significantly) more variables than objects (observations), and many algorithms are optimized for this case. When PCA is applied to a hyperspectral image, each pixel of the image is an individual observation (spectrum). Nowadays many laboratory hyperspectral cameras can acquire images with hundreds of thousands of pixels. In remote sensing, images can easily reach resolutions of several million pixels. Working with such images can easily lead to memory and computational problems if standard algorithms for estimating principal components are employed.
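
To give a rough sense of the scale involved, the short R sketch below estimates the memory footprint of a tall data matrix against that of the small variable-by-variable matrix. The image size (one million pixels by 200 spectral bands) is an assumed illustration rather than a figure from the abstract; the only fact relied on is that R stores numeric values as 8-byte doubles.

```r
# Assumed example dimensions (illustrative only, not taken from the abstract)
n_pixels <- 1e6   # rows: one spectrum per pixel
n_bands  <- 200   # columns: spectral variables

bytes_per_double <- 8

# Unfolded image: pixels x bands
data_gb <- n_pixels * n_bands * bytes_per_double / 1e9

# Cross-product (covariance-like) matrix: bands x bands
cov_mb <- n_bands^2 * bytes_per_double / 1e6

# A full set of left singular vectors has the same shape as the data
u_gb <- data_gb

cat(sprintf("data matrix: %.1f GB\n", data_gb))
cat(sprintf("bands x bands matrix: %.2f MB\n", cov_mb))
cat(sprintf("left singular vectors: %.1f GB\n", u_gb))
```

Under these assumptions, any step that creates another pixels-by-bands object roughly doubles the memory in use, whereas a bands-by-bands matrix is negligible by comparison.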

In the present work, the author shares his experience of creating an interactive tool for exploratory analysis of relatively large hyperspectral images with PCA, using R and a JavaScript-based user interface. The main goal was to make the implementation of PCA decomposition as fast as possible without replacing standard R tools (e.g. the library for linear algebra operations), so that the package remains easily accessible to inexperienced users. For the same reason, it was decided to write all code in R, without using C/C++ at the current stage.

Three state-of-the-art PCA algorithms were considered: singular value decomposition (SVD), eigenvectors of the variance-covariance matrix, and nonlinear iterative partial least squares (NIPALS). Each algorithm was investigated to find the steps critical for working with datasets containing a huge number of rows. These steps were then optimized (both algorithmically and in their R implementation) to reduce the computational time. The results of the optimization will be discussed.
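
For orientation, the three approaches named above can be sketched in base R as follows. These are plain textbook versions, not the author's optimized implementations, which the abstract does not describe. The function names (`pca_svd`, `pca_eigen`, `pca_nipals`), the synthetic data, and the tolerance settings are all assumptions made for illustration.

```r
set.seed(1)

# Synthetic tall matrix standing in for an unfolded image (assumed sizes):
# three strong components plus a little noise
n <- 2000; p <- 20; ncomp <- 3
scores_true <- matrix(rnorm(n * ncomp), n, ncomp) %*% diag(c(10, 5, 2))
loads_true  <- qr.Q(qr(matrix(rnorm(p * ncomp), p, ncomp)))
X <- scores_true %*% t(loads_true) + matrix(rnorm(n * p, sd = 0.1), n, p)
X <- scale(X, center = TRUE, scale = FALSE)  # mean-center the columns

# 1. SVD of the centered data; loadings are the right singular vectors
pca_svd <- function(X, ncomp) {
  s <- svd(X, nu = 0, nv = ncomp)   # nu = 0: skip the large n x p matrix U
  list(loadings = s$v, scores = X %*% s$v)
}

# 2. Eigenvectors of the small p x p variance-covariance matrix
pca_eigen <- function(X, ncomp) {
  C <- crossprod(X) / (nrow(X) - 1)
  e <- eigen(C, symmetric = TRUE)   # eigenvalues come back in decreasing order
  P <- e$vectors[, seq_len(ncomp), drop = FALSE]
  list(loadings = P, scores = X %*% P)
}

# 3. NIPALS: extract one component at a time, then deflate
pca_nipals <- function(X, ncomp, tol = 1e-12, maxit = 500) {
  E  <- X
  P  <- matrix(0, ncol(X), ncomp)
  Tm <- matrix(0, nrow(X), ncomp)
  for (a in seq_len(ncomp)) {
    # start from the column with the largest sum of squares
    score <- E[, which.max(colSums(E^2)), drop = FALSE]
    for (it in seq_len(maxit)) {
      load <- crossprod(E, score)          # p x 1
      load <- load / sqrt(sum(load^2))     # normalize to unit length
      score_new <- E %*% load              # n x 1
      converged <- sum((score_new - score)^2) < tol * sum(score_new^2)
      score <- score_new
      if (converged) break
    }
    P[, a]  <- load
    Tm[, a] <- score
    E <- E - tcrossprod(score, load)       # remove the extracted component
  }
  list(loadings = P, scores = Tm)
}

res_svd <- pca_svd(X, ncomp)
res_eig <- pca_eigen(X, ncomp)
res_nip <- pca_nipals(X, ncomp)

# Loadings agree up to sign; compare absolute values
print(max(abs(abs(res_svd$loadings) - abs(res_eig$loadings))))
print(max(abs(abs(res_svd$loadings) - abs(res_nip$loadings))))
```

In this sketch only the eigenvector route reduces the problem to a matrix whose size is independent of the number of rows, and the SVD variant avoids forming the left singular vectors. Whether these correspond to the critical steps the author optimized is not stated in the abstract.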

Original language: English
Publication date: 2016
Number of pages: 1
Status: Published - 2016
Event: Winter Symposium on Chemometrics - Samara, Russia
Duration: 29 Feb 2016 to 4 Mar 2016
Conference number: 10
http://wsc.chemometrics.ru/wsc10/

Conference

Conference: Winter Symposium on Chemometrics
Number: 10
Country: Russia
City: Samara
Period: 29/02/2016 to 04/03/2016

Fingerprint

Principal component analysis
Pixels
Linear algebra
Covariance matrix
Eigenvalues and eigenfunctions
Image analysis
User interfaces
Data structures
Remote sensing
Cameras
Decomposition
Data storage equipment
Chemical analysis

Cite this

Kucheryavskiy, S. V. (2016). Speeding up PCA in R. Abstract from Winter Symposium on Chemometrics, Samara, Russia.