Efficient tools for principal component analysis of complex data— A tutorial

Oxana Rodionova, Sergey Kucheryavskiy*, Alexey Pomerantsev

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

Basic tools for exploration and interpretation of Principal Component Analysis (PCA) results are well known and thoroughly described in many comprehensive tutorials. However, in the last decade, several new tools have been developed. Some of them were originally created for solving authentication and classification tasks. In this paper we demonstrate that they can also be useful for exploratory data analysis.

We discuss several important aspects of PCA exploration of high-dimensional datasets, such as estimating a proper complexity of the PCA model, dependence on the data structure, and the presence of outliers. We introduce new tools for assessing PCA model complexity, such as plots of the degrees of freedom developed for the orthogonal and score distances, as well as the Extreme and Distance plots, which offer a new look at the features of the training and test (new) data. These tools are simple and fast to compute. In some cases, they are more efficient than the conventional PCA tools. A simulated example provides an intuitive illustration of their application. Three real-world examples from various fields demonstrate the capabilities of the new tools and the ways they can be used. The first example considers the reproducibility of a handheld spectrometer using a dataset that is presented for the first time. The other two datasets, which describe the authentication of olives in brine and the classification of wines by their geographical origin, are already known and are often used for illustrative purposes.
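To make the two distances mentioned above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes mean-centered data, a score distance normalized by the sum of squared scores per component (other conventions normalize by the eigenvalue instead), and a classical method-of-moments estimate of the degrees of freedom under the assumption that each distance follows a scaled chi-square distribution. The function names `pca_distances` and `moment_dof` are hypothetical and introduced only for illustration.

```python
import numpy as np

def pca_distances(X, n_components):
    """Return score distances h and orthogonal distances q for each row of X."""
    # Mean-center the data (no scaling; an assumption of this sketch)
    Xc = X - X.mean(axis=0)
    # PCA via singular value decomposition
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T          # loadings, shape (n_vars, A)
    T = Xc @ P                       # scores, shape (n_obs, A)
    # Score distance: squared scores normalized by the per-component
    # sum of squared scores (one common convention, assumed here)
    lam = (T ** 2).sum(axis=0)
    h = ((T ** 2) / lam).sum(axis=1)
    # Orthogonal distance: squared norm of the residual of each object
    E = Xc - T @ P.T
    q = (E ** 2).sum(axis=1)
    return h, q

def moment_dof(u):
    """Moment estimates (u0, N) assuming u ~ (u0 / N) * chi-square(N)."""
    u0 = u.mean()
    # For a scaled chi-square, var = 2 * u0**2 / N, hence N = 2 * u0**2 / var
    N = 2.0 * u0 ** 2 / u.var(ddof=1)
    return u0, N

# Illustrative random data only; not one of the datasets from the paper
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
h, q = pca_distances(X, n_components=2)
h0, Nh = moment_dof(h)
q0, Nq = moment_dof(q)
```

Repeating the last three lines for an increasing number of components and plotting `Nh` and `Nq` against it gives a rough analogue of a degrees-of-freedom plot. The estimators actually used in the paper and its software may differ, for example by being robust to outliers.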

The paper is written in the form of a tutorial; however, we do not touch upon well-known topics, such as the algorithms for PCA decomposition or the interpretation of scores and loadings. Instead, we pay attention primarily to more advanced topics, such as exploration of data homogeneity and the understanding and evaluation of an optimal model complexity. The tutorial is accompanied by links to free software that implements the tools.

Original language: English
Article number: 104304
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 213
Number of pages: 11
ISSN: 0169-7439
Publication status: Published - 2021

Keywords

  • Chemometrics
  • Principal Component Analysis
  • SIMCA
  • Validation
  • Exploratory analysis
  • Extreme plots
  • Model complexity
