TY - JOUR
T1 - Analysis of NIR spectroscopic data using decision trees and their ensembles
AU - Kucheryavskiy, Sergey V.
PY - 2018
Y1 - 2018
N2 - Decision trees and their ensembles have become quite popular for data analysis during the past decade. One of the main reasons is the current boom in big data, where traditional statistical methods (such as multiple linear regression) are not very efficient. However, in chemometrics these methods are still not widespread, primarily because of several limitations related to the ratio between the number of variables and the number of observations. This paper presents several examples of how decision trees and their ensembles can be used in the analysis of NIR spectroscopic data, both for regression and for classification. We try to consider all important aspects, including optimization and validation of models, evaluation of results, treatment of missing data, and selection of the most important variables. The performance and outcomes of the decision tree-based methods are compared with a more traditional approach based on partial least squares.
AB - Decision trees and their ensembles have become quite popular for data analysis during the past decade. One of the main reasons is the current boom in big data, where traditional statistical methods (such as multiple linear regression) are not very efficient. However, in chemometrics these methods are still not widespread, primarily because of several limitations related to the ratio between the number of variables and the number of observations. This paper presents several examples of how decision trees and their ensembles can be used in the analysis of NIR spectroscopic data, both for regression and for classification. We try to consider all important aspects, including optimization and validation of models, evaluation of results, treatment of missing data, and selection of the most important variables. The performance and outcomes of the decision tree-based methods are compared with a more traditional approach based on partial least squares.
KW - decision trees
KW - random forest
KW - near infrared spectroscopy
U2 - 10.1007/s41664-018-0078-0
DO - 10.1007/s41664-018-0078-0
M3 - Journal article
SN - 2096-241X
VL - 2
SP - 274
EP - 289
JO - Journal of Analysis and Testing
JF - Journal of Analysis and Testing
IS - 3
ER -