Abstract
Decision trees and their ensembles have become popular for data analysis over the past decade. One of the main reasons is the current boom in big data, where traditional statistical methods (e.g., multiple linear regression) are not very efficient. In chemometrics, however, these methods are still not widespread, primarily because of several limitations related to the ratio between the number of variables and the number of observations. This paper presents several examples of how decision trees and their ensembles can be used in the analysis of NIR spectroscopic data, for both regression and classification. We aim to cover all important aspects, including optimization and validation of models, evaluation of results, treatment of missing data, and selection of the most important variables. The performance and outcome of the decision tree-based methods are compared with a more traditional approach based on partial least squares.
Original language | English |
---|---|
Journal | Journal of Analysis and Testing |
Volume | 2 |
Issue number | 3 |
Pages (from-to) | 274-289 |
Number of pages | 15 |
ISSN | 2096-241X |
DOIs | |
Publication status | Published - 2018 |
Keywords
- decision trees
- random forest
- near infrared spectroscopy