TY - JOUR
T1 - Sample-based Attribute Selective AnDE for Large Data
AU - Chen, Shenglei
AU - Martinez, Ana
AU - Webb, Geoffrey
AU - Wang, Limin
PY - 2017
Y1 - 2017
N2 - Over the past decade, more and more applications have come with large data sets. However, existing algorithms are not guaranteed to scale well on large data. Averaged n-Dependence Estimators (AnDE) allows for flexible learning from out-of-core data, by varying the value of n (number of super parents). Hence AnDE is especially appropriate for large data learning. In this paper, we propose a sample-based attribute selection technique for AnDE. It needs one more pass through the training data, in which a multitude of approximate AnDE models are built and efficiently assessed by leave-one-out cross validation. The use of a sample reduces the training time. Experiments on 15 large data sets demonstrate that the proposed technique significantly reduces AnDE's error at the cost of a modest increase in training time. This efficient and scalable out-of-core approach delivers superior or comparable performance to typical in-core Bayesian network classifiers.
AB - Over the past decade, more and more applications have come with large data sets. However, existing algorithms are not guaranteed to scale well on large data. Averaged n-Dependence Estimators (AnDE) allows for flexible learning from out-of-core data, by varying the value of n (number of super parents). Hence AnDE is especially appropriate for large data learning. In this paper, we propose a sample-based attribute selection technique for AnDE. It needs one more pass through the training data, in which a multitude of approximate AnDE models are built and efficiently assessed by leave-one-out cross validation. The use of a sample reduces the training time. Experiments on 15 large data sets demonstrate that the proposed technique significantly reduces AnDE's error at the cost of a modest increase in training time. This efficient and scalable out-of-core approach delivers superior or comparable performance to typical in-core Bayesian network classifiers.
KW - Attribute selection
KW - Averaged n-Dependence Estimators (AnDE)
KW - Bayesian network classifiers
KW - Classification learning
KW - Large data
KW - Leave-one-out cross validation
UR - http://www.scopus.com/inward/record.url?scp=84992053080&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2016.2608881
DO - 10.1109/TKDE.2016.2608881
M3 - Journal article
SN - 1041-4347
VL - 29
SP - 172
EP - 185
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 1
M1 - 7565579
ER -