Sample-based Attribute Selective AnDE for Large Data

Shenglei Chen, Ana Martinez, Geoffrey Webb, Limin Wang

Publication: Contribution to journal, Journal article, Research, peer-reviewed

25 Citations (Scopus)

Abstract

Over the past decade, a growing number of applications have come with large data sets, yet existing algorithms are not guaranteed to scale well on such data. Averaged n-Dependence Estimators (AnDE) allows flexible learning from out-of-core data by varying the value of n (the number of super-parents), which makes it especially appropriate for learning from large data. In this paper, we propose a sample-based attribute selection technique for AnDE. It requires one additional pass through the training data, in which a multitude of approximate AnDE models are built and efficiently assessed by leave-one-out cross-validation. Using a sample reduces the training time. Experiments on 15 large data sets demonstrate that the proposed technique significantly reduces AnDE's error at the cost of a modest increase in training time. This efficient and scalable out-of-core approach delivers superior or comparable performance to typical in-core Bayesian network classifiers.
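The selection pass described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: n = 1 (so the model is AODE), attributes ranked by mutual information with the class, nested candidate models that use the top-k ranked attributes as both parents and children, Laplace smoothing with a hypothetical parameter `alpha`, and data small enough to hold in memory. The names `pair_counts`, `rank_attributes`, `loocv_errors` and `select_attributes` are hypothetical.

```python
import numpy as np

def pair_counts(X, y, n_values, n_classes):
    """Pass 1: counts[c, a, va, b, vb] = #instances with class c, x_a=va, x_b=vb."""
    d = X.shape[1]
    idx = np.arange(d)
    counts = np.zeros((n_classes, d, n_values, d, n_values))
    for xi, yi in zip(X, y):
        counts[yi, idx[:, None], xi[:, None], idx[None, :], xi[None, :]] += 1
    return counts

def rank_attributes(counts):
    """Rank attributes by mutual information with the class (assumed criterion)."""
    d = counts.shape[1]
    mi = np.zeros(d)
    for a in range(d):
        joint = counts[:, a, :, a, :].diagonal(axis1=1, axis2=2)  # (class, value)
        p = joint / joint.sum()
        expected = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
        nz = p > 0
        mi[a] = np.sum(p[nz] * np.log(p[nz] / expected[nz]))
    return np.argsort(-mi, kind="stable")

def loocv_errors(X, y, counts, order, sample_idx, alpha=1.0):
    """Pass 2: leave-one-out error of every nested top-k model on a sample."""
    n_classes, d, n_values = counts.shape[:3]
    n = len(y)
    errors = np.zeros(d)
    for i in sample_idx:
        xi, t = X[i, order], y[i]
        # Counts at this instance's values, attributes in ranked order: (class, p, q).
        c = counts[:, order[:, None], xi[:, None], order[None, :], xi[None, :]]
        c[t] -= 1  # leave the instance out: remove its own contribution
        diag = c.diagonal(axis1=1, axis2=2)  # count(class, x_p)
        log_joint = np.log((diag + alpha) / (n - 1 + alpha * n_classes * n_values))
        log_cond = np.log((c + alpha) / (diag[:, :, None] + alpha * n_values))
        log_cond[:, np.arange(d), np.arange(d)] = 0.0  # a parent is not its own child
        cum = np.cumsum(log_cond, axis=2)  # running sum over children in rank order
        for k in range(1, d + 1):
            # AODE score with the top-k attributes as parents and children.
            scores = np.logaddexp.reduce(log_joint[:, :k] + cum[:, :k, k - 1], axis=1)
            errors[k - 1] += float(np.argmax(scores) != t)
    return errors / len(sample_idx)

def select_attributes(X, y, n_values, n_classes, sample_size, seed=0):
    counts = pair_counts(X, y, n_values, n_classes)
    order = rank_attributes(counts)
    rng = np.random.default_rng(seed)
    sample_idx = rng.choice(len(y), size=min(sample_size, len(y)), replace=False)
    errors = loocv_errors(X, y, counts, order, sample_idx)
    k_best = int(np.argmin(errors)) + 1
    return order[:k_best], order, errors
```

In this sketch, `X` is an integer array of discretised attribute values in `range(n_values)` and `y` holds class labels in `range(n_classes)`. Because every nested model is scored from the same cumulative sums, all candidate subsets are assessed in a single sweep over the sampled instances, which is the property the abstract attributes to the extra pass.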

Original language: English
Article number: 7565579
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 29
Issue number: 1
Pages (from-to): 172-185
ISSN: 1041-4347
Status: Published - 2017
