Abstract
This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily.
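The filter idea in the abstract, scoring how strongly each feature depends on the remaining features and treating weakly dependent features as candidates for removal, can be illustrated with a minimal sketch. The abstract does not say which dependence measure the paper uses, so the average pairwise mutual information below is an assumption chosen for illustration. The function names and the `threshold` parameter are likewise hypothetical. The hybrid filter-wrapper step and the naive Bayes clustering itself are not shown.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def dependence_scores(columns):
    """Score each feature by its average mutual information with the others.

    `columns` is a list of equal-length lists, one per discrete feature.
    This averaging scheme is an illustrative assumption, not the paper's measure.
    """
    if len(columns) < 2:
        raise ValueError("need at least two features to score dependence")
    scores = []
    for i, xi in enumerate(columns):
        pairwise = [
            mutual_information(xi, xj)
            for j, xj in enumerate(columns)
            if j != i
        ]
        scores.append(sum(pairwise) / len(pairwise))
    return scores


def filter_features(columns, threshold):
    """Return the indices of features whose score reaches the (hypothetical) threshold."""
    scores = dependence_scores(columns)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy data: feature 1 duplicates feature 0, feature 2 is independent of both.
columns = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
kept = filter_features(columns, threshold=0.1)
```

On this toy data the two mutually dependent features are kept and the independent one is filtered out. In practice the threshold would have to be chosen from the data, and a low score only suggests, rather than proves, that a feature is irrelevant to the clustering.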
Original language | English |
---|---|
Title of host publication | Proceedings on the Workshop on Probabilistic Graphical Models for Classification : (within ECML/PKDD 2003) |
Number of pages | 11 |
Publication date | 2003 |
Pages | 71-82 |
Publication status | Published - 2003 |
Event | ECML/PKDD, Cavtat-Dubrovnik, Croatia; Duration: 22 Sept 2003 → 26 Sept 2003; Conference number: 14th / 7th |
Conference
Conference | ECML/PKDD |
---|---|
Number | 14th / 7th |
Country/Territory | Croatia |
City | Cavtat-Dubrovnik |
Period | 22/09/2003 → 26/09/2003 |
Keywords
- feature selection
- data-clustering
- EM-algorithm
- dependence measure