HSM: Heterogeneous Subspace Mining in High Dimensional Data

Emmanuel Müller, Ira Assent, Thomas Seidl

Publication: Contribution to journal › Conference article in journal › Research › Peer-reviewed

9 Citations (Scopus)

Abstract

Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes.

In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines patterns in arbitrary subspaces of heterogeneous high dimensional data.

Original language: English
Book series: Lecture Notes in Computer Science
Volume: 5566
Pages (from-to): 497-516
ISSN: 0302-9743
DOI
Status: Published - 2009
Event: International Conference on Scientific and Statistical Database Management (SSDBM 2009) - New Orleans, Louisiana, USA
Duration: 2 Jun 2009 to 4 Jun 2009
Conference number: 21

Conference

Conference: International Conference on Scientific and Statistical Database Management (SSDBM 2009)
Number: 21
Country/Territory: USA
City: New Orleans, Louisiana
Period: 2 Jun 2009 to 4 Jun 2009
