HSM: Heterogeneous Subspace Mining in High Dimensional Data

Emmanuel Müller; Ira Assent; Thomas Seidl

doi:10.1007/978-3-642-02279-1_36

Publication: Contribution to journal › Conference article in journal › Research › peer-reviewed

9 Citations (Scopus)

Abstract

Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes.

In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines patterns in arbitrary subspaces of heterogeneous high dimensional data.

Original language	English
Book series	Lecture Notes in Computer Science
Volume	5566
Pages (from-to)	497-516
ISSN	0302-9743
DOI	https://doi.org/10.1007/978-3-642-02279-1_36
Status	Published - 2009
Event	International Conference on Scientific and Statistical Database Management (SSDBM 2009) - New Orleans, Louisiana, USA; Duration: 2 Jun 2009 → 4 Jun 2009; Conference number: 21

Conference

Conference	International Conference on Scientific and Statistical Database Management (SSDBM 2009)
Number	21
Country/Territory	USA
City	New Orleans, Louisiana
Period	02/06/2009 → 04/06/2009

Access to the document

10.1007/978-3-642-02279-1_36

http://www.springerlink.com/content/g21108372l61/

Citation formats

@inproceedings{3d3b1c50fd3d11de9a61000ea68e967b,

title = "HSM: Heterogeneous Subspace Mining in High Dimensional Data",

abstract = "Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the {"}curse of dimensionality{"} pose additional challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes. In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines patterns in arbitrary subspaces of heterogeneous high dimensional data.",

author = "Emmanuel M{\"u}ller and Ira Assent and Thomas Seidl",

year = "2009",

doi = "10.1007/978-3-642-02279-1_36",

language = "English",

volume = "5566",

pages = "497--516",

journal = "Lecture Notes in Computer Science",

issn = "0302-9743",

publisher = "Physica-Verlag",

note = "International Conference on Scientific and Statistical Database Management (SSDBM 2009) ; Conference date: 02-06-2009 Through 04-06-2009",

}

TY - GEN

T1 - HSM: Heterogeneous Subspace Mining in High Dimensional Data

AU - Müller, Emmanuel

AU - Assent, Ira

AU - Seidl, Thomas

N1 - Conference code: 21

PY - 2009

Y1 - 2009

N2 - Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes. In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines patterns in arbitrary subspaces of heterogeneous high dimensional data.

AB - Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes. In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines patterns in arbitrary subspaces of heterogeneous high dimensional data.

U2 - 10.1007/978-3-642-02279-1_36

DO - 10.1007/978-3-642-02279-1_36

M3 - Conference article in Journal

SN - 0302-9743

VL - 5566

SP - 497

EP - 516

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

T2 - International Conference on Scientific and Statistical Database Management (SSDBM 2009)

Y2 - 2 June 2009 through 4 June 2009

ER -
