Abstract
In data science, several parameters affect the accuracy of the algorithms used, among them the type of data objects, the membership assignments, and the distance or similarity functions. In this chapter we describe different data types, membership functions, and similarity functions and discuss the pros and cons of using each of them. Conventional similarity functions evaluate objects in the vector space. In contrast, Weighted Feature Distance (WFD) functions compare data objects in both feature and vector spaces, preventing the system from being affected by some dominant features. Traditional membership functions assign membership values to data objects but impose some restrictions. The Bounded Fuzzy Possibilistic Method (BFPM) makes it possible for data objects to participate fully or partially in several clusters or even in all clusters. BFPM introduces intervals for the upper and lower boundaries for data objects with respect to each cluster. BFPM helps algorithms converge and also inherits the abilities of conventional fuzzy and possibilistic methods. In Big Data applications, knowing the exact type of data objects and selecting the most accurate similarity [1] and membership assignments are crucial in decreasing computing costs and obtaining the best performance. This chapter provides data type taxonomies to assist data miners in selecting the right learning method for each selected data set. Examples illustrate how to evaluate the accuracy and performance of the proposed algorithms. Experimental results show why these parameters are important.
Original language | English |
---|---|
Title of host publication | Data Science and Big Data : An Environment of Computational Intelligence |
Number of pages | 20 |
Volume | 24 |
Publisher | Springer |
Publication date | 2017 |
Pages | 29-48 |
ISBN (Print) | 978-3-319-53474-2 |
ISBN (Electronic) | 978-3-319-53474-9 |
DOIs | |
Publication status | Published - 2017 |
Series | Studies in Big Data |
---|---|
Volume | 24 |
ISSN | 2197-6503 |
Bibliographical note
Publisher Copyright: © 2017, Springer International Publishing AG.
Keywords
- Bounded fuzzy-possibilistic method
- Membership function
- Distance function
- Supervised learning
- Unsupervised learning
- Clustering
- Data type
- Critical objects
- Outstanding objects
- Weighted feature distance