Clustering groups objects so that the members of each group are similar to one another. The objects can be any kind of data item.
So why cluster data at all? Clustering helps reveal important patterns in data that otherwise looks unimportant, and such patterns are invaluable to organizations such as businesses and research firms. Some example applications:
- Marketing: finding groups of customers with similar behavior when given a large database of customer data containing their properties and past buying records
- Biology: classification of plants and animals based on their features
- Libraries: book ordering
- Insurance: identifying groups of motor insurance policy holders with a high average claim cost, and identifying fraud
- WWW: document classification and clustering weblog data to discover groups of similar access patterns
Clustering algorithms are commonly classified into four types:
- Exclusive Clustering - Data are grouped exclusively: if a datum belongs to one cluster, it cannot belong to any other cluster
- Overlapping Clustering - Uses fuzzy sets to cluster data: each point may belong to two or more clusters, with a different degree of membership in each
- Hierarchical Clustering - Based on repeatedly merging the two nearest clusters. Initially, every datum is its own cluster; merging continues until the desired clusters are reached
- Probabilistic Clustering - Uses a fully probabilistic approach
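The hierarchical procedure described above (start with every datum as its own cluster, then keep merging the two nearest clusters) can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names, the sample points, and the choice of single-link distance (the smallest distance between any two members) as the meaning of "nearest" are all assumptions made for the example.

```python
import math
from itertools import combinations


def single_link_distance(a, b):
    """Assumed notion of 'nearest': smallest distance between any two members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters):
    """Merge the two nearest clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Starting condition: every datum is its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters (i < j) with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        # Union of the two nearest clusters.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerative(points, 3)
print(result)
```

With this sample data, the two tight pairs are merged first and the isolated point stays in a cluster of its own. Other definitions of "nearest" (for example, the distance between cluster averages) would fit the same loop.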
The four most commonly used clustering algorithms are:
- K-means
- Fuzzy C-means
- Hierarchical clustering
- Mixture of Gaussians
K-means is an exclusive clustering algorithm, whereas Fuzzy C-means is an overlapping clustering algorithm. Hierarchical clustering, as its name suggests, belongs to the hierarchical type. Mixture of Gaussians is classified under probabilistic clustering.
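To make the difference between exclusive and overlapping clustering concrete, here is a small sketch of how a single point could be assigned given two cluster centers. It shows only the assignment step, not the full K-means or Fuzzy C-means algorithms (which also update the centers repeatedly). The function names, the centers, and the sample point are hypothetical; the membership formula is the standard Fuzzy C-means one with the fuzzifier m assumed to be 2.

```python
import math


def hard_assignment(point, centroids):
    """Exclusive style: the point belongs to exactly one cluster, the nearest."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def fuzzy_memberships(point, centroids, m=2.0):
    """Overlapping style: one membership degree per cluster, summing to 1."""
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that centroid.
    if 0.0 in dists:
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
        for d_j in dists
    ]


# Hypothetical centers and point, chosen only for illustration.
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

label = hard_assignment(point, centroids)      # a single cluster index
degrees = fuzzy_memberships(point, centroids)  # one degree per cluster
print(label, degrees)
```

Here the exclusive assignment places the point entirely in the first cluster, while the fuzzy memberships give it a high degree in the first cluster and a small but nonzero degree in the second.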