Clustering finds hidden patterns in your data by grouping similar items together, for example grouping customers into personas. There are two types of clustering techniques:
- Partitional: Each item is assigned to exactly one cluster. K-means is one such method.
- Hierarchical: Builds the clusters by repeatedly merging smaller clusters or splitting larger ones.
How it works: How K-means operates is in its name. K is an input to the algorithm and refers to the number of buckets or clusters that will be created. Each item is assigned to the cluster whose mean (its centroid) is closest to it.
- Choosing the initial centroids is one of the toughest problems. Often, the algorithm picks k random points as the initial centroids.
- Then it assigns every point to the cluster with the closest centroid.
- Then it recomputes each centroid as the mean of the points assigned to it and checks whether the centroids moved.
- If they did, it reassigns the points to the updated centroids and repeats the process until the centroids stop changing.
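The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name k_means, the seed and max_iterations parameters, and the sample points are all made up for the example, and it assumes squared Euclidean distance and random input points as the starting centroids.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iterations=100):
    """Hypothetical helper: returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Step 1: guess the initial centroids by picking k distinct input points.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Step 2: assign every point to the cluster with the closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 3: if no point changed cluster, the centroids are stable.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return centroids, assignments


# Illustrative data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

Stopping when the assignments no longer change is equivalent to stopping when the centroids no longer move, because each centroid is computed only from its assigned points.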
The details: Because the algorithm guesses the initial centroids, different runs can produce different clusters for the same value of k.
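To make this concrete, here is a small sketch that runs the same procedure twice on one-dimensional data with k = 3, changing only the starting guess. The function name k_means_1d, the data, and the two starting guesses are assumptions chosen for illustration; the starting centroids are passed in explicitly rather than drawn at random so that the outcome is reproducible.

```python
def k_means_1d(values, initial_centroids, max_iterations=100):
    """Hypothetical helper: 1-D k-means from an explicit starting guess.

    Returns (centroids, inertia), where inertia is the sum of squared
    distances from each value to its cluster's centroid (lower is tighter).
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    assignments = None
    for _ in range(max_iterations):
        # Assign each value to the closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the result is stable
        assignments = new_assignments
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    inertia = sum((v - centroids[a]) ** 2 for v, a in zip(values, assignments))
    return centroids, inertia


# Three obvious groups: around 2, around 12, and around 22.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0, 21.0, 22.0, 23.0]

good_start = k_means_1d(values, [1.0, 11.0, 21.0])  # one guess per group
poor_start = k_means_1d(values, [1.0, 2.0, 11.0])   # two guesses in one group

print(good_start)  # ([2.0, 12.0, 22.0], 6.0)
print(poor_start)  # ([1.0, 2.5, 17.0], 154.5)
```

Both runs stop at a stable answer, but the second one splits the first group in two and lumps the other two groups together. A common remedy is to run the algorithm several times with different starting guesses and keep the result with the lowest inertia.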
There are two types of hierarchical clustering: agglomerative and divisive.
- Agglomerative clustering is bottom up, i.e., each item starts as its own cluster and the closest clusters are then merged step by step to create larger clusters.
- Divisive clustering is top down, i.e., all items start in a single cluster and then are broken apart to create smaller clusters.
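The bottom-up approach can be sketched as follows. The text does not say how the distance between two clusters is measured, so this example assumes single linkage (the distance between their two closest members); the function names, the target_clusters stopping rule, and the one-dimensional data are also assumptions made for illustration.

```python
def single_linkage_distance(cluster_a, cluster_b):
    """Distance between the two closest members of two clusters (assumed rule)."""
    return min(abs(a - b) for a in cluster_a for b in cluster_b)


def agglomerative(values, target_clusters):
    """Hypothetical helper: merge the closest clusters until few enough remain.

    Returns (clusters, merges), where merges records each merge as
    (left_cluster, right_cluster, distance) in the order it happened.
    """
    # Bottom up: every item starts as its own cluster.
    clusters = [[v] for v in values]
    merges = []
    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest distance between them.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Replace the two clusters with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


values = [1.0, 2.0, 9.0, 10.0, 25.0]
clusters, merges = agglomerative(values, target_clusters=2)
print(clusters)  # [[25.0], [1.0, 2.0, 9.0, 10.0]]
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

The recorded merge order is the hierarchy itself: stopping earlier or later gives more or fewer clusters without rerunning anything. A divisive version would run the same idea in reverse, starting from one cluster and splitting.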
The details: Hierarchical clustering produces the same result every time it is run on the same data, because the distances between items don’t change and nothing is chosen at random.