Data Analytics • Dictionary

Clustering

Jul 1, 2023

Clustering finds hidden patterns in your data by grouping similar items together, for example grouping customers into personas. There are two types of clustering techniques:

Partitioned: Every item is assigned to exactly one cluster. K-means is one such method.
Hierarchical: Builds the clusters by repeatedly merging or splitting groups of items.

K-means clustering

How it works: The name describes the algorithm. K is an input and sets the number of buckets or clusters that will be created. Each item is assigned to the cluster whose mean (its centroid) is closest.

Choosing the initial values is one of the toughest problems. Often, the algorithm picks k points, for example k randomly chosen items, as the initial centroids.
Then it assigns every point to the cluster with the closest centroid.
Then it recomputes the mean of each cluster and checks whether the centroids have moved.
If they have, it uses the new means as the centroids and repeats the process until they stop changing.
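
The loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name kmeans, the seed and max_iter parameters, and the choice to start from k randomly sampled items are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Minimal k-means sketch for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k items at random as the initial centroids
    # (an assumption; other initialisation schemes exist).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign every point to the cluster with the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # empty cluster: keep the old centroid
        # Step 4: stop once the centroids no longer move, otherwise repeat.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two small, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

With groups this far apart, the two clusters should line up with the two visible groups of points. Real data is rarely this clean.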

The details: Because the algorithm guesses the initial values, different runs can produce different results for the same value of k.
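
To see why the starting guess matters, here is a small sketch that runs the same loop from two hand-picked starting points instead of random ones. The function name lloyd, the corners data set, and the variable names are hypothetical, chosen only for this illustration.

```python
import math

def lloyd(points, centroids, max_iter=100):
    """Run k-means from explicitly given starting centroids; return the clusters."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assign each point to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (unchanged if the cluster is empty).
        updated = [
            tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[i]
            for i, m in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return clusters

# Four corners of a wide rectangle, clustered with k = 2 from two different starts.
corners = [(0, 0), (0, 1), (4, 0), (4, 1)]
left_right = lloyd(corners, [(0, 0), (4, 0)])  # start on the two bottom corners
top_bottom = lloyd(corners, [(0, 0), (0, 1)])  # start on the two left corners
```

The first start groups the left corners against the right corners, while the second settles on a top versus bottom split and stays there. Both runs use k = 2 on the same data, and only the starting guess differs.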

Hierarchical clustering

There are two types of hierarchical clustering: agglomerative and divisive.

Agglomerative clustering is bottom-up: each item starts as its own cluster, and the closest clusters are then merged step by step into larger ones.
Divisive clustering is top-down: all items start in a single cluster, which is then split step by step into smaller ones.
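
The bottom-up version can be sketched as follows. This is a rough illustration under stated assumptions: the function name agglomerate and the target parameter are made up for the example, and "closest" is taken to mean the distance between the two nearest members of each cluster (known as single linkage), which is only one of several possible rules.

```python
import math
from itertools import combinations

def agglomerate(points, target):
    """Bottom-up clustering sketch: merge the closest clusters until `target` remain."""
    clusters = [[p] for p in points]  # every item starts as its own cluster

    def gap(a, b):
        # Single linkage (an assumption): distance between the two closest members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > target:
        # Find the pair of clusters with the smallest gap (i < j always holds).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Hypothetical data: two tight pairs and one far-away point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
three_groups = agglomerate(points, target=3)
two_groups = agglomerate(points, target=2)
```

Asking for three clusters keeps the two pairs and the far-away point separate. Asking for two merges the pairs first, because they are closer to each other than either is to the outlier.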

The details: For a given distance measure and merge or split rule, hierarchical clustering produces the same result on every run, because the distances between items don't change.
