Survey of Clustering Algorithms (2005)
- “Clustering has a long history, with lineage dating back to Aristotle [124].”
- Curse of dimensionality – Bellman
Hierarchical Clustering #
- BIRCH
- $O(N)$
- uses clustering feature (CF) tree
- CURE
- motivation: arbitrary cluster shapes
- take “… a set of well-scattered points to represent each cluster …”
- shrink them toward the centroid (weakens the effect of outliers)
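A minimal sketch of CURE's shrinking step, assuming NumPy; the function name and the shrink factor `alpha` are illustrative, not from the paper:

```python
import numpy as np

def shrink_representatives(reps, alpha=0.3):
    """Shrink CURE's well-scattered representative points toward the
    cluster centroid by a factor alpha (0 < alpha < 1). Larger alpha
    pulls points closer to the centroid, damping the effect of outliers."""
    reps = np.asarray(reps, dtype=float)
    centroid = reps.mean(axis=0)
    # Move each representative a fraction alpha of the way to the centroid.
    return reps + alpha * (centroid - reps)

# Four scattered points in 2-D; with alpha=0.5 each moves halfway
# toward the centroid (2, 2), so (0, 0) becomes (1, 1).
pts = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]]
shrunk = shrink_representatives(pts, alpha=0.5)
```

Note that shrinking preserves the centroid itself; only the spread of the representatives contracts.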
Clustering Large-Scale Data Sets #
- hierarchical clustering: $O(N^2)$ time and space
- $k$-means is $O(NKd)$ per iteration, effectively linear in $N$ when $N \gg K, d$
- Random sampling
- to represent each cluster: CLARA = medoid, CURE = a set of well-scattered and center-shrunk points
- Chernoff bounds give the minimum sample size needed so that each cluster is adequately represented with high probability
- Randomized search
- CLARANS
- search in a graph, each node ~ a set of medoids
- better than CLARA but still $O(N^2)$
- Condensation-based
- BIRCH
- “… generalized into a broader framework …”
- BUBBLE, BUBBLE-FM
- High dimensionality
- FC and DENCLUE have been successfully applied, but results are still far from satisfactory
- Incremental
- BFR
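The Chernoff-bound argument above can be made concrete. A sketch of the minimum-sample-size bound as stated in the CURE paper (Guha et al., 1998): to capture at least a fraction `f` of a cluster of size `|u|` with probability at least `1 - delta`, the sample must be at least this large. The function name and parameter defaults here are mine:

```python
import math

def cure_min_sample_size(N, cluster_size, f=0.01, delta=0.01):
    """Chernoff-bound sample size from the CURE paper: a uniform random
    sample of at least this size contains >= f * cluster_size points of a
    cluster of the given size with probability >= 1 - delta."""
    log_term = math.log(1.0 / delta)
    ratio = N / cluster_size
    return (f * N                      # expectation-matching term
            + ratio * log_term         # slack from the tail bound
            + ratio * math.sqrt(log_term ** 2
                                + 2.0 * f * cluster_size * log_term))

# For N = 100,000 points and a smallest cluster of 1,000 points, the
# required sample is a few thousand points, i.e. much smaller than N.
s = cure_min_sample_size(100_000, 1_000)
```

The key point is that the bound shrinks as the smallest cluster grows, which is why sampling-based methods like CURE scale to large $N$.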
Algorithms capable of tackling high-dimensional data #
Algorithm | Complexity
---|---
CURE | $O(N_{\text{sample}}^2 \log N_{\text{sample}})$ time, $O(N_{\text{sample}})$ space
DENCLUE | $O(N \log N)$ time
FC | $O(N)$ time
CLIQUE | Linear in the number of objects, quadratic in the number of dimensions
OptiGrid | Between $O(Nd)$ and $O(Nd \log N)$ time
ORCLUS | $O(K_0^3 + K_0 N d + K_0^2 d^3)$ time, $O(K_0 d^2)$ space
Xu, Rui, and Donald Wunsch. “Survey of Clustering Algorithms.” IEEE Transactions on Neural Networks 16.3 (2005): 645–678.