Survey of Clustering Algorithms (2005)
- “Clustering has a long history, with lineage dating back to Aristotle [124].”
- Curse of dimensionality – Bellman
Hierarchical Clustering #
- BIRCH
- uses a clustering feature (CF) tree (see the first sketch after this list)
- CURE
- motivation: arbitrary cluster shapes
- take “… a set of well-scattered points to represent each cluster …”
- shrink them toward the centroid to weaken the effect of outliers (see the second sketch after this list)
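The key property behind the CF tree is that clustering features are additive. A minimal sketch, assuming the standard triple CF = (N, LS, SS) from the BIRCH paper; the class and method names are mine:

```python
# Sketch of BIRCH's clustering feature: CF = (N, LS, SS), i.e. point count,
# per-dimension linear sum, and sum of squared norms. CFs are additive, which
# is what lets the CF tree summarize subclusters instead of storing raw points.

class CF:
    def __init__(self, dims):
        self.n = 0               # number of points absorbed
        self.ls = [0.0] * dims   # linear sum, per dimension
        self.ss = 0.0            # sum of squared norms

    def add_point(self, x):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def merge(self, other):
        # Additivity: merging two subclusters is component-wise addition.
        out = CF(len(self.ls))
        out.n = self.n + other.n
        out.ls = [a + b for a, b in zip(self.ls, other.ls)]
        out.ss = self.ss + other.ss
        return out

    def centroid(self):
        return [v / self.n for v in self.ls]

cf = CF(2)
for p in [(1.0, 2.0), (3.0, 4.0)]:
    cf.add_point(p)
print(cf.centroid())  # -> [2.0, 3.0]
```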
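And a sketch of CURE's representative-point step; the parameters `c` (number of representatives) and `alpha` (shrink factor) follow the paper's notation, while the greedy farthest-point selection is one plausible reading of "well-scattered":

```python
import numpy as np

def cure_representatives(points, c=4, alpha=0.5):
    """Pick c well-scattered points, then shrink each toward the centroid
    by a factor alpha, which dampens the influence of outliers."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)

    # Greedy farthest-point selection: start with the point farthest from
    # the centroid, then repeatedly add the point farthest from all reps.
    reps = [points[np.argmax(np.linalg.norm(points - centroid, axis=1))]]
    while len(reps) < min(c, len(points)):
        d = np.min([np.linalg.norm(points - r, axis=1) for r in reps], axis=0)
        reps.append(points[np.argmax(d)])

    # Shrink each representative toward the centroid: r + alpha * (centroid - r).
    return [r + alpha * (centroid - r) for r in reps]
```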
Clustering Large-Scale Data Sets #
- hierarchical clustering: at least $O(N^2)$ time and space
- complexity of $k$-means is $O(NKd)$, effectively linear for $K, d \ll N$
- Random sampling
- to represent each cluster: CLARA = medoid, CURE = a set of well-scattered and center-shrunk points
- Chernoff bounds can be used to lower-bound the minimum sample size (see the bound spelled out after this list)
- Randomized search
- CLARANS
- search in a graph, each node ~ a set of $k$ medoids (see the sketch after this list)
- better than CLARA, but still quadratic in $N$
- Condensation-based
- BIRCH
- “… generalized into a broader framework …”
- BUBBLE, BUBBLE-FM
- High dimensionality
- FC and DENCLUE have been applied successfully, but the results are still far from satisfactory
- Incremental
- BFR
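For reference, the Chernoff-bound sample-size lemma as I recall it from the CURE paper ($s$ is the sample size, $f$ the fraction of cluster $u$ the sample should capture, $\delta$ the failure probability, $N$ the data-set size):

$$s \;\geq\; fN + \frac{N}{|u|}\log\frac{1}{\delta} + \frac{N}{|u|}\sqrt{\left(\log\frac{1}{\delta}\right)^{2} + 2f|u|\log\frac{1}{\delta}}$$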
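A minimal sketch of CLARANS's randomized graph search, assuming the paper's parameter names `numlocal` (restarts) and `maxneighbor` (failed swaps before declaring a local minimum); `dist` and the point representation are placeholders:

```python
import random

def clarans(points, k, dist, numlocal=2, maxneighbor=50):
    """Search the graph whose nodes are k-medoid sets; two nodes are
    neighbors iff they differ in exactly one medoid."""
    def cost(medoids):
        return sum(min(dist(p, m) for m in medoids) for p in points)

    best, best_cost = None, float("inf")
    for _ in range(numlocal):                        # random restarts
        current = random.sample(points, k)
        current_cost = cost(current)
        tried = 0
        while tried < maxneighbor:
            # Random neighbor: swap one medoid for a random non-medoid.
            neighbor = list(current)
            i = random.randrange(k)
            neighbor[i] = random.choice([p for p in points if p not in neighbor])
            neighbor_cost = cost(neighbor)
            if neighbor_cost < current_cost:         # downhill move, reset counter
                current, current_cost, tried = neighbor, neighbor_cost, 0
            else:
                tried += 1
        if current_cost < best_cost:                 # keep the best local minimum
            best, best_cost = current, current_cost
    return best, best_cost
```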
Algorithms capable of tackling high-dimensional data #
Algorithm | Complexity |
---|---|
CURE | $O(N_{sample}^2 \log N_{sample})$ time, $O(N_{sample})$ space |
DENCLUE | $O(N \log N)$ time |
FC | $O(N)$ time |
CLIQUE | linear with the number of objects, quadratic with the number of dimensions |
OptiGrid | between $O(Nd)$ and $O(Nd \log N)$ |
ORCLUS | $O(K_0^3 + K_0 N d + K_0^2 d^3)$ time, $O(K_0 d^2)$ space |
Xu, Rui, and Donald Wunsch. “Survey of Clustering Algorithms.” IEEE Transactions on Neural Networks 16.3 (2005): 645–678.