Survey of Clustering Algorithms (2005)

  • “Clustering has a long history, with lineage dating back to Aristotle [124].”
  • Curse of dimensionality – Bellman

Hierarchical Clustering #

  • BIRCH
    • $O(N)$
    • uses a clustering feature (CF) tree (CF sketch after this list)
  • CURE
    • motivation: arbitrary cluster shapes
    • take “… a set of well-scattered points to represent each cluster …”
    • shrink them towards the centroid, which should weaken the effect of outliers (second sketch after this list)
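
A minimal sketch of the clustering feature (CF) idea behind BIRCH, assuming the usual definition of a CF as the triple $(N, LS, SS)$ (point count, linear sum, sum of squared norms); the class and method names below are mine, and a real CF tree arranges these entries hierarchically under branching and threshold parameters.

```python
import numpy as np

class ClusteringFeature:
    """CF = (N, LS, SS): point count, linear sum, and sum of squared norms."""

    def __init__(self, points):
        pts = np.atleast_2d(np.asarray(points, dtype=float))
        self.n = len(pts)                   # N: number of points summarized
        self.ls = pts.sum(axis=0)           # LS: linear sum (a d-dimensional vector)
        self.ss = float((pts ** 2).sum())   # SS: sum of squared norms (a scalar)

    def merge(self, other):
        """CF vectors are additive, so two subclusters merge in O(d)."""
        self.n += other.n
        self.ls = self.ls + other.ls
        self.ss += other.ss
        return self

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # Average distance of the summarized points from their centroid,
        # recoverable from (N, LS, SS) without touching the raw data.
        c = self.centroid()
        return float(np.sqrt(max(self.ss / self.n - c @ c, 0.0)))
```

Because the summary is additive and constant-size, the raw points never need to be kept around, which is what makes BIRCH's single-pass, $O(N)$ behaviour possible.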
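
And a rough sketch of CURE's representative-point step as I read it: greedily pick a handful of well-scattered points per cluster, then move each one a fraction $\alpha$ of the way towards the centroid. The function name and defaults are hypothetical, not CURE's actual code.

```python
import numpy as np

def cure_representatives(points, num_rep=10, alpha=0.5):
    """Pick well-scattered points (farthest-point style), then shrink them towards the centroid."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)

    # Greedy scattering: start from the point farthest from the centroid, then
    # repeatedly add the point farthest from the representatives chosen so far.
    chosen = [int(np.argmax(np.linalg.norm(pts - centroid, axis=1)))]
    while len(chosen) < min(num_rep, len(pts)):
        dists = np.linalg.norm(pts[:, None, :] - pts[chosen][None, :, :], axis=-1).min(axis=1)
        dists[chosen] = -1.0               # never re-pick an existing representative
        chosen.append(int(np.argmax(dists)))

    reps = pts[chosen]
    # Shrinking towards the centroid dampens the influence of outliers among the
    # representatives while still letting them trace non-spherical cluster shapes.
    return reps + alpha * (centroid - reps)
```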

Clustering Large-Scale Data Sets #

  • hierarchical clustering: $O(N^2)$ time and space
  • complexity of $k$-means is effectively linear for $N \gg k, d$ (per-iteration cost is $O(Nkd)$)
  • Random sampling
    • to represent each cluster: CLARA = medoid, CURE = a set of well-scattered and center-shrunk points
    • Chernoff bounds can be used to derive a lower bound on the sample size needed so that every cluster is adequately represented (bound spelled out after this list)
  • Randomized search
    • CLARANS
      • search in a graph where each node is a set of $k$ medoids and neighboring nodes differ by one medoid swap (sketch after this list)
      • better than CLARA but still $O(N^2)$
  • Condensation-based
    • BIRCH
      • “… generalized into a broader framework …”
        • BUBBLE, BUBBLE-FM
  • High dimensionality
    • FC and DENCLUE have been applied successfully, but results are still far from satisfactory
  • Incremental
    • BFR
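
On the Chernoff-bound item: with uniform sampling of $s$ out of $N$ points, the number drawn from a cluster $u$ is binomial with mean $s|u|/N$, and the lower-tail bound $P[X < (1-\epsilon)\mu] \le e^{-\mu\epsilon^2/2}$ can be solved for the smallest $s$ that yields at least $f|u|$ points from $u$ with probability $1-\delta$. This is my reconstruction of the CURE-style bound ($f$ = desired fraction of $u$ in the sample, $\delta$ = allowed failure probability), not a formula quoted from the survey:

$$ s \;\ge\; fN + \frac{N}{|u|}\ln\frac{1}{\delta} + \frac{N}{|u|}\sqrt{\ln^2\frac{1}{\delta} + 2f|u|\ln\frac{1}{\delta}} $$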
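
The CLARANS item boils down to a randomized local search over $k$-subsets of the data points serving as medoids. A minimal sketch under that reading; the function name and parameter defaults are mine (the original algorithm's knobs are called numlocal and maxneighbor):

```python
import random
import numpy as np

def clarans(X, k, num_local=5, max_neighbor=50, seed=0):
    """Randomized search over sets of k medoids; neighboring nodes differ by one medoid swap."""
    rng = random.Random(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)

    def cost(medoids):
        # Total distance from every point to its nearest medoid.
        d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=-1)
        return d.min(axis=1).sum()

    best, best_cost = None, float("inf")
    for _ in range(num_local):                  # restart from several random nodes of the graph
        current = rng.sample(range(n), k)
        current_cost = cost(current)
        failures = 0
        while failures < max_neighbor:          # probe random neighbors of the current node
            neighbor = list(current)
            swap_in = rng.randrange(n)
            while swap_in in neighbor:
                swap_in = rng.randrange(n)
            neighbor[rng.randrange(k)] = swap_in
            neighbor_cost = cost(neighbor)
            if neighbor_cost < current_cost:    # downhill move: reset the failure counter
                current, current_cost, failures = neighbor, neighbor_cost, 0
            else:
                failures += 1
        if current_cost < best_cost:            # keep the best local minimum across restarts
            best, best_cost = current, current_cost
    return best, best_cost
```

As far as I recall, the original CLARANS sets maxneighbor proportional to $k(N-k)$, which is where the roughly quadratic overall cost in the note comes from.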

Algorithms capable of tackling high-dimensional data #

| Algorithm | Complexity |
| --- | --- |
| CURE | $O(N_{\text{sample}}^2 \log N_{\text{sample}})$ time, $O(N_{\text{sample}})$ space |
| DENCLUE | $O(N \log N)$ time |
| FC | $O(N)$ time |
| CLIQUE | linear in the number of objects, quadratic in the number of dimensions |
| OptiGrid | between $O(Nd)$ and $O(Nd \log N)$ |
| ORCLUS | $O(K_0^3 + K_0 N d + K_0^2 d^3)$ time, $O(K_0 d^2)$ space |

Xu, Rui, and Donald Wunsch. “Survey of clustering algorithms.” IEEE Transactions on Neural Networks 16.3 (2005): 645–678.