
Survey of Clustering Algorithms (2005)

  • “Clustering has a long history, with lineage dating back to Aristotle [124].”
  • Curse of dimensionality – Bellman

Hierarchical Clustering #

  • BIRCH
    • $O(N)$
    • uses a clustering feature (CF) tree (see the CF sketch at the end of this section)
  • CURE
    • motivation: arbitrary cluster shapes
    • take “… a set of well-scattered points to represent each cluster …”
    • shrink them toward the centroid, which should weaken the effect of outliers (see the sketch below)
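
The shrinking step is just a convex move of each representative toward the cluster centroid. A minimal sketch (hypothetical function, not the paper's code; `alpha` is CURE's shrink factor):

```python
import numpy as np

def shrink_representatives(reps, centroid, alpha=0.3):
    """Pull each well-scattered representative a fraction alpha
    toward the cluster centroid, damping the pull of outliers."""
    reps = np.asarray(reps, dtype=float)
    return reps + alpha * (np.asarray(centroid, dtype=float) - reps)
```

With `alpha = 0` the representatives stay fully scattered (outlier-sensitive); with `alpha = 1` they collapse onto the centroid and the method degenerates to centroid-based clustering.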

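Back to BIRCH: the CF idea is what makes it single-pass. Each subcluster is summarized by the triple $(N, LS, SS)$ (count, linear sum, sum of squared norms), and two CFs merge by component-wise addition, so the tree never needs the raw points. A minimal sketch of that summary (my own illustration; all names are hypothetical):

```python
import numpy as np

class CF:
    """Clustering feature: summarizes a set of points as (N, LS, SS)."""
    def __init__(self, d):
        self.n = 0             # N: number of points
        self.ls = np.zeros(d)  # LS: linear sum of the points
        self.ss = 0.0          # SS: sum of squared norms

    def add(self, x):
        self.n += 1
        self.ls += x
        self.ss += float(x @ x)

    def merge(self, other):
        # CF additivity: two subclusters merge in O(d), no raw points needed
        self.n += other.n
        self.ls += other.ls
        self.ss += other.ss

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # RMS distance from points to centroid, recovered from (N, LS, SS)
        c = self.centroid()
        return np.sqrt(max(self.ss / self.n - float(c @ c), 0.0))
```

Inserting a point updates one CF entry per tree level, which is where the $O(N)$ total cost comes from.
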
Clustering Large-Scale Data Sets #

  • hierarchical clustering: $O(N^2)$ time and space
  • $k$-means costs $O(Nkd)$ per iteration, so it is effectively linear for $N \gg k, d$
  • Random sampling
    • to represent each cluster: CLARA = medoid, CURE = a set of well-scattered and center-shrunk points
    • Chernoff bounds can be used to lower-bound the sample size needed to represent every cluster with high probability (see the bound at the end of this section)
  • Randomized search
    • CLARANS
      • search in a graph where each node is a set of $k$ medoids and neighbors differ by one medoid swap (toy sketch at the end of this section)
      • finds better clusterings than CLARA, but still $O(N^2)$
  • Condensation-based
    • BIRCH
      • “… generalized into a broader framework …”
        • BUBBLE, BUBBLE-FM
  • High dimensionality
    • FC and DENCLUE have been applied successfully, but results are still far from satisfactory
  • Incremental
    • BFR
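
One way to reconstruct the Chernoff-bound argument above (my derivation via the standard multiplicative lower-tail bound; $\rho$, $\varepsilon$, $\delta$ are parameters of this sketch, not from the survey): if a cluster holds a fraction $\rho$ of the $N$ points and we draw $s$ points uniformly at random, the number of hits $X$ has mean $s\rho$, so

$$P\big(X < (1-\varepsilon)s\rho\big) \le \exp\!\left(-\frac{s\rho\,\varepsilon^2}{2}\right) \le \delta \quad\Longleftrightarrow\quad s \ge \frac{2}{\rho\,\varepsilon^2}\ln\frac{1}{\delta},$$

i.e. the required sample size depends on the smallest cluster's fraction $\rho$, not on $N$ itself.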

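And a toy version of the CLARANS search as described above (hypothetical code; `num_local` and `max_neighbor` stand in for the algorithm's restart and neighbor-examination limits):

```python
import random
import numpy as np

def clarans(X, k, num_local=5, max_neighbor=50):
    """Toy CLARANS: a node is a set of k medoid indices; a neighbor
    swaps one medoid for a random non-medoid."""
    def cost(medoids):
        # total distance of every point to its nearest medoid
        d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
        return d.min(axis=1).sum()

    best, best_cost = None, np.inf
    for _ in range(num_local):                    # random restarts
        current = random.sample(range(len(X)), k) # random node in the graph
        current_cost = cost(current)
        tried = 0
        while tried < max_neighbor:               # probe random neighbors
            i = random.randrange(k)
            j = random.choice([p for p in range(len(X)) if p not in current])
            neighbor = current[:i] + [j] + current[i + 1:]
            c = cost(neighbor)
            if c < current_cost:                  # downhill move, reset counter
                current, current_cost, tried = neighbor, c, 0
            else:
                tried += 1
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost
```

Each node of the implicit graph is a $k$-subset of the data points, and two nodes are adjacent iff they differ in exactly one medoid; the swap above walks one edge of that graph.
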
Algorithms capable of tackling high-dimensional data #

| Algorithm | Complexity |
| --- | --- |
| CURE | $O(N_{sample}^2 \log N_{sample})$ time, $O(N_{sample})$ space |
| DENCLUE | $O(N \log N)$ time |
| FC | $O(N)$ time |
| CLIQUE | linear in the number of objects, quadratic in the number of dimensions |
| OptiGrid | between $O(Nd)$ and $O(Nd \log N)$ |
| ORCLUS | $O(K_0^3 + K_0 N d + K_0^2 d^3)$ time, $O(K_0 d^2)$ space |

Xu, Rui, and Donald Wunsch. “Survey of Clustering Algorithms.” IEEE Transactions on Neural Networks 16.3 (2005): 645-678.