Some General Comments on Cluster Analysis
Cluster analysis is one of those techniques that is very attractive to students and researchers alike. The idea behind it is simple: use multiple variables to identify groupings, or clusters, of individuals that are not readily apparent to the researcher. The figure below gives a simplistic example of two clusters defined by two variables.
The problem with cluster analysis is that in all but the simplest of cases uniquely defined clusters may not exist. Cluster analysis is a collection of techniques and algorithms, and different ones often classify the same observations into completely different groupings. For example, cluster analysis tends to be good at finding spherical clusters and has great difficulty with curved clusters, as in the example below, even though humans easily discern the two clusters.
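The point about curved clusters and disagreeing methods can be illustrated with a small sketch. It is not taken from the course materials: the two concentric rings, the helper names (ring, single_linkage, kmeans, same_partition), and the choice of starting centers are all assumptions made for illustration. It compares a centroid-based method (k-means), which favors compact, roughly spherical groups, with single-linkage clustering, which chains nearby points together.

```python
import math
from itertools import combinations

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius (made-up data)."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def single_linkage(points, k):
    """Merge the closest pair of groups until k groups remain (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
    groups = len(points)
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            groups -= 1
            if groups == k:
                break
    return [find(i) for i in range(len(points))]

def kmeans(points, start_centers, max_iter=100):
    """Plain k-means: assign to nearest center, recompute centers, repeat."""
    centers = list(start_centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a group empties out
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels

def same_partition(a, b):
    """True when two labelings define the same groups, ignoring label names."""
    return len(set(zip(a, b))) == len(set(a)) == len(set(b))

inner, outer = ring(1.0, 20), ring(5.0, 20)
points = inner + outer
truth = [0] * len(inner) + [1] * len(outer)

linkage_labels = single_linkage(points, 2)
# Assumption: start k-means from one point on each ring.
kmeans_labels = kmeans(points, [inner[0], outer[0]])

print("single linkage recovers the rings:", same_partition(linkage_labels, truth))
print("k-means recovers the rings:", same_partition(kmeans_labels, truth))
```

In this sketch k-means cannot recover the rings from any starting centers, because with two centers it always splits the plane along a straight line. Single linkage succeeds here only because the rings are clean and well separated; on noisier data it can fail in its own ways, which is the broader point about methods disagreeing.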
Another issue to be aware of is that cluster analysis treats all variables as being equally important in determining cluster membership.
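One consequence of equal weighting can be sketched as follows, assuming Euclidean distance as the dissimilarity measure (the text above does not name one). When every variable enters the distance with the same weight, the variable with the largest numeric spread dominates, so a change of units alone can change which observations look similar. The three people and their values are hypothetical, as are the helper names rescale and nearest_to.

```python
import math

# Hypothetical data: (height in meters, income in dollars).
people = {
    "A": (1.60, 30000.0),
    "B": (1.95, 30500.0),
    "C": (1.61, 31000.0),
}

def rescale(point, factors):
    """Multiply each variable by its unit-conversion factor."""
    return tuple(value * factor for value, factor in zip(point, factors))

def nearest_to(name, data):
    """Name of the observation closest to `name` in Euclidean distance."""
    others = (other for other in data if other != name)
    return min(others, key=lambda other: math.dist(data[name], data[other]))

# In meters and dollars, income differences swamp height differences.
raw_nearest = nearest_to("A", people)

# Same people, now in centimeters and thousands of dollars.
converted = {name: rescale(p, (100.0, 0.001)) for name, p in people.items()}
converted_nearest = nearest_to("A", converted)

print(raw_nearest, converted_nearest)  # prints "B C"
```

Nothing about the people changed between the two runs, only the units, yet A's nearest neighbor switches from B to C. Any clustering built on these distances inherits the same sensitivity.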
Nick Cox Comments on Cluster Analysis
In response to a question on which method of cluster analysis is best, Nick Cox of the University of Durham commented:
What I call the "classification crunch" can be formulated as follows. If data have very clear-cut group or cluster structure, very simple methods suffice to identify it, principally scatter plots (or possibly biplots) based on the most important variables or constructed variables (e.g. principal components). If data do not have such structure, it is possible to waste vast amounts of time and effort working through part-contradictory, part-complementary results from minutely different analyses (this measure of dissimilarity rather than that, etc.). That's a cynical formulation, but I await examples of interesting structure identified by cluster analyses (and I don't include purely visual methods).
The following quote is from the Stata Reference Manual in the section on cluster analysis:
It has been said that there are as many cluster analysis methods as there are people performing cluster analysis. This is a gross understatement! There are infinitely more ways to perform a cluster analysis than people who perform them.
Phil Ender, 21Jan05