Figure 20.4 is a classification of clustering procedures. Clustering procedures can be hierarchical, nonhierarchical, or other procedures. Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive. Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are members of a single cluster.Divisive clustering starts with all the objects grouped in a single cluster.Clusters are divided or split until each object is in a separate cluster.
Agglomerative methods are commonly used in marketing research. They consist of linkage methods, error sums of squares or variance methods, and centroid methods. Linkage methods include single linkage, complete linkage, and average linkage. The single linkage method is based on minimum distance or the nearest neighbor rule. The first two objects clustered are those that have the smallest distance between them. The next shortest distance is identified, and either the third object is clustered with the first two, or a new two-object cluster is formed. At every stage, the distance between two clusters is the distance between their two closest points.
Two clusters are merged at any stage by the single shortest link between them. This process is continued until all objects are in one cluster. The single linkage method does not work well when the clusters are poorly defined. The complete linkage method is similar to single linkage, except that it is based on the maximum distance or the furthest neighbor approach. Incomplete linkage, the distance between two clusters is calculated as the distance between their two furthest points. The average linkage method works similarly. However, in this method, the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of the pair is from each of the clusters (Figure 20.5). As can be seen, the average linkage method uses information on all pairs of distances, not merely the minimum or maximum distances. For this reason, it is usually preferred to the single and complete linkage methods
The variance methods attempt to generate clusters to minimize the within-cluster variance. A commonly used variance method is Ward’s procedure. For each cluster, the means for all the variables are computed. Then, for each object, the squared euclidean distance to the cluster means is calculated (Figure 20.6). These distances are summed for all the objects. At each stage, the two clusters with the smallest increase in the overall sum of squares within-cluster distances are combined. In the centroid methods, the distance between two clusters is the distance between their centroids (means for all the variables), as shown in Figure 20.6. Clusters are generated so as to maximize the distances between the centers or centroids of clusters. Every time objects are grouped, a new centroid is computed. Of the hierarchical methods, average linkage and Ward’s methods have been shown to perform better than the other procedures.
The second type of clustering procedures, the nonhierarchical clustering method, is frequently referred to as k-means clustering. These methods include sequential threshold, parallel threshold, and optimizing partitioning. In the sequential threshold method, a cluster center is selected and all objects within a prespecified threshold value from the center are grouped together. Then a new cluster center or seed is selected, and the process is repeated for the unclustered points. Once an object is clustered with a seed, it is no longer considered for clustering with subsequent seeds. The parallel threshold method operates similarly, except that several cluster centers are selected simultaneously, and objects within the threshold level are grouped with the nearest center. The optimizing partitioning method differs from the two threshold procedures in that objects can later be reassigned to clusters to optimize an overall criterion, such as average within-cluster distance for a given number of clusters.
Two major disadvantages of the nonhierarchical procedures are that the number of clusters must be prespecified and the selection of cluster centers is arbitrary. Furthermore, the clustering results may depend on how the centers are selected. Many nonhierarchical programs select the first k (k = number of clusters) cases without missing values as initial cluster centers. Thus, the clustering results may depend on the order of observations in the data. Yet nonhierarchical .~ clustering is faster than hierarchical methods and has merit when the number of objects or observations is large. It has been suggested that the hierarchical and nonhierarchical methods be used in tandem. First, an initial clustering solution is obtained using a hierarchical procedure, such as average linkage or Ward’s. The number of clusters and cluster centroids so obtained are used as inputs to the optimizing partitioning method
Other clustering procedures are also available; one of particular interest is TwoStep cluster analysis. This procedure can automatically determine the optimal number of clusters by comparing the values of model choice criteria across different clustering solutions. It also has the ability to create cluster models based on categorical and continuous variables. In addition to euclidean distance, the TwoStep procedure also uses the log-likelihood measure. The log-likelihood measure places a probability distribution on the variables. It also accommodates two clustering criteria: Schwarz’s Bayesian Information Criterion (lHC) or the Akaike Information Criterion (AlC).
Choice of a clustering method and choice of a distance measures are interrelated. For example, squared euclidean distances should be used with the Ward’s and centroid methods. Several nonhierarchical procedures also use squared euclidean distances. In the TwoStep procedure, the euclidean measure can be used only when all of the variables are continuous.
We will use the Ward’s procedure to illustrate hierarchical clustering. The output obtained by clustering the data of Table 20.1 is given in Table 20.2. Useful information is contained in the “Agglomeration Schedule”, which shows the number of cases or clusters being combined at each stage. The first line represents.stage I, with 19 clusters. Respondents 14 and 16 are combined at this stage, as shown in the columns labeled “Clusters Combined.” The squared euclidean distance between these two respondents is given under the column labeled “Coefficients.” The column entitled “Stage Cluster First Appears” indicates the stage at which a cluster is first formed. To illustrate, an entry of 1 at stage 6 indicates that respondent 14 was first grouped at stage 1. The last column, “Next Stage,” indicates the stage at which another case (respondent) or cluster is combined with this one. Because the number in the first line of the last column is 6, we see that at stage 6, respondent 10 is combined with 14 and 16to form a single cluster. Similarly, the second line represents stage 2 with 18 clusters. In stage 2, respondents 6 and 7 are grouped together.
Another important part of the output is contained in the icicle plot given in Figure 20.7. The columns correspond to the objects being clustered, in this case respondents labeled I through 20. The rows correspond to the number of clusters. This figure is read from bottom to top. At first, all cases are considered as individual clusters. Because there are 20 respondents, there are 20 initial clusters. At the first step, the two closest objects are combined, resulting in 19 clusters. The last row of Figure 20.7 shows these 19 clusters. The two cases, respondents 14 and 16, which have been combined at this stage, have in the column between them, all Xs in rows I through 19. Row number 18 corresponds to the next stage, with 18 clusters. At this stage, respondents 6 and 7 are grouped together. The column of Xs between respondents 6 and 7 has a blank in row 19. Thus, at this stage there are 18 clusters; 16 of them consist of individual respondents, and two contain two respondents each. Each subsequent step leads to the formation of a new cluster in one of three ways: (1) two individual cases are grouped together, (2) a case is joined to an already existing cluster, or (3) two clusters are grouped together
Another graphic device that is useful in displaying clustering results is the dendrogram (see Figure 20.8). The dendrogram is read from left to right. Vertical lines represent clusters that are joined together. The position of the line on the scale indicates the distances at which clusters were joined. Because many of the distances in the early stages are of similar magnitude, it is difficult to tell the sequence in which some of the early clusters are formed. However, it is clear that in the last two stages, the distances at which the clusters are being combined are large. This information is useful in deciding on the number of clusters.
It is also possible to obtain information on cluster membership of cases if the number of clusters is specified. Although this information can be discerned from the icicle plot, a tabular display is helpful. Table 20.2 contains the “Cluster Membership” for the cases, depending on whether the final solution contains two, three, or four clusters. Information of this type can be obtained for any number of clusters and is useful for deciding on the number of clusters.