The steps involved in conducting discriminant analysis consist of formulation, estimation, determination of significance, interpretation, and validation (see Figure 18.2). These steps are discussed and illustrated within the context of two-group discriminant analysis. Discriminant analysis with more than two groups is discussed later in this chapter.
Formulate the Problem
The first step in discriminant analysis is to formulate the problem by identifying the objectives, the criterion variable, and the independent variables. The criterion variable must consist of two or more mutually exclusive and collectively exhaustive categories. When the dependent variable is interval or ratio scaled, it must first be converted into categories. For example, attitude toward the brand, measured on a 7-point scale, could be categorized as unfavorable (1, 2, 3), neutral (4), or favorable (5, 6, 7). Alternatively, one could plot the distribution of the dependent variable and form groups of equal size by determining the appropriate cutoff points for each category. The predictor variables should be selected based on a theoretical model or previous research, or, in the case of exploratory research, the experience of the researcher should guide their selection.
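The categorization described above can be sketched in a few lines of Python. This is only an illustration: the function name `categorize_attitude` and the sample ratings are hypothetical, and the cutoffs are the ones given in the text.

```python
def categorize_attitude(score: int) -> str:
    """Map a 7-point attitude rating to the three categories from the text."""
    if not 1 <= score <= 7:
        raise ValueError("score must be between 1 and 7")
    if score <= 3:
        return "unfavorable"   # ratings 1, 2, 3
    if score == 4:
        return "neutral"       # rating 4
    return "favorable"         # ratings 5, 6, 7


# Hypothetical ratings, used only to show the mapping.
ratings = [1, 4, 7, 3, 5]
groups = [categorize_attitude(r) for r in ratings]
print(groups)
```

The resulting categories are mutually exclusive and collectively exhaustive, because every valid rating falls into exactly one of them.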
Often the distribution of the number of cases in the analysis and validation samples follows the distribution in the total sample. For instance, if the total sample contained 50 percent loyal and 50 percent nonloyal consumers, then the analysis and validation samples would each contain 50 percent loyal and 50 percent nonloyal consumers. On the other hand, if the sample contained 25 percent loyal and 75 percent nonloyal consumers, the analysis and validation samples would be selected to reflect the same distribution (25 percent versus 75 percent).
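One way to obtain analysis and validation samples that mirror the group distribution of the total sample is to split each group separately. The sketch below assumes a 25/75 sample of 100 cases and a 40 percent validation share. The function name `stratified_split`, the sample size, and the share are illustrative assumptions, not values from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_fraction, seed=0):
    """Return (analysis_idx, validation_idx), splitting each group separately
    so that both samples keep the group proportions of the total sample."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, group in enumerate(labels):
        by_group[group].append(i)

    analysis, validation = [], []
    for indices in by_group.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * validation_fraction)
        validation.extend(indices[:n_val])
        analysis.extend(indices[n_val:])
    return sorted(analysis), sorted(validation)


# Hypothetical sample: 25 percent loyal, 75 percent nonloyal.
labels = ["loyal"] * 25 + ["nonloyal"] * 75
analysis_idx, validation_idx = stratified_split(labels, validation_fraction=0.4)

loyal_in_validation = sum(labels[i] == "loyal" for i in validation_idx)
print(loyal_in_validation / len(validation_idx))  # share of loyal cases
```

With these inputs, both samples keep the 25 percent versus 75 percent distribution of the total sample.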
Finally, it has been suggested that the validation of the discriminant function should be conducted repeatedly. Each time, the sample should be split into different analysis and validation parts. The discriminant function should be estimated and the validation analysis carried out. Thus, the validation assessment is based on a number of trials. More rigorous methods have also been suggested.
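The repeated-split procedure can be outlined as follows. To keep the sketch self-contained, it does not estimate a full discriminant function. It uses a deliberately simple stand-in: one predictor, with a cutoff midway between the two group means. All names and data values here are hypothetical.

```python
import random
from statistics import mean


def fit_midpoint_rule(xs, ys):
    """Stand-in for estimating a discriminant function (one predictor only):
    the cutoff lies midway between the two group means."""
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    m2 = mean(x for x, y in zip(xs, ys) if y == 2)
    return (m1 + m2) / 2, (1 if m1 > m2 else 2)


def proportion_correct(model, xs, ys):
    """Share of cases that the fitted rule assigns to their actual group."""
    cutoff, high_group = model
    low_group = 2 if high_group == 1 else 1
    predicted = [high_group if x > cutoff else low_group for x in xs]
    return sum(p == y for p, y in zip(predicted, ys)) / len(ys)


def repeated_validation(xs, ys, n_trials, n_validation, seed=0):
    """Split, estimate on the analysis part, validate on the rest; repeat."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    results = []
    for _ in range(n_trials):
        rng.shuffle(indices)
        val, ana = indices[:n_validation], indices[n_validation:]
        model = fit_midpoint_rule([xs[i] for i in ana], [ys[i] for i in ana])
        results.append(
            proportion_correct(model, [xs[i] for i in val], [ys[i] for i in val])
        )
    return results


# Hypothetical predictor values for two groups of 10 cases each.
xs = [52, 61, 47, 70, 58, 66, 49, 73, 55, 63,
      38, 45, 51, 33, 42, 56, 36, 48, 40, 44]
ys = [1] * 10 + [2] * 10
ratios = repeated_validation(xs, ys, n_trials=20, n_validation=6)
print(mean(ratios))  # average proportion correctly classified over the trials
```

In practice the stand-in rule would be replaced by the discriminant function estimated in a statistical package, and the splits could be stratified by group.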
To better illustrate two-group discriminant analysis, let us look at an example. Suppose we want to determine the salient characteristics of families that have visited a vacation resort during the last two years. Data were obtained from a pretest sample of 42 households. Of these, 30 households shown in Table 18.2 were included in the analysis sample and the remaining 12 shown in Table 18.3 were part of the validation sample. For illustrative purposes, we consider only a small number of observations. In actual practice, discriminant analysis is performed on a much larger sample such as that in the Dell running case and other cases with real data that are presented in this book.
The households that visited a resort during the last two years are coded as 1; those that did not, as 2 (VISIT). Both the analysis and validation samples were balanced in terms of VISIT. As can be seen, the analysis sample contains 15 households in each category, whereas the validation sample has six in each category. Data were also obtained on annual family income (INCOME), attitude toward travel (TRAVEL, measured on a 9-point scale), importance attached to family vacation (VACATION, measured on a 9-point scale), household size (HSIZE), and age of the head of the household (AGE).
Determine the Significance of the Discriminant Function
It would not be meaningful to interpret the analysis if the discriminant functions estimated were not statistically significant. The null hypothesis that, in the population, the means of all discriminant functions in all groups are equal can be statistically tested. In SPSS, this test is based on Wilks’ Λ. If several functions are tested simultaneously (as in the case of multiple discriminant analysis), the Wilks’ Λ statistic is the product of the univariate Λ for each function. The significance level is estimated based on a chi-square transformation of the statistic. In testing for significance in the vacation resort example (see Table 18.4), it may be noted that the Wilks’ Λ associated with the function is 0.3589, which transforms to a chi-square of 26.13 with 5 degrees of freedom. This is significant beyond the 0.05 level. In SAS, an approximate F statistic, based on an approximation to the distribution of the likelihood ratio, is calculated. A test of significance is not available in MINITAB. If the null hypothesis is rejected, indicating significant discrimination, one can proceed to interpret the results.
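The text does not spell out the chi-square transformation. A common choice is Bartlett's approximation, chi-square = -[n - 1 - (p + g)/2] ln(lambda) with p(g - 1) degrees of freedom, where n is the number of cases, p the number of predictors, and g the number of groups. Assuming this is the transformation used, the sketch below reproduces the reported value from the Wilks' lambda of 0.3589 with n = 30, p = 5, and g = 2. The function name is hypothetical.

```python
import math


def wilks_chi_square(wilks_lambda, n, p, g):
    """Bartlett's chi-square approximation for Wilks' lambda (assumed here to be
    the transformation behind the reported statistic)."""
    statistic = -(n - 1 - (p + g) / 2) * math.log(wilks_lambda)
    degrees_of_freedom = p * (g - 1)
    return statistic, degrees_of_freedom


# Vacation resort example: 30 analysis cases, 5 predictors, 2 groups.
chi2, df = wilks_chi_square(0.3589, n=30, p=5, g=2)
print(round(chi2, 2), df)  # 26.13 5

# Critical chi-square value for 5 degrees of freedom at the 0.05 level.
CRITICAL_VALUE_DF5_ALPHA05 = 11.070
print(chi2 > CRITICAL_VALUE_DF5_ALPHA05)  # True: reject the null hypothesis
```

Because 26.13 exceeds the critical value of 11.07, the result agrees with the statement that the function is significant beyond the 0.05 level.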
Interpret the Results
The interpretation of the discriminant weights, or coefficients, is similar to that in multiple regression analysis. The value of the coefficient for a particular predictor depends on the other predictors included in the discriminant function. The signs of the coefficients are arbitrary, but they indicate which variable values result in large and small function values and associate them with particular groups.
Some idea of the relative importance of the predictors can also be obtained by examining the structure correlations, also called canonical loadings or discriminant loadings. These simple correlations between each predictor and the discriminant function represent the variance that the predictor shares with the function. The greater the magnitude of a structure correlation, the more important the corresponding predictor. Like the standardized coefficients, these correlations must also be interpreted with caution.
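As a rough illustration, a structure correlation can be computed as the Pearson correlation between a predictor and the discriminant scores. The data and coefficients below are hypothetical placeholders, not values from the resort example. Note that statistical packages such as SPSS report pooled within-groups correlations, which first remove the group means, so this total-sample version is a simplification.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Simple (Pearson) correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical predictor values for eight households.
income = [30, 45, 50, 62, 70, 38, 55, 41]
hsize = [3, 4, 6, 5, 5, 4, 2, 5]

# Hypothetical unstandardized coefficients, used only to create scores.
CONSTANT, B_INCOME, B_HSIZE = -5.0, 0.08, 0.25
scores = [CONSTANT + B_INCOME * i + B_HSIZE * h for i, h in zip(income, hsize)]

loadings = {
    "INCOME": pearson(income, scores),
    "HSIZE": pearson(hsize, scores),
}
# Rank predictors by the magnitude of their structure correlation.
ranked = sorted(loadings, key=lambda name: abs(loadings[name]), reverse=True)
print(ranked, loadings)
```

The ranking uses the absolute value because the sign of a loading only indicates direction, not importance.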
The unstandardized discriminant function coefficients are also given. These can be applied to the raw values of the variables in the holdout set for classification purposes. The group centroids, giving the value of the discriminant function evaluated at the group means, are also shown. Group 1, those who have visited a resort, has a positive value (1.29118), whereas group 2 has an equal negative value. The signs of the coefficients associated with all the predictors are positive. This suggests that higher family income, larger household size, more importance attached to family vacation, more favorable attitude toward travel, and older heads of households are more likely to result in the family visiting the resort. It would be reasonable to develop a profile of the two groups in terms of the three predictors that seem to be the most important: income, household size, and importance of vacation. The values of these three variables for the two groups are given at the beginning of Table 18.4.
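Applying unstandardized coefficients to raw values and classifying by the nearer group centroid might look like the sketch below. The centroids are the ones reported above. The coefficients, the constant, and the two households are hypothetical placeholders, chosen only so that all coefficient signs are positive as the text describes. Nearest-centroid assignment is one simple rule, which for two equal-sized groups amounts to a cutoff of zero.

```python
# Group centroids from the text: group 1 visited a resort, group 2 did not.
CENTROIDS = {1: 1.29118, 2: -1.29118}

# Hypothetical unstandardized coefficients (placeholders, not from Table 18.4).
CONSTANT = -7.0
COEFFICIENTS = {
    "INCOME": 0.08,
    "TRAVEL": 0.10,
    "VACATION": 0.15,
    "HSIZE": 0.25,
    "AGE": 0.02,
}


def discriminant_score(household):
    """Apply the unstandardized coefficients to the raw predictor values."""
    return CONSTANT + sum(
        coef * household[name] for name, coef in COEFFICIENTS.items()
    )


def classify(score):
    """Assign the group whose centroid lies closest to the score."""
    return min(CENTROIDS, key=lambda group: abs(score - CENTROIDS[group]))


# Two hypothetical holdout households.
household_a = {"INCOME": 60, "TRAVEL": 6, "VACATION": 7, "HSIZE": 4, "AGE": 50}
household_b = {"INCOME": 35, "TRAVEL": 4, "VACATION": 3, "HSIZE": 3, "AGE": 40}

group_a = classify(discriminant_score(household_a))
group_b = classify(discriminant_score(household_b))
print(group_a, group_b)
```

Because every coefficient is positive, higher raw values push the score toward the positive centroid of group 1.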
The determination of the relative importance of the predictors is further illustrated by the following example.