Marketing researchers are often interested in examining the differences in the mean values of the dependent variable for several categories of a single independent variable or factor. For example:

• Do the various segments differ in terms of their volume of product consumption?

• Do the brand evaluations of groups exposed to different commercials vary?

• Do retailers, wholesalers, and agents differ in their attitudes toward the firm’s distribution policies?

• How do consumers’ intentions to buy the brand vary with different price levels?

• What is the effect of consumers’ familiarity with the store (measured as high, medium, and low) on preference for the store?

The answers to these and similar questions can be determined by conducting one-way analysis of variance. Before describing the procedure, we define the important statistics associated with one-way analysis of variance.

**Statistics Associated with One-Way Analysis of Variance**

**Eta² (η²).** The strength of the effects of X (independent variable or factor) on Y (dependent variable) is measured by eta squared (η²). The value of η² varies between 0 and 1.

**F statistic.** The null hypothesis that the category means are equal in the population is tested by an F statistic based on the ratio of the mean square related to X and the mean square related to error.

**Mean square.** The mean square is the sum of squares divided by the appropriate degrees of freedom.

**SSbetween.** Also denoted as SSx, this is the variation in Y related to the variation in the means of the categories of X. This represents variation between the categories of X, or the portion of the sum of squares in Y related to X.

**SSwithin.** Also referred to as SSerror, this is the variation in Y due to the variation within each of the categories of X. This variation is not accounted for by X.

**SSy.** The total variation in Y is SSy.

**Conducting One-Way Analysis of Variance**

The procedure for conducting one-way analysis of variance is described in Figure 16.2. It involves identifying the dependent and independent variables, decomposing the total variation, measuring the effects, testing significance, and interpreting the results. We consider these steps in detail and illustrate them with an application.

**Identify the Dependent and Independent Variables**

The dependent variable is denoted by Y and the independent variable by X. X is a categorical variable having c categories. There are n observations on Y for each category of X, as shown in Table 16.1. As can be seen, the sample size in each category of X is n, and the total sample size N = n × c. Although the sample sizes in the categories of X (the group sizes) are assumed to be equal for the sake of simplicity, this is not a requirement.

**Decompose the Total Variation**

In examining the differences among means, one-way analysis of variance involves the decomposition of the total variation observed in the dependent variable. This variation is measured by the sums of squares corrected for the mean (SS). Analysis of variance is so named because it examines the variability or variation in the sample (dependent variable) and, based on the variability, determines whether there is reason to believe that the population means differ.

The total variation in Y, denoted by SSy, can be decomposed into two components:

$$SS_y = SS_{between} + SS_{within}$$

where the subscripts *between* and *within* refer to the categories of X. SSbetween is the variation in Y related to the variation in the means of the categories of X. It represents variation between the categories of X. In other words, SSbetween is the portion of the sum of squares in Y related to the independent variable or factor X. For this reason, SSbetween is also denoted as SSx. SSwithin is the variation in Y related to the variation within each category of X. SSwithin is not accounted for by X; therefore, it is referred to as SSerror. The total variation in Y may be decomposed as:

$$SS_y = SS_x + SS_{error}$$

where

$$SS_y = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$$

$$SS_x = \sum_{j=1}^{c} n\,(\bar{Y}_j - \bar{Y})^2$$

$$SS_{error} = \sum_{j=1}^{c} \sum_{i=1}^{n} (Y_{ij} - \bar{Y}_j)^2$$

where Yi = individual observation, Ȳj = mean for category j, Ȳ = mean over the whole sample (grand mean), and Yij = the ith observation in the jth category.
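As a numerical check, the decomposition of total variation can be verified on a small hypothetical data set (the values below are illustrative, not taken from Table 16.1 or 16.2):

```python
import numpy as np

# Hypothetical ratings (Y) for c = 3 categories of X, n = 5 observations each
groups = [
    np.array([10, 9, 10, 8, 9]),  # category 1
    np.array([8, 8, 7, 9, 6]),    # category 2
    np.array([5, 6, 6, 4, 5]),    # category 3
]

y_all = np.concatenate(groups)
grand_mean = y_all.mean()

# Total variation in Y
ss_y = ((y_all - grand_mean) ** 2).sum()

# Between-category variation (SS_between, also denoted SS_x)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-category variation (SS_within, also denoted SS_error)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

print(ss_y, ss_between, ss_within)
# The decomposition identity holds: SS_y = SS_between + SS_within
assert np.isclose(ss_y, ss_between + ss_within)
```

Any grouped data set satisfies this identity; it is an algebraic property of the sums of squares, not a feature of the particular numbers chosen.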

The logic of decomposing the total variation in Y, SSy, into SSbetween and SSwithin in order to examine differences in group means can be intuitively understood as follows. If the variation of the variable in the population were known or estimated, one could estimate how much the sample mean should vary because of random variation alone. In analysis of variance, there are several different groups (e.g., heavy, medium, light, and nonusers). If the null hypothesis is true and all the groups have the same mean in the population, one can estimate how much the sample means should vary because of sampling (random) variation alone. If the observed variation in the sample means is more than what would be expected from sampling variation, it is reasonable to conclude that this extra variability is related to differences in group means in the population.

In analysis of variance, we estimate two measures of variation: within groups (SSwithin) and between groups (SSbetween). Within-group variation is a measure of how much the observations, Y values, within a group vary. This is used to estimate the variance within a group in the population. It is assumed that all the groups have the same variation in the population. However, because it is not known that all the groups have the same mean, we cannot calculate the variance of all the observations together. The variance for each of the groups must be calculated individually, and these are combined into an “average” or “overall” variance.

Likewise, another estimate of the variance of the Y values may be obtained by examining the variation between the means. (This process is the reverse of determining the variation in the means, given the population variances.) If the population mean is the same in all the groups, then the variation in the sample means and the sizes of the sample groups can be used to estimate the variance of Y. The reasonableness of this estimate of the Y variance depends on whether the null hypothesis is true. If the null hypothesis is true and the population means are equal, the variance estimate based on between-group variation is correct. On the other hand, if the groups have different means in the population, the variance estimate based on between-group variation will be too large. Thus, by comparing the Y variance estimates based on between-group and within-group variation, we can test the null hypothesis. Decomposition of the total variation in this manner also enables us to measure the effects of X on Y.

**Measure the Effects**

The effects of X on Y are measured by SSx. Because SSx is related to the variation in the means of the categories of X, the relative magnitude of SSx increases as the differences among the means of Y in the categories of X increase. The relative magnitude of SSx also increases as the variations in Y within the categories of X decrease. The strength of the effects of X on Y is measured as follows:

$$\eta^2 = \frac{SS_x}{SS_y} = \frac{SS_y - SS_{error}}{SS_y}$$

The value of η² varies between 0 and 1. It assumes a value of 0 when all the category means are equal, indicating that X has no effect on Y. The value of η² will be 1 when there is no variability within each category of X but there is some variability between categories. Thus, η² is a measure of the variation in Y that is explained by the independent variable X. Not only can we measure the effects of X on Y, but we can also test for their significance.
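A minimal sketch of η² and its boundary behavior, using hypothetical two-group data (the helper `eta_squared` is illustrative, not a standard library function):

```python
import numpy as np

def eta_squared(groups):
    """eta^2 = SS_x / SS_y: proportion of total variation in Y explained by X."""
    y = np.concatenate(groups)
    grand_mean = y.mean()
    ss_y = ((y - grand_mean) ** 2).sum()
    ss_x = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    return ss_x / ss_y

# Equal category means: X has no effect on Y, so eta^2 = 0
no_effect = eta_squared([np.array([4.0, 6.0]), np.array([5.0, 5.0])])

# No variability within categories, some between: eta^2 = 1
full_effect = eta_squared([np.array([3.0, 3.0]), np.array([7.0, 7.0])])

print(no_effect, full_effect)  # 0.0 1.0
```

In the first case all the variation is within-group (SSx = 0); in the second, all of it is between-group (SSx = SSy), which traces out the two endpoints of the 0-to-1 range described above.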

**Test the Significance**

In one-way analysis of variance, the interest lies in testing the null hypothesis that the category means are equal in the population. In other words,

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_c$$

Under the null hypothesis, SSx and SSerror come from the same source of variation. In such a case, the estimate of the population variance of Y can be based on either between-category variation or within-category variation. In other words, the estimate of the population variance of Y is

$$S_y^2 = \frac{SS_x}{c-1} = \text{mean square due to } X = MS_x$$

or

$$S_y^2 = \frac{SS_{error}}{N-c} = \text{mean square due to error} = MS_{error}$$

The null hypothesis may be tested by the F statistic based on the ratio between these two estimates:

$$F = \frac{SS_x/(c-1)}{SS_{error}/(N-c)} = \frac{MS_x}{MS_{error}}$$

This statistic follows the F distribution, with (c − 1) and (N − c) degrees of freedom (df). A table of the F distribution is given as Table 5 in the Statistical Appendix at the end of the book. The F distribution is a probability distribution of the ratios of sample variances. It is characterized by degrees of freedom for the numerator and degrees of freedom for the denominator.
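The mean squares and the F ratio can be sketched on hypothetical data (three categories, five observations each; the computed F would then be compared against the tabled critical value, which for df = (2, 12) at the 0.05 level is about 3.89):

```python
import numpy as np

# Hypothetical ratings for c = 3 categories of X, n = 5 observations each
groups = [
    np.array([10, 9, 10, 8, 9]),
    np.array([8, 8, 7, 9, 6]),
    np.array([5, 6, 6, 4, 5]),
]
c = len(groups)                      # number of categories of X
N = sum(len(g) for g in groups)      # total sample size

y = np.concatenate(groups)
grand_mean = y.mean()
ss_x = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_x = ss_x / (c - 1)                # mean square due to X (between categories)
ms_error = ss_error / (N - c)        # mean square due to error (within categories)
F = ms_x / ms_error                  # follows F with (c - 1, N - c) df under H0

print(F)  # ~22.52, well above ~3.89, so H0 of equal means would be rejected
```

The same computation is performed by standard statistical software; the point of the sketch is that F is nothing more than the ratio of the two variance estimates described above.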

If the null hypothesis of equal category means is not rejected, then the independent variable does not have a significant effect on the dependent variable. On the other hand, if the null hypothesis is rejected, then the effect of the independent variable is significant. In other words, the mean value of the dependent variable will be different for different categories of the independent variable. A comparison of the category mean values will indicate the nature of the effect of the independent variable. Other salient issues in the interpretation of results, such as examination of differences among specific means, are discussed later.

**Illustrative Data**

We illustrate the concepts discussed in this chapter using the data presented in Table 16.2. For illustrative purposes, we consider only a small number of observations. In actual practice, analysis of variance is performed on a much larger sample, such as that in the Dell running case and other cases with real data that are presented in this book. These data were generated by an experiment in which a major department store chain wanted to examine the effect of the level of in-store promotion and a storewide coupon on sales. In-store promotion was varied at three levels: high (1), medium (2), and low (3). Couponing was manipulated at two levels: either a $20 storewide coupon was distributed to potential shoppers (denoted by 1) or it was not (denoted by 2 in Table 16.2). In-store promotion and couponing were crossed, resulting in a 3 × 2 design with six cells. Thirty stores were randomly selected, and five stores were randomly assigned to each treatment condition, as shown in Table 16.2. The experiment was run for two months. Sales in each store were measured, normalized to account for extraneous factors (store size, traffic, etc.), and converted to a 1-to-10 scale. In addition, a qualitative assessment was made of the relative affluence of the clientele of each store, again using a 1-to-10 scale. In these scales, higher numbers denote higher sales or more affluent clientele.