**Regression Rings the Right Bell for Avon**

Avon Products, Inc. was having significant problems with the sales staff. The company’s business, dependent on sales representatives, was facing a shortage of sales reps without much hope of getting new ones. Regression models were developed to reveal the possible variables that were fueling this situation.

The models revealed that the most significant variable was the level of the appointment fee that reps pay for materials, followed by employee benefits. With data to back up its actions, the company lowered the fee.

The company also hired senior manager Michele Schneider to improve the way Avon informed new hires of their employee benefits program. Schneider revamped Avon’s benefits program information packet, which yielded an informative and easy-to-navigate “Guide to Your Personal Benefits.” These changes resulted in an improvement in the recruitment and retention of sales reps. As of 2009, Avon was the world’s largest direct seller of beauty and related products, selling products in more than 100 countries.

**Retailing Revolution**

Many retailing experts suggest that electronic shopping will be the next revolution in retailing. Whereas many traditional retailers experienced sluggish, single-digit sales growth in the 2000s, online sales records were off the charts. Although e-tailing continues to make up a very small portion of overall retail sales (less than 5 percent in 2009), the trend looks very promising for the future. A research project investigating this trend looked for correlates of consumers’ preferences for electronic shopping services. The explanation of consumers’ preferences was sought in psychographic, demographic, and communication variables suggested in the literature.

These examples illustrate some of the uses of regression analysis in determining which independent variables explain a significant variation in the dependent variable of interest, the structure and form of the relationship, the strength of the relationship, and predicted values of the dependent variable. Fundamental to regression analysis is an understanding of the product moment correlation.

**Product Moment Correlation**

In marketing research, we are often interested in summarizing the strength of association between two metric variables, as in the following situations:

• How strongly are sales related to advertising expenditures?

• Is there an association between market share and size of the sales force?

• Are consumers’ perceptions of quality related to their perceptions of prices?

In situations like these, the product moment correlation, r, is the most widely used statistic, summarizing the strength of association between two metric (interval or ratio scaled) variables, say X and Y. It is an index used to determine whether a linear, or straight-line, relationship exists between X and Y. It indicates the degree to which the variation in one variable, X, is related to the variation in another variable, Y. Because it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient. From a sample of n observations, X and Y, the product moment correlation, r, can be calculated as:

$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \, \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$

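Division of the numerator and denominator by n - 1 gives:

$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) / (n - 1)}{\sqrt{\dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1} \cdot \dfrac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n - 1}}} = \frac{COV_{xy}}{s_x s_y}$$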

In these equations, $\bar{X}$ and $\bar{Y}$ denote the sample means, and $s_x$ and $s_y$ the standard deviations. $COV_{xy}$, the covariance between X and Y, measures the extent to which X and Y are related. The covariance may be either positive or negative. Division by $s_x s_y$ achieves standardization, so that r varies between -1.0 and 1.0. Thus, correlation is a special case of covariance, and is obtained when the data are standardized. Note that the correlation coefficient is an absolute number and is not expressed in any unit of measurement. The correlation coefficient between two variables will be the same regardless of their underlying units of measurement.
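The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the book's worked example: the function name `pearson_r` and the data are hypothetical (they are not the values in Table 17.1). The second call re-expresses duration in months to show that r does not depend on the unit of measurement.

```python
import math

def pearson_r(x, y):
    """Product moment correlation computed from deviations about the means."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (NOT Table 17.1): years of residence and an attitude score.
years = [2, 4, 6, 8, 10, 12]
attitude = [3, 4, 4, 7, 8, 10]

r_years = pearson_r(years, attitude)

# Re-express duration in months: the covariance changes, r does not.
months = [12 * v for v in years]
r_months = pearson_r(months, attitude)

print(round(r_years, 4), round(r_months, 4))
```

Because the factor of 12 appears in both the numerator and the denominator, it cancels, which is the standardization property described above.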

As an example, suppose a researcher wants to explain attitudes toward a respondent’s city of residence in terms of duration of residence in the city. The attitude is measured on an 11-point scale (1 = do not like the city, 11 = very much like the city), and the duration of residence is measured in terms of the number of years the respondent has lived in the city. In addition, importance attached to the weather is also measured on an 11-point scale (1 = not important, 11 = very important). In a pretest of 12 respondents, the data shown in Table 17.1 are obtained. For illustrative purposes, we consider only a small number of observations so that we can show the calculations by hand. In actual practice, correlation and regression analyses are performed on a much larger sample such as that in the Dell running case and other cases with real data that are presented in this book. The correlation coefficient may be calculated as follows:

In conducting multivariate data analysis, it is often useful to examine the simple correlation between each pair of variables. These results are presented in the form of a correlation matrix, which indicates the coefficient of correlation between each pair of variables. Usually, only the lower triangular portion of the matrix is considered. The diagonal elements all equal 1.00, because a variable correlates perfectly with itself. The upper triangular portion of the matrix is a mirror image of the lower triangular portion, because r is a symmetric measure of association. The form of a correlation matrix for five variables, V1 through V5, is as follows:
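As a rough sketch of how such a matrix might be built and displayed, the Python below computes every pairwise correlation and prints only the lower triangle. The helper names and the three short data columns are hypothetical and serve only to illustrate the layout.

```python
import math

def pearson_r(x, y):
    """Product moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Full symmetric matrix of pairwise correlations, keyed by variable name."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical data for three variables (illustration only).
data = {
    "V1": [1, 2, 3, 4, 5, 6],
    "V2": [2, 1, 4, 3, 6, 5],
    "V3": [6, 5, 3, 4, 2, 1],
}

matrix = correlation_matrix(data)
names = list(data)

# Print the lower triangular portion; the diagonal (all 1.00) and the
# mirrored upper triangle carry no additional information.
for i, row in enumerate(names):
    cells = " ".join(f"{matrix[row][col]:6.2f}" for col in names[:i])
    print(f"{row} {cells}")
```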

Although a matrix of simple correlations provides insights into pairwise associations, sometimes researchers want to examine the association between two variables after controlling for one or more other variables. In the latter case, partial correlation should be estimated.

**Partial Correlation**

Whereas the product moment or simple correlation is a measure of association describing the linear association between two variables, a partial correlation coefficient measures the association between two variables after controlling for or adjusting for the effects of one or more additional variables. This statistic is used to answer the following questions:

• How strongly are sales related to advertising expenditures when the effect of price is controlled?

• Is there an association between market share and size of the sales force after adjusting for the effect of sales promotion?

• Are consumers’ perceptions of quality related to their perceptions of prices when the effect of brand image is controlled?

As in these situations, suppose one wanted to calculate the association between X and Y after controlling for a third variable, Z. Conceptually, one would first remove the effect of Z from X. To do this, one would predict the values of X based on a knowledge of Z by using the product moment correlation between X and Z. The predicted value of X is then subtracted from the actual value of X to construct an adjusted value of X. In a similar manner, the values of Y are adjusted to remove the effects of Z. The product moment correlation between the adjusted values of X and the adjusted values of Y is the partial correlation coefficient between X and Y, after controlling for the effect of Z, and is denoted by $r_{xy.z}$. Statistically, because the simple correlation between two variables completely describes the linear relationship between them, the partial correlation coefficient can be calculated by a knowledge of the simple correlations alone, without using individual observations:

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$
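The two routes described above, adjusting X and Y for Z and then correlating the adjusted values, or working from the simple correlations alone, can be compared in a short Python sketch. The function names and the data are hypothetical. The adjustment uses a least-squares straight line, which is one way to carry out the prediction of X (or Y) from Z.

```python
import math

def pearson_r(x, y):
    """Product moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(v, z):
    """Adjusted values of v: actual v minus v predicted from a straight-line fit on z."""
    mv, mz = sum(v) / len(v), sum(z) / len(z)
    slope = sum((a - mz) * (b - mv) for a, b in zip(z, v)) / sum((a - mz) ** 2 for a in z)
    return [b - (mv + slope * (a - mz)) for a, b in zip(z, v)]

def partial_r_from_residuals(x, y, z):
    """Correlate X adjusted for Z with Y adjusted for Z."""
    return pearson_r(residuals(x, z), residuals(y, z))

def partial_r_from_simple(r_xy, r_xz, r_yz):
    """First-order partial correlation from the three simple correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical observations (illustration only).
x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [2, 3, 6, 5, 9, 8, 12, 13]
z = [5, 1, 4, 2, 6, 3, 7, 4]

via_residuals = partial_r_from_residuals(x, y, z)
via_simple = partial_r_from_simple(pearson_r(x, y), pearson_r(x, z), pearson_r(y, z))
print(round(via_residuals, 4), round(via_simple, 4))
```

The two printed values should agree up to floating-point error, which illustrates why individual observations are not needed once the simple correlations are known.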

To continue our example, suppose the researcher wanted to calculate the association between attitude toward the city, Y, and duration of residence, $X_1$, after controlling for a third variable, importance attached to weather, $X_2$. These data are presented in Table 17.1. The simple correlations between the variables are:

The required partial correlation is calculated as follows:

As can be seen, controlling for the effect of importance attached to weather has little effect on the association between attitude toward the city and duration of residence. Thus, regardless of the importance they attach to weather, those who have stayed in a city longer have more favorable attitudes toward the city and vice versa.

Partial correlations have an order associated with them. The order indicates how many variables are being adjusted or controlled. The simple correlation coefficient, r, has a zero order, as it does not control for any additional variables when measuring the association between two variables. The coefficient $r_{xy.z}$ is a first-order partial correlation coefficient, as it controls for the effect of one additional variable, Z. A second-order partial correlation coefficient controls for the effects of two variables, a third-order for the effects of three variables, and so on. The higher-order partial correlations are calculated similarly. The (n + 1)th-order partial coefficient may be calculated by replacing the simple correlation coefficients on the right side of the preceding equation with the nth-order partial coefficients.
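The recursive rule in the last sentence can be sketched as follows. This is an illustrative implementation under stated assumptions: the function `partial_r`, the use of a dictionary keyed by variable pairs, and the simple correlations among Y, X1, X2, and X3 are all hypothetical.

```python
import math

def partial_r(simple, i, j, controls=()):
    """Partial correlation between i and j, controlling for the listed variables.

    `simple` maps frozenset({a, b}) to the zero-order correlation between a and b.
    Each level of recursion removes one control variable, so an (n + 1)th-order
    coefficient is built from three nth-order coefficients.
    """
    controls = tuple(controls)
    if not controls:
        return simple[frozenset((i, j))]  # zero order: the simple correlation
    *rest, k = controls
    r_ij = partial_r(simple, i, j, rest)
    r_ik = partial_r(simple, i, k, rest)
    r_jk = partial_r(simple, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Hypothetical simple correlations (illustration only).
simple = {
    frozenset(("Y", "X1")): 0.6,
    frozenset(("Y", "X2")): 0.5,
    frozenset(("Y", "X3")): 0.4,
    frozenset(("X1", "X2")): 0.3,
    frozenset(("X1", "X3")): 0.2,
    frozenset(("X2", "X3")): 0.1,
}

first_order = partial_r(simple, "Y", "X1", ["X2"])
second_order = partial_r(simple, "Y", "X1", ["X2", "X3"])
print(round(first_order, 4), round(second_order, 4))
```

The order in which the control variables are listed should not change the result, which is a convenient sanity check on any implementation of this recursion.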

Partial correlations can be helpful for detecting spurious relationships. The relationship between X and Y is spurious if it is solely due to the fact that X is associated with Z, which is indeed the true predictor of Y. In this case, the correlation between X and Y disappears when the effect of Z is controlled. Consider a case in which consumption of a cereal brand (C) is positively associated with income (I), with $r_{CI}$ = 0.28. Because this brand was popularly priced, income was not expected to be a significant factor. Therefore, the researcher suspected that this relationship was spurious. The sample results also indicated that income is positively associated with household size (H), $r_{HI}$ = 0.48, and that household size is associated with cereal consumption, $r_{CH}$ = 0.56. These figures seem to indicate that the real predictor of cereal consumption is not income but household size. To test this assertion, the first-order partial correlation between cereal consumption and income is calculated, controlling for the effect of household size. The reader can verify that this partial correlation, $r_{CI.H}$, is 0.02, and the initial correlation between cereal consumption and income vanishes when the household size is controlled. Therefore, the correlation between income and cereal consumption is spurious. The special case when a partial correlation is larger than its respective zero-order correlation involves a suppressor effect.
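The verification left to the reader takes only a few lines. The sketch below plugs the three simple correlations quoted above into the standard first-order partial correlation formula; the variable names are chosen here for illustration.

```python
import math

# Simple correlations quoted in the text.
r_ci = 0.28  # cereal consumption and income
r_hi = 0.48  # household size and income
r_ch = 0.56  # cereal consumption and household size

# First-order partial correlation between C and I, controlling for H.
r_ci_h = (r_ci - r_ch * r_hi) / math.sqrt((1 - r_ch ** 2) * (1 - r_hi ** 2))

print(round(r_ci_h, 2))  # 0.02
```

The numerator is 0.28 - (0.56)(0.48) = 0.0112, so almost all of the observed association between income and cereal consumption is accounted for by household size.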

The part correlation between attitude toward the city, Y, and the duration of residence, $X_1$, when the linear effects of the importance attached to weather, $X_2$, have been removed from the duration of residence, can be calculated as:

$$r_{y(x_1.x_2)} = \frac{r_{yx_1} - r_{yx_2}\, r_{x_1x_2}}{\sqrt{1 - r_{x_1x_2}^2}}$$
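To make the distinction from the partial correlation concrete, the sketch below adjusts only the duration variable for weather and leaves attitude unadjusted, then compares the result with the formula based on simple correlations. The helper names and data are hypothetical and are not the Table 17.1 values.

```python
import math

def pearson_r(x, y):
    """Product moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(v, z):
    """Values of v after removing the linear effect of z (least-squares line)."""
    mv, mz = sum(v) / len(v), sum(z) / len(z)
    slope = sum((a - mz) * (b - mv) for a, b in zip(z, v)) / sum((a - mz) ** 2 for a in z)
    return [b - (mv + slope * (a - mz)) for a, b in zip(z, v)]

# Hypothetical data (illustration only).
attitude = [2, 3, 6, 5, 9, 8, 12, 13]   # Y
duration = [1, 3, 4, 6, 8, 9, 11, 14]   # X1
weather = [5, 1, 4, 2, 6, 3, 7, 4]      # X2

r_yx1 = pearson_r(attitude, duration)
r_yx2 = pearson_r(attitude, weather)
r_x1x2 = pearson_r(duration, weather)

# Part correlation: weather is removed from duration only, not from attitude.
part_from_residuals = pearson_r(attitude, residuals(duration, weather))
part_from_simple = (r_yx1 - r_yx2 * r_x1x2) / math.sqrt(1 - r_x1x2 ** 2)

# Partial correlation for comparison: weather is removed from both variables.
partial = (r_yx1 - r_yx2 * r_x1x2) / math.sqrt((1 - r_yx2 ** 2) * (1 - r_x1x2 ** 2))

print(round(part_from_simple, 4), round(partial, 4))
```

Since the part correlation equals the partial correlation multiplied by the square root of (1 - r²) for Y and the controlled variable, its absolute value can never exceed that of the partial correlation.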