Although answers to questions related to a single variable are interesting, they often raise additional questions about how to link that variable to other variables. To introduce the frequency distribution, we posed several representative marketing research questions. For each of these, a researcher might pose additional questions to relate these variables to other variables. For example:
• How many brand-loyal users are males?
• Is product use (measured in terms of heavy users, medium users, light users, and nonusers) related to interest in outdoor activities (high, medium, and low)?
• Is familiarity with a new product related to age and education levels?
• Is product ownership related to income (high, medium, and low)?
The answers to such questions can be determined by examining cross-tabulations. Whereas a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously. A cross-tabulation is the merging of the frequency distribution of two or more variables in a single table. It helps us to understand how one variable such as brand loyalty relates to another variable such as sex. Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values. The categories of one variable are cross-classified with the categories of one or more other variables.
Thus, the frequency distribution of one variable is subdivided according to the values or categories of the other variables. Suppose we are interested in determining whether Internet usage is related to sex. For the purpose of cross-tabulation, respondents are classified as light or heavy users. Those reporting 5 hours or less usage are classified as light users, and the remaining are heavy users. The cross-tabulation is shown in Table 15.3. A cross-tabulation includes a cell for every combination of the categories of the two variables. The number in each cell shows how many respondents gave that combination of responses. In Table 15.3, 10 respondents were females who reported light Internet usage. The marginal totals in this table indicate that of the 30 respondents with valid responses on both the variables, 15 reported light usage and 15 were heavy users. In terms of sex, 15 respondents were females and 15 were males. Note that this information could have been obtained from a separate frequency distribution for each variable. In general, the margins of a cross-tabulation show the same information as the frequency tables for each of the variables. Cross-tabulation tables are also called contingency tables. The data are considered to be qualitative or categorical data, because each variable is assumed to have only a nominal scale.
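The construction of such a table can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce Table 15.3: the respondent-level records are rebuilt from the counts quoted above (10 light-usage females, with margins of 15 in every category, which fixes the other three cells), and the variable names are assumptions made for the example.

```python
from collections import Counter

# Records rebuilt from the counts quoted in the text; the three cells not
# stated explicitly follow from the marginal totals (15 per category).
records = (
    [("male", "light")] * 5
    + [("male", "heavy")] * 10
    + [("female", "light")] * 10
    + [("female", "heavy")] * 5
)

cells = Counter(records)                               # joint distribution
sex_totals = Counter(sex for sex, _ in records)        # column margins
usage_totals = Counter(usage for _, usage in records)  # row margins

# Print the cross-tabulation with its marginal totals.
print(f"{'usage':<8}{'male':>6}{'female':>8}{'total':>7}")
for usage in ("light", "heavy"):
    print(
        f"{usage:<8}{cells[('male', usage)]:>6}"
        f"{cells[('female', usage)]:>8}{usage_totals[usage]:>7}"
    )
print(f"{'total':<8}{sex_totals['male']:>6}{sex_totals['female']:>8}{len(records):>7}")
```

The margins (sex_totals and usage_totals) are exactly the one-variable frequency distributions, which is why they could have been obtained without the cross-tabulation.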
Cross-tabulation is widely used in commercial marketing research, because (1) cross-tabulation analysis and results can be easily interpreted and understood by managers who are not statistically oriented; (2) the clarity of interpretation provides a stronger link between research results and managerial action; (3) a series of cross-tabulations may provide greater insights into a complex phenomenon than a single multivariate analysis; (4) cross-tabulation may alleviate the problem of sparse cells, which could be serious in discrete multivariate analysis; and (5) cross-tabulation analysis is simple to conduct and appealing to less sophisticated researchers.
Cross-tabulation with two variables is also known as bivariate cross-tabulation. Consider again the cross-classification of Internet usage with sex given in Table 15.3. Is usage related to sex? It appears to be from Table 15.3. We see that disproportionately more of the respondents who are males are heavy Internet users as compared to females. Computation of percentages can provide more insights.
Because two variables have been cross-classified, percentages could be computed either column-wise, based on column totals (Table 15.4), or row-wise, based on row totals (Table 15.5). Which of these tables is more useful? The answer depends on which variable will be considered as the independent variable and which as the dependent variable. The general rule is to compute the percentages in the direction of the independent variable, across the dependent variable. In our analysis, sex may be considered as the independent variable and Internet usage as the dependent variable, and the correct way of calculating percentages is as shown in Table 15.4. Note that whereas 66.7 percent of the males are heavy users, only 33.3 percent of females fall into this category. This seems to indicate that males are more likely to be heavy users of the Internet as compared to females.
Note that computing percentages in the direction of the dependent variable across the independent variable, as shown in Table 15.5, is not meaningful in this case. Table 15.5 implies that heavy Internet usage causes people to be males. This latter finding is implausible. It is possible, however, that the association between Internet usage and sex is mediated by a third variable, such as age or income. This kind of possibility points to the need to examine the effect of a third variable.
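The two ways of percentaging can be compared with a short Python sketch. It assumes the cell counts implied by the figures quoted for Table 15.3; the helper percentages_within is a hypothetical name introduced for this example.

```python
# Cell counts implied by the text: (sex, usage) -> number of respondents.
counts = {
    ("male", "light"): 5, ("male", "heavy"): 10,
    ("female", "light"): 10, ("female", "heavy"): 5,
}

def percentages_within(counts, by):
    """Express each cell as a percentage of the total of its `by` category.

    by=0 uses sex totals as the base (direction of the independent variable);
    by=1 uses usage totals as the base (direction of the dependent variable).
    """
    totals = {}
    for key, n in counts.items():
        totals[key[by]] = totals.get(key[by], 0) + n
    return {key: round(100 * n / totals[key[by]], 1) for key, n in counts.items()}

by_sex = percentages_within(counts, by=0)
by_usage = percentages_within(counts, by=1)

# Share of each sex that is a heavy user: 66.7 for males, 33.3 for females.
print(by_sex[("male", "heavy")], by_sex[("female", "heavy")])
# Share of heavy users that is male or female (base: heavy users).
print(by_usage[("male", "heavy")], by_usage[("female", "heavy")])
```

Because every margin in this table equals 15, the two calculations happen to return the same numbers. What changes is the base, and therefore the statement being made: "66.7 percent of males are heavy users" versus "66.7 percent of heavy users are males."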
Often the introduction of a third variable clarifies the initial association (or lack of it) observed between two variables. As shown in Figure 15.7, the introduction of a third variable can result in four possibilities.
1. It can refine the association observed between the two original variables.
2. It can indicate no association between the two variables, although an association was initially observed. In other words, the third variable indicates that the initial association between the two variables was spurious.
3. It can reveal some association between the two variables, although no association was initially observed. In this case, the third variable reveals a suppressed association between the first two variables: a suppressor effect.
4. It can indicate no change in the initial association.
These cases are explained with examples based on a sample of 1,000 respondents. Although these examples are contrived to illustrate specific cases, such cases are not uncommon in commercial marketing research.
REFINE AN INITIAL RELATIONSHIP An examination of the relationship between the purchase of fashion clothing and marital status resulted in the data reported in Table 15.6. The respondents were classified into either high or low categories based on their purchase of fashion clothing. Marital status was also measured in terms of two categories: currently married or unmarried. As can be seen from Table 15.6, 52 percent of unmarried respondents fell in the high-purchase category, as opposed to 31 percent of the married respondents. Before concluding that unmarried respondents purchase more fashion clothing than those who are married, a third variable, the buyer’s sex, was introduced into the analysis.
The buyer’s sex was selected as the third variable based on past research. The relationship between purchase of fashion clothing and marital status was reexamined in light of the third variable, as shown in Table 15.7. In the case of females, 60 percent of the unmarried fall in the high-purchase category, as compared to 25 percent of those who are married. On the other hand, the percentages are much closer for males, with 40 percent of the unmarried and 35 percent of the married falling in the high-purchase category. Hence, the introduction of sex (third variable) has refined the relationship between marital status and purchase of fashion clothing (original variables). Unmarried respondents are more likely to fall in the high-purchase category than married ones, and this effect is much more pronounced for females than for males.
INITIAL RELATIONSHIP WAS SPURIOUS A researcher working for an advertising agency promoting a line of automobiles costing more than $30,000 was attempting to explain the ownership of expensive automobiles (see Table 15.8). The table shows that 32 percent of those with college degrees own an expensive automobile, as compared to 21 percent of those without college degrees. The researcher was tempted to conclude that education influenced ownership of expensive automobiles. Realizing that income may also be a factor, the researcher decided to reexamine the relationship between education and ownership of expensive automobiles in light of income level. This resulted in Table 15.9. Note that the percentages of those with and without college degrees who own expensive automobiles are the same for each of the income groups.
When the data for the high-income and low-income groups are examined separately, the association between education and ownership of expensive automobiles disappears, indicating that the initial relationship observed between these two variables was spurious.
REVEAL SUPPRESSED ASSOCIATION A researcher suspected desire to travel abroad may be influenced by age. However, a cross-tabulation of the two variables produced the results in Table 15.10, indicating no association. When sex was introduced as the third variable, Table 15.11 was obtained. Among men, 60 percent of those under 45 indicated a desire to travel abroad, as compared to 40 percent of those 45 or older. The pattern was reversed for women, where 35 percent of those under 45 indicated a desire to travel abroad, as opposed to 65 percent of those 45 or older. Because the association between desire to travel abroad and age runs in the opposite direction for males and females, the relationship between these two variables is masked when the data are aggregated across sex, as in Table 15.10. But when the effect of sex is controlled, as in Table 15.11, the suppressed association between desire to travel abroad and age is revealed for the separate categories of males and females.
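How opposite within-group patterns can cancel in the aggregate can be checked numerically. The sketch below uses the percentages quoted above, but the group sizes are an assumption: the text gives only the total of 1,000 respondents, so 300 men and 200 women per age group are hypothetical values chosen because they make the aggregated percentages coincide. The function name percent_yes_by_age is likewise illustrative.

```python
# Hypothetical group sizes (not reported in the text): 300 men and 200 women
# in each age group, 1,000 in total. Percentages are those quoted in the text.
groups = {
    # (sex, age): (group size, percent wanting to travel abroad)
    ("male", "under 45"): (300, 60),
    ("male", "45 or older"): (300, 40),
    ("female", "under 45"): (200, 35),
    ("female", "45 or older"): (200, 65),
}

def percent_yes_by_age(groups, sex=None):
    """Percent wanting to travel abroad per age group.

    With sex=None the data are aggregated across sex; otherwise only the
    named sex is included, i.e. the third variable is controlled.
    """
    yes, total = {}, {}
    for (s, age), (n, pct) in groups.items():
        if sex is not None and s != sex:
            continue
        yes[age] = yes.get(age, 0) + n * pct / 100
        total[age] = total.get(age, 0) + n
    return {age: round(100 * yes[age] / total[age], 1) for age in total}

print("all:  ", percent_yes_by_age(groups))            # 50.0 in both age groups
print("men:  ", percent_yes_by_age(groups, "male"))    # 60.0 vs 40.0
print("women:", percent_yes_by_age(groups, "female"))  # 35.0 vs 65.0
```

Under these assumed sizes, each age group contains 250 respondents who want to travel abroad out of 500, so the two-variable table shows no association even though age matters within each sex.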
NO CHANGE IN INITIAL RELATIONSHIP In some cases, the introduction of the third variable does not change the initial relationship observed, regardless of whether the original variables were associated. This suggests that the third variable does not influence the relationship between the first two. Consider the cross-tabulation of family size and the tendency to eat out frequently in fast-food restaurants, as shown in Table 15.12. The respondents were classified into small and large family size categories based on a median split of the distribution, with 500 respondents in each category. No association is observed. The respondents were further classified into high- or low-income groups based on a median split. When income was introduced as a third variable in the analysis, Table 15.13 was obtained. Again, no association was observed.
General Comments on Cross-Tabulation
More than three variables can be cross-tabulated, but the interpretation is quite complex. Also, because the number of cells increases multiplicatively, maintaining an adequate number of respondents or cases in each cell can be problematic. As a general rule, there should be at least five expected observations in each cell for the statistics computed to be reliable. Thus, cross-tabulation is an inefficient way of examining relationships when there are several variables. Note that cross-tabulation examines association between variables, not causation. To examine causation, the causal research design framework should be adopted.
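The five-per-cell guideline refers to expected counts, which under the assumption of no association are conventionally computed as (row total × column total) / n. The sketch below applies that formula to the Internet usage by sex counts quoted earlier in the chapter. The function name expected_counts is hypothetical, and the check is an illustration of the rule of thumb, not a full statistical test.

```python
# Observed counts implied by the Internet usage example: (sex, usage) -> n.
observed = {
    ("male", "light"): 5, ("male", "heavy"): 10,
    ("female", "light"): 10, ("female", "heavy"): 5,
}

def expected_counts(observed):
    """Expected cell counts under no association.

    Each expected count is (total of first category) * (total of second
    category) / n, computed from the margins of the observed table.
    """
    n = sum(observed.values())
    first, second = {}, {}
    for (a, b), count in observed.items():
        first[a] = first.get(a, 0) + count
        second[b] = second.get(b, 0) + count
    return {(a, b): first[a] * second[b] / n for a in first for b in second}

expected = expected_counts(observed)
sparse = [cell for cell, e in expected.items() if e < 5]

print(expected)                   # every cell: 15 * 15 / 30 = 7.5
print("cells below 5:", sparse)   # empty list, so the guideline is met
```

With only 30 respondents the 2 × 2 table passes, but splitting the same sample by a third two-category variable would leave an average of fewer than four respondents per cell, which is the multiplicative growth problem described above.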