As of 2009, Princess Cruises (www.themarketingresearch.com), part of Carnival Corporation, annually carried more than a million passengers. Princess wished to know what passengers thought of the cruise experience, but wanted to determine this information in a cost-effective way. A scannable questionnaire was developed that allowed the cruise line to quickly transcribe the data from thousands of surveys, thus expediting data preparation and analysis. This questionnaire is distributed to measure customer satisfaction on all voyages.
In addition to saving time compared to keypunching, scanning also improved the accuracy of the survey results. The senior market researcher for Princess Cruises, Jaime Goldfarb, commented, “When we compared the data files from the two methods, we found that although the scanner occasionally missed marks because they had been filled in improperly, the scanned data file was more accurate than the keypunched file.”
A monthly report by cruise destination and ship is produced. This report identifies any specific problems that have been noticed, and steps are taken to make sure these problems are addressed. Recently, these surveys have led to changes in the menu and the various buffets located around the ship.
Data cleaning includes consistency checks and treatment of missing responses. Although preliminary consistency checks have been made during editing, the checks at this stage are more thorough and extensive, because they are made by computer.
Consistency checks identify data that are out of range, logically inconsistent, or have extreme values. Out-of-range data values are inadmissible and must be corrected. For example, suppose respondents have been asked to express their degree of agreement with a series of lifestyle statements on a 1-to-5 scale. Assuming that 9 has been designated for missing values, data values of 0, 6, 7, and 8 are out of range. Computer packages like SPSS, SAS, EXCEL, and MINITAB can be programmed to identify out-of-range values for each variable and print out the respondent code, variable code, variable name, record number, column number, and out-of-range value. This makes it easy to check each variable systematically for out-of-range values. The correct responses can be determined by going back to the edited and coded questionnaire.
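The same kind of out-of-range check can be sketched in a few lines of Python, as a rough stand-in for what a statistical package would be programmed to do. The respondent codes, variable names (V1 to V3), and values below are hypothetical; the valid codes follow the 1-to-5 scale with 9 reserved for missing values, as in the example above.

```python
# Hypothetical coding scheme: 1-to-5 agreement scale, 9 reserved for "missing".
VALID_CODES = {1, 2, 3, 4, 5}
MISSING_CODE = 9


def find_out_of_range(records, variables):
    """Return (respondent code, variable, value) for every inadmissible value."""
    problems = []
    for respondent, answers in records.items():
        for var in variables:
            value = answers[var]
            # A missing-value code is admissible; anything else off the scale is not.
            if value != MISSING_CODE and value not in VALID_CODES:
                problems.append((respondent, var, value))
    return problems


# Hypothetical records keyed by respondent code.
records = {
    101: {"V1": 4, "V2": 9, "V3": 2},
    102: {"V1": 6, "V2": 3, "V3": 0},
    103: {"V1": 5, "V2": 8, "V3": 1},
}

for respondent, var, value in find_out_of_range(records, ["V1", "V2", "V3"]):
    print(f"respondent {respondent}, {var}: out-of-range value {value}")
```

Respondent 101's code of 9 is treated as a legitimate missing value, while the 6, 0, and 8 are reported so they can be checked against the coded questionnaires.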
Responses can be logically inconsistent in various ways. For example, a respondent may indicate that she charges long-distance calls to a calling card, although she does not have one. Or a respondent may report both unfamiliarity with, and frequent usage of, the same product. The necessary information (respondent code, variable code, variable name, record number, column number, and inconsistent values) can be printed to locate these responses and take corrective action.
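Logical-consistency checks can be expressed as rules that no consistent respondent could satisfy. The sketch below encodes the two examples from the paragraph above; the variable names and codes (0/1 for card ownership and charging, 1 = not at all familiar, 5 = very frequent usage) are hypothetical choices made only for illustration.

```python
# Hypothetical coded responses; variable names and codes are illustrative only.
responses = [
    {"id": 201, "has_card": 0, "charges_to_card": 1, "familiarity": 4, "usage": 2},
    {"id": 202, "has_card": 1, "charges_to_card": 1, "familiarity": 1, "usage": 5},
    {"id": 203, "has_card": 1, "charges_to_card": 0, "familiarity": 3, "usage": 3},
]

# Each rule describes a combination of answers that should never occur together.
RULES = {
    "charges calls to a card but has no card":
        lambda r: r["has_card"] == 0 and r["charges_to_card"] == 1,
    "unfamiliar with product but uses it frequently":
        lambda r: r["familiarity"] == 1 and r["usage"] == 5,
}


def find_inconsistencies(rows, rules):
    """Return (respondent id, rule label) for every rule a respondent violates."""
    return [
        (row["id"], label)
        for row in rows
        for label, violates in rules.items()
        if violates(row)
    ]


for respondent, label in find_inconsistencies(responses, RULES):
    print(f"respondent {respondent}: {label}")
```

Keeping the rules in one table makes it easy to add further checks as the questionnaire's skip patterns and logical dependencies are reviewed.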
Finally, extreme values should be closely examined. Not all extreme values result from errors, but they may point to problems with the data. For example, an extremely low evaluation of a brand may be the result of the respondent indiscriminately circling 1s (on a 1-to-7 rating scale) on all attributes of this brand.
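One simple screen for the pattern just described is to flag respondents who gave the same extreme rating to every attribute. This is a minimal sketch assuming a 1-to-7 scale and hypothetical respondent codes; flagged cases are candidates for closer examination, not automatic deletion, since some respondents may genuinely hold extreme views.

```python
def flag_straightliners(ratings, extremes=(1, 7)):
    """Return ids of respondents who gave one extreme rating to every attribute."""
    flagged = []
    for respondent, scores in ratings.items():
        # All answers identical, and that single answer is an end of the scale.
        if len(set(scores)) == 1 and scores[0] in extremes:
            flagged.append(respondent)
    return flagged


# Hypothetical brand ratings on six attributes (1-to-7 scale).
ratings = {
    301: [1, 1, 1, 1, 1, 1],
    302: [2, 1, 3, 2, 1, 2],
    303: [7, 7, 7, 7, 7, 7],
    304: [4, 4, 4, 4, 4, 4],
}

print(flag_straightliners(ratings))  # [301, 303]
```

Respondent 304 also answered uniformly, but at the scale midpoint, so this particular screen does not treat the pattern as an extreme value.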
Treatment of Missing Responses
Missing responses represent values of a variable that are unknown, either because respondents provided ambiguous answers or because their answers were not properly recorded. The former cause, also known as item nonresponse, occurs because the respondent refuses, or is unable, to answer specific questions or items because of their content, form, or the effort required. Treatment of missing responses poses problems, particularly if the proportion of missing responses is more than 10 percent. The following options are available for the treatment of missing responses.
- Substitute a Neutral Value. A neutral value, typically the mean response to the variable, is substituted for the missing responses. Thus, the mean of the variable remains unchanged and other statistics, such as correlations, are not affected much. Although this approach has some merit, the logic of substituting a mean value (say 4) for respondents who, if they had answered, might have used either high ratings (6 or 7) or low ratings (1 or 2) is questionable.
- Substitute an Imputed Response. The respondent's pattern of responses to other questions is used to impute or calculate a suitable response to the missing questions. The researcher attempts to infer from the available data the responses the individuals would have given if they had answered the questions. This can be done statistically by determining the relationship of the variable in question to other variables, based on the available data. For example, product usage could be related to household size for respondents who have provided data on both variables. The missing product usage response for a respondent could then be calculated, given that respondent's household size. However, this approach requires considerable effort and can introduce serious bias. Sophisticated statistical procedures have been developed to calculate imputed values for missing responses.
Imputation Increases Integrity
A project was undertaken to assess the willingness of households to implement the recommendations of an energy audit (dependent variable), given the financial implications. The independent variables consisted of five financial factors that were manipulated at known levels, and their values were always known by virtue of the design adopted. However, several values of the dependent variable were missing. These missing values were replaced with imputed values. The imputed values were statistically calculated, given the corresponding values of the independent variables. The treatment of missing responses in this manner greatly increased the simplicity and validity of subsequent analysis.
- Casewise Deletion. In casewise deletion, cases, or respondents, with any missing responses are discarded from the analysis. Because many respondents may have some missing responses, this approach could result in a small sample. Throwing away large amounts of data is undesirable, because it is costly and time-consuming to collect data. Furthermore, respondents with missing responses could differ from respondents with complete responses in systematic ways. If so, casewise deletion could seriously bias the results.
- Pairwise Deletion. In pairwise deletion, instead of discarding all cases with any missing values, the researcher uses only the cases or respondents with complete responses for the variable(s) involved in each calculation. As a result, different calculations in an analysis may be based on different sample sizes. This procedure may be appropriate when (1) the sample size is large, (2) there are few missing responses, and (3) the variables are not highly related. Yet this procedure can produce results that are unappealing or even infeasible.
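The difference between casewise and pairwise deletion shows up most clearly in the sample sizes each one leaves for analysis. The sketch below uses a small hypothetical dataset on three variables (A, B, C), with `None` marking a missing response, and counts the usable respondents under each approach.

```python
from itertools import combinations

# Hypothetical responses on three variables; None marks a missing response.
data = [
    {"A": 5, "B": 4, "C": 3},
    {"A": 3, "B": None, "C": 2},
    {"A": 4, "B": 4, "C": None},
    {"A": 2, "B": 1, "C": 1},
    {"A": None, "B": 2, "C": 2},
    {"A": 1, "B": 2, "C": 1},
    {"A": 4, "B": None, "C": 3},
]
VARIABLES = ["A", "B", "C"]


def casewise_deletion(rows, variables):
    """Keep only respondents who answered every variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def pairwise_sample_sizes(rows, variables):
    """For each pair of variables, count respondents who answered both."""
    return {
        (x, y): sum(1 for r in rows if r[x] is not None and r[y] is not None)
        for x, y in combinations(variables, 2)
    }


print("casewise n:", len(casewise_deletion(data, VARIABLES)))
for pair, n in pairwise_sample_sizes(data, VARIABLES).items():
    print("pairwise n for", pair, "=", n)
```

Casewise deletion keeps only 3 of the 7 respondents, whereas pairwise deletion would base the A-B and B-C calculations on 4 respondents and the A-C calculation on 5. Statistics computed from such differing subsets need not be mutually consistent, which is one way pairwise deletion can produce unappealing or infeasible results.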
The different procedures for the treatment of missing responses may yield different results, particularly when the responses are not missing at random and the variables are related. Hence, missing responses should be kept to a minimum. The researcher should carefully consider the implications of the various procedures before selecting a particular method for the treatment of nonresponse. It is a good practice to use more than one method of treating missing responses and examine the impact of the different methods on the results.
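To see how much the choice of method can matter, the sketch below applies three of the treatments to one small dataset modeled loosely on the product-usage and household-size example. The numbers are hypothetical, and usage is deliberately missing for the two largest households, so the responses are not missing at random. The imputation shown is a simple least-squares regression of usage on household size. It is only one possible imputation method, not one of the more sophisticated procedures mentioned earlier.

```python
from statistics import mean

# Hypothetical survey: household size is always known, product usage is
# missing for the two largest households (i.e., not missing at random).
household_size = [1, 2, 3, 4, 5, 6]
usage = [2, 5, 6, 7, None, None]


def observed_pairs(xs, ys):
    """(x, y) pairs for respondents who answered the usage question."""
    return [(x, y) for x, y in zip(xs, ys) if y is not None]


def casewise_mean(xs, ys):
    """Mean usage after discarding respondents with missing usage."""
    return mean(y for _, y in observed_pairs(xs, ys))


def mean_substitution(xs, ys):
    """Replace each missing value with the mean of the observed values."""
    m = casewise_mean(xs, ys)
    return [m if y is None else y for y in ys]


def regression_imputation(xs, ys):
    """Predict missing usage from household size using a least-squares line
    fitted to respondents who provided data on both variables."""
    pairs = observed_pairs(xs, ys)
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x, _ in pairs))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]


results = {
    "casewise deletion": casewise_mean(household_size, usage),
    "mean substitution": mean(mean_substitution(household_size, usage)),
    "regression imputation": mean(regression_imputation(household_size, usage)),
}
for method, estimate in results.items():
    print(f"{method:>22}: mean usage = {estimate:.2f}")
```

Casewise deletion and mean substitution both estimate mean usage at 5.00, while regression imputation gives 6.60, because the respondents with missing usage come from the largest households. A gap of this kind is exactly what comparing several treatments is meant to reveal. Note also that single imputation of this sort treats predicted values as if they were observed, which understates the true uncertainty.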