The Data-Preparation Process

The data-preparation process is shown in Figure 14.1. The entire process is guided by the preliminary plan of data analysis that was formulated in the research design phase. The first step is to check for acceptable questionnaires. This is followed by editing, coding, and transcribing the data. The data are cleaned and a treatment for missing responses is prescribed. Often, statistical adjustment of the data may be necessary to make them representative of the population of interest. The researcher should then select an appropriate data analysis strategy. The final data analysis strategy differs from the preliminary plan of data analysis because of the information and insights gained since the preliminary plan was formulated. Data preparation should begin as soon as the first batch of questionnaires is received from the field, while the fieldwork is still going on. Thus, if any problems are detected, the fieldwork can be modified to incorporate corrective action.

Questionnaire Checking

The initial step in questionnaire checking involves a check of all questionnaires for completeness and interviewing quality. Often these checks are made while fieldwork is still underway. If the fieldwork was contracted to a data-collection agency, the researcher should make an independent check after it is over. A questionnaire returned from the field may be unacceptable for several reasons:



1. Parts of the questionnaire may be incomplete.

2. The pattern of responses may indicate that the respondent did not understand or follow the instructions. For example, skip patterns may not have been followed.

3. The responses show little variance. For example, a respondent has checked only 4s on a series of 7-point rating scales.

4. The returned questionnaire is physically incomplete: one or more pages are missing.

5. The questionnaire is received after the pre-established cutoff date.

6. The questionnaire is answered by someone who does not qualify for participation.

If quotas or cell group sizes have been imposed, the acceptable questionnaires should be classified and counted accordingly. Any problems in meeting the sampling requirements should be identified and corrective action taken, such as conducting additional interviews in the underrepresented cells, before the data are edited.
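Several of the acceptability checks listed above can be automated once questionnaires are keyed in. The following is a minimal sketch, not a prescribed procedure: the record layout (`answers`, `received`, `qualified`), the item names, and the dates are all hypothetical, and only reasons 1, 3, 5, and 6 are covered.

```python
from datetime import date

def screen_questionnaire(q, cutoff, required_items, scale_items):
    """Return the reasons a returned questionnaire looks unacceptable."""
    problems = []
    answers = q["answers"]
    # Reason 1: at least one required item was left unanswered.
    if any(answers.get(item) is None for item in required_items):
        problems.append("incomplete")
    # Reason 3: every answered rating-scale item has the same value.
    ratings = [answers[i] for i in scale_items if answers.get(i) is not None]
    if len(ratings) > 1 and len(set(ratings)) == 1:
        problems.append("little variance")
    # Reason 5: received after the pre-established cutoff date.
    if q["received"] > cutoff:
        problems.append("late")
    # Reason 6: answered by someone who does not qualify.
    if not q["qualified"]:
        problems.append("unqualified respondent")
    return problems

# Hypothetical questionnaire: all 4s on the scales, one item blank, late.
example = {
    "answers": {"q1": 4, "q2": 4, "q3": 4, "q4": None},
    "received": date(2015, 12, 5),
    "qualified": True,
}
flags = screen_questionnaire(
    example,
    cutoff=date(2015, 12, 1),
    required_items=["q1", "q2", "q3", "q4"],
    scale_items=["q1", "q2", "q3"],
)
print(flags)
```

A flag is only a prompt for human review; whether a flagged questionnaire is actually unacceptable remains the researcher's judgment.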


Editing is the review of the questionnaires with the objective of increasing accuracy and precision. It consists of screening questionnaires to identify illegible, incomplete, inconsistent, or ambiguous responses.

Responses may be illegible if they have been poorly recorded. This is particularly common in questionnaires with a large number of unstructured questions. The data must be legible if they are to be properly coded. Likewise, questionnaires may be incomplete to varying degrees. A few or many questions may be unanswered.

At this stage, the researcher makes a preliminary check for consistency. Certain obvious inconsistencies can be easily detected. For example, a respondent reports an annual income of less than $20,000, yet indicates frequent shopping at prestigious department stores such as Saks Fifth Avenue and Neiman Marcus.

Responses to unstructured questions may be ambiguous and difficult to interpret clearly. The answer may be abbreviated, or some ambiguous words may have been used. For structured questions, more than one response may be marked for a question designed to elicit a single response. Suppose a respondent circles 2 and 3 on a 5-point rating scale. Does this mean that 2.5 was intended? To complicate matters further, the coding procedure may allow for only a single-digit response.
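The consistency and ambiguity checks described above can be sketched as simple rules. This is an illustration only: the field names (`rating_marks`, `income`, `shops_prestige_stores`) are hypothetical, and the $20,000 threshold is taken from the example in the text.

```python
def edit_checks(response):
    """Flag ambiguous or inconsistent answers for an editor to review."""
    issues = []
    marks = response["rating_marks"]
    # A single-response item should carry exactly one mark.
    if len(marks) > 1:
        issues.append("ambiguous rating")
    elif len(marks) == 0:
        issues.append("rating unanswered")
    # Low reported income combined with frequent prestige-store shopping.
    if (response["income"] < 20000
            and response["shops_prestige_stores"] == "frequently"):
        issues.append("income/shopping inconsistency")
    return issues

# Respondent circled both 2 and 3 on a 5-point scale.
issues = edit_checks({
    "rating_marks": [2, 3],
    "income": 18000,
    "shops_prestige_stores": "frequently",
})
print(issues)
```

The sketch deliberately flags the double mark rather than averaging it to 2.5, since the coding scheme may allow only a single-digit response.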

Treatment of Unsatisfactory Responses

Unsatisfactory responses are commonly handled by returning to the field to get better data, assigning missing values, or discarding unsatisfactory respondents.

RETURNING TO THE FIELD The questionnaires with unsatisfactory responses may be returned to the field, where the interviewers recontact the respondents. This approach is particularly attractive for business and industrial marketing surveys, where the sample sizes are small and the respondents are easily identifiable. However, the data obtained the second time may be different from those obtained during the original survey. These differences may be attributed to changes over time or differences in the mode of questionnaire administration (e.g., telephone versus in-person interview).

ASSIGNING MISSING VALUES If returning the questionnaires to the field is not feasible, the editor may assign missing values to unsatisfactory responses. This approach may be desirable if (1) the number of respondents with unsatisfactory responses is small, (2) the proportion of unsatisfactory responses for each of these respondents is small, or (3) the variables with unsatisfactory responses are not the key variables.

DISCARDING UNSATISFACTORY RESPONDENTS In this approach, the respondents with unsatisfactory responses are simply discarded. This approach may have merit when (1) the proportion of unsatisfactory respondents is small (less than 10 percent), (2) the sample size is large, (3) the unsatisfactory respondents do not differ from satisfactory respondents in obvious ways (e.g., demographics, product usage characteristics), (4) the proportion of unsatisfactory responses for each of these respondents is large, or (5) responses on key variables are missing. However, unsatisfactory respondents may differ from satisfactory respondents in systematic ways, and the decision to designate a respondent as unsatisfactory may be subjective. Both these factors bias the results. If the researcher decides to discard unsatisfactory respondents, the procedure adopted to identify these respondents and their number should be reported.

Real Research

Declaring Discards

In a cross-cultural survey of marketing managers from English-speaking African countries, questionnaires were mailed to 565 firms. A total of 192 completed questionnaires were returned, of which four were discarded because respondents suggested that they were not in charge of overall marketing decisions. The decision to discard the four questionnaires was based on the consideration that the sample size was sufficiently large and the proportion of unsatisfactory respondents was small.
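The figures in this example can be checked against the 10 percent guideline given earlier for discarding respondents. The short calculation below uses only the numbers reported above; the variable names are illustrative.

```python
mailed = 565      # questionnaires mailed to firms
returned = 192    # completed questionnaires returned
discarded = 4     # respondents not in charge of marketing decisions

usable = returned - discarded
response_rate = returned / mailed
discard_share = discarded / returned

print(f"usable questionnaires: {usable}")
print(f"response rate: {response_rate:.1%}")
print(f"share discarded: {discard_share:.1%}")
print("below 10 percent guideline:", discard_share < 0.10)
```

Roughly 2 percent of the returned questionnaires were discarded, well under the 10 percent guideline, which is consistent with the researchers' stated rationale.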


Coding means assigning a code, usually a number, to each possible response to each question. If the questionnaire contains only structured questions or very few unstructured questions, it is precoded. This means that codes are assigned before fieldwork is conducted. If the questionnaire contains unstructured questions, codes are assigned after the questionnaires have been returned from the field (post-coding). Although precoding was briefly discussed in Chapter 10 on questionnaire design, we provide further guidelines on coding structured and open-ended questions in the next section.

Coding Questions

The respondent code and the record number should appear on each record in the data. However, the record code can be dispensed with if there is only one record for each respondent. The following additional codes should be included for each respondent: project code, interviewer code, date and time codes, and validation code. Fixed-field codes, which mean that the number of records for each respondent is the same and the same data appear in the same column(s) for all respondents, are highly desirable. If possible, standard codes should be used for missing data. For example, a code of 9 could be used for a single-column variable, 99 for a double-column variable, and so on. The missing value codes should be distinct from the codes assigned to the legitimate responses.
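The missing-data convention described above (9 for one column, 99 for two, and so on) can be expressed compactly. The sketch below is illustrative; the function names `missing_code` and `collides` are hypothetical helpers, not part of any standard package.

```python
def missing_code(width):
    """All-nines missing-value code for a field of the given column width."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return int("9" * width)

def collides(legitimate_codes, width):
    """True if the missing-value code is also a legitimate response code."""
    return missing_code(width) in set(legitimate_codes)

print(missing_code(1), missing_code(2))
# A 7-point scale coded 1..7 leaves 9 free in a single column.
print(collides(range(1, 8), 1))
# Nine response options coded 1..9 would clash with the missing code.
print(collides(range(1, 10), 1))
```

The collision check makes the last sentence of the paragraph concrete: if 9 is already a legitimate answer, either a wider field or a different missing-value code is needed.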

Coding of structured questions is relatively simple because the response options are predetermined. The researcher assigns a code to each response to each question and specifies the appropriate record and columns in which the response codes are to appear. For example:

Do you have a currently valid passport?
1. Yes 2. No (1/54)

For this question, a “Yes” response is coded 1 and a “No” response, 2. The numbers in parentheses indicate that the code assigned will appear on the first record for this respondent in column 54. Because only one response is allowed and there are only two possible responses (1 or 2), a single column is sufficient. In general, a single column is sufficient to code a structured question with a single response if there are fewer than nine possible responses. In questions that permit a large number of responses, each possible response option should be assigned a separate column. Such questions include those about brand ownership or usage, magazine readership, and television viewing. For example:

Which accounts do you now have at this bank? (“X” as many as apply)

Regular savings account                      □ (162)
Regular checking account                     □ (163)
Mortgage                                     □ (164)
NOW account                                  □ (165)
Club account (Christmas, etc.)               □ (166)
Line of credit                               □ (167)
Term savings account (time deposits, etc.)   □ (168)
Savings bank life insurance                  □ (169)
Home improvement loan                        □ (170)
Auto loan                                    □ (171)
Other services                               □ (172)

In this example, suppose a respondent checked regular savings, regular checking, and term savings accounts. On record #1, a 1 will be entered in column numbers 162, 163, and 168. All the other columns (164, 165, 166, 167, 169, 170, 171, and 172) will receive a 0. Because there is only one record per respondent, the record number has been omitted.
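The one-column-per-option scheme for this bank question can be sketched as a mapping from response options to column numbers. The column numbers come from the question above; the dictionary keys are shortened labels chosen for illustration, and the sketch assumes unchecked options are coded 0.

```python
# Column assigned to each response option (from the questionnaire above).
ACCOUNT_COLUMNS = {
    "regular savings": 162,
    "regular checking": 163,
    "mortgage": 164,
    "now account": 165,
    "club account": 166,
    "line of credit": 167,
    "term savings": 168,
    "savings bank life insurance": 169,
    "home improvement loan": 170,
    "auto loan": 171,
    "other services": 172,
}

def code_multi_response(checked):
    """Return {column: code}, with 1 for checked options and 0 otherwise."""
    unknown = set(checked) - set(ACCOUNT_COLUMNS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {col: (1 if name in checked else 0)
            for name, col in ACCOUNT_COLUMNS.items()}

coded = code_multi_response(
    {"regular savings", "regular checking", "term savings"}
)
ones = sorted(col for col, code in coded.items() if code == 1)
print(ones)
```

Each option becomes its own binary variable, so any combination of accounts can be recorded without ambiguity.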

The coding of unstructured or open-ended questions is more complex. Respondents' verbatim responses are recorded on the questionnaire. Codes are then developed and assigned to these responses. Sometimes, based on previous projects or theoretical considerations, the researcher can develop the codes before beginning fieldwork. Usually, however, this must wait until the completed questionnaires are received. Then the researcher lists 50 to 100 responses to an unstructured question to identify the categories suitable for coding. Once codes are developed, the coders should be trained to assign the correct codes to the verbatim responses. The following guidelines are suggested for coding unstructured questions and questionnaires in general.

Category codes should be mutually exclusive and collectively exhaustive. Categories are mutually exclusive if each response fits into one and only one category code. Categories should not overlap. Categories are collectively exhaustive if every response fits into one of the assigned category codes. This can be achieved by adding an additional category code of “other” or “none of the above.” However, only a few (10 percent or less) of the responses should fall into this category. The vast majority of the responses should be classified into meaningful categories.
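A coded data set can be audited against this guideline. The sketch below is one possible check, assuming each response ID maps to the list of category codes a coder assigned to it; the category names and the data are made up for illustration.

```python
def audit_codes(assignments, other_code="other"):
    """Check coded responses for overlap, gaps, and overuse of 'other'."""
    # Mutually exclusive: no response should carry more than one code.
    overlapping = [rid for rid, codes in assignments.items() if len(codes) > 1]
    # Collectively exhaustive: no response should be left without a code.
    uncoded = [rid for rid, codes in assignments.items() if len(codes) == 0]
    n_other = sum(1 for codes in assignments.values() if codes == [other_code])
    share = n_other / len(assignments) if assignments else 0.0
    return {"overlapping": overlapping, "uncoded": uncoded, "other_share": share}

# Hypothetical coding of ten verbatim responses.
assignments = {
    1: ["price"], 2: ["scent"], 3: ["packaging"], 4: ["other"],
    5: ["price", "scent"], 6: [], 7: ["price"], 8: ["scent"],
    9: ["price"], 10: ["scent"],
}
report = audit_codes(assignments)
print(report)
```

Here response 5 signals overlapping categories and response 6 a gap in the scheme, while the "other" share sits exactly at the 10 percent limit.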

Category codes should be assigned for critical issues even if no one has mentioned them. It may be important to know that no one has mentioned a particular response. For example, the management of a major consumer goods company was concerned about the packaging for a new brand of toilet soap. Hence, packaging was included as a separate category in coding responses to the question, “What do you like least about this toilet soap?”

Data should be coded to retain as much detail as possible. For example, if data on the exact number of trips made on commercial airlines by business travelers have been obtained, they should be coded as such, rather than grouped into two category codes of “infrequent fliers” and “frequent fliers.” Obtaining information on the exact number of trips allows the researcher to later define categories of business travelers in several different ways. If the categories were predefined, the subsequent analysis of data would be limited by those categories.
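The benefit of retaining exact counts can be seen in a small sketch: the same raw trip data can be grouped in more than one way after the fact. The trip counts, cutoffs, and the three-way labels below are hypothetical; only the "infrequent" and "frequent" labels come from the text.

```python
def classify(n_trips, cutoffs, labels):
    """Return the label of the first cutoff that n_trips falls below.

    cutoffs must be ascending and labels must have len(cutoffs) + 1 entries.
    """
    for cutoff, label in zip(cutoffs, labels):
        if n_trips < cutoff:
            return label
    return labels[-1]

trips = [0, 2, 5, 12, 30]  # exact trips per respondent (made-up data)

# Two-category grouping with an assumed cutoff of 10 trips.
two_way = [classify(t, [10], ["infrequent", "frequent"]) for t in trips]
# A finer grouping, possible only because exact counts were kept.
three_way = [classify(t, [3, 15], ["light", "moderate", "heavy"]) for t in trips]

print(two_way)
print(three_way)
```

Had only the two-category code been recorded, the three-way grouping could not be reconstructed from the data.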

Posted on December 1, 2015 in Data Preparation
