The validity of a scale may be defined as the extent to which differences in observed scale scores reflect true differences among objects on the characteristic being measured, rather than systematic or random error. Perfect validity requires that there be no measurement error (XO = XT, XR = 0, XS = 0). Researchers may assess content validity, criterion validity, or construct validity.
CRITERION VALIDITY Criterion validity reflects whether a scale performs as expected in relation to other variables selected as meaningful criteria (criterion variables). Criterion variables may include demographic and psychographic characteristics, behavioral measures, or scores obtained from other scales. Based on the time period involved, criterion validity can take two forms: concurrent and predictive validity.
CONSTRUCT VALIDITY Construct validity addresses the question of what construct or characteristic the scale is, in fact, measuring. When assessing construct validity, the researcher attempts to answer theoretical questions about why the scale works and what deductions can be made concerning the underlying theory. Thus, construct validity requires a sound theory of the nature of the construct being measured and how it relates to other constructs. Construct validity is the most sophisticated and difficult type of validity to establish. As Figure 9.5 shows, construct validity includes convergent, discriminant, and nomological validity.
Convergent validity is the extent to which the scale correlates positively with other measures of the same construct. It is not necessary that all these measures be obtained by using conventional scaling techniques. Discriminant validity is the extent to which a measure does not correlate with other constructs from which it is supposed to differ. It involves demonstrating a lack of correlation among differing constructs. Nomological validity is the extent to which the scale correlates in theoretically predicted ways with measures of different but related constructs. A theoretical model is formulated that leads to further deductions, tests, and inferences. Gradually, a nomological net is built in which several constructs are systematically interrelated. We illustrate construct validity in the context of a multi-item scale designed to measure self-concept.
To Thine Own Self Be True
The following findings would provide evidence of construct validity for a multi-item scale to measure self-concept:
• High correlations with other scales designed to measure self-concept and with reported classifications by friends (convergent validity)
• Low correlations with unrelated constructs of brand loyalty and variety seeking (discriminant validity)
• Brands that are congruent with the individual’s self-concept are more preferred, as postulated by the theory (nomological validity)
• A high level of reliability
Notice that a high level of reliability was included as evidence of construct validity in this example. This illustrates the relationship between reliability and validity.
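The correlational evidence in the list above can be sketched in a few lines of Python. This is only an illustration: the respondent scores are hypothetical, and the names pearson, self_concept_a, self_concept_b, and brand_loyalty are assumptions introduced for the example rather than data or procedures from an actual study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for six respondents (illustrative only).
self_concept_a = [1, 2, 3, 4, 5, 6]  # the scale being validated
self_concept_b = [2, 1, 4, 3, 6, 5]  # another measure of self-concept
brand_loyalty = [3, 5, 2, 5, 2, 4]   # a construct expected to be unrelated

convergent_r = pearson(self_concept_a, self_concept_b)
discriminant_r = pearson(self_concept_a, brand_loyalty)

print(f"convergent r   = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

A high convergent correlation together with a near-zero discriminant correlation is the pattern the list describes. How high is "high" and how low is "low" remains a judgment the researcher must justify for the construct at hand.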
Relationship Between Reliability and Validity
The relationship between reliability and validity can be understood in terms of the true score model. If a measure is perfectly valid, it is also perfectly reliable. In this case XO = XT, XR = 0, and XS = 0. Thus, perfect validity implies perfect reliability. If a measure is unreliable, it cannot be perfectly valid, because at a minimum XO = XT + XR. Furthermore, systematic error may also be present, i.e., XS ≠ 0. Thus, unreliability implies invalidity. If a measure is perfectly reliable, it may or may not be perfectly valid, because systematic error may still be present (XO = XT + XS). Although lack of reliability constitutes negative evidence for validity, reliability does not in itself imply validity. Reliability is a necessary, but not sufficient, condition for validity.
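The three cases in the preceding paragraph can be made concrete with a small Python sketch of the true score model. The function names observed_score and describe_measure and the error values are hypothetical, chosen only to illustrate the logic. In practice the error components are not directly observable and must be estimated.

```python
def observed_score(true_score, systematic_error=0.0, random_error=0.0):
    """True score model: XO = XT + XS + XR."""
    return true_score + systematic_error + random_error


def describe_measure(systematic_errors, random_errors):
    """Classify a measure from its error components.

    Each list holds one (hypothetical) error value per measurement.
    Perfect reliability requires XR = 0 throughout; perfect validity
    requires both XR = 0 and XS = 0 throughout.
    """
    perfectly_reliable = all(e == 0 for e in random_errors)
    perfectly_valid = perfectly_reliable and all(e == 0 for e in systematic_errors)
    return {
        "perfectly_reliable": perfectly_reliable,
        "perfectly_valid": perfectly_valid,
    }


# Hypothetical error components for three measures, two measurements each.
no_error = describe_measure([0, 0], [0, 0])     # XS = 0 and XR = 0
biased_only = describe_measure([2, 2], [0, 0])  # stable bias, no random error
noisy = describe_measure([0, 0], [1, -1])       # random error present

print(no_error)
print(biased_only)
print(noisy)
```

The second case shows why reliability is necessary but not sufficient: a measure with a stable bias repeats itself perfectly, yet it does not equal the true score.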
Generalizability refers to the extent to which one can generalize from the observations at hand to a universe of generalizations. The set of all conditions of measurement over which the investigator wishes to generalize is the universe of generalization. These conditions may include items, interviewers, situations of observation, and so on. A researcher may wish to generalize a scale developed for use in personal interviews to other modes of data collection, such as mail and telephone interviews. Likewise, one may wish to generalize from a sample of items to the universe of items, from a sample of times of measurement to the universe of times of measurement, from a sample of observers to a universe of observers, and so on.
In generalizability studies, measurement procedures are designed to investigate the universes of interest by sampling conditions of measurement from each of them. For each universe of interest, an aspect of measurement called a facet is included in the study. Traditional reliability methods can be viewed as single-facet generalizability studies. A test-retest correlation is concerned with whether scores obtained from a measurement scale are generalizable to the universe of scores across all times of possible measurement. Even if the test-retest correlation is high, nothing can be said about the generalizability of the scale to other universes. To generalize to other universes, generalizability theory procedures must be employed.
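As a rough illustration of a single-facet generalizability study, the Python sketch below treats occasions of measurement as the only facet and estimates a generalizability coefficient from a persons-by-occasions table of scores. The function name g_coefficient, the data, and the use of two-way ANOVA variance-component estimates are assumptions made for this example; the text does not prescribe a particular computation.

```python
def g_coefficient(scores):
    """Relative G coefficient for a crossed persons x occasions design.

    scores[p][o] is the score of person p on occasion o.
    """
    n_p = len(scores)
    n_o = len(scores[0]) if scores else 0
    if n_p < 2 or n_o < 2 or any(len(row) != n_o for row in scores):
        raise ValueError("need a complete table with >= 2 persons and >= 2 occasions")

    grand = sum(sum(row) for row in scores) / (n_p * n_o)
    person_means = [sum(row) / n_o for row in scores]
    occasion_means = [sum(row[o] for row in scores) / n_p for o in range(n_o)]

    # Sums of squares for the two-way layout without replication.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_person = n_o * sum((m - grand) ** 2 for m in person_means)
    ss_occasion = n_p * sum((m - grand) ** 2 for m in occasion_means)
    ss_residual = max(ss_total - ss_person - ss_occasion, 0.0)

    ms_person = ss_person / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_o - 1))

    # Variance components; negative estimates are truncated at zero.
    var_person = max((ms_person - ms_residual) / n_o, 0.0)
    var_residual = ms_residual

    denominator = var_person + var_residual / n_o
    if denominator == 0:
        raise ValueError("scores show no variance")
    return var_person / denominator


# Hypothetical scores: three respondents measured on two occasions.
stable = [[1, 2], [3, 4], [5, 6]]    # same rank order on both occasions
unstable = [[1, 5], [3, 2], [5, 3]]  # rank order changes between occasions

print(g_coefficient(stable))
print(g_coefficient(unstable))
```

A coefficient near 1 suggests that scores generalize across occasions. It says nothing about other facets, such as interviewers or modes of data collection, which would have to be sampled and added to the design.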
Choosing a Scaling Technique
In addition to theoretical considerations and evaluation of reliability and validity, certain practical factors should be considered in selecting scaling techniques for a particular marketing research problem. These include the level of information (nominal, ordinal, interval, or ratio) desired, the capabilities of the respondents, the characteristics of the stimulus objects, method of administration, the context, and cost.
As a general rule, using the scaling technique that will yield the highest level of information feasible in a given situation will permit the use of the greatest variety of statistical analyses. Also, regardless of the type of scale used, whenever feasible, several scale items should measure the characteristic of interest. This provides more accurate measurement than a single-item scale. In many situations, it is desirable to use more than one scaling technique or to obtain additional measures using mathematically derived scales.
Mathematically Derived Scales
All the scaling techniques discussed in this chapter require the respondents to evaluate directly various characteristics of the stimulus objects. In contrast, mathematical scaling techniques allow researchers to infer respondents’ evaluations of characteristics of stimulus objects. These evaluations are inferred from the respondents’ overall judgments of the objects. Two popular mathematically derived scaling techniques are multidimensional scaling and conjoint analysis. These techniques are discussed in detail in
In designing the scale or response format, respondents’ educational or literacy levels should be taken into account.32 One approach is to develop scales that are pan-cultural, or free of cultural biases. Of the scaling techniques we have considered, the semantic differential scale may be said to be pan-cultural. It has been tested in a number of countries and has consistently produced similar results.
Copying the Name Xerox
Xerox was a name well received in the former Soviet Union for the past 30 years. In fact, the act of copying documents was called Xeroxing, a term coined after the name of the company. It was a brand name people equated with quality. However, with the disintegration of the Soviet Union into the Commonwealth of Independent States (CIS), sales of Xerox started to fall. The management initially considered this problem to be the result of intense competition with strong competitors such as Canon, Ricoh Co., Mitsubishi Electric Corp., and Minolta Camera Co. First attempts at making the product more competitive did not help. Subsequently, marketing research was undertaken to measure the image of Xerox and its competitors. Semantic differential scales were used, because this type of scale is considered pan-cultural. The bipolar labels used were carefully tested to ensure that they had the intended semantic meaning in the Russian language and context.
Although the semantic differential worked well in the Russian context, an alternative approach is to develop scales that use a self-defined cultural norm as a base referent. For example, respondents may be required to indicate their own anchor point and position relative to a culture-specific stimulus set. This approach is useful for measuring attitudes that are defined relative to cultural norms (e.g., attitude toward marital roles). In developing response formats, verbal rating scales appear to be the most suitable. Even less-educated respondents can readily understand and respond to verbal scales. Special attention should be devoted to determining equivalent verbal descriptors in different languages and cultures. The endpoints of the scale are particularly prone to different interpretations. In some cultures, “1” may be interpreted as best, whereas in others, it may be interpreted as worst, regardless of how it is scaled. In such cases, it might be desirable to avoid numbers and to just use boxes that the respondent can check (worst □□□□□□□ best). It is important that the scale endpoints and the verbal descriptors be employed in a manner that is consistent with the culture. Finally, in international marketing research, it is critical to establish the equivalence of scales and measures used to obtain data from different countries. This topic is complex and is discussed in some detail in
Ethics in Marketing Research
The researcher has the ethical responsibility to use scales that have reasonable reliability, validity, and generalizability. The findings generated by scales that are unreliable, invalid, or not generalizable to the target population are questionable at best and raise serious ethical issues. Moreover, the researcher should not bias the scales so as to slant the findings in any particular direction. This is easy to do by biasing the wording of the statements (Likert-type scales), the scale descriptors, or other aspects of the scales. Consider the use of scale descriptors. The descriptors used to frame a scale can be chosen to bias results in a desired direction, for example, to generate a positive view of the client’s brand or a negative view of a competitor’s brand. To project the client’s brand favorably, respondents are asked to indicate their opinion of the brand on several attributes using 7-point scales anchored by the descriptors “extremely poor” to “good.” In such a case, respondents are reluctant to rate the product extremely poorly. In fact, respondents who believe the product to be only mediocre will end up responding favorably. Try this yourself. How would you rate BMW automobiles on the following attributes?
A Direct Measure of the Ethics of Direct Marketers
Many types of businesses are marketing to people over the phone, by e-mail, by text messages, and by direct mail without any consideration for the individuals they are trying to persuade to purchase their products. Many direct-marketing companies, including insurance, health care, and telecommunication companies, have paid billions of dollars in fines for unethical marketing practices. Denny Hatch has proposed the following honesty scale for companies using direct marketing.
Monster: The Monster of Career Networks
When you think of the word “monster,” what do you think? Scary creatures under your bed? Elmo and Grover from Sesame Street? The Walt Disney movie titled Monsters, Inc.? These days, the word “monster” also refers to the online job search company that has connected millions of job searchers with employers. This company was founded in 1994 by Jeff Taylor, and Sal Iannuzzi was appointed chairman and CEO in 2007. It is the leading online global careers network and the world’s number one hiring management resource. As of 2008, its clients included more than 90 of the Fortune 100 and approximately 490 of the Fortune 500 companies. The company had operations in 36 countries around the world. No wonder this company has added a whole new meaning to the word monster.
Monster makes heavy use of marketing research techniques in a unique way. Unlike companies such as Nielsen that conduct marketing research for different companies, Monster researches companies that are in need of employees to fill their positions and provides the service of matching job searchers to these companies. Although Monster is doing well, more and more companies have followed in Monster’s footsteps and have entered the arena of providing job search services. These competing companies include Hot Jobs. With all of these different services available, the market is beginning to become saturated with Internet recruiting Web sites. It is important for Monster, now more than ever, to differentiate itself from the competition.
The Marketing Research Decision
1. The success of Monster lies in matching the companies’ job specifications with the skills and qualifications of job applicants. What scaling techniques should Monster use to measure companies’ job specifications and job applicants’ skills and qualifications?
2. Discuss the role of the type of scaling technique you recommend in enabling Monster to match companies’ job specifications and job applicants’ skills and qualifications and thereby increase the market share of Monster.
The Marketing Management Decision
1. What should Sal do to gain market share over competitors?
2. Discuss how the marketing management decision action that you recommend to Sal Iannuzzi is influenced by the scaling technique that you suggested earlier and by the findings of that research.
Using SPSS Data Entry, the researcher can design any of the three noncomparative scales: Likert, semantic differential, or Stapel. Moreover, multi-item scales can be easily accommodated. Either the question library can be used or customized scales can be designed. We show the use of SPSS Data Entry to design Likert-type scales for rating salespeople and product characteristics in Figure 9.7.