As illustrated in the previous section, a frequency distribution is a convenient way of looking at different values of a variable. A frequency table is easy to read and provides basic information, but sometimes this information may be too detailed and the researcher must summarize it by the use of descriptive statistics. The most commonly used statistics associated with frequencies are measures of location (mean, mode, and median), measures of variability (range, interquartile range, standard deviation, and coefficient of variation), and measures of shape (skewness and kurtosis).
Measures of Location
The measures of location that we discuss are measures of central tendency because they tend to describe the center of the distribution. If the entire sample is changed by adding a fixed constant to each observation, then the mean, mode, and median change by the same fixed amount.
MEAN The mean, or average value, is the most commonly used measure of central tendency. It is used to estimate the mean when the data have been collected using an interval or ratio scale. The data should display some central tendency, with most of the responses distributed around the mean. The mean, $\bar{X}$, is given by
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
where
$X_i$ = observed values of the variable X
n = number of observations (sample size)
If there are no outliers, the mean is a robust measure and does not change markedly as data values are added or deleted. For the frequencies given in Table 15.2, the mean value is calculated as follows:
$\bar{X}$ = (2 × 2 + 6 × 3 + 6 × 4 + 3 × 5 + 8 × 6 + 4 × 7)/29
= (4 + 18 + 24 + 15 + 48 + 28)/29
= 137/29
= 4.724
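The same calculation can be sketched in a few lines of Python. The dictionary below is an assumption: it encodes the value and frequency pairs that appear in the worked calculation for Table 15.2 (value mapped to frequency), and `weighted_mean` is an illustrative helper name, not something defined in the text.

```python
# Value -> frequency, as read off the worked calculation (assumed layout).
freq = {2: 2, 3: 6, 4: 6, 5: 3, 6: 8, 7: 4}


def weighted_mean(freq):
    """Mean of a variable supplied as a {value: frequency} table."""
    n = sum(freq.values())                      # number of observations
    total = sum(value * count for value, count in freq.items())
    return total / n


n = sum(freq.values())
mean = weighted_mean(freq)
print(f"n = {n}, mean = {mean:.3f}")  # n = 29, mean = 4.724
```

Working from the frequency table rather than the raw responses gives the same mean, because each value is simply weighted by how often it occurs.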
MODE The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories. The mode in Table 15.2 is 6.000.
MEDIAN The median of a sample is the middle value when the data are arranged in ascending or descending order. If the number of data points is even, the median is usually estimated as the midpoint between the two middle values, obtained by adding the two middle values and dividing their sum by 2. The median is the 50th percentile. The median is an appropriate measure of central tendency for ordinal data. In Table 15.2, the median is 5.000.
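As a rough illustration, the mode and median can be obtained with Python's standard `statistics` module. The frequency table is again an assumption taken from the worked mean calculation, and the four-value list used for the even case is hypothetical.

```python
import statistics

# Value -> frequency, assumed from the worked example for Table 15.2.
freq = {2: 2, 3: 6, 4: 6, 5: 3, 6: 8, 7: 4}

# Expand the table into the 29 individual observations, in ascending order.
data = sorted(value for value, count in freq.items() for _ in range(count))

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle (15th) of the 29 ordered values
print(mode, median)               # 6 5

# Hypothetical even-sized sample: the median is the midpoint of 3 and 5.
even_median = statistics.median([2, 3, 5, 6])
print(even_median)                # 4.0
```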
For the data in Table 15.1, the three measures of central tendency for this distribution are different (mean = 4.724, mode = 6.000, median = 5.000). This is not surprising, because each measure defines central tendency in a different way. So which measure should be used? If the variable is measured on a nominal scale, the mode should be used. If the variable is measured on an ordinal scale, the median is appropriate. If the variable is measured on an interval or ratio scale, the mode is a poor measure of central tendency. This can be seen from Table 15.2. Although the modal value of 6.000 has the highest frequency, it represents only 27.6 percent of the sample. In general, for interval or ratio data, the median is a better measure of central tendency, although it too ignores available information about the variable. The actual values of the variable above and below the median are ignored. The mean is the most appropriate measure of central tendency for interval or ratio data. The mean makes use of all the information available because all of the values are used in computing it. However, the mean is sensitive to extremely small or extremely large values (outliers). When there are outliers in the data, the mean is not a good measure of central tendency and it is useful to consider both the mean and the median.
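The sensitivity of the mean to outliers is easy to see with a small sketch. The ratings below are hypothetical, invented only for illustration, and are not taken from Table 15.1 or Table 15.2.

```python
import statistics

# Hypothetical ratings, invented purely for illustration.
ratings = [3, 4, 4, 5, 6]
with_outlier = ratings + [60]  # one extreme value, e.g. a data-entry error

mean_before = statistics.mean(ratings)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(ratings)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")  # 4.40 -> 13.67
print(f"median: {median_before} -> {median_after}")      # 4 -> 4.5
```

A single extreme value roughly triples the mean while the median barely moves, which is why it helps to report both when outliers are suspected.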
In Table 15.2, since there are no extreme values and the data are treated as interval, the mean value of 4.724 is a good measure of location or central tendency. Although this value is greater than 4, it is still not high (i.e., it is less than 5). If this were a large and representative sample, the interpretation would be that people, on the average, are only moderately familiar with the Internet. This would call for both managerial action on the part of Internet service providers and public policy initiatives on the part of governmental bodies to make people more familiar with the Internet and increase Internet usage.
Measures of Variability
The measures of variability, which are calculated on interval or ratio data, include the range, interquartile range, variance or standard deviation, and coefficient of variation.
RANGE The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample. As such, the range is directly affected by outliers.
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
If all the values in the data are multiplied by a constant, the range is multiplied by the same constant. The range for the data in Table 15.2 is 7 – 2 = 5.000.
INTERQUARTILE RANGE The interquartile range is the difference between the 75th and 25th percentiles. For a set of data points arranged in order of magnitude, the pth percentile is the value that has p percent of the data points below it and (100 – p) percent above it. If all the data points are multiplied by a constant, the interquartile range is multiplied by the same constant. The interquartile range for the data in Table 15.2 is 6 – 3 = 3.000.
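A minimal sketch of both measures follows, assuming the frequency table from the worked mean calculation. Percentile conventions differ between software packages; this sketch relies on the default method of `statistics.quantiles`, which happens to reproduce the quartiles 3 and 6 quoted for Table 15.2. The helper name `spread` is illustrative only.

```python
import statistics

# Value -> frequency, assumed from the worked example for Table 15.2.
freq = {2: 2, 3: 6, 4: 6, 5: 3, 6: 8, 7: 4}
data = sorted(value for value, count in freq.items() for _ in range(count))


def spread(xs):
    """Return (range, interquartile range) for a list of numbers."""
    q1, _median, q3 = statistics.quantiles(xs, n=4)  # quartile cut points
    return max(xs) - min(xs), q3 - q1


data_range, iqr = spread(data)
print(data_range, iqr)  # 5 3.0

# Multiplying every value by a constant multiplies both measures by it.
scaled_range, scaled_iqr = spread([10 * x for x in data])
print(scaled_range, scaled_iqr)  # 50 30.0
```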
VARIANCE AND STANDARD DEVIATION The difference between the mean and an observed value is called the deviation from the mean. The variance is the mean squared deviation from the mean. The variance can never be negative. When the data points are clustered around the mean, the variance is small. When the data points are scattered, the variance is large. If all the data values are multiplied by a constant, the variance is multiplied by the square of the constant. The standard deviation is the square root of the variance. Thus, the standard deviation is expressed in the same units as the data, rather than in squared units. The standard deviation of a sample, $s_x$, is calculated as:
$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
We divide by n - 1 instead of n because the sample is drawn from a population and we are trying to determine how much the responses vary from the mean of the entire population. However, the population mean is unknown; therefore the sample mean is used instead. The use of the sample mean makes the sample seem less variable than it really is. By dividing by n - 1, instead of n, we compensate for the smaller variability observed in the sample. For the data given in Table 15.2, the sample variance, $s_x^2$, is calculated as follows:
$s_x^2$ = [2 × (2 - 4.724)² + 6 × (3 - 4.724)² + 6 × (4 - 4.724)² + 3 × (5 - 4.724)² + 8 × (6 - 4.724)² + 4 × (7 - 4.724)²]/28
= 69.793/28
= 2.493
The standard deviation is therefore the square root of 2.493, or 1.579.
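The effect of dividing by n - 1 rather than n can be checked with a short sketch. The frequency table is an assumption carried over from the worked mean calculation, and the variable names are illustrative.

```python
import math
import statistics

# Value -> frequency, assumed from the worked example for Table 15.2.
freq = {2: 2, 3: 6, 4: 6, 5: 3, 6: 8, 7: 4}
data = [value for value, count in freq.items() for _ in range(count)]

n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

sample_var = sum_sq_dev / (n - 1)  # divides by n - 1, as in the text
naive_var = sum_sq_dev / n         # divides by n, understating variability
sample_sd = math.sqrt(sample_var)

print(f"variance (n - 1): {sample_var:.3f}")  # 2.493
print(f"variance (n):     {naive_var:.3f}")   # 2.407
print(f"std deviation:    {sample_sd:.3f}")   # 1.579

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(sample_var, statistics.variance(data))
```

With only 29 observations the two divisors differ by about 3.5 percent; the gap shrinks as the sample grows.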
COEFFICIENT OF VARIATION The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage, and it is a unitless measure of relative variability. The coefficient of variation, CV, is expressed as:
$$CV = \frac{s_x}{\bar{X}} \times 100\%$$
The coefficient of variation is meaningful only if the variable is measured on a ratio scale. It remains unchanged if all the data values are multiplied by a constant. Because familiarity with the Internet is not measured on a ratio scale, it is not meaningful to calculate the coefficient of variation for the data in Table 15.2. From a managerial viewpoint, measures of variability are important because if a characteristic shows good variability, then perhaps the market could be segmented based on that characteristic.
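The invariance of the coefficient of variation under a change of units can be sketched as follows. Because the Internet familiarity ratings are not ratio-scaled, the example uses hypothetical spending figures invented for illustration; `coefficient_of_variation` is an illustrative helper name.

```python
import statistics


def coefficient_of_variation(xs):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(xs) / statistics.mean(xs) * 100


# Hypothetical ratio-scale data: monthly spending in dollars.
spend_dollars = [20, 35, 50, 65, 80]
spend_cents = [100 * x for x in spend_dollars]  # same data, different unit

cv_dollars = coefficient_of_variation(spend_dollars)
cv_cents = coefficient_of_variation(spend_cents)

print(f"CV in dollars: {cv_dollars:.1f}%")  # 47.4%
print(f"CV in cents:   {cv_cents:.1f}%")    # 47.4%
```

Multiplying the data by 100 multiplies both the standard deviation and the mean by 100, so their ratio is unchanged.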
Measures of Shape
In addition to measures of variability, measures of shape are also useful in understanding the nature of the distribution. The shape of a distribution is assessed by examining skewness and kurtosis.
SKEWNESS Distributions can be either symmetric or skewed. In a symmetric distribution, the values on either side of the center of the distribution are the same, and the mean, mode, and median are equal. The positive and corresponding negative deviations from the mean are also equal. In a skewed distribution, the positive and negative deviations from the mean are unequal. Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other (see Figure 15.2). The skewness value for the data of Table 15.2 is -0.094, indicating a slight negative skew.
KURTOSIS Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. The kurtosis of a normal distribution is zero. If the kurtosis is positive, then the distribution is more peaked than a normal distribution. A negative value means that the distribution is flatter than a normal distribution. The value of this statistic for Table 15.2 is -1.261, indicating that the distribution is flatter than a normal distribution. Measures of shape are important, because if a distribution is highly skewed or markedly peaked or flat, then statistical procedures that assume normality should be used with caution.
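The text does not state which formulas lie behind the reported skewness and kurtosis. The sketch below is therefore an assumption: it uses the bias-adjusted sample skewness and excess kurtosis formulas that many statistical packages report, applied to the frequency table assumed from the worked mean calculation. Under that assumption the results agree closely with the reported -0.094 and -1.261. The function names are illustrative.

```python
import math

# Value -> frequency, assumed from the worked example for Table 15.2.
freq = {2: 2, 3: 6, 4: 6, 5: 3, 6: 8, 7: 4}
data = [value for value, count in freq.items() for _ in range(count)]


def sample_skewness(xs):
    """Bias-adjusted sample skewness (assumed formula, not from the text)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    sum_cubed = sum((x - mean) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * sum_cubed / s ** 3


def sample_excess_kurtosis(xs):
    """Bias-adjusted excess kurtosis, zero for a normal distribution."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    sum_fourth = sum((x - mean) ** 4 for x in xs)
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_fourth / variance ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scaled - correction


skewness = sample_skewness(data)
kurtosis = sample_excess_kurtosis(data)
print(f"skewness = {skewness:.3f}, kurtosis = {kurtosis:.3f}")
```

Other definitions, such as the unadjusted moment ratios, give somewhat different values for a sample this small, so it is worth checking which convention a given package uses before comparing results.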