Introduction
Consider the following scenario.
A study was performed to find out whether pamphlets containing information for cancer patients are written at a level that the cancer patients can understand. Tests were administered to measure the reading ability of 63 cancer patients, and then the readability levels of 30 cancer pamphlets were evaluated based on such factors as the lengths of the sentences and the number of polysyllabic words. Both the reading ability and readability levels correspond to grade levels, but patients’ reading levels of less than Grade 3 and above Grade 12 could not be determined exactly.
Source: Short, Moriarty, and Cooley. (1995). “Readability of Educational Materials for Cancer Patients.” Journal of Statistics Education, v.3, n.2.
The following tables indicate the number of patients at each reading ability level and the number of pamphlets at each readability level.
Table 1: Patients’ Reading Level
Patients’ Reading Level   <3    3    4    5    6    7    8    9   10   11   12  >12
Count                      6    4    4    3    3    2    6    5    4    7    2   17
Table 2: Pamphlets’ Readability Level
Pamphlets’ Readability Level   6   7   8   9  10  11  12  13  14  15  16
Count                          3   3   8   4   1   1   4   2   1   2   1
In this scenario, a researcher might be interested in the typical reading level for patients in the sample or the typical readability level of the pamphlets. In other words, researchers might want to know the central tendency for each of these variables. The central tendency is the value that is the most representative of the entire distribution of scores for a variable. Measures of central tendency for continuous variables are important for researchers and decision-makers because they are often most interested in the typical case.
Defining the Measures of Central Tendency
There are three measures of central tendency:
· 1 Mean
· 2 Median
· 3 Mode
Although you can rely on programs like SPSS to calculate values for these statistics, you need to understand the ways in which they differ and instances where one may be more useful than another.
Mean
The mean is the most frequently used measure of central tendency for continuous variables. There are a number of reasons for this. One important reason is that the mean, as opposed to the median or the mode, incorporates every value for a given variable, which makes the mean a good representation of the sample as a whole. Also, if we draw several samples from the population, the means for the samples tend to be more similar to one another than, say, the medians or the modes. This stability from sample to sample is another indication that the mean usually represents the typical value for a variable well. Whenever possible, then, researchers report the mean as the measure of central tendency for their continuous variables.
To calculate the mean for the readability level data above, you would add up all of the scores in the distribution, which would give you a sum of 294. You would then divide this sum by the number of scores in the distribution (30). The mean would be 294/30, or 9.8.
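The arithmetic above can be checked with a short Python sketch. It works directly from the frequency table (Table 2), so each level is multiplied by its count rather than typed out 30 times. The variable names are chosen here for illustration only.

```python
# Frequency table for the pamphlets' readability levels (Table 2).
levels = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
counts = [3, 3, 8, 4, 1, 1, 4, 2, 1, 2, 1]

# Each level contributes (level * count) to the sum of all scores.
total = sum(level * count for level, count in zip(levels, counts))

# The number of scores is the sum of the counts.
n = sum(counts)

mean = total / n
print(total, n, mean)  # 294 30 9.8
```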
You can also think of the mean as the balancing point, or center of gravity, if each observation were a single weight on a number line. To start to understand this, think of a child’s see-saw with an adult on one end and a small child on the other. If the see-saw is attached to a crossbar in the middle of the see-saw, the child will be high in the air and the adult stuck on the ground. If the crossbar is moved closer to the adult, a balance point can be reached.
Histogram for the Pamphlets’ Readability Level
Now, take a look at the histogram for the Pamphlets’ Readability Level. The data are presented in a grouped form where the count represents the frequency of occurrence of that level. Note how the histogram corresponds to the data in the table. On level 6, for example, the count is 3, which means that the first three data points are 6 6 6; the count of level 7 is also 3, which means that the next three data points are 7 7 7; the count of level 8 is 8, which means that the next eight data points are 8 8 8 8 8 8 8 8, and so on. As you will see later in the Skill Builder, the further an observation is from the others, the more effect it has on determining the value of the mean.
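As a rough illustration of both points, the sketch below expands the grouped table into the 30 individual scores and then moves one observation far from the rest. The replacement value of 30 is a hypothetical number chosen only to show the effect; it is not part of the study data.

```python
from statistics import mean, median

levels = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
counts = [3, 3, 8, 4, 1, 1, 4, 2, 1, 2, 1]

# Expand the grouped table into individual scores: 6 6 6 7 7 7 8 8 ...
scores = [level for level, count in zip(levels, counts) for _ in range(count)]

# Hypothetical change: replace the largest score (16) with a far-away value (30).
shifted = scores[:-1] + [30]

# The mean is pulled toward the distant observation; the median does not move.
print(mean(scores), median(scores))
print(mean(shifted), median(shifted))
```

With the original scores the mean is 9.8. After the hypothetical change the mean rises to about 10.27, while the median stays at 9.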
Median
The second most common measure of central tendency for use with continuous variables is the median. The median is an appropriate measure of central tendency when the measurement is at the ordinal, interval, or ratio level. The median is not appropriate for nominal measurement. Think of the median as the score dividing the observations, so that one-half have smaller values than the median and one-half have larger values. Use the following steps to find the median.
· 1 Order the data from smallest to largest.
· 2 Consider whether n, the number of observations, is even or odd.
· 3 If n is odd, the median M is the center observation in the ordered list. This observation is the one “sitting” in the (n + 1) / 2 spot in the ordered list.
· 4 If n is even, the median M is the mean of the two center observations in the ordered list. These two observations are the ones “sitting” in the n / 2 and n / 2 + 1 spots in the ordered list.
For a simple visualization of the location of the median, consider the following two simple cases of n = 7 and n = 8 ordered observations, with each observation represented by a solid circle.
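The steps above can also be sketched in Python for the same two cases, n = 7 and n = 8. The function name `median_from_steps` and the sample values are assumptions made for illustration. Because Python lists are indexed from 0, the “spot” numbers from the steps are shifted down by one.

```python
def median_from_steps(values):
    """Find the median by following the four steps in the text."""
    if not values:
        raise ValueError("median of an empty data set is undefined")

    ordered = sorted(values)   # Step 1: order from smallest to largest
    n = len(ordered)           # Step 2: is n even or odd?

    if n % 2 == 1:
        # Step 3: n is odd, take the observation in spot (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]

    # Step 4: n is even, average the observations in spots n / 2 and n / 2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Hypothetical ordered observations for the two cases.
seven = [3, 5, 8, 9, 12, 14, 20]       # n = 7, median sits in spot 4
eight = [3, 5, 8, 9, 12, 14, 20, 25]   # n = 8, median averages spots 4 and 5

print(median_from_steps(seven))  # 9
print(median_from_steps(eight))  # 10.5
```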
Mode
The mode for a variable is the most frequently occurring value in the data set. As an example of how to find the mode, consider the data from the previous activity: the number of hours that nine students spend on the computer on a typical day.
1 6 7 5 5 8 11 12 15
The mode for these data would be 5. Five is the most frequently occurring score in the data set; it occurs twice in the sample whereas each of the other scores only occurs once.
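One way to confirm this is to tally how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# Hours on the computer for the nine students.
hours = [1, 6, 7, 5, 5, 8, 11, 12, 15]

# Count how many times each value appears.
tally = Counter(hours)

# most_common(1) returns the single (value, frequency) pair with the highest count.
mode_value, mode_frequency = tally.most_common(1)[0]
print(mode_value, mode_frequency)  # 5 2
```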
Note that, although the mode can be calculated for continuous variables (as we did above), the mode is much more useful for categorical variables with a small number of categories, such as gender or political party affiliation.
In our example using the Pamphlets’ Readability Level, you can see that the mode is 8, which is close in value to the median (9) and the mean (9.8).
Table 2: Pamphlets’ Readability Level
Pamphlets’ Readability Level   6   7   8   9  10  11  12  13  14  15  16
Count                          3   3   8   4   1   1   4   2   1   2   1
For the Patients’ Reading Level, however, the mode is the score category >12, a value not near the center of the distribution.
Table 1: Patients’ Reading Level
Patients’ Reading Level   <3    3    4    5    6    7    8    9   10   11   12  >12
Count                      6    4    4    3    3    2    6    5    4    7    2   17
Thus, the mode would not be the best measure of central tendency for these data. In conclusion, then, researchers are typically not concerned with the mode when they are studying continuous variables.
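To see how far the mode sits from the center for the patients’ data, the sketch below finds both the modal category and the median category from the counts in Table 1. The median can still be located because the categories are ordered, even though <3 and >12 have no exact grade value. The variable names are illustrative.

```python
# Ordered categories and counts from Table 1.
categories = ["<3", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", ">12"]
counts = [6, 4, 4, 3, 3, 2, 6, 5, 4, 7, 2, 17]

# Mode: the category with the largest count.
mode_category = categories[counts.index(max(counts))]

# Median: n = 63 is odd, so the median is the observation in spot (n + 1) / 2 = 32.
n = sum(counts)
middle_spot = (n + 1) // 2

# Walk through the running total until it reaches the middle spot.
running_total = 0
median_category = None
for category, count in zip(categories, counts):
    running_total += count
    if running_total >= middle_spot:
        median_category = category
        break

print(mode_category, median_category)  # >12 9
```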
Nominal or Ordinal Categorical Variables
As noted, whenever possible, researchers typically report the mean as their measure of central tendency. There are instances, however, when the mean is not the best measure of central tendency to report.
For example, using the mean to discuss central tendency won’t make sense for many categorical variables that are nominal or ordinal. Consider gender, a variable that typically has the categories male and female. Suppose that you have coded male as 1 and female as 2. The average value (the mean) for gender might be 1.2. Saying that the average gender for a sample is 1.2 just does not make sense.
So, in deciding the best measure of central tendency to report for a given variable, first consider whether the variable is categorical or not. If the variable is categorical, it will typically not be appropriate to report the mean. You should, instead, report the mode for nominal, categorical variables such as gender.
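A small sketch makes the contrast concrete. The sample of ten respondents below is hypothetical, built only so that the mean of the codes comes out to the 1.2 mentioned above; the coding (1 = male, 2 = female) follows the text.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample: eight respondents coded 1 (male), two coded 2 (female).
codes = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
labels = {1: "male", 2: "female"}

# The mean of the codes is a number, but it does not describe any respondent.
print(mean(codes))  # 1.2

# The mode names the most common category, which is interpretable.
modal_code, frequency = Counter(codes).most_common(1)[0]
print(labels[modal_code], frequency)  # male 8
```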
Skewed Distribution
One last situation in which the mean might not be the best measure of central tendency to use is when the distribution for a variable is skewed. Recall that a skewed distribution is one that is asymmetrical, in which the scores are piled up more on one side of the mean, compared to the other; see the picture below for an example.
Income is a classic example of a distribution that is often skewed. There are few extremely high-income people and many low-income ones. Income provides an example of positive skew because the tail of the distribution is longer on the right side, corresponding to the positive side of the number line.
Negative skew refers to the tail of the distribution appearing longer on the left-hand side of the distribution corresponding to the negative side of the number line.
The amount of time a flight is late would also have a positively skewed distribution because many flights are on time or close to on time, but those that are late can be extremely late. In general, when data are heavily skewed, you will want to report the median, perhaps alongside the mean.
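The sketch below shows why the median is usually preferred for heavily skewed data. The ten incomes are hypothetical values invented for illustration: nine modest incomes and one very large one, which produces a long right tail.

```python
from statistics import mean, median

# Hypothetical annual incomes with one extreme value in the right tail.
incomes = [22_000, 25_000, 28_000, 30_000, 32_000,
           35_000, 38_000, 40_000, 45_000, 500_000]

# The single extreme income pulls the mean far above most of the sample.
print(mean(incomes))    # mean is 79,500

# The median stays near the bulk of the observations.
print(median(incomes))  # median is 33,500
```

Here the mean is larger than nine of the ten incomes, so it is a poor description of the typical case, while the median falls in the middle of the modest incomes.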