An important characteristic of any set of data is the variation in the data. In some data sets, the data values are concentrated closely near the mean; in other data sets, the data values are more widely spread out from the mean. The most common measure of variation, or spread, is the standard deviation. The standard deviation is a number that measures how far data values are from their mean.
The standard deviation provides a measure of the overall variation in a data set
The standard deviation is always positive or zero. The standard deviation is small when the data are all concentrated close to the mean, exhibiting little variation or spread. The standard deviation is larger when the data values are more spread out from the mean, exhibiting more variation. Suppose that we are studying the amount of time customers wait in line at the checkout at supermarket A and supermarket B. The average wait time at both supermarkets is five minutes. At supermarket A, the standard deviation for the wait time is two minutes; at supermarket B the standard deviation for the wait time is four minutes. Because supermarket B has a higher standard deviation, we know that there is more variation in the wait times at supermarket B. Overall, wait times at supermarket B are more spread out from the average; wait times at supermarket A are more concentrated near the average.
The standard deviation can be used to determine whether a data value is close to or far from the mean
Suppose that Rosa and Binh both shop at supermarket A. Rosa waits at the checkout counter for seven minutes and Binh waits for one minute. At supermarket A, the mean waiting time is five minutes and the standard deviation is two minutes.
Rosa waits for seven minutes: seven is two minutes longer than the average of five, and two minutes is equal to one standard deviation, so Rosa's wait time is one standard deviation above the average.
Binh waits for one minute: one is four minutes less than the average of five, and four minutes is equal to two standard deviations, so Binh's wait time is two standard deviations below the average.
The number line may help you understand standard deviation. If we were to put five and seven on a number line, seven is to the right of five. We say, then, that seven is one standard deviation to the right of five because \(5 + (1)(2) = 7\). If one were also part of the data set, then one is two standard deviations to the left of five because \(5 + (-2)(2) = 1\).
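The equation value = mean + (#ofSTDEVs)(standard deviation) can be expressed for a sample and for a population.
Sample: \(x = \bar{x} + (\text{#ofSTDEVs})(s)\)
Population: \(x = \mu + (\text{#ofSTDEVs})(\sigma)\)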
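The relationship value = mean + (#ofSTDEVs)(standard deviation) can be sketched in a few lines of Python, using the supermarket A figures (mean of five minutes, standard deviation of two minutes). The function names `value_at` and `num_stdevs` are illustrative choices, not part of the text.

```python
def value_at(mean, sd, k):
    """Return the value that lies k standard deviations from the mean."""
    return mean + k * sd


def num_stdevs(value, mean, sd):
    """Return how many standard deviations a value is from the mean."""
    return (value - mean) / sd


mean_wait, sd_wait = 5, 2                # supermarket A, in minutes

rosa = num_stdevs(7, mean_wait, sd_wait)  # Rosa waits seven minutes
binh = num_stdevs(1, mean_wait, sd_wait)  # Binh waits one minute

print("Rosa:", rosa, "standard deviations from the mean")
print("Binh:", binh, "standard deviations from the mean")
print("One SD above the mean:", value_at(mean_wait, sd_wait, 1))
print("Two SDs below the mean:", value_at(mean_wait, sd_wait, -2))
```

A positive result places the value to the right of the mean on the number line, and a negative result places it to the left.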
The lower case letter s represents the sample standard deviation and the Greek letter \(\sigma\) (sigma, lower case) represents the population standard deviation. The symbol \(\bar{x}\) is the sample mean and the Greek symbol \(\mu\) is the population mean.
Calculating the Standard Deviation
If \(x\) is a number, then the difference "\(x\) – mean" is called its deviation. In a data set, there are as many deviations as there are items in the data set. The deviations are used to calculate the standard deviation. If the numbers belong to a population, in symbols a deviation is \(x - \mu\). For sample data, in symbols a deviation is \(x - \bar{x}\).
The procedure to calculate the standard deviation depends on whether the numbers are the entire population or are data from a sample. The calculations are similar, but not identical. Therefore the symbol used to represent the standard deviation depends on whether it is calculated from a population (\(\sigma\)) or a sample (s). If the sample has the same characteristics as the population, then s should be a good estimate of \(\sigma\).
To calculate the standard deviation, we need to calculate the variance first. The variance is the average of the squares of the deviations (the \(x - \bar{x}\) values for a sample, or the \(x - \mu\) values for a population). The symbol \(\sigma^{2}\) represents the population variance; the population standard deviation \(\sigma\) is the square root of the population variance. The symbol \(s^{2}\) represents the sample variance; the sample standard deviation s is the square root of the sample variance. You can think of the standard deviation as a special average of the deviations.
If the numbers come from a census of the entire population and not a sample, when we calculate the average of the squared deviations to find the variance, we divide by \(N\), the number of items in the population. If the data are from a sample rather than a population, when we calculate the average of the squared deviations, we divide by \(n - 1\), one less than the number of items in the sample.
Formulas for the Sample Standard Deviation
\[s = \sqrt{\dfrac{\sum(x-\bar{x})^{2}}{n-1}} \label{eq1}\] or \[s = \sqrt{\dfrac{\sum f (x-\bar{x})^{2}}{n-1}} \label{eq2}\] For the sample standard deviation, the denominator is \(n - 1\), that is, the sample size MINUS 1.
Formulas for the Population Standard Deviation
\[\sigma = \sqrt{\dfrac{\sum(x-\mu)^{2}}{N}} \label{eq3} \] or \[\sigma = \sqrt{\dfrac{\sum f (x-\mu)^{2}}{N}} \label{eq4}\] For the population standard deviation, the denominator is \(N\), the number of items in the population.
In Equations \ref{eq2} and \ref{eq4}, \(f\) represents the frequency with which a value appears. For example, if a value appears once, \(f\) is one. If a value appears three times in the data set or population, \(f\) is three.
Sampling Variability of a Statistic
The statistic of a sampling distribution was discussed previously in Chapter 2. How much the statistic varies from one sample to another is known as the sampling variability of a statistic. You typically measure the sampling variability of a statistic by its standard error. The standard error of the mean is an example of a standard error. It is a special standard deviation and is known as the standard deviation of the sampling distribution of the mean. You will cover the standard error of the mean in Chapter 7. The notation for the standard error of the mean is \(\dfrac{\sigma}{\sqrt{n}}\) where \(\sigma\) is the standard deviation of the population and \(n\) is the size of the sample.
Technology
In practice, USE A CALCULATOR OR COMPUTER SOFTWARE TO CALCULATE THE STANDARD DEVIATION.
If you are using a TI-83, 83+, or 84+ calculator, you need to select the appropriate standard deviation, \(\sigma_{x}\) or \(s_{x}\), from the summary statistics. If you are using a spreadsheet (Microsoft Excel or Google Sheets), you should use the appropriate formula: =stdev.p( for a population or =stdev.s( for a sample. We will concentrate on using and interpreting the information that the standard deviation gives us. However, you should study the following step-by-step example to help you understand how the standard deviation measures variation from the mean. (The technology instructions appear at the end of this example.)
Example \(\PageIndex{1}\)
In a fifth grade class, the teacher was interested in the average age and the sample standard deviation of the ages of her students. The following data are the ages for a SAMPLE of n = 20 fifth grade students. The ages are rounded to the nearest half year: 9; 9.5; 9.5; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5 \[\bar{x} = \dfrac{9+9.5(2)+10(4)+10.5(4)+11(6)+11.5(3)}{20} = 10.525 \nonumber\] The average age is 10.53 years, rounded to two places. The variance may be calculated by using a table. Then the standard deviation is calculated by taking the square root of the variance. We will explain the parts of the table after calculating s.
The sample variance, \(s^{2}\), is equal to the sum of the last column (9.7375) divided by the total number of data values minus one (20 – 1): \[s^{2} = \dfrac{9.7375}{20-1} = 0.5125 \nonumber\] The sample standard deviation s is equal to the square root of the sample variance: \[s = \sqrt{0.5125} = 0.715891 \nonumber\] and this is rounded to two decimal places, \(s = 0.72\). Typically, you do the calculation for the standard deviation on your calculator or computer. The intermediate results are not rounded. This is done for accuracy.
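The table-based calculation described above can be reproduced with a short Python sketch. It assumes the ages are grouped into (value, frequency) pairs taken from the data list in this example; the variable names are illustrative.

```python
import math

# (age, frequency) pairs for the sample of n = 20 fifth grade students
ages = [(9, 1), (9.5, 2), (10, 4), (10.5, 4), (11, 6), (11.5, 3)]

n = sum(f for _, f in ages)
mean = sum(x * f for x, f in ages) / n

# Build the table: data, frequency, deviation, squared deviation,
# and frequency times squared deviation.
print(f"{'x':>5} {'f':>3} {'x - mean':>9} {'(x - mean)^2':>13} {'f(x - mean)^2':>14}")
total = 0.0
for x, f in ages:
    dev = x - mean
    total += f * dev**2
    print(f"{x:>5} {f:>3} {dev:>9.3f} {dev**2:>13.6f} {f * dev**2:>14.6f}")

sample_variance = total / (n - 1)   # divide by n - 1 because this is a sample
s = math.sqrt(sample_variance)

print("sum of last column:", round(total, 4))
print("sample variance:", round(sample_variance, 4))
print("sample standard deviation:", round(s, 2))
```

The intermediate values are kept at full precision and only rounded when printed, which matches the advice not to round intermediate results.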
Solution: Spreadsheet (MS Excel/Google Sheets) (Part a only)
Solution: TI Graphing Calculator
Exercise 2.8.1
On a baseball team, the ages of each of the players are as follows: 21; 21; 22; 23; 24; 24; 25; 25; 28; 29; 29; 31; 32; 33; 33; 34; 35; 36; 36; 36; 36; 38; 38; 38; 40 Use your calculator or computer to find the mean and standard deviation. Then find the value that is two standard deviations above the mean.
Answer
\(\bar{x} = 30.68\), \(s = 6.09\), and \(\bar{x} + 2s = 30.68 + (2)(6.09) = 42.86\).
Explanation of the standard deviation calculation shown in the table
The deviations show how spread out the data are about the mean. The data value 11.5 is farther from the mean than is the data value 11, which is indicated by the deviations 0.975 and 0.475. A positive deviation occurs when the data value is greater than the mean, whereas a negative deviation occurs when the data value is less than the mean. The deviation is –1.525 for the data value nine. If you add the deviations, the sum is always zero. (For Example \(\PageIndex{1}\), there are \(n = 20\) deviations.) So you cannot simply add the deviations to get the spread of the data. By squaring the deviations, you make them positive numbers, and the sum will also be positive. The variance, then, is the average squared deviation.
The variance is a squared measure and does not have the same units as the data. Taking the square root solves the problem. The standard deviation measures the spread in the same units as the data.
Notice that instead of dividing by \(n = 20\), the calculation divided by \(n - 1 = 20 - 1 = 19\) because the data are a sample. For the sample variance, we divide by the sample size minus one (\(n - 1\)). Why not divide by \(n\)? The answer has to do with the population variance. The sample variance is an estimate of the population variance. Based on the theoretical mathematics that lies behind these calculations, dividing by (\(n - 1\)) gives a better estimate of the population variance.
Your concentration should be on what the standard deviation tells us about the data.
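Two points from the explanation above can be checked numerically: the deviations sum to zero, and dividing by \(n - 1\) gives a slightly larger result than dividing by \(n\). The sketch below uses the 20 ages from Example \(\PageIndex{1}\) and Python's standard `statistics` module, where `stdev` divides by \(n - 1\) and `pstdev` divides by \(n\).

```python
import statistics

# The 20 ages from the fifth grade sample
ages = [9, 9.5, 9.5, 10, 10, 10, 10, 10.5, 10.5, 10.5, 10.5,
        11, 11, 11, 11, 11, 11, 11.5, 11.5, 11.5]

mean = statistics.mean(ages)
deviations = [x - mean for x in ages]
dev_sum = sum(deviations)          # zero, up to floating-point rounding

s = statistics.stdev(ages)         # sample formula, denominator n - 1
sigma = statistics.pstdev(ages)    # population formula, denominator n

print("sum of deviations:", round(dev_sum, 10))
print("dividing by n - 1:", round(s, 4))
print("dividing by n:    ", round(sigma, 4))
```

Because the sum of the deviations carries no information about spread, the squared deviations are what feed into both formulas.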
The standard deviation is a number which measures how far the data are spread from the mean. Let a calculator or computer do the arithmetic. The standard deviation, \(s\) or \(\sigma\), is either zero or larger than zero. When the standard deviation is zero, there is no spread; that is, all the data values are equal to each other. The standard deviation is small when the data are all concentrated close to the mean, and is larger when the data values show more variation from the mean. When the standard deviation is a lot larger than zero, the data values are very spread out about the mean; outliers can make \(s\) or \(\sigma\) very large. The standard deviation, when first presented, can seem unclear. By graphing your data, you can get a better "feel" for the deviations and the standard deviation. You will find that in symmetrical distributions, the standard deviation can be very helpful but in skewed distributions, the standard deviation may not be much help. The reason is that the two sides of a skewed distribution have different spreads. In a skewed distribution, it is better to look at the first quartile, the median, the third quartile, the smallest value, and the largest value. Because numbers can be confusing, always graph your data. Display your data in a histogram or a box plot. Example \(\PageIndex{2}\) Use the following data (first exam scores) from Susan Dean's spring pre-calculus class: 33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90; 92; 94; 94; 94; 94; 96; 100
Answer
The long left whisker in the box plot is reflected in the left side of the histogram. The spread of the exam scores in the lower 50% is greater (\(73 - 33 = 40\)) than the spread in the upper 50% (\(100 - 73 = 27\)). The histogram, box plot, and chart all reflect this. There are a substantial number of A and B grades (80s, 90s, and 100). The histogram clearly shows this. The box plot shows us that the middle 50% of the exam scores (IQR = 29) are Ds, Cs, and Bs. The box plot also shows us that the lower 25% of the exam scores are Ds and Fs.
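The figures quoted in this answer (median 73, spreads of 40 and 27, IQR of 29) can be checked with a short Python sketch. It assumes the quartiles are computed as the medians of the lower and upper halves of the sorted data, leaving out the overall median when the count is odd; other software may use slightly different quartile rules.

```python
import statistics

# First exam scores from the example
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

scores = sorted(scores)
n = len(scores)
half = n // 2

median = statistics.median(scores)
lower = scores[:half]                                   # values below the median
upper = scores[half + 1:] if n % 2 else scores[half:]   # values above the median

q1 = statistics.median(lower)
q3 = statistics.median(upper)
iqr = q3 - q1

print("min, Q1, median, Q3, max:", scores[0], q1, median, q3, scores[-1])
print("spread of lower 50%:", median - scores[0])
print("spread of upper 50%:", scores[-1] - median)
print("IQR:", iqr)
```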
Exercise \(\PageIndex{2}\)
The following data show the number of different types of pet food that stores in the area carry. 6; 6; 6; 6; 7; 7; 7; 7; 7; 8; 9; 9; 9; 9; 10; 10; 10; 10; 10; 11; 11; 11; 11; 12; 12; 12; 12; 12; 12 Calculate the sample mean and the sample standard deviation to one decimal place using a TI-83+ or TI-84 calculator.
Answer
\(\bar{x} = 9.3\) and \(s = 2.2\)
Standard deviation of Grouped Frequency Tables
Recall that for grouped data we do not know individual data values, so we cannot describe the typical value of the data with precision. In other words, we cannot find the exact mean, median, or mode. We can, however, determine the best estimate of the measures of center by finding the mean of the grouped data with the formula: \[\text{Mean of Frequency Table} = \dfrac{\sum fm}{\sum f}\] where \(f =\) interval frequencies and \(m =\) interval midpoints. Just as we could not find the exact mean, neither can we find the exact standard deviation. Remember that standard deviation describes numerically the expected deviation a data value has from the mean. In simple English, the standard deviation allows us to judge how “unusual” an individual data value is compared to the mean.
Example \(\PageIndex{3}\)
Find the standard deviation for the data in Table \(\PageIndex{3}\). Table \(\PageIndex{3}\)
For this data set, we have the mean, \(\bar{x}\) = 7.58 and the standard deviation, \(s_{x}\) = 3.5. This means that a randomly selected data value would be expected to be about 3.5 units from the mean. If we look at the first class, we see that the class midpoint is equal to one. This is almost two full standard deviations from the mean since 7.58 – 3.5 – 3.5 = 0.58. While the formula for calculating the standard deviation is not complicated, \(s_{x} = \sqrt{\dfrac{\sum f(m - \bar{x})^{2}}{n-1}}\) where \(s_{x}\) = sample standard deviation, \(\bar{x}\) = sample mean, \(f\) = interval frequency, and \(m\) = interval midpoint, the calculations are tedious. It is usually best to use technology when performing the calculations.
Spreadsheets
For the previous example, we can use the spreadsheet to calculate the values in the table above, then plug the appropriate sums into the formula for sample standard deviation.
Graphing Calculator
Find the standard deviation for the data from the previous example.
First, press the STAT key and select 1:Edit.
Input the midpoint values into L1 and the frequencies into L2.
Select STAT, CALC, and 1: 1-Var Stats.
Press 2nd then 1, then the comma key, then 2nd then 2.
Press ENTER.
You will see displayed both a population standard deviation, \(\sigma_{x}\), and the sample standard deviation, \(s_{x}\).
Comparing Values from Different Data Sets
The standard deviation is useful when comparing data values that come from different data sets. If the data sets have different means and standard deviations, then comparing the data values directly can be misleading.
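#ofSTDEVs is often called a "z-score"; we can use the symbol \(z\). In symbols, the formulas become:
Sample: \(x = \bar{x} + zs\), so \(z = \dfrac{x - \bar{x}}{s}\)
Population: \(x = \mu + z\sigma\), so \(z = \dfrac{x - \mu}{\sigma}\)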
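As a minimal sketch, the z-score is the value minus the mean, divided by the standard deviation. The Python below applies that to the GPA figures used in the example that follows (John: 2.85 with school mean 3.0 and standard deviation 0.7; Ali: 77 with school mean 80 and standard deviation 10). The function name `z_score` is an illustrative choice.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


john = z_score(2.85, 3.0, 0.7)
ali = z_score(77, 80, 10)

print("John:", round(john, 2))
print("Ali: ", round(ali, 2))

# The two GPAs are on different scales, so compare the z-scores,
# not the raw values.
better = "John" if john > ali else "Ali"
print("Higher relative to his school:", better)
```

Comparing z-scores puts both values on a common scale of standard deviations from their own mean, which is what makes values from different data sets comparable.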
Example \(\PageIndex{4}\) Two students, John and Ali, from different high schools, wanted to find out who had the highest GPA when compared to his school. Which student had the highest GPA when compared to his school?
Answer
For each student, determine how many standard deviations (#ofSTDEVs) his GPA is away from the average for his school. Pay careful attention to signs when comparing and interpreting the answer. \[z = \text{#ofSTDEVs} = \left(\dfrac{\text{value} - \text{mean}}{\text{standard deviation}}\right) = \left(\dfrac{x - \mu}{\sigma}\right) \nonumber\] For John, \[z = \text{#ofSTDEVs} = \left(\dfrac{2.85-3.0}{0.7}\right) = -0.21 \nonumber\] For Ali, \[z = \text{#ofSTDEVs} = \left(\dfrac{77-80}{10}\right) = -0.3 \nonumber\] John's GPA is 0.21 standard deviations below his school's mean, while Ali's GPA is 0.3 standard deviations below his school's mean. John's z-score of –0.21 is higher than Ali's z-score of –0.3. For GPA, higher values are better, so we conclude that John has the better GPA when compared to his school.
Exercise \(\PageIndex{4}\)
Two swimmers, Angie and Beth, from different teams, wanted to find out who had the fastest time for the 50 meter freestyle when compared to her team. Which swimmer had the fastest time when compared to her team?
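Answer
For Angie: \[z = \left(\dfrac{26.2-27.2}{0.8}\right) = -1.25 \nonumber\] For Beth: \[z = \left(\dfrac{27.3-30.1}{1.4}\right) = -2 \nonumber\] Beth's z-score of –2 is lower than Angie's z-score of –1.25. For swim times, lower values are better, so Beth has the faster time when compared to her team.
The following lists give a few facts that provide a little more insight into what the standard deviation tells us about the distribution of the data.
For ANY data set, no matter what the distribution of the data is:
At least 75% of the data is within two standard deviations of the mean.
At least 89% of the data is within three standard deviations of the mean.
At least 95% of the data is within 4.5 standard deviations of the mean.
This is known as Chebyshev's Rule.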
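For data having a distribution that is BELL-SHAPED and SYMMETRIC:
Approximately 68% of the data is within one standard deviation of the mean.
Approximately 95% of the data is within two standard deviations of the mean.
More than 99% of the data is within three standard deviations of the mean.
This is known as the Empirical Rule.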
Review
The standard deviation can help you calculate the spread of data. There are different equations to use depending on whether you are calculating the standard deviation of a sample or of a population.
Formula Review
\[s_{x} = \sqrt{\dfrac{\sum f(m - \bar{x})^{2}}{n-1}}\] where \(s_{x} = \text{sample standard deviation}\) and \(\bar{x} = \text{sample mean}\)
Use the following information to answer the next two exercises: The following data are the distances between 20 retail stores and a large distribution center. The distances are in miles. 29; 37; 38; 40; 58; 67; 68; 69; 76; 86; 87; 95; 96; 96; 99; 106; 112; 127; 145; 150
Exercise 2.8.4
Use a graphing calculator or computer to find the standard deviation and round to the nearest tenth.
Answer
\(s\) = 34.5
Exercise 2.8.5
Find the value that is one standard deviation below the mean.
Exercise 2.8.6
Two baseball players, Fredo and Karl, on different teams wanted to find out who had the higher batting average when compared to his team. Which baseball player had the higher batting average when compared to his team?
Answer For Fredo: \(z\) = \(\dfrac{0.158-0.166}{0.012}\) = –0.67 For Karl: \(z\) = \(\dfrac{0.177-0.189}{0.015}\) = –0.8 Fredo’s z-score of –0.67 is higher than Karl’s z-score of –0.8. For batting average, higher values are better, so Fredo has a better batting average compared to his team. Exercise 2.8.7 Use Table to find the value that is three standard deviations:
Exercise 2.8.5 Find the standard deviation for the following frequency tables using the formula. Check the calculations with the TI 83/84.
Answer
Bringing It TogetherExercise 2.8.7 Twenty-five randomly selected students were asked the number of movies they watched the previous week. The results are as follows:
Answer
Exercise 2.8.8 Forty randomly selected students were asked the number of pairs of sneakers they owned. Let \(X =\) the number of pairs of sneakers owned. The results are as follows:
Exercise 2.8.9 Following are the published weights (in pounds) of all of the team members of the San Francisco 49ers from a previous year. 177; 205; 210; 210; 232; 205; 185; 185; 178; 210; 206; 212; 184; 174; 185; 242; 188; 212; 215; 247; 241; 223; 220; 260; 245; 259; 278; 270; 280; 295; 275; 285; 290; 272; 273; 280; 285; 286; 200; 215; 185; 230; 250; 241; 190; 260; 250; 302; 265; 290; 276; 228; 265
Answer
Exercise 2.8.10 One hundred teachers attended a seminar on mathematical problem solving. The attitudes of a representative sample of 12 of the teachers were measured before and after the seminar. A positive number for change in attitude indicates that a teacher's attitude toward math became more positive. The 12 change scores are as follows: 3; 8; –1; 2; 0; 5; –3; 1; –1; 6; 5; –2
Exercise 2.8.11 Refer to the figure and determine which of the following are true and which are false. Explain your solution to each part in complete sentences. <figure > </figure>
Answer
Exercise 2.8.12 In a recent issue of the IEEE Spectrum, 84 engineering conferences were announced. Four conferences lasted two days. Thirty-six lasted three days. Eighteen lasted four days. Nineteen lasted five days. Four lasted six days. One lasted seven days. One lasted eight days. One lasted nine days. Let X = the length (in days) of an engineering conference.
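Frequency data like this can be summarized with the frequency form of the standard deviation formula. The sketch below is one possible approach, assuming the 84 conferences are treated as a sample (so the denominator is \(n - 1\)); the dictionary name `length_freq` is illustrative.

```python
import math

# Conference length in days -> number of conferences with that length
length_freq = {2: 4, 3: 36, 4: 18, 5: 19, 6: 4, 7: 1, 8: 1, 9: 1}

n = sum(length_freq.values())
mean = sum(x * f for x, f in length_freq.items()) / n

# Frequency-weighted sum of squared deviations from the mean
sum_sq = sum(f * (x - mean) ** 2 for x, f in length_freq.items())
s = math.sqrt(sum_sq / (n - 1))    # sample formula (assumption)

print("n =", n)
print("mean =", round(mean, 2))
print("sample standard deviation =", round(s, 2))
```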
Exercise 2.8.13 A survey of enrollment at 35 community colleges across the United States yielded the following figures: 6414; 1550; 2109; 9350; 21828; 4300; 5944; 5722; 2825; 2044; 5481; 5200; 5853; 2750; 10012; 6357; 27000; 9414; 7681; 3200; 17500; 9200; 7380; 18314; 6557; 13713; 17768; 7493; 2771; 2861; 1263; 7285; 28165; 5080; 11622
Answer
Use the following information to answer the next two exercises. \(X =\) the number of days per week that 100 clients use a particular exercise facility.
Exercise 2.8.14 The 80th percentile is _____
Exercise 2.8.15 The number that is 1.5 standard deviations BELOW the mean is approximately _____
Answer a Exercise 2.8.16 Suppose that a publisher conducted a survey asking adult consumers the number of fiction paperback books they had purchased in the previous month. The results are summarized in the Table.
Glossary
Standard Deviation
a number that is equal to the square root of the variance and measures how far data values are from their mean; notation: s for sample standard deviation and σ for population standard deviation.
Variance
mean of the squared deviations from the mean, or the square of the standard deviation; for a set of data, a deviation can be represented as \(x\) – \(\bar{x}\) where \(x\) is a value of the data and \(\bar{x}\) is the sample mean. The sample variance is equal to the sum of the squares of the deviations divided by the difference of the sample size and one.
Contributors and Attributions
Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution 4.0 License. Download for free at http://cnx.org/contents/.
Which measure of variation represents the average distance from the mean?
The standard deviation is the most commonly used measure for variability. This measure is related to the distance between the observations and the mean.
What is a measure of variation from the mean?
In statistics, variance measures variability from the average or mean. It is calculated by taking the differences between each number in the data set and the mean, squaring the differences to make them positive, and then dividing the sum of the squares by the number of values in the data set (for a population) or by one less than the number of values (for a sample).
What is the average distance from the mean called?
The standard deviation measures the dispersion or variation of the values of a variable around its mean value (arithmetic mean). Put simply, the standard deviation can be thought of as the typical distance of the values in a data set from the mean.
What are the measures of variation?
Statisticians use summary measures to describe the amount of variability or spread in a set of data. The most common measures of variability are the range, the interquartile range (IQR), variance, and standard deviation.