
The following material is an excerpt from Chapter 10, on statistical concepts, of the HOA manual. The pages just before this section in the manual discussed Variance and the Standard Deviation.

Standard Deviation and the Normal Curve

Standard deviation also has many other useful applications. Statisticians have created a model for random events called the normal distribution. It describes mathematically how likely an experiment is to yield a given value, depending on how many standard deviations that value lies from the accepted average. If you connect the midpoints of the tops of the bars in a histogram, you get a curve. The bell-shaped curve that closely matches the distribution of many large sets of numbers is called the normal curve or bell curve.

For example, the chance of a coin toss resulting in "heads" is 50-50, so in 100 tosses you would expect about 50 heads. If you actually toss a coin 100 times and keep track of the number of times you get "heads," however, you probably will not get "heads" exactly 50 times. But if you repeat the experiment 1000 times (100,000 coin tosses in sets of 100 each) and then draw a relative frequency histogram of the number of "heads" per set, the histogram will closely follow a normal curve.

[Figure: normal curve]

The likelihood of a measurement being within a certain number (Z) of standard deviations from the average is assessed by finding the area under the bell curve between the points (-Z) and (Z). Statistical tables give this area for a range of possible values of Z. The area is given in percent (%) and should be interpreted as the probability that a value will fall within Z standard deviations of the average.
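The coin-toss experiment described above can be tried on a computer. The following is only a minimal sketch: the function name `heads_counts`, the fixed random seed, and the text histogram are illustrative choices, not part of the manual.

```python
import random
from collections import Counter
from statistics import mean, stdev

def heads_counts(num_sets=1000, tosses_per_set=100, seed=0):
    """Return the number of heads in each set of simulated fair coin tosses."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [sum(rng.random() < 0.5 for _ in range(tosses_per_set))
            for _ in range(num_sets)]

counts = heads_counts()
print(f"average heads per set: {mean(counts):.2f}")
print(f"standard deviation:    {stdev(counts):.2f}")

# Crude text histogram: one '#' for every 5 sets that gave this many heads
tally = Counter(counts)
for heads in range(min(tally), max(tally) + 1):
    print(f"{heads:3d} {'#' * (tally[heads] // 5)}")
```

The average should come out close to 50 and the standard deviation close to 5 (for a fair coin, the square root of 100 x 0.5 x 0.5), and the rows of `#` marks should form a rough bell shape centered near 50.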

In a bell-shaped histogram, we would expect about 68% of the data to lie within one standard deviation (the interval ± 1 SD), and almost 100% within three standard deviations (the interval ± 3 SD).
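These percentages do not have to be looked up in a table. For an ideal normal distribution, the area between (-Z) and (Z) equals erf(Z / √2), where erf is the standard error function. The sketch below uses Python's `math.erf`; the helper name `percent_within` is illustrative only.

```python
import math

def percent_within(z):
    """Percent of an ideal normal distribution within z standard deviations of the average."""
    return 100.0 * math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(f"within ±{z} SD: {percent_within(z):.1f}%")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```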

To understand what this means, consider the following set of data:

4.0, 3.9, 4.1, 4.0, 4.2, 3.9, 3.9, 4.1, 3.8, 4.0,

with an average of about 4.0 and a standard deviation of about 0.12.
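The quoted average and standard deviation can be checked with a few lines of Python. This sketch assumes the sample standard deviation (dividing by n - 1), which reproduces the 0.12 given here; dividing by n instead gives about 0.11.

```python
from statistics import mean, stdev, pstdev

data = [4.0, 3.9, 4.1, 4.0, 4.2, 3.9, 3.9, 4.1, 3.8, 4.0]

average = mean(data)
sd = stdev(data)  # sample standard deviation (n - 1 in the denominator)

print(f"average            = {average:.2f}")        # 3.99, which rounds to 4.0
print(f"standard deviation = {sd:.2f}")             # 0.12
print(f"dividing by n      = {pstdev(data):.2f}")   # 0.11
```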

If the measurements follow the normal distribution, then approximately:

  1. 68% of the measurements fall between 4.0 ± 0.12, or between 3.88 and 4.12;

  2. 95% of the measurements fall between 4.0 ± (2 x 0.12), or between 3.76 and 4.24;

  3. 99.7% of the measurements fall between 4.0 ± (3 x 0.12), or between 3.64 and 4.36.
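The three intervals listed above follow directly from the average and the standard deviation. As a rough check, the sketch below rebuilds them and counts how many of the ten sample measurements fall inside each one. The helper names `interval` and `count_within` are illustrative, and the rounded values 4.0 and 0.12 are taken from the text.

```python
data = [4.0, 3.9, 4.1, 4.0, 4.2, 3.9, 3.9, 4.1, 3.8, 4.0]
average, sd = 4.0, 0.12  # rounded values quoted in the text

def interval(k):
    """Lower and upper bounds of the interval average ± k standard deviations."""
    return average - k * sd, average + k * sd

def count_within(values, k):
    """Number of values lying within k standard deviations of the average."""
    low, high = interval(k)
    return sum(low <= v <= high for v in values)

for k in (1, 2, 3):
    low, high = interval(k)
    n = count_within(data, k)
    print(f"±{k} SD: {low:.2f} to {high:.2f} contains {n} of {len(data)} values")
# ±1 SD: 3.88 to 4.12 contains 8 of 10 values
# ±2 SD: 3.76 to 4.24 contains 10 of 10 values
# ±3 SD: 3.64 to 4.36 contains 10 of 10 values
```

With only ten measurements, the observed fractions (80%, 100%, 100%) match the normal-curve percentages only roughly.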

For a set of data to be judged normal, the number of data points should be large: at least 30, and the larger the better. Only then should an analysis of Z be made to determine whether the distribution is normal. The example we just considered is not a good representation of a normal distribution, even though it may give a bell-shaped histogram, because it has fewer than 30 data points. So let us assume that we had a large number of observations and obtained a normal curve. Now the question is: why is it so important to have a normal curve?

This concept is critical to assessing the validity of measurements, since it helps to detect errors. Almost 100% of the data will fall within three standard deviations of the average (here, between 3.64 and 4.36), so if we get a measurement of 4.4 in our sample data, we can suspect that the measurement is in error. However, we have to be very careful to determine whether there is any valid reason to discard this measurement. Not all unlikely measurements are incorrect. To determine the validity of the results, the standard error of the average is calculated.
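The three-standard-deviation check can be written as a short test. This is only a sketch: the function names and the default limit of 3 are illustrative choices, and a flagged value is a candidate for closer inspection, not something to discard automatically.

```python
def z_score(value, average, sd):
    """Number of standard deviations by which value differs from the average."""
    return (value - average) / sd

def is_suspect(value, average, sd, limit=3.0):
    """True if value lies more than `limit` standard deviations from the average."""
    return abs(z_score(value, average, sd)) > limit

average, sd = 4.0, 0.12  # rounded values from the sample data above

for measurement in (4.2, 4.4):
    z = z_score(measurement, average, sd)
    note = "suspect, check it" if is_suspect(measurement, average, sd) else "plausible"
    print(f"{measurement}: Z = {z:.2f} ({note})")
# 4.2: Z = 1.67 (plausible)
# 4.4: Z = 3.33 (suspect, check it)
```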

The next section of the manual is Core Activity 10.5: The Standard Error of the Average—The Error Bar.
