
INEEL Annual Site Environmental Report - 2002
Appendix B - Statistical Methods used in the Idaho National Engineering and Environmental Laboratory Annual Site Environmental Report


Relatively simple statistical procedures are used to analyze the data collected by the Idaho National Engineering and Environmental Laboratory (INEEL) Environmental Surveillance, Education and Research (ESER) program. ESER program personnel initially review field collection information and analytical results to determine whether there are clearly identifiable errors that would invalidate or limit the use of the results. Examples of these might be power outages at air sampler locations, torn membrane filters, or evidence of laboratory cross-contamination. Data that pass this initial screening are then evaluated for statistical significance with respect to laboratory analytical uncertainties, sample locations, reported releases from INEEL operations, meteorological data, and worldwide events that might conceivably have an effect on the regional environment.

Reporting Results

The results reported in the quarterly and annual reports are assessed for data quality and for statistical significance with respect to laboratory analytical uncertainties, sample locations, reported INEEL releases, meteorological data, and worldwide events that might conceivably affect the INEEL environment. Field collection and laboratory information are first reviewed for identifiable errors that would invalidate or limit use of the data; examples include insufficient sample volume, torn filters, evidence of laboratory cross-contamination, and quality control issues. Data that pass this initial screening are further evaluated using statistical methods. Statistical tools are particularly necessary because environmental measurements typically involve minute concentrations, which are difficult to detect and even more difficult to distinguish from other measurements.

The term "measurable" as used in this report does not imply any degree of risk to the public or the environment; it indicates only that the radionuclide was detected at a concentration sufficient for the analytical instrument to record a value. The minimum detectable concentration (MDC) is used to assess measurement process capabilities and indicates the ability of the laboratory to detect an analyte in a sample at the desired concentration levels. The ESER program requires that the laboratory be able to detect radionuclides at levels below those normally expected in environmental samples, as observed historically in the region; these levels are typically well below regulatory limits. The MDC is instrument and analysis specific. It is established by the analytical laboratory at the beginning of each analytical run and represents the concentration above which there is greater than 99.99 percent confidence that an analyte in a sample can be accurately measured.

It is the goal of the ESER program to minimize, to the extent reasonable and practicable, the error of concluding that something is not present when it actually is. This is accomplished through the use of the uncertainty term, which is reported by the analytical laboratory with the sample result. For radiological data, individual analytical results are presented in this report with plus or minus two analytical standard deviations (± 2s). Where all analytical uncertainties have been estimated, "s" is an estimate of the population standard deviation "σ," assuming a Gaussian (normal) distribution. The result plus or minus (±) the uncertainty term (2s) represents the 95 percent confidence interval for the measurement. That is, there is 95 percent confidence that the real concentration in the sample lies somewhere between the measured concentration minus the uncertainty term and the measured concentration plus the uncertainty term. By using a 2s value as a reporting level, the error rate for concluding that something is not there when it is remains below 5 percent. However, there may be a relatively high error rate for false detections (reporting something as present when it actually is not) for results near their 2s uncertainty levels. This is because the variability around the sample result may substantially overlap the variability around a net activity of zero for samples with no radioactivity. If the result lies in the range of two to three times its estimated analytical uncertainty (2s to 3s), and assuming that the result belongs to a Gaussian distribution (a bell-shaped curve), detection of the material by the analysis may be questionable because of statistical variations within the group of samples. Analyses with results in the questionable range (2s to 3s) are thus presented in this report with the understanding that the radionuclide may not actually be present in the sample.
If the result exceeds 3s, there is higher confidence that the material was detected (or, that the radionuclide was indeed present in the sample). If a result is less than or equal to 2s there is little confidence that the radionuclide is present in the sample.
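The 2s/3s decision rule described above can be sketched in a few lines of Python. The thresholds follow the convention stated in the text; the numeric results and uncertainties below are hypothetical, for illustration only.

```python
def classify_detection(result, s):
    """Classify a radiological result against its 1-sigma analytical
    uncertainty s, following the 2s/3s convention described above.
    Returns the ~95% confidence interval and a qualitative label."""
    ci = (result - 2 * s, result + 2 * s)  # result +/- 2s: 95% confidence interval
    if result > 3 * s:
        label = "detected"        # high confidence the radionuclide is present
    elif result > 2 * s:
        label = "questionable"    # may not actually be present in the sample
    else:
        label = "not detected"    # little confidence the radionuclide is present
    return ci, label

# Hypothetical results (coded units) with 1-sigma uncertainty of 2.0
print(classify_detection(10.0, 2.0))  # above 3s: detected
print(classify_detection(5.0, 2.0))   # between 2s and 3s: questionable
print(classify_detection(3.0, 2.0))   # at or below 2s: not detected
```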

Many factors can influence a result to some degree; these factors are considered and included in the methods used to determine the estimated uncertainty of the measurement. Uncertainties in measurements near the MDC are caused primarily by counting statistics. For low concentrations near the MDC, the uncertainty in the measurement is nearly equal to the measurement itself, and the lower limit of the range of the measurement approaches zero. Such values may not be very reliable, because the uncertainty is only an estimate and the actual probability distribution of the results is usually not known. In reality, the material being measured may not actually be present in the sample (a false positive). Therefore, when analytical results are very near the MDC, statistical tools, meteorological data, and INEEL release information are all considered when interpreting and evaluating the results.

Statistical Tests

An example data set is presented here to illustrate the statistical tests used to assess data collected by the ESER contractor. The data set consists of the gross beta environmental surveillance data collected weekly from January 8, 1997, through December 26, 2001, at several air monitoring stations located around the perimeter of the INEEL and at air monitoring stations throughout the Snake River Plain. The perimeter locations are termed "boundary" and the plain locations are termed "distant." There are seven boundary locations (Arco, Atomic City, Birch Creek, Federal Aviation Administration [FAA] Tower, Howe, Monteview, and Mud Lake) and five distant locations (Blackfoot, Blackfoot Community Monitoring Station [CMS], Craters of the Moon, Idaho Falls, and Rexburg CMS). The gross beta data are on the order of magnitude of 10⁻¹⁵. To simplify the calculations and interpretation, the data have been coded by multiplying each measurement by 10¹⁵.

Only portions of the complete gross beta data set are used, since the purpose here is to illustrate the various statistical procedures, not to present a complete analysis of the data.


Test of Normality

The first step in any analysis of data is a test for normality, because many standard statistical tests of significance require that the data be normally distributed. The most widely used test of normality is the Shapiro-Wilk W test (Shapiro and Wilk 1965), which is preferred because of its good power properties compared with a wide range of alternative tests (Shapiro et al. 1968). If the W statistic is significant (i.e., the p-value is below the chosen significance level, commonly 0.05), then the hypothesis that the respective distribution is normal should be rejected.

Graphical depictions of the data should be part of any evaluation of normality. The histogram in Figure B-1 presents such a graphical look along with the results of the Shapiro-Wilk W test. The data used for the illustration are the five years of weekly gross beta measurements for the Arco boundary location. The W statistic is highly significant (p < 0.0001), indicating that the data are not normally distributed. The histogram shows that the data are asymmetrical, with right skewness, which suggests that the data may be lognormally distributed. The Shapiro-Wilk W test can check this by taking the natural logarithm of each measurement and recalculating the W statistic. Figure B-2 presents this test of lognormality. The W statistic is not significant (p = 0.80235), so the hypothesis of lognormality is not rejected and the data may reasonably be treated as lognormal.
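The two-stage check — test the raw data for normality, then test the log-transformed data — can be sketched with SciPy's implementation of the Shapiro-Wilk W test. The simulated lognormal data below merely stand in for the Arco weekly measurements, which are not reproduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated gross beta concentrations (coded units): lognormal values
# standing in for roughly five years of weekly measurements.
data = rng.lognormal(mean=1.0, sigma=0.5, size=260)

w_raw, p_raw = stats.shapiro(data)           # test of normality
w_log, p_log = stats.shapiro(np.log(data))   # test of lognormality

print(f"raw data: W = {w_raw:.4f}, p = {p_raw:.6f}")
print(f"log data: W = {w_log:.4f}, p = {p_log:.6f}")
# A small p-value (e.g., p < 0.05) rejects normality; the log-transformed
# data should pass, mirroring the pattern seen for the Arco location.
```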

To perform parametric tests of significance, such as Student's t test or One-Way Analysis of Variance (ANOVA), it is required that all data be normally (or lognormally) distributed. Therefore, to compare gross beta results of each boundary location, tests of normality must be performed before such comparisons are made. Table B-1 presents the results of the Shapiro-Wilk W test for each of the seven boundary locations.

From Table B-1, none of the locations have data that are normally distributed, and only some of the data sets are lognormally distributed. This is a typical result and a common problem when using parametric tests of significance. When many comparisons are to be made, nonparametric tests of significance are attractive alternatives.



Comparison of Two Groups

For comparison of two groups, the Mann-Whitney U test (Hollander and Wolfe 1973) is a powerful nonparametric alternative to the Student's t test. In fact, the U test is the most powerful (or sensitive) nonparametric alternative to the t test for independent samples; in some instances it may offer even greater power to reject the null hypothesis than the t test. The interpretation of the Mann-Whitney U test is essentially identical to that of the Student's t test for independent samples, except that the U test is computed from rank sums rather than means. Because of this, outliers do not present the serious problem that they do for parametric tests.

Suppose we wish to compare all boundary locations to all distant locations. Figure B-3 presents the box plots for the two groups. The median is the measure of central tendency most commonly used when there is no assumed distribution; it is the middle value when the data are ranked from smallest to largest. The 25th and 75th percentiles are the values such that 75 percent of the measurements in the data set are greater than the 25th percentile and 75 percent of the measurements are less than the 75th percentile. The large distance between the medians and the maximums seen in Figure B-3 indicates the presence of outliers. It is apparent that the medians are of the same magnitude, indicating graphically that there is probably not a significant difference between the two groups.
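These rank-based summaries are straightforward to compute. The sketch below uses two small invented groups (not the ESER measurements) and shows why the median is robust: the outlier inflates the boundary group's maximum but barely moves its median.

```python
import numpy as np

# Hypothetical coded gross beta measurements for two location groups;
# 95.0 in the boundary group is a deliberate outlier.
boundary = np.array([18.0, 22.5, 19.1, 25.3, 21.0, 95.0, 20.2])
distant = np.array([17.5, 23.0, 20.8, 24.1, 19.9, 21.7])

for name, grp in [("boundary", boundary), ("distant", distant)]:
    q25, med, q75 = np.percentile(grp, [25, 50, 75])
    print(f"{name}: 25th = {q25:.2f}, median = {med:.2f}, "
          f"75th = {q75:.2f}, max = {grp.max():.1f}")
# The outlier dominates the boundary maximum but not the boundary median,
# which is why rank-based summaries are preferred for data like these.
```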

The Mann-Whitney U test compares the rank sums between the two groups. In other words, for both groups combined, it ranks the observations from smallest to largest. Then it calculates the sum of the ranks for each group and compares these rank sums. A significant p-value (p < 0.05) indicates a significant difference between the two groups. The p-value for the comparison of boundary and distant locations is not significant (p = 0.0599). Therefore, the conclusion is that there is not strong enough evidence to say that a significant difference exists between boundary and distant locations.
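SciPy's `mannwhitneyu` implements this rank-sum comparison. The sketch below applies it to two small hypothetical groups (invented values, not the actual boundary and distant data); note that the outlier contributes only its rank, so it does not dominate the test.

```python
from scipy import stats

# Hypothetical coded gross beta values for two small groups (illustration
# only; the actual ESER comparison uses years of weekly measurements).
boundary = [18.0, 22.5, 19.1, 25.3, 21.0, 20.2, 95.0]  # 95.0 is an outlier
distant = [17.5, 23.0, 20.8, 24.1, 19.9, 21.7]

# Two-sided test: are the two groups drawn from the same distribution?
u, p = stats.mannwhitneyu(boundary, distant, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}")
# A p-value >= 0.05 means there is not strong enough evidence of a
# difference between the groups, matching the report's conclusion.
```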


Comparison of Many Groups

Comparing the boundary locations amongst themselves in the parametric realm is done with a One-Way ANOVA. A nonparametric alternative to the One-Way ANOVA is the Kruskal-Wallis ANOVA (Hollander and Wolfe 1973). The test assesses the hypothesis that the different samples in the comparison were drawn from the same distribution or from distributions with the same median. Thus, the interpretation of the Kruskal-Wallis ANOVA is basically identical to that of the parametric One-Way ANOVA, except that it is based on ranks rather than means.

Figure B-4 presents the box plot for the boundary locations. The Kruskal-Wallis ANOVA test statistic is highly significant (p < 0.0001), indicating a significant difference amongst the seven boundary locations. Table B-2 gives the number of samples, medians, minimums, and maximums for each boundary location. The Kruskal-Wallis ANOVA only indicates that significant differences exist between the seven locations and not the individual occurrences of differences. If desired, the next step is to identify pairs of locations of interest and test those for significant differences using the Mann-Whitney U test. It is cautioned that all possible pairs should not be tested, only those of interest. As the number of pairs increases, the probability of a false conclusion also increases.

Suppose a comparison between Arco and Atomic City is of special interest because of their close proximity. A test of significance using the Mann-Whitney U test results in a p-value of 0.7288, indicating that a significant difference does not exist between gross beta results at Arco and Atomic City. Other pairs can similarly be tested but with the caution given above.
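Both steps — the overall Kruskal-Wallis test, then a follow-up Mann-Whitney U test on one pair of interest — can be sketched as follows. The location names are real, but the values are invented for illustration, with one group deliberately shifted higher so the overall test is significant.

```python
from scipy import stats

# Hypothetical coded gross beta values for three boundary locations
# (the report uses seven locations and far more data).
arco = [18.0, 22.5, 19.1, 25.3, 21.0]
atomic_city = [17.5, 23.0, 20.8, 24.1, 19.9]
howe = [30.2, 33.5, 29.8, 35.1, 31.6]  # deliberately shifted higher

# Kruskal-Wallis: do the groups share a common distribution/median?
h, p = stats.kruskal(arco, atomic_city, howe)
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.4f}")

# A significant result says only that *some* difference exists; a
# follow-up Mann-Whitney U test on a pair of interest locates it.
u, p_pair = stats.mannwhitneyu(arco, atomic_city, alternative="two-sided")
print(f"Arco vs. Atomic City: p = {p_pair:.4f}")
```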


Tests for Trends Over Time

Regression analysis is used to test whether or not there is a significant positive or negative trend in gross beta concentrations over time. To illustrate the technique, the regression analysis is performed for the boundary locations as one group and the distant locations as another group. The tests of normality performed earlier indicated that the data were closer to lognormal than normal. For that reason, the natural logarithms of the original data are used in the regression analysis. Regression analysis assumes that the probability distributions of the dependent variable (gross beta) have the same variance regardless of the level of the independent variable (collection date). The natural logarithmic transformation helps in satisfying this assumption.

Figure B-5 presents a scatterplot of the boundary data with the fitted regression line superimposed. Figure B-6 presents the same for the distant data. Table B-3 gives the regression equation and associated statistics. There appears to be a slightly increasing trend in gross beta over time for both the boundary and distant locations; the regression equations and correlation coefficients in Table B-3 confirm this. The slope of a linear regression fit is proportional to the correlation coefficient (b = r·s_y/s_x, where s_y and s_x are the standard deviations of the dependent and independent variables), so the two always share the same sign, and a test of significant correlation is also a test of significant trend. The p-value for testing whether the correlation coefficient differs from zero is the same as the p-value for testing whether the slope of the regression line differs from zero. For both the boundary and distant locations, the slope is significantly different from zero and positive, indicating an increasing trend in gross beta over time.
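The log-transform-and-regress procedure can be sketched with SciPy's `linregress`, which reports the slope, the correlation coefficient, and their common p-value in one call. The simulated series below (a small upward trend plus lognormal noise) merely stands in for the actual weekly data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated weekly series: a gentle exponential trend with lognormal
# noise, standing in for five years of coded gross beta measurements.
weeks = np.arange(260)
concentration = np.exp(1.0 + 0.002 * weeks + rng.normal(0.0, 0.3, size=260))

# Regress the natural log of concentration on time, as in the report.
fit = stats.linregress(weeks, np.log(concentration))
print(f"slope = {fit.slope:.5f}, r = {fit.rvalue:.3f}, p = {fit.pvalue:.6f}")
# The p-value tests slope != 0, which is the same test as correlation != 0;
# a significant positive slope indicates an increasing trend over time.
```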

Another important point of note in Figures B-5 and B-6 is the obvious existence of a cyclical trend in gross beta. It appears as if the gross beta measurements are highest in the summer months and lowest in the winter months. Since the regression analysis performed above is over several years, we are still able to detect a positive trend over time even though it is confounded somewhat by the existence of a cyclical trend. This is important because a linear regression analysis performed over a shorter time period may erroneously conclude a significant trend, when in fact, it is just a portion of the cyclical trend.


Comparison of Slopes

A comparison of slopes between the regression lines for the boundary locations and distant locations will indicate if the rate of change in gross beta over time differs with location. The comparison of slopes can be performed by constructing 95 percent confidence intervals about the slope parameter (Neter and Wasserman 1974). If these intervals overlap, we can conclude that there is no evidence to suggest a difference in slopes for the two groups of locations.

A confidence interval for the slope is constructed as

b − t(0.025, n−2)·s_b ≤ β ≤ b + t(0.025, n−2)·s_b

where:

b = point estimate of the slope;

t(0.025, n−2) = the Student's t-value associated with two-sided 95 percent confidence and n−2 degrees of freedom;

s_b = the standard deviation of the slope estimate, b; and

β = the true slope, which is unknown.

Table B-4 gives the values used in constructing the confidence intervals and the resulting intervals. Because the confidence intervals for the slope overlap (fifth column of Table B-4), we can conclude that there is no evidence of a difference in the rate of change in gross beta measurements between the two location groupings, boundary and distant.
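The confidence-interval construction and overlap check can be sketched as follows. The two simulated log-concentration series share the same underlying trend (hypothetical data, not the values in Table B-4), so their slope intervals should overlap.

```python
import numpy as np
from scipy import stats

def slope_ci(x, y, alpha=0.05):
    """Confidence interval for the regression slope:
    b - t(alpha/2, n-2)*s_b <= beta <= b + t(alpha/2, n-2)*s_b."""
    fit = stats.linregress(x, y)
    t = stats.t.ppf(1 - alpha / 2, len(x) - 2)  # two-sided critical value
    return fit.slope - t * fit.stderr, fit.slope + t * fit.stderr

rng = np.random.default_rng(1)
weeks = np.arange(260)
# Two simulated log-concentration series with the same true slope (0.002).
log_boundary = 0.002 * weeks + rng.normal(0.0, 0.3, size=260)
log_distant = 0.002 * weeks + rng.normal(0.0, 0.3, size=260)

lo1, hi1 = slope_ci(weeks, log_boundary)
lo2, hi2 = slope_ci(weeks, log_distant)
overlap = max(lo1, lo2) <= min(hi1, hi2)
print(f"boundary slope CI: ({lo1:.5f}, {hi1:.5f})")
print(f"distant slope CI:  ({lo2:.5f}, {hi2:.5f})")
print("intervals overlap" if overlap else "intervals do not overlap")
# Overlapping intervals: no evidence that the rates of change differ.
```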


References

Hollander, M. and Wolfe, D.A., 1973, Nonparametric Statistical Methods, New York: John Wiley and Sons, Inc.

Neter, J. and Wasserman, W., 1974, Applied Linear Statistical Models, Homewood, Illinois: Richard D. Irwin, Inc.

Shapiro, S.S. and Wilk, M.B., 1965, “An Analysis of Variance Test for Normality (complete samples),” Biometrika, 52, 591-611.

Shapiro, S.S., Wilk, M.B., and Chen, H.J., 1968, “A Comparative Study of Various Tests of Normality,” Journal of the American Statistical Association, 63, 1343-1372.
