Statistics Methods
National Ground-Water Monitoring Network Water-Level Statistics
Monthly and summary statistics are computed for National Ground-Water Monitoring Network (NGWMN) sites that meet certain criteria. These statistics are displayed on the site page for each well or spring. The monthly statistics are also used to produce a map layer showing the status of wells. These statistics allow the most recent water levels to be placed in a historical context regardless of measurement frequency.
Water-Level Data Used in Computations
All available continuous and periodic water-level measurements in the National Ground-Water Monitoring Network are used to compute summary statistics and monthly percentiles.
Water-level data provided by USGS come from the USGS National Water Information System database. This database contains the status of each water-level measurement or daily value computation. Data that have not been reviewed for accuracy according to USGS policies are considered Provisional. Data that have been reviewed for accuracy by the collecting Water Science Center are considered Approved. For USGS sites, only Approved discrete and daily values are used in the computation of summary and monthly statistics.
In contrast to USGS data, water-level data provided by non-USGS data providers do not always contain an indication of the review status of the water-level measurement. If a review status is available, it is used and only Approved water-levels are used in statistical computations. However, if a status flag is not available, all water-levels are assumed to be Approved and are used in calculations of statistics.
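The selection rule described above can be sketched in Python as follows. This is a minimal illustration and not the Portal's implementation: the record layout (a dictionary with a "status" key) and the function name are hypothetical, and excluding a USGS record that lacks a status is an assumption, because the text describes missing flags only for non-USGS providers.

```python
def usable_for_statistics(records, is_usgs):
    """Return the records that would enter the statistical computations.

    records: list of dicts with a hypothetical "status" key whose value is
             "Approved", "Provisional", or None when no flag was provided.
    is_usgs: True when the records come from USGS.
    """
    kept = []
    for record in records:
        status = record.get("status")
        if status == "Approved":
            kept.append(record)
        elif status is None and not is_usgs:
            # Non-USGS data without a status flag are assumed to be Approved.
            kept.append(record)
        # Provisional data (and, by assumption, unflagged USGS data) are skipped.
    return kept
```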
Statistics
The statistical approach used to present water-level measurements reflects the diversity of the datasets available for individual well networks across the United States. Ideally and for statistical simplicity, hourly or daily values would be available for 30 or more years from every well in every network, and every daily value would be computed using the same daily-value statistic. However, such temporal density is not always available and ad hoc design decisions in a processing algorithm are required.
Water-level measurements are evaluated within a monthly statistical framework. This process permits a uniform approach across the Nation for wells with frequent data collection and wells with infrequent data collection. The percentile provides the underpinning of a statistical framework. A percentile is a value on a scale of 0 to 100 that indicates the cumulative percentage of a distribution that is equal to or below that value. For example, in the table below, a water-level greater than the 90th percentile value indicates that the water-level is equal to or greater than 90 percent of the monthly median groundwater levels in that month over the period of record at the well. In general, and for the purposes here, the following labels are associated with various percentile intervals:
- a percentile greater than 90 is considered much above normal,
- a percentile greater than 75 and less than 90 is considered above normal,
- a percentile between 25 and 75 is considered normal,
- a percentile between 10 and 25 is considered below normal, and
- a percentile less than 10 is considered much below normal.
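As an illustration, the labels listed above can be expressed as a small Python function. The function name is hypothetical, and the handling of values that fall exactly on a boundary (10, 25, 75, or 90) is an assumption, because the list does not say which interval owns each boundary.

```python
def percentile_label(percentile):
    """Map a percentile (0 to 100) to the label used in the list above."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile > 90:
        return "much above normal"
    if percentile > 75:
        # Assumption: exactly 90 is grouped with "above normal".
        return "above normal"
    if percentile >= 25:
        # Assumption: exactly 25 and exactly 75 are grouped with "normal".
        return "normal"
    if percentile >= 10:
        # Assumption: exactly 10 is grouped with "below normal".
        return "below normal"
    return "much below normal"
```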
Computation of Percentiles
An example of the computed monthly statistics is shown in figure 1.
There are two computation steps. Step 1: Compute the median for each month of every year in which data were collected. Step 2: Calculate the 10th, 25th, 50th, 75th, and 90th percentiles for each calendar month from the monthly medians, so that a month with multiple values in a given year is represented by its median.
Monthly medians are calculated for each individual month/year combination using all data available for that month and year. This may include a combination of discrete and daily value data. If the median falls on an indexed value, then it is set to that value. If the median falls between the two most central values, then the mean of those two values is used. Rounding and significant figures rules are then applied to these medians. These medians can be found on the site page under the "Water-Level Statistics" section by expanding the "Median Water-Levels" section.
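The first computation step can be sketched in Python as follows. The function name and the input layout (pairs of date and value) are assumptions made for illustration, and the rounding and significant figures rules are not applied here. The standard-library `statistics.median` returns the middle value for an odd count and the mean of the two central values for an even count, which matches the rule described above.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def monthly_medians(measurements):
    """Group (date, value) pairs by year and month and take each group's median.

    Discrete and daily values are treated alike; returns {(year, month): median}.
    """
    groups = defaultdict(list)
    for day, value in measurements:
        groups[(day.year, day.month)].append(value)
    return {key: median(values) for key, values in sorted(groups.items())}


example = [
    (date(2020, 1, 5), 10.0),
    (date(2020, 1, 20), 12.0),   # two values in January: mean of the two
    (date(2020, 2, 1), 8.0),
    (date(2020, 2, 10), 9.0),
    (date(2020, 2, 20), 20.0),   # three values in February: the middle one
]
result = monthly_medians(example)
```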
Percentiles are calculated for a given site and given month using the National Institute of Standards and Technology's (NIST) method [1]. The NIST handbook states the following: For a series of measurements Y1, ..., YN, denote the data ordered in increasing (numerical ascending) order by Y[1], ..., Y[N]. These ordered data are called order statistics.
Order statistics provide a way of estimating proportions of the data that should fall above and below a given value, called a percentile. The pth percentile is a value, Y(p), such that at most (100p) percent of the measurements are less than this value and at most 100(1- p) percent are greater. The 50th percentile is called the median.
Percentiles split a set of ordered data into hundredths. (Deciles split ordered data into tenths). For example, 70% of the data should fall below the 70th percentile. Percentiles are estimated from N measurements as follows:
For the pth percentile, set p (N+1) equal to k + d for k an integer, and d, a fraction greater than or equal to 0 and less than 1. [In other words, k is the integer part of the product of p and (N+1), and d is the decimal part]
For 0 < k < N, Y(p) = Y[k] + d(Y[k+1] - Y[k])
For k = 0, Y(p) = Y[1]
For k = N, Y(p) = Y[N]
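The NIST rule above can be sketched in Python as follows. This is an illustrative sketch, not the Portal's code: the function name is hypothetical, and the percentile is passed as a whole number (for example 25, so that p = 25/100) so that k and d can be obtained with exact integer arithmetic.

```python
def nist_percentile(values, pct):
    """Estimate the pct-th percentile (pct is an integer from 1 to 99).

    With p = pct / 100, p * (N + 1) is split into its integer part k and
    fractional part d, and adjacent order statistics are interpolated.
    """
    ordered = sorted(values)          # Y[1], ..., Y[N] (1-based in the text)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one value is required")
    k, remainder = divmod(pct * (n + 1), 100)
    d = remainder / 100
    if k == 0:
        return ordered[0]             # Y(p) = Y[1]
    if k >= n:
        return ordered[-1]            # Y(p) = Y[N]
    # Y(p) = Y[k] + d * (Y[k+1] - Y[k]), shifted to 0-based indexing
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])
```

For four values 10, 20, 30, and 40, the 50th percentile gives p(N+1) = 2.5, so k = 2 and d = 0.5, and the result is 20 + 0.5(30 - 20) = 25.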
Significant Figures and Rounding
All decimal places in a water level available in the portal are considered significant (e.g., 3.75, 27.0, and 1.03 are all considered to have 3 significant digits). When used in calculations, significant digits are preserved after each calculation and rounded as necessary [2]. When necessary, numbers are rounded towards their "nearest neighbor" unless both neighbors are equidistant, in which case they are rounded down to the neighbor of lesser magnitude (e.g., 9.005 is rounded to 9.00, while 9.0051 is rounded to 9.01).
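The tie-breaking rule can be illustrated with Python's `decimal` module, whose `ROUND_HALF_DOWN` mode rounds to the nearest value and resolves ties towards zero. This sketch rounds to a fixed number of decimal places only; it does not track significant digits through a calculation, and the function name is hypothetical. Values are passed as text so that binary floating-point representation does not disturb the tie.

```python
from decimal import Decimal, ROUND_HALF_DOWN


def round_half_down(value, places):
    """Round to `places` decimals; exact ties go to the lesser magnitude."""
    quantum = Decimal(1).scaleb(-places)   # places=2 gives Decimal("0.01")
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_DOWN)


tie = round_half_down("9.005", 2)       # equidistant neighbors: 9.00 and 9.01
above_tie = round_half_down("9.0051", 2)
```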
At this time, not all data provided for the NGWMN display water-level data using significant figures that reflect the accuracy of the data. We are in the process of working with our data providers to ensure that the precision shown in the data they provide reflects the accuracy of the measurement. In some cases, more than two decimal places are shown in discrete or daily value water-level data, and sometimes fewer.
Criteria for Inclusion of Sites
Sites are used for water-level percentiles if they meet the following:
- the site has been measured in the last 406 days, and
- greater than 10 individual years have at least one measurement in a month.
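A minimal Python sketch of these two checks follows. It rests on several assumptions: the function name and arguments are hypothetical, the year count is evaluated for one calendar month at a time, "in the last 406 days" is taken to include a measurement exactly 406 days old, and "greater than 10" is applied literally.

```python
from datetime import date, timedelta


def meets_inclusion_criteria(measurement_dates, month, today):
    """Check both criteria for one site and one calendar month (1 to 12)."""
    if not measurement_dates:
        return False
    # Criterion 1: the most recent measurement is no older than 406 days.
    recently_measured = today - max(measurement_dates) <= timedelta(days=406)
    # Criterion 2: count distinct years with a measurement in this month.
    years_with_data = {d.year for d in measurement_dates if d.month == month}
    return recently_measured and len(years_with_data) > 10
```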
Map Layer of Water-Level Status
The NGWMN Data Portal now provides the capability to visualize this information on a map. Monthly percentiles are calculated for water-level measurements as an indication of the status of the water level at a site. The new map layer is generated by comparing the latest water-level measurement for a site to the water-level percentiles calculated for that site in the month of the latest measurement. The color of the point representing the site on the map corresponds to the percentile range in which that measurement falls. An example of the status map is shown in figure 2.
The water-level category "Low" indicates that the most recent groundwater-level measurement is lower than the lowest monthly median groundwater level in the month of measurement over the period of time that the well has been measured. Similarly, the water-level category "High" indicates that the most recent groundwater-level measurement is higher than the highest monthly median groundwater level in the month of measurement over the period of time that the well has been measured. The water-level category "Not Ranked" indicates that a water-level category has not been computed and, typically, means there have not been at least 10 years of water-level measurements in the month of the most recent measurement or that no data have been collected in the past 406 days. The criteria used to select sites shown on the map are discussed in the "Criteria for Inclusion of Sites" section above.
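The comparison behind the map layer might look like the following Python sketch. It is illustrative only and rests on stated assumptions: the function names are hypothetical; values are assumed to be oriented so that a larger number means a higher water level (for depth-to-water values the comparisons would be reversed); the minimum number of years defaults to the "at least 10" wording used above; the 406-day recency check is omitted; and a value exactly on a percentile boundary is assigned to the interval nearer to normal.

```python
def nist_percentile(ordered, pct):
    """NIST percentile of an already sorted list; pct is an integer 1 to 99."""
    n = len(ordered)
    k, remainder = divmod(pct * (n + 1), 100)
    if k == 0:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return ordered[k - 1] + (remainder / 100) * (ordered[k] - ordered[k - 1])


def water_level_status(latest, medians_for_month, min_years=10):
    """Categorize the latest measurement against one month's yearly medians."""
    ordered = sorted(medians_for_month)
    if len(ordered) < min_years:
        return "Not Ranked"
    if latest < ordered[0]:
        return "Low"                   # below the lowest monthly median
    if latest > ordered[-1]:
        return "High"                  # above the highest monthly median
    p10, p25, p75, p90 = (nist_percentile(ordered, q) for q in (10, 25, 75, 90))
    if latest < p10:
        return "much below normal"
    if latest < p25:
        return "below normal"
    if latest <= p75:
        return "normal"
    if latest <= p90:
        return "above normal"
    return "much above normal"
```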
The map layer can be activated using the "Map Layers" icon located on the right edge of the Portal map display. The layer applies to the set of currently filtered sites on the map.
A table of the water-level percentiles for a site may be accessed by clicking on a site on the map, navigating to the "WATER LEVELS" tab, and clicking on the "MONTHLY STATISTICS" button.
Disclaimer
Although these data have been processed on a computer system at the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data for other purposes, nor on all computer systems, nor shall the act of distribution constitute any such warranty. The USGS or the U.S. Government shall not be held liable for improper or incorrect use of the data described and/or contained herein.
References
1. National Institute of Standards and Technology (NIST) Engineering Statistics Handbook. Chapter 7.2.6.2 Percentiles. http://www.itl.nist.gov/div898/handbook/prc/section2/prc262.htm
2. Significant Figures Rules. http://chemistry.bd.psu.edu/jircitano/sigfigs.html