Back to top or back to Water Quality Index or back to Whats New or back to Onalaska Ecology Links
After completing your sampling program you may feel that your project is finished. However, you've really barely begun. You now need to make sense of all the data you have collected. Data interpretation allows you to learn from the water quality data, helps you improve your sampling program, and is really the reason you collected the data in the first place.
After you collect your water quality data, the next step is to organize it into Data Tables. Well-organized data tables help you analyze the data, and are often used to quickly spot data errors and water quality violations. Graphs are visual tools that help you see trends and correlations in your data. Summary Statistics allow you to describe large data sets with just a few representative values. Each of these steps is described in more detail below.
Go to: Introduction
- | - Data Tables
- | - Graphs
- | - Summary Statistics
- | - Exercises
A Data Table is simply an organized way to display all of your water quality data, and will usually be included in the report you write about your sampling results. Data tables may be hand-written or typed in a word processor, but are most useful when created using computer spreadsheet and database programs. Spreadsheet programs (Excel, Lotus, Quattro) allow you to print tables, perform calculations, and develop graphs with your data. Database programs go a step further by allowing you to do computerized searches through your data (for instance, find all values that exceed Class AA water quality standards).
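The kind of search described above can be sketched in a few lines of Python. This is only a minimal illustration, not a real database program: the rows reuse the fecal coliform values from the Summary Statistics example in this unit, the limit of 100 #/100ml is the Class A standard quoted there, and the names `samples` and `find_exceedances` are hypothetical.

```python
# Each row of the data table is one sampling date at one location.
# Values are the fecal coliform example from the Summary Statistics section.
samples = [
    {"date": "1/2/98", "fecal_coliform": 50},
    {"date": "1/9/98", "fecal_coliform": 40},
    {"date": "1/16/98", "fecal_coliform": 90},
    {"date": "1/23/98", "fecal_coliform": 500},
    {"date": "1/30/98", "fecal_coliform": 80},
]

CLASS_A_LIMIT = 100  # #/100ml, the Class A value quoted in this unit


def find_exceedances(rows, parameter, limit):
    """Return the rows whose value for `parameter` is above `limit`."""
    return [row for row in rows if row[parameter] > limit]


for row in find_exceedances(samples, "fecal_coliform", CLASS_A_LIMIT):
    print(row["date"], row["fecal_coliform"])
```

A search like this flags individual samples for a closer look; it does not by itself decide whether a standard has been violated.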
Data tables may be organized in many ways, depending on what kind of problem you are looking at. One common approach is to create one table for each sampling location. The columns of the table would then be the various water quality parameters, and the rows would be the results for each sampling date:
Example Table 1 (Station 1):
| Date | mg/l | degrees C | mg/l | #/100ml |
This kind of table is especially useful if you are trying to see how different parameters are related to each other.
A second common method is to create one table for each water quality parameter. In this case the columns would be the various sampling locations, and the rows would be the results for each sampling date:
Example Table 2:
| Date | (one column per sampling location) |
This kind of table helps you look at trends in your data, such as how a parameter changes over time at one location, or how it changes as you move downriver on a given sampling date.
An important part of Quality Control is to make sure that your tables are transcribed accurately from your original water quality data. All tables should be carefully proofed and checked against your original laboratory and field notes.
Graphing is an excellent way to display your data, and is very helpful when you are analyzing trends and correlations. There are many kinds of graphs, and you are encouraged to be creative in finding different ways of looking at data. The following are some examples of the kinds of graphs often done in water quality studies.
Time-history graphs are graphs that show how a physical or chemical parameter changes with time at a sampling location. For instance, the following time-history graph of dissolved oxygen at Station 1 was developed from the data in Example Table 1 above:
This graph shows that dissolved oxygen levels were high in the winter, but dropped markedly throughout the summer.
Another type of graph is the spatial-trend graph. This is simply a plot of how a water quality parameter changes as you move up or down the river. For example, the following graph shows dissolved oxygen levels at all sampling stations for the August 1998 data in Example Table 2:
The low dissolved oxygen at River Mile 10 is called the DO-sag point, and could indicate impacts from an upstream pollution source (such as a sewage treatment plant discharging ammonia and BOD).
A correlation plot is used to see if there is a relationship between two physical or chemical parameters. For instance, you could use the data from Example Table 1 to plot Dissolved Oxygen vs. Temperature at Station 1:
This plot shows that for the most part Dissolved Oxygen decreases as Temperatures become higher (warm water holds less oxygen than does cold water).
Summary statistics are numbers that you calculate to represent and summarize your data. They are especially useful in large data sets. The most well-known example is the average, or mean value.
The first step in developing summary statistics is to decide how you want to organize your data. You need to divide your data into groups, or subsets, that are comparable and can be used to support or refute whatever hypothesis you are testing. For example, if you wanted to analyze how average summer dissolved oxygen changes as you move downstream, you would create groups of data that contain all of the summer dissolved oxygen data at each sampling location. You could then calculate and graph the average summer dissolved oxygen at each location.
The most common statistics are calculated to show Central Tendency and Variability; these are discussed in detail below.
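The grouping step in the summer dissolved oxygen example can be sketched as follows. The numbers are made up purely to illustrate the mechanics and are not real measurements; the choice of June through August as "summer" and the names `records`, `SUMMER_MONTHS`, and `summer_mean_by_location` are assumptions for this sketch.

```python
from collections import defaultdict
from statistics import mean

# Illustrative values only (NOT real data):
# (river mile, month number, dissolved oxygen in mg/l)
records = [
    (9, 1, 11.0), (9, 7, 6.0), (9, 8, 5.0),
    (12, 1, 12.0), (12, 7, 9.0), (12, 8, 8.0),
]

SUMMER_MONTHS = {6, 7, 8}  # assumption: June through August


def summer_mean_by_location(rows):
    """Group the summer values by river mile, then average each group."""
    groups = defaultdict(list)
    for river_mile, month, value in rows:
        if month in SUMMER_MONTHS:
            groups[river_mile].append(value)
    return {river_mile: mean(values) for river_mile, values in groups.items()}


print(summer_mean_by_location(records))
```

The result holds one average per sampling location, which is the set of values you would then graph against river mile.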
Measures of Central Tendency
Measures of Central Tendency are statistics you calculate if you want to represent a group of data by a single value. This value may also be referred to as the expected or the most-likely value.
The most common measure of central tendency is the Mean, or average value. It is calculated as the sum of all of your values divided by the number of values:
Mean = (x1 + x2 + ... + xn) / n
where x1 through xn are your values and n is the number of values.
The Mean is a useful measure of central tendency, but is very sensitive to unusually large values (called outliers). For example, the following is a typical data set for fecal coliform at a sampling station:
| Date | Fecal Coliform (#/100ml) |
|---|---|
| 1/2/98 | 50 |
| 1/9/98 | 40 |
| 1/16/98 | 90 |
| 1/23/98 | 500 |
| 1/30/98 | 80 |
The Mean of these values is (50 + 40 + 90 + 500 + 80)/5,
or 152 #/100ml. This would be a clear violation of the Class A water quality standard of 100 #/100ml. However, if you remove the largest value (500) the Mean becomes (50 + 40 + 90 + 80)/4, or 65 #/100ml. This Mean value would pass Class A standards. Given all of the possibilities for measurement error, sample handling error, and natural variability, you don't want your conclusions to be so heavily influenced by a single odd sample; the value of 500 #/100ml might be the result of sample contamination or just an unusually polluted parcel of water.
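The arithmetic above can be checked with a short Python sketch using the standard library's `statistics.mean`. The variable names are illustrative only.

```python
from statistics import mean

# Fecal coliform values (#/100ml) from the example data set
coliform = [50, 40, 90, 500, 80]

mean_all = mean(coliform)                # all five samples
without_largest = sorted(coliform)[:-1]  # drop the single largest value (500)
mean_trimmed = mean(without_largest)     # remaining four samples

print(mean_all)      # 152
print(mean_trimmed)  # 65
```

Dropping a value here is only a way to show how much one sample moves the Mean; it is not a recommendation to discard data.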
One way to get around this problem is to calculate the Geometric Mean, which is less sensitive to outliers. The Geometric Mean is the nth root of the product of your values, where n is the number of samples:
Geometric Mean = (x1 × x2 × ... × xn)^(1/n)
In our example above, the Geometric Mean would be (50 × 40 × 90 × 500 × 80)^(1/5),
or 94 #/100ml. The Geometric Mean in this case passes the water quality standard. This is the approach taken in State water quality regulations, which stipulate that the Geometric Mean value of fecal coliform measurements should meet water quality standards.
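A minimal sketch of the Geometric Mean calculation is shown below. It uses logarithms, which give the same result as taking the nth root of the product but avoid multiplying many large numbers together. The function name `geometric_mean` is chosen for this sketch; Python 3.8 and later also provide `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """nth root of the product of the values, computed via logarithms.

    All values must be greater than zero, because log(0) is undefined.
    """
    return math.exp(sum(math.log(v) for v in values) / len(values))


coliform = [50, 40, 90, 500, 80]  # #/100ml, from the example data set
gm = geometric_mean(coliform)
print(round(gm))  # 94
```

Because the logarithm of zero is undefined, a data set that includes a zero count needs special handling before this calculation can be applied.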
A third measure of central tendency is the Median, which is simply the middle value of your data set: half of the values are equal to or less than it, and half are equal to or greater than it. In the example above, the Median is 80 #/100ml (it's greater than 40 and 50, but less than 90 and 500). The Median is not particularly sensitive to outliers, and is a better measure of central tendency in data sets that do not follow a normal bell-curve probability distribution.
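The Median can be found by sorting the values and taking the middle one, which is what the standard library's `statistics.median` does. The sketch below uses the example coliform data; the second call illustrates the usual convention for an even number of values, where the two middle values are averaged.

```python
from statistics import median

coliform = [50, 40, 90, 500, 80]  # #/100ml, from the example data set

# Sorted: 40, 50, 80, 90, 500 -> the middle value is 80
print(median(coliform))  # 80

# With an even number of values, the two middle values are averaged:
# sorted 40, 50, 80, 90 -> (50 + 80) / 2
print(median([50, 40, 90, 80]))  # 65.0
```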
Measures of Variability
Measures of central tendency like the Mean describe the most likely value, but don't tell you anything about how the values varied. For example, the following two data sets have the same Mean value of 100:
Data Set 1: 98, 99, 100, 101, 102; Mean value = 100
Data Set 2: 10, 38, 150, 202; Mean value = 100
If you just compared the mean values of these data sets, you would think they were very similar. However, you can see that one data set consists of a group of values clustered closely around 100, while the other data set has values ranging from 10 to 202. You want to be able to describe this variability when you are displaying summary statistics.
The simplest measure of variability is the Range, which is the difference between the minimum and maximum value. The most useful way to show the range is by displaying the minimum and maximum values together. In our example data set for coliform bacteria, you would then show the summary statistics as follows:
Mean: 152 #/100ml
Geometric Mean: 94 #/100ml
Median: 80 #/100ml
Range: 40 to 500 #/100ml
A more sophisticated measure of variability is the standard deviation, which is calculated from the squares of the deviations of each value from the mean: the larger the standard deviation, the more widely the values are spread around the mean. A basic statistics course will tell you more about how to calculate and use the standard deviation.
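As a rough illustration, the range and standard deviation of the two data sets above can be computed as follows. This sketch uses `statistics.stdev`, which is the sample standard deviation (it divides by n - 1); `statistics.pstdev` divides by n instead and gives slightly smaller values. The function name `describe` is hypothetical.

```python
from statistics import mean, stdev

data_set_1 = [98, 99, 100, 101, 102]
data_set_2 = [10, 38, 150, 202]


def describe(values):
    """Summarize a data set by its mean, range, and sample standard deviation."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": round(stdev(values), 1),
    }


print(describe(data_set_1))  # stdev is about 1.6
print(describe(data_set_2))  # stdev is about 91.0
```

Both data sets have a Mean of 100, but the standard deviations differ by a factor of more than fifty, which captures the difference in spread that the Mean alone hides.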
The value of summary statistics is that they allow you to compare large data sets without getting lost in all of the numbers. If you have 50 measurements of fecal coliform it may be difficult to tell how water quality differs between two locations just by looking at the raw numbers. By calculating the mean value at each location you end up with only two numbers to compare, and it becomes easier to see if there are significant differences in water quality.
In a previous unit we developed a sampling plan for Trout River, which flows towards the Pacific Ocean. A sewage treatment plant discharges into the river at River Mile 11. Cow Pasture Creek drains several large dairy farms, and enters Trout River at River Mile 12. Upstream of River Mile 12 the river's watershed is forested and relatively pristine.
1. The following table shows fecal coliform data collected on Trout River during the winter of 1998. For each river mile/sampling location, calculate the geometric mean coliform value. Graph the geometric mean coliform against river mile (coliform on the y-axis, river mile on the x-axis). What does your graph say about the impact of Cow Pasture Creek on Trout River water quality? How about the impact of the sewage treatment plant? Remember in looking at your graph that river miles are miles upstream of the river mouth.
2. The following Biological Oxygen Demand (BOD) data were collected from the same locations. Graph these values against river mile. What does this graph say about the impacts of Cow Pasture Creek and the sewage treatment plant on river BOD?
| River Mile | BOD (mg/l) |
|---|---|
| 9 | 3.0 |
| 10 | 3.2 |
| 11 | 3.1 |
| 12 | 1.0 |
| 13 | 0.2 |
| 14 | 0.1 |
This water quality course material created by Rob Schanz. Send comments to Rob Schanz
This page created and maintained by Chehalis River Council
Send comments or questions to the: Chehalis River Council