Minnesota Lake Geochemistry Lab

Earth Science Extras

by Russ Colson

Introduction to Correlation

Correlation refers to how one characteristic or variable changes in sync with another characteristic or variable. For example, changes in Earth's average atmospheric temperature are correlated with changes in the concentration of carbon dioxide in the atmosphere. Concentrations of carbon dioxide and average world temperatures have both increased over the past 150 years. When two variables increase or decrease together, this is called a positive correlation. When one variable decreases while the other increases, this is called a negative correlation. An example of this is that concentrations of Ni (nickel) in Hawaiian basalts are generally higher in samples with lower Yb (ytterbium), and lower in samples with higher Yb, creating a negative correlation.
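One common way to put a number on positive and negative correlation is the Pearson correlation coefficient, which ranges from -1 (perfect negative) through 0 (none) to +1 (perfect positive). The text above does not depend on this particular measure; the sketch below is only an illustration, and the numbers in it are made up for the example rather than taken from any real measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, not real data.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]     # tends to rise as x rises
y_down = [9, 7, 6, 4, 3]   # tends to fall as x rises

r_up = pearson_r(x, y_up)      # positive value: positive correlation
r_down = pearson_r(x, y_down)  # negative value: negative correlation
print(r_up, r_down)
```

The sign of the result tells you the direction of the correlation; how close it is to -1 or +1 gives a rough sense of how tightly the points follow a straight line.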

The idea of correlation is very important in science. Typically, correlations give us insight into what might be causing changes that we see. For example, it is possible that increasing carbon dioxide in the atmosphere causes temperatures to rise. It might also be possible that increasing temperature can cause carbon dioxide to rise.

Correlation can give us insight into cause, but correlation does not prove that changes in one particular variable are causing changes in the other. For example, I am an early riser. I have noticed that every time I get up in the morning, the sun will soon rise. This has been a very consistent correlation for my whole life. I might speculate that my rising causes the sun to rise!

Is there any way to know this is not true? Of course! I simply have to do an experiment--I can rise at a different time of day and see if the sun's rising follows me.

It doesn't, thus crushing my nascent hopes that I can control the heavens. It is more likely that this correlation is due to my preference to be awake during the day when I can see and to sleep during the dark of the night.

Sometimes a correlation does not mean that either change is causing the other, but rather that both variables are responding to some third variable that may not be obvious. For example, in the case of the correlation between Ni and Yb cited above, the concentrations of Ni and Yb are both affected by a common causative agent--the crystallization of the mineral olivine in subsurface magma chambers.
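The effect of a hidden third variable can be sketched with a small simulation. This is a toy model, not a geochemical calculation: the hidden driver `z`, the variables `a` and `b`, and all coefficients are hypothetical and chosen only to show the pattern. Neither `a` nor `b` influences the other, yet they end up strongly correlated because both respond to `z`.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden driver (the "third variable").
z = [rng.uniform(0, 1) for _ in range(200)]

# a and b each depend on z plus independent noise; neither depends on the other.
a = [10 - 8 * zi + rng.gauss(0, 0.5) for zi in z]  # falls as z rises
b = [1 + 3 * zi + rng.gauss(0, 0.2) for zi in z]   # rises as z rises

r_ab = pearson_r(a, b)
print(r_ab)  # strongly negative, even though a and b never interact
```

Looking only at `a` and `b`, you might be tempted to conclude that one drives the other. The negative correlation comes entirely from their shared dependence on `z`.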

Another consideration in understanding correlation is to evaluate whether or not a correlation is statistically significant. That is, sometimes an apparent correlation is simply the result of chance variations. A black cat walks across the road in front of you just before your cousin is killed in a car accident in a different state. There is a correlation, but only between two single events. If you keep records of all the times black cats cross the road, you may be able to find some bad events that happen near those times, but the bad events probably are not well correlated with only those times when black cats cross the road.

Let's consider a more realistic example. You notice that rain often occurs when the wind is in the east. Yet sometimes you get rain when the wind is in other directions. And sometimes when the wind is in the east, you don't get rain. Is there a true correlation between rainfall and easterly winds? Answering this question would require gathering a large number of observations and seeing whether the correlation is noticeable across all of them, which gives insight into whether the correlation is significant or not. You might, for example, chart wind direction when it's raining and wind direction when it's not raining and see if, over time, there is a clear difference.
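The record-keeping described above can be sketched as a simple tally. The observation log here is hypothetical (twelve invented days), and the helper name `rain_fraction` is just a label chosen for this example. The idea is to compare how often it rains on east-wind days with how often it rains on all other days.

```python
# Hypothetical observation log: (wind direction, did it rain?) for each day.
observations = [
    ("E", True), ("E", True), ("E", False), ("W", False),
    ("N", False), ("E", True), ("S", True), ("W", False),
    ("N", False), ("E", False), ("S", False), ("W", True),
]

def rain_fraction(obs, east):
    """Fraction of rainy days among days whose wind is east (east=True)
    or among days whose wind is anything else (east=False)."""
    subset = [rained for wind, rained in obs if (wind == "E") == east]
    return sum(subset) / len(subset)

rain_when_east = rain_fraction(observations, east=True)
rain_when_not_east = rain_fraction(observations, east=False)

print("rain fraction, east wind: ", rain_when_east)
print("rain fraction, other wind:", rain_when_not_east)
```

With only a dozen days in the log, a difference between the two fractions could easily be chance. The same tally becomes more convincing as the number of recorded days grows.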

In science, we evaluate whether a correlation is statistically significant by observing whether or not we see the same apparent correlation across many observations. The more consistently the correlation holds up with repeated observations, the more confident we become that the correlation is significant and 'real'. Once we establish that a correlation is 'real', we can begin to interpret what that correlation might tell us about causative agents in our natural world.
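One way to ask whether an apparent correlation could be chance is a permutation test. This is a standard statistical technique, offered here as an illustration rather than as the method this lesson uses. The idea: shuffle one variable many times to break any real link, and count how often the shuffled data look at least as correlated as the real data. The data below are simulated, and the function name, trial count, and seeds are arbitrary choices for this sketch.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Fraction of random shufflings of ys whose |r| is at least as large
    as the |r| of the original pairing. Small values suggest the observed
    correlation is unlikely to be a chance arrangement."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials

# Simulated data: one series with a built-in trend, one that is pure noise.
rng = random.Random(1)
x = list(range(20))
y_trend = [2 * xi + rng.gauss(0, 3) for xi in x]  # depends on x
y_noise = [rng.gauss(0, 3) for _ in x]            # unrelated to x

p_trend = permutation_p_value(x, y_trend)
p_noise = permutation_p_value(x, y_noise)
print(p_trend, p_noise)
```

A very small fraction for the trend series means shuffled data almost never match the observed correlation, which is the sense in which the correlation is 'real'. For the noise series the fraction can land anywhere between 0 and 1.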

Correlations between two variables can be charted on a two-dimensional graph. Consider each of the graphs below, each with multiple observations (data points), and consider whether you think there is a correlation and whether that correlation is 'real'--that is, statistically significant.

Consider the data shown in the graph below. Is there a correlation and is that correlation statistically significant?


Consider the data shown in the graph above. Is there a correlation between variables "X" and "Y", and is that correlation statistically significant?

a. There is no basis for determining either a correlation or whether or not it is statistically significant.

b. There is a positive correlation, but there is no basis for determining whether or not it is statistically significant.

c. There is a negative correlation, but there is no basis for determining whether or not it is statistically significant.

d. There is no statistically significant correlation, although both variables change in value, apparently to differing degrees.

e. There is a positive correlation and it appears to be statistically significant.

f. There is a negative correlation and it appears to be statistically significant.

g. There is no statistically significant correlation between the variables, although both variables vary in value to comparable degrees.

Consider the data shown in the graph below. Is there a correlation between variables "X" and "Y", and is that correlation statistically significant?