
Of Averages and Anomalies - Part 1A. A Primer on how to measure surface temperature change

Posted on 29 May 2011 by Glenn Tamblyn

In recent years a number of claims have been made about ‘problems’ with the surface temperature record: that it is faulty, biased, or even ‘being manipulated’. Many of the criticisms seem to revolve around misunderstandings of how the calculations are done, and thus exaggerated ideas of how vulnerable to error the analysis of the record is. In this series I intend to look at how the temperature records are built and why they are actually quite robust. In this first post (Part 1A) I am going to discuss the basic principles of how a reasonable surface temperature record should be assembled. Then in Part 1B I will look at how the major temperature products are built. Finally in Parts 2A and 2B I will look at a number of the claims of ‘faults’ against this to see if they hold water or are exaggerated based on misconceptions.

How NOT to calculate the Surface Temperature

So, we have records from a whole bunch of meteorological stations from all around the world. They have measurements of daily maximum and minimum temperatures for various parts of the last century and beyond. And we want to know how much the world has warmed or not.

Sounds simple enough. Each day we add up all these stations’ daily average temperatures, divide by the number of stations and, voilà, we have the average temperature for the world that day. Then do that for the next day and the next and…. Now we know the world’s average temperature, each day, for all that measurement period. Then compare the first and last days and we know how much warming has happened – how big the ‘Temperature Anomaly’ is – between the two days. We are calculating the ‘Anomaly of the Averages’. Sounds fairly simple doesn’t it? What could go wrong?

Absolutely everything.

So what is wrong with the method I described above?

1. Every station may not have data for the entire period covered by the record. They have come and gone over the years for all sorts of reasons. Or a station may not have a continuous record. It may not be measured on weekends because there wasn’t the budget for someone to read the station then. Or it couldn’t be reached in the dead of winter.

Imagine we have 5 measuring stations, A to E that have the following temperatures on a Friday: 

A = 15, B = 10, C = 5, D = 20 & E = 25

The average of these is (15+10+5+20+25)/5 = 15

Then on Saturday, the temperature at each station is 2 °C colder because a weather system is passing over. But nobody reads station C because it is high in the mountains and there is no budget for someone to go up there at the weekend. So the average we calculate from the data we have available on Saturday is:

(13+8+18+23)/4 = 15.5.

But if station C had been read as well it would have been:

(13+8+3+18+23)/5 = 13

This is what we should be calculating! So our missing reading has distorted the result.

We can’t just average stations together! If we do, every time a station from a warmer climate drops off the record, our average drops. Every time a station from a colder climate drops off, our average rises. And the reverse for adding stations.  If stations report erratically then our record bounces erratically. We can’t have a consistent temperature record if our station list fluctuates and we are just averaging them. We need another answer!
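The Friday and Saturday figures above can be checked with a few lines of Python. This is only a sketch of the naive method; the variable names and the `naive_average` helper are made up for illustration and are not taken from any real temperature product.

```python
# Naive method: average whatever stations happened to report that day.
friday = {"A": 15, "B": 10, "C": 5, "D": 20, "E": 25}

# Saturday: every station is 2 degrees colder than on Friday.
saturday_full = {name: temp - 2 for name, temp in friday.items()}

# Station C (high in the mountains) was not read on Saturday.
saturday_reported = {name: temp for name, temp in saturday_full.items() if name != "C"}


def naive_average(readings):
    """Simple mean of the available readings (hypothetical helper)."""
    return sum(readings.values()) / len(readings)


friday_avg = naive_average(friday)
saturday_avg_reported = naive_average(saturday_reported)
saturday_avg_full = naive_average(saturday_full)

print(friday_avg)             # 15.0
print(saturday_avg_reported)  # 15.5 -> looks warmer, although every station cooled
print(saturday_avg_full)      # 13.0 -> what we should have calculated
```

Every station cooled by 2 °C, yet the naive average rose by 0.5 °C simply because the coldest station went missing.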

2. Our temperature measurements aren’t from locations spaced evenly around the world. Much of the world isn’t covered at all – the 70% that is oceans. And even on land our stations are not evenly spread. How many stations are there in the roughly 1000 km between Maine and Washington DC, compared to the number in the roughly 4000 km between Perth & Darwin?

We need to allow for the fact that each station may represent the temperature of very different size regions. Just doing a simple average of all of them will mean that readings from areas with a higher station density will bias the result.  Again, we can’t just average stations together! 

We need to use what is called an Area Weighted Average. Do something like: take each station's value, multiply it by the area it is covering, add all these together, and then divide by the total area. Now the world isn’t colder just because the New England states are having a bad winter!
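The recipe just described (multiply each value by its area, add up, divide by the total area) might be sketched as follows. The station values and areas are invented purely for illustration, and `area_weighted_average` is a hypothetical helper name; how real analyses assign an area to each station is a separate question not covered here.

```python
def area_weighted_average(values, areas):
    """Weighted mean: sum(value * area) / sum(area)."""
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    total_area = sum(areas)
    return sum(v * a for v, a in zip(values, areas)) / total_area


# Invented example: three closely spaced stations each representing 1 unit
# of area, plus one remote station representing 9 units of area.
values = [2.0, 2.0, 2.0, 10.0]
areas = [1.0, 1.0, 1.0, 9.0]

simple_average = sum(values) / len(values)
weighted_average = area_weighted_average(values, areas)

print(simple_average)    # 4.0 -> dominated by the dense cluster
print(weighted_average)  # 8.0 -> each station counts in proportion to its area
```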

3. And how good an indicator of its region is each station anyway? A station might be in a wind or rain shadow. It might be on a warm plain or higher in adjacent mountains, or in a deep valley that cools quicker as the Sun sets. It might get a lot more cloud cover at night or be prone to fogs that cause night-time insulation. So don’t we need a lot of stations to sample all these micro-climates to get a good reliable average? How small does each station’s ‘region’ need to be before its readings are a good indicator of that region?  If we are averaging stations together we need a lot of stations!

4. Many sources of bias and errors can exist in the records. Were the samples always taken at the same time of day? If Daylight Saving Time was introduced, was the sampling time adjusted for this? Were log sheets for a station (in the good old days before newfangled electronic recording gizmos) written by someone with bad handwriting – is that a 7 or a 9? Did the measurement technology or its calibration change? Has the station moved, or changed altitude? Are there local sources of bias around the station? And do these biases cause one-off changes or a time-varying bias?

We can’t take the reading from a station at face-value. We need to check for problems. And if we find them we need to decide whether we can correct for the problem or need to throw that reading or maybe all that station’s data away. But each reading is a precious resource – we don’t have a time-machine to go back and take another reading. We shouldn’t reject it unless there is no alternative.

So, we have a Big Problem. If we just average the temperatures of stations together, even with the Area Weighting answer to problem #2, this doesn’t solve problems #1, #3 or #4. It seems we need a very large detailed network, which has existed for all of the history of the network, with no variations in stations, measurement instruments etc, and without any measurement problems or biases.

And we just don’t have that. Our station record is what it is. We don’t have that time machine. So do we give up? No!

How do stations' climates change?

Let’s consider a few key questions. If we look at just one location over its entire measurement history, say down on the plains, what will the numbers look like? Seasons come and go; there are colder and warmer years. But what is the longer term average for this location? What is meant by ‘long term’? The World Meteorological Organisation (WMO) defines Climate as the average of Weather over a 30 year period. So if we look at a location averaged over something like a 30 year period and compare the same location averaged over a different 30 year period, the difference between the two is how much the average temperature for that location has changed. And what we find is that they don’t change by very much at all. Short term changes may be huge but the long term average is actually pretty stable.

And if we then look at a nearby location, say up in the mountains, we see the same thing: lots of variation but a fairly stable average with only a small long term change. But their averages are very different from each other. So although a station’s average change over time is quite small, an adjacent station can have a very different average even though its change is small as well. Something like this:

Comparing two adjacent stations

Next question: if each of our two stations’ averages only changes by a small amount, how similar are the changes in their averages? This is not an idle question. It can be investigated, and the answer is: the changes mostly differ by very little. Nearby locations will tend to have similar variations in their long term averages. If the plains warm long term by 0.5°C, it is likely that the nearby mountains will warm by say 0.4–0.6°C in the long term. Not by 1.5 or -1.5°C.

It is easy to see why this would tend to be the case. Adjacent stations will tend to have the same weather systems passing over them. So their individual weather patterns will tend to change in lockstep. And thus their long term averages will tend to be in lock-step as well. Santiago in Chile is down near sea level while the Andes right at its doorstep are huge mountains. But the same weather systems pass over both. The weather that Adelaide, Australia gets today, Melbourne will tend to get tomorrow.

Station Correlation Scatter Plots (HL87)

Final question. If nearby locations have similar variations in their climate, irrespective of each station's local climate, what do we mean by ‘nearby’? This too isn’t an idle question; it can be investigated, and the answer is many hundreds of kilometres at low latitudes, up to 1000 kilometres or more at high latitudes. In Climatology this is the concept of ‘Teleconnection’ – that the climates of different locations are correlated to each other over long distances.

Figure 3, from Hansen & Lebedeff 1987 (apologies for the poor quality, this is an older paper) plots the correlation coefficients versus separation for the annual mean temperature changes between randomly selected pairs of stations with at least 50 common years in their records. Each dot represents one station pair. They are plotted according to latitude zones: 64.2-90N, 44.4-64.2N, 23.6-44.4N, 0-23.6N, 0-23.6S, 23.6-44.4S, 44.4-64.2S.

Notice how the correlation coefficients are highest for stations closer together and less so as they stretch farther apart. These relationships are most clearly defined at mid to high northern latitudes and mid southern latitudes – the regions of the Earth with higher proportions of land to ocean.

This makes intuitive sense since surface air temperatures of the oceanic regions are influenced also by water temperatures, ocean currents etc instead of just air masses passing over them, while land temperatures don’t have this other factor. So land temperatures would be expected to have better correlation since movement of weather systems over them is a stronger factor in their local weather.

This is direct observational evidence of Teleconnection. Not just climatological theory but observation.

A better answer

So what if we do the following? Rather than averaging all our stations together, instead we start out by looking at each station separately. We calculate its long term average over some suitable reference period. Then we recalculate every reading for that station as a difference from that reference period average. We are comparing every reading from that station against its own long term average. Instead of a series of temperatures for a station, we now have a series of ‘Temperature Anomalies’ for that station.  And then we repeat this for each individual station, using the same reference period to produce the long term average for each separate station.
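As a rough sketch of that per-station step, assuming each station's record is held as a simple year-to-temperature mapping: the `to_anomalies` function, the two stations and the three-year reference period below are all hypothetical and chosen only to keep the example short (a real reference period would be far longer).

```python
def to_anomalies(series, ref_start, ref_end):
    """Convert a station's readings (year -> temperature) into anomalies
    relative to that same station's mean over the reference period."""
    ref_values = [temp for year, temp in series.items() if ref_start <= year <= ref_end]
    if not ref_values:
        raise ValueError("station has no readings in the reference period")
    baseline = sum(ref_values) / len(ref_values)
    return {year: temp - baseline for year, temp in series.items()}


# Two invented neighbouring stations: one on the plains, one in the mountains.
plains = {1951: 15.0, 1952: 15.5, 1953: 14.5, 1954: 16.0}
mountain = {1951: 5.0, 1952: 5.5, 1953: 4.5, 1954: 6.0}

plains_anom = to_anomalies(plains, 1951, 1953)
mountain_anom = to_anomalies(mountain, 1951, 1953)

print(plains_anom)    # {1951: 0.0, 1952: 0.5, 1953: -0.5, 1954: 1.0}
print(mountain_anom)  # identical anomalies despite a 10 degree difference in absolute temperature
```

The absolute temperatures differ by 10 degrees, but once each station is compared against its own reference average the two series are directly comparable.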

Then, and only then, do we start calculating the Area Weighted Average of these Anomalies. We are now calculating the ‘Area Average of the Anomalies’ rather than the ‘Anomaly of the Area Averages’ – now there’s a mouthful. Think about this. We are averaging the changes, not averaging the absolute temperatures.

Does this give us a better result? In our imaginary ideal world where we have lots of stations, always reporting all the time, no missing readings, etc., then these two methods will give the same result.

The difference arises when we work in an imperfect world. Here is an example (for simplicity I am only doing simple averages here rather than area weighted averages):

Let's look at stations A to E. Let's say their individual long term reference average temperatures are:

A = 15, B = 10, C = 5, D = 20 & E = 25

Then for one day's data their individual readings are:

A = 15.8, B = 10.4, C = 5.7, D = 20.4 & E = 25.3

Using the simple Anomaly of Averages method from earlier we have:

(15.8+10.4+5.7+20.4+25.3)/5 - (15+10+5+20+25)/5 = 0.52

While using our Average of Anomalies method we get:

((15.8-15) + (10.4-10) + (5.7-5) + (20.4-20) + (25.3-25))/5 = 0.52

Exactly the same!

However, if we remove station C as in our earlier example, things look very different. Anomaly of Averages gives us:

(15.8+10.4+20.4+25.3)/4 - (15+10+5+20+25)/5 = 2.975 !!

While Average of Anomalies gives us:

((15.8-15) + (10.4-10) + (20.4-20) + (25.3-25))/4 = 0.475

Obviously both values don’t match what the correct value would be if station C were included, but the second method is much closer to the correct value. Bearing in mind that Teleconnection means that adjacent stations will have similar changes in anomaly anyway, this ‘Average of Anomalies’ method is much less sensitive to variations in station availability.
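For anyone who wants to check the arithmetic, here is a small Python sketch of both methods applied to the five stations above, using simple (not area weighted) averages as in the text. The function names are illustrative only.

```python
reference = {"A": 15, "B": 10, "C": 5, "D": 20, "E": 25}
readings = {"A": 15.8, "B": 10.4, "C": 5.7, "D": 20.4, "E": 25.3}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def anomaly_of_averages(day, ref):
    # Average of whatever stations reported, minus the average of ALL reference values.
    return mean(day.values()) - mean(ref.values())


def average_of_anomalies(day, ref):
    # Each reporting station is compared with its own reference value first.
    return mean(day[name] - ref[name] for name in day)


# The same day's readings with station C missing.
without_c = {name: temp for name, temp in readings.items() if name != "C"}

print(round(anomaly_of_averages(readings, reference), 3))    # about 0.52
print(round(average_of_anomalies(readings, reference), 3))   # about 0.52
print(round(anomaly_of_averages(without_c, reference), 3))   # about 2.975
print(round(average_of_anomalies(without_c, reference), 3))  # about 0.475
```

Losing station C moves the Anomaly of Averages result by more than 2 degrees, while the Average of Anomalies result moves by less than 0.05.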

Now let’s consider how this approach could be used when looking at station histories over long periods of time. Consider 3 stations in ‘adjacent’ locations. A has readings from 1900 to 1960. B has readings from 1930 to 2000 and C has readings from 1970 to today. A overlaps with B, B overlaps with C. But C doesn’t overlap with A. If our reference period is say 1930 – 1960, we can use the readings from A & B. But C doesn’t have any readings from our reference period. So how can we splice together A, B, & C to give a continuous record for this location?

Doesn’t this mean we can’t use C since we can’t reference it to our 1930-1960 baseline? And if we use a more recent reference period we lose A. Do we have to ignore C’s readings entirely? Surely that means that as the years roll by and the old stations disappear, eventually we will have no continuity to our record at all? That’s not good enough.

However there is a way we can ‘splice’ them together.

A & B have a common period from 1930-1960. And B & C have a common period from 1970-2000. So if we take the average of B from 1930 to 1960 and compare it to the same average from A for the same period we know how much their averages differ. Similarly we can compare the average of B from 1930-1960 to the average for B from 1970-2000 to see how much B has changed over the intervening period. Then we can compare B vs C over the 1970-2000 period to relate them together. Knowing these three differences, we can build a chain of relationships that links C (1970-2000) to B (1970-2000) to B (1930-1960) to A (1930-1960).

Something like this:

'Chaining' station histories together

 

If we have this sort of overlap we can ‘stitch together’ a time series stretching beyond more than one station’s data. We have the means to carry forward our data series beyond the life (and death) of any one station, as long as there is enough time overlap between them. But we can only do this if we are using our Average of Anomalies method. The Anomaly of Averages method doesn’t allow us to do this.
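One possible way to express the B-to-C link of that chain in code is sketched below. It assumes that the offset between two neighbouring stations during their overlap also held during the reference period (the Teleconnection argument), and it uses invented readings. The `period_mean` and `chained_baseline` helpers are hypothetical; this is not a description of how any particular temperature product does its splicing.

```python
def period_mean(series, start, end):
    """Mean of the readings in series (year -> temperature) from start to end inclusive."""
    values = [temp for year, temp in series.items() if start <= year <= end]
    if not values:
        raise ValueError("no readings in the requested period")
    return sum(values) / len(values)


def chained_baseline(new_station, neighbour, overlap, reference):
    """Estimate the reference-period mean of a station that has no readings
    in that period, by way of a neighbour that covers both periods.

    Assumption: the offset between the two stations during the overlap
    also applied during the reference period.
    """
    offset = period_mean(new_station, *overlap) - period_mean(neighbour, *overlap)
    return period_mean(neighbour, *reference) + offset


# Invented readings: B warms from 10.0 to 10.5; C is a colder site starting in 1970.
station_b = {year: 10.0 for year in range(1930, 1961)}
station_b.update({year: 10.25 for year in range(1961, 1970)})
station_b.update({year: 10.5 for year in range(1970, 2001)})

station_c = {year: 4.5 for year in range(1970, 2001)}
station_c.update({year: 5.0 for year in range(2001, 2011)})

c_baseline = chained_baseline(station_c, station_b, overlap=(1970, 2000), reference=(1930, 1960))
c_anomaly_2010 = station_c[2010] - c_baseline

print(c_baseline)      # 4.0 -> estimate of what C would have averaged over 1930-1960
print(c_anomaly_2010)  # 1.0 -> C's 2010 anomaly relative to the 1930-1960 baseline
```

In this made-up case C's 2010 anomaly of 1.0 is the 0.5 that B warmed between the two periods plus the further 0.5 that C itself warmed after 2000, even though C has no readings from the reference period.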

So where has this got us in looking at our problems? The Average of Anomalies approach directly addresses problem #1. Area Weighted Averaging addresses problem #2. Teleconnection and comparing a station to itself helps us hugely with problem #3 – if fog provides local insulation, it probably always has, so any changes are less related to the local conditions and more to underlying climate changes. Local station bias issues still need to be investigated but if they don’t change over time, then they don’t introduce ongoing problems. For example, if a station is too close to an artificial heat source, then this biases that station's temperature. But if this heat source has been a constant bias over the life of the station, then it cancels out when we calculate the anomaly for the station. So this method also helps us with (although doesn’t completely solve) problem #4. In contrast, using the Anomaly of Averages method, local station biases and erratic station availability will compound each other, making things worse.
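The point about a constant bias cancelling out is easy to demonstrate. In this sketch the temperatures and the 2 degree bias are invented, and the first two readings stand in for a reference period.

```python
def anomalies(series, n_ref):
    """Anomalies relative to the mean of the first n_ref readings."""
    baseline = sum(series[:n_ref]) / n_ref
    return [temp - baseline for temp in series]


true_temps = [14.0, 14.5, 15.0, 15.5]

# Hypothetical station sited next to a heat source that always adds 2 degrees.
constant_bias = 2.0
measured = [temp + constant_bias for temp in true_temps]

true_anomalies = anomalies(true_temps, 2)
measured_anomalies = anomalies(measured, 2)

print(true_anomalies)      # [-0.25, 0.25, 0.75, 1.25]
print(measured_anomalies)  # [-0.25, 0.25, 0.75, 1.25] -> the constant bias has dropped out
```

A bias that changes over time would not cancel this way, which is why such changes still have to be looked for.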

So this looks like a better method.

Which is why all the surface temperature analyses use it!

The Average of Anomalies approach is used precisely because it avoids many of the problems and pitfalls.

In Part 1B I will look at how the main temperature records actually compile their trends.


Comments

Comments 1 to 16:

  1. Thank you for this. The attempt to educate the local rightwing radio talk show host about how measurements are adjusted has been an uphill battle. This will help me clarify my own thoughts about the process which should help me when speaking to local denialists.
  2. Gary

    Thanks.

    This is the first in a series of 4 posts on this over the next week or so.
  3. Glenn, will you be addressing in a later part why station numbers might change?

    For example, in the GHCN the Toronto Canada temperature station has one of the longest records of both max & min temperatures in the world so it’s a good one for certain analyses. But it had Station ID CA006158350 from 1840 to mid 2003 and then temperature reporting ceased (though precipitation records continue until now). There’s a gap of a few months and then from 2004 to present Toronto temperatures are reported as Station ID CA006158355 instead. The latitude, longitude & altitude are identical for both Station IDs. Major instrument changes maybe? This complicates getting the full data set.
  4. Re OP you wrote:-

    "Obviously both values don’t match what the correct value would be if station C were included, but the second method is much closer to the correct value."

    I am so glad you are able to say which is the 'correct' value for temperature; it seems fully in accordance with the principles of scientific climatology!
    Response:

    [DB] Glenn was using a specific simple example to illustrate the principles underlying the measurements of the temperature records.  If you were thanking him for the clarity of the illustration and the sense it made, then you're welcome.

    If, on the other hand, you had other, ideological, meanings for making your comment, then those ideological meanings and intimations have no place in the science-based dialogues here.

  5. Damorbel, I find it interesting you can't see what is meant by correct, which should be obvious if you read the post.
  6. SoundOff

    I am not planning on looking at the specific issues of individual stations. Rather the purpose of this series is the general principles of how the temperature record is handled.

    That said, variations in stations, station ID's etc are unfortunately just what the teams who compile the records have to deal with. Unfortunately they don't control the information sources they are dependent on. The stations are controlled by various national meteorological services around the world. And the primary function of the stations is meteorological. The climatological function piggy-backs on top of that. So the national agencies do all sorts of things for all sorts of reasons, good & bad, and the temperature record guys just have to wear it and do the best they can with the data they get.

    Hence the importance of the Average of Anomalies approach.
  7. Very informative - Thanks.
    Shame about the typical (unconnected) so-called skeptical misrepresentation above !
  8. Thank you John, this helps a lot in clearing up the Hansen-mechanisms ...

    Did you know about the activities of Gistemp (http://clearclimatecode.org/gistemp/)?
  9. This is a great series; thanks for doing it. I do have a question about the actual temperature measurements. Can you expound a bit on what is actually measured and recorded at the stations? I have an image of a person going out and looking at a thermometer every hour and writing down the temperature that he observes. What can we say about the accuracy of the instrument and the precision of his observation? Does he then report every day the hourly recordings? Is his station location (lat/long/alt) also captured somehow? Is a daily average computed from the hourly data points? Thanks.
    Response:

    [DB] This may help with some of your questions.

  10. "This is direct observational evidence of Teleconnection. Not just climatological theory but observation."

    The theory of teleconnection comes from weather patterns which connect regions mainly by the jet stream. The weather in those regions then becomes correlated. For example a strong jet in the western U.S. leading to a strong high in the Atlantic or any other similar combinations. There is no other theory of long distance connection that I am aware of.

    Since weather teleconnections are large scale weather patterns they are not part of local station temp. correlation which are due simply to local air exchange. There are many different teleconnection patterns worldwide with various amounts of persistence (esp. seasonality) and influence. Here's an example of a lake in Siberia influenced by ENSO: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040185/
  11. Eric.

    Yes Teleconnection occurs at the wider weather system scale. But it also occurs on a smaller, sub-1000km scale as well due to the basic fact that similar weather patterns pass over adjacent regions. And this is the point of the scatter diagrams from Hansen & Lebedeff 1987 above. Observationally they show correlations between random pairs of stations out to 1000 km ranges.
  12. Glenn, you have given an example, where Average of Anomalies produces a better result, than Anomaly of Averages. However, if you take other numbers, you might get quite the opposite: a better result with Anomaly of Averages.

    E.g. take this, please:

    A = 15, B = 10, C = 5, D = 20 & E = 25

    Then for one day's data their individual readings are:

    A = 15.8, B = 10.4, C = 15, D = 20.4 & E = 25.3

    Then you will get 2.4 instead of 0.52, the other numbers will remain the same and, if my calculation is correct, this time Anomaly of Averages will get you a better result.

    The whole thing depends on what the station C would have recorded on that missing day, if that had been possible. You actually do not know that, because there is no record for that day. Hence you do not know, which method is better, this is the problem.
  13. First, thanks for the post. I've found it immensely helpful. I read it when it was first posted and have now found my way back to it to refresh myself.

    I have one problem, however. I understand the advantages of the 'average of anomalies' method for problems with variations in average temperature between locations, but I'm missing how it is less sensitive to dropped stations than is finding the anomaly of the averages.

    To test this, I generate an array of 500 average temperatures that vary between 15 and 30 degC, then generate an array of 500 temperatures that are allowed to vary randomly from each station average (within certain bounds). I then create arrays with certain "stations" removed (I've played around with the number, but I started out dropping 20 stations), and then compare all the values: real, average of anomalies, and anomaly of averages. It turns out that neither method is consistently closer to the real value and, in fact, upon 500 iterations of the code, it seems each is the better estimate about 50% of the time.

    Is this actually the case, or is there some physical explanation for the advantage of finding the 'average of anomalies' when accounting for dropped stations that wouldn't be reflected in my randomly-generated values?
  14. heb0 @13, actual temperature series from weather stations that are close together geographically are highly correlated (see fig 3 from Hansen and Lebedeff in the main article). They may, however, differ greatly in the absolute value of their measurements. Thus, for example, if you have two weather stations close together, but one on the top of a mountain while the other is near the base, their temperature anomalies from day to day are likely to be very similar, even though the difference in altitude will cause one to consistently record temperatures significantly lower than the other. Indeed, the weather station on top of the mountain may consistently record temperatures much lower relative to the one at the base, than the day to day differences in temperature at both stations.

    The strong regional correlation, and hence near approximation of the anomalies of nearby stations, is, I believe what makes the average of the anomalies superior to the anomaly of the averages. It is certainly what is violated in Greg Houses example @12.

    It would be interesting if you were to check this by imposing strong regional correlation on your model, instead of allowing them, as you currently do, to fluctuate at random with respect to each other.
  15. How are daily averages for the raw data calculated? Is it as simple as (min+max)/2?
  16. Tom Curtis @14, thanks, the large anomaly in Greg House's example (as well as the small number of temperatures) was what prompted me to try this with a larger number of stations and a more realistic anomaly. I feel like it makes sense (in a sort of intuitive manner) that the average of anomalies method would provide a more realistic approximation for stations with regional correlations, but I took the wording of the article to mean that the reason for this was something other than a physical cause.

    Now that I read that section again, it does say "Bearing in mind that Teleconnection means that adjacent stations will have similar changes in anomaly anyway, this ‘Average of Anomalies’ method is much less sensitive to variations in station availability." (emphasis mine), so I suppose it was a failure in reading on my part. Thanks for helping me sort that out.

© Copyright 2014 John Cook