
Watts' New Paper - Analysis and Critique

Posted on 2 August 2012 by dana1981, Kevin C

 "An area and distance weighted analysis of the impacts of station exposure on the U.S. Historical Climatology Network temperatures and temperature trends"
Paper authors: A. Watts, E. Jones, S. McIntyre and J. R. Christy

In an unpublished paper, Watts et al. raise new questions about the adjustments applied to the U.S. Historical Climatology Network (USHCN) station data (which also form part of the GHCN global dataset).  Ultimately the paper concludes "that reported 1979-2008 U.S. temperature trends are spuriously doubled."  However, this conclusion is not supported by the analysis in the paper itself.  Here we offer preliminary constructive criticism, noting some issues we have identified with the paper in its current form, which we suggest the authors address prior to submittal to a journal.  As it currently stands, the issues we discuss below appear to entirely compromise the conclusions of the paper.

The Underlying Problem

In reaching the conclusion that the adjustments applied to the USHCN data spuriously double the actual trend, the authors rely on the difference between the NCDC homogenised data (adjusted to remove non-climate influences, discussed in detail below) and the raw data as calculated by Watts et al.  The conclusion therefore relies on an assumption that the NCDC adjustments are not physically warranted.  They do not demonstrate this in the paper.  They also do not demonstrate that their own ‘raw’ trends are homogeneous. 

Ultimately Watts et al. fail to account for changes in the time of observation, changes of instruments, and station relocations, leading them to wrongly conclude that uncorrected data are much better than data that take all of this into account.

Changing Time of Observations

The purpose of the paper is to determine whether artificial heat sources have biased the USHCN data.  However, accounting for urban heat sources is not the only adjustment that must be made to the raw temperature data.  Accounting for the time of observation (TOB), for example, is another major adjustment (see, e.g., Schaal et al. 1977 and Karl et al. 1986).

For example, if observations are taken and maximum-minimum thermometers reset in the early morning, near the time of minimum temperature, a particularly cold night may be double-counted, once for the preceding day and once for the current day.  Conversely, with afternoon observations, particularly hot days will be counted twice for the same reason.  Hence, maximum and minimum temperatures measured for a day ending in the afternoon tend to be warmer on average than those measured for a day ending in the early morning, with the size of the difference varying from place to place.
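To make the double-counting concrete, here is a toy simulation of the effect; this is our illustration only, not the Karl et al. adjustment itself. Hourly temperatures with a fixed diurnal cycle are read by a max-min thermometer reset either in the morning or the afternoon, and the average recorded extremes differ even though the underlying temperatures are identical.

```python
import numpy as np

# Toy illustration of time-of-observation bias (TOB). Hourly temperatures
# follow a diurnal cycle around a slowly varying daily mean; a max-min
# thermometer records the extremes between successive daily resets.
rng = np.random.default_rng(0)
n_days = 1000
daily_mean = np.cumsum(rng.normal(0.0, 1.5, n_days))      # day-to-day weather noise
daily_mean -= np.linspace(0.0, daily_mean[-1], n_days)    # detrend: only the reset time differs
hours = np.arange(n_days * 24)
# coolest around 5am, warmest around 5pm
temp = np.repeat(daily_mean, 24) - 8.0 * np.cos(2 * np.pi * ((hours % 24) - 5) / 24)

def mean_recorded_extremes(reset_hour):
    # each 24 h window runs from one reset to the next; a cold (or hot)
    # spell spanning the reset is counted in both adjacent windows
    windows = temp[reset_hour : reset_hour + (n_days - 1) * 24].reshape(-1, 24)
    return windows.max(axis=1).mean(), windows.min(axis=1).mean()

for reset_hour, label in [(7, "morning"), (17, "afternoon")]:
    tmax, tmin = mean_recorded_extremes(reset_hour)
    print(f"{label} reset: mean recorded Tmax {tmax:+.2f} C, Tmin {tmin:+.2f} C")
# Morning resets read systematically cooler than afternoon resets, so a
# station switching from afternoon to morning observations acquires an
# artificial cooling step in its record.
```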

Unlike most countries, the United States does not have a standard observation time for most of its observing network. There has been a systematic tendency over time for American stations to shift from evening to morning observations, resulting in an artificial cooling of temperature data at the stations affected, as noted by Karl et al. 1986.  In a lecture, Karl noted:

"There is practically no time of observation bias in urban-based stations which have taken their measurements punctually always at the same time, while in the rural stations the times of observation have changed.  The change has usually happened from the afternoon to the morning.  This causes a cooling bias in the data of the rural stations.  Therefore one must correct for the time of observation bias before one tries to determine the effect of the urban heat island"

Note that in Watts' Figure 16, by far the largest adjustments (in the warming direction) are for rural stations, which is just what we would expect if TOB is introducing a cool bias at those stations, as Karl discusses.

[Figure 16a from the Watts et al. draft]

Instruments Change, Stations Move

As Zeke Hausfather has also discussed, the biggest network-wide inhomogeneity in the US record is due to the systematic shift from manually-read liquid-in-glass thermometers placed in a louvred screen (referred to in the U.S. as a Cotton Region Shelter and elsewhere as a Stevenson screen) to automated probes (MMTS) in cylindrical plastic shelters across large parts of the network in the mid- to late-1980s.  This widespread equipment change caused an artificial cooling in the record due to differences in the behaviour of the sensors and the sheltering of the instruments.  This is discussed in a number of papers, for example Menne et al. 2009 and 2010, and like TOB does not appear to be accounted for by Watts et al.

Additionally, the Watts paper does not show how or whether the raw data were adjusted to account for issues such as sites closing or moving from one location to another. The movement of a site to a location with a slightly different mean climatology will also result in spurious changes to the data. The Watts paper provides no details as to how or whether this was accounted for, or how the raw data were anomalised.


Quite simply, the data are homogenised for a reason. Watts et al. are making the case that the raw data are a 'ground truth' against which the homogenisations should be judged.  Not only is this assumption unsupported in the literature, but the results in this paper do nothing to demonstrate it.  It is simply wrong to assume that all the trends in raw data are correct, or that differences between raw and adjusted data are solely due to urban heat influences.  Yet these wrong assumptions are the basis of the Watts conclusion regarding the 'spurious doubling' of the warming trend.

The Amplification Factor

The conclusion regarding the lower surface temperature warming trend is also at odds with the satellite temperature data.  Over the continental USA (CONUS), satellites show a 0.24°C per decade lower-troposphere warming trend over the timeframe in question.  According to Klotzbach et al. (2010), which the Watts paper references, there should be an amplification factor of ~1.1 between surface and lower troposphere temperatures over land (the atmosphere warming faster due to water vapor amplification).  Thus if the satellite measurements are correct, we would expect a surface temperature trend close to 0.22°C per decade (0.24 divided by 1.1) for the CONUS; instead, the Watts paper claims the trend is much lower, at 0.155°C per decade.

This suggests that either the satellites are biased high, which is rather implausible (e.g. see Mears et al. 2011, which suggests they are biased low), or the Watts results are biased low.  The Watts paper tries to explain the discrepancy by claiming that the amplification factor over land ranges from 1.1 to 1.4 in various climate models, but provides no source for this claim, which does not appear to be correct (this may be a reasonable range for global amplification factors, but not for land-only).

A discussion between Gavin Schmidt and Steve McIntyre on this subject led to the conclusion that the land-only amplification factor falls in the range of 0.78 to 1.23 (average over all global land areas), with a model mean close to 1 (using a script developed by McIntyre on 24 different models).  Note that McIntyre is a co-author of Watts et al., but has only helped with the statistical analysis and did not comment on the whole paper before Watts made it public.  We suggest that he share his land-only amplification factor discussion with his co-authors.

Another important consideration is that the amplification factor also varies by latitude.  For example Vinnikov et al. (2005) found that at the CONUS latitude (approximately 40°, on average), models predict an amplification factor of approximately 1 (see their Figure 9).  Note that this is the amplification factor over both land and ocean at this latitude.  Since the amplification factor over land is less than that over the oceans, this suggests that the amplification factor over the CONUS land may even be less than 1. 

Combining the latitude and land-only status of the CONUS, the amplification factor may very well be less than 1; in any case, a range of values significantly lower than the 1.1 to 1.4 used in the Watts paper would be reasonable.
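As a quick arithmetic check, using only the numbers quoted in this post, here is what the 0.24°C per decade satellite trend implies for the CONUS surface trend under the various amplification factors discussed above:

```python
# Implied CONUS surface trends from the 0.24 C/decade satellite
# lower-troposphere trend, for the amplification factors quoted in this
# post (the 0.78-1.23 Schmidt/McIntyre range and the 1.1-1.4 Watts range).
satellite_trend = 0.24  # C/decade
for factor in (0.78, 1.0, 1.1, 1.23, 1.4):
    print(f"amplification {factor:4.2f} -> implied surface trend "
          f"{satellite_trend / factor:.3f} C/decade")
# Even the high end (1.4) implies ~0.17 C/decade, still above the 0.155
# claimed by Watts et al.; factors in the 0.78-1.1 range imply 0.22-0.31,
# bracketing the NCDC adjusted trend of 0.309.
```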

Note also that as discussed above, the satellite data (like all data) are imperfect and are not a 'gold standard'.  They are a useful tool for comparison in this study, but the satellite trends should not be assumed to be perfect measurements.

More Apples and Oranges

Watts et al. compare the best sited Class 1 and 2 stations (using their categorisation) to the total homogenised network. Strictly speaking, this is comparing apples and oranges; Watts' data are an inhomogeneous sub-sample of the network compared to a homogeneous total network. In practice, this methodological error doesn’t make much difference, since the homogenisation applied by NCDC produces uniform trends in all of the various classes.

However, the use of a smaller network has disadvantages. When taking a gridded average of a sparse network, the impact of inhomogeneities is likely amplified, and the overall uncertainty in the variability and trend of the timeseries increases.

The Class 1 & 2 sites used by Watts in this context represent just 20% of the total CONUS network.  The comparison of the raw Class 1 & 2 sites with the same network of homogenised data in Watts' own Figure 18 indicates likely inhomogeneities in that raw data.  Watts et al. argue that the raw and adjusted Class 1 & 2 trends in Figures 18a and 18b are so different because "well sited stations are adjusted upward to match the already-adjusted poor stations," but this is simply not how the homogenization process is done.  In reality the difference is likely due to the biases we have discussed above, indicating that the Watts raw data are inhomogeneous and influenced by these non-climate effects.

[Figure 18 from the Watts et al. draft]

Determining whether or not the Class 1 & 2 raw data are homogeneous is therefore a key requirement of a revised manuscript.  And since the Class 1 & 2 sites have been selected for good exposure, Watts et al. would need to show the cause of any statistical discontinuities that they find. This work has already been done by NCDC in the Menne et al. papers, which show influences from the range of factors discussed above, not just urban influence.

The Watts final conclusion that adjusted temperature trends are 'spuriously doubled' (0.155°C vs. 0.309°C per decade raw vs. adjusted data) relies on a simple assumption — that the raw data must be correct and the homogenised data incorrect.  There is no a priori basis for this assumption and it is unsupported by the literature.

Adjustments Make Little Difference Globally

While Watts et al. identify possible issues concerning the adjustments applied to station temperature records, the paper wisely makes no attempt to assess the global impact of the adjustments, which is beyond its scope. Nonetheless, this is a significant question from a public interest perspective.

In order to answer this question, we will try to estimate the maximum possible impact of station adjustments on the instrumental temperature record.  To minimise the warming signal, we will use the simplest method for calculating a global temperature average - the CRU method, which is known to yield poor coverage at high latitudes and hence underestimate recent warming.  Further, we'll assume that the entirety of the data adjustments is wrong (ignoring proven bias corrections such as TOB).  If we calculate land temperature averages from both the raw and adjusted data, we can see how much difference the adjustments make. The result is shown in the figure below (red and green lines).

Figure 1: Impact of GHCN adjustments.

Just to be absolutely sure, we can do a further calculation using just the rural unadjusted data (using the GHCN station classifications) - the blue line.

While the adjustments do make a difference, the difference is small compared to the overall warming signal since 1979. Using a more sophisticated temperature calculation reduces this difference. Furthermore, we are only looking at land temperatures - 30% of the planet. Including the remaining 70% of the planet (the oceans which, if not precisely rural, are certainly not urban!) dramatically reduces the remaining impact of the GHCN adjustments.  Indeed, comparison of warming trends over the oceans and over large inland lakes (including the North American Great Lakes) shows a high degree of consistency with terrestrial trends. Warming over the oceans and lakes is presumably not due to urbanisation.

The entire CRU-type calculation requires 65 lines of Python code (by comparison, a modern airliner requires upwards of a million lines of code to fly).
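The original 65-line script is not reproduced here, but the method is simple enough to sketch. Below is a minimal illustration of the CRU-style calculation (common-anomaly gridding with cosine-latitude area weights), assuming a hypothetical list of station records rather than the actual GHCN file handling:

```python
import numpy as np

# A minimal sketch of a CRU-style land average; NOT the original script.
# Input is assumed to be a list of (lat, lon, temps) station records,
# where temps is an (n_years, 12) array of monthly means, np.nan for gaps.
def cru_style_average(stations, base=slice(0, 30), cell=5.0):
    cells = {}
    for lat, lon, temps in stations:
        clim = np.nanmean(temps[base], axis=0)       # station's monthly normals
        anom = np.nanmean(temps - clim, axis=1)      # annual anomaly series
        key = (int((lat + 90.0) // cell), int((lon + 180.0) // cell))
        cells.setdefault(key, []).append(anom)
    n_years = stations[0][2].shape[0]
    total, weight = np.zeros(n_years), np.zeros(n_years)
    for (row, _col), members in cells.items():
        cell_anom = np.nanmean(np.vstack(members), axis=0)  # stations -> grid cell
        w = np.cos(np.radians((row + 0.5) * cell - 90.0))   # area weight by latitude
        ok = ~np.isnan(cell_anom)
        total[ok] += w * cell_anom[ok]
        weight[ok] += w
    return total / weight                                   # cells -> land average

# Running this once on raw and once on adjusted station data gives the two
# land-average curves compared in Figure 1.
```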

Many others have done this comparison, including Caerbannog and Zeke Hausfather. Fawcett et al. (2012) provide a comprehensive assessment of the sensitivity of Australian temperature trends to network and homogenisation choices, including comparison with an unhomogenised gridded analysis of Australian temperatures. Furthermore, the BEST project has obtained a similar result with a different, independent implementation of the station homogenization algorithm.  It would be surprising if an independent approach were to yield a similar but incorrect result by chance.

Do the overall adjustments make a difference?  Yes.  Are they justified?  Yes, according to the body of scientific literature.  Watts raises a scientific issue, but one which only affects part of the adjustment.  Does it matter?  Not very much.  Even if the entirety of the adjustments were wrong, we still see unprecedented warming over the past 40 years.  And there is certainly not a factor of two difference between global warming trends in the raw and adjusted data.

Constructive Criticisms

It's worth noting that Peter Thorne of NCDC was interviewed by Andrew Revkin and discussed three papers which NCDC has recently published (see here, here, here).  In the first of those linked papers, they actually concluded that there likely remains a residual cool bias in the adjusted data, and that the adjusted data are consistent with reanalysis data (detailed in the third linked paper).  Watts et al. do not address these papers.  Ironically, Watts responded to that interview by saying that Thorne needs to get out into the real world, but it is Watts et al. who have not accounted for real-world effects like TOB, station moves, and instrument changes.

In its current form, the Watts paper contains little in the way of useful analysis.  There are too many potential sources of bias which are not accounted for, and too many apples-to-oranges comparisons; the authors cannot draw any conclusions about urban heat influences until their data are homogenized and other non-climate influences are removed.

The primary conclusion of the paper, aside from not being supported by the analysis, is simply implausible.  The CONUS surface warming trend proposed by the Watts paper appears to be inconsistent with the satellite observations, and overall global trends in raw data do not differ dramatically from those in the adjusted data.  Comparing raw to adjusted data globally shows a rather small difference in long-term trends; far smaller than a factor of two.

The flaws we have identified entirely compromise the conclusions of the paper.  Ultimately Watts et al. assume that all adjustments are 'spurious' unless due to urban heat influences, when in fact most of their identified discrepancy likely boils down to important adjustments for instrumental changes, TOB, and other influences they have not accounted for in their analysis.  Watts et al. attempt to justify their assumption by asserting "well sited stations are adjusted upward to match the already-adjusted poor stations," but this is simply not how the homogenization process is done.

Fortunately McIntyre has acknowledged that TOB must be considered in their analysis, as has Watts, which is a good start, but they must also account for the other biases noted above in order to draw any valid conclusions about urban heat influences.

In conclusion, Watts et al. of course deserve the right to try to make their case in the peer-reviewed literature, however implausible that case appears to be.  Therefore, we hope they will consider addressing the important concerns detailed above before they submit the paper to a journal.  Otherwise we suspect the paper will not fare well in the peer review process.  With said caveats carefully addressed and the conclusions amended if and where necessary, the paper has the potential to be a useful contribution to the climate science literature.



Comments


  1. @Bernard J, What does calling this paper "Watts 12" mean?
  2. Trent1492: One way to refer to a paper is by the author(s) and the year it was published; see, for example, the reference to "Schaal et al. 1977" in the article. In this case, Bernard is implying that they're somewhat jumping the gun in assuming that 5 months will be sufficient to get this particular mess into a publishable state.
  3. ligne... I think it speaks a lot to Watts' naiveté that one would even refer to a just-submitted paper with the year of publication attached to it. I think the normal nomenclature would be to refer to it as "Watts et al., submitted."
    Moderator Response: [RH] Didn't mean to imply that ligne was saying that.
  4. About the UHI issue: Parker (2004) studied the question of the urban heat island effect and its impact on the temperature trends, by segregating the measurements into two groups: windy-day measurements and calm-day measurements. His conclusions:

    - On calm days, temperatures run higher than on windy days. The implication is that there is indeed a UHI effect, but a good wind blows it away.
    - If you study the trends for the windy days over the decades, and the trends for the calm days, they are essentially the same: the increase over the study period is the same. So the UHI doesn't seem to have been growing in coverage.

    For me, this seems to be pretty direct evidence that even though there really is a UHI effect, it is not contributing towards misleading us about temperature trends. Maybe cities are just not expanding fast enough.
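    A minimal sketch of the kind of calm/windy split Parker describes, assuming a hypothetical daily dataframe with 'year', 'wind' and 'tmin' columns for one station (an illustration of the approach, not Parker's actual analysis):

```python
import numpy as np
import pandas as pd

# Split a station's days into calm and windy halves at the median wind
# speed, then compare the warming trend of each subset. Similar trends
# imply that a growing urban heat island is not inflating the record.
def annual_trend(daily: pd.DataFrame) -> float:
    annual = daily.groupby("year")["tmin"].mean()       # annual-mean Tmin
    return np.polyfit(annual.index.to_numpy(dtype=float),
                      annual.to_numpy(), 1)[0]          # deg C per year

def calm_vs_windy(df: pd.DataFrame):
    cutoff = df["wind"].median()
    return (annual_trend(df[df["wind"] <= cutoff]),     # calm-day trend
            annual_trend(df[df["wind"] > cutoff]))      # windy-day trend
```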
  5. Neal, the Parker paper was discussed on SkS literally years ago. But thanks for reminding us of its existence, as it is indeed quite relevant.
  6. Rob @53 - I agree, except that Watts et al. isn't submitted either! And frankly it's in such bad shape that if they were to actually put in the time to do the analysis right, I doubt if it would be submitted in 2012 either. We have discussed the Parker (2006) paper mentioned by neal @54 here.
  7. Note that they could accurately call the paper "Watts et al. in preparation".
  8. Perhaps, "Watts et al, in shambles."
  9. Given the equivocation from some of the apparent co-authors, even the "et al" might be optimistic...
  10. Just a couple of quick notes (and something interesting to try) re Kevin C's python script.

    It turns out that if you modify the script slightly to generate average annual (instead of monthly) anomalies and compare your results with the official NASA/GISS meteorological stations index, you will see that there's a pretty decent match, albeit with the python script output showing a bit more warming than the NASA/GISS results do. This is most likely due to the fact that the GHCN network samples the SH more sparsely than the NH, and as a result, you have more empty 5degx5deg grid cells in the SH than in the NH. Since the script does not interpolate to empty grid-cells, this causes the NH (which has warmed more than the SH) to be overweighted in the global averages, with the result being more apparent warming than the NASA/GISS results show.

    You can get around this by adding grid-cell interpolation code to the script (way too much work) or simply by increasing the grid-cell sizes from 5x5 to 20x20 degrees (a lazy boy's approach that works surprisingly well). If you plot the output of the script with the above mods (annual instead of monthly anomalies, 20x20 grid-cells) along with the NASA/GISS results, you will see that the python script results match the NASA/GISS results ***amazingly closely***.

    A quick word of caution: memory consumption with 20x20 grid-cells goes way up -- the run brought my old laptop w/1G memory to its knees. Make sure that you have at least 2G of memory (and no other memory-hogging apps running) before you try this.
  11. Caerbannog: I presume the comparison you are making is against the GISS dTs (land station) 1200km dataset? In which case that seems reasonable (except the comment on memory - that should go down with a coarser grid, and should be pretty minimal anyway. Something is fishy.)

    For my latest GISTEMP-like land/ocean calc, I have a more advanced version which (optionally) uses a constant area grid (a few lines of code) and a 1200km area of influence with a conical weight function (rather more, especially if you want it to run in a reasonable time. Python is slow, unless you can do everything in SciPy classes). I do the calculation twice, once using land stations, and once using pseudo-stations generated from the gridded HadSST2 data (Nick Stokes' trick). Then I dump the maps and use a land/ocean masked average of the grids (a separate program at the moment).

    The main aim of this code was to be able to turn on and off different options to go stepwise from CRU to GISTEMP and find out where the differences came from (which turned out to be less interesting than you would expect. It's just coverage).

    It would have been nice to do the calc at #1 using Clear Climate Code, but I've never got it working on GHCN3 data, and I'm not sure they are maintaining it. Your original SkS post and comments gave me the leg up I needed to get started on this. Thanks again for that.
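    For anyone who wants to experiment with the station weighting Kevin C mentions, here is a minimal sketch of a 1200 km conical (linearly tapering) weight function in the GISTEMP style; an illustration, not Kevin C's actual code:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(lon2 - lon1)
    a = np.sin((p2 - p1) / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def conical_weight(dist_km, radius_km=1200.0):
    """Station weight tapers linearly from 1 at the grid point to 0 at radius_km."""
    return np.clip(1.0 - dist_km / radius_km, 0.0, None)

# A grid-point anomaly is then sum(w_i * anom_i) / sum(w_i) over all
# stations with nonzero weight.
```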
  12. Caerbannog: I presume the comparison you are making is against the GISS dTs (land station) 1200km dataset?

    Yes, that's correct. The results were so similar to the NASA land station results that it was quite striking.

    In which case that seems reasonable (except the comment on memory - that should go down with a coarser grid, and should be pretty minimal anyway. Something is fishy.)

    I'm gonna take back the comment about memory usage. The first time I ran the script with the 20x20 deg grid size, my laptop went into a 10-minute "swap-fest", where it was completely unusable. I assumed that was due to memory consumption (and was reluctant to run the script again to verify that). After seeing your reply here, I decided to reboot my system and run the script again. This time, it ran w/o any problems.
  13. Be very careful when using GISTemp land products; they are not always what you expect them to be! (See this for more details: http://rankexploits.com/musings/2010/the-great-gistemp-mystery/)

    Gavin provided me the correct GISTemp values to use for the land record a few months back: http://www.columbia.edu/~mhs119/Temperature/T_moreFigs/Tanom_land_monthly.txt

    You can also use NCDC's record for comparison, since it's a simple 5x5 lat/lon gridding method with a land mask and 30-year common anomaly period: ftp://ftp.ncdc.noaa.gov/pub/data/anomalies/monthly.land.90S.90N.df_1901-2000mean.dat
  14. Zeke: The way I deal with this is to always start from the gridded data and calculate my own temperatures, applying a high resolution mask if I want land or ocean only temperatures. That way comparisons are always like-with-like. Well, as long as you have separate land and ocean grids; NCDC is a problem (and obviously UAH). Doing this you can pick out some interesting features. For example there's a step change between the Hadley and GISTEMP co-located land data around 1933 (BEST mention something like this in their paper). I never got to the bottom of that.
  15. Continuing from an unfinished page here:

    As I pointed out on a WUWT thread that claimed adjustments were distorting the data, 

    It could be argued that it’s better to look at raw temperature data than data with these various adjustments for known biases. It could also be argued that it’s worth not cleaning the dust and oil off the lenses of your telescope when looking at the stars. I consider these statements roughly equivalent, and (IMO) would have to disagree.

    Various replies (arguments from authority, etc.) followed, mostly ignoring the rather copious literature on time of observation bias.

  16. You'll be seeing the real thing soon enough. It has been a long hard slog with many corrections.

    Here we offer preliminary constructive criticism, noting some issues we have identified with the paper in its current form, which we suggest the authors address prior to submittal to a journal.

    We appreciate the criticisms here. Some of them are quite valid. We will be providing answers on the homogenization issue. TOBS, station moves, and MMTS conversion are now fully addressed. Data is fully anomalized (though we show both sets).


    We will, of course, provide documentation showing all comparisons. For example, you can compare Class 1\2 stations raw (+MMTS adjustment) with fully adjusted Class 1\2 data (or with any other set or subset, either raw+MMTS or fully adjusted).


    If anyone has any questions at this point, I will be happy to answer them.

  17. I'd posit that someone here, who may have some connection to Anthony, invite him to *politely* discuss Dana and Kevin's analysis, in the SkS spirit.

    That'll be me. Fire away.

  18. Evan @67... That suggestion for a discussion was made almost two years ago (and made by a commenter, not the original authors of the article). It seems prudent to wait until the paper is actually published and then SkS authors will review the paper again.

  19. Rob Honeycutt @68.

    The appearance of an ebullient Evan Jones to claim (in the present tense) credit for (actually) an almost three-year-old comment may be down to a new outlet appearing for the publication of atmospheric science. It is called The Open Atmospheric Society and, according to this web-page, the founder is a chap called Anthony Willard Watts, which should make papers written by Anthony Willard Watts a lot easier to get published.

    http://theoas.org/

    http://dailycaller.com/2015/06/19/skeptics-found-scientific-society-to-escape-journals-that-keep-out-dissenters/

  20. MA Rogers...  A tad ironic being that one of the latest conspiracy theories about Cook13 is that Environmental Research Letters was created for the purpose of publishing the 97% Consensus paper.

  21. Rob Honeycutt - An excellent, and very amusing, point. 

  22. Oh, and you're right. It was three years ago, not two! Amazing.

  23. For me, closer to four. I have been at it since Fall et al. (2011), of which I was a co-author. It was necessary to convert the ratings from Leroy (1999) to Leroy (2010). After the criticisms here and elsewhere, it was then necessary to address the outstanding issues of TOBS, station moves, and MMTS conversion. Then swing it all over from USHCN 2.0 to USHCN 2.5.

    The appearance of an ebullient Evan Jones to claim (in the present tense) credit for (actually) an almost three-year-old comment

    Anthony made the decision to pre-release in order to obtain hostile independent review, so any issues could be addressed before submission. With that done, peer review should go a lot easier.

    I hashed this out on Stoat, but we made note of the criticisms here, as well. I am not disputing the problems with raw data. But I do have some issues with the adjustment procedure, especially with homogenization.

    Doc VV and I burned through a forum-plus on Sou's blog discussing that point. He says our findings are interesting, but thinks the divergence between the well and poorly sited stations will be accounted for by jumps. But I've run the graphs, and the divergence is as smooth as silk. He also points out that homogenization will not work if there is a systematic error in the data. In this case, that systematic error is microsite.


    The problem is that only 22% of the sample is well sited, so the well sited stations are brought into line with the poorly sited stations rather than the other way around. If one does not address siting, it is a very easy error to miss. I find it is quite unintentional on NOAA's part, but it is a severe error, nonetheless.

    If anyone has any specific questions about the current state of the study, I will be pleased to answer.

  24. @Evan Jones: Is this the "game-changing" paper that Watts promised a few years ago? 

  25. Evan Jones - The only relevant 'systematic error' would of course be one in trends, in change. The issue of site quality is relevant to certainty bounds, but unless you have evidence of consistent trends in microsite driven changes in temperatures, I don't think there's much support for overall missed trend errors. 

    Not to mention that the US is only ~4% of the globe. I am always saddened by the arguments (both implicit and explicit) on WUWT and other sites claiming (without evidence, mind you) that US adjustments are wrong, usually with claims of malicious distortion and conspiracies, and therefore that global temperature trend estimates are also claimed to be wholly wrong. 

    I fully expect that the upcoming paper, regardless of the strengths or weaknesses of the work and data, will be (mis)used as part of such arguments, just as Fall et al 2011 was. Which IMO is unfortunate; it subtracts from the actual worth of these papers.

  26. John Hartz... Yup, that's the one. Which brings up a good point.

    If Watts had "[decided] to pre-release in order to obtain hostile independent review" why would he be touting it as a "game changing paper?" I find this notion to be dubious at best and downright deceptive if true. 

    If you have a paper where you're looking for independent public review, then you should state as much up front! That's not what Tony did. He posted it making wildly unsubstantiated claims about it being something important, influential and about to be published. It strikes me as being supremely self-deluded to suggest posting it before publication was for the purposes of review. 

  27. Rob Honeycutt @76.

    Indeed, for all the world, the description of the pre-release at the time was not consistent with the suggestion @73 that it was a "pre-release in order to obtain hostile independent review so any issues could be addressed before submission,"  and certainly never a review which would last for years. Planet Wattsupia was actually shut down to allow Watts to get the paper finished and rushed out into the public domain. The official "Backstory" to this drama ended thus:-

    (11) And the result is what you see in the press release today. (12) Finally got to take a shower today about 2PM. Prior to that, Kenji* was offended. (13) Now on to final polish thanks to WUWT peer review and submission.

    (*Kenji is a dog, apparently.) So would anyone in their right mind describe the hammering the paper has received since the press release as a "final polish"?

  28. Rob Honeycutt:

    If I were authoring a "game changing" paper on manmade climate change, I certainly would concentrate on completing it as expeditiously as possible.

    It appears that Watts has been too busy throwing red meat to his WUWT followers to have focused on his "game changer."

    What a farce!

  29. Evan Jones,

    You are claiming major changes have been made.  You ask for feedback, but you have not shown what the major changes are.  How do you expect us to make comments when you do not show your changes?  If you want additional feedback, it is your responsibility to show the changes you have made  so that people here can evaluate them.  So far you are just advertising your "new" paper.

  30. Now now, folks. Watts, Jones, etc. can hold Fall et al 2011 to their credit, as a paper that took considerable work and was published in a peer reviewed journal - despite the conclusions going against their expectations:

    ...the overall mean temperature trends are nearly identical across site classifications.

    While the initial unpublished and apparently unreviewed paper received a rough reception in its previous foray, due in large part to significant issues with conclusions drawn without applying any of the known corrections for data errors, I for one am willing to withhold judgement on the next iteration until after I've read it.

    Mind you, I do not hold high expectations, considering the quality (or lack thereof) of error correction and homogenization discussions that have occurred at WUWT. But we'll have to wait and see.

  31. Is this the "game-changing" paper that Watts promised a few years ago?

    Yes.

    The only relevant 'systematic error' would of course be one in trends, in change.


    Yes, of course.

    The issue of site quality is relevant to certainty bounds, but unless you have evidence of consistent trends in microsite driven changes in temperatures, I don't think there's much support for overall missed trend errors.

    I agree completely. We have powerful evidence of the effect on trend (sic).

    Not to mention that the US is only ~4% of the globe.

    Not relevant in this particular case. All that is required is a sufficient amount of data to compare siting. What's sauce for the USHCN is sauce for the GHCN. But I also agree that a survey of all GHCN stations is called for. That's not so easy (if even possible at this time) for a whole host of reasons, though.

    I am always saddened by the arguments (both implicit and explicit) on WUWT and other sites claiming (without evidence, mind you) that US adjustments are wrong, usually with claims of malicious distortion and conspiracies, and therefore that global temperature trend estimates are also claimed to be wholly wrong.

    I claim all that — with very strong evidence — but WITHOUT the conspiracy theories. I do not think this is fraud, merely an error. An understandable one, one I might easily have made myself.

    All data and methods will be fully archived in a form that can be easily altered or re-binned if you think we got parts of it wrong, so you can run your own versions should you desire. I will assist. Bear in mind that we are not trying to convince our "pals". That's always easy. Too easy. We are trying to convince our opponents in this.

    I fully expect that the upcoming paper, regardless of the strengths or weaknesses of the work and data, will be be (mis)used as part of such arguments, just as Fall et al 2011 was. Which IMO is unfortunate; it subtracts from the actual worth of these papers.

    I have insisted publicly, and will continue to insist, that the results of these papers do not dispute the reality of AGW. But they do dispute the amount.

  32. If you have a paper where you're looking for independent public review, then you should state as much up front! That's not what Tony did. He posted it making wildly unsubstantiated claims about it being something important, influential and about to be published. It strikes me as being supremely self-deluded to suggest posting it before publication was for the purposes of review.

    But we did.

    The pre-release of this paper follows the practice embraced by Dr. Richard Muller of the Berkeley Earth Surface Temperature project, who in a June 2011 interview with Scientific American's Michael Lemonick in "Science Talk" said:

    I know that is prior to acceptance, but in the tradition that I grew up in (under Nobel Laureate Luis Alvarez) we always widely distributed “preprints” of papers prior to their publication or even submission. That guaranteed a much wider peer review than we obtained from mere referees.


    That is exactly what our "purposes" were, and that is exactly what we did. And I have been addressing the resulting (extraordinarily valuable) independent review ever since. It takes time.

  33. If I were authoring a "game changing" paper on manmade climate change, I certainly would concentrate on completing it as expeditiously as possible .

    It appears that Watts has been too busy throwing red meat to his WUWT followers to have focused on his "game changer."

    What a farce!


    He has had to wait on me. Sorry about that. And I have been concentrating on completing it. I have put in ~3000 hours on this since Fall et al., give or take, plus full-time work. It has consumed my life. And now, finally, we are just about "there".

  34. You are claiming major changes have been made. You ask for feedback, but you have not shown what the major changes are.

    Fair enough. I will go into some detail. The three major issues are addressed:

    1.) MMTS Adjustment: All MMTS station data is jumped at (and after) the month of conversion according to Menne (2009) data: Tmax: +0.10C, Tmin: -0.025C, Tmean: +0.0375C (see the sketch following this list). (There is also an issue with CRS per se, which I'll discuss if you wish.)

    2.) TOBS: Stations with TOBS flips from afternoon to morning (and a small handful vice-versa) are dropped. This avoids the issue of TOBS adjustment while still maintaining a robust sample.

    3.) Station Moves: Any station that has been moved after Apr. 1980 is dropped unless both locations are known AND the rating remains the same. Also, if the rating changed during the study period (1979 - 2008), we dropped the station even if the location(s) was known. It is not controversial that a change in site rating will likely affect trend (by creating a jump). But what we want to demonstrate, unequivocally, is that trends (either warming or cooling) are exaggerated by poor microsite even if the microsite rating is unchanged throughout the study period.

    4.) In addition, we provide data during the cooling interval from 1999-2008. A ten-year period is obviously not sufficient to assess overall climate change (and we are not trying to do that). But it is quite sufficient to demonstrate that during that period the badly sited stations cooled more quickly than the well sited stations.
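    A minimal sketch of the step adjustment described in item 1, with hypothetical inputs (the offsets are the Menne (2009) values quoted above); an illustration, not the authors' actual code:

```python
import numpy as np

# Step ("jump") adjustment at the month a station converts from CRS to MMTS.
MMTS_OFFSET = {"tmax": +0.10, "tmin": -0.025, "tmean": +0.0375}  # deg C

def apply_mmts_step(series, conversion_index, element="tmean"):
    """Add the element's offset to all values at and after the month of
    conversion; earlier values are left untouched."""
    adjusted = np.asarray(series, dtype=float).copy()
    adjusted[conversion_index:] += MMTS_OFFSET[element]
    return adjusted
```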

    Heat Sink Effect works both ways. It exaggerates trend in either direction, warming or cooling. What goes up must come down. The only reason that poor microsite has exaggerated warming is that there has been real, genuine warming to exaggerate in the first place.

    Our hypothesis does not dispute global warming — it requires it.


    Note: After dropping stations as per the above, we are left with a sample of 410 unperturbed stations (both well and poorly sited). However, we are not cherrypicking: the stations we dropped have substantially cooler trends than the ones we retained.

  35. Evan Jones:

    Since you are doing all of the heavy lifting on this paper, why is Watts listed as the lead author? 

     

  36. Now now, folks.

    THANK you for that. Yes, we did not withhold our findings even though they disputed our hypothesis. One of the cardinal rules of science is that one must never withhold one's findings.


    I appreciate that you withhold judgment until you have an opportunity to read the new version of the paper and carefully examine the supporting data.


    We do drop all the NOAA major-flagged datapoints. Raw data has issues and is insufficient to our task. Even after bypassing the issue for moves and TOBS, we still must apply MMTS conversion adjustments. Our results are therefore not rendered in raw data. Also, we anomalize all data (which works "against" us, overall, BTW).

  37. Since you are doing all of the heavy lifting on this paper, why is Watts listed as the lead author?

    For much the same reason that a general gets the credit for winning (or losing) a battle. Although I have refined the hypothesis somewhat and slogged through much mud, he is the one that found the mud in the first place. I am just the infantry. And he has also done much heavy lifting of his own in this, as well.

  38. About the UHI issue:


    We find that UHI has little effect on trend. The main driver appears to be heat sink on the microsite level. Well sited urban station trends run much cooler than poorly sited urban station trends.

    And after dropping the moved airport stations, we are left with very few, and they do not show an elevated trend, either.

    Mesosite issues do not appear to have a significant effect on trend. It all comes down to microsite.

  39. So would anyone in their right mind describe the hammering the paper has received since the press release as a "final polish"?


    Ayup. Finestkind.

    As it warms (or cools), the Δtemp between the heat sink and the air surrounding the sensor diverges. Therefore, the release of heat at Tmin is proportionately greater (or lesser), and occurs earlier (or later) in the heat sink release process, by the end of the study period than at the start. This produces a disproportionate trend effect.

    Therefore, a 1C offset in 1979 becomes a 1.5+ offset by 2008.

    In a cooling phase, the process reverses itself, and cooling is exaggerated. That is demonstrated by the data from 1998 to 2008.


    I'm not sure I understand the notion of 'trend exaggeration', Evan. Whether you get a positive or negative trend depends on where you start counting from. How can a microsite heat sink influence in 2008 (or whatever) go from doing nothing to doing something if you change whether the trend line starts at 1995 or 1998?

  41. John Hartz @85.

    There was another odd aspect to the authorship of Watts et al. (unsubmitted). The listing of the four authors (Watts, Jones, McIntyre, Christy) was followed by "plus additional co-authors that will be named at the time of submission to the journal". While at the time it appeared very odd, in the circumstances it presumably will be a list of all the folk who identified and corrected all the mistakes in the original draft (if they care to be so named).

     

    Tristan @90.

    I think the source of that Evan Jones quote you present, and what it is allegedly answering, should be made a little more clear. It is from this 2014 HotWhopper comment thread (or threads - there was a previous one that it transferred from) which was exceedingly long and didn't get very far (or questions such as that @90 would have been resolved).

    One of Jones' final comments said "... But we cannot address all of this at once in one paper. I look forward to examining all of these issues." All rather ominous.

     

    Evan Jones.

    Given this record you have of filling up comment threads to no purpose, can you make clear your purpose here? Are you just after a bit of a chat? Are you here to announce the imminent pre-release of Watts et al.(unsubmitted) for a second time? Do you wish to share some specific aspect of its content with us denizens here at SkS (& if so, we would benefit from knowing what)? Or are you just trolling it?

  42. Evan Jones,

    Thank you for the brief description. Since details are lacking it is difficult to respond in detail.

    Your listed changes all seem to be by deleting stations from your analysis.  Since you have a small sample to start with (and are using only US stations, 4% of the globe) you will have to show that you have not deleted all your signal from the data (or added false signal through deletion of other data).  You will have to be careful to avoid charges of cherry picking.  It is possible to obtain a graph you want by trying enough combinations of data until you get the one you want, especially with a small, noisy data set.

    BEST's analysis is similar: they separate stations into two records when they detect a change.  BEST matches NOAA.  You will have to show why your approach is better.

    Your time period of analysis (1979-2008) is unusual.  Why do you truncate the data at 2008 when it is readily available to 2014?  (Matching old data analysis is not a good excuse; they previously did not have the additional data.)  Someone will check to see if the data changes with the full record.  You are open to accusations of cherry picking.  While you have a 30 year record, generally the minimum needed to see global climate changes, your use of such a truncated time period (combined with your truncated site data and very small geographical range) greatly increases the time needed to see significant changes.  Claiming that 30 years is typically used is not enough when you limit your other data.  You need to get an unbiased statistician to check if your very small data set is still significant.  I doubt your time period is long enough to be significant with your truncated data set (I am not a statistician, but I have 20 years of professional data analysis experience).  Differences are expected in small data sets (like yours) due to random variation; you are responsible for showing the changes are statistically different (generally two sigma).

    1999-2008 is too short a time to obtain significant differences with a truncated data set without strong statistical analysis.  Eyeballing a graph does not count.  You will be strongly criticized for too short a period of time.

  43. I'm not sure I understand the notion of 'trend exaggeration', Evan. Whether you get a positive or negative trend depends on where you start counting from. How can a microsite heat sink influence in 2008 (or whatever) go from doing nothing to doing something if you change whether the trend line starts at 1995 or 1998?

    It very much does depend on where one starts and where one ends. Heat sinks become disproportionately warmer as the ambient temperature rises. If there is a warming trend during the interval, warming will therefore be exaggerated by a function of the amount of the disparity.

    If it is a cooling trend (e.g., CONUS 1998-2008), the process reverses and the poorly sited stations cool more rapidly than the well sited stations.

    For the period from 2005-2015, the trend is essentially flat. That is why the CRN (all well sited) trends do not diverge from COOP on a year-to-year basis. In order for HSE to exaggerate trend, there must first be a trend to exaggerate.

    But HSE even does show up internally in the CRN vs. COOP data — COOP shows slightly warmer summers and slightly cooler winters than CRN. That is consistent with the HSE hypothesis. Both warming and cooling are exaggerated.

  44. "Heat sinks become disproportionately warmer as the ambient temperature rises..."

    Not by any definition of "heat sink" that I'm familiar with. I sincerely hope you have properly defined the terms you are using.

  45. Your listed changes all seem to be by deleting stations from your analysis. Since you have a small sample to start with (and are using only US staions 4% of the globe) you will have to show that you have not deleted all your signal from the data (or added false signal through deletion of other data). You will have to be careful to avoid charges of cherry pickig. It is possible to obtain a graph you want by trying enough combinations of data until you get the one you want, especially with a small, noisy data set.


    Quite correct. Indeed we will have to demonstrate that. In order to avoid cherrypicking, we use many graphs with many combinations and binnings and show them all. All of them show Class 3\4\5 stations warming faster than the well sited Class 1\2 stations. (This shows up in the regional data as well as nationally.)

    We also show the data for the stations we dropped so as to deflect suggestions of cherrypicking. The dropped stations (both compliant and non-compliant) show substantially less warming than the ones we retained.

    I put all the data up on a series of excel sheets to be used as a malleable tool for further study. This enables a reviewer (hostile or otherwise) to re-bin, add, or drop stations as one desires, either to expand on the hypothesis or if one thinks we did the binning wrong or in a prejudiced manner. You don't like our MMTS jumps? Replace them with your own. You think we unjustifiably dropped a station? Just drop it back in. Think a station we included should be dropped? Do that.

    This will all be archived and easily accessible once we publish.

  46. Not by any definition of "heat sink" that I'm familiar with. I sincerely hope you have properly defined the terms you are using.

    I think we have. We use the term to distinguish it from waste heat. A heat sink absorbs and re-emits heat. Even the normal environment is a bit of a heat sink, which is why Tmax occurs hours after the maximum solar input. Excess heat sink (such as a paved road) in close proximity to the sensor increases this effect.

  47. BEST's analysis is similar: they separate stations into two records when they detect a change. BEST matches NOAA. You will have to show why your approach is better.

    BEST coincides well with Fall (2011). But both of those studies use the obsolete Leroy (1999) rating system, which accounts for distance-to-sink only. We use the upgunned Leroy (2010), which includes the area covered by the heat sink as well as the distance. We had a joke back in the Fall (2011) days that all Class 4s are equal, but some Class 4s are more equal than others. Leroy (2010) resolves this.

    Fortunately for us, we are dealing with the data- and metadata-rich USHCN. That allows us to avoid the can-o'-worms of trend-splitting: we merely drop those stations that would require a split. The number remaining is more than sufficient to demonstrate statistical significance at the 95% confidence level.


    Mosh and Zeke have no such luxury. They can't drop those. USHCN is dense and rich. Outer Mongolia, not so much.

    When BEST makes a split, they are doing pairwise with stations with artificially high trends. That will tend to skew their results. This can be corrected by using only unperturbed Class 1\2s for pairwise adjustment. It will be interesting to see the results of BEST applied to our "unperturbed" set using Leroy (2010) parameters.


    In short, our method (dropping) is "better", but Mosh is doing the best he can with what he has. He can't drop.  We can and do. In future, I will be looking at methods for adjustments for perturbed stations based on the empirical results of this study. Only after all such adjustments are made will pairwise (including homogenization) yield valid results.

  48. Your time period of analysis (1979-2008) is unusual. Why do you truncate the data at 2008 when it is readily available to 2014. (Matching old data analysis is not a good excuse, they previously did not have the additional data).

    We have a number of reasons.

    We require a period of unequivocal warming. We need to do that in order to demonstrate what occurs during warming (or, during the 1999-2008 interval, cooling). For the HSE hypothesis to be consistent, it has to work both ways.

    For my money, I'd have preferred 1977-2006, as this excludes the 1998-1999 drop. But we wanted to match the onset of satellite data, and we wanted a 30-year interval for our main series. (We do include a 20-year set from 1979-1998 to show a warming-only interval.)

    The recent data (to 2014) is flat, and therefore not as (directly) useful to our purposes. There will be little or no trend divergence between well and poorly sited stations during a flat period. (We do compare the wonderfully well sited CRN with COOP from 2005-2014 to demonstrate this non-effect.)

    We needed to find an interval of warming and cooling where there was a sufficient number of eligible (unperturbed) stations and sufficient metadata. 1920-1950 would have done just as well but for the lack of metadata.

    The 1979-2008 period was the only stretch that had both warming and sufficient metadata. And the (short) 1999-2008 period was the only interval of definite cooling when the required metadata is available. We were lucky to come up with even that much.

    In short, for the period when there was warming, the well sited stations warmed less. For the interval of cooling, the well sited stations cooled less. And for the flat trend after, there is no divergence. All of which is consistent with the HSE hypothesis. It appears from the data that we are solid three ways from Sunday.

  49. Someone will check to see if the data changes with the full record. You are open to accusations of cherry picking.

    True. (We welcome that.) Note, however, that we are not directly measuring global warming. We are only addressing the effect of heat sink on sensors. A 30-year interval is necessary for assessment of global trend, yes. But when we use the 1999-2008 series, we are not saying that we are not warming overall, or that there is actually global cooling. We use that interval only to demonstrate the spurious cooling effect of poor microsite during a cooling period.

    Differences are expected in small data sets (like yours) due to random variation, you are responsible for showing the changes are statisticly different (generally two sigma).

    Well, J-NG has run a preliminary set of Monte Carlo simulations, and for our full set he shows the chance of overlap at 0.00000% between the Class 1\2s compared with both the Class 3\4\5s and the fully adjusted data for all classes. So our results appear to be robust.
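    For readers unfamiliar with this kind of check, here is a minimal sketch of one way to test whether two sets of station trends differ, via a permutation test; an illustration, not J-NG's actual Monte Carlo procedure:

```python
import numpy as np

# Permutation test: how often does a random relabelling of the pooled
# station trends produce a class-mean difference at least as large as the
# observed one? `a` and `b` are arrays of per-station trends (C/decade).
def permutation_pvalue(a, b, n_iter=10000, seed=1):
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        count += abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= observed
    return count / n_iter   # a tiny p-value means the class means really differ
```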

  50. 1999-2008 is too short a time to obtain significant differences with a truncated data set without strong statistical analysis. Eyeballing a graph does not count. You will be strongly criticized for too short a period of time.

    We do that strong statistical analysis. The divergence shows in our subsets as well as in our full sets. Actually, the graphs look much the same — until you drop in the trendlines. That is because the divergence is gradual over time, not a result of jumps in the data.

    We use the intervals available to us (reasons in previous comment). I can answer criticisms regarding not using a longer set.

