GHCN and Adjustment Trends

In his blog post, Giorgio Gilestro claims to show that the adjustments made by GHCN to the temperature data do not introduce artificial trends into the data, a claim based mainly on this histogram of the adjustments.

Arguments were made on the blog that the temporal order of the adjustments makes a difference, and these arguments appear to be correct. I checked this through the following reasonably simple analysis:

I downloaded the GHCN datasets that he used and read them into R line by line. The next step was to identify which lines in the adjusted set corresponded to specific lines in the raw data set. There was one obvious error in the adjusted set: the last 10 lines were identical (and contained nothing but missing-value codes). After fixing that, I found that 31 adjusted lines had no corresponding raw partners – including a full station record. These were also removed.

The individual differences were calculated for each month and station to produce a set of 5,068,104 values. Of these, 4% were not available and 32% were zeroes. The remainder were adjusted in one way or another, some by fairly large amounts. The resulting differences were averaged over each year for each station:

Nothing obvious here except for the size of some of the adjustments.
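For readers following along, the differencing step can be sketched in Python. This is an illustrative translation only – the actual work was done in the R script linked below, and the toy records here are invented. GHCN v2 stores monthly values in tenths of a degree and uses -9999 as its missing-value code.

```python
# Illustrative sketch of the matching/differencing step: compare adjusted
# monthly values to their raw counterparts, then average the differences
# within each (station, year). Toy records stand in for the parsed files.

MISSING = -9999  # GHCN missing-value code

def monthly_differences(raw, adjusted):
    """Return {(station, year, month): adjusted - raw}, skipping months
    that are missing in either set."""
    diffs = {}
    for key, adj_val in adjusted.items():
        raw_val = raw.get(key)
        if raw_val is None or MISSING in (raw_val, adj_val):
            continue
        diffs[key] = adj_val - raw_val
    return diffs

def station_year_means(diffs):
    """Average the monthly differences within each (station, year)."""
    sums, counts = {}, {}
    for (station, year, _month), d in diffs.items():
        sums[(station, year)] = sums.get((station, year), 0) + d
        counts[(station, year)] = counts.get((station, year), 0) + 1
    return {k: s / counts[k] for k, s in sums.items()}

# Toy example: one station, one year, three months (tenths of a degree).
raw = {("ST1", 1950, 1): 100, ("ST1", 1950, 2): 110, ("ST1", 1950, 3): MISSING}
adj = {("ST1", 1950, 1): 95, ("ST1", 1950, 2): 110, ("ST1", 1950, 3): 120}

d = monthly_differences(raw, adj)
print(station_year_means(d))   # {('ST1', 1950): -2.5}
```

Note how a month missing on either side is dropped rather than treated as an adjustment, which is what produces the 4% "not available" category above.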

Finally, these averages were themselves averaged over the stations to produce a single value for each year:

The graph has some interesting features. First of all, there is a fairly linear trend from about 1900 to the present. The increase is about 0.25°C. A second unexpected feature is a fairly constant reduction of 0.10°C from about 1990 to 2006! The expression “hiding the decline” comes to mind, and I believe this would need some sort of explanation.
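The final averaging stage – collapsing the station-year means into one equally weighted value per year – can be sketched like this (again an illustrative Python version with invented numbers; the posted R script is the actual code):

```python
# Sketch of the final averaging step: collapse {(station, year): mean
# adjustment} into one equally weighted average per year.
from collections import defaultdict

def yearly_averages(station_year_means):
    """station_year_means: {(station, year): mean_diff} -> {year: average}"""
    sums, counts = defaultdict(float), defaultdict(int)
    for (_station, year), diff in station_year_means.items():
        sums[year] += diff
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sums}

example = {("A", 1950): -0.5, ("B", 1950): 0.0, ("A", 1951): 0.25}
print(yearly_averages(example))   # {1950: -0.25, 1951: 0.25}
```

Each station gets equal weight within a year here, regardless of how many months it reported; this choice comes up again in the comments below.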

The R program is available as a pdf here.

Update: December 13, 2009

Since posting this yesterday, I have become aware of a post doing a similar analysis by the blogger hpx83. His results are basically the same as the ones I have posted. However, using further information about the stations, he has also included a graph of the number of stations available in a given year:

What I find particularly interesting about this, in relation to the adjustment plot directly above, is the post-1990 portion, which parallels the sudden drop in the adjustments at that time. Further looks are probably in order.

As well, Jean Demesure commented on the use of a pdf file for posting the R script. I appreciate the problem; however, quotes are not properly handled if the script is pasted into the blog post, and WordPress limits what types of files I can upload to the blog. The only extensions allowed for files containing text are doc, docx, odt, and pdf. Even txt files cannot be uploaded. I have just done a simple experiment which seems to work: a text file renamed as .doc will upload. I have done this with the current script and it is now here: ghcnR. It opened without difficulty in a simple text editor. I will do this in the future.


96 responses to “GHCN and Adjustment Trends”

  1. Pingback: GCHN Adjustments – Statpad « the Air Vent

  2. Mesa

    Thanks! I posted some of the rationale for doing this at RC….interesting and somewhat odd stuff….

    Mesa

  3. Chris

    I was one of the ones commenting about the timing of the adjustments on the GG blog. Not sure why I bothered as the point had already been made, but the vast majority of the comments were praising the “good work”.

    Vastly different picture here with adjustment timing factored in. Is it correct that even with the positive adjustments the net effects were all negative on an annual basis?

    • KevinUK

      Chris,

      It’s important, as Steve M always says, to ‘watch the pea under the thimble’. A high-level statistical analysis like the one GG has done can often fail to reveal (‘hide’ is such a bad word as it implies motive!) some interesting stuff that can be seen as you start to drill down. RomanM is drilling and needs to keep drilling.

      There are lots of us in the blogosphere doing exactly what RomanM is doing here. I’m doing it using a completely different programming language and doing it truly independently of RomanM. This is in stark contrast to NOAA/GISS/CRU who, as the CRU emails show, were most definitely not independently checking each other’s analyses.

      Now please understand that this is going to go wherever the science (OK, strictly speaking maths) takes it. I’m sure we are all going to enjoy the journey even if we may all have different expectations of the outcome (or, more honestly, wish for different outcomes from the analysis?).

      KevinUK

  4. Dude

    Innnnnteresting.

  5. Peter

    Great post!
    While you are at it, maybe tweak your program to make a separate analysis of any potential bias introduced by the initial quality control that is mentioned in the blog post you are referring to.

    Here…
    ["So, let’s have look. I took the CRU/GHCN dataset available here and compared all the adjusted data (v2.mean_adj) to their raw counterpart (v2.mean). The GHCN raw dataset consists of more than 13000 station data, but of these only about half (6737) pass the initial quality control and end up in the final (adjusted) dataset."]

    It would be interesting to study the difference in the raw_discarded set vs. the raw_included set.

    • KevinUK

      Peter,

      Very good point! I’d completely forgotten about the quality control. The only problem is that they haven’t done any adjustment calculations for the station data they rejected (as far as I know?). To do that, we’d need to put the rejected raw data back into the raw dataset, do the adjustment calculations with the new, more ‘inclusive’ dataset and re-do RomanM’s type of analysis. Sounds like we need EMS’s help.

      KevinUK

  6. Jean Demesure

    Please Roman, next time save your R scripts simply in .txt format.
    Copy-and-paste from PDF really sucks (lots of lines are simply corrupted by the paste).
    Otherwise, neat script, thank you!

  7. KevinUK

    RomanM

    Nice analysis so far, and it’s consistent with gg’s finding. Keep it up! Could you do the last chart for the NH and SH separately, and even better for latitudes greater and less than 60 degrees North/South as well, i.e. four charts in all? I love the second chart. It looks like an Etruscan vase turned on its side.

    It looks like the majority of the adjustments fall within the ±2°C range. Why do you think the peak/trough in the envelope of these adjustments occurs in about the 1930s to 1940s? And why is the neck of the vase so much narrower than its body, i.e. why are the adjustments only +0.5 to −1.0°C after about 1980?

    I think you’ve got the makings of a good CA guest post here. Why don’t you send Steve M a link to this thread? Either way I’m going to put a post on CA linking to it.

    KevinUK

    • RomanM

      Kevin, the information about the location of the station was not included in the data sets which I downloaded. I would have to look around their ftp site to figure out where it is and write a script to read it. At the moment, I don’t have the time to do that.

      Regarding the adjustments post-1980, I believe part of the answer is contained in the update which I added today. There is a large quantitative change in which stations are available, and many are dropped at about that time. The remaining stations have smaller adjustments.

      I didn’t post this on CA because Steve is in the midst of the “Gate” analysis and I didn’t wish to be a distraction. The downside of this is that I am not used to the increased number of comments on my site (which I mainly started in order to do tests before posting at other places, and as a repository for linking to my graphs and documents from other sites). Unlike using external hosts for these types of links, I get to keep control of the files. For this latter purpose, the blog has been very useful.

  8. Arun

    Interesting. Just a few minutes ago, I saw only one comment on this post. Now there are 7. If this post is as important as I think it is, showing +0.25 of highly systematic warm bias added in the 20th century, there will be thousands of comments across the blogosphere soon.

    • KevinUK

      Arun,

      Welcome to the crest of the wave that is real time science in action on the blogosphere.

      KevinUK

      • RomanM

        Arun, they appeared “suddenly”, because I just got up this morning and “approved” the comments (which were made by people in other parts of the world where it was day, not night).

        In regard to Kevin’s comment, here is a graph showing the statistics for hits at my site:

  9. John F. Pittman

    Roman, I wonder if you could plot the GHCN yearly anomaly on the Mean Annual GHCN Adjustment graph.

  10. JP Miller

    Great work, Roman M. However, everybody, what do you make of the AMAZING symmetry of the changes in GG’s graph? Is it really likely that over a 150+ year time period there would be an equal number of +/- adjustments? What about UHI adjustments? Wouldn’t that fact alone suggest a noticeably greater number of negative adjustments? It looks to me like someone was doing their best to “balance” the number of +/- so as to look (clumsily) “even-handed.”

    • RomanM

      If you look carefully at gg’s graph at the top of this page, the symmetry is not around zero, but some negative value. The same applies to the individual corrections themselves.

      Individual corrections (5,068,104 of them):

      NAs 4.1%
      Zeroes 32.2%
      Positive 24.2%
      Negative 39.6%

      There does not appear to be any attempt at balancing the numbers.

  11. Craig Loehle

    Dear Roman,
    I am working on a paper on adjustments. Looking at your adjustments graph above, they are all negative. I think the shape is right, but why all negative? Also, please email me, I would like to cite that graph somehow (or work with you on this). cloehle at ncasi.org

    • Nick Stokes

      Craig, remember that these are averaged over all sites. My perspective on the comparison of the GG and Rom plots is this. You have variation over time and over stations. GG calculates a summary stat over time (regression slope) and shows the distribution over stations. Roman gets a summary stat over stations (average) and shows the distribution over time.

      GG’s mean says that over stations the average adjustment uptrend is 0.0175 °C/decade. If you plot a line back with that slope, it tracks Roman’s curve well. The max downslope is about 1905-2005, where the regression slope is 0.023 °C/decade.

      In fact, the most comparable (to GG) measure of slope for Roman’s plot is probably the regression weighted by the number of stations in each year. That comes to 0.0170 C/dec – very close to GG.
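[That station-count-weighted regression can be sketched as follows – an illustrative Python version with invented station counts (the real counts come from the GHCN inventory, and the actual computations here were done in R):]

```python
import numpy as np

def weighted_slope(years, values, nstations):
    """Least-squares slope of values vs. years, weighting each year by
    its station count (a sketch of the weighted regression described above)."""
    w = np.asarray(nstations, dtype=float)
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    xm = np.average(x, weights=w)
    ym = np.average(y, weights=w)
    return np.sum(w * (x - xm) * (y - ym)) / np.sum(w * (x - xm) ** 2)

# Synthetic check: a perfectly linear adjustment series recovers its slope
# regardless of the weights (the station counts here are invented).
yrs = list(range(1905, 2006))
adj = [0.0023 * (y - 1905) for y in yrs]           # 0.023 °C/decade
counts = [min(i + 10, 80) for i in range(len(yrs))]
print(round(weighted_slope(yrs, adj, counts), 4))  # 0.0023 (°C per year)
```

Weighting by station count simply pulls the fit toward the well-sampled years, which is why it lands closer to GG’s per-station figure.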

      • RomanM

        Nick, I tend to agree with you that Giorgio’s results are consistent with what I came up with. However, his graph actually did not justify what he maintained – that the positive and negative adjustments cancelled each other out with no net effect. It was not centered at zero, and this was not visually evident because of the sheer numbers of observations being summarized. That is similar to the effect of viewing the “Etruscan urn” graph in the head post. Looking at the adjustments as a function of time was necessary to evaluate the effect of these adjustments on the global trends. The totality of the adjustments forms a substantial portion of the temperature trend.

        I don’t think there is any question that warming has occurred in the past 150 years. The question in all of this is how much is real and how much is an artefact of the statistical processing choices and methods. This is important not only in itself, but also because this data is used in scientific studies for calibrating paleo series, evaluating models to determine climate sensitivity, and any other situation where the effect of temperature change may be a factor. One must also remember that other reasons for “adjustment” could also be needed (e.g. UHI) and these could affect the results further.

  12. Charlie

    Hasn’t it already been shown that the TOBS (time of observation) adjustments pretty much account for most of the warming in the US temperature record?

    Is that what is going on in the whole GHCN database?

    Or perhaps there is some sort of effect where TOBS corrections on nearby sites then show up at other sites when homogenization takes place?

    The other basic thing I don’t understand is how homogenization or other corrections propagate in the database. Does station 1’s correction affect station 2’s reading? And then, in the next round of processing, does station 2’s adjusted reading affect station 1?

    I’m just throwing out a possible answer for why there seem to be many more adjustments than there are changes in station record.

  13. CC

    Further to Craig Loehle’s comment above, something doesn’t add up here. Gilestro’s histogram is centred on zero, or almost zero, meaning that when all the adjustments are taken into account there is no net change up or down. Yet your time-dependent analysis shows that the yearly average adjustments are ALL below zero, to a greater or lesser extent. By eye, the overall net change should be about -0.1 or -0.15 degrees. Why doesn’t this show on Gilestro’s histogram?

    • RomanM

      Please see my reply to JP Miller above.

      Nick Stokes (who replied in another thread, ‘Test Post’, on my blog here – I haven’t figured out how to move it to this thread yet), duplicating gg’s results in R, indicates that the adjustments contribute about 0.17 °C/century to the trend. This is reasonably consistent with what I have come up with in a simple look at the adjustments.

  14. Greg F

    RomanM,
    Don’t know if you have checked this out but Chiefio has done a number of posts on GISS/NOAA/NCDC. The post “NOAA/NCDC: GHCN – The Global Analysis” is a good place to start.

  15. How does your second figure jibe with my finding that GG’s results show both + and – adjustments average out to 0.1° C/decade (the units he claims he used) with the overall average adjustment being +0.017° C/decade since there is a 12% bias for positive adjustment (56% are + and 44% are -)?

    Ah… he’s doing slopes and you’re doing absolute values, so they are entirely different perspectives. Either positive or negative absolute adjustments can lead to a positive slope change. If most records in fact don’t go back further than 1880 (where, in fact, GISS chops data off), then a big negative pull on that era like the one shown would indeed cause a positive adjustment bias in slope. However, your first plot shows that data actually kicks in by 1850 (shame on GISS!), so the effect on the overall slope would not be very intense, but it could affect the shape noticeably though perhaps not significantly.

    The 1990-2000 decline would not hide a decline but hide an incline (by about 0.05° C vs. earlier adjustment values), right?

    Your overall negative values for the global collection (GHCN) mismatch the overall positive values for the US collection (USHCN) plotted here (I’ve taken someone’s graph and added Celsius for which ticks are spaced 5/9 the width of Fahrenheit ticks on a thermometer since between melting ice and boiling water there are 100 degrees C instead of 180 degrees F):

    http://i45.tinypic.com/2r74goi.jpg

    That adjustment would also cause a positive adjustment bias in slope to the US database. A much bigger one than your second plot would cause to the global database since US adjustments max out at +0.55° C instead of your global adjustment values at -0.26° C. Is it thus the case that the USHCN is being adjusted much more than the GHCN? Thus the global network may be masking troubles with the US one which is where a real problem may exist.

    I’m still confused about what stage in the process such an overall alteration in shape could be introduced while not altering the slope much (GG’s valid point). Is homogenization done by USHCN or GHCN?

    Is there an *overall* adjustment done *as* an overall adjustment? In other words, if there’s any monkey business, at what level of government (literally) is it occurring? Low-level people with observation bias who are messing with individual stations would likely show up as an overall slope change (greater than GG’s finding of 0.017° C/decade), since they would not be expected to successfully coordinate their work with each other to adjust for such an eccentric thing as overall slope. So instead of a conspiracy it would amount to one person, or a very small group.

    I have failed to locate the simple plots of adjusted/unadjusted overall US (USHCN) and global (GHCN) data. Just lots of hits for individual station studies. If the overall plots don’t look much different then what’s the problem?

    Ugh, it may be time to finally learn R. I’m a chemist not a programmer!

  16. Mesa

    Some of the confusion is over time periods. The corrections from 1900 on cause a warming of about 0.5 °F. The corrections prior to that cause a cooling of about the same magnitude. The concern is that the warming corrections occur during the period in which we are looking at big CO2 increases, and that the magnitude of the corrections is about the same as the corrected temperature change. So, effectively, the viability of the CO2/temperature historical link comes down to whether you believe in the corrections.

  17. Andy

    Used your ghcnR file (opened in wordpad, notepad and textpad) and it’s still not working. It seems to fall apart here:

    + v2.madjx = v2.madj[-which(is.na(inds))]
    Error: unexpected symbol in:

    v2.madjx”

    Not sure what the problem is.

  18. Brian Brademeyer

    >>> A second unexpected feature was the fact that there seems to be a fairly constant reduction of 0.10°C from about 1990 to 2006! The expression “hiding the decline” comes to mind and I believe this would need some sort of explanation.

    If one was actually trying to “hide the decline”, wouldn’t the adjustments have shown an *increase*?

    • RomanM

      Yes, you are quite right. My comment was a bit snide and did not properly reflect the downward direction of the adjustment until 2006. There was a second jump, this time upward, in 2006, which did fit the comment.

      After seeing the curve displaying the number of stations used, it is readily apparent that both of these must be related to changes in the number of stations.

  19. GG studied the GHCN (Global Historical Climatology Network) and found not much change in slope due to adjustments. Now this page presents a study of GHCN showing that there is in fact a curious curve of adjustments to the absolute values of the GHCN, though not to its slope. However, the adjustments are small in value.

    It’s the USHCN (United States Historical Climatology Network) that shows large-value adjustments. Here are the two plots of absolute-value adjustments at the SAME SCALE:

    http://i46.tinypic.com/6pb0hi.jpg

    What one now needs is a GG histogram of the USHCN adjustments’ influence on slope.

    I think there’s no smoking gun for those who adjust the global set but there may indeed be one for the US set.

    • JP Miller

      Nic, your rough graph looks right, but isn’t the smooth upward slope of the 1920-1990 period darned suspicious looking? One would think there would be negative adjustments for UHI that would dominate otherwise normally distributed adjustments and produce a negative slope in the 20th century. And are we sure the “raw” data that GHCN (or USHCN) references are truly “raw”?

      I would not be so quick to conclude “nothing to see here” on GHCN. I still want a thorough, open examination of truly raw temp records and then let climate scientists hash out how to adjust them to create a usable temp record.

      My hunch is there won’t be enough temperature change in the last 60 or 25 years to make any case for AGW. I could be wrong, but want to be proven so — not shouted down by authority or consensus.

      • I don’t think there is any heat island adjustment in these series at all. GISS does that. Not sure who else might.

        The smooth upward slope is suspicious even in the GHCN case, sure, but it isn’t big enough to matter much, unlike USHCN’s.

        However, another aspect that would fail to show up in either raw or adjusted data is thermometer station location over time. Even if there was cooling going on, a “march of the thermometers” from developed Europe circa 1900 down into more tropical areas by 2000 would mean that a simple average of all the thermometers would show a heating trend! That’s one reason why the very concept of a “global average” is problematic. Many posts on this blog highlight the location issue: http://chiefio.wordpress.com/2009/08/17/thermometer-years-by-latitude-warm-globe/

        Your hunch is likely right. I base that on a firm appreciation of single site records that go way way back:

        http://i45.tinypic.com/125rs3m.jpg

        There are about ten more out there, most of them with missing gaps, but only 3 show any recent upturn from the usual steady incline.

  20. JP Miller

    Oh, and I meant to add that the entire spatial agglomeration of individual site temp records into a global temp record is another area that needs parsing…

    There’s lots the peer reviewed literature has not reviewed.

    Amazing that the vast majority of climate scientists who have used the final product of CRU, GHCN, GISS have allowed this lack of thorough, critical examination of how the temp record is built to go on for so long.

  21. Nick Stokes

    I mentioned on the test thread that I was able to reproduce Giorgio’s results using R. The histogram is here. I get the same mean, 0.0175 °C/decade, and standard deviation 0.189 °C/decade.

    Apologies for putting that post, with the R script, in the wrong place. However, it might actually be the best place for it, to keep this thread more readable. It’s here.

  22. Bill Illis

    The new USHCN V2 has adjustments that amount to about 0.45C since about 1915-1920. (versus 0.55F in the previous version).

    The TOBs adjustment (shown as separate impact on the max and min temperatures).

    http://img69.imageshack.us/img69/6590/ustobs.png

    The Homogeneity adjustment (which is more designed to match up nearby stations to catch moves, equipment changes etc. than UHI).

    http://img109.imageshack.us/img109/7312/ushomogenizationovertob.png

    This is outlined in this paper published in the Bulletin of the American Meteorological Society in 2009 (here is a free version).

    http://ams.confex.com/ams/pdfpapers/141108.pdf

  23. EdBhoy

    Can anyone give a credible explanation as to why the corrections should show this distribution in time? How can this be explained from a scientific point of view? I’m stumped!

    • RomanM

      The final graph in the head post shows the number of stations active in the data set in a given year. One strong possibility for the shape is the removal of old stations and/or the addition of new ones with differing adjustments. The preponderance of negative adjustments accounts for the negative sign of the curve.

      Whether the stations deserve to be adjusted as they are is a quite different question.

      • F. Patrick Crowley

        It looks like the 1990+ era decline in the upward temperature adjustment correlates with the decline in the number of reporting stations, i.e. the Soviet Arctic ones. Perhaps there was no need to adjust upward due to a selective removal of sample locations. If you increase the number of hot sites versus cold ones, you do not have to do any adjustments to show a global increase.

  24. Tom C

    This is a blockbuster finding. Or am I missing something?

    • RomanM

      Not necessarily. One would still have to determine whether the amount and direction of the adjustments is justifiable on an individual station basis. There are a lot of stations and that could be a pretty hefty task.

      • KevinUK

        Tom C

        RomanM is spot on here. The people making these adjustments must justify what they have done to each and every individual station. This statistical spreading of algorithmic adjustments to individual stations is just not physically justifiable. TOBS is a classic. Just because they don’t want to enter the time at which the observations were made, they use an algorithm instead. What nonsense, given the importance of this dataset.

        Given that the global and regional mean temperature anomaly charts – and the claimed correlation of temperature increases with rising CO2 in the latter part of the 20th century – are among the primary pieces of evidence for man-caused global warming, far more effort must be placed on getting these individual station adjustments right. At the moment, just because the adjustments as applied now produce the result they desire, the advocates of man-caused global warming like Eric ‘There is nothing to see here so just move on’ Steig think there is no need to do this.

        Given that trillions of dollars (some of it mine) of ‘climate damage’ compensation is to be paid from developed countries to developing countries on the basis of this evidence, some of us (myself included) strongly disagree. It isn’t going to take that much effort to do these adjustments, including UHI, in a far more physically justifiable way. The advocates of MCGW know that if we do, the outcome may not support their justifications for a massive transfer of wealth. It may also remove the justification for their Enron-inspired ‘cap and trade’ systems, from which many of them are doing very nicely thank you very much.

        KevinUK

        PS RomanM, that web stats chart you posted looks remarkably like a hockey stick? Almost like the one for WUWT following the CRU emails release. Keep up the good work!

  25. Nick Stokes

    I reproduced this plot with the help of the R code. It looked the same. But it seems entirely consistent with Giorgio’s result. I calculated the regression slope over 1905-2005. It was 0.023 °C/decade, or 0.23 °C/century. GG got 0.0175 °C/decade. These figures, of course, can’t be expected to match perfectly, but they seem similar.

    • Nick Stokes

      Just one quibble though – has the last plot been truncated to eliminate positive values at the sparsely represented ends? I couldn’t see this in your R code, but I notice a lot of dots on the zero line. I got several positive values, both before 1850 and also post 2000. They aren’t very significant because of the small numbers of stations, but especially the cluster of dots at the right upper corner gives an impression which may not be correct.

      • RomanM

        No, I have not truncated anything. The code I posted is what was run to make the graphs.

        Can you be more specific which pre-1850 values seem to be incorrect? I also don’t understand what the “incorrect impression” of the upper right corner is. I didn’t look at the values in detail, but it appears that the values in the last three or four years seem to have fewer adjustments. Is this a problem?

      • Nick Stokes

        I asked only because I was trying to check my own code. I found a glitch in my summing program that was affecting the most recent years, so those now agree with yours. The issue with the older years really isn’t worth any attention, but here is a sample of what I got:

        Figures for number of stations (in year), total adjustment diffs, average

        Year  Number  Total Adj    Average
        1840    35    -1.3853030  -0.0395800866
        1841    36    10.1340909   0.2815025253
        1842    37     6.7242424   0.1817362817
        1843    43    10.3166667   0.2399224806
        1844    42    -3.0371212  -0.0723124098
        1845    43    17.8704545   0.4155919662
        1846    37    -3.0394444  -0.0821471471
        1847    37    26.4454545   0.7147420147
        1848    33    -3.4184848  -0.1035904500
        1849    41    -1.3104545  -0.0319623060
        1850    42    10.4699242   0.2492839105

    • steven mosher

      see how easy it is to collaborate and actually move a discussion forward when you have the code? gosh, there’s an idea!

      • RomanM

        Don’t be snide, mosh. This isn’t Open Mind! :)

        Nick, I see what the situation is. I was lazy in my programming, so I averaged the available data for a station in a given year (using an existing R function) and then calculated the annual averages over all the stations. You summed the differences and the numbers of observations over all the stations for a given year and used those to calculate the result.

        Your method is more “correct” – the results would be the same if I weighted the stations by the number of observations instead of equally.

        I figured that with the humongous size of the data set there would not be much of a difference, particularly in a plot; however, when you get down to counts in the double digits, there certainly could be (and is) a difference. Sorry for putting you to the work of trying to find a nonexistent error.
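        [The difference between the two averaging orders can be illustrated with toy numbers – invented values, not GHCN data. Averaging station means equally and pooling all monthly differences diverge exactly when stations report different numbers of months:]

```python
# Two ways to get a yearly average adjustment, per the exchange above:
# (1) average each station's monthly differences first, then average the
#     station means equally (the head post's method);
# (2) pool all monthly differences for the year (the alternative method).
# They differ when stations report different numbers of months.

station_months = {
    "A": [-0.5],                    # one reported month
    "B": [0.1, 0.1, 0.1, 0.1],      # four reported months
}

# (1) mean of station means: (-0.5 + 0.1) / 2  ->  about -0.2
equal_weight = sum(sum(v) / len(v) for v in station_months.values()) \
               / len(station_months)

# (2) pooled mean over all observations: (-0.5 + 4*0.1) / 5  ->  about -0.02
pooled = [d for v in station_months.values() for d in v]
obs_weight = sum(pooled) / len(pooled)

print(round(equal_weight, 3), round(obs_weight, 3))   # -0.2 -0.02
```

        With thousands of stations the two agree closely, as noted above, but with double-digit station counts they can differ by an order of magnitude.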

      • Nick Stokes

        Thanks, Roman, yes, it’s a very minor point. I was just curious.

  26. Anna

    I’m sure it’s completely coincidental, but when these adjustments are applied to temperature graphs with a linear increase in temperature, they result in lovely Hockey-Sticks ; )

  27. Brian Macker

    It’s amazing how the mistakes made by the meteorologists follow such a uniform V shaped trend. One would think that mistakes would happen randomly, and when noticed would be fixed. Thus they should not be uniformly distributed over time. So it is stupid to believe that the trend in corrections is an attempt to correct mistakes. It has the fingerprints of an attempt to modify data.

  28. Jean Demesure

    In fact, the V shape including older temperatures is irrelevant to the AGW narrative: what is presented by the IPCC, and what counts in the public mind, is the “0.7 °C warming over ***the 20th century***”. And what happened is that, of that warming, about 0.25 °C has been due to “adjustment” at the GHCN (add to this further tweakings from CRU, GISS… to get “value-added” temperatures).
    In short, at least a third of the warming has occurred… in the computers.
    http://img692.imageshack.us/img692/4990/ghcnadjustment.png

  29. You seem to know statistics (I only know ‘mean’ – or was it ‘average’?).
    Can you help me and explain what I see here:
    http://spreadsheets.google.com/pub?key=tVV0JAI6BxY8dqPiTWXlWUg&gid=4

    HadCRUT bias uncertainty, upper minus lower,
    1850 to 2008.
    What has happened since the 40’s-50’s decade?
    See the image:
    http://spreadsheets.google.com/pub?key=tVV0JAI6BxY8dqPiTWXlWUg&oid=4&output=image

    No more uncertainty?
    Can you explain, please?

    • RomanM

      The spreadsheet you have linked to appears to deal with the Hadcrut temperature anomaly series. The word “bias” in statistics refers to the amount that an estimator might systematically over- or under-estimate a population value it is trying to estimate due to the circumstances under which it is trying to determine that value.

      Without knowing where this data came from, I cannot give you a definitive answer as to what it represents. Providing a link to the original source of the data would be helpful.

      However, if I had to guess (I like puzzles!), my guess would be that the bias could be related to SSTs (sea surface temperatures) which were “adjusted” by about .3 degrees from the early 1940’s on due to a supposed instantaneous change in the measurement procedure from the use of canvas buckets to measuring in a more automated way. This would alter the global series by a somewhat smaller amount.

      • The data are in the first tab (see the top of the page):
        http://spreadsheets.google.com/pub?key=tVV0JAI6BxY8dqPiTWXlWUg&gid=1
        It is the standard file you can download for the HadCRUT3 monthly time series, from
        http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/

        Data format:
        * Column 1 is the date.
        * Column 2 is the best estimate anomaly.
        * Columns 3 and 4 are the upper and lower 95% uncertainty ranges from the station and grid-box sampling uncertainties.
        * Columns 5 and 6 are the upper and lower 95% uncertainty ranges from the coverage uncertainties.
        * Columns 7 and 8 are the upper and lower 95% uncertainty ranges from the bias uncertainties.
        * Columns 9 and 10 are the upper and lower 95% uncertainty ranges from the combined station and grid-box sampling, and coverage uncertainties.
        * Columns 11 and 12 are the upper and lower 95% uncertainty ranges from the combined effects of all the uncertainties.

        I plot column 8 minus column 7:
        * Columns 7 and 8 are the upper and lower 95% uncertainty ranges from the bias uncertainties.

        Thanks in advance.
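        A sketch in R of the calculation described above (the local file name is hypothetical; the monthly diagnostics file from the link above is assumed to be whitespace-delimited):

        ```r
        # Sketch: width of the HadCRUT3 bias-uncertainty band,
        # i.e. column 8 (lower bias bound) minus column 7 (upper bias bound).
        # "hadcrut3_monthly.txt" is a hypothetical local copy of the file from
        # http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/
        had = read.table("hadcrut3_monthly.txt")
        bias.width = had[, 8] - had[, 7]
        # The 95% bias bound itself is half the absolute width, since one column
        # is the anomaly plus the possible bias error and the other is minus it.
        bias.bound = abs(bias.width)/2
        plot(seq_along(bias.bound), bias.bound, type = "l",
             xlab = "Month index", ylab = "95% bias uncertainty (deg C)")
        ```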

      • RomanM

        El Abuelo, this particular document deals with the uncertainty (i.e. error bounds) of the HadCRUT temperature anomaly series due to different sources of possible error, ranging from sampling variability and regional coverage to bias due to changes which occurred over time.

        The graph which you gave in your first comment had to do with the bias effects as described on pages 9–11 of the paper which is linked on the page you got the data from.

        It explains that they considered bias from two sources: urbanization (usually referred to as UHI – the fact that as a city grows around a temperature station, the measured temperature may be affected by the changes in the environment) and changes in the measuring equipment from one type of thermometer and/or enclosure to another. The reason that the curve drops close to zero by 1950 is that there was a major changeover in equipment at that time, which they assumed was done slowly during that period.

        The estimated 95% bound for the bias error would be one-half of the absolute value of what you have graphed, since the one column represents the anomaly plus the possible error and the other minus the possible error.

        I hope you find the explanation helpful. My initial guess was incorrect. Darn ;)

  30. ThinkingScientist

    Hi RomanM, I have done the same analysis as you using GHCN and got the same results. I posted a question at CA and they pointed me here. In case you want to discuss further or swap info, contact my email directly

  31. KevinUK

    RomanM,

    Just read your Update

    “Update: December 13, 2009

    Since posting this yesterday, I have become aware of a post doing a similar analysis by blogger hpx83. His results are basically the same as the ones I have posted. However, using further information about the stations, he has also included a graph of the number of stations available in a given year:”

    hpx83 is demonstrating how useful a properly normalised version of the GHCN database can be.

    I’m doing something very similar and I’m reproducing some of his, EMS’s and other bloggers’ analyses, so that I can be sure that I haven’t made any mistakes in importing and normalising the data. At the same time I’m developing a web application (a web-based front end to the data) for others to be able to see, filter, export, chart and analyse this data. It’s not R, but it’ll be very user friendly (as a result), as I intend to make this useable ‘by the masses’ and not just by people like ourselves who know programming languages well and so can readily analyse and easily identify trends in this type of complex data.

    KevinUK

    • RomanM

      Great idea Kevin! Maybe the climate community can be shamed into doing their own job better by following the example of people like yourself.

      That sort of thing is beyond my abilities as a web artist. Please keep us informed of your progress.

      Yes, I did notice the “hockey stick” appearance of the hits on my web site. The final tally for yesterday was over 3000. If this keeps up I might have to add some new photos to give people something interesting to look at ;)

  32. steven mosher

    Nice blog RomanM.

  33. Nick Stokes

    Just noting wrt that last graph on temperature stations. It’s a plot of points that appear in the _adj file. I think what happens is that the normal GHCN algorithm needs a continuous range to adjust, and so can’t adjust up to the present. US stations are treated differently, because of USHCN. I don’t fully understand this, but I think after the number drops at about 1994, the remaining 1200 or so stations are mostly US.

  34. El Abuelo

    Thanks!
    That was what I suspected: no UHI.
    Or only 0.0055° per decade.
    And I like their last sentences:
    “Recent research suggests that this value is reasonable, or possibly a little conservative (…) Recent temperatures may be too high due to urbanisation, but they will not be too low”

  35. Green R&D Mgr

    Roman,
    Great work. This is consistent with what I have seen sampling a fair number of random individual stations.

    The other attributes that would be interesting is what impact have these adjustments made to the spread of temperatures by date and the temp trend by decade.

    It seems the homogenization is doing just that: homogenizing the slope and temps into a more consistent story, leaving an impression that the error margin is narrower than it would actually be.

    It has been my impression that the result of the adjustments to individual stations has been to consistently narrow the spread they show the earlier you go in the 20th century. Hence the late 20th century extreme temps stand out more as unusual in absolute value and ramp.

    Some stats on these would be most useful to the discussion. So many times the devil is in the details…:-).

    Nice work.

    • KevinUK

      Green R&D manager

      “It seems the homogenization is doing just that: homogenizing the slope and temps into a more consistent story, leaving an impression that the error margin is narrower than it would actually be.

      It has been my impression that the result of the adjustments to individual stations has been to consistently narrow the spread they show the earlier you go in the 20th century. Hence the late 20th century extreme temps stand out more as unusual in absolute value and ramp.

      Some stats on these would be most useful to the discussion. So many times the devil is in the details…:-).”

      This is a very important point, especially your last sentence: “Hence the late 20th century extreme temps stand out more as unusual in absolute value and ramp.” It’s the same for proxy temperature reconstructions. They must not show a significant MWP followed by a significant cooling period to the LIA, followed by recovery to our modern day warming period. In other words, the flat handle of their hockey stick is just as important as their manufactured blade. They deliberately cut off the data at 1880 and don’t show any historical data prior to that, i.e. ‘The Little Ice Age Thermometers’ as Tony B calls them.

      http://climatereason.com/LittleIceAgeThermometers/

      If they didn’t cherry pick their start date then people would see plenty of evidence of a cooling period (from the MWP to the LIA) preceding the recovery from the ‘nadir’ (lowest point) of the LIA.

      KevinUK

  36. Jason

    Roman,

    It would be interesting to find out what portion of the trends in graph #3 are the result of station drop off, and what portion results from changing adjustments on the same set of stations.

    I’d love to see a version of this graph from 1975 to 2000 in which all stations that are missing data from more than one year during that period are removed.

  37. Andy

    Roman, it’s been said that UHI would show up in night-time temp changes, so I adapted your script to have a look. The structure of the files is the same, so the only thing I should need to do is change the file names being read in, and it should operate the same way. Makes sense, but that didn’t happen.

    After running the line:
    adjs = diff.calc(v2.meanx,v2.madjx)

    I get this error:

    Error in outmat[i, 1] = as.numeric(substr(chx1, 13, 16)) :
    subscript out of bounds

    Any ideas?

    • RomanM

      You don’t mention it, but I assume that you are talking about using the two min files from the GHCN site.

      I don’t have time this morning to look at it, but I will do so this afternoon.

    • Andy

      When running the script on the min temperature files, it seems to break down with this code:

      v2.madjx = v2.madj[-which(is.na(inds))]
      indsx = inds[-which(is.na(inds))]
      v2.meanx = v2.mean[indsx]
      idv2x = idv2[indsx]
      idv2adjx = idv2adj[-which(is.na(inds))]

      All of these “x” variables are empty, which explains the subscript out of bounds error.

      Any ideas?

      • RomanM

        Not off hand.

        The function “reconcile” checks the adjusted data file to see what the common stations and years are and returns the appropriate line number for each unadjusted year and station combination. The variable inds contains those line “numbers”.

        The portion you have posted in your comment removes the unmatched line numbers.

        I will have to download the min files and see what they look like. It may be tomorrow afternoon before I can do that.

      • RomanM

        I have now looked at the min (and the max) files and figured out what caused the glitch. In the “mean” files there were some adjusted lines that did not correspond to anything in the unadjusted data file. (How would that occur anyway?) The code you gave included the removal of those lines.

        Neither the max nor the min files displayed that same property, so the removal was unnecessary. Trying to remove a NULL set of lines produced an error. I have redone the script and uploaded versions for the max files and min files to this web site.
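        The underlying R quirk, for anyone hitting the same error: when no lines need removing, which(is.na(inds)) is integer(0), and x[-integer(0)] returns an empty vector rather than all of x. A small sketch (the variable names only mirror the script):

        ```r
        # Negative indexing gotcha: x[-which(cond)] silently returns an empty
        # vector when nothing satisfies cond, because -integer(0) is integer(0).
        x = 1:5
        x[-which(x > 10)]   # integer(0): everything lost, not everything kept

        # A logical index works whether or not any NAs are present:
        inds = c(1, 2, NA, 4)
        dat = c("a", "b", "c", "d")
        dat[!is.na(inds)]   # "a" "b" "d"
        ```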

        The extensions say “doc”, but the files are simple text files. The names of many of the variables are the same, so running one script after the other would overwrite those variables. No attempt was made to link specific station-year maxes and mins to each other. In fact, the number of max values is not the same as the number of min values anyway.

        Some observations on the results: min values are depressed more than the max values. Also, oddly enough, from 1991 to 2006, the average max adjustments become positive and increase to about 0.05 during this time interval.

        Hope everything works for you.

      • Andy

        It works fine now. Thanks for your efforts. I haven’t had a chance to look extensively at your code changes yet but I’ll get to it tomorrow.

        I did glance over the Peterson paper at

        http://www.ncdc.noaa.gov/oa/climate/ghcn-monthly/images/ghcn_temp_overview.pdf

        and I have to say, these databases at GHCN are a mess.

    It all seems so ad hoc.

        Since GHCN is the basis for much of the historical temperature data from a variety of sources it seems they (the climate scientists) really should get a handle on the data before they make pronouncements such as they do.

  38. Hi Roman,

    have you considered using code.google.com to host your code? It’s a proper Subversion repository, so you can track any changes, and you can include links directly to the source without having to play games with the file name extensions.

    Cheers,
    — Rafe

  39. Marguerite Mingorance

    There are a number of potential problems with these graphs.

    First, the dataset trend distribution is too pretty. Stations get moved, the equipment changed, it is erratic, there is no reason to expect that the data would graph out so pretty.

    Second, the slight positive bias bothers me. Even if moves and gear changes average out, stations that started in natural areas but were subsumed by urban sprawl should have been adjusted down to compensate for UHI effects. This should show up in the distribution.

    Almost all the means in the third graph are negative; I would expect this bias to show up in the second graph (Annual Averages). Is there an error?

    Third, only about half the GHCN records are actually used by people, so there would be no reason to bother adjusting the others. If it is true that all the adjustments are on the 600 or so records that get used in IPCC reports and such, then you have averaged down the mean with a bunch of irrelevant records, and the actual mean adjustments should be twice what show up.

    Fourth, any sites that have been encroached on by urban sprawl should have negative adjustments applied. These need to be checked for.

    • RomanM

      There are a number of potential problems with these graphs.

      First, the dataset trend distribution is too pretty. Stations get moved, the equipment changed, it is erratic, there is no reason to expect that the data would graph out so pretty.

      My opinion is that the graph shape is more a result of the pattern of addition or deletion of stations over time than from the specific type of adjustment made at those stations. Even if the adjustment at a particular station is constant over time for a station, as that station is added to or removed from the data set, it will have an impact on the average at those times. Compare graph three to graph four from about 1900 forward. The number of stations before that time was relatively smaller and presumably the adjustments were of a different character.
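      A toy simulation (my own construction, not the GHCN data) shows how this can happen: each station keeps a constant adjustment, but the stations with the most negative adjustments drop out over time, so the average adjustment trends upward even though no individual adjustment ever changes.

      ```r
      # Toy example: constant per-station adjustments plus changing station
      # composition produce a trend in the averaged adjustment.
      set.seed(42)
      n.stations = 100
      adj = rnorm(n.stations, mean = 0, sd = 0.5)   # fixed per-station offsets
      years = 1900:2000
      ave.adj = sapply(years, function(yr) {
        # stations with the most negative adjustments are retired first,
        # losing one station every two years
        n.active = n.stations - floor((yr - 1900)/2)
        active = order(adj, decreasing = TRUE)[1:n.active]
        mean(adj[active])
      })
      plot(years, ave.adj, type = "l", xlab = "Year",
           ylab = "Mean adjustment (deg C)")        # rises steadily
      ```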

      Second, the slight positive bias bothers me. Even if moves and gear changes average out, stations that started in natural areas but were subsumed by urban sprawl should have been adjusted down to compensate for UHI effects. This should show up in the distribution.

      I agree that UHI adjustments should reduce the rising slope in graph 4, but in order to look at that one would need to see exactly which stations were adjusted and by how much. Over the past several years, there have been examples of where the amount of the adjustment was extremely small or even in the wrong direction for some stations.

      Almost all the means in the third graph are negative; I would expect this bias to show up in the second graph (Annual Averages). Is there an error?

      No, no error that I am aware of. The number of points in graph 2 is enormous (which is why I reduced the plotting symbol size way down). It is visually pretty much impossible to see the trend in this cloud of points. As I mentioned in the post, I included it to show the variation in magnitude of the individual adjustments over time.

      Third, only about half the GHCN records are actually used by people, so there would be no reason to bother adjusting the others. If it is true that all the adjustments are on the 600 or so records that get used in IPCC reports and such, then you have averaged down the mean with a bunch of irrelevant records, and the actual mean adjustments should be twice what show up.

      Yes, I agree here too. However, I don’t know which records are used in the gridded and global series, so it was not possible to narrow the data down for that purpose.

      Fourth, any sites that have been encroached on by urban sprawl should have negative adjustments applied. These need to be checked for.

      This too needs metadata for a proper evaluation.

      • Marguerite Mingorance

        I hope you search out which records are actually used in graphs presented by GW folks.

        I’d like to see you graph the records that they eliminate, to see if they exhibit any trends (like a cooling one, that they sought to eliminate).

        Another problem: Let’s assume that the adjustments are valid (I don’t think they are, but let’s assume for a moment). What would you expect the graph of the mean of the adjustments to look like?

        Changes to location of measurement sites can result in anomalies that are both up and down. Changes to equipment would be the same. Even if changes in a particular year followed a pattern (like if they replaced one model of recorder with another, across a bunch of stations, and the new gear all recorded a bit higher, and had to be adjusted for), from year to year those means should bounce around.

        They don’t. They make a LINE, from 1900 to the present. WTF? As Seaman Beaumont says in The Hunt for Red October, “That’s got to be man-made.”

      • Marguerite Mingorance

        Oops, I forgot to acknowledge the dip in graph 3, around 1990. However, it corresponds with the sudden reduction in data sets between 1987 and 1990.

      • RomanM

        Determining the records that are used is a lot more work than I would like to put into this data. I think that it will be a more productive use of time to look at the recently released CRU stuff to figure out what is there.

        Valid changes due to changes in location and observing methodology should produce a step change. It is possible, however, that altering the time of observation routine could be unidirectional resulting in temperatures that are consistently higher or consistently lower (but same direction for multiple stations in a network). However, these should all be one-time step changes.

        The only adjustments that would be variable over time could be for slow changes to the immediate environment of the station (e.g. UHI). If the intent is to isolate climate effects from UHI, then one would expect adjustments decreasing over time to account for the non-climatic increase due to urban development. I can’t think of too many genuinely quantifiable effects that would demand a monotonically increasing adjustment. Deterioration of equipment is a possibility but I would think that scientific study would be needed to determine the magnitude and pattern of such an effect.

        Yes, the slope is man-made, but it is still my opinion that it is more related to the changes in station numbers than anything else.

      • Marguerite Mingorance

        You mean elimination of station data sets that don’t confirm their main thesis creates a trend in the data?

  40. Pingback: Do “Adjustments” Lead to a Warming Bias? « the Air Vent

  41. P. Solar

    Using the word sourcecode in square brackets should allow posting this sort of code. Here goes:

    #####################
    # read data from two files which have been downloaded from 
    # http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/
    # and decompressed by an external program
    
    #v2.mean.Z
    #v2.mean.adj.Z
    
    v2.mean = readLines("v2.mean") 
    v2.madj = readLines("v2.mean_adj")
    
    length(v2.mean) # 595759
    length(v2.madj) # 422373
    
    #last ten lines of adjusted file are identical and contain no information
    #remove 9 of them
    v2.madj = v2.madj[1:422364]
    
    #identify matching station and year lines in both sets
    #extract identifying info
    
    idv2 = substr(v2.mean,1,16)
    idv2adj = substr(v2.madj,1,16)
    
    #check to see if both sets are in alphabetical order
    #if so the pairing process is faster
    
    sum(idv2[-length(idv2)] > idv2[-1]) #0
    sum(idv2adj[-length(idv2adj)] > idv2adj[-1]) #0
    
    
    #function to pair lines
    reconcile= function(dat1,dat2) {
       leng1 = length(dat1)
       leng2 = length(dat2)
       id.pos = rep(NA, leng2)
          curr = 1
      for (i in 1:leng2) { j = curr
        while (dat2[i] >= dat1[j])  {j=j+1}
       if (dat2[i]==dat1[j-1]) {
           id.pos[i]=j-1
           curr = j}}
       id.pos }
    
    inds = reconcile(idv2,idv2adj)
    
    #check to see if there are adjusted lines without originals in the raw data
    #remove if necessary
    
    sum(is.na(inds)) #31
    
    v2.madjx = v2.madj[-which(is.na(inds))]
    indsx = inds[-which(is.na(inds))]
    v2.meanx = v2.mean[indsx]
    idv2x = idv2[indsx]
    idv2adjx = idv2adj[-which(is.na(inds))]
    
    identical(idv2x,idv2adjx) # TRUE
    
    #function to calculate individual monthly differences
    
    diff.calc = function(dat1,dat2) {
          len = length(dat1)
      outmat = matrix(NA,len,13)
        st = 17 + (5*(0:11))
        en = st+4
        x1 = x2 = rep(NA,12)
     for (i in 1:len) {chx1 = dat1[i]
        chx2=dat2[i]
        outmat[i,1] = as.numeric(substr(chx1,13,16)) 
        if (outmat[i,1] != as.numeric(substr(chx2,13,16))) return("Error")
       for (j in 1:12) {
         x1[j] = as.numeric(substr(chx1,st[j],en[j]))
         x2[j] = as.numeric(substr(chx2,st[j],en[j]))}
       x1[x1==-9999]=NA
       x2[x2==-9999]=NA
       outmat[i,2:13] = (x2-x1)/10}
      outmat}
      
    #adjustment = adjusted - unadjusted
    adjs = diff.calc(v2.meanx,v2.madjx)
    
    #some statistics
    12*422342  # 5068104 total number of monthly values
    sum(is.na(adjs[,-1])) # 205985 (4.06%) NAs
    sum( adjs[,-1]==0,na.rm=T) # 1631153  (32.18%) unadjusted values
    
    #calculate annual average for each station in a given year
    year=adjs[,1]
    ann.mean = rowMeans(adjs[,2:13],na.rm=T)
    
    #calculate average of all adjustments in a given year
    annadj = data.frame(year,ann.mean)
    
    aveadj = c(by(annadj, annadj$year, function(x) mean(x$ann.mean)))
    
    plot(year,ann.mean,cex=.25,main = "Annual Averages for Individual Stations",
       xlab="Year", ylab="Degrees (C)" )
    
    plot(as.numeric(names(aveadj)),aveadj, main = "Mean Annual GHCN Adjustment",
      xlab = "Year",ylab = "Degrees (C)")
    
    
    • RomanM

      P. Solar: Thanks for the information.

      Actually, this post is almost two years old and I have used the “sourcecode” tag in some later threads, e.g. here.

      It does make it easier for the reader to copy the code, because a mouseover produces a floating window in the upper right portion of the text and this allows the reader to copy all of the code with a single click without having to select that code first.

      However, long scripts can be somewhat bulky and interfere with the “flow” of a post so that sometimes it might be preferable to put them in a separate file.

  42. PSolar,

    May I ask how you came across this thread? As RomanM says it’s almost two years old and is IMO a seminal thread as it subsequently sparked off a lot of activity by the ‘Blackboard crew’ as I call them (zeke h, nick s, r broberg, moshpit, the ccc guys etc) to attempt to refute Roman’s analysis here.

    Have you read this thread? If not, please do so. For example, see http://statpad.wordpress.com/2009/12/12/ghcn-and-adjustment-trends/#comment-195 and hopefully you’ll agree that, despite the fact that a further two years have passed, the GHCN database is still a mess. BEST haven’t improved the situation in any real way and in fact, if anything, they’ve made it WORST.

    Now whatever happened to Giorgio Gilestro? Most of the people contributing to this thread are still around and still post regularly on various blogs (particularly CSIRO Mannian hockey-stick apologist Prof. Nick Stokes BSc, MSc, PhD). GG is conspicuous by his absence.

    KevinUK

  43. Roman, I just left this comment at Lucia’s citing your “Mean Annual GHCN Adjustments” graph from this post. It falls into the category of “things that make you say hmmm”.
