GHCN Twins

There has been a flurry of activity during the last several months in constructing global temperature series. Although a variety of methods were used, there seemed to be a fair amount of similarity in the results.

Some people have touted this as a “validation” of the work performed by the “professional” climate agencies which have been creating the data sets and working their sometimes obscure manipulation of the recorded temperatures obtained from the various national meteorological organizations that collected the data. I for one do not find the general agreement too surprising since most of us have basically used the same initial data sets for our calculations. I decided to take a closer look at the GHCN data since many of the reconstructions seem to use it.

At this point, I will look at some of the data which can be found in the GHCN v2.mean.z file. This file contains monthly mean temperatures calculated for a collection of temperature stations around the globe. People tend to refer to these as “raw” data, but statistically this is not really the case. Each monthly record is a value calculated from a larger set of data, and the means can be calculated in different ways. Since there can be missing daily values due to equipment malfunctions and other reasons, decisions have been made and implemented on how to do the calculation in such cases. It is very possible that further “adjustments” may also have been made before the data reach GHCN.
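As a rough illustration of why the method of calculation matters, the sketch below compares two common definitions of a daily mean on a single made-up day. The 24 hourly values are hypothetical, chosen only to have a long cool night and a short warm afternoon; they are not taken from GHCN. Python is used here purely for illustration and is not the R script mentioned later in the post.

```python
# Hypothetical hourly temperatures (deg C) for one day: a long cool night
# and a short warm afternoon. These numbers are invented for illustration.
hourly = [10, 10, 10, 10, 10, 10, 11, 13, 16, 19, 21, 22,
          22, 21, 19, 16, 13, 11, 10, 10, 10, 10, 10, 10]

# Definition 1: average of all 24 hourly readings.
mean_hourly = sum(hourly) / len(hourly)

# Definition 2: average of the daily maximum and minimum (the mid-range).
mid_range = (max(hourly) + min(hourly)) / 2

print(f"hourly mean: {mean_hourly:.2f}  mid-range: {mid_range:.2f}  "
      f"difference: {mid_range - mean_hourly:+.2f}")
```

For this invented profile the two definitions disagree by a couple of degrees. How large the gap is at a real station, and in which direction, depends on the shape of its daily cycle.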

A description of the format of all the temperature datasets is given in the readme file at the same site. In particular, station series id formats are as follows:

Each line of the data file has:

station number which has three parts:
country code (3 digits)
nearest WMO station number (5 digits)
modifier (3 digits) (this is usually 000 if it is that WMO station)

Duplicate number:
one digit (0-9). The duplicate order is based on length of data.
Maximum and minimum temperature files have duplicate numbers but only one time series (because there is only one way to calculate the mean monthly maximum temperature). The duplicate numbers in max/min refer back to the mean temperature duplicate time series created by (Max+Min)/2.

In this analysis, I will be referencing the station with a single id which is constructed from the station number by connecting it to the modifier with a period. The temperature data in the R program will consist of a list of (possibly multivariate) time series with each element of the list containing all of the “duplicates” for a particular station.
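The id layout described in the readme can be sketched in a few lines of Python (not the R used for the actual analysis). Several things here are assumptions: the line is taken to be whitespace-separated as in the sample quoted in the comments below (the file itself may be fixed-width, in which case the values would need to be sliced by column), `-9999` is assumed to be the missing-value code, and the `station_id` form is only one reading of "connecting it to the modifier with a period". The names `Record` and `parse_line` are made up for this example.

```python
from typing import List, NamedTuple, Optional

MISSING = -9999  # assumed missing-value code; check the readme before relying on it


class Record(NamedTuple):
    country: str    # 3 digits
    wmo: str        # 5 digits, nearest WMO station number
    modifier: str   # 3 digits, usually 000 for the WMO station itself
    duplicate: int  # 1 digit
    year: int
    temps_c: List[Optional[float]]  # 12 monthly means in deg C, None if missing

    @property
    def station_id(self) -> str:
        # One possible reading of the single id used in the post (an assumption).
        return f"{self.country}{self.wmo}.{self.modifier}"


def parse_line(line: str) -> Record:
    """Split one whitespace-separated data line into its parts."""
    fields = line.split()
    key = fields[0]
    if len(key) != 16 or len(fields) != 13:
        raise ValueError(f"unexpected line layout: {line!r}")
    raw = [int(v) for v in fields[1:]]
    # Values are stored in tenths of a degree C.
    temps = [None if v == MISSING else v / 10 for v in raw]
    return Record(key[0:3], key[3:8], key[8:11], int(key[11]), int(key[12:16]), temps)


rec = parse_line("2293897400201970 36 95 96 174 225 247 262 271 191 149 120 46")
print(rec.station_id, rec.duplicate, rec.year, rec.temps_c[:3])
```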

Some “quality control” has also been done by GHCN.  The earlier “readme” also explains the file v2.mean.failed.qc.Z:

Data that have failed Quality Control:
We’ve run a Quality Control system on GHCN data and removed data points that we determined are probably erroneous. However, there are some cases where additional knowledge provides adequate justification for classifying some of these data as valid. For example, if an isolated station in 1880 was extremely cold in the month of March, we may have to classify it as suspect. However, a researcher with an 1880 newspaper article describing the first ever March snowfall in that area may use that special information to reclassify the extremely cold data point as good. Therefore, we are providing a file of the data points that our QC flagged as probably bad. We do not recommend that they be used without special scrutiny. And we ask that if you have corroborating evidence that any of the “bad” data points should be reclassified as good, please send us that information so we can make the appropriate changes in the GHCN data files. The data points that failed QC are in the files v2.m*.failed.qc. Each line in these files contains station number, duplicate number, year, month, and the value (again the value needs to be divided by 10 to get degrees C). A detailed description of GHCN’s Quality Control can be found through

I didn’t really find the “detailed description”, but a check of the file indicated that almost all of the entries in the file represented temperature values that had been removed from the data set (replaced by NAs). I could only find seven monthly temperatures where the original value was replaced by a new one. Without the necessary supplementary metadata, there is no sense in looking at that file any further.

The names of the stations and other geographic data for them can be found in the v2.temperature.inv file. There are 7280 stations listed with 4495 unique WMO numbers. Each station can have one or more (up to a maximum of 10) “duplicates” so there are a total of 13486 temperature series in the data set. The duplicate counts look like:

Dups:     1     2    3    4    5    6   7   8   9  10
Freq:  4574  1109  601  502  269  111  56  44  12   2
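The station and series totals quoted above follow directly from this table. A quick check, sketched in Python with the counts copied from the table:

```python
# Number of duplicates per station and how many stations have that many.
dups = range(1, 11)
freq = [4574, 1109, 601, 502, 269, 111, 56, 44, 12, 2]

n_stations = sum(freq)                             # one entry per station
n_series = sum(d * f for d, f in zip(dups, freq))  # each station contributes d series

print(n_stations, n_series)
```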


Before the data can be used to construct the global record, it is necessary to somehow combine the information from the various duplicate versions into a single series.  One reasonably expects that the duplicates should be pretty much identical (with the occasional error) since they are supposedly different transcriptions of the same temperature series. The difficulty is that there are almost 13500 series which have to be looked at – not a simple matter.
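To make "combine" concrete, here is a minimal Python sketch of two simple strategies: keep the first duplicate that has a value for a month, or average whatever duplicates are present. Neither is claimed to be what any agency actually does. The function names are invented for this example, and the six values are taken from the 1971 Kuska lines quoted in the comments below, deliberately trimmed so that the two series only partly overlap.

```python
from typing import Dict, List, Tuple

Series = Dict[Tuple[int, int], float]  # (year, month) -> temperature in deg C


def keep_first(duplicates: List[Series]) -> Series:
    """For each month, use the value from the first duplicate that has one."""
    combined: Series = {}
    for series in duplicates:
        for key, value in series.items():
            combined.setdefault(key, value)
    return dict(sorted(combined.items()))


def average_all(duplicates: List[Series]) -> Series:
    """For each month, average the values of all duplicates that report it."""
    pooled: Dict[Tuple[int, int], List[float]] = {}
    for series in duplicates:
        for key, value in series.items():
            pooled.setdefault(key, []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(pooled.items())}


# Trimmed for illustration: dup0 covers Jan-Mar, dup1 covers Feb-Apr.
dup0 = {(1971, 1): 1.1, (1971, 2): 7.7, (1971, 3): 13.3}
dup1 = {(1971, 2): 6.6, (1971, 3): 12.3, (1971, 4): 15.4}

first = keep_first([dup0, dup1])
averaged = average_all([dup0, dup1])
for key in first:
    print(key, first[key], round(averaged[key], 2))
```

The two results agree wherever only one duplicate reports and differ wherever both do, which is exactly where the choice of method starts to matter.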

The 4574 stations which were represented by a single series can be ignored for the moment – there is little that can be done to evaluate – so, for simplicity, I decided to only look at the “twins”, i.e. those 1109 stations which have exactly two records. These were identified and the range for the simple difference between the two series was calculated. No heavy duty stats were necessary to take a look at the amount of agreement there was between them.
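The calculation is simple enough to sketch. The Python below (not the R script itself) takes "range" to mean the smallest and largest values of the month-by-month difference over the months both series report, which is an assumption about the exact definition. The function name is invented, and the data are the two 1970 Kuska lines quoted in the comments below, in tenths of a degree C.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, int]  # (year, month)


def difference_range(a: Dict[Key, int], b: Dict[Key, int]) -> Optional[Tuple[int, int]]:
    """Return (min, max) of b - a over the common months, or None if no overlap."""
    common = sorted(set(a) & set(b))
    if not common:
        return None
    diffs = [b[k] - a[k] for k in common]
    return min(diffs), max(diffs)


# 1970 monthly means for the two Kuska duplicates, in tenths of a degree C.
dup0_1970 = [36, 95, 96, 174, 225, 247, 262, 271, 191, 149, 120, 46]
dup1_1970 = [25, 87, 90, 167, 230, 256, 271, 280, 193, 144, 105, 34]

a = {(1970, m): v for m, v in enumerate(dup0_1970, start=1)}
b = {(1970, m): v for m, v in enumerate(dup1_1970, start=1)}

lo, hi = difference_range(a, b)
print(f"difference runs from {lo / 10:+.1f} to {hi / 10:+.1f} deg C "
      f"(spread {(hi - lo) / 10:.1f})")
```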

I expected most of the stations to look like this:

However, there were others that looked like this one:

How many? Well that was the surprise! I graphed those which were not identical over their overlap periods and put the graphs into pdfs.

No overlap: 232 stations (no plots)
Zero difference: 152 stations (no plots)
Range between 0 and 1- : 233 stations (4.9 MB pdf)
Range between 1 and 3- : 321 stations (6.9 MB pdf)
Range between 3 and 12.9 : 171 stations (4.1 MB pdf).
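The grouping above can be written as a small classifier. This Python sketch assumes "1-" and "3-" mean "up to but not including" 1 and 3, which is a guess at the notation; the function name and category labels are invented for this example. It also confirms that the five groups account for all 1109 twins.

```python
from typing import Optional


def classify(diff_range: Optional[float]) -> str:
    """Assign a twin to a group from the size of its difference range (deg C)."""
    if diff_range is None:
        return "no overlap"
    if diff_range == 0:
        return "zero difference"
    if diff_range < 1:
        return "0 to 1"
    if diff_range < 3:
        return "1 to 3"
    return "3 and above"


# Station counts per group, copied from the list above.
counts = {
    "no overlap": 232,
    "zero difference": 152,
    "0 to 1": 233,
    "1 to 3": 321,
    "3 and above": 171,
}
total = sum(counts.values())
print(total, classify(2.4))
```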

The latter two files are the more interesting ones.  “Duplicate” has taken on a whole new meaning for me.

If there are any errors in my results, R scripts or explanations of the phenomena in the plots, I would like to hear about them. 

I have uploaded the R script as an ordinary text file called twin analysis.doc.



42 responses to “GHCN Twins”

  1. Mark F

    Kuska looks to have the two series offset in time, perhaps by as much as a season or two. ??

    • RomanM

      I don’t think so.

      That possibility occurred to me as well and I checked over a portion of the data with up to a two year interval with no matches. I believe that something else must be going on, but I can’t say what at this stage.

  2. I’ve checked your results and can verify that at least for Kuska and its partner station Serahs, both have one series which is different. I can’t see any offset in the time axis, but I also suspected the same thing when I saw the seasonal component. My guess is they are different instruments located reasonably near each other.

    It’s just another example of why the temperature data is like a box of old socks.

  3. RomanM

    There are one heck of a lot of stations like that in the data set.

    Do you think that these different instruments would be located at the same site? If they are classified as “close stations”, they have different three-digit modifiers. In my understanding, these are supposed to be “duplicates” of the same station, yet there is no obvious pattern to the differences.

    It also appeared to me that some countries (China?) had more than their share of such not-obviously-the same pairs.

  4. I think the difference might be in how the mean is calculated. If the data were hourly, perhaps the day could be averaged for mean whereas max/min could also be used. That might explain the seasonal component.

    For Kuska, the two sets have extreme values which visually track very very closely. I don’t believe that two stations within a couple of miles would be that perfect. These things aren’t cared for that well.

    • RomanM

      Jeff, I can’t believe that there could be that much variation in the difference of the two series, if the means were calculated in two different ways. Strange.

      I checked to see what Gistemp did to “combine” these into a single series. The average of the two differs from the combined result by exactly either 0.10 or 0.15 (again with no discernible pattern). If what you suggest is correct, it would not make much sense to me to just basically average them.

  5. Pingback: Fraternal or Paternal « the Air Vent

  6. vdp

    Perhaps Jeff is right that it is max/min versus a more proper mean. That metadata must exist. I am also curious to see the magnitude of disagreement after annual smoothing and on the same vertical scale as the temperature series. Is it significant for the Kuska trend?

    • RomanM

      Re: vdp ,

      Right now, I am more interested in understanding exactly why these drastic differences exist.

      If Jeff is right, then as mentioned above, I fail to understand why these two series should be averaged as Gistemp does. If you only do this with some of the stations (e.g. you can’t with stations with only one series), this introduces unnecessary biases and confuses error bounds.

      In my book, getting the data properly collected is just the first step. Once that is done, one can move on to the rest of the process.

  7. I don’t have a clue whether my guess is right. After all this is climate science, where three guess explanations for increasing Antarctic ice pass the peer bar.

    Once something is proposed, isn’t it accepted then? Why do I get the short straw?

    Oh yeah, I blog, mea culpa.

    Welcome to the club Roman.. hehe.

  8. I had started quantifying duplicates too for recent data for other reasons. There is one issue that is driving me nuts and I don’t know if it has any effect on the outcomes – since I first noticed it in January I find it again and again – overlapping station records with the same name and WMO number but with a different modification flag. These were previously represented in the GISS analysis by a single data set but now have been split into two.

    A few examples (at random from SE Asia): Jaipur/Sang (WMO code 42348 – mod flags 0, 1); Chhor (WMO code 41768 – mod flags 0, 3); Luang-Prabang (WMO code 48930 – mod flags 0, 4); Zhanyi (WMO code 56786- mod flags 0, 10);

    I’ve had to leave off this while work is busy at present. Any thoughts anyone?

    • intrepid_wanders

      Check the “mod” flag against the site quality with the US sites (visual)/and Surface Stations (WUWT). I always thought the “mod” was site quality, but I do not believe there is a “10” like in Zhanyi’s case. “9” should be a missing value and “1” should be good data.

    • Re: Verity Jones (Jul 19 19:25),

      If this is still unresolved:

      What step in the GISS analysis are you referring to?

      In the case of Luang-Prabang v2.mean starts out with three records (0:1951-91 with gaps, 1:1961-80 and 4:1987 on). Step 1 combines 0 and 1, but cannot combine these with 4 because of an overlap failure, and step 2 retains both, but would presumably have dropped 4 at an earlier date once the record length dropped below 20 years, in 1986 or earlier, leaving a single data set. (Without checking I cannot now recall if step 1 has a similar “20 year rule”, but step 2 does).

      In the case of Chhor, v2.mean starts out with four records (0:1932-74 with gaps, 1:1971-91, 2:1972-90 and 3:1987 on). Step 1 combines 1, 2 and 3 (as 3) but cannot combine these with 0 because of an overlap failure, and step 2 retains both, and it is not immediately obvious why there would be a single data set at any recent date.

      In the case of Jaipur / Sang v2.mean starts out with two records (0:1881-1981 with a gap from 1961-1980 and 1:1961-80). Step 1 fails to combine these because of the overlap failure, and step 2 retains both. Again it is not immediately obvious why there would be a single data set at an earlier date, although if you have the files for that earlier date it might be worth checking whether v2.mean contains one record or two. Is there a possibility that the 1961-80 record was removed from an earlier combined record on the basis of station metadata coming to light which suggested that this would be appropriate?

      Finally, the two Zhanyi records are not two records for a single station, but two separate stations at different altitudes, 1900 m and 866 m. Do you have an earlier file with only one of these two?

      I’m curious how far back you have GISS output files to find these single records. The earliest I have is 2008_04, and from memory I think the archived output on the GISS server only went back to 2008_01 before they ran out of disk space and dropped the older archived files. I did suggest at that time that an external hard drive would probably be adequate to retain these files and meet any download demands if added to the server, but I suppose the NASA budget could not stretch that far. Perhaps I should have suggested Walmart.

  9. There is something odd about station 38974. Here is the inventory entry:
    22938974001 SERAHS 36.53 61.22 279 286R -9FLDEno-9x-9WARM GRASS/SHRUBC
    22938974002 KUSKA 36.53 61.22 625 286R -9FLDEno-9x-9WARM GRASS/SHRUBC
    Two stations, same WMO no, same lat/lon, but 346m difference in altitude. If you look on GE, the region is flat and about 238 m altitude.

    It’s in Turkmenistan.

    Here are the duplicates from 1970 to 1973. It seems that dup 2 just has consistently more seasonal variation. Temps in deg C x 10:

    2293897400201970 36 95 96 174 225 247 262 271 191 149 120 46
    2293897400211970 25 87 90 167 230 256 271 280 193 144 105 34

    2293897400201971 11 77 133 164 227 263 272 248 195 151 133 92
    2293897400211971 0 66 123 154 235 271 285 255 199 145 124 84

    2293897400201972 -31 -73 63 164 192 245 245 217 199 152 115 5
    2293897400211972 -38 -84 53 160 189 250 256 224 199 143 102 -3

    2293897400201973 2 95 98 175 203 265 272 257 185 153 109 46
    2293897400211973 -10 84 87 167 208 276 282 269 189 147 93 36

    • RomanM

      Re: Nick Stokes ,


      From the CRU file crustnused.txt, we see different station id numbers but identical location and different altitudes:

      389740 365 -612 279 SERAHS TURKMENISTAN
      389870 365 -612 625 KUSKA TURKMENISTAN

      I looked up the two names (Kushka appears to be the name for Kuska) on Google maps and it showed locations about 100 km apart. It is possible that in this case, it could be two different stations, but with errors in the descriptive data.

    • JR

      There is no such thing as station “38974” in GHCN. They are different stations “22938974001” and “22938974002”. They share the same WMO number because that is the closest WMO station to the two stations, but their modifiers are “001” and “002” so they are not the actual WMO station with the number 38974. If it was the actual WMO station, the modifier would be “000”.

      Using observations from ISH, I was able to compute monthly means that get close to GHCN. To me, it looks like “22938974002”, duplicate “0” is actually the ISH station labeled 389870, and it looks like “22938974001”, duplicate “2” is actually the ISH station labeled 389740. I’m working on the others to see if I can identify them.

  10. D. Robinson

    Does it make any sense to see if the differences between temp sets have a net impact on the temperature trend?

    • RomanM

      Re: D. Robinson ,

      Not sure what you mean by temp sets. The “twins” being considered are supposedly two records from the same station. It is my understanding that these are usually ordered by length from longest to shortest (0 longest, 1 next longest, etc.).

      What I would like to do is get some idea why there is so much difference between some of the pairs.

      • Geoff Sherrington

        Regarding Australia at least, “twins” can arise from revision of old metadata sheets (a continuing process). In this case the results are different for a reason and it is not valid to average them. One is deemed better than the other. I have no idea how a large scale program could cover all eventualities and so I merely repeat the calls of others for a clean slate, agreed rules & revision of the whole global data set.

        Preferably the revision would leave out missing data instead of trying to infill it.

        Because the instrumented period is used to calibrate so many proxies, it is urgent that an agreed and stable temperature record be established. If it is not, then proxy work loses incentive, value or even validity.

  11. toto

    From the GHCN description paper (the one they point to on their website):

    “Unfortunately, because monthly mean temperature has been computed at least 101 different ways (Griffiths 1997), digital comparisons could not be used to identify the remaining duplicates. Indeed, the differences between two different methods of calculating mean temperature at a particular station can be greater than the temperature difference from two neighboring stations. Therefore, an intense scrutiny of associated metadata was conducted. Probable duplicates were assigned the same station number but, unlike the previous cases, not merged because the actual data were not exactly identical (although they were quite similar).”

    So “duplicate” seems to mean “separate series, computed in possibly different ways, for the same station”. I think your results are quite compatible with this description. In particular the “Kuska” series you show in your post may well result from silly things like different start/end days for each “month”, normalisation, whatever. (Most examples in your PDFs look more like isolated, erroneous data points in one of the series).


    • RomanM

      Re: toto ,

      I am not sure that 101 ways is not some sort of hyperbole.

      I agree that the different methods may very well produce different results, however, I have a hard time imagining that this should produce differences that look almost random.

      If I calculate the mean using hourly data and a second “mean” by averaging the max and min, these would likely not be the same, but I would expect that one of these would likely be quite consistently higher or consistently lower than the other depending on latitude and local climate.
      Why would two records exist using separate methods during an overlap period? Furthermore, would not the use of the former method be more prevalent in more modern times? It seems a bit of a stretch as an explanation.

      I found the following statement curious:

      Probable duplicates were assigned the same station number but, unlike the previous cases, not merged because the actual data were not exactly identical (although they were quite similar).

      This implies that they have done previous merging in forming some of the “raw” series.

      If the remainder of the series could not be identified as to provenance, I would seriously question their use in a reconstruction. Whether they belong to another station or are created by using a different calculational method, it is a case of “apples and oranges”.

      • toto

        Why would two records exist using separate methods during an overlap period?

        Because the data come from many, many different sources, which may have different procedures for calculating monthly averages.

        If I calculate the mean using hourly data and a second “mean” by averaging the max and min, these would likely not be the same, but I would expect that one of these would likely be quite consistently higher or consistently lower than the other depending on latitude and local climate.

        As I said, simply using a different starting day for each month would produce a pattern similar to the one you show in this post (seasonal variation in the difference due to lag, while preserving the overall dynamics). That’s just one possibility, of course. I’m not saying that’s what happened here.
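The month-boundary idea can be tried on synthetic data. The Python sketch below is entirely hypothetical: an idealised 360-day year of twelve 30-day months, a pure cosine annual cycle with an arbitrary amplitude, and an arbitrary 5-day offset in where each "month" starts. It says nothing about what actually happened at any station.

```python
import math

DAYS_PER_YEAR = 360   # idealised year, twelve 30-day months (an assumption)
DAYS_PER_MONTH = 30
SHIFT = 5             # hypothetical offset, in days, of the second convention


def daily_temp(day: int) -> float:
    """Synthetic annual cycle in deg C, coldest around day 15 (mid-January)."""
    return 15.0 - 12.0 * math.cos(2 * math.pi * (day - 15) / DAYS_PER_YEAR)


def monthly_means(shift: int) -> list:
    """Twelve monthly means, with every month starting `shift` days late."""
    means = []
    for m in range(12):
        start = m * DAYS_PER_MONTH + shift
        days = [(start + d) % DAYS_PER_YEAR for d in range(DAYS_PER_MONTH)]
        means.append(sum(daily_temp(d) for d in days) / DAYS_PER_MONTH)
    return means


calendar = monthly_means(0)
shifted = monthly_means(SHIFT)
diffs = [s - c for s, c in zip(shifted, calendar)]

for month, d in enumerate(diffs, start=1):
    print(f"month {month:2d}: shifted minus calendar = {d:+.2f}")
```

In this toy setup the difference is positive while temperatures are rising and negative while they are falling, and it averages out to zero over the year.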

        This implies that they have done previous merging in forming some of the “raw” series.

        The “previous cases” were actually the Max and Min series, which they say are unique.

        As I said, the paper has a lot of useful information in it. I guess it would be worth checking it out, if you ever decide to seriously work on that stuff.

        Good luck!

  12. BobN

    Roman – The daily mean for a station can be quite different if one uses hourly data versus min/max. A while back I looked at about 5 months of hourly data for a USGS temperature gage location (Lambertville NJ). The difference in daily mean calculated using the 24 hourly readings versus the average of the daily min/max was up to several degrees C. The difference in monthly means ranged from 0.4 to 1.1 C. It was also interesting to note that, in my very limited evaluation, the monthly mean calculated with the average of min/max was always warmer than that calculated with hourly data. (Don’t know that this would affect any trend though.)

    • RomanM

      Re: BobN ,

      It was also interesting to note that, in my very limited evaluation, the monthly mean calculated with the average of min/max was always warmer than that calculated with hourly data.

      This is one of my points in my reply to toto above.

  13. RomanM,

    GHCN is built up of 30-odd different historical datasets all meshed together. It’s not inconceivable that two historical datasets might have included the same station with differing methods of mean calculations and different labels/ids.

    As an aside, it might be an interesting exercise to run temp series using different methods of dealing with dups (e.g. drop all dups, average all dups, etc.) to see the potential magnitude of the bias introduced by differing methods of reconciling duplicate records.

    • RomanM

      Re: Zeke ,

      That raises the question of where is the metadata for indicating where each series came from and what methods were used in the creation of the data. Indeed, if this is the case, then the data can hardly be considered “raw”.

      My suggestion for the first run in the exercise might be to look at the 4574 stations with only a single series.

      • Steven Mosher

        I can do that since I have stats on how many duplicates per station.

        Should take me a couple minutes.

  14. Steve McIntyre

    A few years ago, I inquired about metadata for provenance of different GHCN versions (some of which comes from Smithsonian WWR, for example) plus other sources listed in the original GHCN article by Peterson (as I recall). There didn’t seem to be any documentation available at the time. They said that they’d look into it, but I never heard back.

    As others have observed, sometimes the versions seem to be scribal variations of the same data set. But in other cases, different series seem to have been combined.

  15. Nice, now that’s a proper thread.

  16. RomanM

    toto ,

    I have seen the Peterson-Vose paper, but until looking at the data itself, I was unaware just how varied these duplicate series were.

    The paper does not however offer much help in evaluating the provenance of a particular series and I suspect that the metadata which Steve requested likely does not exist in a usable (electronic) form.

    Under the circumstances, I doubt that putting all of these series through a sausage grinder to calculate a global record is a good way to go. IMHO, a better route would be to create a new database with a clear single aim of optimally monitoring the temperatures on both a regional and global basis (possibly in conjunction with satellite measurements). This would need to be implemented by a professional agency with the ability and resources to access the necessary information worldwide.

    Meanwhile, it might be interesting to continue using what does exist using different approaches.

    • Ruhroh

      Dear Dr. R;

      A recent recipe tip for sausage grinders was recently offered up at .

      Troyca did get an email reply from Menne regarding his ‘Pairwise Homogeneity Adjustment’ techniques for creating USHCNv2;

      ” 1) The PHA is run separately on Tmax & Tmin series to produce the USHCNv2 dateset. We then compute Tavg as (Tmax+Tmin)/2. ”

      I have tried to bring this to the attention of various purportedly-statistically-sanguine bloggers and posters, but no one seems to agree with my perception that this is egregiously improper. ( I make no claim to sanguinity of the type I just defined.)

      From my simplistic viewpoint, the PHA algo is intended to ‘adjust’ discontinuities due to undocumented station moves and other such impairments. Given that, it is hard for me to understand separate ‘adjustment’ of Tmin and Tmax without linking them. If it is necessary to adjust Tmax at a certain time, due to an apparent discontinuity (caused by e.g. a station move), I think that same point in time would need to also adjust Tmin. It seems totally bogus to separately ‘adjust’ the Tmin and Tmax series and then combine them by a pointwise average.

      So, is this OK in your view?
      (perhaps it is a ‘victimless-crime’ as no one gives any weight to the increasingly-bizarre NASA machinations), but it is still our tax dollars at work…

      Thanks for your hosting,

      • RomanM

        There is nothing inherently incorrect in looking independently at how the max and min at a particular location relate to those of neighboring stations. Obviously, this can give one more information about the way a possible move may have affected the patterns of a particular weather station. In cases for which the moves are documented, one would certainly do this. IMHO, any adjustments would then be made taking both of these sequences into account.

        I have always had strong doubts about the general use of a single station (or even two or three) for making ad hoc adjustments when the supposed change in the station behavior is undocumented. The dependency on the validity of the temperature series is extremely important – errors (or the presence of unrecorded factors) at the comparison stations can make the situation worse. A better method could be to construct a “regional temperature series” with and/or without the station being corrected and examine the difference between that and the station in question. This is similar to the concept of looking at the residuals in a regression.

        It could be done on the max or the min or the “average”, (max + min)/2 (which a statistician would term the mid-range). However, if one were to examine the min and max separately, I would hope that the reasons for making any adjustments whatsoever would then be reconsidered simultaneously for both series. Not to do so is at best naive data analysis.

        By the way, if one accepts that the mid-range is indeed a valid statistic for summarizing the “temperature” at a location, there is no mathematical reason why one could not meaningfully calculate that same statistic when either the max or the min has been separately adjusted. The devil (as always) is in the specific details.

  17. Pingback: Kuska and Serahs « The Whiteboard

  18. I was able to identify the source of


    as Tmean/Tmax/Tmean data originating from a Russian exchange program and archived as NDP040.

    In addition, NDP048 includes data for the same set of stations as 4 and 6 hourly synoptic data.

  19. RomanM


    I will take a look at what you’ve done and leave a comment at your site.

  20. Steven Mosher

    So I guess I wasn’t so stupid looking at how duplicates are handled. hehe.

    The other thing I looked at was the “missing” duplicates. The duplicates come in a series
    0,1,2,3,4.. up to 9
    In some cases the series is complete. In other cases NOAA has removed the duplicate because they determined that it was a duplicate. Mc mentioned this a while ago. I have some nasty R that calculates all this stuff if you want it, Roman.

  21. MacViolinist

    I’m interested in this project. I’m a former database engineer now going back to school to study statistics, and I happen to own a professional research company. (It’s consumer stuff, and likely not interesting in itself to you. That’s why I’m going back to school.)

    If anyone wants to point me in the direction I need to go with the data/specs, I would be interested in creating the database RomanM mentioned above.

    I don’t think the technology would be difficult. More just knowing where to find good data. If you’re at all interested, this would be a very cool project to work on while I’m an old man in a young school. 🙂

  22. MacViolinist

    Just adding a comment so that I get notified if anyone replies. Should have done that in the first place.

  23. Ruhroh

    Thanks for your response to my inquiry regarding the propriety of separately ‘adjusting’ Tmax and Tmin series in a completely unlinked fashion, and then combining them on a point-by-point basis.

    As I understood the PHA algo, it is intended to automatically detect and correct for discontinuities introduced by (undocumented) station moves, etc.
    But I don’t remember any justification for applying it to overnight lows independently from daily high temps. In USHCNv2, this and many other devilish details are buried so deep in dreck that they cannot be exorcised, one by one.
    Your comments about choosing the midrange are not lost on me.

    Anyway, your timely response will allow me to move on to a new idea. As I often read to my kids,
    “Sausage My Nose!” …

    Carpe Dinero
    (there was no ‘reply’ link to allow this response to hook onto yours.)

  24. Pingback: More on running the USHCNv2 PHA (Response from Dr. Menne) « Troy's Scratchpad
