This is a post about the quality of temperature data. You can spend a lot of time finding methods to maximize the information you squeeze out of data, but unless the data itself is reliable, all of that effort is wasted. Recently, I ran across an example that I found somewhat disconcerting.
I had been testing some methods for estimating the temperature at particular locations in a geographic grid cell from the temperature data set released by the Met Office. The grid cell was chosen on the basis that there was a reasonable collection of stations available for use in the procedure: 40 – 45 N by 100 – 105 W, in the north central region of the United States. I chose a station with a longer, fairly complete record, and my intent was to use distance-based weighting of the neighboring stations to estimate the temperature at that station's site. Then I could compare the actual measured temperature to the estimated temperature to evaluate how well I had done. But my results seemed poorer than I had expected, so at that point I thought I should look more closely at the station record.
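For readers who want to see what such a comparison looks like in practice, here is a minimal sketch of one common form of distance-based weighting, inverse-distance weighting over great-circle distances. The inverse-square exponent, the haversine distance, and the station coordinates and anomaly values below are my assumptions for illustration, not the exact scheme or data used here.

```python
import numpy as np

def idw_estimate(target_lat, target_lon,
                 neighbor_lats, neighbor_lons, neighbor_temps,
                 power=2.0):
    """Estimate the temperature at a target site as an inverse-distance-
    weighted mean of neighboring station temperatures."""
    # Convert degrees to radians for the great-circle (haversine) distance
    lat1, lon1 = np.radians(target_lat), np.radians(target_lon)
    lat2 = np.radians(np.asarray(neighbor_lats))
    lon2 = np.radians(np.asarray(neighbor_lons))

    # Haversine distance in kilometers (Earth radius ~6371 km)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    dist_km = 2 * 6371.0 * np.arcsin(np.sqrt(a))

    # Inverse-distance weights; the exponent (2 here) is a free choice
    weights = 1.0 / np.maximum(dist_km, 1e-6) ** power
    return np.sum(weights * np.asarray(neighbor_temps)) / np.sum(weights)

# Illustrative example: estimate a monthly anomaly near Rapid City from
# three hypothetical neighboring stations (values are made up)
est = idw_estimate(44.08, -103.23,
                   neighbor_lats=[44.4, 43.6, 44.9],
                   neighbor_lons=[-103.5, -102.5, -104.0],
                   neighbor_temps=[1.2, 0.8, 1.5])
print(f"Estimated anomaly: {est:.2f} C")
```

The estimate from the neighbors can then be differenced against the station's own measured values month by month to see how well the weighting scheme performs.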
The station I had chosen was Rapid City, South Dakota – ID number 726620, with a current population close to 60,000 people according to Wikipedia. For comparison purposes, I collected the same station’s records from a variety of other sources: GISS “raw” (same as “combined”) and homogenized, directly from the Gistemp web pages; GHCN “raw” and adjusted, from the v2 data set; and what was listed as the same two GHCN records from the Climate Explorer web site. The subsequent analysis proved quite interesting.