Met Office and the CRU Data

Some days ago, the UK Met Office released a subset of the HadCRUT3 data set at this web page. I thought it might be useful for people to be able to download the data and read it into the R statistical package for further analysis. Along the way, I discovered, like Harry, that not everything is quite as advertised, so hopefully my observations on the differences will also prove useful.

The released data is available as a Windows zipfile. In the set, each station is a single text file (with no extension). The files are contained in 85 directories (at this point in time) whose two-digit names are the first two digits of the station numbers of the files in each directory. This makes it easy to locate a station, but adds a bit of complication when trying to read the entire set at once. However, it turns out that R can actually do it pretty easily 🙂

One word of warning: when I downloaded the zipfile yesterday, I found that new stations had been added to the set without any announcement on the web page that the set had been altered. In the future, it is a good idea to keep an eye on the size of the file to determine whether it has been altered. Note to Met Office: it would be nice to have a date attached to the file indicating the most recent version.
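Watching the file size is a good start. A somewhat stricter check, sketched here in Python rather than in the R script, also records a SHA-256 hash, so that an update which happens to leave the size unchanged is still caught. The functions `fingerprint` and `has_changed` and the JSON record file are hypothetical names for illustration; nothing here is provided by the Met Office.

```python
import hashlib
import json
import os


def fingerprint(path, chunk_size=1 << 20):
    """Return (size in bytes, SHA-256 hex digest) of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so a large zipfile is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def has_changed(zip_path, record_path):
    """Compare the zipfile with the fingerprint saved in `record_path`.

    Returns True if there is no saved record or the size/hash differ.
    The record is overwritten with the current fingerprint either way.
    """
    size, sha = fingerprint(zip_path)
    current = {"size": size, "sha256": sha}
    previous = None
    if os.path.exists(record_path):
        with open(record_path) as fh:
            previous = json.load(fh)
    with open(record_path, "w") as fh:
        json.dump(current, fh)
    return previous != current
```

Calling `has_changed("station_data.zip", "station_data.json")` after each download (both file names hypothetical) returns True the first time and afterwards only when the zipfile's contents differ from the previous run.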

I tried to download the file and unzip it in R, but ran into problems, so I recommend downloading it manually and unzipping it into a new directory that contains no other files. Otherwise, the R script that I will provide will not work properly.
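Because the R script expects a directory holding nothing but the unzipped station folders, it can help to be explicit about which files count as station files. The Python sketch below is one way to do that. It assumes (my reading of the layout described above, not a documented rule) that station files are named by purely numeric station numbers and sit in two-digit folders matching their first two digits; anything else, such as a stray readme, is skipped. `list_station_files` is a hypothetical name.

```python
import os
import re


def list_station_files(root):
    """Return {station_id: path} for every station file under `root`.

    ASSUMED layout: two-digit subdirectories, each holding extension-less
    files named by station number whose first two digits match the
    directory name. Everything else is ignored.
    """
    stations = {}
    for sub in sorted(os.listdir(root)):
        sub_path = os.path.join(root, sub)
        # Only descend into two-digit directories.
        if not (os.path.isdir(sub_path) and re.fullmatch(r"\d{2}", sub)):
            continue
        for name in sorted(os.listdir(sub_path)):
            path = os.path.join(sub_path, name)
            # Keep numeric, extension-less files whose prefix matches the folder.
            if os.path.isfile(path) and re.fullmatch(r"\d+", name) and name.startswith(sub):
                stations[name] = path
    return stations
```

Filtering this way means an extra file left in the download directory is ignored rather than read as a station.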

Some other things to be aware of:

  • The format of each station file is described in great detail on the Met Office web page linked above. Each file is supposed to start with 21 lines of information about the data, followed by the temperatures (not anomalies) in an array whose rows represent years and whose columns represent months. Well… not exactly. In fact, at the moment, the number of information lines varies from 13 to 23, so reading the information requires a slightly different approach. Again, the features and functions of R proved useful.
  • The missing value designation is given as -99. Yes, for the temperatures. However, it appears that the missing value for station height (altitude) is -999, which is not indicated on the web page.
  • The longitude values for the stations are reversed from the convention I have seen in most other situations: negative values indicate East of Greenwich and positive values West. This was evident from a simple plot of the station locations. Not exactly wrong, but possibly misleading in an analysis.
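All three points can be handled in a single pass over a file. The Python sketch below (not the R script) locates the first data row instead of assuming 21 information lines, treats -99 temperatures as missing, and provides small helpers for the -999 height code and the reversed longitude sign. It assumes, without having checked every file, that each data row is a year followed by exactly 12 monthly values and that information lines look like `Key= value`; the function names are hypothetical.

```python
import math
import re

# A data row: a four-digit year followed by exactly 12 numeric values (ASSUMED layout).
DATA_LINE = re.compile(r"^\s*\d{4}(?:\s+-?\d+(?:\.\d+)?){12}\s*$")
TEMP_MISSING = -99.0     # missing temperature code
HEIGHT_MISSING = -999.0  # missing station height code (not documented on the web page)


def count_info_lines(lines):
    """Number of lines before the first data row (the post reports 13 to 23)."""
    for i, line in enumerate(lines):
        if DATA_LINE.match(line):
            return i
    return len(lines)


def parse_station(text):
    """Split a station file into (number of info lines, info dict, data dict).

    The data dict maps year -> list of 12 monthly temperatures, with the
    -99 missing code replaced by NaN.
    """
    lines = text.splitlines()
    n_info = count_info_lines(lines)
    info = {}
    for line in lines[:n_info]:
        if "=" in line:  # ASSUMED "Key= value" layout for information lines
            key, value = line.split("=", 1)
            info[key.strip()] = value.strip()
    data = {}
    for line in lines[n_info:]:
        if not DATA_LINE.match(line):
            continue  # skip blank or malformed trailing lines
        fields = line.split()
        temps = [float(v) for v in fields[1:]]
        data[int(fields[0])] = [math.nan if t == TEMP_MISSING else t for t in temps]
    return n_info, info, data


def longitude_east_positive(value):
    """Convert the file's West-positive longitude to the usual East-positive sign."""
    return -value


def height_or_nan(value):
    """Replace the -999 missing-height code with NaN."""
    return math.nan if value == HEIGHT_MISSING else value
```

Detecting the first data row, rather than counting information lines, is what makes the varying header length harmless.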

There may be other things I have missed; however, what I have done seems to work reasonably well. My script contains the following functions:

  • find.obs:  given a set of file names, read the descriptive information to determine what is station info and what is data.  Output is the number of “info” lines in each file.
  • given a set of file names and the output from the previous function, read the descriptive information. Output includes a list of all station information plus the monthly temperature “normals” and monthly “standard deviations”. These come from various sources (indicated in the station information) and may not match the averages calculated for the reference time period.
  • met.dat: given the same information as for, read the data exactly as it is formatted in the file
  • monthly.calc:  calculate the “normals” for a given reference period
  • anom.calc: calculate anomalies either using a pre-specified vector of normals or from scratch for a given time period.  The output is a list of time series.
  • annual.calc: calculate annual station means for a specified set of stations (set up for temperatures, not anomalies). Output is a matrix of time series.
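For readers who want the same calculations outside R, here is a minimal Python sketch of what `monthly.calc`, `anom.calc` and `annual.calc` do as described above. It is not a translation of the actual script: the year-to-12-values layout, the NaN convention for missing months, and the rule that an annual mean needs `min_months` valid months (all 12 by default) are my assumptions.

```python
import math


def monthly_normals(data, start, end):
    """Mean of each calendar month over years start..end, ignoring NaN.

    `data` maps year -> list of 12 monthly values (NaN for missing).
    A month with no valid values in the period gets NaN.
    """
    normals = []
    for m in range(12):
        vals = [data[y][m] for y in range(start, end + 1)
                if y in data and not math.isnan(data[y][m])]
        normals.append(sum(vals) / len(vals) if vals else math.nan)
    return normals


def anomalies(data, normals=None, start=1961, end=1990):
    """Subtract monthly normals, computing them from start..end if not supplied."""
    if normals is None:
        normals = monthly_normals(data, start, end)
    # NaN temperatures stay NaN after subtraction.
    return {y: [v - n for v, n in zip(row, normals)] for y, row in data.items()}


def annual_means(data, min_months=12):
    """Annual mean per year; NaN when fewer than `min_months` months are valid."""
    out = {}
    for y, row in data.items():
        vals = [v for v in row if not math.isnan(v)]
        out[y] = sum(vals) / len(vals) if len(vals) >= min_months else math.nan
    return out
```

Passing a `normals` vector to `anomalies` mirrors the "pre-specified vector of normals" option; leaving it out recomputes them for the chosen period.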

I hope there are no bugs in the script, which can be found here. Enjoy, Kenneth!



10 responses to “Met Office and the CRU Data”

  1. hpx83

    I have been able to process this format and slam it into an SQL database. If you’re interested, I could probably generate some other format for you, if it simplifies your work. I’ll be found at my blog 🙂

    // Hpx

    • RomanM

      I went to your blog, but I couldn’t figure out an appropriate place to leave a comment.

      Thanks for the offer, but as far as my personal needs are concerned, the problem is under control. I can see where the ability to access and/or process or plot data either for individual stations or groups of stations could be useful to the general public.

      The Met Office site allows for extraction of individual stations, but that’s all.

  2. steven mosher


    Head over here

    He’s got a post up on Watts’ blog today struggling with the error bars due to coverage bias in Brohan 06. I’m swamped today.

  3. Patrick M.

    The link for “windows zipfile” seems to be broken?

  4. RomanM

    Thanks, Patrick. It is fixed now.

    • RomanM

      I actually saw it earlier today.

      In my earlier examination of the data set, I had noticed that there were a fair number of “normals” (i.e. monthly means purportedly calculated for the period 1961-1990) that did not match the values calculated from the downloaded data. However, the station information portion of each file contains a variable called “normal source” indicating where the given monthly values came from. The value “Data” indicates that they were calculated from the data; the other choices were “Extrapolated” and “WMO” (plus some NAs). Some of the “Data” normals were not within rounding error of the recalculated values, and these must be the ones JGC told the Met Office about. Since I hadn’t looked at the program they use for calculating the anomalies, I mentally filed it away for later.

      As for their statement that the coverage error is due to missing Arctic and Antarctic data, I am not sure that I buy that. If the data used is the same as that used in their gridded data, then I suspect that undercoverage in Africa and South America is a larger problem than undercoverage at the poles.
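The normals check described in this reply can be sketched in Python: for a station whose normal source is “Data”, compare the stated monthly normals with means recomputed from the data for 1961-1990 and flag months whose difference is larger than rounding could explain. It assumes the stated normals are rounded to one decimal place; `normals_mismatch` is a hypothetical name, not part of the R script.

```python
import math


def normals_mismatch(stated, computed, decimals=1):
    """Return the month indices (0-11) where stated and computed normals differ
    by more than rounding to `decimals` places can explain.

    Months with NaN on either side are skipped.
    """
    # Half a unit in the last reported place, plus slack for float error.
    tol = 0.5 * 10 ** (-decimals) + 1e-9
    bad = []
    for m, (s, c) in enumerate(zip(stated, computed)):
        if math.isnan(s) or math.isnan(c):
            continue
        if abs(s - c) > tol:
            bad.append(m)
    return bad
```

An empty result means the stated normals are consistent with the data to within rounding.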

  5. Karl Lehenbauer

    Hi Roman,

    Thanks so much for your readmet.r program. I have only been doing R for a couple of weekends, and reading this was very helpful, especially how you set up in advance what you’re going to fill. (I was trying to construct stuff more on the fly.)

    I reformatted your program and commented it pretty extensively — I find this helps me to understand what’s going on. The file is at if you’re interested.


    • RomanM

      Glad to hear that it was of use. I took a look at what you did and it will make it easier for people to customize the script to adapt it for their own purposes. I hope you didn’t feel like Harry of climategate fame while you were doing the comments. 😉

      R has great capability for creating structures as the results of operations, but I tend to think about script flow better if I set up well defined structures as I go along. This focuses my thinking and tends to reduce the number of programming errors that must be located when the script bails out on me.

      R has many different types of variables: vectors, matrices, lists, etc., and using their properties can really simplify matters. Of all the programming languages that I have used (since 1968!), R is the most versatile of the bunch and by far the best, although the initial learning stages were somewhat frustrating.

  6. Pingback: The Goreinch « TWAWKI
