I will be away from home (carrying out a study of the effect of Seasonal Local Warming on golf courses of the Dominican Republic) for a week so I will likely not reply to any comments during that time. It`s a tough job, but somebody has to do it…

## Will the Real Rapid City Please Stand UP!

This is a post about the quality of temperature data. You can spend a lot of time to find methods to maximize the information you squeeze out of data, but unless the data itself is reliable, all of the effort is wasted. Recently, I ran across an example which I found somewhat disconcerting.

I had been testing some methods for estimating the temperature at particular locations in a geographic grid cell from the temperature data set released by the Met Office. The grid cell was chosen on the basis that was a reasonable collection of stations available for use in the procedure: 40 – 45 N by 100 – 105 W in the north central region of the United States. I chose a station with a longer fairly complete record and my intent was to look at distance based weighting for estimating the temperature at that station site using the neighboring stations. Then I could compare the actual measured temperature to the estimated temperature to evaluate how well I had done. But my results seemed poorer than I had expected. At that point, I thought that perhaps I should look more closely at the station record.

The station I had chosen was Rapid City, South Dakota – ID number 726620 with a current population close to 60000 people according to Wikipedia . For comparison purposes, I collected the same station’s records from a variety of other sources: GISS “raw” (same as “combined”) and homogenized directly from the Gistemp web pages, GHCN “raw” and adjusted from the v2 data set and what was listed as the same two GHCN records from the Climate Explorer web site. The subsequent analysis proved quite interesting.

Filed under Uncategorized

## Anomaly Regression – Do It Right!

I have been meaning to do this post for several years, but until now I have not found a particularly relevant time to do it. In his recent post at the Air Vent, Jeff Id makes the following statement:

Think about that. We’re re-aligning the anomaly series with each other to remove the steps. If we use raw data (assuming up-sloping data), the steps

in this casewere positive with respect to trend, sometimes the steps can be negative. If we use anomaly alone (assuming up-sloping data), the steps from added and removed series aretoward aalwaysreduction in actual trend. It’s an odd concept, but the key is that they are NOT TRUE trend as the true trend, in this simple case, is of course 0.12C/Decade.

The actual situation is deeper than Jeff thinks. **The usual method used by climate scientists for doing monthly anomaly regression is wrong! **Before you say, “Whoa! How can a consensus be *wrong*?”, let me first give an example which I will follow up with the math to show you what the problem is.

Filed under Uncategorized

## Faster Version for Combining Series

I have written a faster version for combining temperature time series (allowing for weights) in R. I had hoped to post it along with an example using it, but I got sidetracked so the example is not completed yet. However, I am posting the new function so that Jeff can use it in his latest efforts. This version runs a LOT faster and more efficiently than the one posted earlier .

As well, I have found a “bug” in that version which causes the script to fail when any series is missing all of the values for some month. When the data is run in the newer version, you get results for all months, but I don’t think the results are necessarily realistic for the month in question. There is no way to infer what the values for that month might look like for that station without making further assumptions so, at the moment, the best bet is to remove the offending series from the analysis and run the program without it. I have included a short program to identify possible problem series.

Anyway, here is the updated version. I might get the example done tomorrow if the nice weather we are currently having goes away. ;)

####Function for combining series # For even faster calculation, use all=F # to speed up multiple grid calculations temp.combine = function(tsdat, wts=NULL, all=T) { ##### version2.0 ### subfunction to do pseudoinverse psx.inv = function(mat,tol = NULL) { if (NCOL(mat)==1) return( mat /sum(mat^2)) msvd = svd(mat) dind = msvd$d if (is.null(tol)) {tol = max(NROW(mat),NCOL(mat))*max(dind)*.Machine$double.eps} dind[dind<tol]=0 dind[dind>0] = 1/dind[dind>0] inv = msvd$v %*% diag(dind, length(dind)) %*% t(msvd$u) inv} ### subfunction to do offsets calcx.offset = function(tdat,wts) { ## new version nr = length(wts) delt.mat = !is.na(tdat) delt.vec = rowSums(delt.mat) row.miss= (delt.vec ==0) delt2 = delt.mat/(delt.vec+row.miss) co.mat = diag(colSums(delt.mat)) - (t(delt.mat)%*% delt2) co.vec = colSums(delt.mat*tdat,na.rm=T) - colSums(rowSums(delt.mat*tdat,na.rm=T)*delt2) co.mat[nr,] = wts co.vec[nr]=0 psx.inv(co.mat)%*%co.vec } ### main routine nr = nrow(tsdat) nc = ncol(tsdat) dims = dim(tsdat) if (is.null(wts)) wts = rep(1,nc) wts=wts/sum(wts) off.mat = matrix(NA,12,nc) dat.tsp = tsp(tsdat) for (i in 1:12) off.mat[i,] = calcx.offset(window(tsdat,start=c(dat.tsp[1],i), deltat=1), wts) colnames(off.mat) = colnames(tsdat) rownames(off.mat) = month.abb matoff = matrix(NA,nr,nc) for (i in 1:nc) matoff[,i] = rep(off.mat[,i],length=nr) temp = rowMeans(tsdat-matoff,na.rm=T) pred=NULL residual=NULL if (all==T) { pred = c(temp) + matoff residual = tsdat-pred } list(temps = ts(temp,start=c(dat.tsp[1],1),freq=12),pred =pred, residual = residual, offsets=off.mat) } #pick out those series with have at least nn + 1 observations in every month #Outputs a logical vector with TRUE indicating that that sereis is OK dat.check = function(tsdat, nn=0) { good = rep(NA,ncol(tsdat)) for (i in 1:ncol(tsdat)) good[i]= (min(rowSums(!is.na(matrix(tsdat[,i],nrow=12))))>nn) good }

Filed under Uncategorized

## Combining Stations (Plan C)

Sometimes a dog can be not only a “best friend”, but also a catalyst for solving statistical problems. A case in point: last night at 2:30 AM, our dog, Mr. Chips, woke me up to let him out to do his business. When I couldn’t go back to sleep immediately, I lay in the dark rethinking my current math problem, in this case, the “offset matching” on which I have already posted several times.

I had been concerned with the implementation of weights for combining series and several things did not sit quite right in my mind. So, as is good statistical procedure, I reformulated the problem not from the point of view of Tamino’s sum of squared differences of pairs of offset station values, but rather from the consideration of the estimation problem itself. It put everything into place for me.

Filed under Uncategorized

## Weighted Seasonal Offsets

I have updated the R script for offset matching when combining multiple temperature time series so that it can now be used with different station weights. This is useful when area weighting is desired in computing a mean grid cell series or when combining grid cells from different latitudes. This method can now be used at every stage of determining a global series, from combining all sources at a single station to averaging mean latitudinal series to calculate a global mean – a good example of the adage “when all you have is a hammer, everything looks like a nail”. :)

Filed under Uncategorized

## Comparing Single and Monthly Offsets

On the Open Mind blog, *carrot eater* posted the following comment with an inline response from Tamino:

Are the offsets mu specific by month? I’m curious at what point you take out the seasonal cycle.

[Response: There’s a single mu for a given station. Only after combining stations do I compute anomalies to remove the seasonal cycle. Computing a separate mu for each month is a viable alternative, but forces all station records to have the same average seasonal cycle as the reference station. If one wishes to study the seasonal cycle itself, it’s better to use a single offset for the station as a whole.]

So, is Tamino correct that the difference is only important with regard to the seasonal cycle or is the situation more complicated than that?

Filed under Uncategorized