(This is merely a test of a full post. The full post can be found at

http://www.climateaudit.org/?p=5151 )

So how *did* Steig et al. calculate the AWS reconstruction? Since we don’t have either the satellite data or the exact path they followed using the RegEM Matlab files, we can’t say for sure how the results were obtained. However, using a little mathemagic, we *can* take the sequences apart and calculate reconstructed values for the times when the reconstructions were replaced by the measured temperature values. I realize that there are problems with the AWS data sets, but for our purposes that is irrelevant. This analysis can easily be repeated for the updated results if and when Prof. Steig does the recalcs.

We are given 63 reconstructed sequences, each extending monthly from 1957 to 2006. The first step is to look only at the part of those sequences which is affected neither by the measured values nor by the satellite-induced manipulation – the period 1957 to 1979. We begin by running a principal component analysis (in good old R) on the truncated sequences. For this analysis, we will need three variables previously defined by Steve Mc.: Data, Info, and recon_aws. This data is available at *****. I also include some optional plots which are not run automatically. Simply remove the # sign to run them.

early.rec = window(recon_aws,start=1957,end = c(1979,12))

rec.pc = princomp(early.rec)

#plot(rec.pc$sdev)

#rec.pc

# Comp.1 Comp.2 Comp.3 Comp.4 …

#1.334132e+01 6.740089e+00 5.184448e+00 1.267015e-07 …

There are 63 eigenvalues in the PCA. The fourth largest is virtually zero. This makes it very clear that the reconstructed values are a simple linear combination of only three sequences, presumably calculated by the RegEM machine. The sequences are not unique (which does not matter for our purposes). We can use the first three principal components from princomp:

rec.score = rec.pc$scores[,1:3]

#matplot(time(early.rec),rec.score,type = "l")

#flip the PCs for positive slope

#does not materially affect the procedures

rec.score = rec.score%*%diag(sign(coef(lm(rec.score ~ time(early.rec)))[2,]))

reg.early = lm(early.rec[,1:63]~rec.score[,1]+rec.score[,2]+rec.score[,3])

reccoef=coef(reg.early)

pred.recon = predict(reg.early)

max(abs(predict(reg.early)-early.rec)) #[1] 5.170858e-07

I have flipped the orientation of the PCs so that each component has a positive trend in time. This also doesn’t make a difference to our result (I sound like the team now) because the coefficient which multiplies the PC in the reconstruction will automatically change sign to accommodate the flip. pred.recon is the reconstruction using only the three PCs, and the largest difference between Steig’s values and pred.recon is .0000005.
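The rank-three structure can also be checked directly on the reconstruction matrix itself, without going through princomp. A minimal cross-check (a sketch, assuming the early.rec object defined above):

#singular values of the centered early reconstruction;

#a rank-three matrix has only three non-negligible values

sv = svd(scale(early.rec, center = TRUE, scale = FALSE))$d

sum(sv > 1e-6*sv[1]) #should be 3 if only three PCs underlie the recon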

Now, what about the time period from 1980 to 2006? The method we have just used will not work because some of the reconstructed values have been replaced by measurements from the AWS. Suppose we assume that three “PCs” were used with the same combining coefficients as in the early period. The solution is then to find intervals of time in the latter time range where we have three (or more) sites which do not have any actual measurements during that period. Then, using simple regression, we can reconstruct the PC sequences. We can actually reduce the problem to just two intervals: 1980 to 1995 and 1996 to 2006:

#laterecon

#figure out first and last times for data for each aws

dat_aws = window(Data$aws[,colnames(early.rec)],end=c(2006,12))

not_na = !is.na(dat_aws)

trange = t(apply(not_na,2,function(x) range(time(dat_aws)[x])))

#construct 1980 to 1995

#identify late starters

sort(trange[,1])[61:63]

# 89828 21363 89332

#1996.000 1996.083 1996.167

names1 = c("89828", "21363", "89332")

reg1 = lm(rec.score ~ early.rec[,names1[1]]+ early.rec[,names1[2]]+ early.rec[,names1[3]])

rec.pc1 = cbind(1,window(recon_aws[,names1],start=1980,end =c(1995,12)))%*%coef(reg1)

#matplot(rec.pc1,type = "l")

#construct 1996 to 2006

#identify early finishers

sort(trange[,2])[1:3]

# 89284 89834 89836

#1992.167 1994.750 1995.917

names2 = c("89284","89834","89836")

reg2 = lm(rec.score ~ early.rec[,names2[1]]+ early.rec[,names2[2]]+ early.rec[,names2[3]])

rec.pc2 = cbind(1,window(recon_aws[,names2],start=1996))%*%coef(reg2)

#matplot(rec.pc2,type = "l")

Now we put the two together and check how good the fit is (ignoring the elements that have been replaced by measurements):

#check for fit 1980 to 2006

late.rec = window(recon_aws,start=1980)

estim.rec = cbind(1,rbind(rec.pc1,rec.pc2))%*%reccoef

diff.rec = estim.rec-late.rec

max(abs(diff.rec[!not_na]))#[1] 5.189498e-07

matplot(time(late.rec),diff.rec,type="l",main = "Diff Between Recons and Measured Temps",ylab = "Anom (C)",xlab = "Year")

Not bad! Again we have a good fit. The differences visible in the plot are the differences between the reconstruction we just did and the AWS data. Some of the larger ones are probably due to the already identified problems with the data sets.

Finally we put everything together into a single piece:

#merge results

recon.pcs = ts(rbind(rec.score,rec.pc1,rec.pc2), start = 1957,frequency=12)

colnames(recon.pcs) = c("PC1","PC2","PC3")

matplot(time(recon.pcs),recon.pcs,type = "l",main = "AWS Reconstruction PCS",xlab="Year", ylab ="PC Value")

legend(1980,-40,legend = c("PC1","PC2","PC3"),col = c(1,2,3), lty = 1)

#trends (per year) present in the PCs

reg.pctrend = lm(recon.pcs ~ time(recon.pcs))

coef(reg.pctrend)

# PC1 PC2 PC3

#(Intercept) -199.522419 -62.90221792 -222.1577310

#time(recon.pcs) 0.101605 0.03189134 0.1128271

The trend slopes (which are per year) should be taken with a grain of salt since the PCs are multiplied by different coefficients (often quite a bit less than one).
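Since each AWS reconstruction is a fixed linear combination of the three PCs, the trend of any single AWS reconstruction follows directly from the PC trends and the combining coefficients. A sketch using the objects above (recall that reccoef, from reg.early, is a matrix holding an intercept plus three PC coefficients for each of the 63 stations):

#per-station recon trend = sum over PCs of (PC slope) x (station's PC coefficient)

pc.slopes = coef(reg.pctrend)[2,]

aws.trends = pc.slopes %*% reccoef[2:4,]

#hist(aws.trends, main = "AWS Recon Trends", xlab = "Trend (C/year)")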

Where can we go from here? I think one interesting problem might be to evaluate the effect of surface stations on the PC sequences (only pre-1980). This might give some insight on what stations drive the results. As well, AWS positions appear to be related to the coefficients which determine the weight each PC has on that AWS reconstruction. Finally, measures of “validation” can now be calculated using the observed AWS data to see how well they did. All without knowin’ how they dunnit!
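For instance, one simple verification statistic (a sketch, reusing the objects above and assuming dat_aws aligns with late.rec as in the fit check) is the correlation between the PC-based estimates and the actual AWS measurements:

#correlation between estimates and measured values,

#computed only over the cells where measurements exist

cor(estim.rec[not_na], dat_aws[not_na])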

Roman, Thanks much for this post. I’ve been through the whole thing thoroughly now.

Can you tell me if the anomalies are available for the AWS stations? I have the raw temperature data from CA, but I am new to the surface data and not sure how to get them.

Put a bee into your bonnet.

Wonder why

and how who dunnit.

==================

Open the door and everybody walks in! 😉

My first visitors. Welcome.

Kim, you are always inscrutable

Jeff, the “anomalies” were calculated by Steig et al when they did the reconstruction. We can extract them using

The values from the graph can be used to anomalize after 2006. Watch the quotes if you copy the R stuff. I often use search and replace in the R script editor to replace them if there are a lot.

Roman, good work on this. I was pleased to see this posted on ClimateAudit. Steve is brilliant and prolific, but it is nice to see him have some additional help. CA is developing a nice group of statistical experts.

I don’t know how Steve does it ! I found it pretty tough to try to keep up with and reply to comments in the thread he was kind enough to let me post. He is juggling three, four or more threads at a time!

This stuff appeals to the teacher in me. I look forward to spreading the statistical gospel to the climate scientist community. 😉

Roman,

Thanks for that. I was interested because feeding the non-anomalized data into RegEM was concerning me; there are supposed to be corrections implemented for cyclic noise. I think they just fed the data in and ran the anomaly calc afterward.

I’m still fighting with setting up the Matlab RegEM matrices. I need to input something for the TTLS truncation, and so far no luck finding a good reference for that.

Great post by the way.