Over the last month or two, I have been looking at the response by Schmidt, Mann, and Rutherford to McShane and Wyner’s paper on the hockey stick. In the process, I took a closer look at the total least squares (error –in-variables or EIV) regression procedure which is an integral part of the methodology used by the hockey time in their paleo reconstructions. Some of what I found surprised me.
A brief explanation of the difference between ordinary least squares (OLS) and EIV is in order. Some further information can be found on the Wiki Error-in-Variables and Total least squares pages. We will first look at the case where there is a single predictor.
The OLS model for predicting a response Y from a predictor X through a linear relationship looks like:
α and β are the intercept and the slope of the relationship, e is the “random error” in the response variable due to sampling and/or other considerations and n is the sample size. The model is fitted by choosing the linear coefficients which minimize the sum of the squared errors (which is also consistent with maximum likelihood estimation for a Normally distributed response:
The problem is easily solved by using matrix algebra and estimation of uncertainties in the coefficients is relatively trivial.
EIV regression attempts to solve the problem when there may also be “errors”, f, in the predictors themselves:
The f-errors are usually assumed to be independent of the e-errors and the estimation of all the parameters is done by minimizing a somewhat different looking expression:
under the condition
X* and Y* (often called scores) are the unknown actual values of X and Y. The minimization problem can be recognized as calculating the minimum total of the perpendicular squared distances from the data points to a line which contains the estimated scores. Mathematically, the problem of calculating the estimated coefficients of the line can be solved by using a principal components calculation on the data. It should be noted that the data (predictors and responses should each be centered at zero beforehand.
The following graphs illustrate the difference in the two approaches:
What could be better, you ask. Well, all is not what it may seem at first glance.
First, you might have noticed that the orange lines connecting the data to the scores in the EIV plot are all parallel. The adept reader can see from considerations of similar triangles that the ratio of the estimated errors, e and f (the green lines plotted for one of the sample points), is a constant equal to minus one times the slope coefficient (or one over that coefficient dependent on which is the numerator term). The claim that somehow this regression properly takes into account the error uncertainty of the predictors seems spurious at best.
The second and considerably more important problematic feature is that, as the total-least squares page of Wiki linked above states: “total least squares does not have the property of units-invariance (it is not scale invariant).” Simply put, if you rescale a variable (or express it in different units), you will NOT get the same result as for the unscaled case. Thus, if we are doing a paleo reconstruction and we decide to calibrate to temperature anomalies as F’s rather than C’s, we will end up with a different reconstruction. How much different will depend on the details of the data. However, the point is that we can get two different answers simple by using different units in our analysis. Since all sorts of rescaling can be done on the proxies, the end result is subject to the choices made.
To illustrate this point, we use the data from the example above. The Y variable is multiplied by a scale factor ranging from .1 to 20. The new slope is calculated and divided by the old EIV slope which has also been scaled by the same factor.
If the procedure was invariant under scaling (as OLS is), then the result should be equal to unity in all cases. Instead, one can see that for scale factors close to zero, the EIV behaves basically like ordinary OLS regression . As the scale factor increases, the result (after unscaling) looks like 1/the OLS slope with the X and Y variables switched.
However, that is not the end of the story. What happens if both X and Y are each scaled to have standard deviation 1? This surprised me somewhat. The slope can only take either +1 or -1 (Except for some cases where the data form an exactly symmetric pattern for which ALL slopes produce exactly the same SS).
In effect, this would imply that, after unscaling , the EIV calculated slope = sd(Y) / sd(X). To a statistician, this would be very disconcerting since this slope is not determined in any way shape or form by any existing relationship between X and Y – this is the answer when the data points are in an exactly straight line or when they are uncorrelated. It is not affected by sample size so clearly large sample convergence results would not be applicable. On the other hand, the OLS slope = Corr(X,Y) * sd(Y) / sd(X) for the same case so that this criticism would not apply to that result.
So far we have only dealt with the univariate case. Perhaps if there are more predictors, this would alleviate the problems we have seen here. All sorts of comparisons are possible, but to shorten the post, we will only look at the effect of rescaling the all of the variables to unit variance.
Using R, we generate a sample of 5 predictors and a single response variable with 20000 values each. The variables are generated “independently” (subject to the limits of a random number generator). We calculate the slope coefficients for both the straight OLS regression and also for EIV/TLS:
All of the theoretical coefficients are supposed to be zero and with 20000 observations, the difference should not be large. In fact 95% confidence intervals for the OLS coefficients all contain the value 0. However, the EIV result is completely out to lunch. The response Y must be scaled down by about 20%, to have all of the EIV coefficients become small enough to be inside the 95% CIs calculated by the OLS procedure.
EIV on Simulated Proxy Data
We give one more example of what the effect of applying EIV in the paleo environment can be.
As I mentioned earlier, I have been looking at the response by Gavin and crew to the M-W paper. In their response, the authors use artificial proxy data to compare their EIV construct to other methods. Two different climate models are used to generate a “temperature series” and proxies (which have auto-regressive errors) are provided. I took the CSM model (time frame used 850 to 1980) with 59 proxy sequences as the data. An EIV fit with these 59 predictors was carried out using the calibration period 1856 to 1980. A simple reconstruction was calculated from these coefficients for the entire time range.
This reconstruction was done for each of the three cases: (i) Temperature anomalies in C, (ii) Temperature anomalies in F, and (iii) Temperature anomalies scaled to unit variance during the calibration period. The following plots represent the difference in the resulting reconstructions: (i) – (ii) and (i) – (iii):
The differences here are non-trivial. I realize that is not a reproduction of the total method used by the Mann team. However, the EIV methodology is central to the current spate of their reconstructions so some effect must be there. How strong is it? I don’t know – maybe they can calculate the Fahrenheit version for us so we can all see it. Surely, you would think that they would be aware of all the features of a statistical method before deciding to use it. Maybe I missed their discussion of it.
A script for running the above analysis is available here (the file is labeled .doc, but it is a simple text file). Save it and load into R directly: Reivpost