<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments for Statistics and Other Things</title>
	<atom:link href="http://statpad.wordpress.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://statpad.wordpress.com</link>
	<description></description>
	<lastBuildDate>Sun, 19 Feb 2012 22:15:14 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>Comment on EIV/TLS Regression &#8211; Why Use It? by Hu McCulloch</title>
		<link>http://statpad.wordpress.com/2010/12/19/eivtls-regression-why-use-it/#comment-1257</link>
		<dc:creator><![CDATA[Hu McCulloch]]></dc:creator>
		<pubDate>Sun, 19 Feb 2012 22:15:14 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=440#comment-1257</guid>
		<description><![CDATA[I&#039;ve now had a chance to read at least the first few sections of Wayne Fuller&#039;s 1987 book Measurement Error Models.  Also relevant are Carroll and Ruppert, &quot;The Use and Misuse of Orthogonal Regression in Linear Errors-in-Variables Models,&quot; American Statistician 1996, 50: 1-6 (CR) and Ammann, Genton and Li, &quot;Technical Note:  Correcting for signal attenuation from noisy proxy data in climate reconstructions,&quot;  Climate of the Past 2010, 6:273-79 (AGL).  All three discuss both what I call &quot;Adjusted Least Squares&quot; above, as well as TLS.  

Fuller is the standard reference and derives what I called the Adjusted Least Squares estimator in his sections 1.1 and 1.2.  He awkwardly calls it &quot;least squares corrected for attenuation,&quot; but following AGL it would be appropriate to call it &quot;Attenuation-Corrected LS&quot;, or ACLS.  (In fact AGL call it ACOLS, but if OLS is modified, it is no longer ordinary, so the O becomes unnecessary or even incorrect.)  

Then in his section 1.3, Fuller derives the TLS estimator.  Unfortunately he does not mention either term, &quot;TLS&quot; or &quot;Orthogonal LS&quot;.  He provides three different derivations of the formula, but unfortunately does not include the nice SVD approach.  He shows that it is consistent (p. 32), and gives its asymptotic covariance matrix.  On p. 47, he discusses a method for obtaining confidence intervals that I haven&#039;t digested, but which sounds similar to the classical Fieller confidence sets for CCE, because &quot;The confidence sets constructed by the methods of this section will not always be a single interval and can be the whole line.&quot;  

AGL embrace ACLS as a solution to the inconsistency of ICE (Inverse Calibration Estimation)  when estimating past temperatures from noisy proxies.  In doing so, they invent what might be called &quot;Backward LS&quot; (BLS).  

They have a case in which their &quot;X&quot; is a noisy proxy (such as an isotope ratio), and &quot;Y&quot; is instrumental temperature, so that var(f) is important but var(e) is nearly 0 and the usual roles of &quot;X&quot; and &quot;Y&quot; are reversed.  Instead of just regressing &quot;X&quot; on &quot;Y&quot; and inverting to reconstruct Y from X as in CCE (Classical Calibration Estimation), they regress &quot;Y&quot; on &quot;X&quot; and then correct this ICE estimate  for attenuation using ACLS.  In order to obtain an estimate of var(f) they do regress X on Y and compute the variance of the residuals, but otherwise discard the consistent CCE estimate of the slope.  

Simulations indicate that AGL&#039;s BLS substantially mis-corrects for attenuation in the slope (and therefore also its reciprocal) when the significance of the slope is marginal (expected t = 2), if the standard unbiased estimator s^2 of the X on Y regression error variance is used to estimate var(f) and the adjustment factor is taken as infinity whenever var(x) &lt; var(f).  But surprisingly, BLS appears to exactly give the CCE estimator of the slope (and therefore the OLS estimator of its reciprocal) when the Mean Squared Error (dividing by n instead of n-2) is used to estimate var(f).   So at best it has no advantage over CCE and at worst it is a lot worse.  

AGL propose a further modification of ACLS  when both types of error are present, using a cross-validation procedure involving only the X and Y data to estimate their &quot;k&quot;.  However, this can&#039;t be valid, since as Fuller points out, the slope is not identified unless there is some external information about the absolute or relative size of the two types of error.  If var(e) (in this case the temperature measurement error) is known externally, it can just be used with ACLS to modify the CCE estimator as I noted in an earlier comment here.]]></description>
		<content:encoded><![CDATA[<p>I&#8217;ve now had a chance to read at least the first few sections of Wayne Fuller&#8217;s 1987 book Measurement Error Models.  Also relevant are Carroll and Ruppert, &#8220;The Use and Misuse of Orthogonal Regression in Linear Errors-in-Variables Models,&#8221; American Statistician 1996, 50: 1-6 (CR) and Ammann, Genton and Li, &#8220;Technical Note:  Correcting for signal attenuation from noisy proxy data in climate reconstructions,&#8221;  Climate of the Past 2010, 6:273-79 (AGL).  All three discuss both what I call &#8220;Adjusted Least Squares&#8221; above, as well as TLS.  </p>
<p>Fuller is the standard reference and derives what I called the Adjusted Least Squares estimator in his sections 1.1 and 1.2.  He awkwardly calls it &#8220;least squares corrected for attenuation,&#8221; but following AGL it would be appropriate to call it &#8220;Attenuation-Corrected LS&#8221;, or ACLS.  (In fact AGL call it ACOLS, but if OLS is modified, it is no longer ordinary, so the O becomes unnecessary or even incorrect.)  </p>
<p>Then in his section 1.3, Fuller derives the TLS estimator.  Unfortunately he does not mention either term, &#8220;TLS&#8221; or &#8220;Orthogonal LS&#8221;.  He provides three different derivations of the formula, but unfortunately does not include the nice SVD approach.  He shows that it is consistent (p. 32), and gives its asymptotic covariance matrix.  On p. 47, he discusses a method for obtaining confidence intervals that I haven&#8217;t digested, but which sounds similar to the classical Fieller confidence sets for CCE, because &#8220;The confidence sets constructed by the methods of this section will not always be a single interval and can be the whole line.&#8221;  </p>
<p>AGL embrace ACLS as a solution to the inconsistency of ICE (Inverse Calibration Estimation)  when estimating past temperatures from noisy proxies.  In doing so, they invent what might be called &#8220;Backward LS&#8221; (BLS).  </p>
<p>They have a case in which their &#8220;X&#8221; is a noisy proxy (such as an isotope ratio), and &#8220;Y&#8221; is instrumental temperature, so that var(f) is important but var(e) is nearly 0 and the usual roles of &#8220;X&#8221; and &#8220;Y&#8221; are reversed.  Instead of just regressing &#8220;X&#8221; on &#8220;Y&#8221; and inverting to reconstruct Y from X as in CCE (Classical Calibration Estimation), they regress &#8220;Y&#8221; on &#8220;X&#8221; and then correct this ICE estimate  for attenuation using ACLS.  In order to obtain an estimate of var(f) they do regress X on Y and compute the variance of the residuals, but otherwise discard the consistent CCE estimate of the slope.  </p>
<p>Simulations indicate that AGL&#8217;s BLS substantially mis-corrects for attenuation in the slope (and therefore also its reciprocal) when the significance of the slope is marginal (expected t = 2), if the standard unbiased estimator s^2 of the X on Y regression error variance is used to estimate var(f) and the adjustment factor is taken as infinity whenever var(x) &lt; var(f).  But surprisingly, BLS appears to exactly give the CCE estimator of the slope (and therefore the OLS estimator of its reciprocal) when the Mean Squared Error (dividing by n instead of n-2) is used to estimate var(f).   So at best it has no advantage over CCE and at worst it is a lot worse.  </p>
<p>AGL propose a further modification of ACLS  when both types of error are present, using a cross-validation procedure involving only the X and Y data to estimate their &quot;k&quot;.  However, this can&#039;t be valid, since as Fuller points out, the slope is not identified unless there is some external information about the absolute or relative size of the two types of error.  If var(e) (in this case the temperature measurement error) is known externally, it can just be used with ACLS to modify the CCE estimator as I noted in an earlier comment here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on EIV/TLS Regression &#8211; Why Use It? by RomanM</title>
		<link>http://statpad.wordpress.com/2010/12/19/eivtls-regression-why-use-it/#comment-1218</link>
		<dc:creator><![CDATA[RomanM]]></dc:creator>
		<pubDate>Sun, 12 Feb 2012 20:38:40 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=440#comment-1218</guid>
		<description><![CDATA[Sorry for the delayed response, Hu.

The method I tried to use was adapted from my tls.alt function above, but re-weighting the variables with the estimated variances from the previous step.  However, the re-weights had the effect of stalling the coefficients on the OLS values and sending one of the weights off to infinity.

I think that your comments on the number of unknowns in TLS exceeding the amount of information in the observations are particularly on target here.  As is, that approach seems to be a bit of a dead end.]]></description>
		<content:encoded><![CDATA[<p>Sorry for the delayed response, Hu.</p>
<p>The method I tried to use was adapted from my tls.alt function above, but re-weighting the variables with the estimated variances from the previous step.  However, the re-weights had the effect of stalling the coefficients on the OLS values and sending one of the weights off to infinity.</p>
<p>I think that your comments on the number of unknowns in TLS exceeding the amount of information in the observations are particularly on target here.  As is, that approach seems to be a bit of a dead end.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on EIV/TLS Regression &#8211; Why Use It? by Hu McCulloch</title>
		<link>http://statpad.wordpress.com/2010/12/19/eivtls-regression-why-use-it/#comment-1217</link>
		<dc:creator><![CDATA[Hu McCulloch]]></dc:creator>
		<pubDate>Sun, 12 Feb 2012 16:41:04 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=440#comment-1217</guid>
		<description><![CDATA[UC --  
   Thanks!  I couldn&#039;t find the Kendall series book you cite, but OSU has a volume with an article on this by the same Cheng and Van Ness that looks similar.  I&#039;ll take a look at it -- and have requested Fuller&#039;s book on Measurement Error problems as well.  (It&#039;s from the 80&#039;s - older than I thought.)

   I agree that CCE is ordinarily the way to go with proxies, since the regression error is much bigger than the measurement error in the instrumental temperature series (we hope!).  But TLS sounds good on paper and is popular with climate people, so it&#039;s good to figure out why it does or doesn&#039;t help.  

   According to Wiki, the eminent economist Paul Samuelson was enthusiastic about TLS  back in the 1940s when he was an eager-beaver  grad student.  But now the econometric texts don&#039;t even mention it.]]></description>
		<content:encoded><![CDATA[<p>UC &#8212;<br />
   Thanks!  I couldn&#8217;t find the Kendall series book you cite, but OSU has a volume with an article on this by the same Cheng and Van Ness that looks similar.  I&#8217;ll take a look at it &#8212; and have requested Fuller&#8217;s book on Measurement Error problems as well.  (It&#8217;s from the 80&#8217;s &#8211; older than I thought.)</p>
<p>   I agree that CCE is ordinarily the way to go with proxies, since the regression error is much bigger than the measurement error in the instrumental temperature series (we hope!).  But TLS sounds good on paper and is popular with climate people, so it&#8217;s good to figure out why it does or doesn&#8217;t help.  </p>
<p>   According to Wiki, the eminent economist Paul Samuelson was enthusiastic about TLS  back in the 1940s when he was an eager-beaver  grad student.  But now the econometric texts don&#8217;t even mention it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on EIV/TLS Regression &#8211; Why Use It? by uc</title>
		<link>http://statpad.wordpress.com/2010/12/19/eivtls-regression-why-use-it/#comment-1211</link>
		<dc:creator><![CDATA[uc]]></dc:creator>
		<pubDate>Fri, 10 Feb 2012 15:35:46 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=440#comment-1211</guid>
		<description><![CDATA[Catching up slowly, Statistical Regression with Measurement Error 

http://www.amazon.com/Statistical-Regression-Measurement-Error-Statistics/dp/0340614617

seems to be a good book to start. If the ratio of the _error_ variances is known, the _data_ can be scaled so that this ratio becomes one. Then the maximum likelihood solution of the normal ME (measurement error) regression is orthogonal regression (p.9). 

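The rescaling step can be sketched in a few lines of Python (standard library only). This is only an illustration under stated assumptions: the function names and the data are made up, and the ratio var(e)/var(f) is taken as known from outside the data.

```python
import math

def tls_slope(x, y):
    # Orthogonal-regression slope for equal error variances; centers the data first.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Assumes sxy is nonzero.
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)

def scaled_tls_slope(x, y, ratio):
    # ratio = var(e) / var(f), assumed known from outside the data.
    s = math.sqrt(ratio)
    y_scaled = [b / s for b in y]        # errors in y_scaled now have variance var(f)
    return s * tls_slope(x, y_scaled)    # undo the scaling on the slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]            # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(scaled_tls_slope(x, y, 1.0), scaled_tls_slope(x, y, 4.0))
```

As the assumed ratio grows, the slope moves toward OLS of y on x; as it shrinks toward zero, it moves toward the inverted regression of x on y.
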
For the calibration problem the book offers an interesting reference for comparing ICE, CCE and ME,

C-L. Cheng and C-L. Tsai (1994). &quot;The comparisons of three different linear calibration estimators in measurement error models.&quot;

It is listed here, http://www.stat.sinica.edu.tw/clcheng/index.htm but it is an unpublished manuscript...  

In my opinion CCE is the only way to go with proxies. ICE implies a structural model (we know a lot about past temperatures prior to looking at proxies), and for orthogonal regression the assumption seems to be that we know the ratio of the error variances prior to looking at proxies. But I&#039;m quite unfamiliar with this topic, so I might be wrong.]]></description>
		<content:encoded><![CDATA[<p>Catching up slowly, Statistical Regression with Measurement Error </p>
<p><a href="http://www.amazon.com/Statistical-Regression-Measurement-Error-Statistics/dp/0340614617" rel="nofollow">http://www.amazon.com/Statistical-Regression-Measurement-Error-Statistics/dp/0340614617</a></p>
<p>seems to be a good book to start. If the ratio of the _error_ variances is known, the _data_ can be scaled so that this ratio becomes one. Then the maximum likelihood solution of the normal ME (measurement error) regression is orthogonal regression (p.9). </p>
<p>For the calibration problem the book offers an interesting reference for comparing ICE, CCE and ME,</p>
<p>C-L. Cheng and C-L. Tsai (1994). &#8220;The comparisons of three different linear calibration estimators in measurement error models.&#8221;</p>
<p>It is listed here, <a href="http://www.stat.sinica.edu.tw/clcheng/index.htm" rel="nofollow">http://www.stat.sinica.edu.tw/clcheng/index.htm</a> but it is an unpublished manuscript&#8230;  </p>
<p>In my opinion CCE is the only way to go with proxies. ICE implies a structural model (we know a lot about past temperatures prior to looking at proxies), and for orthogonal regression the assumption seems to be that we know the ratio of the error variances prior to looking at proxies. But I&#8217;m quite unfamiliar with this topic, so I might be wrong.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on EIV/TLS Regression &#8211; Why Use It? by Hu McCulloch</title>
		<link>http://statpad.wordpress.com/2010/12/19/eivtls-regression-why-use-it/#comment-1199</link>
		<dc:creator><![CDATA[Hu McCulloch]]></dc:creator>
		<pubDate>Tue, 07 Feb 2012 16:45:10 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=440#comment-1199</guid>
		<description><![CDATA[&lt;blockquote&gt;
After reading your comments, I also tried a sequential reweighted LS approach as well. At each step, the variances of e and f were re-estimated and the regression repeated using the new weights. &lt;/blockquote&gt;
How did you do this?  Originally I was hoping to iteratively estimate the variance of e from the TLS residuals, starting from knowledge of var(f) and an initial guess of var(e) (say from OLS of y on x).  But then I realized that TLS does not give back the true relative variance of f and e, unless the slope happens to be +/- 1 which would only be by accident.  

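A small Python check of this point (standard library only). It is a sketch, not a proof: the seed, sample size, and parameter values are illustrative assumptions, and the closed-form slope is the usual orthogonal-regression formula for centered data with equal error variances.

```python
import math
import random

def tls_fit(x, y):
    # Univariate TLS on centered data, assuming var(e) = var(f) and sxy nonzero.
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    # Orthogonal projection of each point onto the fitted line.
    xfit = [(a + slope * b) / (1.0 + slope ** 2) for a, b in zip(x, y)]
    yfit = [slope * t for t in xfit]
    return slope, xfit, yfit

rng = random.Random(2012)   # illustrative seed
beta, n = 0.5, 5000         # illustrative true slope and sample size
xstar = [rng.gauss(0.0, 2.0) for _ in range(n)]
x = [s + rng.gauss(0.0, 1.0) for s in xstar]          # var(f) = 1
y = [beta * s + rng.gauss(0.0, 1.0) for s in xstar]   # var(e) = 1
mx, my = sum(x) / n, sum(y) / n
x = [a - mx for a in x]
y = [b - my for b in y]

slope, xfit, yfit = tls_fit(x, y)
xss = sum((a - t) ** 2 for a, t in zip(x, xfit))
yss = sum((b - t) ** 2 for b, t in zip(y, yfit))
ratio = xss / yss   # residual variance ratio; the true var(f) / var(e) is 1
print(slope, ratio, slope ** 2)
```

Each x residual is minus the slope times the matching y residual, so the residual variance ratio is the squared slope by construction and equals the true ratio of 1 only when the slope is +/- 1.
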
Although TLS is ML-motivated, it does not necessarily have the nice properties of ML.  ML is consistent if a fixed number of parameters are being estimated with an increasing number of observations.  But in TLS the number of unknowns increases 1-for-1 with the number of observations, so that x* and y* are not consistently estimated.  It appears that as a consequence, the variance ratio is not consistently estimated by the residuals, even though the slope appears to be consistently estimated (from my limited simulations).  

Please e-mail me (and the gang) if and when you add to this interesting discussion, as it proceeds rather slowly.]]></description>
		<content:encoded><![CDATA[<blockquote><p>
After reading your comments, I also tried a sequential reweighted LS approach as well. At each step, the variances of e and f were re-estimated and the regression repeated using the new weights. </p></blockquote>
<p>How did you do this?  Originally I was hoping to iteratively estimate the variance of e from the TLS residuals, starting from knowledge of var(f) and an initial guess of var(e) (say from OLS of y on x).  But then I realized that TLS does not give back the true relative variance of f and e, unless the slope happens to be +/- 1 which would only be by accident.  </p>
<p>Although TLS is ML-motivated, it does not necessarily have the nice properties of ML.  ML is consistent if a fixed number of parameters are being estimated with an increasing number of observations.  But in TLS the number of unknowns increases 1-for-1 with the number of observations, so that x* and y* are not consistently estimated.  It appears that as a consequence, the variance ratio is not consistently estimated by the residuals, even though the slope appears to be consistently estimated (from my limited simulations).  </p>
<p>Please e-mail me (and the gang) if and when you add to this interesting discussion, as it proceeds rather slowly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on EIV/TLS Regression &#8211; Why Use It? by RomanM</title>
		<link>http://statpad.wordpress.com/2010/12/19/eivtls-regression-why-use-it/#comment-1179</link>
		<dc:creator><![CDATA[RomanM]]></dc:creator>
		<pubDate>Wed, 01 Feb 2012 18:49:33 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=440#comment-1179</guid>
		<description><![CDATA[Hu, your point on the difference in usage of the two terms EIV and TLS is well taken.  I will start making that distinction in the future.

I have looked at the situation further and did some more analysis which seemed to add further insight in regards to some of your other points:

&lt;blockquote&gt;Although TLS sounds good on paper, in practice it is almost never useful, since it requires knowing the relative variances of the two errors e and f. &lt;/blockquote&gt;

I believe the problems with TLS go deeper than that. If the relative variances are known, TLS can be applied after weighting the SS components in the head post inversely to the variances of the e and f &quot;errors&quot; (thus reducing the problem to the equal variance case).  However, the residuals for e and f are still perfectly linearly correlated although the ratio between them is changed to take the weighting into account.

After reading your comments, I also tried a sequential &lt;em&gt;reweighted&lt;/em&gt; LS approach as well.  At each step, the variances of e and f were re-estimated and the regression repeated using the new weights.  However, all of the cases I tried converged to the end result of a LS where the variable with the higher variance has its error term reduced to zero variability.  I also looked very briefly at using maximum likelihood with normally distributed errors, but it wasn&#039;t clear that this would produce any better results.

With regard to the consistency of the parameter estimates, my gut feeling would be that in most cases the results would be consistent although there might be demonstrable bias in the parameters (that converged to zero as the sample size increased).  Where it could be bothersome is in the case that X and Y are uncorrelated with approximately equal variability. However, I am just guessing on this aspect and it could be wrong.

The possible use of Adjusted OLS is intriguing, but I would need to do some reading to say more about that.   

Getting insight on the behavior of the estimates and their standard errors is made more difficult because the mechanics of the calculation involve portions of the principal components of the variables - in the multivariate case particularly.  Something that might make it easier to comprehend is to do the calculations recursively and examine what happens in that process.  I wrote a short program for this:        

[sourcecode language=&quot;css&quot;]
tls.alt = function(yvec,xvec,wts = c(1,1),tol=1E-6) {
  #wts = (y-variable weight, x-variable weight)
  # typically weights would be 1/ variable variance
  #variables do not have to be centered
    dif = tol+1
    ywt = wts[1]; xwt = wts[2]
    newx = xvec
  while (dif&gt;tol) {
   lastx = newx
   regco = coef(lm(yvec~newx))
   newx = (ywt*regco[2]*(yvec-regco[1]) + xwt*xvec)/(xwt+ ywt*regco[2]^2)
   dif = max(abs(newx-lastx),na.rm=T)}
   yscore = regco[1]+regco[2]*newx
   xres = xvec-newx
   yres = yvec-yscore
   xss = sum(xres^2, na.rm=T)   
   yss = sum(yres^2, na.rm=T)
 list(coefs=regco,yscore = yscore, xscore = newx, xresid =xres,yresid=yres,xss=xss,yss=yss)} 
[/sourcecode]]]></description>
		<content:encoded><![CDATA[<p>Hu, your point on the difference in usage of the two terms EIV and TLS is well taken.  I will start making that distinction in the future.</p>
<p>I have looked at the situation further and did some more analysis which seemed to add further insight in regards to some of your other points:</p>
<blockquote><p>Although TLS sounds good on paper, in practice it is almost never useful, since it requires knowing the relative variances of the two errors e and f. </p></blockquote>
<p>I believe the problems with TLS go deeper than that. If the relative variances are known, TLS can be applied after weighting the SS components in the head post inversely to the variances of the e and f &#8220;errors&#8221; (thus reducing the problem to the equal variance case).  However, the residuals for e and f are still perfectly linearly correlated although the ratio between them is changed to take the weighting into account.</p>
<p>After reading your comments, I also tried a sequential <em>reweighted</em> LS approach as well.  At each step, the variances of e and f were re-estimated and the regression repeated using the new weights.  However, all of the cases I tried converged to the end result of a LS where the variable with the higher variance has its error term reduced to zero variability.  I also looked very briefly at using maximum likelihood with normally distributed errors, but it wasn&#8217;t clear that this would produce any better results.</p>
<p>With regard to the consistency of the parameter estimates, my gut feeling would be that in most cases the results would be consistent although there might be demonstrable bias in the parameters (that converged to zero as the sample size increased).  Where it could be bothersome is in the case that X and Y are uncorrelated with approximately equal variability. However, I am just guessing on this aspect and it could be wrong.</p>
<p>The possible use of Adjusted OLS is intriguing, but I would need to do some reading to say more about that.   </p>
<p>Getting insight on the behavior of the estimates and their standard errors is made more difficult because the mechanics of the calculation involve portions of the principal components of the variables &#8211; in the multivariate case particularly.  Something that might make it easier to comprehend is to do the calculations recursively and examine what happens in that process.  I wrote a short program for this:        </p>
<pre class="brush: css;">
tls.alt = function(yvec,xvec,wts = c(1,1),tol=1E-6) {
  #wts = (y-variable weight, x-variable weight)
  # typically weights would be 1/ variable variance
  #variables do not have to be centered
    dif = tol+1
    ywt = wts[1]; xwt = wts[2]
    newx = xvec
  while (dif&gt;tol) {
   lastx = newx
   regco = coef(lm(yvec~newx))
   newx = (ywt*regco[2]*(yvec-regco[1]) + xwt*xvec)/(xwt+ ywt*regco[2]^2)
   dif = max(abs(newx-lastx),na.rm=T)}
   yscore = regco[1]+regco[2]*newx
   xres = xvec-newx
   yres = yvec-yscore
   xss = sum(xres^2, na.rm=T)   
   yss = sum(yres^2, na.rm=T)
 list(coefs=regco,yscore = yscore, xscore = newx, xresid =xres,yresid=yres,xss=xss,yss=yss)} 
</pre>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on EIV/TLS Regression &#8211; Why Use It? by Hu McCulloch</title>
		<link>http://statpad.wordpress.com/2010/12/19/eivtls-regression-why-use-it/#comment-1168</link>
		<dc:creator><![CDATA[Hu McCulloch]]></dc:creator>
		<pubDate>Sun, 29 Jan 2012 18:43:51 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=440#comment-1168</guid>
		<description><![CDATA[Here&#039;s my 3-line Matlab script to compute univariate TLS on pre-centered data that has been normalized so that the variances of e and f are equal: 

% TLS0
% univariate TLS with pre-centered and pre-normalized data
%  (zero in name indicates restricted problem)
% x, y ~ n x 1
% x = xstar + f, y = ystar + e
% ystar = beta * xstar, var(f) + var(e) = min.
% x and y have been pre-normalized so that var(e) = var(f)
% by Hu McCulloch 1/28/12, after P. de Groen, &quot;An Introduction to 
%   Total Least Squares,&quot; Nieuw Archief voor Wiskunde, 1996, 14:237-53,
%   www.freescience.info 
function [beta] = TLS0(x, y) 
B = [x y];
[U, S, V] = svd(B, &#039;econ&#039;); % B*V = U*S, S is 2x2, V is 2x2
beta = -V(1,2)/V(2,2); % This correctly gives inf if V(2,2) = 0.
%  xstar and ystar are in U somewhere, but I&#039;m not certain how to 
%    recover them. 
end

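On recovering xstar and ystar: one way that does not need U is to project each data point orthogonally onto the fitted line. Below is a hedged Python sketch (standard library only); the names tls0 and fitted_points and the data are illustrative, and the slope uses the closed form that the 2x2 problem reduces to rather than an SVD call.

```python
import math

def tls0(x, y):
    # Univariate TLS slope for centered data with var(e) = var(f).
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    # Root of sxy*b^2 + (sxx - syy)*b - sxy = 0 with the same sign as sxy.
    # Assumes sxy is nonzero.
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)

def fitted_points(x, y, beta):
    # Orthogonal projection of (x, y) onto the line ystar = beta * xstar.
    xstar = [(a + beta * b) / (1.0 + beta ** 2) for a, b in zip(x, y)]
    ystar = [beta * t for t in xstar]
    return xstar, ystar

x = [-2.0, -1.0, 0.0, 1.0, 2.0]     # made-up, already centered
y = [-3.9, -2.2, 0.1, 1.8, 4.2]     # made-up, already centered
beta = tls0(x, y)
xstar, ystar = fitted_points(x, y, beta)
print(beta)
print(xstar)
print(ystar)
```

In SVD terms the same fitted points are the rank-one part of B, that is, the first column of U times S(1,1) times the transpose of the first column of V.
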
It&#039;s not really necessary to define &quot;B&quot;, but I&#039;ve included it for clarity.]]></description>
		<content:encoded><![CDATA[<p>Here&#8217;s my 3-line Matlab script to compute univariate TLS on pre-centered data that has been normalized so that the variances of e and f are equal: </p>
<p>% TLS0<br />
% univariate TLS with pre-centered and pre-normalized data<br />
%  (zero in name indicates restricted problem)<br />
% x, y ~ n x 1<br />
% x = xstar + f, y = ystar + e<br />
% ystar = beta * xstar, var(f) + var(e) = min.<br />
% x and y have been pre-normalized so that var(e) = var(f)<br />
% by Hu McCulloch 1/28/12, after P. de Groen, &#8220;An Introduction to<br />
%   Total Least Squares,&#8221; Nieuw Archief voor Wiskunde, 1996, 14:237-53,<br />
%   <a href="http://www.freescience.info" rel="nofollow">http://www.freescience.info</a><br />
function [beta] = TLS0(x, y)<br />
B = [x y];<br />
[U, S, V] = svd(B, &#039;econ&#039;); % B*V = U*S, S is 2x2, V is 2x2<br />
beta = -V(1,2)/V(2,2); % This correctly gives inf if V(2,2) = 0.<br />
%  xstar and ystar are in U somewhere, but I&#8217;m not certain how to<br />
%    recover them.<br />
end</p>
<p>It&#8217;s not really necessary to define &#8220;B&#8221;, but I&#8217;ve included it for clarity.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on EIV/TLS Regression &#8211; Why Use It? by Hu McCulloch</title>
		<link>http://statpad.wordpress.com/2010/12/19/eivtls-regression-why-use-it/#comment-1167</link>
		<dc:creator><![CDATA[Hu McCulloch]]></dc:creator>
		<pubDate>Sun, 29 Jan 2012 15:20:03 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=440#comment-1167</guid>
		<description><![CDATA[Roman --
   I&#039;ve now had a chance to look at TLS a little, and have even implemented a simple program in Matlab.  
   Although you -- and much of the climate literature -- use &quot;EIV&quot; and &quot;TLS&quot; interchangeably, I view EIV (Errors in Variables) as a generic problem, and TLS (Total LS) as merely one proposed solution.  
   Although TLS sounds good on paper, in practice it is almost never useful, since it requires knowing the relative variances of the two errors e and f.  
   I was at first concerned that TLS might not even be consistent, since if one had scaled the variables so that e and f had equal variance, and then were to go back and re-estimate the variances of e and f from the TLS regression residuals, the ratio of the residual standard deviations would equal the slope of the line (and the ratio of the variances its square), as shown in your figure 2 above, rather than unity.  But in simulations I did with sample size 100, 1000, and 100,000, the estimate of the slope converged on the true value for several combinations of parameters, so this apparently is not a problem.  Wikipedia and a helpful 1996 article by P. de Groen I found online do not mention the issue of consistency, so a proof of this could be a nice paper for someone (not me!).
   A far more useful solution to the EIV problem is what might be called Adjusted OLS:  As is well known (see Wiki or Pindyck and Rubinfeld&#039;s text), the standard OLS estimator 
bOLS = cov(x,y)/var(x)
has plim 
beta  var(x*)/var(x).
But this implies that the adjusted OLS estimator 
bADJ = bOLS var(x)/var(x*) = cov(x,y)/(var(x) - var(f)) 
is consistent.  The multivariate case is similar, I believe.  For some reason, neither Wiki nor deGroen  nor P&amp;R suggest this obvious adjustment.
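A minimal Python sketch of this adjustment (standard library only). The function names, seed, and parameter values are illustrative assumptions, and var(f) is treated as known from outside information rather than estimated.

```python
import random

def cov(u, v):
    # Sample covariance with divisor n (the divisor cancels in the ratios below).
    n = len(u)
    mu = sum(u) / n
    mv = sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

def adjusted_ols_slope(x, y, var_f):
    # bADJ = cov(x, y) / (var(x) - var(f)); var_f is supplied by the caller.
    return cov(x, y) / (cov(x, x) - var_f)

rng = random.Random(12345)                      # illustrative seed
beta, var_f, var_e, n = 2.0, 0.5, 0.25, 20000   # illustrative values
xstar = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [s + rng.gauss(0.0, var_f ** 0.5) for s in xstar]
y = [beta * s + rng.gauss(0.0, var_e ** 0.5) for s in xstar]

b_ols = cov(x, y) / cov(x, x)            # plim is beta * var(x*) / var(x) = 4/3 here
b_adj = adjusted_ols_slope(x, y, var_f)  # plim is beta = 2 here
print(b_ols, b_adj)
```

With these settings bOLS should land near 4/3 and bADJ near 2, up to sampling noise.
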
   In climate data, for example, it is common to use a temperature series like HadCRU as the explanatory variable (when properly using CCE to calibrate a proxy, for example).  But CRU admits this has a measurement standard error of .10 (in 1850) to .025 (since 1950), or about .05 on average, so that var(f) is about .0025.  Meanwhile, the series itself has risen by about 1 dC since it started, so that it has a variance of about 0.1.  This makes bOLS low by a factor of about 0.975, so that bADJ = bOLS * 1.026.  Not enough to worry about, as it turns out, but worth checking.  (CRU may in fact be overstating the precision of its temperature indices, but that is another matter.)  
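The arithmetic behind these figures can be checked in a few lines of Python. The sketch assumes var(f) is the square of the average standard error of .05 and var(x) is 0.1, as stated above; the variable names are illustrative.

```python
se_f = 0.05                    # average measurement standard error
var_x = 0.1                    # variance of the observed temperature series
var_f = se_f ** 2              # measurement error variance
attenuation = (var_x - var_f) / var_x   # var(x*) / var(x)
boost = 1.0 / attenuation               # factor applied to bOLS to get bADJ
print(round(attenuation, 3), round(boost, 3))
```
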
  If var(x) turns out to be less than var(f), that just means that whatever variation there is in x is just noise, and one shouldn&#039;t even bother running the regression.  
  Another use for Adjusted OLS is in 2 Stage LS IV estimation -- the first stage regression gives a noisy but exogenous proxy for the endogenous regressor(s), but it also gives an estimate of the variance of the noise.  The second stage regression then can be boosted with bADJ to reduce its finite sample bias.  (In IV, the first stage noise goes to zero as the sample size gets infinite, so that 2SLS is consistent despite the EIV bias.  But still it&#039;s worth removing as much of the finite sample bias as possible with bADJ, particularly when the instruments are weak, as is often the case.)  
   In the limit when var(e) = 0, so that there is measurement error in x but no regression error in y, the natural solution is Reverse Regression -- regress x on y and invert.  This is the limit TLS yields, as shown in your third graph.  But although bADJ has the same plim, it will give a different actual value.  
  Wayne Fuller has a relatively recent book on Errors in Variables problems which I suspect may use bADJ, but I only took a cursory look at it some time ago.  
   Someone (not me) should work out the theory of standard errors for the Adjusted OLS and TLS estimators.]]></description>
		<content:encoded><![CDATA[<p>Roman &#8211;<br />
   I&#8217;ve now had a chance to look at TLS a little, and have even implemented a simple program in Matlab.<br />
   Although you &#8212; and much of the climate literature &#8212; use &#8220;EIV&#8221; and &#8220;TLS&#8221; interchangeably, I view EIV (Errors in Variables) as a generic problem, and TLS (Total LS) as merely one proposed solution.<br />
   Although TLS sounds good on paper, in practice it is almost never useful, since it requires knowing the relative variances of the two errors e and f.<br />
   I was at first concerned that TLS might not even be consistent, since if one had scaled the variables so that e and f had equal variance, and then were to go back and re-estimate the variances of e and f from the TLS regression residuals, the ratio of the variances would equal the slope of the line, as shown in your figure 2 above, rather than unity.  But in simulations I did with sample sizes of 100, 1000, and 100,000, the estimate of the slope converged on the true value for several combinations of parameters, so this apparently is not a problem.  Wikipedia and a helpful 1996 article by P. de Groen I found online do not mention the issue of consistency, so a proof of this could be a nice paper for someone (not me!).<br />
   A far more useful solution to the EIV problem is what might be called Adjusted OLS:  As is well known (see Wiki or Pindyck and Rubinfeld&#8217;s text), the standard OLS estimator<br />
bOLS = cov(x,y)/var(x)<br />
has plim<br />
beta  var(x*)/var(x).<br />
But this implies that the adjusted OLS estimator<br />
bADJ = bOLS var(x)/var(x*) = cov(x,y)/(var(x) &#8211; var(f))<br />
is consistent.  The multivariate case is similar, I believe.  For some reason, neither Wiki nor de Groen nor P&amp;R suggests this obvious adjustment.<br />
   In climate data, for example, it is common to use a temperature series like HadCRU as the explanatory variable (when properly using CCE to calibrate a proxy, for example).  But CRU admits this has a measurement standard error of .10 (in 1850) to .025 (since 1950), or about .05 on average, which implies var(f) of about .0025.  Meanwhile, the series itself has risen by about 1 dC since it started, so that it has a variance of about 0.1.  This makes bOLS low by a factor of about 0.975, so that bADJ = bOLS * 1.026.  Not enough to worry about, as it turns out, but worth checking.  (CRU may in fact be overstating the precision of its temperature indices, but that is another matter.)<br />
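   As a rough illustration of the adjustment just described, here is a minimal simulation sketch in Python.  It assumes the usual setup x = x* + f and y = beta x* + e with independent normal errors and a known var(f); the parameter values, the seed, and the helper names cov and ols_and_adjusted are made up for this example and are not from any of the references.<br />

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

def ols_and_adjusted(x, y, var_f):
    """Return (bOLS, bADJ) for observed x, y and a known noise variance var_f."""
    var_x = cov(x, x)
    b_ols = cov(x, y) / var_x              # attenuated toward zero
    b_adj = cov(x, y) / (var_x - var_f)    # divides by an estimate of var(x*)
    return b_ols, b_adj

rng = random.Random(0)                     # fixed seed, chosen arbitrarily
beta, n = 2.0, 100_000
sd_xstar, sd_f, sd_e = 1.0, 0.5, 0.5       # illustrative values only

x_star = [rng.gauss(0.0, sd_xstar) for _ in range(n)]
x = [xs + rng.gauss(0.0, sd_f) for xs in x_star]          # x = x* + f
y = [beta * xs + rng.gauss(0.0, sd_e) for xs in x_star]   # y = beta x* + e

b_ols, b_adj = ols_and_adjusted(x, y, sd_f ** 2)
plim_ols = beta * sd_xstar ** 2 / (sd_xstar ** 2 + sd_f ** 2)
print(f"bOLS = {b_ols:.3f} (plim {plim_ols:.3f})")
print(f"bADJ = {b_adj:.3f} (true beta {beta})")
```

   With these settings bOLS should settle near beta var(x*)/var(x) = 1.6, while bADJ should settle near the true slope of 2.<br />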
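   The arithmetic behind those two factors can be checked in a few lines of Python.  This is only a sketch: it takes the noise variance to be the square of the .05 average standard error and uses the rough 0.1 figure for var(x) quoted above; the variable names are my own.<br />

```python
sd_f = 0.05                              # average measurement standard error quoted above
var_x = 0.1                              # approximate variance of the observed series
var_f = sd_f ** 2                        # implied noise variance, about 0.0025

attenuation = (var_x - var_f) / var_x    # plim of bOLS / beta, i.e. var(x*)/var(x)
boost = 1.0 / attenuation                # factor that turns bOLS into bADJ
print(f"attenuation = {attenuation:.3f}, boost = {boost:.3f}")
# prints: attenuation = 0.975, boost = 1.026
```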
  If var(x) turns out to be less than var(f), that just means that whatever variation there is in x is just noise, and one shouldn&#8217;t even bother running the regression.<br />
  Another use for Adjusted OLS is in 2 Stage LS IV estimation &#8212; the first stage regression gives a noisy but exogenous proxy for the endogenous regressor(s), but it also gives an estimate of the variance of the noise.  The second stage regression then can be boosted with bADJ to reduce its finite sample bias.  (In IV, the first stage noise goes to zero as the sample size goes to infinity, so that 2SLS is consistent despite the EIV bias.  But still it&#8217;s worth removing as much of the finite sample bias as possible with bADJ, particularly when the instruments are weak, as is often the case.)<br />
   In the limit when var(e) = 0, so that there is measurement error in x but no regression error in y, the natural solution is Reverse Regression &#8212; regress x on y and invert.  This is the limit TLS yields, as shown in your third graph.  But although bADJ has the same plim, it will give a different actual value.<br />
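   To see that the two estimators share a limit but differ in any given sample, here is a small Python sketch of the var(e) = 0 case.  It assumes x = x* + f and y = beta x* exactly, with a known var(f); the sample size, seed, and parameter values are arbitrary choices for illustration.<br />

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

rng = random.Random(1)                   # fixed seed, chosen arbitrarily
beta, n, sd_f = 2.0, 1000, 0.5           # illustrative values only

x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [xs + rng.gauss(0.0, sd_f) for xs in x_star]   # noisy regressor, x = x* + f
y = [beta * xs for xs in x_star]                    # no regression error: var(e) = 0

b_rev = cov(y, y) / cov(x, y)                  # regress x on y, then invert the slope
b_adj = cov(x, y) / (cov(x, x) - sd_f ** 2)    # adjusted OLS with known var(f)
print(f"reverse regression: {b_rev:.4f}, adjusted OLS: {b_adj:.4f}")
```

   Both numbers should land close to beta = 2, but they will not coincide, since each uses the sample moments differently.<br />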
  Wayne Fuller has a relatively recent book on Errors in Variables problems which I suspect may use bADJ, but I only took a cursory look at it some time ago.<br />
   Someone (not me) should work out the theory of standard errors for the Adjusted OLS and TLS estimators.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on GHCN and Adjustment Trends by Layman Lurker</title>
		<link>http://statpad.wordpress.com/2009/12/12/ghcn-and-adjustment-trends/#comment-1017</link>
		<dc:creator><![CDATA[Layman Lurker]]></dc:creator>
		<pubDate>Tue, 27 Dec 2011 18:45:40 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=234#comment-1017</guid>
		<description><![CDATA[http://rankexploits.com/musings/2011/climategate-investigation-tallblokegreg-laden-laframboise/#comment-87938]]></description>
		<content:encoded><![CDATA[<p><a href="http://rankexploits.com/musings/2011/climategate-investigation-tallblokegreg-laden-laframboise/#comment-87938" rel="nofollow">http://rankexploits.com/musings/2011/climategate-investigation-tallblokegreg-laden-laframboise/#comment-87938</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on GHCN and Adjustment Trends by Layman Lurker</title>
		<link>http://statpad.wordpress.com/2009/12/12/ghcn-and-adjustment-trends/#comment-1016</link>
		<dc:creator><![CDATA[Layman Lurker]]></dc:creator>
		<pubDate>Tue, 27 Dec 2011 18:41:44 +0000</pubDate>
		<guid isPermaLink="false">http://statpad.wordpress.com/?p=234#comment-1016</guid>
		<description><![CDATA[Roman, I just left this comment at Lucia&#039;s citing your &quot;Mean Annual GHCN Adjustments&quot; graph from this post. It falls into the category of &quot;things that make you say hmmm&quot;.]]></description>
		<content:encoded><![CDATA[<p>Roman, I just left this comment at Lucia&#8217;s citing your &#8220;Mean Annual GHCN Adjustments&#8221; graph from this post. It falls into the category of &#8220;things that make you say hmmm&#8221;.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

