On Sat, 12 Nov 2016, Sven Schreiber wrote:

> On 12.11.2016 at 21:56, Allin Cottrell wrote:
>> On Sat, 12 Nov 2016, Sven Schreiber wrote:
>>> While we're on the subject of the condition number and collinearity, I
>>> have a question about the following example: Open the example data
>>> hall.gdt and select the two variables "consrat" and "ewr"; the
>>> correlation output (again, right-click and then select from the
>>> context menu) then shows a corr coeff of just 0.16. The new
>>> collinearity analysis gives a whopping 634, an order of magnitude
>>> greater than the rule-of-thumb value 50. The bkw.gfn package confirms
>>> this value.
>>> This strikes me as qualitatively very different, and spontaneously I'm
>>> not sure why that is so. Any ideas?
>> The Pearson correlation coefficient is undefined if one or both of the
>> terms are constants. However, the Belsley condition number can handle a
>> constant, and presumably the big condition number in the example you
>> describe (but note, only when you include a constant) is due to the fact
>> that consrat itself is almost a constant.
> That's not quite right, consrat has a distribution not too far from a normal, 
> actually. What's true is that the scaling/range is quite different. But you 
> can take 10*consrat and the series then are more comparable (apart from the 
> mean). Which affects neither the correlation coefficient nor the condition 
> number, however, so I'm inclined to say that my question still stands.

Hmm, I guess so. Maybe worth noting that Belsley's condition number 
calculation involves scaling the columns but not centering them. So 
try the following:

open hall.gdt
series c10 = consrat * 10
series cc = consrat - mean(consrat)
eval cnumber({const, ewr, consrat})
eval cnumber({const, ewr, c10})
eval cnumber({const, ewr, cc})

Multiplying consrat by 10 makes no difference to the condition 
number, as you say. But centering it reduces the value a great deal. 
(With correlation, of course, neither scaling nor centering makes 
any difference.)
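
To see the mechanism in isolation, here is a minimal Python sketch (not 
gretl code, and not the hall.gdt data: the series below is made up). It 
assumes the simplest case of a constant plus one regressor, with each 
column scaled to unit length and not centered. The helper name 
cond_with_const is hypothetical. In this two-column case the scaled 
cross-product matrix is [[1, c], [c, 1]], where c is the cosine of the 
angle between the constant and the series, so the condition number is 
sqrt((1 + |c|) / (1 - |c|)).

```python
import math

def cond_with_const(x):
    """Condition number of [const, x] after scaling each column to
    unit length, without centering (two-column case only)."""
    n = len(x)
    # Cosine of the angle between the column of ones and x.
    c = sum(x) / math.sqrt(n * sum(v * v for v in x))
    # The scaled X'X is [[1, c], [c, 1]], with eigenvalues 1 +/- |c|;
    # the singular values of the scaled X are their square roots.
    return math.sqrt((1 + abs(c)) / (1 - abs(c)))

# Made-up series whose mean is large relative to its spread.
x = [0.95, 0.97, 0.99, 1.01, 1.03, 1.05]
m = sum(x) / len(x)
x10 = [10 * v for v in x]   # rescaled
xc = [v - m for v in x]     # centered

print("original:", cond_with_const(x))
print("times 10:", cond_with_const(x10))
print("centered:", cond_with_const(xc))
```

Rescaling leaves c unchanged, so the first two values agree. Centering 
makes the series orthogonal to the constant, so c is essentially zero 
and the condition number drops to about 1. A series whose mean is far 
from zero relative to its variation is nearly collinear with the 
constant in this uncentered sense, even if it is far from being 
constant in the usual sense.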


