[Gretl-users] Heckit

Gordon Hughes G.A.Hughes at ed.ac.uk
Mon May 28 05:00:20 EDT 2007


I believe that the Stata option - i.e. consistency with the ML 
estimator - should be the default option because (a) it is then clear 
what is going on, and (b) it is possible to compare the results for 
the two step estimator with the ML estimator.  (I have a general 
preference for the ML estimator because the two step estimator often 
seems to generate spuriously low standard errors on the 
coefficients.)  Further, the user should be discouraged from using 
the two step estimator with non-matching missing data for the reason 
that you give.

There is, however, an important secondary consideration.  Certain types of 
survey design can generate systematic patterns of missing data - this 
would include the answers to questions that are only asked if someone 
is in the labour force or bought certain goods in the last month.  In 
such cases, exclusion of all observations with missing data can 
seriously compromise the possibility of estimating a model reliably 
if the pattern of questions asked/answers is correlated with the 
selection probability.

There is an alternative way of analysing such data.  It is 
straightforward to let a user construct their own two step estimator 
with different missing data as follows: (a) estimate the probit model 
for selection and generate the Mills ratio as a new variable; then 
separately (b) estimate the OLS equation including the Mills ratio as 
an additional regressor.  The corrections to the standard errors are 
not so difficult and 
anyone following this route explicitly should know what they are 
doing and can be warned in the documentation.  All that is needed is 
an option in the probit model to generate the Mills ratio as a 
post-estimation variable.  I haven't checked whether it is there 
already but it could easily be added.
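
To make the two steps concrete, here is a minimal sketch in plain Python 
(standard library only, not gretl code).  Everything in it is an assumption 
made for illustration: the function names, the row layout with keys d, z, x 
and y, the use of None for a missing value, and the simulated data.  The 
probit is fitted by Fisher scoring on every row with complete selection 
data; the OLS step then uses only the selected rows with complete outcome 
data, so the two samples can differ.  The coefficient standard errors are 
not computed, since the naive OLS ones would need the correction mentioned 
above.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return min(max(p, 1e-10), 1.0 - 1e-10)  # clamp to avoid dividing by zero

def inverse_mills(z):
    """Inverse Mills ratio phi(z)/Phi(z) for the selected observations."""
    return norm_pdf(z) / norm_cdf(z)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)] for a in range(k)]
    c = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    return solve(A, c)

def probit(Z, d, iters=50, tol=1e-10):
    """Probit coefficients by Fisher scoring, starting from zero."""
    k = len(Z[0])
    g = [0.0] * k
    for _ in range(iters):
        A = [[0.0] * k for _ in range(k)]
        s = [0.0] * k
        for zi, di in zip(Z, d):
            idx = sum(a * b for a, b in zip(zi, g))
            pdf, cdf = norm_pdf(idx), norm_cdf(idx)
            denom = cdf * (1.0 - cdf)
            for a in range(k):
                s[a] += (di - cdf) * pdf / denom * zi[a]      # score
                for b in range(k):
                    A[a][b] += pdf * pdf / denom * zi[a] * zi[b]  # information
        step = solve(A, s)
        g = [gi + si for gi, si in zip(g, step)]
        if max(abs(si) for si in step) < tol:
            break
    return g

def complete(*vals):
    return all(v is not None for v in vals)

def two_step(rows):
    # Step (a): probit on every row whose selection data are complete.
    sel = [r for r in rows if complete(r["d"], *r["z"])]
    gamma = probit([[1.0] + r["z"] for r in sel], [r["d"] for r in sel])
    # Step (b): OLS on selected rows whose outcome data are complete,
    # with the Mills ratio added as an extra regressor (last coefficient).
    out = [r for r in sel if r["d"] == 1 and complete(r["y"], *r["x"])]
    X = []
    for r in out:
        idx = gamma[0] + sum(g * z for g, z in zip(gamma[1:], r["z"]))
        X.append([1.0] + r["x"] + [inverse_mills(idx)])
    beta = ols(X, [r["y"] for r in out])
    return gamma, beta

def simulate(n, seed=0):
    """Hypothetical data: y seen only if selected; x missing on ~10% of rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z, x, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        e = 0.5 * u + rng.gauss(0, 1)          # errors correlated through u
        d = 1 if 0.3 + z + u > 0 else 0
        y = 1.0 + 2.0 * x + e if d == 1 else None
        x_obs = None if rng.random() < 0.1 else x
        rows.append({"d": d, "z": [z], "x": [x_obs], "y": y})
    return rows

gamma, beta = two_step(simulate(2000, seed=0))
print("probit coefficients:", gamma)
print("OLS coefficients (const, x, Mills ratio):", beta)
```

In this sketch the probit sample is all 2000 rows, while the OLS sample 
drops both the unselected rows and the selected rows with x missing, which 
is the "different missing data" case described above.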

Gordon Hughes   


