[Gretl-devel] Gretl and PUMS Data
Riccardo (Jack) Lucchetti
r.lucchetti at univpm.it
Thu Oct 25 13:17:07 EDT 2007
On Thu, 25 Oct 2007, Allin Cottrell wrote:
> When gretl encounters non-numeric data for a particular variable
> in a CSV import it treats the values of that variable as strings,
> constructs a numeric coding, and creates a "string table" that
> presents the coding to the user. BUT this is done only if
> non-numeric data are encountered in the first data row for the
> variable in question. That is, if we read (apparently) numeric
> data on rows 1 to k-1, then encounter non-numeric data on row k,
> we flag an error and stop reading.
>
> The trouble is that some of the PUMS variables are codings, some
> but not all values of which contain non-numeric characters. For
> example, NAICSP, the "NAICS Industry Code", which has values
> (among others) of 1133 and 113M.
>
> Here's a solution, perhaps not permanent if we can think of
> something better: I've added a new parameter to the "set" command,
> namely "codevars". You can do, for example,
[...]
The problem I see with this approach is that one has to know in advance
which variables must be treated specially. With large datasets, you may
not know; the improved debugging info does help, but IMO only to an
extent. A possible alternative may be the following: first, read all the
data as if they were all strings. Then, with the data already in RAM,
convert each variable to numeric whenever all of its values parse as
numbers. This way, you read the datafile only once, and the way stays
open if we want, for instance, to flag some of the variables as dummies
or discrete variables straight away.
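
To make the idea concrete, here is a rough sketch in Python (not gretl's
actual C import code). The function names read_as_strings and
convert_columns and the column name ID are made up for illustration; only
NAICSP and its values 1133 and 113M come from the example above. Missing
values are not handled, and Python's float() also accepts things like
"nan" and "inf", which a real importer would probably want to reject.

```python
import csv
import io


def read_as_strings(text):
    """Single pass over the CSV text: keep every cell as a string."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            columns[name].append(cell.strip())
    return columns


def to_float(cell):
    """Return the cell as a float, or None if it is not numeric."""
    try:
        return float(cell)
    except ValueError:
        return None


def convert_columns(columns):
    """Convert each column to numeric if every value parses as a number.

    Otherwise, code the distinct strings as 1, 2, 3, ... in order of
    first appearance and record the coding in a string table.
    """
    data = {}
    string_tables = {}
    for name, cells in columns.items():
        values = [to_float(c) for c in cells]
        if all(v is not None for v in values):
            data[name] = values
        else:
            table = {}
            data[name] = [table.setdefault(c, len(table) + 1) for c in cells]
            string_tables[name] = table
    return data, string_tables


# Illustrative input only: ID is a hypothetical column.
sample = "ID,NAICSP\n1,1133\n2,113M\n3,1133\n"
data, string_tables = convert_columns(read_as_strings(sample))
```

Note that the decision for NAICSP is taken only after the whole column is
in memory, so it does not matter that the first non-numeric value shows
up on the second data row rather than the first.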
What do you think?
Riccardo (Jack) Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche
r.lucchetti at univpm.it
http://www.econ.univpm.it/lucchetti