[Gretl-devel] slowness with panel sample restrictions (?) or pxnobs() function (?)

Allin Cottrell cottrell at wfu.edu
Thu Aug 11 21:16:30 EDT 2016

On Sun, 7 Aug 2016, Sven Schreiber wrote:

> Am 07.08.2016 um 11:44 schrieb Riccardo (Jack) Lucchetti:
>> On Sat, 6 Aug 2016, Sven Schreiber wrote:
>>> hi,
>>> in relation to the feature request about cross-section dependence
>>> testing in panels I've encountered some performance bottleneck.
>> The "smpl" command is rather CPU-expensive. Here's a different solution
>> (which can be further optimised a little, but I'm sure you get the idea):
> OK, thanks for the great suggestions. Indeed my own previous and faster code 
> had only N=140 sample restrictions, whereas the later version had (N-1)/2 
> more, so almost 70-fold, which then could explain going from "instantaneous" 
> to several seconds.
> It's a pity though, because my "elegant" version is much more readable 
> IMHO...

Readable, yes, I agree. But it involves about 39 thousand resizings 
of a dataset containing 1260 observations on 11 series, so I'm not 
surprised that it takes some time.

Jack's version avoids resizing the entire dataset and thereby is an 
order of magnitude faster. But this is a case where a native C 
version of the calculation is a good deal faster still.

It's not yet documented, and it's not yet represented in the GUI, 
but there's now an --xdepend option to the "modtest" command in git 
(for panel-data models only). It computes the Pesaran CD statistic, 
and is about 2.5 orders of magnitude faster than Jack's (optimal, I 
would say) hansl code.

set echo off
set messages off

function scalar SvensCD (series u)
   series unit = $obsmajor
   N = max(unit)

   rhos = zeros(N,N)
   Tcomms = zeros(N,N)
   loop i=1..N -q
     loop j=(i+1)..N -q
       smpl (unit == i || unit == j) --restrict --replace
       smpl pxnobs(u) == 2 --restrict
       matrix uij = {u}
       Tcomms[i,j] = rows(uij) / 2
       rhos[i,j] = corr(uij[1: Tcomms[i,j]], uij[Tcomms[i,j] + 1 : ])

   # Pesaran's CD
   scalar CD =  sqrt(2 / (N*(N-1))) * sum(sqrt(Tcomms) .* rhos)
   return CD
end function

function scalar corr_na(matrix x, matrix y)
   e = ok(x) && ok(y)
   H = selifr(x ~ y, e)
   return corr(H[,1], H[,2])
end function

function scalar JacksCD (series u)
   series unit = $obsmajor
   N = max($obsmajor)
   T = max(time)

   set skip_missing off
   set warnings off
   mu = mshape({u}, T, N)
   OK = ok(mu)
   matrix Tcomms = OK'OK
   matrix rhos = zeros(N,N)
   loop i = 1..N --quiet
     x = mu[,i]
     loop j = i+1..N --quiet
       rhos[i,j] = corr_na(x, mu[,j])

   scalar CD = sqrt(2 / (N*(N-1))) * sum(sqrt(Tcomms) .* rhos)
   return CD
end function

open abdata.gdt --quiet

ols n 0 --quiet
series u = $uhat

set stopwatch
eval SvensCD(u)
t0 = $stopwatch

eval JacksCD(u)
t1 = $stopwatch

modtest --xdepend
t2 = $stopwatch

print t0 t1 t2

Output here:

Read datafile /opt/esl/share/gretl/data/misc/abdata.gdt
Pesaran test for cross-sectional dependence
Test statistic: z = 96.788106,
with p-value = P(z) > 96.7881) = 0

              t0 =  1.6515259

              t1 =  0.20180730

              t2 =  0.00053288300


More information about the Gretl-devel mailing list