## Vector Autoregressions and Causality

by
Hiro Y. Toda, Peter C. B. Phillips

Year:

1993

Publication:

Econometrica

Volume:

61

Issue:

6

Start Page:

1367

End Page:

1393

Language:

English

**Updated:** October 29th, 2012

Abstract:

Econometrica, Vol. 61, No. 6 (November, 1993), 1367-1393

VECTOR AUTOREGRESSIONS AND CAUSALITY

This paper develops a limit theory for Wald tests of Granger causality in levels vector autoregressions (VAR's) and Johansen-type error correction models (ECM's), allowing for the presence of stochastic trends and cointegration. Earlier work by Sims, Stock, and Watson (1990) on trivariate VAR systems is extended to the general case, thereby formally characterizing the circumstances when these Wald tests are asymptotically valid as $\chi^2$ criteria. Our results for inference from unrestricted levels VAR are not encouraging. We show that without explicit information on the number of unit roots in the system and the rank of certain submatrices in the cointegration space it is impossible to determine the appropriate limit theory in advance; and, even when such information is available, the limit theory often involves both nuisance parameters and nonstandard distributions, a situation where there is no satisfactory statistical basis for mounting these tests.

The situation with regard to the use of causality tests in ECM's is also complex but more encouraging. Granger causality tests in ECM's also suffer from nuisance parameter dependencies asymptotically and, in some cases that we make explicit, nonstandard limit theory. Both these results are somewhat surprising in the light of earlier research on the validity of asymptotic $\chi^2$ criteria in such systems. In spite of these difficulties, Johansen-type ECM's do offer a sound basis for empirical testing of the rank of the cointegration space and the rank of key submatrices that influence the asymptotics.

KEYWORDS: Error correction model, exogeneity, Granger causality, maximum likelihood, nonstandard limit theory, nuisance parameters, vector autoregression, Wald test.

1. INTRODUCTION

In their analysis of causality tests, SSW look specifically at trivariate systems and conclude that the Wald test has a limiting chi-squared distribution if the time series are cointegrated and if the long run relationship involves the variable that is excluded under the null hypothesis (SSW, p. 135, paragraph 3 and footnote 3). Given the important empirical role of causality tests in levels VAR's, it seems reasonable to us to ask the following questions: to what extent are the conclusions of SSW generally valid; what form do the qualifications on the nature of cointegration that apply in trivariate systems take in the general case; are there other special cases of interest that are worthy of mention?

¹ We thank D. W. K. Andrews and a co-editor for helpful comments on earlier drafts. Our thanks also go to Glena Ames for her skill in keyboarding the manuscript and to the NSF for research support under SES 8821180.

One object of the present paper is to explicitly address these questions. We extend the treatment in SSW of causality tests to the case of general VAR systems with an arbitrary number of cointegrating vectors. In particular, we are able to characterize those special cases where the limit theory is indeed $\chi^2$. We also provide a breakdown of the general case where the limit theory has $\chi^2$ and nonstandard components and may depend on nuisance parameters. We point to other special cases where the limit theory has nonstandard components but is free of nuisance parameters. We show that without explicit information about the number of unit roots in the system and the rank of certain submatrices in the cointegration space it is impossible to determine the appropriate limit theory in advance. Such information is typically unavailable a priori in much empirical work, more especially in empirical work conducted with VAR's where the emphasis is on unrestricted estimation unfettered by prior identifying information (Sims (1980)). But, even if this information is available, the limit theory in VAR estimation will still often involve nuisance parameters and then no satisfactory basis for mounting a statistical test of causality applies.

A second object of the present paper is to develop an asymptotic theory for causality tests in error correction models (ECM's) estimated by maximum likelihood. In keeping with our earlier theme of VAR estimation, we propose an asymptotic theory for Wald tests of causality in ECM's formulated as VAR's in differences with levels as additional regressor variables. Our framework is the same as Johansen (1988, 1991) and therefore has the advantage that pretests can be performed relating to key elements that drive the asymptotics, such as the dimension of the cointegration space and the rank of certain submatrices of the cointegrating matrix. In general, tests for causality in ECM's also suffer from nuisance parameter dependencies asymptotically. Moreover, in certain important cases, the limit theory of Wald tests for causality is also nonstandard and can be characterized in terms of nonlinear functions of $\chi^2$ variates. Both these results may seem surprising given the assumed general validity of $\chi^2$ asymptotics for Wald tests in such models. However, the situation is not as severe as it is in levels VAR estimation. In important subcases (where either the loading coefficient submatrices or the submatrices of cointegrating coefficients that are relevant under the null are of full rank) it is shown that the limit theory of Wald tests for causality is $\chi^2$.

The plan of the paper is as follows. Section 2 details the models we shall consider, our notation, and some theoretical preliminaries. Section 3 studies Wald tests for causality in levels VAR estimation and Section 4 extends this analysis to Johansen-type ECM's estimated by maximum likelihood under Gaussian assumptions. Section 5 concludes the paper and an Appendix contains many of the mathematical derivations.

A summary word on notation. We use vec$(A)$ to stack the rows of a matrix $A$ into a column vector, $[x]$ to denote the largest integer $\le x$, and $R(A)$ and $R(A)^{\perp}$ to signify the range space and its orthogonal complement, respectively, of a matrix $A$. We use the symbols "$\to_d$," "$\to_p$," and "$\equiv$" to signify convergence in distribution, convergence in probability, and equality in distribution, respectively. The inequality "$> 0$" denotes positive definite (p.d.) when applied to matrices. We use $I(d)$ to signify a time series that is integrated of order $d$, and $BM(\Omega)$ to denote a vector Brownian motion with covariance matrix $\Omega$. We write integrals with respect to Lebesgue measure such as $\int_0^1 B(s)\,ds$ more simply as $\int B$ to achieve notational economy. (All integrals in this paper are from 0 to 1.) Finally, all limits given in this paper are taken as the sample size $T$ tends to $\infty$.
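Because vec stacks rows rather than columns here, the familiar Kronecker rule takes the form vec$(ABC) = (A \otimes C')\,$vec$(B)$, which is what links the two ways of writing the restriction in (10). The following is a minimal NumPy sketch of that identity; the helper name `vec_rows` and the random test matrices are illustrative assumptions, not part of the paper.

```python
import numpy as np


def vec_rows(a):
    """Stack the rows of a matrix into one vector (row-major order)."""
    return np.asarray(a).reshape(-1)


# Arbitrary conformable matrices, used only to check the identity numerically.
rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3))
b = rng.standard_normal((3, 4))
c = rng.standard_normal((4, 5))

# Row-stacking version of the Kronecker rule: vec(ABC) = (A kron C') vec(B).
lhs = vec_rows(a @ b @ c)
rhs = np.kron(a, c.T) @ vec_rows(b)
identity_holds = bool(np.allclose(lhs, rhs))
```

With column stacking the rule would instead read $(C' \otimes A)\,$vec$(B)$, so the order of the Kronecker factors in the text depends on this convention.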

2. THE MODEL, NOTATION, AND OTHER PRELIMINARIES

Consider the $n$-vector time series $\{y_t\}$ generated by the $k$th order VAR model

$$y_t = J(L)y_{t-1} + u_t, \tag{1}$$

where $J(L) = \sum_{i=1}^{k} J_iL^{i-1}$ and $\{u_t\}$ is an iid sequence of $n$ dimensional random vectors with mean zero and covariance matrix $\Sigma_u > 0$, such that $E|u_{it}|^{2+\delta} < \infty$ for some $\delta > 0$ $(i = 1, \ldots, n)$. We shall initialize (1) at $t = -k+1, \ldots, 0$. Since the initial values $\{y_0, y_{-1}, \ldots, y_{-k+1}\}$ do not affect asymptotics, we can let them be any random vectors including constants. But we will give them a certain distribution as specified below to facilitate our analysis. In setting up a likelihood function for data generated from (1) it is, of course, most convenient to take the initial values to be constant, as in Johansen (1988, 1991).

We assume that model (1) does not have a constant term, $\mu$, say, since this simplifies considerably the presentation and derivation of our results. Of course, the basic idea in the derivation of the asymptotics is the same whether $\mu = 0$ or $\mu \ne 0$, and since the effects of deterministic trends have been investigated rather fully in the recent literature on regression with integrated processes, it is easy to see how the asymptotic distributions derived in the next two sections should be modified if $\mu \ne 0$.

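We can write (1) in the equivalent ECM format

$$\Delta y_t = J^*(L)\Delta y_{t-1} + J^*y_{t-1} + u_t, \tag{2}$$

where $J^* = J(1) - I_n$, and

$$J^*(L) = \sum_{i=1}^{k-1} J_i^*L^{i-1} \quad\text{with}\quad J_i^* = -\sum_{l=i+1}^{k} J_l \quad (i = 1, \ldots, k-1).$$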
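The mapping from the levels coefficients $J_1, \ldots, J_k$ to the ECM coefficients can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `var_to_ecm`, the random coefficient matrices, and the lag values are all hypothetical, and the error term is omitted because it is the same in both formats.

```python
import numpy as np


def var_to_ecm(j_mats):
    """Map levels-VAR matrices [J_1, ..., J_k] to (J*, [J*_1, ..., J*_{k-1}])."""
    j_mats = [np.asarray(j, dtype=float) for j in j_mats]
    n = j_mats[0].shape[0]
    k = len(j_mats)
    j_star = sum(j_mats) - np.eye(n)  # J* = J(1) - I_n
    # J*_i = -(J_{i+1} + ... + J_k) for i = 1, ..., k-1
    j_star_lags = [-sum(j_mats[i:], np.zeros((n, n))) for i in range(1, k)]
    return j_star, j_star_lags


rng = np.random.default_rng(1)
n, k = 3, 3
j_mats = [0.2 * rng.standard_normal((n, n)) for _ in range(k)]
j_star, j_star_lags = var_to_ecm(j_mats)

# y_lags[i] plays the role of y_{t-1-i}, for i = 0, ..., k-1 (arbitrary values).
y_lags = rng.standard_normal((k, n))

# Systematic part of y_t from the levels VAR: sum_i J_i y_{t-i}.
var_rhs = sum(j @ y for j, y in zip(j_mats, y_lags))

# Systematic part of y_t from the ECM: y_{t-1} + J* y_{t-1} + sum_i J*_i dy_{t-i}.
ecm_rhs = y_lags[0] + j_star @ y_lags[0] + sum(
    js @ (y_lags[i] - y_lags[i + 1]) for i, js in enumerate(j_star_lags)
)
formats_agree = bool(np.allclose(var_rhs, ecm_rhs))
```

The two right-hand sides coincide for any lag values, which is the sense in which the two formats are equivalent.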

We assume that:

(3a) $|I_n - J(L)L| = 0$ implies $|L| > 1$ or $L = 1$.

(3b) $J^* = \Gamma A'$, where $\Gamma$ and $A$ are $n \times r$ matrices of full column rank $r$, $0 \le r \le n-1$.

(3c) $\Gamma_\perp'(J^*(1) - I_n)A_\perp$ is nonsingular,

where $\Gamma_\perp$ and $A_\perp$ are $n \times (n-r)$ matrices of full column rank such that $\Gamma_\perp'\Gamma = 0 = A_\perp'A$. (If $r = 0$, then we take $\Gamma_\perp = A_\perp = I_n$.)

Under the above conditions $\{y_t\}$ is I(1), and is cointegrated with $r$ cointegrating vectors if $r \ge 1$. Condition (3a) precludes explosive processes but allows for the model (1) to have some unit roots. Condition (3b) defines the cointegration space to be of rank $r$ and $A$ is a matrix whose columns span this space. Condition (3c) ensures that the Granger representation theorem applies, so that $\Delta y_t$ is stationary with a Wold representation, $A'y_t$ is stationary, and $y_t$ is an I(1) process.
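Condition (3a) can be checked for a given set of coefficient matrices through the companion matrix, whose eigenvalues are the reciprocals of the roots of $|I_n - J(L)L| = 0$. It therefore requires every eigenvalue to lie strictly inside the unit circle or to equal one. The sketch below is an illustration only: the function names, the tolerance, and the example matrices are assumptions rather than anything specified in the paper.

```python
import numpy as np


def companion(j_mats):
    """Companion matrix of a VAR(k) with coefficient matrices [J_1, ..., J_k]."""
    n = j_mats[0].shape[0]
    k = len(j_mats)
    top = np.hstack(j_mats)
    if k == 1:
        return top
    bottom = np.hstack([np.eye(n * (k - 1)), np.zeros((n * (k - 1), n))])
    return np.vstack([top, bottom])


def satisfies_3a(j_mats, tol=1e-6):
    """True if each companion eigenvalue is inside the unit circle or equals 1."""
    eig = np.linalg.eigvals(companion([np.asarray(j, dtype=float) for j in j_mats]))
    inside = np.abs(eig) < 1 - tol
    unit_root = np.abs(eig - 1) <= tol
    return bool(np.all(inside | unit_root))


random_walks = [np.eye(2)]                         # two unit roots, no cointegration
cointegrated = [np.array([[0.0, 1.0], [0.0, 1.0]])]  # y1_t = y2_{t-1} + u1_t, y2 a random walk
explosive = [1.1 * np.eye(2)]                      # root inside the unit circle
negative_unit_root = [-np.eye(2)]                  # root at L = -1
```

Here `satisfies_3a` accepts the first two systems and rejects the last two. A numerical tolerance is unavoidable, and eigenvalues of defective matrices with repeated unit roots can be computed less accurately than in these simple examples.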

Suppose that we are interested in whether the first $n_1$ elements of $y_t$ are "caused by" the last $n_3$ elements of this vector. Write

$$y_t = (y_{1t}', y_{2t}', y_{3t}')', \qquad y_{it} \text{ of dimension } n_i \ (i = 1, 2, 3),$$

and partition $J(L)$ conformably with $y_t$. Then the null hypothesis of noncausality can be formulated based on the model (1) as

$$\mathcal{H}_0: J_{1,13} = \cdots = J_{k,13} = 0, \tag{4}$$

where $J_{13}(L) = \sum_{i=1}^{k} J_{i,13}L^{i-1}$ is the $n_1 \times n_3$ upper-right submatrix of $J(L)$. Equivalently, the noncausality null can be formulated as

$$\mathcal{H}_0^*: J_{1,13}^* = \cdots = J_{k-1,13}^* = 0 \quad\text{and}\quad J_{13}^* = 0 \tag{5}$$

based on the ECM format (2), where $J_{13}^*(L) = \sum_{i=1}^{k-1} J_{i,13}^*L^{i-1}$ and $J_{13}^*$ are the $n_1 \times n_3$ upper-right submatrices of $J^*(L)$ and $J^*$, respectively.

To prepare for the analysis in the following sections we introduce some further notation and a couple of lemmas. Define $x_t = (y_{t-1}', \ldots, y_{t-k}')'$ and we can write (1) as

$$y_t = \Phi x_t + u_t, \tag{6}$$

where $\Phi = [J_1, \ldots, J_k]$. Define an $nk \times nk$ nonsingular matrix $H = [H_1, H_2]$ with

$$H_1 = [D \otimes I_n,\ e_k \otimes A]$$

and

$$H_2 = e_k \otimes A_\perp,$$

where $I_n$ is an $n \times n$ identity matrix, $e_k$ is a $k$-dimensional unit vector, i.e., $(1, 0, \ldots, 0)'$, and $D$ is a $k \times (k-1)$ matrix such that

$$D = \begin{bmatrix} 1 & & \\ -1 & \ddots & \\ & \ddots & 1 \\ & & -1 \end{bmatrix}.$$

Then define $z_t' = (z_{1t}', z_{2t}') = (H'x_t)'$ where

$$z_{1t} = H_1'x_t = (\Delta y_{t-1}', \ldots, \Delta y_{t-k+1}', (A'y_{t-1})')',$$

which is an $m_1 = n(k-1) + r$ dimensional vector, and

$$z_{2t} = H_2'x_t = A_\perp'y_{t-1},$$

which is an $m_2 = n - r$ dimensional vector. These variates $z_{1t}$ and $z_{2t}$ are the basic components that will appear in the large sample asymptotics developed in the next two sections. With this notation we now specify the distribution of the initial values $x_1 = (y_0', \ldots, y_{-k+1}')'$ of system (1) in a convenient way for our analysis. Note that if $z_1 = (z_{11}', z_{21}')'$ is specified, so is $x_1$ through $x_1 = H'^{-1}z_1$. Our initialization assigns to $z_{11}$ the stationary distribution of the process defined by (A1) in the Appendix, while the component $z_{21}$ can be any random vector.

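We can write (2) as

$$\Delta y_t = \Psi z_{1t} + u_t, \tag{7}$$

where $\Psi = [J_1^*, \ldots, J_{k-1}^*, \Gamma]$. Furthermore, since $\Delta y_t$ is I(0), we have the Wold representation

$$\Delta y_t = C(L)u_t \quad\text{where}\quad C(L) = \sum_{i=0}^{\infty} C_iL^i \ \text{ with } C_0 = I_n. \tag{8}$$

(See (A6) in the Appendix for the explicit form of $C(L)$.) Now write $w_t' = (u_t', z_{1t}', \Delta z_{2t}')$ and define for any $t$

$$\Sigma = E(w_tw_t'), \qquad \Lambda = \sum_{j=1}^{\infty} E(w_tw_{t+j}'),$$

and

$$\Omega = \Sigma + \Lambda + \Lambda'.$$

We partition $\Omega$, $\Sigma$, and $\Lambda$ conformably with $w_t$. For instance, we write

$$\Omega = \begin{bmatrix} \Omega_0 & \Omega_{01} & \Omega_{02} \\ \Omega_{10} & \Omega_1 & \Omega_{12} \\ \Omega_{20} & \Omega_{21} & \Omega_2 \end{bmatrix}$$

with indices "0", "1", and "2" corresponding to the components of $w_t$. Then we have the following lemma.

LEMMA 1: (i)

where $B(s) = (B_0(s)', B_1(s)', B_2(s)')'$ is an $(n + m_1 + m_2)$-vector Brownian motion with covariance matrix $\Omega$, $\xi$ is an $nm_1$-dimensional normal random vector with mean zero and covariance matrix $\Sigma_1 \otimes \Sigma_u$, and $B(s)$ and $\xi$ are independent.

(ii) $B_2(s) = A_\perp'C(1)B_0(s)$.

(iii) $\Omega_0 = \Sigma_0 = \Sigma_u$, $\Sigma_1$ and $\Omega_2$ are positive definite, and $\Lambda_{10} = \Sigma_{10} = 0$.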

The next lemma follows from Lemma 1 above and Lemma 2.1 of Park and Phillips (1989).

LEMMA 2: (i)(b) $T^{-1/2}\sum_{t=1}^{T} z_{1t}u_t' \to_d N_0$, where vec$(N_0) = \xi$.

(iii) $T^{-2}\sum_{t=1}^{T} z_{2t}z_{2t}' \to_d \int B_2B_2'$.

Joint convergence of all the above also applies.

Now we are ready to analyze the asymptotic distribution of the Wald statistic for testing the hypothesis (4) (or (5)).

3. CAUSALITY TESTS BASED ON LEVELS VAR ESTIMATION

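Suppose we estimate the levels VAR model (1) by OLS. The estimated equation is

$$y_t = \hat{\Phi}x_t + \hat{u}_t \qquad (t = 1, \ldots, T), \tag{9}$$

where in this section "^" signifies estimation by OLS. The noncausality hypothesis (4) can be written as

$$\mathcal{H}_0: S_1'\Phi S = 0 \quad\text{or}\quad (S_1' \otimes S')\,\mathrm{vec}(\Phi) = 0, \tag{10}$$

where

$$S = I_k \otimes S_3, \qquad S_1 = \begin{bmatrix} I_{n_1} \\ 0 \end{bmatrix}, \qquad S_3 = \begin{bmatrix} 0 \\ I_{n_3} \end{bmatrix},$$

with $S_1$ of dimension $n \times n_1$ and $S_3$ of dimension $n \times n_3$. Then the Wald statistic for testing (10) can be written

$$F_{LS} = \mathrm{vec}(\hat{\Phi})'(S_1 \otimes S)\left[(S_1' \otimes S')\{\hat{\Sigma}_u \otimes (X'X)^{-1}\}(S_1 \otimes S)\right]^{-1}(S_1' \otimes S')\,\mathrm{vec}(\hat{\Phi})$$

$$= \mathrm{tr}\left[S_1'\hat{\Phi}S\,[S'(X'X)^{-1}S]^{-1}S'\hat{\Phi}'S_1(S_1'\hat{\Sigma}_uS_1)^{-1}\right],$$

where $\hat{\Sigma}_u$ is the OLS estimator of $\Sigma_u$ and $X' = (x_1, \ldots, x_T)$. Under the null hypothesis $S_1'\hat{\Phi}S = S_1'U'X(X'X)^{-1}S$ where $U' = (u_1, \ldots, u_T)$, and we therefore have

$$F_{LS} = \mathrm{tr}\left[S_1'U'Z(Z'Z)^{-1}R'\,[R(Z'Z)^{-1}R']^{-1}R(Z'Z)^{-1}Z'US_1(S_1'\hat{\Sigma}_uS_1)^{-1}\right],$$

where $R = S'H$ and $Z' = (z_1, \ldots, z_T)$. We write $R = [R_1, R_2]$ with

$$R_1 = S'H_1 = [D \otimes S_3',\ e_k \otimes A_3]$$

and

$$R_2 = S'H_2 = e_k \otimes A_{\perp 3},$$

where $A_3$ and $A_{\perp 3}$ are the last $n_3$ rows of $A$ and $A_\perp$, respectively.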
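The trace form of the Wald statistic for (10) is straightforward to compute from data. The following is a minimal NumPy sketch, not the authors' code: the function name `levels_var_wald`, the choice of divisor in the residual covariance estimate, and the simulated random-walk data are all assumptions made for illustration.

```python
import numpy as np


def levels_var_wald(y, k, n1, n3):
    """Wald statistic for 'the last n3 variables do not cause the first n1' in a levels VAR(k)."""
    y = np.asarray(y, dtype=float)
    t_total, n = y.shape
    # Rows of x are x_t' = (y_{t-1}', ..., y_{t-k}'); rows of y_dep are y_t'.
    x = np.hstack([y[k - i:t_total - i] for i in range(1, k + 1)])
    y_dep = y[k:]
    xtx_inv = np.linalg.inv(x.T @ x)
    phi_hat = y_dep.T @ x @ xtx_inv          # n x nk matrix [J_1, ..., J_k] estimated by OLS
    resid = y_dep - x @ phi_hat.T
    sigma_u = resid.T @ resid / len(y_dep)   # assumption: divisor equals the number of usable observations
    s1 = np.eye(n)[:, :n1]                   # picks the first n1 equations
    s3 = np.eye(n)[:, n - n3:]               # picks the last n3 variables
    s = np.kron(np.eye(k), s3)               # applies s3 within every lag block
    restricted = s1.T @ phi_hat @ s          # n1 x n3k block that is zero under (4)
    middle = np.linalg.inv(s.T @ xtx_inv @ s)
    weight = np.linalg.inv(s1.T @ sigma_u @ s1)
    return float(np.trace(restricted @ middle @ restricted.T @ weight))


# Illustrative data: three independent random walks (I(1), no cointegration).
rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal((200, 3)), axis=0)
wald_stat = levels_var_wald(y, k=2, n1=1, n3=1)
```

The sketch only computes the statistic. Which reference distribution applies to it is exactly the question the rest of this section addresses, so a $\chi^2$ critical value should not be attached to the output by default.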

LEMMA 3: Let rank$(A_3) = g \le n_3$. Then there exists a nonsingular $n_3k \times n_3k$ matrix $K_T$ such that $K_T'R = \tilde{R}_T\Upsilon_T$ with

where $\Upsilon_T = \mathrm{diag}(\sqrt{T}I_{m_1}, TI_{m_2})$, and $\tilde{R}_{11}$ and $\tilde{R}_{22}$ are $(n_3(k-1)+g) \times m_1$ and $(n_3 - g) \times m_2$ matrices, both of which are of full row rank.

Using Lemma 3 we have

Using Lemma 2 and taking into account the consistency of $\hat{\Sigma}_u$ (see Park and Phillips (1989) for the consistency of the OLS estimator), the continuous mapping theorem gives

where

Note that $F_{LS(1)}$ and $F_{LS(2)}$ are independent because $N_0$ is independent of $(B_0(s)', B_2(s)')'$ by Lemma 1(i). Since

$$\mathrm{vec}(\tilde{R}_{11}\Sigma_1^{-1}N_0S_1) = (\tilde{R}_{11}\Sigma_1^{-1} \otimes S_1')\,\mathrm{vec}(N_0) \equiv N(0,\ \tilde{R}_{11}\Sigma_1^{-1}\tilde{R}_{11}' \otimes S_1'\Sigma_uS_1)$$

by Lemma 2(i)(b), we see that

$$F_{LS(1)} = \mathrm{vec}(\tilde{R}_{11}\Sigma_1^{-1}N_0S_1)'\left[\tilde{R}_{11}\Sigma_1^{-1}\tilde{R}_{11}' \otimes S_1'\Sigma_uS_1\right]^{-1}\mathrm{vec}(\tilde{R}_{11}\Sigma_1^{-1}N_0S_1) \equiv \chi^2_{n_1(n_3(k-1)+g)}.$$

On the other hand, the $F_{LS(2)}$ component has a nonstandard distribution and depends on nuisance parameters in general. There are, however, two special cases that are noteworthy.

First, note that if rank$(A_3) = n_3$, we may take $K_1 = I_{n_3k}$ and there is no $K_2$ in the proof of Lemma 3. Hence $\tilde{R} = (\tilde{R}_{11}, 0)$ with $\tilde{R}_{11}$ of full row rank $n_3k$. Therefore, there is no $F_{LS(2)}$ component in this case. That is, we have the usual chi-square asymptotics.

THEOREM 1: If $y_t$ is cointegrated and rank$(A_3) = n_3$, then under the null hypothesis (4), $F_{LS} \to_d \chi^2_{n_1n_3k}$.

This theorem generalizes the SSW (1990) result from their analysis of trivariate VAR($p$) systems with one cointegrating vector: "if there is a linear combination involving $X_{2t}$ which is stationary," then "the F-test will have an asymptotic $\chi^2_p/p$ distribution" (SSW, 1990, p. 135, paragraph 3 and footnote 3).

Next, we investigate the other extreme case where there is no cointegration. In this case, it will be shown that the limit distribution of the $F_{LS}$ statistic is nonstandard but free of nuisance parameters. If $y_t$ is not cointegrated, there is no $A$ and we may take $A_\perp = I_n$. Hence we have $R_1 = D \otimes S_3'$ and $R_2 = e_k \otimes S_3'$.
Obviously we can set $K_1 = D \otimes I_{n_3}$ and $K_2 = e_k \otimes I_{n_3}$ in the proof of Lemma 3 so that we have $\tilde{R}_{11} = D'D \otimes S_3'$ and $\tilde{R}_{22} = S_3'$. We shall rotate coordinates and cast $F_{LS(2)}$ into a canonical form that eliminates nuisance parameters. Define $S_{1+2}' = (I_{n_1+n_2}, 0)$ and let

and

$$W_1(s) = (S_1'\Sigma_uS_1)^{-1/2}S_1'B_0(s),$$

which is a standard Brownian motion since $\Omega_0 = \Sigma_u$ by Lemma 1(iii). Then we can write $F_{LS(2)}$ as

where

The covariance matrix of the $m_2$-vector Brownian motion $(B_b(s)', B_a(s)')'$ is $\Omega_2$, which is p.d. by Lemma 1(iii). We partition this as

conformably with $(B_b(s)', B_a(s)')'$. Then define

$$B_{a\cdot b}(s) = B_a(s) - \Omega_{ab}\Omega_b^{-1}B_b(s),$$

which is a Brownian motion independent of $B_b(s)$. Substituting $B_a(s) = B_{a\cdot b}(s) + \Omega_{ab}\Omega_b^{-1}B_b(s)$ into (12) gives

Furthermore, define

where $W_a(s)$ and $W_b(s)$ are standard Brownian motions independent of each other. Then we can reduce (11) to the canonical form

where

Moreover, we have Lemma 4.

LEMMA 4: $W_1(s)$ may be taken as the first $n_1$ elements of $W_b(s)$.

Therefore we have obtained the following theorem.

THEOREM 2: If $y_t$ is not cointegrated, then under the null hypothesis (4)

where the first and the second terms are independent, and $W_1(s)$ is the vector of the first $n_1$ elements of $W_b(s)$.

Theorem 2 shows that if processes are not cointegrated, causality tests in levels VAR's are asymptotically similar, and therefore the critical values for the tests can be tabulated conveniently. Once the critical values have been computed, the tests can easily be implemented in practice. Of course, if it is known that the system is I(1) with no cointegration, causality tests based on differences VAR's are also valid, and in these tests the usual chi-square critical values are employed. Hence, an interesting question is the following: which causality test should be implemented in empirical work if it is known that the process is I(1) with no cointegration? To provide some insight into this matter, we consider the situation where the null hypothesis (4) is false.
For this purpose it is convenient to consider the equivalent formulation (5) of the null hypothesis (4). The alternative hypothesis corresponding to (5) is

$$\mathcal{H}_1^*: J_{i,13}^* \ne 0 \ \text{ for some } i.$$

Note that $J_{13}^*$ must be zero both under the null and alternative hypotheses since $J^* = 0$ if the process $y_t$ is I(1) with no cointegration. Hence, if it is known that there is no cointegration, the constraints $J_{13}^* = 0$ in (5) are redundant. That is, tests of the noncausality hypothesis (4) in levels models contain redundant parameter restrictions. On the other hand, causality tests in difference models take account of the unit root constraint, $J^* = 0$, and test the null hypothesis that $J_{1,13}^* = \cdots = J_{k-1,13}^* = 0$. Therefore, causality tests in difference VAR's are likely to have higher power in finite samples.

The above two theorems show that in two extreme cases the asymptotic distribution of the Wald test is free of nuisance parameters: (i) if there is "sufficient cointegration with respect to $y_3$," in the sense that rank$(A_3) = n_3$, then the asymptotic distribution is $\chi^2_{n_1n_3k}$; and (ii) if there is no cointegration, the Wald test statistic converges to a nonstandard but nuisance parameter free distribution. In the intermediate cases, however, the asymptotic distribution is not only nonstandard but also dependent on nuisance parameters, i.e., if there is cointegration but it is "insufficient with respect to $y_3$," in the sense that rank$(A_3) < n_3$, then the asymptotic distribution depends on nuisance parameters in a rather complicated manner. We illustrate this by an example. (The reader is referred to Toda and Phillips (1991a) for the full development of the asymptotics in the general case.)
EXAMPLE 1: Let the true model be the following trivariate system with one cointegrating vector (given by the first equation) and error covariance matrix $\Sigma_u = (\sigma_{u_iu_j})$:

$$y_{1t} = y_{2t-1} + u_{1t},$$

Suppose we set the lag length $k = 1$ and $J(L) = (b_{ij})$ in (1), estimate an unrestricted VAR(1), and test $\mathcal{H}_0: b_{13} = 0$ (i.e., $y_3$ has no causal effect on $y_1$) using the statistic $F_{LS}$. In this case we have $A_3 = 0$ and $g = 0$, so there is insufficient cointegration with respect to $y_3$. Let us now define the correlation matrix $(\rho_{ij})$ where $\rho_{ij} = \sigma_{u_iu_j}/(\sigma_{u_iu_i}\sigma_{u_ju_j})^{1/2}$. Then, after a little manipulation we find the following limiting expression for the $F_{LS}$ in terms of three scalar Brownian motions:

where $\underline{W}_a(s) = W_a(s) - \int W_aW_b\,(\int W_b^2)^{-1}W_b(s)$, the joint covariance matrix $\Omega_w$ of the three Brownian motions is built from $m_{a1} = (1 - \rho_{23}^2)^{-1/2}(\rho_{13} - \rho_{12}\rho_{23}) = \rho_{13\cdot 2}(1 - \rho_{12}^2)^{1/2}$ and $m_{b1} = \rho_{12}$. We deduce that the limit distribution of $F_{LS}$ is dependent on the nuisance parameters $\rho_{12}$ (the correlation between $u_{1t}$ and $u_{2t}$) and $\rho_{13\cdot 2}$ (the partial correlation between $u_{1t}$ and $u_{3t}$ given $u_{2t}$).

In sum, to detect noncausality in a nonstationary VAR model like (1) we need first to know if $y_t$ is cointegrated or not. If there is no cointegration, we may formulate the model in terms of the first differences, or we can apply Theorem 2. If there is cointegration, we need further to know if the cointegration is sufficient with respect to the variables whose causal effects are being tested. If it is sufficient, we can apply Theorem 1. If the cointegration is insufficient, however, it is necessary to know rank$(A_3)$, to estimate nuisance parameters, and to simulate the asymptotic distribution that is relevant for the particular model we have using estimated nuisance parameters. This procedure seems too complicated and computationally demanding in practice besides having no sound statistical basis. We conclude that causality tests based on OLS estimation in levels VAR's are not to be recommended in general.
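The case distinctions summarized above can be written down as a small lookup. The sketch below is only an illustration of the three cases for the levels-VAR Wald test: the function name and the returned labels are hypothetical, and it takes the cointegration rank $r$ and rank$(A_3)$ as known inputs, which is precisely the information the text argues is usually unavailable in practice.

```python
def levels_var_limit_theory(r, rank_a3, n3):
    """Classify the limit theory of the levels-VAR Wald test of noncausality.

    r        -- number of cointegrating vectors (assumed known)
    rank_a3  -- rank of A_3, the last n3 rows of the cointegrating matrix A
    n3       -- number of variables whose causal effect is tested
    """
    if r < 0 or n3 < 1 or not 0 <= rank_a3 <= min(n3, r):
        raise ValueError("need r >= 0, n3 >= 1 and 0 <= rank_a3 <= min(n3, r)")
    if r == 0:
        # No cointegration: Theorem 2.
        return "nonstandard, free of nuisance parameters"
    if rank_a3 == n3:
        # Sufficient cointegration with respect to y_3: Theorem 1.
        return "chi-square"
    # Cointegration that is insufficient with respect to y_3.
    return "nonstandard, depends on nuisance parameters"


# Example 1: one cointegrating vector, A_3 = 0, one variable tested.
example_1_case = levels_var_limit_theory(r=1, rank_a3=0, n3=1)
```

For Example 1 the lookup returns the nuisance-parameter-dependent case, in line with the discussion of $\rho_{12}$ and $\rho_{13\cdot 2}$ above.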
We shall therefore propose an alternative and more promising procedure based on ML estimation of the model in ECM format in Section 4. But before proceeding to the next section we consider the extension of the above theorems to the VAR model with a nonzero constant term, $\mu$, viz.

$$y_t = \mu + \Phi x_t + u_t. \tag{14}$$

If (14) has some unit roots, the process $y_t$ may contain linear trends as well as stochastic trends.² Effects on the asymptotics of the presence of deterministic trends in I(1) regressors are discussed in Park and Phillips (1988) in a general framework. Therefore, we report only the results very briefly.

Suppose the process $y_t$ is cointegrated. If (14) is estimated by OLS, the asymptotic distribution of the Wald statistic for noncausality, in general, differs depending on whether $y_t$ contains linear trends or not. But if the rank condition of Theorem 1 is satisfied, the Wald statistic still has an asymptotic chi-square distribution with the same degrees of freedom. However, if $y_t$ is not cointegrated and (14) is estimated, then the nuisance parameter independence of the causality test no longer applies. This is because, if there is no cointegration, the nonzero $\mu$ in (14) produces linear trends in $y_t$, which break the collinearity of the Brownian motions in Lemma 4.

To obtain asymptotically similar tests for causality in the model with nonzero $\mu$, we need to include a time trend term, $t$, as well as a constant term in the estimated system. Then, the linear trend components in $y_t$ are eliminated since including time as a regressor is equivalent to detrending the data prior to estimation. Accordingly, the asymptotic distribution of the causality test becomes free of nuisance parameters if the process is not cointegrated. But the Brownian motions in Theorem 2 must be replaced with "detrended Brownian motions."

² If $\Gamma_\perp'\mu \ne 0$, then linear trends are present; otherwise $y_t$ possesses only stochastic trends. See Johansen (1991).
Of course, Theorem 1 continues to hold even if a time trend term (in addition to a constant term) is included in the estimated equation.

4. CAUSALITY TESTS BASED ON ML ESTIMATION OF THE MODEL IN ECM FORMAT

As we saw in the last section, causality tests based on OLS estimators of unrestricted levels VAR's are not very useful in general because of uncertainties regarding the relevant asymptotic theory and potential nuisance parameters in the limit. In this section we consider an alternative way to test noncausality hypotheses in cointegrated VAR's. Our testing procedure is based on Johansen's (1988, 1991) ML method. This method has two advantages over the levels VAR approach considered in the last section. First, the ML procedure gives estimators of the system's cointegrating vectors, $A$, and their weights $\Gamma$. Hence if the asymptotic distribution of the tests depends on the structure of $A$ (or $\Gamma$), as in the case of OLS based tests, then we may use these estimators to test relevant hypotheses about the structure of $A$ (or $\Gamma$). Moreover, the ML estimators of the cointegrating vectors are asymptotically median unbiased and have mixed normal limit distributions, unlike those that would be obtained from levels VAR estimation, and they are therefore much better suited to perform inference. Second, since ML methods take into account information on the presence of unit roots in the system, we can avoid unit root asymptotics altogether, i.e., the asymptotic distribution of the ML estimator of $A$ will be mixed normal and conventional normal asymptotics will apply to the estimators of the other parameters. (See Phillips (1990).)

We deal with the ECM representation of the system given in (2), and estimate the parameters $J_1^*, \ldots, J_{k-1}^*$, $\Gamma$, and $A$. We continue to assume that the model has no constant term. The asymptotic theory does differ if the model has a constant term. However, our results on causality tests obtained below are not affected by that difference.
Although Johansen assumes normality of the innovation sequence {u,} in addition to the assumptions we made in Section 2, it is obvious in view of our Lemma 1 and Lemma 2 that all the asymptotic results in Johansen (1988) H. Y. TODA AND P. C. B. PHILLIPS continue to hold without the extra as~umption.~Thus, suppose that by Johansen's likelihood ratio test about the number of cointegrating vectors we have decided that there are r 1cointegrating vectors. (If there is no cointegra- tion, we can formulate the model in terms of first order differences {A?,), or we may apply Theorem 2 in the last section.) Then the ML estimator A of A is given by the eigenvectors corresponding to the J largest eigenvalues that solve equation (9) of Johansen (1988). Also, let A, be the n -r eigenvectors4 corresponding to the n -r smallest eigenvalues and assume that all the eigen- vectors are normalized in the manner prescribed by Johansen (1988, p. 235). The estimator of !P = [J;, .. . ,JE1, TI (see (7))is given by where AY' = [Ay,, .-. . , Ay,], and Z"; = [ill,. . . , i,,] with ilt= [AyjPl,...,Ay:-k+l,(Aryt-l)r]r. In this section the symbol "^" on top of a parameter signifies that the parameter is estimated by ML. We shall construct a Wald test statistic based on these ML estimators to test the noncausality hypothesis (5). But before proceeding further, we summarize the limit behavior of the ML estimators and some sample moment matrices. Define A=dfi-l, f =far, and s= [q,..., gl, f], where I?=dd with 8= (ArA)-2, and define where ff ,=d,d, with 4, = (Ar,A ,)-1A', . The limit theory we need is given in the following lemma, which is a consequence of results in Johansen (1988). where B,(s) is the m, =n -r dimensional Brownian motion defined in Lemma 1, B,(s) is an r dimensional Brownian motion with covariance matrix R, = If ut is not normally distributed, the estimators considered below are no longer ML estimators. Nevertheless, we shall continue to refer to them as "ML estimators." 
These eigenvectors do not provide a consistent estimator of the space spanned by A ..But we call them A. since their role in the derivation of the asymptotic distribution is the same as that of A .. (See the Proof of Lemma 5.) VECTOR AUTOREGRESSIONS 1381 (T'C;lT)-l, and B2(s) and B,(s) are independent. (ii) n(@-ly) 5N = [ Nl ,N,], n<k-1) r where vec (N') =N(0, 2; @ 2,) which is independent of B,(s) and Bc(s). (iii) 2,52, where 2, = T-'(AY'AY -AY' 2,(,2;2,)-'2; AY) is the ML estimator of 2,. P (iv) hc= (P2;lf) ---+0,. (v) T(2;.fl)-' 52;' where 2; = [ill,.. . ,ZIT] with i;,= [Ayi-,, . . . ,.A-Y:-~+~,(~Y~-I)~I. (vi)(a) For a matrix M such that M!A = 0, M'A =M'A. , - where 2; = [E,,, . . . ,Z,,] with Z2, =Af,y ,-,. Now the null hypothesis of noncausality is given by (5). This can be written alternatively as (ST, = S;@*S = 0 or vec (@Ti) = (Sf@ S;) vec (@*')= 0 (15) where @* = [J?, ...,Jk*- l, J*] and @T3 = [Jt,,, ...,JZ-,, ,,, Jg].Note that (15) involves some nonlinear restrictions, viz., I,*,=T1A\ = 0 where rl denotes the first n, rows of r. Since fk-rAr= ffir(fi-l)lk-r'f= (f-r)2+r(p-~'), we have under the null hypothesis (15) s where &* = E@, . . . ,~k-1,p-1 with jk =fi,= (vet (9'-TI)', vec (A -A)')', - 2= (PI, P,) with I S S 0 nln3(k-1) - - - - - - - -- - --1 - - - - - - p 1 0 [ I A,@s; ] n~n3 and I nr and A, denotes the last n, rows of A. Given Lemma 5(i) and (ii), the expansion (16) motivates us to employ the Wald statistic (17) F,, =vec (6;;)1($~)-'vec (6;;) where 3 is defined as P with A, and r, replaced by d, and PI,respectively, and A,and fl are the last n, rows of d and the first n, rows of f,, and 8,= (f's; 'f I-'. Note that 3 and ? can be constructed using the estimates a,f,d,,ands It turns out that PW' might be singular (even in the limit). However, we will be interested in the case where $W^' is nonsingular in the limit. 
In order to guarantee this we need to assume that either r, or A, is of full row rank, and consequently in the above formula of the Wald statistic it is assumed that ($W^')-' exists. This problem arises due to the nonlinear restrictions r,Aj = 0 involved in the noncausality hypothesis in ECM's. In the case of a Wald test of a nonlinear hypothesis f(0) = 0, say, in stationary models it is well known that the Jacobian matrix, af(e>/ae', evaluated at the true parameter value must have full row rank for the Wald statistic to converge to a chi-square distribution with the usual degrees of freedom. The same principle applies to the restriction rlAj = 0 in our nonstationary model, and the rank conditions on T, and A, correspond to the conditions on the Jacobian matrix, af(e)/de1. Let us first assume that rank(A,) =g <n, and rank(r,) =n, in the following development of the asymptotics. The treatment of the case in which rank(A,) =n, and rank(r,) G n, is easier and will be briefly discussed later. Now it is convenient to rewrite the matrix form P^@' as (18) $@' =PtftFt'. Here and Ft = (F,, PJ)with where A,, is the last n, rows of A,. It is straightforward to show the above equality (18) using the relations A =Aft-l, f =ffir, and A, =A,ftyl. For instance, d, (&f2)-Q, =A,(Z;Z,)-%', . To obtain the limit distribution of the FML, we need the next lemma. LEMMA6: If rank(A,) =g <n, and rank(Tl) =n,, there exists an nln3k X nln3k nonsingular matrix K: such that ~,*lPt=F$r,?, where FT is an n n k x (nm, + rn) matrix, p: is an nln3k x (nm, + rm,) matrix, T;*TInm1, = diag(tr' TInr), and T$ = diag(@~,,,, TIrm2). Moreouer, where Pll is an nl(n3(k -1)+g) X nm, matrix of full row rank and P2, is an nl(n3-g) x m matrix, and where P12 =P2,( A ,@Ir) is an n1(n3 -g) X rm, matrix of full row rank. 
Using (18) and Lemma 6 we have from (17) From the last expression, we get by Lemma 5 d F~~ F~~(l) +F~~(2) where and FML(,) and FML(,, are independent since N is independent of B2(s) and Bc(s)by Lemma 5(ii). H.Y. TODA AND P. C. B. PHILLIPS Because vec (N') =N(O,Z; @ Z,), we easily see that FML(,,=~:,,~(k-l)+nlg. As for F,,(,,, note that by the same argument as that of Lemma 5.1 of Park and Phillips (1988) In1(n3-g)) since B2(s) and Bc(s) are independent by Lemma 5(i). Hence, FML(,,=X:1(n3-g,. Since FML(,,and FML(,)are independent, we deduce that To interpret the decomposition of the limiting distribution into the FML(,,and FML(,,components, consider testing the following hypotheses in ECM's: 4;: JCl3= ... =J,*_,,,,=O and rl=O and 4;: A3=0. These are sufficient (but obviously not necessary) conditions for noncausality. It follows easily from Lemma 5(ii) that under the null hypothesis the Wald statistic for testing 4; converges to a chi-square distribution with n,n,(k -1) +n,r degrees of freedom. It is also easy to see that under 4; the Wald test for 4; has an asymptotic chi-square distribution with n3r degrees of freedom by Lemma 5(i). Moreover, these two Wald statistics are asymptotically independent by Lemma 5(i) and (ii). The F+-(,, and FML(,)terms, to some extent, come from testing 4; and 4$,respectively. However, the derivation of the asymptotic distributions 'in the case of testing M,'" in (5) is much more complicated and the ranks of the matrices r, and A, play an important role in the final result. Next suppose that rank(A,) =n3 and rank(Td G n,. In this case it is easy to show that Lemma 6 holds with no P2,, viz., P, 3(P,,, 0) and P$3(P,,, 0) where P,, is an n,n,k xnm, matrix of full row rank. Therefore which also has a X:1n3, distribution. We summarize these results in our next theorem. THEOREM3: Suppose that in the model (2) (or equivalently (1)) y, is cointe- grated. 
If rank (A,)=n, or rank (TI)=n,, then under the null hypothesis (5) (or d x2 equivalently (4)), FML+ ~ ~ ~ ~ ~ . Unfortunately, even the ML method does not always guarantee the usual chi-square asyrnptotics because the rank condition in Theorem 3 is not always satisfied under the null. To illustrate the problem that arises when there is rank VECTOR AUTOREGRESSIONS 1385 deficiency both in A, and r,we provide the following example. (See Toda and Phillips (1991a) for more details.) EXAMPLE2: Consider the trivariate cointegrated system where Y: = (Y,,, Y,,, Y,,), u: = bit, u,,, u,,), y' = (y,, y2, Y,) = (0,1,1), and (I.'= (a,, a,, a,) = (1, -i,0). In this example we use lower case letters to signify vectors and scalars. (For example, a corresponds to A.) Suppose that we want to test whether y,,-, causes y,,. Then the null hypothesis is and the Wald statistic given by (17) becomes where a,, = var (u,,) and 2;= (B'y,,, . . . , B'y,.-,). This can be rewritten as - because S;a = a, = 0 and hence & .,= S; A. = Sja = a .,by Lemma 5(vi)(a). Since by Lemma 5 6ulf;uul, 6, -o,, T($z,)-' f;u;', T2(2;i2)-' 4 d (JB,B;)-', figl = fi(j, -y,) 4 P N(0, uuIu;'), and T&,= T(&, -a,) a ,(~B,B;)-~JB, dB,, we have where each of X, and X, is distributed as chi-square with one degree of freedom, and X, and X, are independent. Thus, FMLdoes not converge to a X? distribution but to a nonlinear function of independent chi-square variates. This occurs because both of y, and a, are zero. Under the null hypothesis (20), we can expand as 1 where j, -y, = O,(T-T), a, -a, = 0 (T-I), and (TI -yl)(&, -a,) = Op(T-$1. If a, # 0, jl&,is asymptotically dominated by the first term. If a, = 0 but yl # 0, then j,&,is asymptotically dominated by the second term. In either case the Wald statistic F,, will have an asymptotic chi-square distribution with one degree of freedom because @(j, -y,) is asymptotically normal and T(&,-a,) is asymptotically mixed normal. 
If $\alpha_3 = \gamma_1 = 0$, however, $\hat{\gamma}_1\hat{\alpha}_3$ is equal to the third term and therefore the usual chi-square asymptotics do not hold.

[Figure 1. Limit density (21) and chi-squared density.]

The density of the distribution in (21) is graphed in Figure 1 against that of a $\chi^2_1$ variate. (The analytic form of this density is given in Toda and Phillips (1991a).) As is apparent from the figure, the density of (21) is much more concentrated near the origin and has a much thinner tail than the $\chi^2_1$ distribution. If we were to test the null hypothesis (20) using a critical value obtained from (21) for the Wald statistic in this case, the test would have much greater power than a test that employed a nominal $\chi^2_1$ critical value. In practice, of course, we do not know that both $\gamma_1$ and $\alpha_3$ are zero, and an investigator who routinely employed a $\chi^2_1$ critical value in this case would suffer from both the size distortion and the resulting power loss.

In sum, if the system is subject to cointegration, causality tests based on ML estimation may well collapse and not satisfy the usual chi-square asymptotics, not because of failure to use information on unit roots (as in the levels VAR estimation), but because of the nonlinear constraints $\Gamma_1 A_3' = 0$ that are necessarily involved in the null hypothesis. Thus, we need to know whether the condition that (i) $\mathrm{rank}(\Gamma_1) = n_1$ or that (ii) $\mathrm{rank}(A_3) = n_3$ holds. Unless we have a reason to believe a priori that either condition (i) or (ii) holds, we have to test the conditions empirically. This can, of course, be done using the ML estimates of $\Gamma_1$ and $A_3$. In particular, condition (i) or (ii) can easily be tested if $n_1 = 1$ or $n_3 = 1$, respectively. A companion paper, Toda and Phillips (1991b), proposes some sequential test procedures for causality that are applicable when $n_1 = 1$ and/or $n_3 = 1$.

In the above development of the asymptotics of causality tests, we ignored the issue of the estimation of $r$.
A natural question is how severe its impact on the causality test can be in small or moderately sized samples. Furthermore, even if $r$ were known, some size distortion and loss of power are inevitable in the causality test in ECM's because of its sequential nature (i.e., verify the rank condition and then test causality). To address these issues, Toda and Phillips (1991b) conducted a simulation study which investigated the performance of the causality tests in ECM's when $n_1 = 1$ or $n_3 = 1$, in comparison with conventional causality tests in levels VAR's and in difference VAR's. These simulations found some favorable evidence in support of the sequential causality test based on the ECM formulation at least in relatively large samples (more than 100 observations) and showed that the sequential procedure generally outperforms the conventional tests.

The results for the Wald test derived in this section are not affected by the presence of a constant term in the true model and the inclusion of a constant term (and a time trend term) in the estimated equation. The sequential test suggested in Toda and Phillips (1991b) can also be implemented in such models.

5. CONCLUSION

This paper has studied the asymptotics of Granger causality tests in unrestricted levels VAR's and Johansen-type ECM's. The results of our analysis are not encouraging for these tests in levels VAR's. Our main conclusions regarding the use of Wald tests in levels VAR's are as follows.

(i) Causality tests are valid asymptotically as $\chi^2$ criteria only when there is sufficient cointegration with respect to the variables whose causal effects are being tested. The precise condition for sufficiency involves a rank condition on a submatrix of the cointegrating matrix. Since the estimates of such matrices in levels VAR's suffer from simultaneous equations bias (as shown in Phillips (1991)) there is no valid statistical basis for determining whether the required sufficient condition applies.
(ii) When the rank condition for sufficiency fails, the limit distribution is more complex and involves a mixture of a $\chi^2$ and a nonstandard distribution, which generally involves nuisance parameters. The precise form of the distribution depends on the actual rank of a submatrix of the cointegrating matrix and again no valid statistical basis for mounting a Wald test of causality applies. In view of these results we conclude the empirical use of Granger causality tests in levels VAR's is not to be encouraged in general when there are stochastic trends and the possibility of cointegration.

(iii) If there is no cointegration the Wald test statistic for causality has a nonstandard but nuisance parameter free limit distribution. This distribution could be used for tests when it is known that there are stochastic trends but no cointegration in the system.

Testing for causality in ECM's is more promising than in levels VAR's but it still involves some difficulties. Our main results are as follows.

(iv) Wald tests for causality in ECM's are not always valid asymptotic $\chi^2$ criteria.

(v) Problems of nuisance parameter dependencies and nonstandard distributions enter the limit theory in the general case.

(vi) Sufficient rank conditions for causality tests to be asymptotically valid $\chi^2$ tests are given. These rank conditions relate to submatrices of both the cointegrating matrix and the loading coefficient matrix. They can, in principle, be tested empirically using the ML estimates of these submatrices.

Granger causality tests in systems of stochastic difference equations are fraught with many complications when there are stochastic trends and cointegration in the system. The results for causality tests in ECM's are deserving of some emphasis in view of the fact that other types of Wald tests in ECM's are known to be asymptotically valid $\chi^2$ tests.
Since ML estimation of ECM's delivers optimal estimates of the cointegrating space, ECM's provide a more promising basis than VAR's for the sequential inference procedures that are needed to adequately test causality hypotheses in these models. Indeed, simulation exercises reported in Toda and Phillips (1991b) indicate that the sequential test in ECM's works reasonably well in small systems (3 or 4 variables) and moderately large sample sizes (more than 100 observations).

Institute of Socio-Economic Planning, University of Tsukuba, Tsukuba-shi, Ibaraki 305, Japan
and
Cowles Foundation for Research in Economics, Yale University, P.O. Box 2125 Yale Station, New Haven, CT 06520-2125, U.S.A.

Manuscript received July, 1991; final revision received February, 1993.

APPENDIX

PROOF OF LEMMA 1: (i) From (2) we have

$$z_{1,t+1} = G z_{1t} + F u_t \qquad (A1)$$

where $G$ = [...] and $F$ = [...]. Since $z_{1t}$ is $I(0)$ by assumption, the eigenvalues of $G$ must be all less than unity. Hence we can write (A1) as

$$z_{1t} = \Theta(L) F u_{t-1} \qquad (A2)$$

where $\Theta(L) = \sum_{j=0}^{\infty} \Theta_j L^j = \sum_{j=0}^{\infty} G^j L^j$. Now by the same argument as that of Theorem 2.2 of Chan and Wei (1988), $T^{-1}\sum_t z_{1t} z_{1t}' \to_p \Sigma_1$ and

[...]

where $B_0(s)$ is an $n$-vector Brownian motion with covariance matrix $\Sigma_u$, and $\xi$ is an $nm_1$-dimensional normal random vector with mean zero and covariance matrix $\Sigma_1 \otimes \Sigma_u$ with $\Sigma_1 = E z_{1t} z_{1t}' = \sum_{j=0}^{\infty} \Theta_j F \Sigma_u F' \Theta_j'$. $B_0(s)$ and $\xi$ are independent. Since $\Theta(L)$ is the inverse of $I - GL$ and $|I - GL| = 0$ has only stable roots, we see from Brillinger (1981, p. 77) that for all $p \ge 0$

[...]

where $\|\Theta_j\|_1$ denotes the sum of the absolute value of the entries of $\Theta_j$. This in turn implies that

[...]

where $\|\Theta_j\| = \mathrm{tr}(\Theta_j \Theta_j')^{1/2}$. Thus, by the multivariate extension of Theorem 3.4 of Phillips and Solo (1992),

[...]

Since $\Delta z_{2t} = A_\perp' \Delta y_{t-1}$, we also have from (7) and (A4)

[...] $\to_d A_\perp' W \Theta(1) F B_0(s) + A_\perp' B_0(s) = A_\perp' [I_n + W \Theta(1) F] B_0(s).$

Next, we set $B_1(s) = \Theta(1) F B_0(s)$ and $B_2(s) = A_\perp' [I_n + W \Theta(1) F] B_0(s)$.
Combining (~3), (A4), and (AS) we have (i) as stated and the covariance matrix of B(s) = (Bob)', B,(s)', B2(s)')' is given by 0. (ii) Inserting (A2) into (7) gives (A6) Ay, = W@(L)Fu,-, + u, = [I, + w@(L)FL]u, . Hence, C(L) =I, + W@(L)FL (cf. (B)), and C(1) =I, + W@(l)F. Therefore, from (A51 we have (iii) It is obvious that 0, = 8, = S,, which is p.d. by assumption. From (A7) we have (A8) flz =At,C(l)Z,C(l)'A ,, which is also p.d. since R(C(1)) = R(AIL = R(A ,). The positive definiteness of 8, is proved from (All in the same way as Lemma 5.5.5 of Anderson (1971). Since Az2, =A1, Ay,-, is a function of the past history of the innovations {u,-,, u,-,, . . .} we have E A~,,U;+~ = 0. = 0 for all j a 0. Hence Zzo =Azo PROOFOF LEMMA2: (i) was proved in Lemma l(i). The rest of Lemma 2 immediately follows from Lemma 1 and from Lemma 2.1 of Park and Phillips (1989) noting that 8,, = A,, = 0. PROOFOF LEMMA3: Define an n, x g matrix K1 with rank(K1) =g and an n3 X (n, -g) matrix K, with rank(K2) = n3 -g such that R(Kl) = R(A3) and R(K2) = R(AJL, i.e., K;A3 = 0. Let H. Y. TODA AND P. C. B. PHILLIPS K, = [fi~,,TK,] with K, = [D@ In3, ik @ K,] and K, = ik@ K, where ik is a k-vector of ones. Then, the required results hold with D'D @ S; ek-, @A3 R,, = K;A3 and R2, =K;AL3. PROOFOF LEMMA4: If y, is not cointegrated, we have the VAR(k -1) representation in first order differences such that Since I, -J*(L)L is invertible, (8)becomes Ay, = [I, -J*(L)L]-lu,, i.e., C(L)= [I, -J*(L)L]-I. Hence from Lemma l(ii)we have B,(s) = [In-J*(l)]-'B,(s) since A. =In.Thus since B,(s) = S;B,(s) and Bb(s)= S;+,B,(s). Multiplying (SiB,S1)-on both sides of this last equation, we have W,(s) =Lt1W,(s) where Lt1=(S;B,S,)- +[Inl -J,*,(l), -J&(l)]f2,+. Note that Lt1 is of full row rank, and that Lt1L, =In1 since Wl(s)is a standard Brownian motion. 
Define a nonsingular matrix L = [L,, L,] where L, is an (n, + n,) x n, matrix such that Lt2Ll= 0 and L1,L2= InZ Then we can write Notice that in (13)we may replace Wb(s)with kwb(s)which is also a standard Brownian motion independent of W,(s). Therefore, redefining Wb(s)as L'W,(S) gives the required result. PROOFOF LEMMA5: (i) This is proved in Lemma 8 of Johansen (1988). (ii)-Recall that z,, =H;x, where x, and H, were defined in Section 2. Define H, = [D@ I,, ek @A]. Then ill=H;x, and = Z;Z, + (H, -H,)'x'z,+ Z;X( H, -H,) + (H, -H,)'x'x( H, -H,) where H, -HI = [0, ek 8 (2-A)] = o,(T-') by virtue of (i). By the same argument as (AS) and (A71 Hence, by Lemma 2.1 of Park and Phillips (19891, X'Z, = O,(T) and XIX= 0,(T2). Therefore, from (A9) VECTOR AUTOREGRESSIONS by Lemma 2(iXa). Also (All) (z;-G)Zl= (HI -H~)'x~z~-H~)'x~x(H~)= oP(l). + ( HI Furthermore u'Z, = U1Z1+ U'X(H, -HI) where UtX= Op(T). Hence, by Lemma 2(iXb) where vec(No)-= N(O,gl ~8,). Now since ly =AY1Z1(ZiZl)-', we have from (AlO), (All), and (A121 where vec (N') = (2;' @ 1,)vec (No) = N(0, 2;' @ 2,). Furthermore, N is independent of (B,(sY, B,(s)'Y because B,(s) and B,(s) are linear combinations of elements of Bob) (see Lemma 1 in Section 2 and Lemma 8 of Johansen (1988)) and No is independent of Bo(s) by Lemma l(i). (iii) This is proved in Theorem 3 of Johansen (1988). (iv) This follows immediately from (ii) and (iii). (v) This was proved in (ii) above. (vi) Write where & =AIL,and ft ,=~',a,. Since each column of a, is an eigenvector of the equation (9) of Johansen (19881, where Sij(i, j = 0,l) are the product moment matrices defined by (8) of Johansen (1988) with k-lagged level variables replaced by one-lagged levels (i.e., Sll, Sol, and Slo correspond to Johansen's Skk, SO,, and S,,, res~ectivel~),~Aj (j = r + 1,.. .,n) are the :igenva!ues correspondin! to a,, and the last equality follows from the normalization conditipn: A1,SllA, = I,-,. Since Aj (j= r + 1,.. . 
,n) are o,(T-') by Lemma 6 of Johansen (19881, A ,and hence & and II, are OJT-t). Also from the normalization condition =&A~S~,A&+A;A~,S,~A&+ &x1sl1~,fiL+filL~l,~ll~ . ,fi, Hence A;A~,s,,A I,-, ,A, since A1Sl1A = OP(1) by Lemma 3 of Johansen (1988) and A1Sl1A ,= OP(1) similarly. Thus Notice that we have the level variables y,-I in (21, while Johansen formulates the model so as to have y,-, as the level variables. This difference, of course, does not affect the asymptotics. H. Y.TODA AND P. C. B. PHILLIPS Since T-'A',S~~A,5 /B,B; (in our notation) by Lemma 3 of Johansen (1988), it follows that I?;' = O,(T+) and therefore where YL,=[yo,.. .,YT-I] Hence by (A13) and Lemma 2. PROOFOF LEMMA6: Define an nln,k X n,n,k matrix KF = [J?;KT,TK$] with where Kl and K, are defined in the poof of Lemma 3. We give a proof of the result for pt. The result for P can be proved in an entirely analogous manner. Now Since A, %A,, fi~;A,= fi~;(A, -A,) -50, Fl 1;r1and K;A, .= K;s;A, = K;S;A ,= KiA,, by Lemma 5(i), (ii), and (viXa), we have T-~K~IPJ 50, ,FK;~P,= (0, DK;~, 8 s;) 50, and K,*'@=K;A.,@~',~K;A, , @~,=PJ, where PI, and PJ2 are full row rank matrices as required. REFERENCES ANDERSON,T. W. (1971): The Statistical Analysis of Time Series. New York: Wiley. BRILLINGER,D. R. (1981): Time Series: Data Analysis and Theory. San Francisco: Holden-Day. CHAN, N. H., AND C. Z. WEI (1988): "Limiting Distributions of Least Squares Estimates of Unstable Autoregressive Processes," Annals of Statistics, 16, 367-401. JOHANSEN, S. (1988): "Statistical Analysis of Cointegration Vectors," Journal of Economic Dynamics and Control, 12, 231-254. -(1991): "Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models," Econometrica, 59, 1551-1580. PARK, J., AND P. C. B. PHILLIPS(1988): "Statistical Inference in Regressions with Integrated Processes: Part I," Econometric Theory, 4, 468-497. 
PARK, J., AND P. C. B. PHILLIPS (1989): "Statistical Inference in Regressions with Integrated Processes: Part II," Econometric Theory, 5, 95-131.
PHILLIPS, P. C. B. (1991): "Optimal Inference in Cointegrated Systems," Econometrica, 59, 283-306.
PHILLIPS, P. C. B., AND S. N. DURLAUF (1986): "Multiple Time Series Regression with Integrated Processes," Review of Economic Studies, 53, 473-495.
PHILLIPS, P. C. B., AND V. SOLO (1992): "Asymptotics for Linear Processes," Annals of Statistics, 20, 971-1001.
SIMS, C. A. (1980): "Macroeconomics and Reality," Econometrica, 48, 1-48.
SIMS, C. A., J. H. STOCK, AND M. W. WATSON (1990): "Inference in Linear Time Series Models with Some Unit Roots," Econometrica, 58, 113-144.
TODA, H. Y., AND P. C. B. PHILLIPS (1991a): "Vector Autoregressions and Causality," Cowles Foundation Discussion Paper No. 977.
TODA, H. Y., AND P. C. B. PHILLIPS (1991b): "Vector Autoregression and Causality: A Theoretical Overview and Simulation Study," University of Western Australia Working Paper 91-07, to appear in Econometric Reviews, 1993.

and S =Ik @ S, with S, =

where $\Gamma$ and $A$ are $n \times r$ matrices of full column rank $r$, $0 \le r \le n - 1$. (If J* = 0, then $r = 0$ and there is no $\Gamma$ or $A$.)
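A small numerical sketch may help fix ideas about what happens to the Wald statistic when the loading and cointegrating coefficients reduce to scalars that are both zero, as in Example 2. The exact expression for the limit (21) is not reproduced here, so the sketch rests on a labeled assumption rather than on the paper's formula: it supposes that the limit variate has the form X1 X2 / (X1 + X2), with X1 and X2 independent chi-square variates with one degree of freedom, which is the form a delta-method Wald statistic for a product of two coefficients would suggest. The function name `simulate_assumed_limit` and the constant `CHI2_1_CRIT_5PCT` are illustrative only.

```python
import numpy as np

# 95% quantile of a chi-square distribution with one degree of freedom.
CHI2_1_CRIT_5PCT = 3.841458820694124


def simulate_assumed_limit(n_draws=200_000, seed=0):
    """Draw from X1*X2/(X1+X2) with X1, X2 independent chi-square(1).

    ASSUMPTION: this functional form stands in for the limit (21);
    it is not taken from the paper's text.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.chisquare(1, n_draws)
    x2 = rng.chisquare(1, n_draws)
    return x1 * x2 / (x1 + x2)


draws = simulate_assumed_limit()

# Share of draws that a nominal chi-square(1) 5% test would reject.
rejection_rate = float(np.mean(draws > CHI2_1_CRIT_5PCT))

# Simulated 95% quantile of the assumed limit variate.
q95 = float(np.quantile(draws, 0.95))

print(f"rejection rate with nominal critical value: {rejection_rate:.5f}")
print(f"simulated 95% quantile: {q95:.3f}")
```

Under the assumed form, writing X1 and X2 as squares of independent standard normals in polar coordinates shows that the variate is distributed as one quarter of a chi-square variate with one degree of freedom. A nominal chi-square critical value would then be exceeded very rarely, which is in line with the description of a density that is concentrated near the origin and has a thin tail.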

ONE OF THE MAJOR POTENTIAL APPLICATIONS of unrestricted estimation in systems of stochastic difference equations is to test for causality between subsets of the variables. Such tests have become common in the empirical literature following their use in Sims (1980) to test the block exogeneity of the real sector in vector autoregressions (VAR's) fitted with real and monetary variables for both Germany and the U.S.A. Such tests are routinely performed using Wald criteria that are thought to be asymptotically chi-squared, as indeed they are in stationary or trend stationary systems. In recent work, Phillips and Durlauf (1986), Park and Phillips (1988, 1989), and Sims, Stock, and Watson (1990) (hereafter SSW) have all shown that the asymptotic theory of Wald tests is typically much more complex in systems that involve variables with stochastic trends. In general, one can expect the limit theory to involve nuisance parameters and nonstandard distributions, both of which substantially complicate inference procedures, as originally pointed out in the Phillips-Durlauf paper.
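The contrast drawn above between stationary systems and systems with stochastic trends can be explored with a small Monte Carlo experiment. The sketch below is a deliberately simplified setting and not the paper's design: it assumes a bivariate VAR(1) with an intercept, two mutually independent series with a common autoregressive coefficient `rho`, and Gaussian innovations, and it records how often a Wald test of "series 2 does not cause series 1" rejects when a nominal 5% chi-square critical value is used. All function names and parameter values are hypothetical.

```python
import numpy as np

# 95% quantile of a chi-square distribution with one degree of freedom.
CHI2_1_CRIT_5PCT = 3.841458820694124


def wald_noncausality(y):
    """Wald statistic for H0: y2 does not Granger-cause y1.

    Fits the first equation of a levels VAR(1) with intercept by OLS
    and tests that the coefficient on lagged y2 is zero.
    """
    n_obs = y.shape[0] - 1
    regressors = np.column_stack([np.ones(n_obs), y[:-1, 0], y[:-1, 1]])
    target = y[1:, 0]
    xtx_inv = np.linalg.inv(regressors.T @ regressors)
    beta = xtx_inv @ regressors.T @ target
    resid = target - regressors @ beta
    sigma2 = resid @ resid / n_obs  # ML-type variance estimate
    return beta[2] ** 2 / (sigma2 * xtx_inv[2, 2])


def rejection_rate(rho, n_obs=200, reps=1000, seed=0):
    """Share of replications in which the nominal 5% test rejects.

    rho = 1.0 gives two independent random walks (no cointegration);
    |rho| < 1 gives two independent stationary AR(1) series.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        shocks = rng.standard_normal((n_obs + 1, 2))
        y = np.zeros((n_obs + 1, 2))
        for t in range(1, n_obs + 1):
            y[t] = rho * y[t - 1] + shocks[t]
        if wald_noncausality(y) > CHI2_1_CRIT_5PCT:
            rejections += 1
    return rejections / reps


stationary_rate = rejection_rate(rho=0.5)
unit_root_rate = rejection_rate(rho=1.0)
print(f"stationary case: {stationary_rate:.3f}")
print(f"unit root case:  {unit_root_rate:.3f}")
```

In the stationary case the rejection frequency should sit close to the nominal level, while the unit root case gives a rough indication of how far the chi-square approximation can be from the actual behavior of the statistic in a levels VAR without cointegration. The numbers depend on the sample size, the number of replications, and the seed, so they are best read as illustrative.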
