Diff for "DsgeVar" - DynareWiki

Differences between revisions 3 and 4

Consider the VAR representation of order latex($p$) for the latex($1\times m$) vector of observed variables latex($y_t$):

$y_{t}=\sum_{k=1}^{p} y_{t-k} \mathbf{A}_{k} + u_t$

where latex($u_t\sim \mathcal N\left( 0,\Sigma_u\right)$). Let latex($z_t$) be the latex($mp\times 1$) vector latex($\left[y_{t-1}',...,y_{t-p}'\right]'$) and define latex($\mathbf{A}=\left[\mathbf A_1',...,\mathbf A_p'\right]'$). The VAR representation can then be written in matrix form as:

$Y=Z\mathbf A +\mathcal U$


- Revision 3 as of 2008-05-08 13:27:10
  Size: 678
  Editor: StéphaneAdjemian
+ Revision 4 as of 2008-05-08 13:36:05
  Size: 10053
  Editor: StéphaneAdjemian
-Deletions are marked like this.
+Additions are marked like this.
 Line 14:
+where [[latex($Y = (y_1',\dots,y_T')'$)]], [[latex($Z = (z_1',\dots,z_T')'$)]] and [[latex($\mathcal U = (u_1',\dots,u_T')'$)]].
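
The stacking of the observations into the matrices above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `build_var_matrices` and the simulated `data` array are hypothetical, the VAR has no constant term (as in the text), and the first observations of the sample are used as initial lags.

```python
import numpy as np

def build_var_matrices(data, p):
    """Stack a sample into Y (T x m) and Z (T x mp) for a VAR(p) without constant.

    `data` holds one observation y_t per row (row-vector convention of the text).
    """
    data = np.asarray(data, dtype=float)
    n_obs, m = data.shape
    if n_obs <= p:
        raise ValueError("need more than p observations")
    # Row t of Y is y_t; the first p observations only serve as initial lags.
    Y = data[p:]
    # Row t of Z is z_t' = [y_{t-1}, ..., y_{t-p}].
    Z = np.hstack([data[p - k:n_obs - k] for k in range(1, p + 1)])
    return Y, Z

# Hypothetical data: 50 observations of m = 2 variables, p = 3 lags.
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 2))
Y, Z = build_var_matrices(data, p=3)

# OLS estimate of the stacked coefficient matrix A (mp x m) in Y = Z A + U.
A_ols = np.linalg.solve(Z.T @ Z, Z.T @ Y)
```

With this layout the product `Z @ A` reproduces the sum over lags row by row, since the k-th block of columns of `Z` multiplies the k-th block of rows of `A`.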

A dummy-observation prior for the VAR can be constructed from the VAR likelihood function of [[latex($\mathcal T = [\lambda T]$)]] artificial observations simulated with the DSGE model, [[latex($\left( Y^{\ast },Z^{\ast}\right)$)]], combined with a diffuse prior. The prior is then given by:
{{{#!latex
\[
p_{0}\left( \mathbf A, \Sigma \mid Y^*,Z^* \right)
\propto
\left\vert \Sigma \right\vert ^{-\frac{\lambda T+m+1}{2}}e^{-\frac{1}{2}\mathrm{tr}\left[ \Sigma^{-1}\left(
{Y^*}'Y^*-\mathbf{A}'{Z^*}'Y^*-{Y^*}'Z^*\mathbf A+ \mathbf A'{Z^*}'Z^*\mathbf A \right) \right] }
\]
}}}
implying that $\Sigma$ follows an inverted Wishart distribution and
$\mathbf A$ conditional on $\Sigma$ is Gaussian. Assuming that the
observables are covariance stationary, \cite{DS2004} use the DSGE
theoretical autocovariance matrices for a given $n\times 1$ vector of
model parameters $\theta $, denoted $\Gamma_{YY}\left( \theta \right)
$, $\Gamma_{ZY}\left( \theta \right) $, $\Gamma_{YZ}\left( \theta
\right)$, $\Gamma_{ZZ}\left( \theta \right) $, in place of the
(artificial) sample moments ${Y^*}'Y^*$, ${Z^*}'Y^*$, ${Y^*}'Z^*$,
${Z^*}'Z^*$: each sample moment is replaced by $\lambda T$ times the
corresponding population moment. In addition, the $p$-th order VAR
approximation of the DSGE provides the first moment of the prior
distributions through the population least-squares regression:
\begin{subequations}
    \begin{equation}\tag{P1a}\label{prior1a}
        \mathbf A^*( \theta ) = \Gamma_{ZZ}\left( \theta
        \right)^{-1}\Gamma_{ZY}\left( \theta \right)
    \end{equation}
    \begin{equation}\tag{P1b}\label{prior1b}
        \Sigma^*(\theta) = \Gamma_{YY}(\theta) -\Gamma_{YZ}(\theta)
        \Gamma_{ZZ}\left( \theta \right)^{-1}\Gamma_{ZY}\left( \theta \right)
    \end{equation}
\end{subequations}
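
As a hedged numerical illustration of the population regression above, the sketch below uses a stationary VAR(1) as a stand-in for the DSGE model, so that the autocovariances are available in closed form. The names `dsge_prior_moments`, `Phi` and `Sigma_e` are assumptions introduced for this example only; with $p=1$ the regression should recover the coefficient matrix and the innovation covariance of the stand-in model.

```python
import numpy as np

def dsge_prior_moments(G_yy, G_zy, G_zz):
    """Population least-squares moments: A* = G_zz^{-1} G_zy, Sigma* = G_yy - G_yz A*."""
    A_star = np.linalg.solve(G_zz, G_zy)
    Sigma_star = G_yy - G_zy.T @ A_star  # G_yz is the transpose of G_zy
    return A_star, Sigma_star

# Hypothetical stand-in for the structural model: y_t = y_{t-1} Phi + e_t
# (row-vector convention), with Var(e_t) = Sigma_e.
Phi = np.array([[0.5, 0.1], [0.0, 0.3]])
Sigma_e = np.array([[1.0, 0.2], [0.2, 0.5]])
m = Phi.shape[0]

# Unconditional covariance G0 = E[y_t' y_t] solves G0 = Phi' G0 Phi + Sigma_e,
# i.e. vec(G0) = (I - Phi' kron Phi')^{-1} vec(Sigma_e) with column-major vec.
vec_G0 = np.linalg.solve(
    np.eye(m * m) - np.kron(Phi.T, Phi.T), Sigma_e.flatten(order="F")
)
G0 = vec_G0.reshape((m, m), order="F")

# With p = 1, z_t = y_{t-1}', so G_zz = G_yy = G0 and G_zy = E[y_{t-1}' y_t] = G0 Phi.
G_yy, G_zz, G_zy = G0, G0, G0 @ Phi
A_star, Sigma_star = dsge_prior_moments(G_yy, G_zy, G_zz)
```

In this special case `A_star` coincides with `Phi` and `Sigma_star` with `Sigma_e`; for an actual DSGE model the VAR($p$) is only an approximation and the moments would come from the model solution.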
Conditional on the deep parameters of the DSGE $\theta $ and $\lambda$, the priors for the
VAR parameters are given by: 
\begin{equation}\tag{P2}\label{prior2}
 \begin{split}
    \VEC \mathbf A \mid \Sigma ,\theta, \lambda &\sim \normal{\VEC\mathbf A^*(\theta)}{\Sigma
        \otimes \left[\lambda T \Gamma_{ZZ}(\theta)\right]^{-1}}\\ 
    \Sigma \mid \theta,\lambda &\sim \mathcal{IW}\left(\lambda T \Sigma^*(\theta),\lambda
T-mp-m\right)
 \end{split}
\end{equation}
where $\Gamma_{ZZ}(\theta)$ is assumed to be nonsingular and $\lambda
\geq \frac{mp+m}{T}$ for the priors to be proper\footnote{Note that it
  would not be possible to estimate the VAR model by OLS (or maximum
  likelihood) if we had $\mathcal T < m(p+1)$. In this case we would
  not have more observations than parameters to estimate.}. The
\textit{a priori} density of $\mathbf A$ is defined by $n+1$
parameters ($\theta$ and $\lambda$), which is likely to be fewer than
the $m^2p$ coefficients of $\mathbf A$ ($mp$ per equation). If the
mapping from $(\theta,\lambda)$ to $\mathbf A$ raises no
identification issue, it is preferable to estimate $(\theta,\lambda)$
instead of $\mathbf A$, \textit{i.e.} to estimate fewer free parameters.
To do so, \cite{DS2004} complete the prior by specifying a prior
distribution over the structural model's deep parameters:
$p_0(\theta)$. We still have to set the weight of the structural
prior, $\lambda$.  \citeauthor{DS2004} choose the value of $\lambda$
that maximizes the marginal density: they estimate a limited number of
DSGE-VAR models on a grid of values of $\lambda$, compute the marginal
density of each model, and select the model (\textit{i.e.} the value of
$\lambda$) with the highest marginal density. In
the present paper, we estimate $\lambda$ directly, as an additional
parameter, instead of looping over a grid of values of this
parameter\footnote{In this regard, the approach followed by
  \citeauthor{DS2004} is, at least computationally, inefficient. Also,
  unlike us, they do not average over different possible values
  of $\lambda$ but pick a single value of this parameter, which is not
  the Bayesian way.}. We therefore specify a prior distribution for
$\lambda$, which is assumed to be independent of $\theta$. Finally,
the DSGE-VAR model has the following prior structure:
\begin{equation}\tag{P3}\label{prior3}
p_0\left( \mathbf A,\Sigma, \theta, \lambda \right) = p_0\left(
\mathbf A, \Sigma \mid \theta ,\lambda \right) \times
p_0\left( \theta \right) \times p_0\left( \lambda \right)
\end{equation}
where $p_0\left(\mathbf A, \Sigma \mid \theta ,\lambda \right)$ is
defined by [\ref{prior1a},\ref{prior1b}] and
\equaref{prior2}.}\newline

\par{The posterior distribution may be factorized as follows:
\begin{equation}\tag{Q3}\label{posterior1}
p\left( \mathbf A, \Sigma , \theta , \lambda \mid \mathcal Y_T\right) = p\left(\mathbf A, \Sigma
\mid \mathcal Y_T, \theta, \lambda\right) \times
p\left( \theta ,\lambda \mid \mathcal Y_T\right)
\end{equation}
where $\mathcal Y_T$ stands for the sample. A closed form expression for the first density function
on the right-hand side of \equaref{posterior1} is available. Conditional on $\theta $ and $\lambda$,
[\ref{prior1a},\ref{prior1b}] and \equaref{prior2} define a conjugate prior for the VAR model, so
the posterior density belongs to the same family: the distribution of $\mathbf A$ conditional
on $\Sigma$, $\theta$, $\lambda$ and the sample is matric-variate normal, and the distribution of
$\Sigma$ conditional on $\theta$, $\lambda$ and the sample is inverted Wishart. More formally, we
have:
\begin{equation}\tag{Q2}\label{posterior2}
    \begin{split}
        \VEC \mathbf A \mid \Sigma, \theta , \lambda, \mathcal Y_T  
            & \sim \normal{\VEC \widetilde{\mathbf A}(\theta,\lambda)}{\Sigma \otimes
            V(\theta,\lambda)^{-1}}\\
        \Sigma \mid \theta, \lambda, \mathcal Y_T &\sim 
            \mathcal{IW} \left( (\lambda+1) T~\widetilde{\Sigma}(\theta,\lambda),
(\lambda+1)T-mp-m\right)
    \end{split}
\end{equation}
where:
\begin{subequations}%\tag{Q1}\label{posterior3}
     \begin{equation}\tag{Q1a}\label{posterior3a}
         \widetilde{\mathbf A}(\theta,\lambda) = V(\theta,\lambda)^{-1}\left( \lambda
             T~\Gamma_{ZY}(\theta)+Z'Y\right)
     \end{equation}
    \begin{equation}\tag{Q1b}\label{posterior3b}
         \widetilde{\Sigma}(\theta,\lambda) = \frac{1}{(1+\lambda)T}
             \left[ \lambda T~\Gamma_{YY}(\theta) + Y'Y - \left(\lambda T~\Gamma _{YZ}(\theta)
             +Y'Z\right) V(\theta,\lambda)^{-1}\left(
              \lambda T~\Gamma_{ZY}(\theta)+Z'Y\right)\right]
    \end{equation}
\end{subequations}
with:
\[
V(\theta,\lambda) = \lambda T~\Gamma_{ZZ}(\theta) +Z'Z
\]
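
The posterior moments defined above are straightforward to evaluate numerically. The following NumPy sketch is an illustration only: the function name, the simulated `Y` and `Z`, and the moment matrices `G_zz`, `G_zy`, `G_yy` are hypothetical placeholders (here $p=1$, and `Z` is drawn at random rather than built from lags of `Y`, which is enough to check the algebra). It also verifies that $\widetilde{\mathbf A}$ equals $V^{-1}\left(\lambda T\,\Gamma_{ZZ}\mathbf A^* + Z'Z\,\widehat{\mathbf A}_{OLS}\right)$.

```python
import numpy as np

def dsge_var_posterior_moments(Y, Z, G_yy, G_zy, G_zz, lam):
    """Evaluate A_tilde, Sigma_tilde and V for given DSGE moments and weight lam."""
    T = Y.shape[0]
    V = lam * T * G_zz + Z.T @ Z
    cross = lam * T * G_zy + Z.T @ Y           # lam*T*Gamma_ZY + Z'Y
    A_tilde = np.linalg.solve(V, cross)
    # cross.T equals lam*T*Gamma_YZ + Y'Z, so cross.T @ A_tilde is the quadratic term.
    Sigma_tilde = (lam * T * G_yy + Y.T @ Y - cross.T @ A_tilde) / ((1.0 + lam) * T)
    return A_tilde, Sigma_tilde, V

# Hypothetical sample (T = 200, m = 2, p = 1).
rng = np.random.default_rng(1)
T, m = 200, 2
Z = rng.standard_normal((T, m))
Y = Z @ np.array([[0.6, 0.0], [0.2, 0.4]]) + 0.5 * rng.standard_normal((T, m))

# Hypothetical DSGE-implied moments, chosen so that the prior mean is A_star.
G_zz = np.eye(m)
A_star = np.array([[0.5, 0.1], [0.0, 0.3]])
G_zy = G_zz @ A_star
G_yy = A_star.T @ G_zz @ A_star + np.eye(m)

lam = 0.75
A_tilde, Sigma_tilde, V = dsge_var_posterior_moments(Y, Z, G_yy, G_zy, G_zz, lam)

# Matrix-weighted average of the prior mean and the OLS estimate.
A_ols = np.linalg.solve(Z.T @ Z, Z.T @ Y)
weighted = np.linalg.solve(V, lam * T * G_zz @ A_star + Z.T @ Z @ A_ols)
```

Calling the function with a very large `lam` returns a posterior mean close to `A_star`, and with a very small `lam` one close to `A_ols`, which matches the two limiting cases of the DSGE-VAR.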
Not surprisingly, the posterior mean of $\mathbf A$ is a
matrix-weighted average of the prior mean, $\mathbf A^*(\theta)$, and
of the OLS estimate of $\mathbf A$, with weights
$V(\theta,\lambda)^{-1}\lambda T~\Gamma_{ZZ}(\theta)$ and
$V(\theta,\lambda)^{-1}Z'Z$ that sum to the identity matrix. When
$\lambda$ goes to infinity the posterior
mean shrinks towards the prior mean, \textit{i.e.} the projection of the
DSGE model onto the VAR($p$).\newline We do not have a closed form
expression for the joint posterior density of $\theta $ and $\lambda$
(the second term on the right-hand side of \equaref{posterior1}). The
posterior distribution of $(\theta,\lambda)$ is therefore recovered with an
MCMC algorithm, as described in \cite[appendix B]{DS2004}, except that
we estimate $\lambda$ jointly with the deep parameters
$\theta$.\footnote{This can be done with
  \href{http://www.cepremap.cnrs.fr/dynare}{Dynare} 4.}}\newline
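
To make the idea of sampling $(\theta,\lambda)$ jointly more concrete, here is a generic random-walk Metropolis sketch. It is not the algorithm of the cited appendix nor Dynare's implementation: `rw_metropolis`, `toy_log_post`, the step size and the bound `lam_min` (standing in for $\frac{mp+m}{T}$) are all assumptions made for illustration, and the toy log posterior replaces the actual kernel of $p\left(\theta,\lambda \mid \mathcal Y_T\right)$, which is not reproduced here.

```python
import numpy as np

def rw_metropolis(log_post, x0, step, n_draws, rng):
    """Random-walk Metropolis with Gaussian increments of standard deviation `step`."""
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    draws = np.empty((n_draws, x.size))
    accepted = 0
    for i in range(n_draws):
        proposal = x + step * rng.standard_normal(x.size)
        lp_proposal = log_post(proposal)
        # Accept with probability min(1, exp(lp_proposal - lp)).
        if np.log(rng.uniform()) < lp_proposal - lp:
            x, lp = proposal, lp_proposal
            accepted += 1
        draws[i] = x
    return draws, accepted / n_draws

lam_min = 0.1  # hypothetical lower bound, standing in for (mp + m) / T

def toy_log_post(x):
    """Placeholder log kernel for (theta, lambda); NOT the DSGE-VAR posterior."""
    theta, lam = x
    if lam < lam_min:
        return -np.inf  # proposals below the bound are always rejected
    return -0.5 * theta**2 - (lam - lam_min)

rng = np.random.default_rng(2)
draws, acc_rate = rw_metropolis(toy_log_post, x0=[0.0, 1.0], step=0.8,
                                n_draws=5000, rng=rng)
lambda_draws = draws[:, 1]  # posterior draws of lambda, treated like any other parameter
```

Treating $\lambda$ as one more coordinate of the chain means its posterior uncertainty is averaged over, rather than fixed at a single grid value.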

\par{All in all, this estimation procedure makes it possible to select the
  \textquotedblleft best specification\textquotedblright{} from a
  continuum of intermediate models indexed by $\lambda$, ranging
  from the VAR($p$) with diffuse priors to the VAR($p$)
  approximation of the DSGE model. The posterior distribution of the
  deep parameters, $\theta$, can then be interpreted as describing the
  DSGE model best suited to serve as a prior for the corresponding
  VAR($p$). In addition,
  the posterior distribution of $\lambda$ gives an indication of the
  reliability of the DSGE model and of the empirical relevance of the
  associated economic restrictions.}\newline

\par{Notice that, when $\lambda$ is close to its lowest possible
  value, the DSGE-VAR approximates an unrestricted VAR with diffuse
  priors. Given the number of observed variables in our study, such a
  VAR is likely to perform poorly empirically because of its many free
  parameters and the associated sampling errors. The econometric
  literature has shown that Bayesian VARs can improve on unrestricted
  VARs by introducing, for example, \textquotedblleft
  Minnesota-like\textquotedblright{} priors, which favor persistence, low
  cross-variable interactions and smaller coefficients at distant
  lags. Therefore, the DSGE-VAR approach is bound to call for a higher
  $\lambda$ simply to tighten the priors on, for instance, serial
  correlations, which does not necessarily mean that the
  economic restrictions of the DSGE are more consistent with the data.
  A way to circumvent this problem would be to introduce another
  source of dummy observations, coming from a BVAR with some version of
  Minnesota priors, and to let the procedure infer the relative weights to
  put on the DSGE priors and on the \textquotedblleft Minnesota
  priors\textquotedblright. This avenue is left for further
  research.}\newline