doubleml.datasets.make_pliv_CHS2015

doubleml.datasets.make_pliv_CHS2015(n_obs, alpha=1.0, dim_x=200, dim_z=150, return_type='DoubleMLData')

Generates data from a partially linear IV regression model used in Chernozhukov, Hansen and Spindler (2015). The data generating process is defined as

\[\begin{aligned}z_i &= \Pi x_i + \zeta_i,\\d_i &= x_i' \gamma + z_i' \delta + u_i,\\y_i &= \alpha d_i + x_i' \beta + \varepsilon_i,\end{aligned}\]

with

\[\begin{split}\left(\begin{matrix} \varepsilon_i \\ u_i \\ \zeta_i \\ x_i \end{matrix} \right) \sim \mathcal{N}\left(0, \left(\begin{matrix} 1 & 0.6 & 0 & 0 \\ 0.6 & 1 & 0 & 0 \\ 0 & 0 & 0.25 I_{p_n^z} & 0 \\ 0 & 0 & 0 & \Sigma \end{matrix} \right) \right)\end{split}\]

where \(\Sigma\) is a \(p_n^x \times p_n^x\) matrix with entries \(\Sigma_{kj} = 0.5^{|j-k|}\) and \(I_{p_n^z}\) is the \(p_n^z \times p_n^z\) identity matrix. \(\beta = \gamma\) is a \(p_n^x\)-vector with entries \(\beta_j=\frac{1}{j^2}\), \(\delta\) is a \(p_n^z\)-vector with entries \(\delta_j=\frac{1}{j^2}\) and \(\Pi = (I_{p_n^z}, 0_{p_n^z \times (p_n^x - p_n^z)})\).
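The data generating process above can be sketched directly in NumPy. This is a minimal illustration written from the formulas on this page, not the library's own implementation: the function name `simulate_pliv_chs2015`, the `seed` argument, and the use of `numpy.random.default_rng` are assumptions made for the example.

```python
import numpy as np


def simulate_pliv_chs2015(n_obs, alpha=1.0, dim_x=200, dim_z=150, seed=None):
    """Draw (x, y, d, z) from the partially linear IV model (illustrative sketch)."""
    if dim_z > dim_x:
        # Pi = (I, 0) has dim_x - dim_z zero columns, so dim_z cannot exceed dim_x.
        raise ValueError("dim_z must not exceed dim_x")
    rng = np.random.default_rng(seed)  # hypothetical seeding choice for this sketch

    # (epsilon_i, u_i): unit variances with covariance 0.6
    cov_eps_u = np.array([[1.0, 0.6], [0.6, 1.0]])
    eps_u = rng.multivariate_normal(np.zeros(2), cov_eps_u, size=n_obs)
    epsilon, u = eps_u[:, 0], eps_u[:, 1]

    # x_i ~ N(0, Sigma) with Sigma_kj = 0.5^{|j - k|}
    idx = np.arange(dim_x)
    sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    x = rng.multivariate_normal(np.zeros(dim_x), sigma, size=n_obs)

    # zeta_i ~ N(0, 0.25 I): independent entries with standard deviation 0.5
    zeta = 0.5 * rng.standard_normal((n_obs, dim_z))

    # beta_j = gamma_j = 1 / j^2 and delta_j = 1 / j^2
    beta = 1.0 / np.arange(1, dim_x + 1) ** 2
    gamma = beta
    delta = 1.0 / np.arange(1, dim_z + 1) ** 2

    # Pi = (I, 0): the instruments load on the first dim_z covariates
    pi = np.hstack([np.eye(dim_z), np.zeros((dim_z, dim_x - dim_z))])

    z = x @ pi.T + zeta
    d = x @ gamma + z @ delta + u
    y = alpha * d + x @ beta + epsilon
    return x, y, d, z


x, y, d, z = simulate_pliv_chs2015(n_obs=500, dim_x=20, dim_z=15, seed=42)
print(x.shape, y.shape, d.shape, z.shape)  # (500, 20) (500,) (500,) (500, 15)
```

Because \(\Pi = (I_{p_n^z}, 0)\), each instrument is simply one of the first \(p_n^z\) covariates plus independent noise, which is why `z - x[:, :dim_z]` recovers \(\zeta_i\) in the sketch.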

Parameters:
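  • n_obs – The number of observations to simulate.

  • alpha – The value of the causal parameter \(\alpha\). Default is 1.0.

  • dim_x – The number of covariates, \(p_n^x\). Default is 200.

  • dim_z – The number of instruments, \(p_n^z\). Default is 150. Since \(\Pi\) contains \(p_n^x - p_n^z\) zero columns, dim_z must not exceed dim_x.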

  • return_type – The format of the returned data. Default is 'DoubleMLData'.

    If 'DoubleMLData' or DoubleMLData, returns a DoubleMLData object.

    If 'DataFrame', 'pd.DataFrame' or pd.DataFrame, returns a pd.DataFrame.

    If 'array', 'np.ndarray', 'np.array' or np.ndarray, returns a tuple of np.ndarray objects (x, y, d, z).

References

Chernozhukov, V., Hansen, C. and Spindler, M. (2015), Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments. American Economic Review: Papers and Proceedings, 105 (5): 486-90.