# doubleml.datasets.make_pliv_CHS2015

doubleml.datasets.make_pliv_CHS2015(n_obs, alpha=1.0, dim_x=200, dim_z=150, return_type='DoubleMLData')

Generates data from a partially linear IV regression model used in Chernozhukov, Hansen and Spindler (2015). The data generating process is defined as

\begin{align}\begin{aligned}z_i &= \Pi x_i + \zeta_i,\\d_i &= x_i' \gamma + z_i' \delta + u_i,\\y_i &= \alpha d_i + x_i' \beta + \varepsilon_i,\end{aligned}\end{align}

with

$\begin{split}\left(\begin{matrix} \varepsilon_i \\ u_i \\ \zeta_i \\ x_i \end{matrix} \right) \sim \mathcal{N}\left(0, \left(\begin{matrix} 1 & 0.6 & 0 & 0 \\ 0.6 & 1 & 0 & 0 \\ 0 & 0 & 0.25 I_{p_n^z} & 0 \\ 0 & 0 & 0 & \Sigma \end{matrix} \right) \right)\end{split}$

where $$\Sigma$$ is a $$p_n^x \times p_n^x$$ matrix with entries $$\Sigma_{kj} = 0.5^{|j-k|}$$ and $$I_{p_n^z}$$ is the $$p_n^z \times p_n^z$$ identity matrix. Here $$p_n^x$$ is the number of covariates (dim_x) and $$p_n^z$$ is the number of instruments (dim_z). The coefficient vectors $$\beta = \gamma$$ are $$p_n^x$$-vectors with entries $$\beta_j=\frac{1}{j^2}$$, $$\delta$$ is a $$p_n^z$$-vector with entries $$\delta_j=\frac{1}{j^2}$$, and $$\Pi = (I_{p_n^z}, 0_{p_n^z \times (p_n^x - p_n^z)})$$, which requires $$p_n^x \geq p_n^z$$.
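The data generating process above can be sketched directly in NumPy. This is an illustration that follows the stated equations term by term, not the library's own implementation. The helper name `simulate_pliv` and its `seed` argument are hypothetical and do not exist in doubleml.

```python
import numpy as np


def simulate_pliv(n_obs, alpha=1.0, dim_x=200, dim_z=150, seed=None):
    """Hypothetical helper: draw (x, y, d, z) from the equations above."""
    if dim_x < dim_z:
        raise ValueError("Pi = (I, 0) requires dim_x >= dim_z")
    rng = np.random.default_rng(seed)

    # (epsilon_i, u_i): unit variances, covariance 0.6
    eps_u = rng.multivariate_normal(
        np.zeros(2), [[1.0, 0.6], [0.6, 1.0]], size=n_obs
    )
    epsilon, u = eps_u[:, 0], eps_u[:, 1]

    # x_i ~ N(0, Sigma) with Sigma_kj = 0.5^|j-k|
    idx = np.arange(dim_x)
    sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    x = rng.multivariate_normal(np.zeros(dim_x), sigma, size=n_obs)

    # zeta_i ~ N(0, 0.25 I), i.e. independent entries with std 0.5
    zeta = 0.5 * rng.standard_normal((n_obs, dim_z))

    # beta_j = gamma_j = 1 / j^2 and delta_j = 1 / j^2
    beta = 1.0 / np.arange(1, dim_x + 1) ** 2
    gamma = beta
    delta = 1.0 / np.arange(1, dim_z + 1) ** 2

    # Pi = (I, 0): the first dim_z covariates drive the instruments
    pi = np.hstack([np.eye(dim_z), np.zeros((dim_z, dim_x - dim_z))])

    z = x @ pi.T + zeta              # z_i = Pi x_i + zeta_i
    d = x @ gamma + z @ delta + u    # d_i = x_i' gamma + z_i' delta + u_i
    y = alpha * d + x @ beta + epsilon
    return x, y, d, z
```

Because $$\varepsilon_i$$ and $$u_i$$ are correlated, $$d_i$$ is endogenous in the outcome equation, while $$z_i$$ depends on the errors only through $$\zeta_i$$, which is independent of both.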

Parameters
• n_obs – The number of observations to simulate.

• alpha – The value of the causal parameter.

• dim_x – The number of covariates.

• dim_z – The number of instruments.

• return_type – The format of the returned data.

If 'DoubleMLData' or DoubleMLData, returns a DoubleMLData object.

If 'DataFrame', 'pd.DataFrame' or pd.DataFrame, returns a pd.DataFrame.

If 'array', 'np.ndarray', 'np.array' or np.ndarray, returns a tuple of np.ndarray objects (x, y, d, z).

References

Chernozhukov, V., Hansen, C. and Spindler, M. (2015), Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments. American Economic Review: Papers and Proceedings, 105 (5): 486-90.