doubleml.datasets.make_pliv_CHS2015
- doubleml.datasets.make_pliv_CHS2015(n_obs, alpha=1.0, dim_x=200, dim_z=150, return_type='DoubleMLData')
Generates data from a partially linear IV regression model used in Chernozhukov, Hansen and Spindler (2015). The data generating process is defined as
\[ \begin{align}\begin{aligned}z_i &= \Pi x_i + \zeta_i,\\d_i &= x_i' \gamma + z_i' \delta + u_i,\\y_i &= \alpha d_i + x_i' \beta + \varepsilon_i,\end{aligned}\end{align} \]
with
\[\begin{split}\left(\begin{matrix} \varepsilon_i \\ u_i \\ \zeta_i \\ x_i \end{matrix} \right) \sim \mathcal{N}\left(0, \left(\begin{matrix} 1 & 0.6 & 0 & 0 \\ 0.6 & 1 & 0 & 0 \\ 0 & 0 & 0.25 I_{p_n^z} & 0 \\ 0 & 0 & 0 & \Sigma \end{matrix} \right) \right)\end{split}\]
where \(\Sigma\) is a \(p_n^x \times p_n^x\) matrix with entries \(\Sigma_{kj} = 0.5^{|j-k|}\) and \(I_{p_n^z}\) is the \(p_n^z \times p_n^z\) identity matrix. \(\beta = \gamma\) is a \(p_n^x\)-vector with entries \(\beta_j=\frac{1}{j^2}\), \(\delta\) is a \(p_n^z\)-vector with entries \(\delta_j=\frac{1}{j^2}\) and \(\Pi = (I_{p_n^z}, 0_{p_n^z \times (p_n^x - p_n^z)})\). Here \(p_n^x\) corresponds to dim_x and \(p_n^z\) to dim_z, so \(\Pi\) is only defined when dim_x ≥ dim_z.
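The data generating process above can be simulated directly with NumPy, which makes each ingredient (the Toeplitz \(\Sigma\), the 0.6 correlation between \(\varepsilon_i\) and \(u_i\), and the selection matrix \(\Pi\)) explicit. This is a minimal sketch, assuming NumPy's `default_rng`; the helper `simulate_pliv_chs2015` and its `seed` argument are hypothetical and not part of doubleml, and its random draws will not match those produced by `make_pliv_CHS2015`.

```python
import numpy as np


def simulate_pliv_chs2015(n_obs, alpha=1.0, dim_x=200, dim_z=150, seed=None):
    """Draw (x, y, d, z) from the partially linear IV design described above.

    Hypothetical illustration only; this helper is not part of doubleml.
    """
    if dim_x < dim_z:
        raise ValueError("Pi = (I, 0) requires dim_x >= dim_z")
    rng = np.random.default_rng(seed)

    # (eps_i, u_i): unit variances with correlation 0.6, so d is endogenous
    cov_eps_u = np.array([[1.0, 0.6], [0.6, 1.0]])
    eps_u = rng.multivariate_normal(np.zeros(2), cov_eps_u, size=n_obs)
    eps, u = eps_u[:, 0], eps_u[:, 1]

    # x_i ~ N(0, Sigma) with Toeplitz entries Sigma_kj = 0.5^|j-k|
    idx = np.arange(dim_x)
    sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    x = rng.multivariate_normal(np.zeros(dim_x), sigma, size=n_obs)

    # zeta_i ~ N(0, 0.25 I): standard deviation 0.5, independent of the rest
    zeta = rng.normal(scale=0.5, size=(n_obs, dim_z))

    # beta = gamma with entries 1/j^2; delta has entries 1/j^2 over instruments
    beta = 1.0 / np.arange(1, dim_x + 1) ** 2
    gamma = beta
    delta = 1.0 / np.arange(1, dim_z + 1) ** 2
    # Pi = (I_{dim_z}, 0): instrument j loads only on covariate j
    pi = np.hstack([np.eye(dim_z), np.zeros((dim_z, dim_x - dim_z))])

    z = x @ pi.T + zeta
    d = x @ gamma + z @ delta + u
    y = alpha * d + x @ beta + eps
    return x, y, d, z


x, y, d, z = simulate_pliv_chs2015(n_obs=500, dim_x=20, dim_z=10, seed=0)
print(x.shape, y.shape, d.shape, z.shape)  # (500, 20) (500,) (500,) (500, 10)
```

Drawing \(\zeta_i\) with `rng.normal(scale=0.5, ...)` is equivalent to sampling from \(\mathcal{N}(0, 0.25 I)\) and avoids a Cholesky factorization of a diagonal matrix.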
- Parameters:
n_obs – The number of observations to simulate.
alpha – The value of the causal parameter.
dim_x – The number of covariates.
dim_z – The number of instruments.
return_type – If 'DoubleMLData' or DoubleMLData, returns a DoubleMLData object. If 'DataFrame', 'pd.DataFrame' or pd.DataFrame, returns a pd.DataFrame. If 'array', 'np.ndarray', 'np.array' or np.ndarray, returns the np.ndarray objects (x, y, d, z).
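With an array return type the function returns (x, y, d, z). Because \(u_i\) and \(\varepsilon_i\) have correlation 0.6, \(d_i\) is endogenous: OLS of y on (d, x) is biased, while the instruments z identify \(\alpha\). The sketch below, assuming that (x, y, d, z) layout, computes a textbook two-stage least squares (2SLS) estimate of \(\alpha\) with NumPy. It is plain low-dimensional 2SLS, not the DoubleML estimator, and the helper name `tsls_alpha` is hypothetical. It needs n_obs comfortably larger than dim_x + dim_z + 1, so with the defaults dim_x=200 and dim_z=150 a correspondingly large n_obs is required.

```python
import numpy as np


def tsls_alpha(x, y, d, z):
    """Textbook 2SLS estimate of alpha in y = alpha * d + x'beta + eps.

    x: exogenous controls, d: endogenous regressor, z: excluded instruments.
    Hypothetical helper for illustration; not part of doubleml.
    """
    n = y.shape[0]
    const = np.ones((n, 1))
    # Regressors [d, 1, x]; instruments [z, 1, x] (x serves as its own instrument)
    regressors = np.column_stack([d, const, x])
    instruments = np.column_stack([z, const, x])
    # First stage: project every regressor onto the instrument space
    coef_fs, *_ = np.linalg.lstsq(instruments, regressors, rcond=None)
    fitted = instruments @ coef_fs
    # Second stage: regress y on the fitted regressors; entry 0 is alpha
    coef_ss, *_ = np.linalg.lstsq(fitted, y, rcond=None)
    return coef_ss[0]
```

A typical call would be `tsls_alpha(*make_pliv_CHS2015(n_obs=5000, dim_x=10, dim_z=5, return_type='array'))`, which relies on the documented (x, y, d, z) order.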
References
Chernozhukov, V., Hansen, C. and Spindler, M. (2015), Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments. American Economic Review: Papers and Proceedings, 105 (5): 486-90.