3.2.2. doubleml.irm.datasets.make_iivm_data
- doubleml.irm.datasets.make_iivm_data(n_obs=500, dim_x=20, theta=1.0, alpha_x=0.2, return_type='DoubleMLData')
- Generates data from an interactive IV regression (IIVM) model. The data generating process is defined as

\[
\begin{aligned}
d_i &= 1\left\lbrace \alpha_x Z + v_i > 0 \right\rbrace,\\
y_i &= \theta d_i + x_i' \beta + u_i,
\end{aligned}
\]

with \(Z \sim \text{Bernoulli}(0.5)\) and

\[
\begin{pmatrix} u_i \\ v_i \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} 1 & 0.3 \\ 0.3 & 1 \end{pmatrix} \right).
\]

The covariates are drawn as \(x_i \sim \mathcal{N}(0, \Sigma)\), where \(\Sigma\) is a dim_x × dim_x matrix with entries \(\Sigma_{kj} = 0.5^{|j-k|}\), and \(\beta\) is a dim_x-vector with entries \(\beta_j = \frac{1}{j^2}\).

The data generating process is inspired by a process used in the simulation experiment of Farbmacher, Guber and Klaaßen (2020).

Parameters:
- n_obs – The number of observations to simulate. 
- dim_x – The number of covariates. 
- theta – The value of the causal parameter. 
- alpha_x – The value of the parameter \(\alpha_x\), which scales the effect of the instrument \(Z\) on the treatment \(d_i\).
- return_type – The type of the returned data:
  - If 'DoubleMLData' or DoubleMLData, returns a DoubleMLData object.
  - If 'DataFrame', 'pd.DataFrame' or pd.DataFrame, returns a pd.DataFrame.
  - If 'array', 'np.ndarray', 'np.array' or np.ndarray, returns the NumPy arrays (x, y, d, z).
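The data generating process above can be sketched directly in NumPy. This is an illustrative reimplementation, not the library's code: the function name `simulate_iivm` and its extra `seed` argument are hypothetical. It returns the tuple (x, y, d, z) in the same order as the array return type.

```python
import numpy as np


def simulate_iivm(n_obs=500, dim_x=20, theta=1.0, alpha_x=0.2, seed=None):
    """Hypothetical sketch of the IIVM data generating process.

    Not the DoubleML implementation; `seed` is an extra argument added
    here for reproducibility. Returns (x, y, d, z).
    """
    rng = np.random.default_rng(seed)

    # (u_i, v_i) are jointly normal with unit variances and correlation 0.3.
    errors = rng.multivariate_normal(
        mean=np.zeros(2), cov=np.array([[1.0, 0.3], [0.3, 1.0]]), size=n_obs
    )
    u, v = errors[:, 0], errors[:, 1]

    # Covariates x_i ~ N(0, Sigma) with Sigma_kj = 0.5^|j - k| (Toeplitz).
    idx = np.arange(dim_x)
    sigma = 0.5 ** np.abs(np.subtract.outer(idx, idx))
    x = rng.multivariate_normal(mean=np.zeros(dim_x), cov=sigma, size=n_obs)

    # Coefficients beta_j = 1 / j^2 for j = 1, ..., dim_x.
    beta = 1.0 / np.arange(1, dim_x + 1) ** 2

    # Binary instrument Z ~ Bernoulli(0.5), drawn once per observation.
    z = rng.binomial(n=1, p=0.5, size=n_obs)

    # Treatment: d_i = 1{alpha_x * Z + v_i > 0}.
    d = (alpha_x * z + v > 0).astype(float)

    # Outcome: y_i = theta * d_i + x_i' beta + u_i.
    y = theta * d + x @ beta + u
    return x, y, d, z
```

For example, `x, y, d, z = simulate_iivm(n_obs=1000, dim_x=5, seed=42)` yields a (1000, 5) covariate matrix and three length-1000 vectors. In this sketch, the 0.3 correlation between \(u_i\) and \(v_i\) is what makes \(d_i\) endogenous, while \(Z\) shifts \(d_i\) but does not enter \(y_i\) directly, which is the role an instrument plays.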
 
References:
- Farbmacher, H., Guber, R. and Klaaßen, S. (2020). Instrument Validity Tests with Causal Forests. MEA Discussion Paper No. 13-2020. Available at SSRN: http://dx.doi.org/10.2139/ssrn.3619201.
 
    
  
  
    