3.2.1. doubleml.irm.datasets.make_irm_data
- doubleml.irm.datasets.make_irm_data(n_obs=500, dim_x=20, theta=0, R2_d=0.5, R2_y=0.5, return_type='DoubleMLData')
- Generates data from an interactive regression (IRM) model. The data generating process is defined as

\[
\begin{aligned}
d_i &= 1\left\lbrace \frac{\exp(c_d x_i' \beta)}{1+\exp(c_d x_i' \beta)} > v_i \right\rbrace, & &v_i \sim \mathcal{U}(0,1),\\
y_i &= \theta d_i + c_y x_i' \beta d_i + \zeta_i, & &\zeta_i \sim \mathcal{N}(0,1),
\end{aligned}
\]

with covariates \(x_i \sim \mathcal{N}(0, \Sigma)\), where \(\Sigma\) is a matrix with entries \(\Sigma_{kj} = 0.5^{|j-k|}\). \(\beta\) is a dim_x-vector with entries \(\beta_j=\frac{1}{j^2}\), and the constants \(c_y\) and \(c_d\) are given by

\[c_y = \sqrt{\frac{R_y^2}{(1-R_y^2) \beta' \Sigma \beta}}, \qquad c_d = \sqrt{\frac{(\pi^2 /3) R_d^2}{(1-R_d^2) \beta' \Sigma \beta}}.\]

The data generating process is inspired by a process used in the simulation experiment (see Appendix P) of Belloni et al. (2017).

- Parameters:
- n_obs – The number of observations to simulate. 
- dim_x – The number of covariates. 
- theta – The value of the causal parameter. 
- R2_d – The value of the parameter \(R_d^2\). 
- R2_y – The value of the parameter \(R_y^2\). 
- return_type – The format of the returned data:
  - If 'DoubleMLData' or DoubleMLData, returns a DoubleMLData object.
  - If 'DataFrame', 'pd.DataFrame' or pd.DataFrame, returns a pd.DataFrame.
  - If 'array', 'np.ndarray', 'np.array' or np.ndarray, returns a tuple (x, y, d) of np.ndarray objects.
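One way to read the constants \(c_d\) and \(c_y\) is as signal-to-noise calibrations. The derivation below is a sketch under assumptions not stated in the source: the treatment equation is read as a latent-index (logit) model, and for \(c_y\) the interaction with \(d_i\) is ignored. Because \(v_i \sim \mathcal{U}(0,1)\) and the logistic function \(\Lambda(t) = \exp(t)/(1+\exp(t))\) is increasing, \(\Lambda(c_d x_i'\beta) > v_i\) holds exactly when \(c_d x_i'\beta > e_i\) with \(e_i = \log\{v_i/(1-v_i)\}\), which follows a standard logistic distribution with variance \(\pi^2/3\). With \(\operatorname{Var}(x_i'\beta) = \beta'\Sigma\beta\), matching the share of explained variance to the target values gives

\[
\begin{aligned}
R_d^2 &= \frac{c_d^2\, \beta'\Sigma\beta}{c_d^2\, \beta'\Sigma\beta + \pi^2/3}
&&\Longrightarrow\quad c_d^2 = \frac{(\pi^2/3)\, R_d^2}{(1-R_d^2)\, \beta'\Sigma\beta},\\
R_y^2 &= \frac{c_y^2\, \beta'\Sigma\beta}{c_y^2\, \beta'\Sigma\beta + 1}
&&\Longrightarrow\quad c_y^2 = \frac{R_y^2}{(1-R_y^2)\, \beta'\Sigma\beta},
\end{aligned}
\]

where the noise variances are \(\pi^2/3\) for \(e_i\) and \(1\) for \(\zeta_i\). Since the covariate term in \(y_i\) is multiplied by \(d_i\), \(R_y^2\) is best read as a nominal calibration target rather than the exact population \(R^2\) of the outcome equation.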
 
- References
  - Belloni, A., Chernozhukov, V., Fernández-Val, I. and Hansen, C. (2017). Program Evaluation and Causal Inference With High-Dimensional Data. Econometrica, 85: 233-298.
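The following NumPy sketch draws (x, y, d) from the data generating process stated above, which may help when checking what the generated arrays represent. `simulate_irm` is a hypothetical helper written for illustration only; it is not part of DoubleML, it uses NumPy's `Generator` API, and it is not expected to reproduce the random draws of make_irm_data for a given seed. It returns the tuple (x, y, d) in the same order as the array return type.

```python
import numpy as np


def simulate_irm(n_obs=500, dim_x=20, theta=0.0, R2_d=0.5, R2_y=0.5, seed=None):
    """Hypothetical illustration of the IRM design; not the DoubleML implementation."""
    rng = np.random.default_rng(seed)

    # Toeplitz covariance matrix: Sigma[k, j] = 0.5 ** |j - k|
    idx = np.arange(dim_x)
    sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])

    # Coefficients beta_j = 1 / j^2 for j = 1, ..., dim_x
    beta = 1.0 / np.arange(1, dim_x + 1) ** 2
    b_sigma_b = beta @ sigma @ beta

    # Scaling constants c_y and c_d as defined in the formulas above
    c_y = np.sqrt(R2_y / ((1.0 - R2_y) * b_sigma_b))
    c_d = np.sqrt((np.pi ** 2 / 3.0) * R2_d / ((1.0 - R2_d) * b_sigma_b))

    # Covariates x_i ~ N(0, Sigma) and the linear index x_i' beta
    x = rng.multivariate_normal(np.zeros(dim_x), sigma, size=n_obs)
    index = x @ beta

    # Treatment: exp(t) / (1 + exp(t)) written as 1 / (1 + exp(-t)), compared with U(0, 1) draws
    propensity = 1.0 / (1.0 + np.exp(-c_d * index))
    v = rng.uniform(size=n_obs)
    d = (propensity > v).astype(float)

    # Outcome: effect theta plus a d-interacted covariate term and N(0, 1) noise
    zeta = rng.standard_normal(size=n_obs)
    y = theta * d + c_y * index * d + zeta
    return x, y, d


x, y, d = simulate_irm(n_obs=1000, dim_x=5, theta=0.5, seed=42)
print(x.shape, y.shape, d.shape)  # (1000, 5) (1000,) (1000,)
```

Passing an explicit `seed` to a local `Generator` keeps the sketch reproducible without touching NumPy's global random state.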
 
    
  
  
    