3.2.3. doubleml.irm.datasets.make_heterogeneous_data
- doubleml.irm.datasets.make_heterogeneous_data(n_obs=200, p=30, support_size=5, n_x=1, binary_treatment=False)
- Creates a simple synthetic example for heterogeneous treatment effects. The data generating process is based on the Monte Carlo simulation from Oprescu et al. (2019).

The data is generated as

\[
\begin{aligned}
Y_i & = \theta_0(X_i)D_i + \langle X_i,\gamma_0\rangle + \epsilon_i\\
D_i & = \langle X_i,\beta_0\rangle + \eta_i,
\end{aligned}
\]

where \(X_i\sim\mathcal{U}[0,1]^{p}\) and \(\epsilon_i,\eta_i \sim\mathcal{U}[-1,1]\). If the treatment is set to be binary, the treatment is instead generated as

\[D_i = 1\{\langle X_i,\beta_0\rangle \ge \eta_i\}.\]

The coefficient vectors \(\gamma_0\) and \(\beta_0\) share the same small, randomly chosen support. Their nonzero values are drawn independently from \(\mathcal{U}[0,1]\) and \(\mathcal{U}[0,0.3]\), respectively. Further, \(\theta_0(x)\) denotes the conditional treatment effect, whose form depends on the dimension of the heterogeneity.

If the heterogeneity is univariate, the conditional treatment effect takes the form

\[\theta_0(x) = \exp(2x_0) + 3\sin(4x_0),\]

whereas in the two-dimensional case it is defined as

\[\theta_0(x) = \exp(2x_0) + 3\sin(4x_1).\]

- Parameters:
- n_obs (int) – Number of observations to simulate. Default is 200.
- p (int) – Dimension of covariates. Default is 30.
- support_size (int) – Number of relevant (confounding) covariates. Default is 5.
- n_x (int) – Dimension of the heterogeneity. Can be either 1 or 2. Default is 1.
- binary_treatment (bool) – Indicates whether the treatment is binary. Default is False.
 
- Returns:
- res_dict – Dictionary with entries data, effects, and treatment_effect.
- Return type:
- dictionary 
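
To make the equations above concrete, the following is a minimal NumPy sketch of the described data generating process. It is an illustrative re-implementation written from the formulas in this entry, not the library's own code. The names sketch_heterogeneous_dgp and theta_0, the seed argument, and the keys of the returned dictionary (y, d, x, effects) are assumptions made for this example and differ from the entries data, effects, and treatment_effect documented above. It also assumes that \(\gamma_0\) is drawn from \(\mathcal{U}[0,1]\) and \(\beta_0\) from \(\mathcal{U}[0,0.3]\), in the order the text lists them.

```python
import numpy as np


def theta_0(x, n_x=1):
    """Conditional treatment effect, following the two formulas above."""
    # The sine term uses x_0 in the univariate case and x_1 in the 2-d case.
    sine_arg = x[:, 0] if n_x == 1 else x[:, 1]
    return np.exp(2 * x[:, 0]) + 3 * np.sin(4 * sine_arg)


def sketch_heterogeneous_dgp(n_obs=200, p=30, support_size=5, n_x=1,
                             binary_treatment=False, seed=None):
    """Hypothetical helper: simulate the documented DGP with plain NumPy."""
    if n_x not in (1, 2):
        raise ValueError("n_x must be either 1 or 2.")
    if support_size > p:
        raise ValueError("support_size must not exceed p.")
    rng = np.random.default_rng(seed)

    # gamma_0 and beta_0 share one small random support.
    support = rng.choice(p, size=support_size, replace=False)
    gamma_0 = np.zeros(p)
    beta_0 = np.zeros(p)
    gamma_0[support] = rng.uniform(0.0, 1.0, size=support_size)  # assumed U[0, 1]
    beta_0[support] = rng.uniform(0.0, 0.3, size=support_size)   # assumed U[0, 0.3]

    # Covariates and noise terms.
    x = rng.uniform(0.0, 1.0, size=(n_obs, p))
    epsilon = rng.uniform(-1.0, 1.0, size=n_obs)
    eta = rng.uniform(-1.0, 1.0, size=n_obs)

    # Treatment: continuous by default, thresholded if binary.
    if binary_treatment:
        d = (x @ beta_0 >= eta).astype(float)
    else:
        d = x @ beta_0 + eta

    # Outcome equation.
    effects = theta_0(x, n_x)
    y = effects * d + x @ gamma_0 + epsilon
    return {"y": y, "d": d, "x": x, "effects": effects}


sample = sketch_heterogeneous_dgp(n_obs=500, seed=0)
print(sample["y"].shape, sample["x"].shape)
```

Because the sketch returns the true per-observation effects alongside the simulated data, an estimate of \(\theta_0(x)\) can be compared directly against the known ground truth, which is the usual purpose of a synthetic example like this one.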
 
 
    
  
  
    