3.2.4. doubleml.irm.datasets.make_irm_data_discrete_treatments
- doubleml.irm.datasets.make_irm_data_discrete_treatments(n_obs=200, n_levels=3, linear=False, random_state=None, **kwargs)
- Generates data from an interactive regression model (IRM) with multiple treatment levels (based on an underlying continuous treatment).
- The data generating process is defined as follows (similar to the Monte Carlo simulation used in Sant’Anna and Zhao (2020)).
- Let \(X= (X_1, X_2, X_3, X_4, X_5)^T \sim \mathcal{N}(0, \Sigma)\), where \(\Sigma\) is the identity matrix. Further, define \(Z_j = (\tilde{Z}_j - \mathbb{E}[\tilde{Z}_j]) / \sqrt{\text{Var}(\tilde{Z}_j)}\) for \(j = 1, \dots, 5\), where
\[ \begin{align}\begin{aligned}\tilde{Z}_1 &= \exp(0.5 \cdot X_1)\\
\tilde{Z}_2 &= 10 + X_2/(1 + \exp(X_1))\\
\tilde{Z}_3 &= (0.6 + X_1 \cdot X_3 / 25)^3\\
\tilde{Z}_4 &= (20 + X_2 + X_4)^2\\
\tilde{Z}_5 &= X_5.\end{aligned}\end{align} \]
- A continuous treatment \(D_{\text{cont}}\) is generated as
\[D_{\text{cont}} = \xi (-Z_1 + 0.5 Z_2 - 0.25 Z_3 - 0.1 Z_4) + \varepsilon_D,\]
- where \(\varepsilon_D \sim \mathcal{N}(0,1)\) and \(\xi=0.3\). The corresponding treatment effect is defined as
\[\theta (d) = 0.1 \exp(d) + 10 \sin(0.7 d) + 2 d - 0.2 d^2.\]
- Based on the continuous treatment, a discrete treatment \(D\) is generated with a baseline level of \(D=0\) and additional levels based on the quantiles of \(D_{\text{cont}}\). The number of levels is defined by \(n_{\text{levels}}\). Each level is chosen to have the same probability of being selected.
- The potential outcomes are defined as
\[ \begin{align}\begin{aligned}Y(0) &= 210 + 27.4 Z_1 + 13.7 (Z_2 + Z_3 + Z_4) + \varepsilon_Y\\
Y(1) &= \theta (D_{\text{cont}}) 1\{D_{\text{cont}} > 0\} + Y(0),\end{aligned}\end{align} \]
- where \(\varepsilon_Y \sim \mathcal{N}(0,5)\). Further, the observed outcome is defined as
\[Y = Y(1) 1\{D > 0\} + Y(0) 1\{D = 0\}.\]
- The data is returned as a dictionary with the entries x, y, d and oracle_values.
- Parameters:
- Returns:
- res_dict – Dictionary with entries x, y, d and oracle_values. The oracle values contain the continuous treatment, the level bounds, the potential level, the individual treatment effect (ITE) and the potential outcome without treatment.
- Return type:
- dictionary 
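The treatment effect \(\theta(d)\) and the quantile-based construction of treatment levels described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names theta and potential_levels are hypothetical, the continuous treatment is replaced by the noise term \(\varepsilon_D\) alone (the covariate part is omitted), and the random assignment to the baseline level \(D=0\) is not modelled.

```python
import math
import random
import statistics
from bisect import bisect_right


def theta(d):
    """Treatment effect theta(d) as given in the description above."""
    return 0.1 * math.exp(d) + 10 * math.sin(0.7 * d) + 2 * d - 0.2 * d**2


def potential_levels(d_cont, n_levels):
    """Assign each value to one of n_levels quantile bins, labelled 1..n_levels.

    Assumption: illustrative binning only, not the library's code.
    """
    # statistics.quantiles returns the n_levels - 1 interior cut points
    cuts = statistics.quantiles(d_cont, n=n_levels)
    return [bisect_right(cuts, value) + 1 for value in d_cont]


rng = random.Random(42)
# Stand-in for D_cont: only eps_D ~ N(0, 1); the covariate term is omitted
d_cont = [rng.gauss(0.0, 1.0) for _ in range(300)]

levels = potential_levels(d_cont, n_levels=3)
counts = {level: levels.count(level) for level in sorted(set(levels))}

print(counts)       # number of observations per quantile-based level
print(theta(0.0))   # treatment effect at d = 0
```

With 300 draws and three levels, each quantile bin receives about one third of the observations, which mirrors the statement that each level is equally likely. The value \(\theta(0) = 0.1\) follows directly from the formula, since every term except \(0.1 \exp(0)\) vanishes.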