doubleml.datasets.make_irm_data_discrete_treatments
- doubleml.datasets.make_irm_data_discrete_treatments(n_obs=200, n_levels=3, linear=False, random_state=None, **kwargs)
Generates data from an interactive regression model (IRM) with multiple treatment levels (based on an underlying continuous treatment).
The data generating process is defined as follows (similar to the Monte Carlo simulation used in Sant’Anna and Zhao (2020)).
Let \(X= (X_1, X_2, X_3, X_4, X_5)^T \sim \mathcal{N}(0, \Sigma)\), where \(\Sigma\) corresponds to the identity matrix. Further, define \(Z_j = (\tilde{Z}_j - \mathbb{E}[\tilde{Z}_j]) / \sqrt{\text{Var}(\tilde{Z}_j)}\), where
\[ \begin{align}\begin{aligned}\tilde{Z}_1 &= \exp(0.5 \cdot X_1)\\\tilde{Z}_2 &= 10 + X_2/(1 + \exp(X_1))\\\tilde{Z}_3 &= (0.6 + X_1 \cdot X_3 / 25)^3\\\tilde{Z}_4 &= (20 + X_2 + X_4)^2\\\tilde{Z}_5 &= X_5.\end{aligned}\end{align} \]
A continuous treatment \(D_{\text{cont}}\) is generated as
\[D_{\text{cont}} = \xi (-Z_1 + 0.5 Z_2 - 0.25 Z_3 - 0.1 Z_4) + \varepsilon_D,\]
where \(\varepsilon_D \sim \mathcal{N}(0,1)\) and \(\xi=0.3\). The corresponding treatment effect is defined as
\[\theta (d) = 0.1 \exp(d) + 10 \sin(0.7 d) + 2 d - 0.2 d^2.\]
Based on the continuous treatment, a discrete treatment \(D\) is generated with a baseline level of \(D=0\) and additional levels based on the quantiles of \(D_{\text{cont}}\). The number of levels is defined by \(n_{\text{levels}}\). Each level is chosen to have the same probability of being selected.
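As a rough illustration of the two ingredients above, the following NumPy sketch evaluates \(\theta(d)\) and splits a continuous treatment into equally likely quantile bins. It is not the library's implementation: the helper names `theta` and `quantile_levels`, the seed, and the standard normal stand-in for \(D_{\text{cont}}\) are assumptions made for this example. How units are assigned to the baseline level \(D=0\) is not spelled out in the text, so the sketch covers only the quantile binning.

```python
import numpy as np


def theta(d):
    """Treatment effect theta(d) as written in the formula above."""
    return 0.1 * np.exp(d) + 10.0 * np.sin(0.7 * d) + 2.0 * d - 0.2 * d**2


def quantile_levels(d_cont, n_bins):
    """Hypothetical helper: bin d_cont into n_bins equally likely quantile bins.

    Returns integer bin labels 0, ..., n_bins - 1 and the n_bins + 1 bounds.
    """
    bounds = np.quantile(d_cont, np.linspace(0.0, 1.0, n_bins + 1))
    # Only the interior bounds are needed to assign a bin label.
    labels = np.digitize(d_cont, bounds[1:-1])
    return labels, bounds


rng = np.random.default_rng(42)
d_cont = rng.normal(size=1000)  # stand-in for the continuous treatment

levels, bounds = quantile_levels(d_cont, n_bins=3)
effects = theta(d_cont)

print(np.bincount(levels))  # number of observations per bin
print(bounds)               # quantile bounds, including min and max
```

Because the bounds are empirical quantiles of the same sample, each bin receives roughly the same number of observations.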
The potential outcomes are defined as
\[ \begin{align}\begin{aligned}Y(0) &= 210 + 27.4 Z_1 + 13.7 (Z_2 + Z_3 + Z_4) + \varepsilon_Y\\Y(1) &= \theta (D_{\text{cont}}) 1\{D_{\text{cont}} > 0\} + Y(0),\end{aligned}\end{align} \]
where \(\varepsilon_Y \sim \mathcal{N}(0,5)\). Further, the observed outcome is defined as
\[Y = Y(1) 1\{D > 0\} + Y(0) 1\{D = 0\}.\]
The data is returned as a dictionary with the entries x, y, d and oracle_values.
- Parameters:
- Returns:
res_dict – Dictionary with entries x, y, d and oracle_values. The oracle values contain the continuous treatment, the level bounds, the potential level, the individual treatment effect (ITE) and the potential outcome without treatment.
- Return type:
dictionary
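To make the data generating process described above more concrete, the following NumPy sketch simulates the covariates, the standardized transformations \(Z_j\), the continuous treatment \(D_{\text{cont}}\) and the untreated potential outcome \(Y(0)\). It is an illustration under stated assumptions, not the function's actual code: the variable names and the seed are made up for this example, sample moments are used in place of \(\mathbb{E}[\tilde{Z}_j]\) and \(\text{Var}(\tilde{Z}_j)\), and the 5 in \(\mathcal{N}(0,5)\) is read as a variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, xi = 1000, 0.3

# Covariates X ~ N(0, I_5)
x = rng.standard_normal(size=(n_obs, 5))

# Nonlinear transformations Z~_1, ..., Z~_5 from the definition above
z_tilde = np.column_stack([
    np.exp(0.5 * x[:, 0]),
    10 + x[:, 1] / (1 + np.exp(x[:, 0])),
    (0.6 + x[:, 0] * x[:, 2] / 25) ** 3,
    (20 + x[:, 1] + x[:, 3]) ** 2,
    x[:, 4],
])

# Assumption: standardize with sample mean and standard deviation
z = (z_tilde - z_tilde.mean(axis=0)) / z_tilde.std(axis=0)

# Continuous treatment D_cont with standard normal noise
eps_d = rng.standard_normal(n_obs)
d_cont = xi * (-z[:, 0] + 0.5 * z[:, 1] - 0.25 * z[:, 2] - 0.1 * z[:, 3]) + eps_d

# Potential outcome without treatment Y(0); noise variance 5 (assumed reading)
eps_y = rng.normal(0.0, np.sqrt(5.0), size=n_obs)
y0 = 210 + 27.4 * z[:, 0] + 13.7 * (z[:, 1] + z[:, 2] + z[:, 3]) + eps_y

print(z.mean(axis=0).round(3), z.std(axis=0).round(3))
print(d_cont[:5], y0[:5])
```

Since the \(Z_j\) are centered, the sample mean of `y0` stays close to the intercept 210, which gives a quick sanity check on the simulation.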