Generates data from an interactive regression model (IRM). The data generating process is defined as
\(d_i = 1\left\lbrace \frac{\exp(c_d x_i' \beta)}{1+\exp(c_d x_i' \beta)} > v_i \right\rbrace,\)
\( y_i = \theta d_i + c_y x_i' \beta d_i + \zeta_i,\)
with \(v_i \sim \mathcal{U}(0,1)\), \(\zeta_i \sim \mathcal{N}(0,1)\)
and covariates \(x_i \sim \mathcal{N}(0, \Sigma)\), where \(\Sigma\)
is a matrix with entries \(\Sigma_{kj} = 0.5^{|j-k|}\).
\(\beta\) is a dim_x-vector with entries \(\beta_j = \frac{1}{j^2}\),
and the constants \(c_y\) and \(c_d\) are given by
\( c_y = \sqrt{\frac{R_y^2}{(1-R_y^2) \beta' \Sigma \beta}},\)
\(c_d = \sqrt{\frac{(\pi^2 /3) R_d^2}{(1-R_d^2) \beta' \Sigma \beta}}.\)
The data generating process is inspired by a process used in the simulation experiment (see Appendix P) of Belloni et al. (2017).
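As a rough illustration, the process described above can be sketched in base R. This is not the package's implementation: the function name simulate_irm_sketch is hypothetical, and drawing the covariates through a Cholesky factor of \(\Sigma\) is an assumption made here to avoid extra dependencies.

```r
# Hypothetical helper: a minimal sketch of the IRM data generating process.
simulate_irm_sketch <- function(n_obs = 500, dim_x = 20, theta = 0,
                                R2_d = 0.5, R2_y = 0.5) {
  idx <- seq_len(dim_x)

  # Sigma_kj = 0.5^|j - k| and beta_j = 1 / j^2
  Sigma <- 0.5^abs(outer(idx, idx, "-"))
  beta <- 1 / idx^2

  # Scaling constants c_y and c_d from R2_y and R2_d
  b_sigma_b <- drop(crossprod(beta, Sigma %*% beta))
  c_y <- sqrt(R2_y / ((1 - R2_y) * b_sigma_b))
  c_d <- sqrt((pi^2 / 3) * R2_d / ((1 - R2_d) * b_sigma_b))

  # x_i ~ N(0, Sigma): standard normal draws times the upper Cholesky factor
  z <- matrix(rnorm(n_obs * dim_x), nrow = n_obs, ncol = dim_x)
  x <- z %*% chol(Sigma)
  xb <- drop(x %*% beta)

  # Treatment: d_i = 1{ logistic(c_d * x_i' beta) > v_i }, v_i ~ U(0, 1)
  d <- as.numeric(plogis(c_d * xb) > runif(n_obs))

  # Outcome: y_i = theta * d_i + c_y * x_i' beta * d_i + zeta_i, zeta_i ~ N(0, 1)
  y <- theta * d + c_y * xb * d + rnorm(n_obs)

  list(X = x, y = y, d = d)
}

set.seed(3141)
sim <- simulate_irm_sketch(n_obs = 1000, theta = 0.5)
```

The sketch returns a plain list rather than a DoubleMLData object, so it is meant for inspecting the formulas and not as a replacement for make_irm_data().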
Usage
make_irm_data(
  n_obs = 500,
  dim_x = 20,
  theta = 0,
  R2_d = 0.5,
  R2_y = 0.5,
  return_type = "DoubleMLData"
)
Arguments
- n_obs
(integer(1))
The number of observations to simulate.
- dim_x
(integer(1))
The number of covariates.
- theta
(numeric(1))
The value of the causal parameter.
- R2_d
(numeric(1))
The value of the parameter \(R_d^2\).
- R2_y
(numeric(1))
The value of the parameter \(R_y^2\).
- return_type
(character(1))
If "DoubleMLData", returns a DoubleMLData object. If "data.frame", returns a data.frame(). If "data.table", returns a data.table(). If "matrix", a named list() with entries X, y, d and z is returned. Every entry in the list is a matrix() object. Default is "DoubleMLData".
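
A short usage sketch of the return_type argument, assuming the DoubleML package is installed. The seed and the non-default values for n_obs and theta are chosen only for illustration.

```r
library(DoubleML)

set.seed(3141)

# Request a plain data.frame instead of the default DoubleMLData object
df <- make_irm_data(n_obs = 1000, theta = 0.5, return_type = "data.frame")
dim(df)

# Request a named list of matrices and inspect the covariate matrix
mats <- make_irm_data(n_obs = 1000, theta = 0.5, return_type = "matrix")
str(mats$X)
```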