Generates data from a sample selection model. The data generating process is defined under Details.
Usage
make_ssm_data(
  n_obs = 8000,
  dim_x = 100,
  theta = 1,
  mar = TRUE,
  return_type = "DoubleMLData"
)
Arguments
- n_obs (integer(1))
  The number of observations to simulate.
- dim_x (integer(1))
  The number of covariates.
- theta (numeric(1))
  The value of the causal parameter.
- mar (logical(1))
  Indicates whether missingness at random holds.
- return_type (character(1))
  If "DoubleMLData", returns a DoubleMLData object. If "data.frame", returns a data.frame(). If "data.table", returns a data.table(). Default is "DoubleMLData".
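A minimal call might look as follows. It assumes the DoubleML package is installed and attached; the seed and the argument values are illustrative choices, not recommendations from the documentation.

```r
library(DoubleML)

set.seed(3141)

# Smaller than the defaults so the example runs quickly
df <- make_ssm_data(
  n_obs = 500,
  dim_x = 20,
  theta = 1,
  mar = TRUE,
  return_type = "data.frame"
)

# Inspect the dimensions of the simulated data
dim(df)
```

Requesting "data.frame" here simply makes the result easy to inspect with base R; the default "DoubleMLData" return type is the one to use when the data are passed on to a DoubleML model.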
Details
$$ y_i = \theta d_i + x_i' \beta + u_i,$$
$$s_i = 1\lbrace d_i + \gamma z_i + x_i' \beta + v_i > 0 \rbrace,$$
$$d_i = 1\lbrace x_i' \beta + w_i > 0 \rbrace,$$
with \(y_i\) being observed if \(s_i = 1\) and covariates \(x_i \sim \mathcal{N}(0, \Sigma^2_x)\), where
\(\Sigma^2_x\) is a matrix with entries \(\Sigma_{kj} = 0.5^{|j-k|}\).
\(\beta\) is a dim_x-vector with entries \(\beta_j=\frac{0.4}{j^2}\),
\(z_i \sim \mathcal{N}(0, 1)\),
\((u_i,v_i) \sim \mathcal{N}(0, \Sigma^2_{u,v})\), and
\(w_i \sim \mathcal{N}(0, 1)\).
The data generating process is inspired by a process used in the simulation study (see Appendix E) of Bia, Huber and Lafférs (2023).
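The equations in Details can be sketched in base R as below. This is not the package's implementation: `sketch_ssm_dgp`, `gamma`, and `rho` are hypothetical names introduced for illustration. The text above does not state the value of \(\gamma\) or the entries of \(\Sigma^2_{u,v}\), so the sketch exposes them as arguments and assumes unit variances with correlation `rho`. How `mar` maps onto these quantities is also not stated here, so it is left out.

```r
# Base-R sketch of the data generating process described in Details.
# ASSUMPTIONS: gamma and rho are hypothetical arguments; (u, v) are taken to
# have unit variances and correlation rho (requires -1 < rho < 1).
sketch_ssm_dgp <- function(n_obs = 1000, dim_x = 10, theta = 1,
                           gamma = 1, rho = 0) {
  # Covariates: x_i ~ N(0, Sigma_x) with entries Sigma_kj = 0.5^|j - k|
  sigma_x <- toeplitz(0.5^(0:(dim_x - 1)))
  x <- matrix(rnorm(n_obs * dim_x), n_obs, dim_x) %*% chol(sigma_x)
  colnames(x) <- paste0("X", seq_len(dim_x))

  # Coefficients: beta_j = 0.4 / j^2
  beta <- 0.4 / seq_len(dim_x)^2
  xb <- drop(x %*% beta)

  # Errors: (u_i, v_i) bivariate normal, w_i and z_i standard normal
  sigma_uv <- matrix(c(1, rho, rho, 1), 2, 2)
  uv <- matrix(rnorm(n_obs * 2), n_obs, 2) %*% chol(sigma_uv)
  u <- uv[, 1]
  v <- uv[, 2]
  w <- rnorm(n_obs)
  z <- rnorm(n_obs)

  # Treatment, selection, and outcome equations
  d <- as.numeric(xb + w > 0)
  s <- as.numeric(d + gamma * z + xb + v > 0)
  y <- theta * d + xb + u

  # y_i is observed only if s_i = 1; coding the rest as NA is a choice of this sketch
  y[s == 0] <- NA

  data.frame(y = y, d = d, s = s, z = z, x)
}

set.seed(1)
sim <- sketch_ssm_dgp(n_obs = 200, dim_x = 5)
head(sim)
```

The Cholesky factor returned by `chol()` is upper triangular, so multiplying standard normal draws by it from the right yields rows with the requested covariance without needing an additional package.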