Generates data from an interactive regression model (IRM). The data generating process is defined as

$$d_i = 1\left\{ \frac{\exp(c_d x_i' \beta)}{1 + \exp(c_d x_i' \beta)} > v_i \right\},$$

$$y_i = \theta d_i + c_y x_i' \beta d_i + \zeta_i,$$

with $v_i \sim U(0,1)$, $\zeta_i \sim N(0,1)$ and covariates $x_i \sim N(0, \Sigma)$, where $\Sigma$ is a matrix with entries $\Sigma_{kj} = 0.5^{|j-k|}$. $\beta$ is a dim_x-vector with entries $\beta_j = \frac{1}{j^2}$ and the constants $c_y$ and $c_d$ are given by

$$c_y = \sqrt{\frac{R_y^2}{(1 - R_y^2)\, \beta' \Sigma \beta}},$$

$$c_d = \sqrt{\frac{(\pi^2/3)\, R_d^2}{(1 - R_d^2)\, \beta' \Sigma \beta}}.$$
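
Solving the two definitions of the constants for $R_y^2$ and $R_d^2$ shows what these parameters control. The rearrangement is plain algebra. Reading the result as a variance share is an interpretation added here, not a statement from the package documentation: $c^2 \beta' \Sigma \beta$ is the variance of the scaled index $c\, x_i' \beta$, 1 is the variance of $\zeta_i$, and $\pi^2/3$ is the variance of a standard logistic distribution.

$$
R_y^2 = \frac{c_y^2\, \beta' \Sigma \beta}{c_y^2\, \beta' \Sigma \beta + 1},
\qquad
R_d^2 = \frac{c_d^2\, \beta' \Sigma \beta}{c_d^2\, \beta' \Sigma \beta + \pi^2/3}.
$$

Because the index in the outcome equation is multiplied by $d_i$, $R_y^2$ should be read as a nominal signal-to-noise setting rather than the exact share of explained variance in $y_i$.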

The data generating process is inspired by a process used in the simulation experiment (see Appendix P) of Belloni et al. (2017).
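
To make the definition concrete, the process can be sketched in base R. This is a minimal illustration written from the formulas on this page, not the package's own implementation. The function name `simulate_irm_sketch` and the returned list layout are hypothetical, and drawing the covariates through a Cholesky factor is one of several equivalent ways to sample from $N(0, \Sigma)$.

```r
# Hypothetical helper that follows the formulas above; not the package source.
simulate_irm_sketch <- function(n_obs = 500, dim_x = 20, theta = 0,
                                R2_d = 0.5, R2_y = 0.5) {
  idx <- seq_len(dim_x)

  # Covariance matrix with entries Sigma[k, j] = 0.5^|j - k|
  Sigma <- 0.5^abs(outer(idx, idx, "-"))

  # Coefficient vector with entries beta_j = 1 / j^2
  beta <- 1 / idx^2

  # Scaling constants c_y and c_d
  b_sigma_b <- drop(t(beta) %*% Sigma %*% beta)
  c_y <- sqrt(R2_y / ((1 - R2_y) * b_sigma_b))
  c_d <- sqrt((pi^2 / 3) * R2_d / ((1 - R2_d) * b_sigma_b))

  # x_i ~ N(0, Sigma): chol() returns upper-triangular R with t(R) %*% R = Sigma
  z <- matrix(rnorm(n_obs * dim_x), nrow = n_obs)
  x <- z %*% chol(Sigma)

  index <- drop(x %*% beta)  # x_i' beta
  v <- runif(n_obs)          # v_i ~ U(0, 1)
  zeta <- rnorm(n_obs)       # zeta_i ~ N(0, 1)

  # plogis(q) equals exp(q) / (1 + exp(q))
  d <- as.numeric(plogis(c_d * index) > v)
  y <- theta * d + c_y * index * d + zeta

  list(X = x, y = y, d = d, c_y = c_y, c_d = c_d)
}

set.seed(1)
sim <- simulate_irm_sketch(n_obs = 1000, dim_x = 5, theta = 0.5)
str(sim[c("y", "d")])
```

The sketch returns the constants alongside the data so that their values can be checked against the formulas.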

Usage

make_irm_data(
  n_obs = 500,
  dim_x = 20,
  theta = 0,
  R2_d = 0.5,
  R2_y = 0.5,
  return_type = "DoubleMLData"
)
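
A short call might look as follows. This is only a sketch: it assumes the DoubleML package is installed (the guard skips the call otherwise), and the seed, the value `theta = 0.5` and the variable name `irm_df` are illustrative choices rather than recommendations.

```r
# Runs only if the DoubleML package is available.
if (requireNamespace("DoubleML", quietly = TRUE)) {
  set.seed(3141)

  # Request a plain data.frame instead of the default DoubleMLData object
  irm_df <- DoubleML::make_irm_data(
    n_obs = 500,
    dim_x = 20,
    theta = 0.5,
    return_type = "data.frame"
  )

  print(dim(irm_df))      # number of rows and columns
  print(table(irm_df$d))  # counts of the binary treatment indicator
}
```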

Arguments

n_obs

(integer(1))
The number of observations to simulate.

dim_x

(integer(1))
The number of covariates.

theta

(numeric(1))
The value of the causal parameter.

R2_d

(numeric(1))
The value of the parameter $R_d^2$.

R2_y

(numeric(1))
The value of the parameter $R_y^2$.

return_type

(character(1))
If "DoubleMLData", returns a DoubleMLData object. If "data.frame", returns a data.frame(). If "data.table", returns a data.table(). If "matrix", returns a named list() with entries X, y, d and z; every entry in the list is a matrix() object. Default is "DoubleMLData".

References

Belloni, A., Chernozhukov, V., Fernández-Val, I. and Hansen, C. (2017). Program Evaluation and Causal Inference With High-Dimensional Data. Econometrica, 85: 233-298.