3.2.14. doubleml.rdd.datasets.make_simple_rdd_data

doubleml.rdd.datasets.make_simple_rdd_data(n_obs=5000, p=4, fuzzy=True, binary_outcome=False, **kwargs)

Generates synthetic data for a regression discontinuity design (RDD) analysis. The data generating process is defined as

\[ \begin{align}\begin{aligned}Y_0 &= g_0 + g_{cov} + \epsilon_0,\\Y_1 &= g_1 + g_{cov} + \epsilon_1,\\g_0 &= 0.1 \cdot \text{score}^2,\\g_1 &= \tau + 0.1 \cdot \text{score}^2 - 0.5 \cdot \text{score}^2 + a \sum_{i=1}^{\text{dim}_x} X_i \cdot \text{score},\\g_{cov} &= \sum_{i=1}^{\text{dim}_x} \text{Polynomial}(X_i),\end{aligned}\end{align} \]

with random noise \(\epsilon_0, \epsilon_1 \sim \mathcal{N}(0, 0.2^2)\) and \(X_i\) being drawn independently from a uniform distribution.
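
The equations above can be sketched in plain Python to show how the pieces combine. This is an illustrative sketch only, not the library's implementation: the helper names (polynomial, g0, g1, g_cov, potential_outcomes) are hypothetical, the form of Polynomial(X_i) is assumed to be a sum of powers with unit coefficients, the uniform range [-1, 1] and the standard normal score are assumptions, and the cutoff is taken to be 0.

```python
import random


def polynomial(x, degree):
    # ASSUMPTION: stand-in for Polynomial(X_i); the exact coefficients
    # are not given in the text, so unit coefficients are used here.
    return sum(x ** k for k in range(1, degree + 1))


def g0(score):
    # g_0 = 0.1 * score^2
    return 0.1 * score ** 2


def g1(score, x, tau=1.0, a=0.0):
    # g_1 = tau + 0.1 * score^2 - 0.5 * score^2 + a * sum_i X_i * score
    interaction = a * sum(xi * score for xi in x)
    return tau + 0.1 * score ** 2 - 0.5 * score ** 2 + interaction


def g_cov(x, p=4):
    # g_cov = sum_i Polynomial(X_i); with p = 0 the sum is empty,
    # so there is no covariate effect.
    return sum(polynomial(xi, p) for xi in x)


def potential_outcomes(score, x, rng, tau=1.0, a=0.0, p=4, noise_sd=0.2):
    # Y_0 and Y_1 share g_cov but have independent N(0, 0.2^2) noise.
    eps0 = rng.gauss(0.0, noise_sd)
    eps1 = rng.gauss(0.0, noise_sd)
    cov = g_cov(x, p)
    y0 = g0(score) + cov + eps0
    y1 = g1(score, x, tau, a) + cov + eps1
    return y0, y1


rng = random.Random(42)
dim_x = 3
x = [rng.uniform(-1.0, 1.0) for _ in range(dim_x)]  # ASSUMPTION: range [-1, 1]
score = rng.gauss(0.0, 1.0)                         # ASSUMPTION: normal score
y0, y1 = potential_outcomes(score, x, rng)

# At score = 0 every score-dependent term vanishes, so the noise-free
# difference g_1 - g_0 equals tau, whatever the value of a.
jump_at_cutoff = g1(0.0, x, tau=1.0, a=0.5) - g0(0.0)
```

Under these assumptions, the noise-free jump in the conditional mean at a score of 0 equals \(\tau\), because the quadratic and interaction terms all vanish there.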

Parameters:

n_obs (int) – Number of observations to generate. Default is 5000.
p (int) – Degree of the polynomial for covariates. Default is 4. If zero, no covariate effect is considered.
fuzzy (bool) – If True, generates data for a fuzzy RDD. Default is True.
binary_outcome (bool) – If True, generates binary outcomes based on a logistic transformation. Default is False.
**kwargs – Additional keyword arguments:

cutoff (float) – The cutoff value for the score. Default is 0.0.
dim_x (int) – The number of independent covariates. Default is 3.
a (float) – Factor controlling the interaction of score and covariates in the outcome equation. Default is 0.0.
tau (float) – Parameter controlling the true effect in the generated data at the given cutoff. Default is 1.0.

Returns:

res_dict – Dictionary with entries score, X, Y, D, and oracle_values. The oracle values contain the potential outcomes.

Return type:

dictionary