doubleml.rdd.datasets.make_simple_rdd_data
- doubleml.rdd.datasets.make_simple_rdd_data(n_obs=5000, p=4, fuzzy=True, binary_outcome=False, **kwargs)
Generates synthetic data for a regression discontinuity design (RDD) analysis. The data generating process is defined as
\[ \begin{aligned}Y_0 &= g_0 + g_{cov} + \epsilon_0,\\Y_1 &= g_1 + g_{cov} + \epsilon_1,\\g_0 &= 0.1 \cdot \text{score}^2,\\g_1 &= \tau + 0.1 \cdot \text{score}^2 - 0.5 \cdot \text{score}^2 + a \sum_{i=1}^{\text{dim}_x} X_i \cdot \text{score},\\g_{cov} &= \sum_{i=1}^{\text{dim}_x} \text{Polynomial}(X_i),\end{aligned} \] with random noise \(\epsilon_0, \epsilon_1 \sim \mathcal{N}(0, 0.2^2)\) and \(X_i\) being drawn independently from a uniform distribution.
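The potential-outcome equations above can be sketched directly in NumPy. This is only an illustration of the formulas, not the library's implementation: the helper name `potential_outcomes`, the standard normal score, and the uniform range [-1, 1] for the covariates are assumptions, and the covariate term g_cov is set to zero (the documented `p=0` case) because the exact form of the polynomial is not specified here.

```python
import numpy as np


def potential_outcomes(score, X, tau=1.0, a=0.0, g_cov=None, rng=None):
    """Hypothetical helper: evaluate g_0, g_1, Y_0 and Y_1 from the equations above."""
    rng = np.random.default_rng() if rng is None else rng
    n = score.shape[0]
    if g_cov is None:
        # Corresponds to p=0: no covariate effect.
        g_cov = np.zeros(n)

    g0 = 0.1 * score**2
    g1 = tau + 0.1 * score**2 - 0.5 * score**2 + a * X.sum(axis=1) * score

    # Independent noise terms with standard deviation 0.2.
    y0 = g0 + g_cov + rng.normal(0.0, 0.2, size=n)
    y1 = g1 + g_cov + rng.normal(0.0, 0.2, size=n)
    return g0, g1, y0, y1


rng = np.random.default_rng(0)
score = rng.normal(size=2000)               # assumption: standard normal score
X = rng.uniform(-1.0, 1.0, size=(2000, 3))  # assumption: uniform on [-1, 1]
g0, g1, y0, y1 = potential_outcomes(score, X, tau=1.0, a=0.0, rng=rng)
```

With `a = 0`, the noise-free gap `g1 - g0` equals `tau - 0.5 * score**2`, so it reduces to `tau` at a score of zero, which is the default cutoff.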
- Parameters:
n_obs (int) – Number of observations to generate. Default is 5000.
p (int) – Degree of the polynomial for covariates. Default is 4. If zero, no covariate effect is considered.
fuzzy (bool) – If True, generates data for a fuzzy RDD. Default is True.
binary_outcome (bool) – If True, generates binary outcomes based on a logistic transformation. Default is False.
**kwargs – Additional keyword arguments:
- cutoff (float) – The cutoff value for the score. Default is 0.0.
- dim_x (int) – The number of independent covariates. Default is 3.
- a (float) – Factor to control interaction of score and covariates in the outcome equation. Default is 0.0.
- tau (float) – Parameter to control the true effect in the generated data at the given cutoff. Default is 1.0.
- Returns:
res_dict – Dictionary with entries score, X, Y, D, and oracle_values. The oracle values contain the potential outcomes.
- Return type:
dictionary