doubleml.rdd.datasets.make_simple_rdd_data

doubleml.rdd.datasets.make_simple_rdd_data(n_obs=5000, p=4, fuzzy=True, binary_outcome=False, **kwargs)

Generates synthetic data for a regression discontinuity design (RDD) analysis. The data generating process is defined as

\[ \begin{align}\begin{aligned}Y_0 &= g_0 + g_{cov} + \epsilon_0,\\Y_1 &= g_1 + g_{cov} + \epsilon_1,\\g_0 &= 0.1 \cdot \text{score}^2,\\g_1 &= \tau + 0.1 \cdot \text{score}^2 - 0.5 \cdot \text{score}^2 + a \sum_{i=1}^{\text{dim}_x} X_i \cdot \text{score},\\g_{cov} &= \sum_{i=1}^{\text{dim}_x} \text{Polynomial}(X_i),\end{aligned}\end{align} \]

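with random noise \(\epsilon_0, \epsilon_1 \sim \mathcal{N}(0, 0.2^2)\) and covariates \(X_i\) drawn independently from a uniform distribution. The symbols \(\tau\), \(a\), and \(\text{dim}_x\) correspond to the keyword arguments tau, a, and dim_x, and the polynomial degree is set by p.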
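The equations above can be sketched in plain Python to make the roles of the parameters concrete. This is an illustrative re-implementation under stated assumptions, not the library's code: the function name sketch_potential_outcomes is hypothetical, the score is assumed to be standard normal, the covariates are assumed uniform on [-1, 1], and \(\text{Polynomial}(X_i)\) is assumed to be \(X_i + X_i^2 + \dots + X_i^p\). None of these details are specified in the text above.

```python
import random


def sketch_potential_outcomes(n_obs=1000, p=4, dim_x=3, a=0.0, tau=1.0, seed=0):
    """Draw (score, x, y0, y1) tuples following the equations above (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_obs):
        score = rng.gauss(0.0, 1.0)  # assumption: standard normal score
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim_x)]  # assumption: U(-1, 1)

        # assumption: Polynomial(X_i) = X_i + X_i^2 + ... + X_i^p (empty sum if p == 0)
        g_cov = sum(x_i ** k for x_i in x for k in range(1, p + 1))

        g0 = 0.1 * score ** 2
        g1 = tau + 0.1 * score ** 2 - 0.5 * score ** 2 + a * sum(x) * score

        # independent noise terms with standard deviation 0.2
        y0 = g0 + g_cov + rng.gauss(0.0, 0.2)
        y1 = g1 + g_cov + rng.gauss(0.0, 0.2)
        rows.append((score, x, y0, y1))
    return rows


rows = sketch_potential_outcomes(n_obs=2000, tau=1.0)

# At score = 0 the difference g_1 - g_0 equals tau, so units with a score
# close to 0 should show an average Y1 - Y0 near tau (up to noise).
near_cutoff = [y1 - y0 for score, _, y0, y1 in rows if abs(score) < 0.1]
print(len(near_cutoff), sum(near_cutoff) / len(near_cutoff))
```

Because \(g_{cov}\) enters both potential outcomes identically, it cancels in \(Y_1 - Y_0\); the covariates affect the individual effect only through the interaction term scaled by a.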

Parameters:
  • n_obs (int) – Number of observations to generate. Default is 5000.

  • p (int) – Degree of the polynomial for covariates. Default is 4. If zero, no covariate effect is considered.

  • fuzzy (bool) – If True, generates data for a fuzzy RDD. Default is True.

  • binary_outcome (bool) – If True, generates binary outcomes based on a logistic transformation. Default is False.

  • **kwargs – Additional keyword arguments:

      • cutoff (float) – The cutoff value for the score. Default is 0.0.

      • dim_x (int) – The number of independent covariates. Default is 3.

      • a (float) – Factor to control interaction of score and covariates in the outcome equation. Default is 0.0.

      • tau (float) – Parameter to control the true effect in the generated data at the given cutoff. Default is 1.0.

Returns:

res_dict – Dictionary with entries score, X, Y, D, and oracle_values. The oracle values contain the potential outcomes.

Return type:

dictionary