doubleml.rdd.datasets.make_simple_rdd_data

doubleml.rdd.datasets.make_simple_rdd_data(n_obs=5000, p=4, fuzzy=True, binary_outcome=False, **kwargs)

Generates synthetic data for a regression discontinuity design (RDD) analysis.

\[\begin{aligned}
Y_0 &= g_0 + g_{cov} + \epsilon_0 \\
Y_1 &= g_1 + g_{cov} + \epsilon_1 \\
g_0 &= 0.1 \cdot \text{score}^2 \\
g_1 &= \tau + 0.1 \cdot \text{score}^2 - 0.5 \cdot \text{score}^2 \\
g_{cov} &= \sum_{i=1}^{\text{dim\_x}} \text{Polynomial}(X_i) \\
\epsilon_0, \epsilon_1 &\sim \mathcal{N}(0, 0.2^2)
\end{aligned}\]
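The equations above can be sketched in plain Python to show where the discontinuity comes from. This is an illustrative reading of the formulas, not the library's implementation: the helper names `g0`, `g1`, `effect` and `potential_outcomes` are hypothetical, and `g_cov` is passed in as a plain number because the polynomial coefficients are not stated here. Since `g_cov` enters both potential outcomes identically, it cancels in the difference, which equals `tau` at a score of 0 (the default cutoff).

```python
import random


def g0(score):
    # Untreated outcome function: 0.1 * score^2
    return 0.1 * score**2


def g1(score, tau=1.0):
    # Treated outcome function, written term by term as in the formula
    return tau + 0.1 * score**2 - 0.5 * score**2


def effect(score, tau=1.0):
    # Noise-free difference between treated and untreated outcomes;
    # g_cov is omitted because it cancels.
    return g1(score, tau) - g0(score)


def potential_outcomes(score, g_cov=0.0, tau=1.0, sigma=0.2, rng=random):
    # g_cov is a placeholder value (assumption), not the library's polynomial.
    eps0 = rng.gauss(0.0, sigma)
    eps1 = rng.gauss(0.0, sigma)
    y0 = g0(score) + g_cov + eps0
    y1 = g1(score, tau) + g_cov + eps1
    return y0, y1


# At score 0 the difference is exactly tau; away from 0 it shrinks by 0.5 * score^2.
print(effect(0.0, tau=1.0))
print(effect(1.0, tau=1.0))
```

Under this reading, the noise-free effect away from a score of 0 is `tau - 0.5 * score**2`.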
Parameters:
  • n_obs (int) – Number of observations to generate. Default is 5000.

  • p (int) – Degree of the polynomial for covariates. Default is 4.

  • fuzzy (bool) – If True, generates data for a fuzzy RDD. Default is True.

  • binary_outcome (bool) – If True, generates binary outcomes. Default is False.

  • **kwargs – Additional keyword arguments:

    • cutoff (float) – The cutoff value for the score. Default is 0.0.

    • dim_x (int) – The number of independent covariates. Default is 3.

    • a (float) – Factor controlling the interaction between score and covariates in the outcome equation. Default is 0.0.

    • tau (float) – Parameter controlling the true effect in the generated data at the given cutoff. Default is 1.0.

Returns:

A dictionary containing the generated data with keys:

  • ‘score’ (np.ndarray) – The running variable.

  • ‘X’ (np.ndarray) – The independent covariates.

  • ‘Y0’ (np.ndarray) – The potential outcomes without treatment.

  • ‘Y1’ (np.ndarray) – The potential outcomes with treatment.

  • ‘intended_treatment’ (np.ndarray) – The intended treatment assignment.

Return type:

dict