3.2.14. doubleml.rdd.datasets.make_simple_rdd_data
- doubleml.rdd.datasets.make_simple_rdd_data(n_obs=5000, p=4, fuzzy=True, binary_outcome=False, **kwargs)
- Generates synthetic data for a regression discontinuity design (RDD) analysis. The data generating process is defined as
  \[ \begin{aligned}
  Y_0 &= g_0 + g_{cov} + \epsilon_0,\\
  Y_1 &= g_1 + g_{cov} + \epsilon_1,\\
  g_0 &= 0.1 \cdot \text{score}^2,\\
  g_1 &= \tau + 0.1 \cdot \text{score}^2 - 0.5 \cdot \text{score}^2 + a \sum_{i=1}^{\text{dim}_x} X_i \cdot \text{score},\\
  g_{cov} &= \sum_{i=1}^{\text{dim}_x} \text{Polynomial}(X_i),
  \end{aligned} \]
  with random noise \(\epsilon_0, \epsilon_1 \sim \mathcal{N}(0, 0.2^2)\) and \(X_i\) drawn independently from a uniform distribution.
- Parameters:
- n_obs (int) – Number of observations to generate. Default is 5000. 
- p (int) – Degree of the polynomial for covariates. Default is 4. If zero, no covariate effect is considered. 
- fuzzy (bool) – If True, generates data for a fuzzy RDD. Default is True. 
- binary_outcome (bool) – If True, generates binary outcomes based on a logistic transformation. Default is False. 
- **kwargs – Additional keyword arguments:
  - cutoff (float) – The cutoff value for the score. Default is 0.0.
  - dim_x (int) – The number of independent covariates. Default is 3.
  - a (float) – Factor to control the interaction of score and covariates in the outcome equation. Default is 0.0.
  - tau (float) – Parameter to control the true effect in the generated data at the given cutoff. Default is 1.0.
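To see how tau and a enter the effect, the outcome equations in the description can be differenced directly. This is a short derivation from the stated formulas for the continuous-outcome case (binary_outcome=False), assuming the noise terms have mean zero and are independent of the score and the covariates; it is not a statement about the library's internals. For a score value \(s\):

\[ \begin{aligned}
\mathbb{E}[Y_1 - Y_0 \mid \text{score} = s, X] &= g_1 - g_0 \\
&= \tau - 0.5 \cdot s^2 + a \cdot s \sum_{i=1}^{\text{dim}_x} X_i .
\end{aligned} \]

At the default cutoff \(s = 0\), both score-dependent terms vanish and the conditional effect equals \(\tau\). Taken at face value, the same formulas give \(\tau - 0.5 \cdot c^2 + a \cdot c \sum_{i} X_i\) at a nonzero cutoff \(c\).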
 
 
- Returns:
- res_dict – Dictionary with entries score, X, Y, D, and oracle_values. The oracle values contain the potential outcomes.
- Return type:
- dictionary 
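The outcome equations above can be sketched in plain NumPy to make the structure of the returned dictionary concrete. This is a minimal illustration, not the library's implementation. The function name `sketch_simple_rdd` and the `seed` argument are hypothetical. The standard normal score, the Uniform(-1, 1) covariates, the reading of Polynomial(X_i) as a sum of powers 1 to p, the sharp treatment rule `score >= cutoff`, and the key names inside `oracle_values` are all assumptions, because the documentation does not specify them. Fuzzy assignment and binary outcomes are left out.

```python
import numpy as np


def sketch_simple_rdd(n_obs=5000, p=4, cutoff=0.0, dim_x=3, a=0.0, tau=1.0, seed=None):
    """Sharp-design sketch of the documented outcome equations (not the library code)."""
    rng = np.random.default_rng(seed)

    # ASSUMPTION: standard normal score; the documentation does not state its distribution.
    score = rng.normal(0.0, 1.0, size=n_obs)
    # ASSUMPTION: Uniform(-1, 1) covariates; only "uniform" is documented.
    X = rng.uniform(-1.0, 1.0, size=(n_obs, dim_x))

    # ASSUMPTION: Polynomial(X_i) is taken to be X_i + X_i^2 + ... + X_i^p.
    # With p = 0 the loop does not run, so g_cov stays zero (no covariate effect).
    g_cov = np.zeros(n_obs)
    for k in range(1, p + 1):
        g_cov += (X ** k).sum(axis=1)

    # Outcome equations as documented.
    g0 = 0.1 * score ** 2
    g1 = tau + 0.1 * score ** 2 - 0.5 * score ** 2 + a * X.sum(axis=1) * score

    # Independent Gaussian noise with standard deviation 0.2.
    Y0 = g0 + g_cov + rng.normal(0.0, 0.2, size=n_obs)
    Y1 = g1 + g_cov + rng.normal(0.0, 0.2, size=n_obs)

    # ASSUMPTION: sharp design, a unit is treated when its score is at or above the cutoff.
    D = (score >= cutoff).astype(int)
    # Observed outcome: the potential outcome matching the treatment status.
    Y = np.where(D == 1, Y1, Y0)

    # ASSUMPTION: the key names inside oracle_values are hypothetical.
    return {"score": score, "X": X, "Y": Y, "D": D,
            "oracle_values": {"Y0": Y0, "Y1": Y1}}


data = sketch_simple_rdd(n_obs=2000, seed=1)
print(data["X"].shape)  # (2000, 3)
```

Going by the documented signature, the library function itself would be imported with `from doubleml.rdd.datasets import make_simple_rdd_data` and called as, for example, `make_simple_rdd_data(n_obs=5000, fuzzy=False, tau=1.0)`. Its return value has the same top-level keys as the sketch.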
 
 
    
  
  
    