doubleml.rdd.RDFlex

class doubleml.rdd.RDFlex(obj_dml_data, ml_g, ml_m=None, fuzzy=False, cutoff=0, n_folds=5, n_rep=1, h_fs=None, fs_specification='cutoff', fs_kernel='triangular', **kwargs)

Flexible adjustment with double machine learning for regression discontinuity designs

Parameters:
  • obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.

  • ml_g (estimator implementing fit() and predict()) – A machine learner that implements fit() and predict() and supports sample weights (e.g. sklearn.ensemble.RandomForestRegressor), used for the nuisance functions \(g_0^{\pm}(X) = E[Y|\text{score}=\text{cutoff}^{\pm}, X]\). The adjustment function is then defined as \(\eta_0(X) = (g_0^{+}(X) + g_0^{-}(X))/2\).

  • ml_m (classifier implementing fit() and predict_proba() or None) – A machine learner that implements fit() and predict_proba() and supports sample weights (e.g. sklearn.ensemble.RandomForestClassifier), used for the nuisance functions \(m_0^{\pm}(X) = E[D|\text{score}=\text{cutoff}^{\pm}, X]\). The adjustment function is then defined as \(\eta_0(X) = (m_0^{+}(X) + m_0^{-}(X))/2\). Use None for a non-fuzzy (sharp) design. Default is None.

  • fuzzy (bool) – Indicates whether to fit a fuzzy or a sharp design. In a fuzzy design, the intended treatment defined by the cutoff can diverge from the actual treatment given by obj_dml_data.d. Default is False.

  • n_folds (int) – Number of folds. Default is 5.

  • n_rep (int) – Number of repetitions for the sample splitting. Default is 1.

  • cutoff (float or int) – A float or int specifying the cutoff in the score. Default is 0.

  • h_fs (float or None) – Initial bandwidth in the first stage estimation. If None, then the optimal bandwidth without covariates will be used. Default is None.

  • fs_specification (str) – Specification of the first stage regression. The options are 'cutoff', 'cutoff and score', and 'interacted cutoff and score'. Default is 'cutoff'.

  • fs_kernel (str) – Kernel for the first stage estimation. 'uniform', 'triangular', and 'epanechnikov' are supported. Default is 'triangular'.

  • **kwargs (kwargs) – Keyword arguments that are not used within RDFlex but are passed directly to rdrobust.
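
The three fs_specification names suggest which score-based terms enter the first stage regression alongside the covariates. As a rough illustration only, the sketch below builds such a design matrix with NumPy. The helper first_stage_features, the "score >= cutoff" convention for the intended treatment, and the column order are assumptions made for this example; they are not taken from the RDFlex implementation.

```python
import numpy as np


def first_stage_features(score, X, cutoff=0.0, fs_specification="cutoff"):
    """Hypothetical helper: stack score-based terms in front of the covariates X."""
    score = np.asarray(score, dtype=float)
    X = np.asarray(X, dtype=float)
    centered = score - cutoff
    # Assumed convention: units with score >= cutoff are intended to be treated.
    above = (centered >= 0).astype(float)

    if fs_specification == "cutoff":
        extra = [above]
    elif fs_specification == "cutoff and score":
        extra = [above, centered]
    elif fs_specification == "interacted cutoff and score":
        extra = [above, centered, above * centered]
    else:
        raise ValueError(f"Unknown fs_specification: {fs_specification!r}")

    # 1-D terms become columns; X keeps its own columns.
    return np.column_stack(extra + [X])


score = np.array([-1.0, 0.5, 2.0])
X = np.ones((3, 2))
Z = first_stage_features(score, X, fs_specification="interacted cutoff and score")
print(Z.shape)  # (3, 5): indicator, score, interaction, two covariates
```

Under these assumptions, the richer specifications give the learner the centered score itself, and in the interacted case a separate slope on each side of the cutoff.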

Examples

>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.rdd.datasets import make_simple_rdd_data
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(123)
>>> data_dict = make_simple_rdd_data(fuzzy=True)
>>> obj_dml_data = dml.DoubleMLData.from_arrays(x=data_dict["X"], y=data_dict["Y"], d=data_dict["D"], s=data_dict["score"])
>>> ml_g = RandomForestRegressor()
>>> ml_m = RandomForestClassifier()
>>> rdflex_obj = dml.rdd.RDFlex(obj_dml_data, ml_g, ml_m, fuzzy=True)
>>> print(rdflex_obj.fit())
Method             Coef.     S.E.     t-stat       P>|t|           95% CI
-------------------------------------------------------------------------
Conventional      0.935     0.220     4.244    2.196e-05  [0.503, 1.367]
Robust                 -        -     3.635    2.785e-04  [0.418, 1.396]

Methods

aggregate_over_splits()

confint([level])

Confidence intervals for RDFlex models.

fit([n_iterations])

Estimate RDFlex model.

Attributes

all_coef

Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit().

all_se

Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit().

ci

Confidence intervals for the causal parameter(s) after calling fit().

coef

Estimates for the causal parameter after calling fit().

cutoff

Cutoff at which the treatment effect is estimated.

fs_kernel

Kernel for the first stage estimation.

fuzzy

Indicates whether the design is fuzzy or not.

h

Array of final bandwidths in the last stage estimation (shape (n_rep,)).

h_fs

Initial bandwidth in the first stage estimation.

n_folds

Number of folds.

n_rep

Number of repetitions for the sample splitting.

pval

p-values for the causal parameter(s) after calling fit().

se

Standard errors for the causal parameter(s) after calling fit().

t_stat

t-statistics for the causal parameter(s) after calling fit().

w

Weights for the first stage estimation.
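
To make the roles of cutoff, the bandwidth, and fs_kernel more concrete, the following sketch computes kernel weights from a score using the textbook definitions of the uniform, triangular, and Epanechnikov kernels. The function kernel_weights is a hypothetical helper written for this illustration; the scaling and normalization RDFlex applies to w internally may differ.

```python
import numpy as np


def kernel_weights(score, cutoff=0.0, h=1.0, kernel="triangular"):
    """Hypothetical helper: textbook kernel weights for observations near the cutoff."""
    # Distance from the cutoff, measured in units of the bandwidth h.
    u = (np.asarray(score, dtype=float) - cutoff) / h
    inside = np.abs(u) <= 1  # observations outside the bandwidth get weight 0

    if kernel == "uniform":
        return 0.5 * inside
    if kernel == "triangular":
        return (1 - np.abs(u)) * inside
    if kernel == "epanechnikov":
        return 0.75 * (1 - u**2) * inside
    raise ValueError(f"Unknown kernel: {kernel!r}")


score = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
w_tri = kernel_weights(score, cutoff=0.0, h=1.0, kernel="triangular")
w_epa = kernel_weights(score, cutoff=0.0, h=1.0, kernel="epanechnikov")
w_uni = kernel_weights(score, cutoff=0.0, h=1.0, kernel="uniform")
```

Weights of this kind can be passed to a learner's fit() as sample weights, which is why ml_g and ml_m need to support them.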

RDFlex.aggregate_over_splits()

RDFlex.confint(level=0.95)

Confidence intervals for RDFlex models.

Parameters:

level (float) – The confidence level. Default is 0.95.

Returns:

df_ci – A data frame with the confidence interval(s).

Return type:

pd.DataFrame
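
As a sanity check on how level relates to the interval width, the sketch below recomputes a normal-approximation interval from the rounded "Conventional" coefficient and standard error printed in the example output (0.935 and 0.220). The function normal_confint is a hypothetical helper using only the Python standard library. Whether confint() uses exactly this formula is an assumption, and the "Robust" row cannot be reproduced this way because its coefficient and standard error are not displayed.

```python
from statistics import NormalDist


def normal_confint(coef, se, level=0.95):
    """Hypothetical helper: symmetric interval coef +/- z * se under a normal approximation."""
    # Two-sided critical value, e.g. about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return coef - z * se, coef + z * se


lower, upper = normal_confint(0.935, 0.220, level=0.95)
print(f"[{lower:.3f}, {upper:.3f}]")

# A lower confidence level gives a narrower interval.
lower_90, upper_90 = normal_confint(0.935, 0.220, level=0.90)
```

The result agrees with the printed interval [0.503, 1.367] up to the rounding of the displayed inputs.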

RDFlex.fit(n_iterations=2)

Estimate RDFlex model.

Parameters:

n_iterations (int) – Number of iterations for the iterative bandwidth fitting. Default is 2.

Returns:

self

Return type:

object