2.4.1. doubleml.rdd.RDFlex

class doubleml.rdd.RDFlex(obj_dml_data, ml_g, ml_m=None, fuzzy=False, cutoff=0, n_folds=5, n_rep=1, h_fs=None, fs_specification='cutoff', fs_kernel='triangular', **kwargs)

Flexible adjustment with double machine learning for regression discontinuity designs.

Parameters:

obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.
ml_g (estimator implementing fit() and predict()) – A machine learner that implements fit() and predict() and supports sample weights (e.g. sklearn.ensemble.RandomForestRegressor), used for the nuisance functions \(g_0^{\pm}(X) = E[Y|\text{score}=\text{cutoff}^{\pm}, X]\). The adjustment function is then defined as \(\eta_0(X) = (g_0^{+}(X) + g_0^{-}(X))/2\).
ml_m (classifier implementing fit() and predict_proba(), or None) – A machine learner that implements fit() and predict_proba() and supports sample weights (e.g. sklearn.ensemble.RandomForestClassifier), used for the nuisance functions \(m_0^{\pm}(X) = E[D|\text{score}=\text{cutoff}^{\pm}, X]\). The corresponding adjustment function is then defined as \(\eta_0(X) = (m_0^{+}(X) + m_0^{-}(X))/2\). Use None for a non-fuzzy (sharp) design. Default is None.
fuzzy (bool) – Indicates whether to fit a fuzzy or a sharp design. In a fuzzy design, the intended treatment defined by the cutoff can differ from the actual treatment given in obj_dml_data.d. Default is False.
n_folds (int) – Number of folds. Default is 5.
n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
cutoff (float or int) – A float or int specifying the cutoff in the score. Default is 0.
h_fs (float or None) – Initial bandwidth in the first stage estimation. If None, then the optimal bandwidth without covariates will be used. Default is None.
fs_specification (str) – Specification of the first stage regression. The options are 'cutoff', 'cutoff and score', and 'interacted cutoff and score'. Default is 'cutoff'.
fs_kernel (str) – Kernel for the first stage estimation. 'uniform', 'triangular', and 'epanechnikov' are supported. Default is 'triangular'.
**kwargs (kwargs) – Keyword arguments that are not used within RDFlex but are passed directly to rdrobust.
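
To make the roles of fs_kernel, h_fs and cutoff concrete, the sketch below computes textbook kernel weights for a handful of scores. The helper kernel_weights and its normalizing constants are assumptions made for illustration and are not taken from the RDFlex source; for a weighted least-squares fit, only the relative size of the weights matters.

```python
import numpy as np


def kernel_weights(score, cutoff=0.0, h=1.0, kernel="triangular"):
    """Hypothetical helper: textbook kernel weights around the cutoff.

    Observations farther than one bandwidth h from the cutoff get weight 0.
    """
    # Distance from the cutoff, measured in units of the bandwidth
    u = (np.asarray(score, dtype=float) - cutoff) / h
    inside = np.abs(u) <= 1

    if kernel == "uniform":
        w = 0.5 * np.ones_like(u)
    elif kernel == "triangular":
        w = 1.0 - np.abs(u)
    elif kernel == "epanechnikov":
        w = 0.75 * (1.0 - u**2)
    else:
        raise ValueError(f"unknown kernel: {kernel!r}")

    return np.where(inside, w, 0.0)


scores = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name in ("uniform", "triangular", "epanechnikov"):
    print(name, kernel_weights(scores, cutoff=0.0, h=1.0, kernel=name))
```

With the triangular kernel, the weight falls linearly from its maximum at the cutoff to zero at a distance of h, so a smaller bandwidth concentrates the first stage fit on observations close to the cutoff.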

Examples

>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.rdd.datasets import make_simple_rdd_data
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(123)
>>> data_dict = make_simple_rdd_data(fuzzy=True)
>>> obj_dml_data = dml.DoubleMLData.from_arrays(x=data_dict["X"], y=data_dict["Y"], d=data_dict["D"], s=data_dict["score"])
>>> ml_g = RandomForestRegressor()
>>> ml_m = RandomForestClassifier()
>>> rdflex_obj = dml.rdd.RDFlex(obj_dml_data, ml_g, ml_m, fuzzy=True)
>>> print(rdflex_obj.fit())
Method             Coef.     S.E.     t-stat       P>|t|           95% CI
-------------------------------------------------------------------------
Conventional      0.935     0.220     4.244    2.196e-05  [0.503, 1.367]
Robust                 -        -     3.635    2.785e-04  [0.418, 1.396]

Methods

`aggregate_over_splits`()
`confint`([level])	Confidence intervals for RDFlex models.
`fit`([n_iterations])	Estimate RDFlex model.

Attributes

`all_coef`	Estimates of the causal parameter(s) for the `n_rep` different sample splits after calling `fit()`.
`all_se`	Standard errors of the causal parameter(s) for the `n_rep` different sample splits after calling `fit()`.
`ci`	Confidence intervals for the causal parameter(s) after calling `fit()`.
`coef`	Estimates for the causal parameter after calling `fit()`.
`cutoff`	Cutoff at which the treatment effect is estimated.
`fs_kernel`	Kernel for the first stage estimation.
`fuzzy`	Indicates whether the design is fuzzy or not.
`h`	Array of final bandwidths in the last stage estimation (shape (`n_rep`,)).
`h_fs`	Initial bandwidth in the first stage estimation.
`n_folds`	Number of folds.
`n_rep`	Number of repetitions for the sample splitting.
`pval`	p-values for the causal parameter(s) after calling `fit()`.
`se`	Standard errors for the causal parameter(s) after calling `fit()`.
`t_stat`	t-statistics for the causal parameter(s) after calling `fit()`.
`w`	Weights for the first stage estimation.
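
The attributes distinguish per-split results (`all_coef`, `all_se`) from single aggregated values (`coef`, `se`), but this page does not state how the aggregation is done. The sketch below assumes a median-based rule that is common in the double machine learning literature; it may not match what `aggregate_over_splits` actually computes, and the numbers are made up for illustration.

```python
import numpy as np

# Made-up estimates and standard errors from n_rep = 5 sample splits
all_coef = np.array([0.91, 0.95, 0.93, 0.98, 0.90])
all_se = np.array([0.21, 0.22, 0.20, 0.23, 0.22])

# Assumed rule: take the median estimate across splits
coef = np.median(all_coef)

# Assumed rule: inflate each split's variance by its squared deviation
# from the median estimate, then take the median and the square root
se = np.sqrt(np.median(all_se**2 + (all_coef - coef) ** 2))

print(f"aggregated coef: {coef:.3f}, aggregated se: {se:.3f}")
```

Under this assumed rule, the aggregated standard error reflects both the estimation uncertainty within each split and the variation of the estimates across splits.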

RDFlex.aggregate_over_splits()

RDFlex.confint(level=0.95)

Confidence intervals for RDFlex models.

Parameters: level (float) – The confidence level. Default is 0.95.
Returns: df_ci – A data frame with the confidence interval(s).
Return type: pd.DataFrame
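
As a rough check on how level maps to interval width, the Conventional row of the example output in the Examples section is consistent with a two-sided normal-approximation interval. The sketch below reproduces it from the printed coefficient and standard error. The helper normal_ci is hypothetical and is not part of the RDFlex API, and the Robust row is not covered by this calculation.

```python
from statistics import NormalDist


def normal_ci(coef, se, level=0.95):
    """Hypothetical helper: two-sided normal-approximation interval."""
    # Quantile of the standard normal that leaves (1 - level) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return coef - z * se, coef + z * se


# Coefficient and standard error as printed in the Conventional row
lower, upper = normal_ci(0.935, 0.220, level=0.95)
print(f"[{lower:.3f}, {upper:.3f}]")
```

This gives roughly [0.504, 1.366], close to the printed [0.503, 1.367]; the small difference presumably comes from the coefficient and standard error being rounded in the printed table.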

RDFlex.fit(n_iterations=2)

Estimate RDFlex model.

Parameters: n_iterations (int) – Number of iterations for the iterative bandwidth fitting. Default is 2.
Returns: self
Return type: object