doubleml.rdd.RDFlex
- class doubleml.rdd.RDFlex(obj_dml_data, ml_g, ml_m=None, fuzzy=False, cutoff=0, n_folds=5, n_rep=1, h_fs=None, fs_specification='cutoff', fs_kernel='triangular', **kwargs)
Flexible adjustment with double machine learning for regression discontinuity designs
- Parameters:
obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.
ml_g (estimator implementing fit() and predict()) – A machine learner that implements fit() and predict() methods and supports sample_weights (e.g. sklearn.ensemble.RandomForestRegressor) for the nuisance functions \(g_0^{\pm}(X) = E[Y|\text{score}=\text{cutoff}^{\pm}, X]\). The adjustment function is then defined as \(\eta_0(X) = (g_0^{+}(X) + g_0^{-}(X))/2\).
ml_m (classifier implementing fit() and predict_proba(), or None) – A machine learner that implements fit() and predict_proba() methods and supports sample_weights (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance functions \(m_0^{\pm}(X) = E[D|\text{score}=\text{cutoff}^{\pm}, X]\). The adjustment function is then defined as \(\eta_0(X) = (m_0^{+}(X) + m_0^{-}(X))/2\). Use None for a non-fuzzy (sharp) design. Default is None.
fuzzy (bool) – Indicates whether to fit a fuzzy or a sharp design, that is, whether the intended treatment defined by the cutoff can diverge from the actual treatment given by obj_dml_data.d. Default is False.
n_folds (int) – Number of folds. Default is 5.
n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
cutoff (float or int) – A float or int specifying the cutoff in the score. Default is 0.
h_fs (float or None) – Initial bandwidth in the first stage estimation. If None, the optimal bandwidth without covariates is used. Default is None.
fs_specification (str) – Specification of the first stage regression. The options are 'cutoff', 'cutoff and score' and 'interacted cutoff and score'. Default is 'cutoff'.
fs_kernel (str) – Kernel for the first stage estimation. 'uniform', 'triangular' and 'epanechnikov' are supported. Default is 'triangular'.
**kwargs (kwargs) – Keyword arguments that are not used within RDFlex but passed directly to rdrobust.
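The adjustment function in the ml_g description is a plain average of the two one-sided predictions. The following is a minimal sketch of that formula only, not of the RDFlex internals: the prediction values are made up, the helper name adjustment is hypothetical, and subtracting \(\eta_0(X)\) from the outcome is an assumption about how the adjustment is used downstream.

```python
def adjustment(g_plus, g_minus):
    """Return eta_0(X_i) = (g_0^+(X_i) + g_0^-(X_i)) / 2 for each observation."""
    return [(gp + gm) / 2 for gp, gm in zip(g_plus, g_minus)]


# Hypothetical predictions of E[Y | score = cutoff^+, X] and E[Y | score = cutoff^-, X]
# for three observations (illustrative numbers, not produced by any learner).
g_plus = [2.0, 3.0, 1.5]
g_minus = [1.0, 2.0, 0.5]
y = [2.4, 2.1, 1.3]

eta = adjustment(g_plus, g_minus)

# Assumption: the covariate-adjusted outcome is Y - eta_0(X).
y_adjusted = [yi - ei for yi, ei in zip(y, eta)]

print(eta)
print(y_adjusted)
```

The same averaging applies to the ml_m predictions \(m_0^{\pm}(X)\) in a fuzzy design, with the treatment D in place of the outcome Y.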
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.rdd.datasets import make_simple_rdd_data
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(123)
>>> data_dict = make_simple_rdd_data(fuzzy=True)
>>> obj_dml_data = dml.DoubleMLData.from_arrays(x=data_dict["X"], y=data_dict["Y"], d=data_dict["D"], s=data_dict["score"])
>>> ml_g = RandomForestRegressor()
>>> ml_m = RandomForestClassifier()
>>> rdflex_obj = dml.rdd.RDFlex(obj_dml_data, ml_g, ml_m, fuzzy=True)
>>> print(rdflex_obj.fit())
Method             Coef.     S.E.     t-stat       P>|t|           95% CI
-------------------------------------------------------------------------
Conventional      0.935     0.220     4.244    2.196e-05  [0.503, 1.367]
Robust                 -        -     3.635    2.785e-04  [0.418, 1.396]
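The columns in the Conventional row of the printed summary are linked by simple arithmetic. The sketch below recomputes the t-statistic, p-value and 95% confidence interval from the rounded coefficient and standard error. It assumes a two-sided test based on the standard normal distribution, which is consistent with the numbers shown but is not taken from the RDFlex source, so the results agree with the table only up to rounding.

```python
from statistics import NormalDist

# Rounded values copied from the "Conventional" row of the example output.
coef, se = 0.935, 0.220

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# t-statistic: estimate divided by its standard error.
t_stat = coef / se

# Two-sided p-value under the normal approximation (an assumption).
p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))

# 95% confidence interval: estimate plus/minus the 97.5% normal quantile times the S.E.
z = std_normal.inv_cdf(0.975)
ci_lower, ci_upper = coef - z * se, coef + z * se

print(f"t-stat:  {t_stat:.3f}")
print(f"p-value: {p_value:.3e}")
print(f"95% CI:  [{ci_lower:.3f}, {ci_upper:.3f}]")
```

The Robust row cannot be reproduced this way because its coefficient and standard error are not printed.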
Methods
Attributes
all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit().
all_se – Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit().
ci – Confidence intervals for the causal parameter(s) after calling fit().
coef – Estimates for the causal parameter after calling fit().
cutoff – Cutoff at which the treatment effect is estimated.
fs_kernel – Kernel for the first stage estimation.
fuzzy – Indicates whether the design is fuzzy or not.
h – Array of final bandwidths in the last stage estimation (shape (n_rep,)).
h_fs – Initial bandwidth in the first stage estimation.
n_folds – Number of folds.
n_rep – Number of repetitions for the sample splitting.
pval – p-values for the causal parameter(s) after calling fit().
se – Standard errors for the causal parameter(s) after calling fit().
t_stat – t-statistics for the causal parameter(s) after calling fit().
w – Weights for the first stage estimation.
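The attributes fs_kernel, h_fs, cutoff and w together describe kernel weighting around the cutoff. As a rough illustration, the sketch below evaluates the textbook forms of the three supported kernels at the scaled distance u = (score - cutoff) / h. The function name kernel_weight, the normalization constants and the exact scaling are assumptions; the values RDFlex stores in w may differ.

```python
def kernel_weight(score, cutoff=0.0, h=1.0, kernel="triangular"):
    """Textbook kernel weight for one observation (hypothetical helper, not RDFlex code)."""
    u = (score - cutoff) / h  # distance from the cutoff in bandwidth units
    if abs(u) > 1:
        return 0.0  # observations outside the bandwidth receive no weight
    if kernel == "uniform":
        return 0.5
    if kernel == "triangular":
        return 1.0 - abs(u)
    if kernel == "epanechnikov":
        return 0.75 * (1.0 - u * u)
    raise ValueError(f"unsupported kernel: {kernel!r}")


# Illustrative scores around a cutoff of 0 with bandwidth 0.5.
scores = [-0.6, -0.25, 0.0, 0.25, 0.6]
weights = [kernel_weight(s, cutoff=0.0, h=0.5, kernel="triangular") for s in scores]
print(weights)
```

With the triangular kernel, weight falls linearly from its maximum at the cutoff to zero at the edge of the bandwidth, so observations closest to the cutoff dominate the first stage fit.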
- RDFlex.aggregate_over_splits()