2.4.1. doubleml.rdd.RDFlex
- class doubleml.rdd.RDFlex(obj_dml_data, ml_g, ml_m=None, fuzzy=False, cutoff=0, n_folds=5, n_rep=1, h_fs=None, fs_specification='cutoff', fs_kernel='triangular', **kwargs)
Flexible adjustment with double machine learning for regression discontinuity designs
- Parameters:
  - obj_dml_data (DoubleMLRDDData object) – The DoubleMLRDDData object providing the data and specifying the variables for the causal model.
  - ml_g (estimator implementing fit() and predict()) – A machine learner that implements fit() and predict() and supports sample_weights (e.g. sklearn.ensemble.RandomForestRegressor) for the nuisance functions \(g_0^{\pm}(X) = E[Y|\text{score}=\text{cutoff}^{\pm}, X]\). The adjustment function is then defined as \(\eta_0(X) = (g_0^{+}(X) + g_0^{-}(X))/2\).
  - ml_m (classifier implementing fit() and predict_proba(), or None) – A machine learner that implements fit() and predict_proba() and supports sample_weights (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance functions \(m_0^{\pm}(X) = E[D|\text{score}=\text{cutoff}^{\pm}, X]\). The adjustment function is then defined as \(\eta_0(X) = (m_0^{+}(X) + m_0^{-}(X))/2\). Use None for a non-fuzzy (sharp) design. Default is None.
  - fuzzy (bool) – Indicates whether to fit a fuzzy or a sharp design, that is, whether the intended treatment defined by the cutoff can diverge from the actual treatment given by obj_dml_data.d. Default is False.
  - n_folds (int) – Number of folds. Default is 5.
  - n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
  - cutoff (float or int) – A float or int specifying the cutoff in the score. Default is 0.
  - h_fs (float or None) – Initial bandwidth in the first stage estimation. If None, the optimal bandwidth without covariates is used. Default is None.
  - fs_specification (str) – Specification of the first stage regression. The options are 'cutoff', 'cutoff and score' and 'interacted cutoff and score'. Default is 'cutoff'.
  - fs_kernel (str) – Kernel for the first stage estimation. 'uniform', 'triangular' and 'epanechnikov' are supported. Default is 'triangular'.
  - **kwargs (kwargs) – Keyword arguments that are not used within RDFlex but are passed directly to rdrobust.
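To make the roles of cutoff, h_fs and fs_kernel more concrete, the following is a minimal sketch of how a kernel turns the distance between score and cutoff into an observation weight. It uses the textbook definitions of the three kernels; the helper name kernel_weight is hypothetical, and the scaling or normalization that RDFlex applies internally may differ.

```python
def kernel_weight(score, cutoff=0.0, h=1.0, kernel="triangular"):
    """Textbook kernel weight for one observation (illustrative helper, not part of doubleml)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    # Distance from the cutoff, measured in units of the bandwidth.
    u = (score - cutoff) / h
    # Observations farther than one bandwidth from the cutoff get no weight.
    if abs(u) > 1:
        return 0.0
    if kernel == "uniform":
        return 0.5
    if kernel == "triangular":
        return 1.0 - abs(u)
    if kernel == "epanechnikov":
        return 0.75 * (1.0 - u ** 2)
    raise ValueError(f"unsupported kernel: {kernel!r}")


# Weights for a few scores around cutoff=0 with bandwidth h=0.5.
scores = [-0.6, -0.25, 0.0, 0.25, 0.6]
for name in ("uniform", "triangular", "epanechnikov"):
    print(name, [round(kernel_weight(s, cutoff=0.0, h=0.5, kernel=name), 4) for s in scores])
```

The triangular and Epanechnikov kernels down-weight observations as they move away from the cutoff, while the uniform kernel treats every observation inside the bandwidth equally.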
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.rdd.datasets import make_simple_rdd_data
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(123)
>>> data_dict = make_simple_rdd_data(fuzzy=True)
>>> obj_dml_data = dml.DoubleMLRDDData.from_arrays(
...     x=data_dict["X"],
...     y=data_dict["Y"],
...     d=data_dict["D"],
...     s=data_dict["score"]
... )
>>> ml_g = RandomForestRegressor()
>>> ml_m = RandomForestClassifier()
>>> rdflex_obj = dml.rdd.RDFlex(obj_dml_data, ml_g, ml_m, fuzzy=True)
>>> print(rdflex_obj.fit())
Method             Coef.     S.E.     t-stat       P>|t|           95% CI
-------------------------------------------------------------------------
Conventional      0.935     0.220     4.244    2.196e-05  [0.503, 1.367]
Robust                 -        -     3.635    2.785e-04  [0.418, 1.396]
Methods
Attributes
- all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit().
- all_se – Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit().
- ci – Confidence intervals for the causal parameter(s) after calling fit().
- coef – Estimates for the causal parameter after calling fit().
- cutoff – Cutoff at which the treatment effect is estimated.
- fs_kernel – Kernel for the first stage estimation.
- fuzzy – Indicates whether the design is fuzzy or not.
- h – Array of final bandwidths in the last stage estimation (shape (n_rep,)).
- h_fs – Initial bandwidth in the first stage estimation.
- n_folds – Number of folds.
- n_rep – Number of repetitions for the sample splitting.
- pval – p-values for the causal parameter(s) after calling fit().
- se – Standard errors for the causal parameter(s) after calling fit().
- t_stat – t-statistics for the causal parameter(s) after calling fit().
- w – Weights for the first stage estimation.
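As a rough illustration of how coef, se, t_stat, pval and ci relate to one another, the sketch below recomputes the "Conventional" row of the example output from its rounded coefficient and standard error. It assumes a two-sided test against a standard normal reference distribution, which agrees with the printed numbers up to rounding but is not confirmed here as the exact internal computation; the helper name normal_inference is hypothetical.

```python
import math
from statistics import NormalDist


def normal_inference(coef, se, level=0.95):
    """Return (t_stat, pval, ci) under a standard normal approximation (illustrative helper)."""
    t_stat = coef / se
    # Two-sided p-value: 2 * (1 - Phi(|t|)), written via the complementary error function.
    pval = math.erfc(abs(t_stat) / math.sqrt(2.0))
    # Normal critical value for the requested level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (coef - z * se, coef + z * se)
    return t_stat, pval, ci


# Rounded values taken from the "Conventional" row of the example output.
t_stat, pval, ci = normal_inference(coef=0.935, se=0.220)
print(f"t-stat: {t_stat:.3f}")
print(f"p-value: {pval:.3e}")
print(f"95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Because the inputs are rounded to three decimals, the recomputed values match the printed table only approximately.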
- RDFlex.aggregate_over_splits()