doubleml.DoubleMLQTE
- class doubleml.DoubleMLQTE(obj_dml_data, ml_g, ml_m=None, quantiles=0.5, n_folds=5, n_rep=1, score='PQ', normalize_ipw=True, kde=None, trimming_rule='truncate', trimming_threshold=0.01, draw_sample_splitting=True)
Double machine learning for quantile treatment effects
- Parameters:
  - obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.
  - ml_g (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance elements which depend on preliminary estimation.
  - ml_m (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the propensity nuisance functions.
  - quantiles (float or array_like) – Quantiles for treatment effect estimation. Entries have to be between 0 and 1. Default is 0.5.
  - n_folds (int) – Number of folds. Default is 5.
  - n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
  - score (str) – A str ('PQ', 'LPQ' or 'CVaR') specifying the score function. Default is 'PQ'.
  - normalize_ipw (bool) – Indicates whether the inverse probability weights are normalized. Default is True.
  - kde (callable or None) – A callable object / function with signature deriv = kde(u, weights) for weighted kernel density estimation. Here deriv should evaluate the density at 0. Default is None, which uses statsmodels.nonparametric.kde.KDEUnivariate with a Gaussian kernel and Silverman's rule for bandwidth determination.
  - trimming_rule (str) – A str ('truncate' is the only choice) specifying the trimming approach. Default is 'truncate'.
  - trimming_threshold (float) – The threshold used for trimming. Default is 1e-2.
  - draw_sample_splitting (bool) – Indicates whether the sample splitting should be drawn during initialization of the object. Default is True.
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_irm_data
>>> from sklearn.ensemble import RandomForestClassifier
>>> np.random.seed(3141)
>>> ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
>>> ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
>>> data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
>>> obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
>>> dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, quantiles=[0.25, 0.5, 0.75])
>>> dml_qte_obj.fit().summary
          coef   std err         t     P>|t|     2.5 %    97.5 %
0.25  0.274825  0.347310  0.791297  0.428771 -0.405890  0.955541
0.50  0.449150  0.192539  2.332782  0.019660  0.071782  0.826519
0.75  0.709606  0.193308  3.670867  0.000242  0.330731  1.088482
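The kde argument accepts a user-supplied weighted kernel density estimator. Below is a minimal sketch of such a callable, continuing the objects from the example above; the name my_kde is purely illustrative, and its statsmodels-based body simply mirrors the documented default (Gaussian kernel, Silverman bandwidth) rather than adding new functionality.

>>> from statsmodels.nonparametric.kde import KDEUnivariate
>>> def my_kde(u, weights):
...     # weighted density estimate of the score residuals u, evaluated at 0
...     dens = KDEUnivariate(u)
...     dens.fit(kernel='gau', bw='silverman', weights=weights, fft=False)
...     return dens.evaluate(0)
>>> dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m,
...                               quantiles=[0.25, 0.5, 0.75], kde=my_kde)
>>> dml_qte_obj.fit()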
Methods
- bootstrap([method, n_rep_boot]) – Multiplier bootstrap for DoubleML models.
- confint([joint, level]) – Confidence intervals for DoubleML models.
- draw_sample_splitting() – Draw sample splitting for DoubleML models.
- fit([n_jobs_models, n_jobs_cv, ...]) – Estimate DoubleMLQTE models.
- p_adjust([method]) – Multiple testing adjustment for DoubleML models.
- set_sample_splitting(all_smpls[, ...]) – Set the sample splitting for DoubleML models.
Attributes
- all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_quantiles, n_rep)).
- all_se – Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_quantiles, n_rep)).
- boot_method – The method to construct the bootstrap replications.
- boot_t_stat – Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap() (shape (n_rep_boot, n_quantiles, n_rep)).
- coef – Estimates for the causal parameter(s) after calling fit() (shape (n_quantiles,)).
- framework – The corresponding doubleml.DoubleMLFramework object.
- kde – The kernel density estimation of the derivative.
- modellist_0 – List of the models for the control group (treatment==0).
- modellist_1 – List of the models for the treatment group (treatment==1).
- n_folds – Number of folds.
- n_quantiles – Number of quantiles.
- n_rep – Number of repetitions for the sample splitting.
- n_rep_boot – The number of bootstrap replications.
- normalize_ipw – Indicates whether the inverse probability weights are normalized.
- pval – p-values for the causal parameter(s) (shape (n_quantiles,)).
- quantiles – Quantiles for treatment effect estimation.
- score – The score function.
- se – Standard errors for the causal parameter(s) after calling fit() (shape (n_quantiles,)).
- smpls – The partition used for cross-fitting.
- summary – A summary for the estimated causal effect after calling fit().
- t_stat – t-statistics for the causal parameter(s) after calling fit() (shape (n_quantiles,)).
- trimming_rule – Specifies the used trimming rule.
- trimming_threshold – Specifies the used trimming threshold.
- DoubleMLQTE.bootstrap(method='normal', n_rep_boot=500)
Multiplier bootstrap for DoubleML models.
- DoubleMLQTE.confint(joint=False, level=0.95)
Confidence intervals for DoubleML models.
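A short usage sketch, continuing the dml_qte_obj from the class example above: joint confidence intervals over the quantiles are based on the multiplier bootstrap, so bootstrap() has to be called before confint(joint=True).

>>> dml_qte_obj.fit()
>>> dml_qte_obj.confint(level=0.95)                 # pointwise confidence intervals
>>> dml_qte_obj.bootstrap(method='normal', n_rep_boot=500)
>>> dml_qte_obj.confint(joint=True)                 # joint intervals, based on the bootstrap draws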
- DoubleMLQTE.draw_sample_splitting()
Draw sample splitting for DoubleML models.
The samples are drawn according to the attributes n_folds and n_rep.
- Returns: self
- Return type: DoubleMLQTE
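As a brief sketch (reusing obj_dml_data, ml_g and ml_m from the class example above), manually drawing the splits is only needed when the object was created with draw_sample_splitting=False:

>>> dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m,
...                               quantiles=[0.25, 0.5, 0.75], draw_sample_splitting=False)
>>> dml_qte_obj.draw_sample_splitting()  # draws n_rep partitions into n_folds folds
>>> dml_qte_obj.fit()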
- DoubleMLQTE.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False, external_predictions=None)
Estimate DoubleMLQTE models.
- Parameters:
  - n_jobs_models (None or int) – The number of CPUs to use to fit the quantiles. None means 1. Default is None.
  - n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Does not speed up computation for quantile models. Default is None.
  - store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.
  - store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows analyzing the fitted models or extracting information like variable importance. Default is False.
- Returns: self
- Return type: DoubleMLQTE
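A hedged sketch of a typical call, continuing the class example above; with three quantiles, n_jobs_models=3 fits the per-quantile models in parallel, and indexing modellist_1 by quantile position is an assumption made for illustration:

>>> dml_qte_obj.fit(n_jobs_models=3, store_models=True)
>>> dml_qte_obj.coef            # QTE estimates, shape (n_quantiles,)
>>> dml_qte_obj.modellist_1[0]  # fitted model for the treated group at the first quantile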
- DoubleMLQTE.p_adjust(method='romano-wolf')
Multiple testing adjustment for DoubleML models.
- Parameters:
  method (str) – A str ('romano-wolf', 'bonferroni', 'holm', etc.) specifying the adjustment method. In addition to 'romano-wolf', all methods implemented in statsmodels.stats.multitest.multipletests() can be applied. Default is 'romano-wolf'.
- Returns: p_val – A data frame with adjusted p-values.
- Return type: pd.DataFrame
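A brief sketch, continuing the fitted dml_qte_obj from above. It is assumed here that, as in other DoubleML models, the Romano-Wolf adjustment relies on a previously drawn multiplier bootstrap; 'bonferroni' is simply passed through to statsmodels.stats.multitest.multipletests().

>>> dml_qte_obj.fit()
>>> dml_qte_obj.bootstrap()
>>> dml_qte_obj.p_adjust()                     # Romano-Wolf step-down adjustment (default)
>>> dml_qte_obj.p_adjust(method='bonferroni')  # any method supported by multipletests()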
- DoubleMLQTE.set_sample_splitting(all_smpls, all_smpls_cluster=None)
Set the sample splitting for DoubleML models.
The attributes n_folds and n_rep are derived from the provided partition.
- Parameters:
  - all_smpls (list or tuple) –
    - If nested list of lists of tuples: The outer list needs to provide an entry per repeated sample splitting (length of list is set as n_rep). The inner list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition for each inner list.
    - If list of tuples: The list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition. n_rep=1 is always set.
    - If tuple: Must be a tuple with two elements train_ind and test_ind. Only viable option is to set train_ind and test_ind to np.arange(n_obs), which corresponds to no sample splitting. n_folds=1 and n_rep=1 are always set.
  - all_smpls_cluster (list or None) – Nested list or None. The first level of nesting corresponds to the number of repetitions. The second level of nesting corresponds to the number of folds. The third level of nesting contains a tuple of training and testing lists. Both training and testing contain an array for each cluster variable, which form a partition of the clusters. Default is None.
- Returns: self
- Return type: DoubleMLQTE
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_plr_CCDDHNR2018
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.base import clone
>>> np.random.seed(3141)
>>> learner = RandomForestRegressor(max_depth=2, n_estimators=10)
>>> ml_g = learner
>>> ml_m = learner
>>> obj_dml_data = make_plr_CCDDHNR2018(n_obs=10, alpha=0.5)
>>> dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
>>> # sample splitting with two folds and cross-fitting
>>> smpls = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
>>>          ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
>>> dml_plr_obj.set_sample_splitting(smpls)
>>> # sample splitting with two folds and repeated cross-fitting with n_rep = 2
>>> smpls = [[([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
>>>           ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])],
>>>          [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
>>>           ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]
>>> dml_plr_obj.set_sample_splitting(smpls)
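For a DoubleMLQTE object the pattern is the same. The following sketch reuses obj_dml_data, ml_g and ml_m from the class example above (n_obs=500) and constructs an illustrative 5-fold partition by hand; the fold construction itself is not part of the package API:

>>> import numpy as np
>>> dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m,
...                               quantiles=[0.25, 0.5, 0.75], draw_sample_splitting=False)
>>> rng = np.random.default_rng(42)
>>> idx = rng.permutation(500)
>>> folds = np.array_split(idx, 5)
>>> smpls = [(np.setdiff1d(idx, test).tolist(), test.tolist()) for test in folds]
>>> dml_qte_obj.set_sample_splitting(smpls)  # n_folds=5 and n_rep=1 are derived
>>> dml_qte_obj.fit()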