2.2.8. doubleml.irm.DoubleMLQTE
- class doubleml.irm.DoubleMLQTE(obj_dml_data, ml_g, ml_m=None, quantiles=0.5, n_folds=5, n_rep=1, score='PQ', normalize_ipw=True, kde=None, trimming_rule='truncate', trimming_threshold=0.01, draw_sample_splitting=True)
Double machine learning for quantile treatment effects
- Parameters:
obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.
ml_g (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance elements which depend on preliminary estimation.
ml_m (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the propensity nuisance functions.
quantiles (float or array_like) – Quantiles for treatment effect estimation. Entries have to be between 0 and 1. Default is 0.5.
n_folds (int) – Number of folds. Default is 5.
n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
score (str) – A str ('PQ', 'LPQ' or 'CVaR') specifying the score function. Default is 'PQ'.
normalize_ipw (bool) – Indicates whether the inverse probability weights are normalized. Default is True.
kde (callable or None) – A callable object / function with signature deriv = kde(u, weights) for weighted kernel density estimation. Here deriv should evaluate the density at 0. Default is None, which uses statsmodels.nonparametric.kde.KDEUnivariate with a Gaussian kernel and Silverman's rule for bandwidth determination.
trimming_rule (str) – A str ('truncate' is the only choice) specifying the trimming approach. Default is 'truncate'.
trimming_threshold (float) – The threshold used for trimming. Default is 1e-2.
draw_sample_splitting (bool) – Indicates whether the sample splitting should be drawn during initialization of the object. Default is True.
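The kde parameter accepts any callable with the signature deriv = kde(u, weights) that returns a weighted density estimate at 0. The following is a minimal sketch of such a callable, not the library default: the function name gaussian_kde_at_zero is hypothetical, and the bandwidth uses a simple normal-reference rule (1.06 * sd * n^(-1/5)) as an assumption rather than the statsmodels Silverman implementation.

```python
import math


def gaussian_kde_at_zero(u, weights):
    """Weighted Gaussian kernel density estimate of `u`, evaluated at 0.

    Hypothetical helper; the bandwidth rule is an assumption for illustration.
    """
    u = [float(x) for x in u]
    w = [float(x) for x in weights]
    n = len(u)
    total_w = sum(w)

    # Weighted mean and variance of the sample.
    mean = sum(wi * ui for wi, ui in zip(w, u)) / total_w
    var = sum(wi * (ui - mean) ** 2 for wi, ui in zip(w, u)) / total_w

    # Normal-reference bandwidth (assumed rule, not the library default).
    h = 1.06 * math.sqrt(var) * n ** (-0.2)
    if h <= 0.0:
        raise ValueError("bandwidth is zero; `u` has no variation")

    # Density at 0: sum_i w_i * phi((0 - u_i) / h) / (h * sum_i w_i).
    kernel_sum = sum(wi * math.exp(-0.5 * (ui / h) ** 2) for wi, ui in zip(w, u))
    return kernel_sum / (total_w * h * math.sqrt(2.0 * math.pi))


deriv = gaussian_kde_at_zero([-1.0, 0.0, 1.0], [1.0, 1.0, 1.0])
```

Under the signature documented above, such a function would be passed as kde=gaussian_kde_at_zero when constructing the DoubleMLQTE object.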
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_irm_data
>>> from sklearn.ensemble import RandomForestClassifier
>>> np.random.seed(3141)
>>> ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
>>> ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
>>> data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
>>> obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
>>> dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, quantiles=[0.25, 0.5, 0.75])
>>> dml_qte_obj.fit().summary
          coef   std err         t     P>|t|     2.5 %    97.5 %
0.25  0.274825  0.347310  0.791297  0.428771 -0.405890  0.955541
0.50  0.449150  0.192539  2.332782  0.019660  0.071782  0.826519
0.75  0.709606  0.193308  3.670867  0.000242  0.330731  1.088482
Methods
bootstrap([method, n_rep_boot]) – Multiplier bootstrap for DoubleML models.
confint([joint, level]) – Confidence intervals for DoubleML models.
draw_sample_splitting() – Draw sample splitting for DoubleML models.
fit([n_jobs_models, n_jobs_cv, ...]) – Estimate DoubleMLQTE models.
p_adjust([method]) – Multiple testing adjustment for DoubleML models.
set_sample_splitting(all_smpls[, ...]) – Set the sample splitting for DoubleML models.
Attributes
all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_quantiles, n_rep)).
all_se – Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_quantiles, n_rep)).
boot_method – The method to construct the bootstrap replications.
boot_t_stat – Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap() (shape (n_rep_boot, n_quantiles, n_rep)).
coef – Estimates for the causal parameter(s) after calling fit() (shape (n_quantiles,)).
framework – The corresponding doubleml.DoubleMLFramework object.
kde – The kernel density estimation of the derivative.
modellist_0 – List of the models for the control group (treatment==0).
modellist_1 – List of the models for the treatment group (treatment==1).
n_folds – Number of folds.
n_quantiles – Number of quantiles.
n_rep – Number of repetitions for the sample splitting.
n_rep_boot – The number of bootstrap replications.
normalize_ipw – Indicates whether the inverse probability weights are normalized.
pval – p-values for the causal parameter(s) (shape (n_quantiles,)).
quantiles – Quantiles for treatment effect estimation.
score – The score function.
se – Standard errors for the causal parameter(s) after calling fit() (shape (n_quantiles,)).
smpls – The partition used for cross-fitting.
summary – A summary for the estimated causal effect after calling fit().
t_stat – t-statistics for the causal parameter(s) after calling fit() (shape (n_quantiles,)).
trimming_rule – Specifies the used trimming rule.
trimming_threshold – Specifies the used trimming threshold.
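The attributes trimming_rule, trimming_threshold and normalize_ipw all concern how estimated propensity scores enter the inverse probability weights. As a rough sketch of the idea, assuming that 'truncate' means clipping the scores to [threshold, 1 - threshold] and that normalization rescales the weights to mean one (both helper names are hypothetical, and the library internals may differ):

```python
def truncate_propensity(m_hat, threshold=0.01):
    """Clip propensity scores to the interval [threshold, 1 - threshold]."""
    upper = 1.0 - threshold
    return [min(max(p, threshold), upper) for p in m_hat]


def normalized_ipw(d, m_hat):
    """Inverse probability weights d / m_hat, rescaled to have mean one."""
    raw = [di / pi for di, pi in zip(d, m_hat)]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


# Scores close to 0 or 1 are pulled back to the threshold before weighting.
m_hat = truncate_propensity([0.001, 0.3, 0.999], threshold=0.01)
weights = normalized_ipw([1, 1, 0], m_hat)
```

Without truncation, the first observation would receive a raw weight of 1000 instead of 100, which is the kind of instability the threshold is meant to limit.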
- DoubleMLQTE.bootstrap(method='normal', n_rep_boot=500)
Multiplier bootstrap for DoubleML models.
- DoubleMLQTE.confint(joint=False, level=0.95)
Confidence intervals for DoubleML models.
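As an illustration of the pointwise case (joint=False), the bounds in the summary of the class example are consistent with normal-based intervals of the form coef ± z * se. The sketch below recomputes them from the printed coefficients and standard errors; the helper normal_confint is hypothetical and only assumes this normal approximation, not the library's actual code path.

```python
from statistics import NormalDist


def normal_confint(coef, se, level=0.95):
    """Pointwise normal-approximation intervals: coef -/+ z * se."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return [(c - z * s, c + z * s) for c, s in zip(coef, se)]


# Values copied from the summary table of the DoubleMLQTE example.
coef = [0.274825, 0.449150, 0.709606]
se = [0.347310, 0.192539, 0.193308]
intervals = normal_confint(coef, se, level=0.95)

for q, (lower, upper) in zip([0.25, 0.50, 0.75], intervals):
    print(f"{q:.2f}: [{lower:.6f}, {upper:.6f}]")
```

The recomputed bounds agree with the 2.5 % and 97.5 % columns of the summary up to the rounding of the printed inputs.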
- DoubleMLQTE.draw_sample_splitting()
Draw sample splitting for DoubleML models.
The samples are drawn according to the attributes n_folds and n_rep.
- Returns:
self
- Return type:
- DoubleMLQTE.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False, external_predictions=None)
Estimate DoubleMLQTE models.
- Parameters:
n_jobs_models (None or int) – The number of CPUs to use to fit the quantiles. None means 1. Default is None.
n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Does not speed up computation for quantile models. Default is None.
store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.
store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This makes it possible to analyze the fitted models or extract information such as variable importance. Default is False.
- Returns:
self
- Return type:
- DoubleMLQTE.p_adjust(method='romano-wolf')
Multiple testing adjustment for DoubleML models.
- Parameters:
method (str) – A str ('romano-wolf', 'bonferroni', 'holm', etc.) specifying the adjustment method. In addition to 'romano-wolf', all methods implemented in statsmodels.stats.multitest.multipletests() can be applied. Default is 'romano-wolf'.
- Returns:
p_val – A data frame with adjusted p-values.
- Return type:
pd.DataFrame
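To give a sense of what the simpler adjustment methods do, here is a small standard-library sketch of the Bonferroni and Holm corrections applied to the p-values printed in the class example. The function names are hypothetical and this is not the library's code; the 'romano-wolf' method relies on bootstrap replications and is not reproduced here.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of hypotheses, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is scaled by (m - k + 1); taking the
        # running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# p-values for the quantiles 0.25, 0.5 and 0.75 from the example summary.
pvals = [0.428771, 0.019660, 0.000242]
p_bonf = bonferroni(pvals)
p_holm = holm(pvals)
```

Holm is never more conservative than Bonferroni: for the median, the adjusted value is 2 * 0.019660 under Holm compared with 3 * 0.019660 under Bonferroni.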
- DoubleMLQTE.set_sample_splitting(all_smpls, all_smpls_cluster=None)
Set the sample splitting for DoubleML models.
The attributes n_folds and n_rep are derived from the provided partition.
- Parameters:
all_smpls (list or tuple) –
- If nested list of lists of tuples:
The outer list needs to provide an entry per repeated sample splitting (length of list is set as n_rep). The inner list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition for each inner list.
- If list of tuples:
The list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition. n_rep=1 is always set.
- If tuple:
Must be a tuple with two elements train_ind and test_ind. The only viable option is to set train_ind and test_ind to np.arange(n_obs), which corresponds to no sample splitting. n_folds=1 and n_rep=1 are always set.
all_smpls_cluster (list or None) – Nested list or None. The first level of nesting corresponds to the number of repetitions. The second level of nesting corresponds to the number of folds. The third level of nesting contains a tuple of training and testing lists. Both training and testing contain an array for each cluster variable, which form a partition of the clusters. Default is None.
- Returns:
self
- Return type:
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_plr_CCDDHNR2018
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.base import clone
>>> np.random.seed(3141)
>>> learner = RandomForestRegressor(max_depth=2, n_estimators=10)
>>> ml_g = clone(learner)
>>> ml_m = clone(learner)
>>> obj_dml_data = make_plr_CCDDHNR2018(n_obs=10, alpha=0.5)
>>> dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
>>> # sample splitting with two folds and cross-fitting
>>> smpls = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...          ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
>>> dml_plr_obj.set_sample_splitting(smpls)
>>> # sample splitting with two folds and repeated cross-fitting with n_rep = 2
>>> smpls = [[([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...           ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])],
...          [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
...           ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]
>>> dml_plr_obj.set_sample_splitting(smpls)
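Writing the nested structure by hand becomes tedious for larger samples. The following sketch builds a partition in the nested list-of-lists-of-tuples format described above using only the standard library; make_smpls is a hypothetical helper, not part of DoubleML, and the random assignment of observations to folds is an assumption chosen for illustration.

```python
import random


def make_smpls(n_obs, n_folds, n_rep, seed=0):
    """Build n_rep repetitions of an n_folds partition of range(n_obs).

    Returns a list (one entry per repetition) of lists (one entry per fold)
    of (train_ind, test_ind) tuples.
    """
    rng = random.Random(seed)
    all_smpls = []
    for _ in range(n_rep):
        idx = list(range(n_obs))
        rng.shuffle(idx)
        # Deal the shuffled indices into n_folds disjoint test sets.
        folds = [sorted(idx[k::n_folds]) for k in range(n_folds)]
        smpls = []
        for k, test_ind in enumerate(folds):
            # Training indices are all observations outside the test fold.
            train_ind = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            smpls.append((train_ind, test_ind))
        all_smpls.append(smpls)
    return all_smpls


all_smpls = make_smpls(n_obs=10, n_folds=2, n_rep=2)
```

Within each repetition the test_ind lists are disjoint and together cover every observation, which is the partition requirement stated for all_smpls. The result has the same shape as the smpls object in the last example above.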