doubleml.DoubleMLQTE
- class doubleml.DoubleMLQTE(obj_dml_data, ml_g, ml_m=None, quantiles=0.5, n_folds=5, n_rep=1, score='PQ', dml_procedure='dml2', normalize_ipw=True, kde=None, trimming_rule='truncate', trimming_threshold=0.01, draw_sample_splitting=True)
Double machine learning for quantile treatment effects
- Parameters
obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.
ml_g (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance elements which depend on preliminary estimation.
ml_m (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the propensity nuisance functions.
quantiles (float or array_like) – Quantiles for treatment effect estimation. Entries have to be between 0 and 1. Default is 0.5.
n_folds (int) – Number of folds. Default is 5.
n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
score (str) – A str ('PQ', 'LPQ' or 'CVaR') specifying the score function. Default is 'PQ'.
dml_procedure (str) – A str ('dml1' or 'dml2') specifying the double machine learning algorithm. Default is 'dml2'.
normalize_ipw (bool) – Indicates whether the inverse probability weights are normalized. Default is True.
kde (callable or None) – A callable object / function with signature deriv = kde(u, weights) for weighted kernel density estimation. Here deriv should evaluate the density at 0. Default is None, which uses statsmodels.nonparametric.kde.KDEUnivariate with a Gaussian kernel and Silverman's rule for bandwidth determination.
trimming_rule (str) – A str ('truncate' is the only choice) specifying the trimming approach. Default is 'truncate'.
trimming_threshold (float) – The threshold used for trimming. Default is 0.01.
draw_sample_splitting (bool) – Indicates whether the sample splitting should be drawn during initialization of the object. Default is True.
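The kde parameter accepts any callable with the signature deriv = kde(u, weights) that returns the estimated density at 0. The following is a minimal sketch of such a callable, assuming a weighted Gaussian kernel and a normal-reference rule-of-thumb bandwidth. The name gaussian_kde_at_zero and the bandwidth constant are illustrative assumptions, not the library's default implementation.

```python
import numpy as np


def gaussian_kde_at_zero(u, weights):
    """Weighted Gaussian kernel density estimate evaluated at 0 (illustrative)."""
    u = np.asarray(u, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # rescale the weights so they sum to one

    # Rule-of-thumb bandwidth from the weighted standard deviation (assumption).
    mean = np.sum(w * u)
    std = np.sqrt(np.sum(w * (u - mean) ** 2))
    bandwidth = 1.06 * std * len(u) ** (-1 / 5)

    # Density at 0: weighted average of Gaussian kernels centered at each u_i.
    z = (0.0 - u) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum(w * kernel) / bandwidth)


# Toy check on standard normal draws with uniform weights.
rng = np.random.default_rng(0)
u = rng.normal(size=2000)
deriv = gaussian_kde_at_zero(u, np.ones_like(u))
```

A callable like this could then be passed as kde=gaussian_kde_at_zero when constructing the model.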
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_irm_data
>>> from sklearn.ensemble import RandomForestClassifier
>>> np.random.seed(3141)
>>> ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
>>> ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
>>> data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
>>> obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
>>> dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, quantiles=[0.25, 0.5, 0.75])
>>> dml_qte_obj.fit().summary
          coef   std err         t     P>|t|     2.5 %    97.5 %
0.25  0.274825  0.347310  0.791297  0.428771 -0.405890  0.955541
0.50  0.449150  0.192539  2.332782  0.019660  0.071782  0.826519
0.75  0.709606  0.193308  3.670867  0.000242  0.330731  1.088482
Methods
bootstrap([method, n_rep_boot]) – Multiplier bootstrap for DoubleML models.
confint([joint, level]) – Confidence intervals for DoubleML models.
draw_sample_splitting() – Draw sample splitting for DoubleML models.
fit([n_jobs_models, n_jobs_cv, ...]) – Estimate DoubleMLQTE models.
Attributes
all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit().
apply_cross_fitting – Indicates whether cross-fitting should be applied.
boot_coef – Bootstrapped coefficients for the causal parameter(s) after calling fit() and bootstrap().
boot_t_stat – Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap().
coef – Estimates for the causal parameter(s) after calling fit().
dml_procedure – The double machine learning algorithm.
kde – The kernel density estimation of the derivative.
modellist_0 – List of the models for the control group (treatment==0).
modellist_1 – List of the models for the treatment group (treatment==1).
n_folds – Number of folds.
n_quantiles – Number of quantiles.
n_rep – Number of repetitions for the sample splitting.
n_rep_boot – The number of bootstrap replications.
normalize_ipw – Indicates whether the inverse probability weights are normalized.
pval – p-values for the causal parameter(s) after calling fit().
quantiles – Quantiles for treatment effect estimation.
score – The score function.
se – Standard errors for the causal parameter(s) after calling fit().
smpls – The partition used for cross-fitting.
summary – A summary for the estimated causal effect after calling fit().
t_stat – t-statistics for the causal parameter(s) after calling fit().
trimming_rule – The trimming rule used.
trimming_threshold – The trimming threshold used.
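To make trimming_rule, trimming_threshold and normalize_ipw concrete, here is a minimal sketch of what truncation and weight normalization typically mean for inverse probability weighting. The helper names truncate_propensity and treated_ipw_weights are hypothetical, and the normalization shown (rescaling the weights to mean one) is one common convention. The library's internal implementation may differ.

```python
import numpy as np


def truncate_propensity(m_hat, trimming_threshold=0.01):
    """Clip estimated propensities into [threshold, 1 - threshold]."""
    return np.clip(m_hat, trimming_threshold, 1.0 - trimming_threshold)


def treated_ipw_weights(d, m_hat, normalize=True):
    """Inverse probability weights d / m_hat for the treated group."""
    weights = d / m_hat
    if normalize:
        # Rescale so the weights average to one (assumed convention).
        weights = weights / weights.mean()
    return weights


d = np.array([1.0, 0.0, 1.0, 0.0])          # treatment indicator
m_hat = np.array([0.001, 0.5, 0.999, 0.3])  # raw propensity estimates
m_trimmed = truncate_propensity(m_hat)      # extreme values are pulled inward
weights = treated_ipw_weights(d, m_trimmed)
```

Without truncation, the first observation would receive a raw weight of 1000. After truncation at 0.01 its raw weight is capped at 100, which limits the influence of a single extreme propensity estimate.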
- DoubleMLQTE.bootstrap(method='normal', n_rep_boot=500)
Multiplier bootstrap for DoubleML models.
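As a rough illustration of the idea behind a multiplier bootstrap, the sketch below perturbs centered score values with independent standard normal multipliers and recomputes a t-statistic in each replication. It uses a plain sample mean as the estimator. This is a simplifying assumption for exposition, not the score-based computation the library performs.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: the "parameter" is the mean of x.
x = rng.exponential(size=1000)
n_obs = x.shape[0]
theta_hat = x.mean()

# Centered score values and the usual standard error of the mean.
psi = x - theta_hat
se = np.sqrt(np.mean(psi ** 2) / n_obs)

# One row of standard normal multipliers per bootstrap replication.
n_rep_boot = 500
xi = rng.standard_normal(size=(n_rep_boot, n_obs))

# Bootstrapped t-statistics: multiplier-weighted score sums, scaled by n * se.
boot_t_stat = xi @ psi / (n_obs * se)

# Critical value for a 95% interval from the bootstrap distribution.
critical_value = np.quantile(np.abs(boot_t_stat), 0.95)
```

For a single parameter the critical value lands close to the normal quantile 1.96. With several quantiles, taking the maximum absolute t-statistic across parameters in each replication would give a joint critical value instead.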
- DoubleMLQTE.confint(joint=False, level=0.95)
Confidence intervals for DoubleML models.
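For the default joint=False, the interval bounds in the summary table of the Examples section are consistent with a standard normal-approximation interval, coef ± z · se. The sketch below reproduces the 0.50 quantile row from its reported coefficient and standard error. The helper pointwise_confint is hypothetical and only illustrates the arithmetic.

```python
from statistics import NormalDist


def pointwise_confint(coef, se, level=0.95):
    """Normal-approximation confidence interval coef +/- z * se."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return coef - z * se, coef + z * se


# Values taken from the 0.50 row of the example summary.
lower, upper = pointwise_confint(0.449150, 0.192539)
```

The result agrees with the reported bounds 0.071782 and 0.826519 up to rounding of the inputs.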
- DoubleMLQTE.draw_sample_splitting()
Draw sample splitting for DoubleML models.
The samples are drawn according to the attributes n_folds, n_rep and apply_cross_fitting.
- Returns
self
- Return type
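A minimal sketch of how repeated K-fold sample splitting can be drawn from n_folds and n_rep is given below. The function draw_splits and the nested-list layout of (train, test) index pairs are assumptions made for illustration, and the sketch assumes n_folds is at least 2.

```python
import numpy as np


def draw_splits(n_obs, n_folds=5, n_rep=1, seed=None):
    """Return n_rep partitions, each a list of (train_idx, test_idx) pairs."""
    rng = np.random.default_rng(seed)
    all_smpls = []
    for _ in range(n_rep):
        # Shuffle the observation indices and cut them into n_folds blocks.
        folds = np.array_split(rng.permutation(n_obs), n_folds)
        smpls = []
        for k in range(n_folds):
            test_idx = np.sort(folds[k])
            # Training indices are all observations outside fold k.
            train_idx = np.sort(
                np.concatenate([folds[j] for j in range(n_folds) if j != k])
            )
            smpls.append((train_idx, test_idx))
        all_smpls.append(smpls)
    return all_smpls


smpls = draw_splits(n_obs=10, n_folds=5, n_rep=2, seed=0)
```

Within each repetition every observation appears in exactly one test fold, so each nuisance prediction comes from a model that was not trained on that observation.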
- DoubleMLQTE.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False)
Estimate DoubleMLQTE models.
- Parameters
n_jobs_models (None or int) – The number of CPUs to use to fit the quantiles. None means 1. Default is None.
n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Does not speed up computation for quantile models. Default is None.
store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.
store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows analyzing the fitted models or extracting information such as variable importance. Default is False.
- Returns
self
- Return type