doubleml.DoubleMLQTE

class doubleml.DoubleMLQTE(obj_dml_data, ml_g, ml_m=None, quantiles=0.5, n_folds=5, n_rep=1, score='PQ', dml_procedure='dml2', normalize_ipw=True, kde=None, trimming_rule='truncate', trimming_threshold=0.01, draw_sample_splitting=True)

Double machine learning for quantile treatment effects

Parameters:
  • obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.

  • ml_g (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance elements that depend on a preliminary estimation.

  • ml_m (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the propensity nuisance functions.

  • quantiles (float or array_like) – Quantiles for treatment effect estimation. Entries have to be between 0 and 1. Default is 0.5.

  • n_folds (int) – Number of folds. Default is 5.

  • n_rep (int) – Number of repetitions for the sample splitting. Default is 1.

  • score (str) – A str ('PQ', 'LPQ' or 'CVaR') specifying the score function. Default is 'PQ'.

  • dml_procedure (str) – A str ('dml1' or 'dml2') specifying the double machine learning algorithm. Default is 'dml2'.

  • normalize_ipw (bool) – Indicates whether the inverse probability weights are normalized. Default is True.

  • kde (callable or None) – A callable object / function with signature deriv = kde(u, weights) for weighted kernel density estimation. The returned deriv should be the estimated density evaluated at 0. Default is None, which uses statsmodels.nonparametric.kde.KDEUnivariate with a Gaussian kernel and Silverman's rule for bandwidth selection.

  • trimming_rule (str) – A str ('truncate' is the only choice) specifying the trimming approach. Default is 'truncate'.

  • trimming_threshold (float) – The threshold used for trimming. Default is 1e-2.

  • draw_sample_splitting (bool) – Indicates whether the sample splitting should be drawn during initialization of the object. Default is True.
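The kde parameter accepts any callable with the signature deriv = kde(u, weights) that returns a weighted density estimate at 0. The sketch below is a hand-rolled NumPy stand-in for the documented default. It uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth, 0.9 · min(σ̂, IQR/1.349) · n^(-1/5). The function name gaussian_kde_at_zero is hypothetical. Computing the bandwidth from the unweighted u is an assumption, and statsmodels' internal details may differ.

```python
import numpy as np


def gaussian_kde_at_zero(u, weights):
    """Weighted Gaussian KDE of u evaluated at 0 (illustrative stand-in, not doubleml's code)."""
    u = np.asarray(u, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise weights so they sum to one
    n = u.size
    # Silverman's rule of thumb, computed on the unweighted sample (assumption)
    q75, q25 = np.percentile(u, [75, 25])
    spread = min(np.std(u, ddof=1), (q75 - q25) / 1.349)
    h = 0.9 * spread * n ** (-0.2)
    # Gaussian kernel evaluated at the point 0
    z = (0.0 - u) / h
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum(w * kernel) / h)
```

Under the documented signature, such a function could be passed as dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, kde=gaussian_kde_at_zero). The text above does not specify whether a plain float or a length-one array is expected as the return value.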

Examples

>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_irm_data
>>> from sklearn.ensemble import RandomForestClassifier
>>> np.random.seed(3141)
>>> ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
>>> ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
>>> data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
>>> obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
>>> dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, quantiles=[0.25, 0.5, 0.75])
>>> dml_qte_obj.fit().summary
          coef   std err         t     P>|t|     2.5 %    97.5 %
0.25  0.274825  0.347310  0.791297  0.428771 -0.405890  0.955541
0.50  0.449150  0.192539  2.332782  0.019660  0.071782  0.826519
0.75  0.709606  0.193308  3.670867  0.000242  0.330731  1.088482
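With the default score='PQ', each row of the summary estimates a quantile treatment effect: the difference q_1(τ) − q_0(τ) between the τ-quantiles of the two potential outcomes. The sketch below illustrates that target with a plain inverse-probability-weighting (IPW) estimator and a known propensity score. It is not the orthogonal 'PQ' score doubleml uses, which additionally involves the ml_g nuisance and cross-fitting. The helper ipw_potential_quantile is hypothetical. Its clipping step only mimics trimming_rule='truncate' with trimming_threshold, and its normalize flag only loosely mirrors normalize_ipw.

```python
import numpy as np


def ipw_potential_quantile(y, d, m_hat, tau, treat, trimming_threshold=0.01, normalize=True):
    """tau-quantile of the potential outcome Y(treat) via IPW (illustrative sketch).

    m_hat is the propensity score P(D=1 | X); there is no outcome-model
    correction and no cross-fitting here.
    """
    # truncate-style trimming: keep propensity scores inside [thr, 1 - thr]
    m = np.clip(m_hat, trimming_threshold, 1 - trimming_threshold)
    p = m if treat == 1 else 1 - m
    w = (d == treat) / p
    # normalised weights sum to one; unnormalised ones are only scaled by 1/n
    w = w / w.sum() if normalize else w / len(y)
    order = np.argsort(y)
    cum_w = np.cumsum(w[order])  # weighted empirical CDF at the sorted y values
    # smallest y whose weighted CDF reaches tau approximately solves the IPW moment
    idx = min(np.searchsorted(cum_w, tau), len(y) - 1)
    return y[order][idx]


# Simulated data: the confounder x drives both treatment and outcome,
# and the true QTE is 0.5 at every quantile (pure location shift).
rng = np.random.default_rng(42)
n = 20_000
x = rng.standard_normal(n)
m_true = 1 / (1 + np.exp(-x))
d = rng.binomial(1, m_true)
y = x + 0.5 * d + rng.standard_normal(n)

tau = 0.5
qte = (ipw_potential_quantile(y, d, m_true, tau, treat=1)
       - ipw_potential_quantile(y, d, m_true, tau, treat=0))
# naive comparison ignores confounding and is biased upwards here
naive = np.quantile(y[d == 1], tau) - np.quantile(y[d == 0], tau)
```

Weighting by the inverse propensity removes the confounding that biases the naive difference of group medians. This reweighting is the motivation for estimating the propensity nuisance with ml_m.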

Methods

bootstrap([method, n_rep_boot])

Multiplier bootstrap for DoubleML models.

confint([joint, level])

Confidence intervals for DoubleML models.

draw_sample_splitting()

Draw sample splitting for DoubleML models.

fit([n_jobs_models, n_jobs_cv, ...])

Estimate DoubleMLQTE models.

Attributes

all_coef

Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit().

apply_cross_fitting

Indicates whether cross-fitting should be applied.

boot_coef

Bootstrapped coefficients for the causal parameter(s) after calling fit() and bootstrap().

boot_t_stat

Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap().

coef

Estimates for the causal parameter(s) after calling fit().

dml_procedure

The double machine learning algorithm.

kde

The kernel density estimation function used to estimate the derivative of the score.

modellist_0

List of the models for the control group (treatment==0).

modellist_1

List of the models for the treatment group (treatment==1).

n_folds

Number of folds.

n_quantiles

Number of quantiles.

n_rep

Number of repetitions for the sample splitting.

n_rep_boot

The number of bootstrap replications.

normalize_ipw

Indicates whether the inverse probability weights are normalized.

pval

p-values for the causal parameter(s) after calling fit().

quantiles

The quantiles at which the treatment effects are estimated.

score

The score function ('PQ', 'LPQ' or 'CVaR').

se

Standard errors for the causal parameter(s) after calling fit().

smpls

The partition used for cross-fitting.

summary

A summary for the estimated causal effect after calling fit().

t_stat

t-statistics for the causal parameter(s) after calling fit().

trimming_rule

Specifies the used trimming rule.

trimming_threshold

Specifies the used trimming threshold.

DoubleMLQTE.bootstrap(method='normal', n_rep_boot=500)

Multiplier bootstrap for DoubleML models.

Parameters:
  • method (str) – A str ('Bayes', 'normal' or 'wild') specifying the multiplier bootstrap method. Default is 'normal'.

  • n_rep_boot (int) – The number of bootstrap replications. Default is 500.

Returns:

self

Return type:

object
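A multiplier bootstrap does not resample rows. It perturbs each observation's contribution to the estimate with random weights that have mean zero and unit variance. The sketch below draws such weights for the three method names listed above. It then turns them into bootstrapped t-statistics and a joint (max-|t|) critical value, the kind of quantity behind joint confidence intervals. It is written from the general technique, not from doubleml's source. The input phi (per-observation influence-function values, one column per quantile) and all helper names are hypothetical, and the exact weight distributions are assumptions.

```python
import numpy as np


def draw_multiplier_weights(method, n_rep_boot, n_obs, rng):
    """Mean-zero, unit-variance multiplier weights (assumed distributions)."""
    if method == "Bayes":
        # centred standard exponential weights
        return rng.exponential(scale=1.0, size=(n_rep_boot, n_obs)) - 1.0
    if method == "normal":
        return rng.standard_normal(size=(n_rep_boot, n_obs))
    if method == "wild":
        # built from two independent standard normals
        xx = rng.standard_normal(size=(n_rep_boot, n_obs))
        yy = rng.standard_normal(size=(n_rep_boot, n_obs))
        return xx / np.sqrt(2) + (yy ** 2 - 1) / 2
    raise ValueError(f"unknown method: {method!r}")


def bootstrap_t_stats(phi, method="normal", n_rep_boot=500, seed=None):
    """Bootstrapped t-statistics from mean-zero influence values phi of shape (n_obs, n_params)."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    n_obs = phi.shape[0]
    se = np.sqrt(np.mean(phi ** 2, axis=0) / n_obs)  # per-parameter standard error
    weights = draw_multiplier_weights(method, n_rep_boot, n_obs, rng)
    boot_coef = weights @ phi / n_obs  # shape (n_rep_boot, n_params)
    return boot_coef / se


def joint_critical_value(boot_t, level=0.95):
    """Quantile of the maximum absolute bootstrapped t-statistic across parameters."""
    return float(np.quantile(np.abs(boot_t).max(axis=1), level))
```

A joint interval would then be coef ± crit · se for all quantiles simultaneously. With several quantiles, crit exceeds the pointwise value of about 1.96 (for level=0.95) because it controls the maximum deviation over all of them.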

DoubleMLQTE.confint(joint=False, level=0.95)

Confidence intervals for DoubleML models.

Parameters:
  • joint (bool) – Indicates whether joint confidence intervals are computed. Default is False.

  • level (float) – The confidence level. Default is 0.95.

Returns:

df_ci – A data frame with the confidence interval(s).

Return type:

pd.DataFrame
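For joint=False, a normal-approximation interval coef ± z · se, with z the (1 − (1 − level)/2) standard normal quantile, reproduces the 2.5 % and 97.5 % columns of the example summary up to rounding. The same holds for the t and P>|t| columns. The small check below uses only the standard library. The helpers pointwise_confint and t_and_pval are hypothetical and not part of the doubleml API.

```python
from statistics import NormalDist


def pointwise_confint(coef, se, level=0.95):
    """Normal-approximation interval coef +/- z * se (illustrative sketch)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return coef - z * se, coef + z * se


def t_and_pval(coef, se):
    """t-statistic and two-sided normal p-value."""
    t = coef / se
    pval = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, pval


# coef and std err of the 0.50 quantile, copied from the example summary
lower, upper = pointwise_confint(0.449150, 0.192539)
t, pval = t_and_pval(0.449150, 0.192539)
```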

DoubleMLQTE.draw_sample_splitting()

Draw sample splitting for DoubleML models.

The samples are drawn according to the attributes n_folds, n_rep and apply_cross_fitting.

Returns:

self

Return type:

object
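The text above only says that the splits depend on n_folds, n_rep and apply_cross_fitting. The sketch below shows plain repeated K-fold partitioning: shuffle the indices once per repetition, then cut them into n_folds folds. It returns one list of (train, test) index pairs per repetition, which is assumed here to resemble the partition exposed as smpls. The helper draw_kfold_splits is hypothetical. doubleml's own resampling may differ, for example in stratification or in how apply_cross_fitting=False is handled.

```python
import numpy as np


def draw_kfold_splits(n_obs, n_folds=5, n_rep=1, seed=None):
    """Return n_rep lists of (train_idx, test_idx) pairs, one pair per fold."""
    rng = np.random.default_rng(seed)
    all_smpls = []
    for _ in range(n_rep):
        # shuffle once per repetition, then cut into n_folds nearly equal folds
        folds = np.array_split(rng.permutation(n_obs), n_folds)
        smpls = []
        for k in range(n_folds):
            test_idx = np.sort(folds[k])
            train_idx = np.sort(np.concatenate([f for j, f in enumerate(folds) if j != k]))
            smpls.append((train_idx, test_idx))
        all_smpls.append(smpls)
    return all_smpls


smpls = draw_kfold_splits(n_obs=500, n_folds=5, n_rep=2, seed=0)
```

In each repetition, every observation lands in exactly one test fold. Nuisance predictions for that fold therefore come from learners that never saw it during training.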

DoubleMLQTE.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False, external_predictions=None)

Estimate DoubleMLQTE models.

Parameters:
  • n_jobs_models (None or int) – The number of CPUs to use to fit the quantiles. None means 1. Default is None.

  • n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Does not speed up computation for quantile models. Default is None.

  • store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.

  • store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows analyzing the fitted models or extracting information such as variable importance. Default is False.

Returns:

self

Return type:

object