doubleml.DoubleMLAPOS#
- class doubleml.DoubleMLAPOS(obj_dml_data, ml_g, ml_m, treatment_levels, n_folds=5, n_rep=1, score='APO', weights=None, normalize_ipw=False, trimming_rule='truncate', trimming_threshold=0.01, draw_sample_splitting=True)#
Double machine learning for interactive regression models with multiple discrete treatments.
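Illustrative usage (a minimal sketch, not part of the API reference: the synthetic data, variable names and learner choices below are assumptions):

>>> import numpy as np
>>> import pandas as pd
>>> import doubleml as dml
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(42)
>>> n_obs = 500
>>> x = np.random.normal(size=(n_obs, 3))
>>> d = np.random.choice([0, 1, 2], size=n_obs)             # discrete treatment with three levels
>>> y = d + x[:, 0] + np.random.normal(size=n_obs)          # synthetic outcome, for illustration only
>>> df = pd.DataFrame({'y': y, 'd': d, 'x1': x[:, 0], 'x2': x[:, 1], 'x3': x[:, 2]})
>>> obj_dml_data = dml.DoubleMLData(df, y_col='y', d_cols='d')
>>> ml_g = RandomForestRegressor(n_estimators=50)           # outcome regression learner
>>> ml_m = RandomForestClassifier(n_estimators=50)          # treatment-level propensity learner
>>> dml_apos = dml.DoubleMLAPOS(obj_dml_data, ml_g, ml_m, treatment_levels=[0, 1, 2])
>>> dml_apos.fit()
>>> print(dml_apos.summary)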
Methods
- bootstrap([method, n_rep_boot]): Multiplier bootstrap for DoubleML models.
- causal_contrast(reference_levels): Average causal contrasts for DoubleMLAPOS models.
- confint([joint, level]): Confidence intervals for DoubleML models.
- draw_sample_splitting(): Draw sample splitting for DoubleML models.
- fit([n_jobs_models, n_jobs_cv, ...]): Estimate DoubleMLAPOS models.
- sensitivity_analysis([cf_y, cf_d, rho, ...]): Performs a sensitivity analysis to account for unobserved confounders.
- sensitivity_benchmark(benchmarking_set[, ...]): Computes a benchmark for a given set of features.
- sensitivity_plot([idx_treatment, value, ...]): Contour plot of the sensitivity with respect to latent/confounding variables.
- set_sample_splitting(all_smpls[, ...]): Set the sample splitting for DoubleML models.
Attributes
- all_coef: Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_treatment_levels, n_rep)).
- all_se: Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_treatment_levels, n_rep)).
- boot_method: The method to construct the bootstrap replications.
- boot_t_stat: Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap() (shape (n_rep_boot, n_treatment_levels, n_rep)).
- coef: Estimates for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).
- framework: The corresponding doubleml.DoubleMLFramework object.
- modellist: The list of models for each level.
- n_folds: Number of folds.
- n_rep: Number of repetitions for the sample splitting.
- n_rep_boot: The number of bootstrap replications.
- n_treatment_levels: The number of treatment levels.
- normalize_ipw: Indicates whether the inverse probability weights are normalized.
- pval: p-values for the causal parameter(s) (shape (n_treatment_levels,)).
- score: The score function.
- se: Standard errors for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).
- sensitivity_elements: Values of the sensitivity components after calling fit(); if available (e.g., PLR, IRM) a dictionary with entries sigma2, nu2, psi_sigma2, psi_nu2 and riesz_rep.
- sensitivity_params: Values of the sensitivity parameters after calling sensitivity_analysis(); if available (e.g., PLR, IRM) a dictionary with entries theta, se, ci, rv and rva.
- sensitivity_summary: Returns a summary for the sensitivity analysis after calling sensitivity_analysis().
- smpls: The partition used for cross-fitting.
- summary: A summary for the estimated causal effect after calling fit().
- t_stat: t-statistics for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).
- treatment_levels: The evaluated treatment levels.
- trimming_rule: Specifies the used trimming rule.
- trimming_threshold: Specifies the used trimming threshold.
- weights: Specifies the weights for a weighted average potential outcome.
- DoubleMLAPOS.bootstrap(method='normal', n_rep_boot=500)#
Multiplier bootstrap for DoubleML models.
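A brief sketch, continuing the illustrative dml_apos object from above:

>>> dml_apos.fit()
>>> dml_apos.bootstrap(method='normal', n_rep_boot=500)
>>> dml_apos.boot_t_stat.shape      # (n_rep_boot, n_treatment_levels, n_rep)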
- DoubleMLAPOS.causal_contrast(reference_levels)#
Average causal contrasts for DoubleMLAPOS models. Estimates the difference in average potential outcomes between the treatment levels and the reference levels. The reference levels have to be a subset of the treatment levels or a single treatment level.
- Parameters:
reference_levels – The reference levels for the difference in average potential outcomes. Has to be an element of treatment_levels.
- Returns:
acc – A DoubleMLFramework class for average causal contrast(s).
- Return type:
DoubleMLFramework
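A hedged sketch, continuing the illustrative dml_apos object from above; the summary and confint calls on the returned framework object are assumed to behave as for other DoubleML objects:

>>> dml_apos.fit()
>>> contrast = dml_apos.causal_contrast(reference_levels=0)
>>> print(contrast.summary)                 # contrasts of levels 1 and 2 vs. level 0
>>> print(contrast.confint(level=0.95))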
- DoubleMLAPOS.confint(joint=False, level=0.95)#
Confidence intervals for DoubleML models.
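Joint intervals require a preceding bootstrap() call; a short sketch with the illustrative object from above:

>>> dml_apos.fit()
>>> print(dml_apos.confint(level=0.95))     # pointwise confidence intervals
>>> dml_apos.bootstrap(n_rep_boot=1000)
>>> print(dml_apos.confint(joint=True))     # joint intervals across treatment levels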
- DoubleMLAPOS.draw_sample_splitting()#
Draw sample splitting for DoubleML models.
The samples are drawn according to the attributes n_folds and n_rep.
- Returns:
self
- Return type:
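If the object is constructed with draw_sample_splitting=False, the splits can be drawn manually; a sketch with the illustrative data and learners from above:

>>> dml_apos = dml.DoubleMLAPOS(obj_dml_data, ml_g, ml_m, treatment_levels=[0, 1, 2],
...                             n_folds=5, n_rep=2, draw_sample_splitting=False)
>>> dml_apos.draw_sample_splitting()        # draws n_rep repetitions of n_folds train/test splits
>>> dml_apos.fit()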
- DoubleMLAPOS.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False, external_predictions=None)#
Estimate DoubleMLAPOS models.
- Parameters:
n_jobs_models (None or int) – The number of CPUs to use to fit the treatment_levels. None means 1. Default is None.
n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Does not speed up computation for quantile models. Default is None.
store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.
store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows analyzing the fitted models or extracting information like variable importance. Default is False.
external_predictions (dict or None) – A nested dictionary where the keys correspond to the treatment levels and can contain predictions according to each treatment level. The values have to be dictionaries which can contain keys 'ml_g0', 'ml_g1' and 'ml_m'. Default is None.
- Returns:
self
- Return type:
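An illustrative call with parallelization over treatment levels and stored nuisance models (the values are arbitrary):

>>> dml_apos.fit(n_jobs_models=3, n_jobs_cv=2, store_predictions=True, store_models=True)
>>> print(dml_apos.coef)                    # one APO estimate per treatment level
>>> print(dml_apos.summary)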
- DoubleMLAPOS.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0, level=0.95, null_hypothesis=0.0)#
Performs a sensitivity analysis to account for unobserved confounders.
The evaluated scenario is stored as a dictionary in the property sensitivity_params.
- Parameters:
cf_y (float) – Percentage of the residual variation of the outcome explained by latent/confounding variables. Default is 0.03.
cf_d (float) – Percentage gains in the variation of the Riesz representer generated by latent/confounding variables. Default is 0.03.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1, 1]. The absolute value determines the adversarial strength of the confounding (maximizes at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float or numpy.ndarray) – Null hypothesis for the effect. Determines the robustness values. If it is a single float, the same null hypothesis is used for all estimated parameters; otherwise the array has to be of shape (n_coefs,). Default is 0.0.
- Returns:
self
- Return type:
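A short sketch, continuing the illustrative object from above; the chosen confounding strengths are arbitrary:

>>> dml_apos.fit()
>>> dml_apos.sensitivity_analysis(cf_y=0.04, cf_d=0.04, rho=1.0)
>>> print(dml_apos.sensitivity_summary)
>>> print(dml_apos.sensitivity_params['rv'])    # robustness values per treatment level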
- DoubleMLAPOS.sensitivity_benchmark(benchmarking_set, fit_args=None)#
Computes a benchmark for a given set of features. Returns a DataFrame containing the corresponding values for cf_y, cf_d, rho and the change in estimates.
- Returns:
benchmark_results – Benchmark results.
- Return type:
pandas.DataFrame
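A hedged sketch; the benchmarking column 'x1' refers to a covariate from the illustrative data above and is not prescribed by the API:

>>> dml_apos.fit()
>>> benchmark = dml_apos.sensitivity_benchmark(benchmarking_set=['x1'])
>>> print(benchmark)    # cf_y, cf_d, rho and the change in estimates when omitting 'x1'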
- DoubleMLAPOS.sensitivity_plot(idx_treatment=0, value='theta', rho=1.0, level=0.95, null_hypothesis=0.0, include_scenario=True, benchmarks=None, fill=True, grid_bounds=(0.15, 0.15), grid_size=100)#
Contour plot of the sensitivity with respect to latent/confounding variables.
- Parameters:
idx_treatment (int) – Index of the treatment to perform the sensitivity analysis. Default is 0.
value (str) – Determines which contours to plot. Valid values are 'theta' (refers to the bounds) and 'ci' (refers to the bounds including statistical uncertainty). Default is 'theta'.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1, 1]. The absolute value determines the adversarial strength of the confounding (maximizes at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float) – Null hypothesis for the effect. Determines the direction of the contour lines. Default is 0.0.
include_scenario (bool) – Indicates whether to highlight the scenario from the call of sensitivity_analysis(). Default is True.
benchmarks (dict or None) – Dictionary of benchmarks to be included in the plot. The keys are cf_y, cf_d and name. Default is None.
fill (bool) – Indicates whether to use a heatmap style or only contour lines. Default is True.
grid_bounds (tuple) – Determines the evaluation bounds of the grid for cf_d and cf_y. Has to contain two floats in [0, 1). Default is (0.15, 0.15).
grid_size (int) – Determines the number of evaluation points of the grid. Default is 100.
- Returns:
fig – Plotly figure of the sensitivity contours.
- Return type:
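An illustrative call, assuming sensitivity_analysis() has been run first; the benchmark values are made up:

>>> dml_apos.fit()
>>> dml_apos.sensitivity_analysis()
>>> fig = dml_apos.sensitivity_plot(idx_treatment=0, value='theta',
...                                 benchmarks={'cf_y': [0.1], 'cf_d': [0.15], 'name': ['omit x1']})
>>> fig.show()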
- DoubleMLAPOS.set_sample_splitting(all_smpls, all_smpls_cluster=None)#
Set the sample splitting for DoubleML models.
The attributes n_folds and n_rep are derived from the provided partition.
- Parameters:
all_smpls –
- If nested list of lists of tuples: The outer list needs to provide an entry per repeated sample splitting (length of list is set as n_rep). The inner list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition for each inner list.
- If list of tuples: The list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition. n_rep=1 is always set.
- If tuple: Must be a tuple with two elements train_ind and test_ind. The only viable option is to set train_ind and test_ind to np.arange(n_obs), which corresponds to no sample splitting. n_folds=1 and n_rep=1 are always set.
- Returns:
self
- Return type:
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_plr_CCDDHNR2018
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.base import clone
>>> np.random.seed(3141)
>>> learner = RandomForestRegressor(max_depth=2, n_estimators=10)
>>> ml_g = learner
>>> ml_m = learner
>>> obj_dml_data = make_plr_CCDDHNR2018(n_obs=10, alpha=0.5)
>>> dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
>>> # sample splitting with two folds and cross-fitting
>>> smpls = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...          ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
>>> dml_plr_obj.set_sample_splitting(smpls)
>>> # sample splitting with two folds and repeated cross-fitting with n_rep = 2
>>> smpls = [[([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...           ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])],
...          [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
...           ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]
>>> dml_plr_obj.set_sample_splitting(smpls)