doubleml.DoubleMLAPOS

class doubleml.DoubleMLAPOS(obj_dml_data, ml_g, ml_m, treatment_levels, n_folds=5, n_rep=1, score='APO', weights=None, normalize_ipw=False, trimming_rule='truncate', trimming_threshold=0.01, draw_sample_splitting=True)

Double machine learning for interactive regression models with multiple discrete treatments.
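For a single treatment level d, the average potential outcome theta_d = E[Y(d)] can be estimated with a doubly robust (AIPW) score that combines an outcome regression g(d, X) = E[Y | D = d, X] with a propensity m_d(X) = P(D = d | X). The NumPy sketch below assumes this standard form of the 'APO' score. To stay self-contained, it plugs in the true (oracle) nuisance functions of a simulated design where cross-fitted predictions from ml_g and ml_m would normally be used. The function apo_estimate and the data-generating process are hypothetical illustrations, not DoubleML's implementation.

```python
import numpy as np


def apo_estimate(y, d, level, g_hat, m_hat):
    """Doubly robust (AIPW) estimate of the average potential outcome E[Y(level)].

    g_hat: predictions of E[Y | D = level, X]; m_hat: predictions of P(D = level | X).
    """
    indicator = (d == level).astype(float)
    # Score term: outcome prediction plus inverse-propensity-weighted residual
    psi_b = g_hat + indicator * (y - g_hat) / m_hat
    theta = psi_b.mean()
    # Standard error from the empirical variance of the score
    se = np.sqrt(np.var(psi_b) / len(y))
    return theta, se


# Simulated data with three treatment levels; the true APO of level k equals k
rng = np.random.default_rng(42)
n = 20_000
x = rng.normal(size=n)
logits = np.column_stack([np.zeros(n), 0.5 * x, -0.5 * x])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
# Draw D from the row-wise multinomial probabilities
u = rng.random(n)
d = np.minimum((u[:, None] > probs.cumsum(axis=1)).sum(axis=1), 2)
y = d + x + rng.normal(size=n)

# Oracle nuisance functions stand in for cross-fitted ML predictions
estimates = {
    level: apo_estimate(y, d, level, g_hat=level + x, m_hat=probs[:, level])
    for level in (0, 1, 2)
}
```

Each estimate should land close to its true value of 0, 1 or 2; replacing the oracle functions with cross-fitted learners is the part that double machine learning adds on top of this score.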

Methods

bootstrap([method, n_rep_boot])

Multiplier bootstrap for DoubleML models.

causal_contrast(reference_levels)

Average causal contrasts for DoubleMLAPOS models.

confint([joint, level])

Confidence intervals for DoubleML models.

draw_sample_splitting()

Draw sample splitting for DoubleML models.

fit([n_jobs_models, n_jobs_cv, ...])

Estimate DoubleMLAPOS models.

sensitivity_analysis([cf_y, cf_d, rho, ...])

Performs a sensitivity analysis to account for unobserved confounders.

sensitivity_benchmark(benchmarking_set[, ...])

Computes a benchmark for a given set of features.

sensitivity_plot([idx_treatment, value, ...])

Contour plot of the sensitivity with respect to latent/confounding variables.

set_sample_splitting(all_smpls[, ...])

Set the sample splitting for DoubleML models.

Attributes

all_coef

Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_treatment_levels, n_rep)).

all_se

Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_treatment_levels, n_rep)).

boot_method

The method to construct the bootstrap replications.

boot_t_stat

Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap() (shape (n_rep_boot, n_treatment_levels, n_rep)).

coef

Estimates for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).

framework

The corresponding doubleml.DoubleMLFramework object.

modellist

The list of models for each treatment level.

n_folds

Number of folds.

n_rep

Number of repetitions for the sample splitting.

n_rep_boot

The number of bootstrap replications.

n_treatment_levels

The number of treatment levels.

normalize_ipw

Indicates whether the inverse probability weights are normalized.

pval

p-values for the causal parameter(s) (shape (n_treatment_levels,)).

score

The score function.

se

Standard errors for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).

sensitivity_elements

Values of the sensitivity components after calling fit(). If available (e.g., PLR, IRM), a dictionary with entries sigma2, nu2, psi_sigma2, psi_nu2 and riesz_rep.

sensitivity_params

Values of the sensitivity parameters after calling sensitivity_analysis(). If available (e.g., PLR, IRM), a dictionary with entries theta, se, ci, rv and rva.

sensitivity_summary

Returns a summary for the sensitivity analysis after calling sensitivity_analysis().

smpls

The partition used for cross-fitting.

summary

A summary for the estimated causal effect after calling fit().

t_stat

t-statistics for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).

treatment_levels

The evaluated treatment levels.

trimming_rule

Specifies the used trimming rule.

trimming_threshold

Specifies the used trimming threshold.

weights

Specifies the weights for a weighted average potential outcome.
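The trimming_rule, trimming_threshold and normalize_ipw attributes act on the estimated propensities and the resulting inverse probability weights. The sketch below assumes two things: the 'truncate' rule clips propensities into [trimming_threshold, 1 - trimming_threshold], and normalization rescales the weights 1{D = d} / m_d(X) to have mean one (a Hajek-type normalization). The helper names are hypothetical, and whether DoubleMLAPOS normalizes in exactly this way is an assumption.

```python
import numpy as np


def truncate_propensity(m_hat, trimming_threshold=0.01):
    # 'truncate' rule (assumed): clip probabilities into [threshold, 1 - threshold]
    return np.clip(m_hat, trimming_threshold, 1.0 - trimming_threshold)


def ipw_weights(d, level, m_hat, normalize_ipw=False):
    # Inverse probability weights 1{D = level} / m(X)
    weights = (d == level).astype(float) / m_hat
    if normalize_ipw:
        # Hajek-type normalization: rescale so that the weights average to one
        weights = weights / weights.mean()
    return weights


d = np.array([1, 0, 1, 1])
m_hat = truncate_propensity(np.array([0.2, 0.5, 0.4, 0.001]))  # last value clipped to 0.01
raw = ipw_weights(d, level=1, m_hat=m_hat)
normalized = ipw_weights(d, level=1, m_hat=m_hat, normalize_ipw=True)
```

Truncation caps the largest weight at 1 / trimming_threshold (here 100), which is the main protection against a few observations with tiny propensities dominating the estimate.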

DoubleMLAPOS.bootstrap(method='normal', n_rep_boot=500)

Multiplier bootstrap for DoubleML models.

Parameters:
  • method (str) – A str ('Bayes', 'normal' or 'wild') specifying the multiplier bootstrap method. Default is 'normal'.

  • n_rep_boot (int) – The number of bootstrap replications. Default is 500.

Returns:

self

Return type:

object
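A multiplier bootstrap perturbs the estimated influence functions with i.i.d. mean-zero, unit-variance weights instead of refitting any model. The sketch below assumes the three weight distributions commonly associated with these names: centered exponential for 'Bayes', standard normal for 'normal', and a mean-zero, unit-variance combination of two normals for 'wild'. It works with placeholder influence-function values for a single sample split. The helpers draw_multiplier_weights and multiplier_bootstrap_t are hypothetical and not DoubleML's code.

```python
import numpy as np


def draw_multiplier_weights(method, n_rep_boot, n_obs, rng):
    """Mean-zero, unit-variance multiplier weights of shape (n_rep_boot, n_obs)."""
    if method == "Bayes":
        # Centered exponential weights
        return rng.exponential(scale=1.0, size=(n_rep_boot, n_obs)) - 1.0
    if method == "normal":
        return rng.standard_normal(size=(n_rep_boot, n_obs))
    if method == "wild":
        # Combination of two independent normals: mean 0, variance 1/2 + 1/2 = 1
        u = rng.standard_normal(size=(n_rep_boot, n_obs))
        v = rng.standard_normal(size=(n_rep_boot, n_obs))
        return u / np.sqrt(2) + (v ** 2 - 1) / 2
    raise ValueError(f"unknown bootstrap method: {method!r}")


def multiplier_bootstrap_t(phi, method="normal", n_rep_boot=500, seed=None):
    """Bootstrapped t-statistics from influence-function values phi of shape (n_obs, n_params)."""
    rng = np.random.default_rng(seed)
    n_obs = phi.shape[0]
    se = np.sqrt((phi ** 2).mean(axis=0) / n_obs)
    xi = draw_multiplier_weights(method, n_rep_boot, n_obs, rng)
    # Row b: weighted average of the influence functions, standardized by se
    return (xi @ phi) / (n_obs * se)  # shape (n_rep_boot, n_params)


rng = np.random.default_rng(1)
phi = rng.normal(size=(1000, 2))  # placeholder influence-function values for two levels
boot_t_stat = multiplier_bootstrap_t(phi, method="wild", n_rep_boot=500, seed=2)
```

Given phi, each bootstrapped t-statistic has variance one by construction, so the joint behaviour of the columns reflects the dependence between the estimated parameters rather than their scale. This is what makes the draws usable for joint inference.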

DoubleMLAPOS.causal_contrast(reference_levels)

Average causal contrasts for DoubleMLAPOS models. Estimates the difference in average potential outcomes between the treatment levels and the reference levels. The reference levels have to be a subset of the treatment levels or a single treatment level.

Parameters:

reference_levels – The reference level(s) for the difference in average potential outcomes. Must be a single element of treatment_levels or a subset of treatment_levels.

Returns:

acc – A DoubleMLFramework object for the average causal contrast(s).

Return type:

DoubleMLFramework
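All average potential outcomes are estimated on the same sample. As a result, a contrast theta_d - theta_ref has an influence function equal to the difference of the two influence functions, and this accounts for their correlation. The NumPy sketch below illustrates the calculation with placeholder estimates and influence-function values. It returns plain arrays rather than a DoubleMLFramework object, and contrast_with_reference is a hypothetical helper, not the method documented above.

```python
import numpy as np


def contrast_with_reference(theta, phi, treatment_levels, reference_level):
    """Differences theta[level] - theta[reference] with influence-function-based SEs.

    theta: (n_levels,) APO estimates; phi: (n_obs, n_levels) influence-function values.
    """
    ref = treatment_levels.index(reference_level)
    others = [j for j, lvl in enumerate(treatment_levels) if lvl != reference_level]
    diff = theta[others] - theta[ref]
    # The influence function of a difference is the difference of influence functions
    phi_diff = phi[:, others] - phi[:, [ref]]
    se = np.sqrt((phi_diff ** 2).mean(axis=0) / phi.shape[0])
    return diff, se


rng = np.random.default_rng(0)
theta = np.array([1.0, 1.5, 3.0])
common = rng.normal(size=(500, 1))
phi = common + 0.1 * rng.normal(size=(500, 3))  # strongly correlated influence functions
diff, se = contrast_with_reference(theta, phi, [0, 1, 2], reference_level=0)
```

With strongly correlated influence functions, as simulated here, the standard errors of the differences are much smaller than those of the individual potential outcomes. Treating the estimates as independent would overstate the uncertainty.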

DoubleMLAPOS.confint(joint=False, level=0.95)

Confidence intervals for DoubleML models.

Parameters:
  • joint (bool) – Indicates whether joint confidence intervals are computed. Default is False.

  • level (float) – The confidence level. Default is 0.95.

Returns:

df_ci – A data frame with the confidence interval(s).

Return type:

pd.DataFrame
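Pointwise intervals use the standard normal quantile for each parameter separately. Joint intervals instead widen every interval by a common critical value, taken from the distribution of the maximum absolute t-statistic across parameters as estimated by a multiplier bootstrap. The sketch below assumes that construction. confint_sketch is a hypothetical helper, and the bootstrap draws are placeholders standing in for the boot_t_stat attribute.

```python
import numpy as np
from statistics import NormalDist


def confint_sketch(theta, se, level=0.95, boot_t_stat=None):
    """Pointwise normal intervals, or joint intervals from bootstrapped t-statistics."""
    if boot_t_stat is None:
        # Pointwise: two-sided standard normal quantile
        crit = NormalDist().inv_cdf(0.5 + level / 2)
    else:
        # Joint: quantile of the largest absolute t-statistic over all parameters
        crit = np.quantile(np.abs(boot_t_stat).max(axis=1), level)
    return np.column_stack([theta - crit * se, theta + crit * se])


theta = np.array([1.0, 2.0])
se = np.array([0.1, 0.2])
rng = np.random.default_rng(0)
boot_t_stat = rng.standard_normal(size=(5000, 2))  # placeholder bootstrap draws
ci_pointwise = confint_sketch(theta, se)
ci_joint = confint_sketch(theta, se, boot_t_stat=boot_t_stat)
```

The joint intervals are wider because they are calibrated to cover all treatment levels simultaneously at the stated level.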

DoubleMLAPOS.draw_sample_splitting()

Draw sample splitting for DoubleML models.

The samples are drawn according to the attributes n_folds and n_rep.

Returns:

self

Return type:

object
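The nested structure drawn here matches the one accepted by set_sample_splitting: an outer list with one entry per repetition, and an inner list of (train_ind, test_ind) tuples, one per fold. A minimal NumPy sketch of how such a K-fold partition could be drawn follows. draw_splits is a hypothetical helper, and DoubleML may draw its splits differently (for example via scikit-learn's splitters).

```python
import numpy as np


def draw_splits(n_obs, n_folds=5, n_rep=1, seed=None):
    """Return a nested list: n_rep repetitions of n_folds (train_ind, test_ind) tuples."""
    rng = np.random.default_rng(seed)
    all_smpls = []
    for _ in range(n_rep):
        # Shuffle once per repetition and cut into n_folds nearly equal test sets
        folds = np.array_split(rng.permutation(n_obs), n_folds)
        smpls = []
        for k, test_ind in enumerate(folds):
            train_ind = np.concatenate([f for j, f in enumerate(folds) if j != k])
            smpls.append((np.sort(train_ind), np.sort(test_ind)))
        all_smpls.append(smpls)
    return all_smpls


all_smpls = draw_splits(n_obs=23, n_folds=5, n_rep=2, seed=1)
```

Each observation lands in exactly one test set per repetition, so every nuisance prediction is made by a model that did not see that observation during training.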

DoubleMLAPOS.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False, external_predictions=None)

Estimate DoubleMLAPOS models.

Parameters:
  • n_jobs_models (None or int) – The number of CPUs to use to fit the models for the different treatment levels. None means 1. Default is None.

  • n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Does not speed up computation for quantile models. Default is None.

  • store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.

  • store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows analyzing the fitted models or extracting information such as variable importance. Default is False.

  • external_predictions (dict or None) – A nested dictionary whose keys are treatment levels. The value for each treatment level is itself a dictionary that may contain predictions for that level under the keys 'ml_g0', 'ml_g1' and 'ml_m'. Default is None.

Returns:

self

Return type:

object
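A nested external_predictions dictionary might be assembled as follows. The keys 'ml_g0', 'ml_g1' and 'ml_m' come from the parameter description above. The assumption that each prediction is an array of shape (n_obs, n_rep) is not stated here and should be checked against the DoubleML user guide. The random values are placeholders for predictions from your own learners.

```python
import numpy as np

n_obs, n_rep = 100, 1
treatment_levels = [0, 1, 2]
rng = np.random.default_rng(0)

# Placeholder predictions; in practice these come from your own fitted learners.
# Assumption: every prediction array has shape (n_obs, n_rep).
external_predictions = {
    level: {
        "ml_g0": rng.normal(size=(n_obs, n_rep)),
        "ml_g1": rng.normal(size=(n_obs, n_rep)),
        "ml_m": rng.uniform(0.05, 0.95, size=(n_obs, n_rep)),
    }
    for level in treatment_levels
}
```

The dictionary would then be passed as dml_apos.fit(external_predictions=external_predictions), where dml_apos is a hypothetical, already constructed DoubleMLAPOS object.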

DoubleMLAPOS.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0, level=0.95, null_hypothesis=0.0)

Performs a sensitivity analysis to account for unobserved confounders.

The evaluated scenario is stored as a dictionary in the property sensitivity_params.

Parameters:
  • cf_y (float) – Percentage of the residual variation of the outcome explained by latent/confounding variables. Default is 0.03.

  • cf_d (float) – Percentage gains in the variation of the Riesz representer generated by latent/confounding variables. Default is 0.03.

  • rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.

  • level (float) – The confidence level. Default is 0.95.

  • null_hypothesis (float or numpy.ndarray) – Null hypothesis for the effect. Determines the robustness values. If a single float is given, the same null hypothesis is used for all estimated parameters; otherwise the array must have shape (n_coefs,). Default is 0.0.

Returns:

self

Return type:

object
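The sensitivity analysis bounds the bias from omitted confounders in terms of cf_y, cf_d and rho. The sketch below assumes the bias bound |rho| * sqrt(cf_y * cf_d / (1 - cf_d)) * sqrt(sigma2 * nu2), with sigma2 and nu2 as in sensitivity_elements. It then finds a robustness value by bisection over cf_y = cf_d. Both helpers are hypothetical and not DoubleML's code. They also ignore sampling uncertainty, so they speak to the bounds on theta rather than to confidence bounds.

```python
import numpy as np


def sensitivity_bounds(theta, sigma2, nu2, cf_y=0.03, cf_d=0.03, rho=1.0):
    """Lower and upper bounds on theta under the assumed omitted-variable-bias bound."""
    confounding_strength = abs(rho) * np.sqrt(cf_y * cf_d / (1.0 - cf_d))
    bias = confounding_strength * np.sqrt(sigma2 * nu2)
    return theta - bias, theta + bias


def robustness_value(theta, sigma2, nu2, rho=1.0, null_hypothesis=0.0, tol=1e-10):
    """Smallest r = cf_y = cf_d for which the bounds include the null hypothesis (bisection)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lower, upper = sensitivity_bounds(theta, sigma2, nu2, cf_y=mid, cf_d=mid, rho=rho)
        if lower <= null_hypothesis <= upper:
            hi = mid  # null already reached: search for a smaller r
        else:
            lo = mid
    return hi


lower, upper = sensitivity_bounds(theta=0.5, sigma2=1.0, nu2=4.0)
rv = robustness_value(theta=0.5, sigma2=1.0, nu2=4.0)
```

Bisection is used because the bias grows monotonically in r = cf_y = cf_d. In this example, r / sqrt(1 - r) must reach 0.25 before the lower bound touches zero.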

DoubleMLAPOS.sensitivity_benchmark(benchmarking_set, fit_args=None)

Computes a benchmark for a given set of features. Returns a DataFrame containing the corresponding values for cf_y, cf_d, rho and the change in estimates.

Returns:

benchmark_results – Benchmark results.

Return type:

pandas.DataFrame

DoubleMLAPOS.sensitivity_plot(idx_treatment=0, value='theta', rho=1.0, level=0.95, null_hypothesis=0.0, include_scenario=True, benchmarks=None, fill=True, grid_bounds=(0.15, 0.15), grid_size=100)

Contour plot of the sensitivity with respect to latent/confounding variables.

Parameters:
  • idx_treatment (int) – Index of the treatment to perform the sensitivity analysis. Default is 0.

  • value (str) – Determines which contours to plot. Valid values are 'theta' (refers to the bounds) and 'ci' (refers to the bounds including statistical uncertainty). Default is 'theta'.

  • rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.

  • level (float) – The confidence level. Default is 0.95.

  • null_hypothesis (float) – Null hypothesis for the effect. Determines the direction of the contour lines. Default is 0.0.

  • include_scenario (bool) – Indicates whether to highlight the scenario from the call of sensitivity_analysis(). Default is True.

  • benchmarks (dict or None) – Dictionary of benchmarks to be included in the plot. The keys are cf_y, cf_d and name. Default is None.

  • fill (bool) – Indicates whether to use a heatmap style or only contour lines. Default is True.

  • grid_bounds (tuple) – Determines the evaluation bounds of the grid for cf_d and cf_y. Has to contain two floats in [0, 1). Default is (0.15, 0.15).

  • grid_size (int) – Determines the number of evaluation points of the grid. Default is 100.

Returns:

fig – Plotly figure of the sensitivity contours.

Return type:

object
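The contour plot evaluates the bounds on a grid of (cf_d, cf_y) values. The sketch below computes that grid for the lower bound of theta, assuming the bias bound |rho| * sqrt(cf_y * cf_d / (1 - cf_d)) * sqrt(sigma2 * nu2). It also assumes that grid_bounds is ordered as (cf_d, cf_y), following the parameter description. lower_bound_grid is a hypothetical helper, not DoubleML's plotting code.

```python
import numpy as np


def lower_bound_grid(theta, sigma2, nu2, rho=1.0, grid_bounds=(0.15, 0.15), grid_size=100):
    """Lower bound on theta over a grid; rows vary cf_y, columns vary cf_d."""
    cf_d = np.linspace(0.0, grid_bounds[0], grid_size)
    cf_y = np.linspace(0.0, grid_bounds[1], grid_size)
    cf_d_mesh, cf_y_mesh = np.meshgrid(cf_d, cf_y)
    bias = abs(rho) * np.sqrt(cf_y_mesh * cf_d_mesh / (1.0 - cf_d_mesh)) * np.sqrt(sigma2 * nu2)
    return cf_d, cf_y, theta - bias


cf_d, cf_y, lower = lower_bound_grid(theta=0.5, sigma2=1.0, nu2=4.0)
```

The contour where lower equals null_hypothesis separates the confounding scenarios that would overturn the conclusion from those that would not.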

DoubleMLAPOS.set_sample_splitting(all_smpls, all_smpls_cluster=None)

Set the sample splitting for DoubleML models.

The attributes n_folds and n_rep are derived from the provided partition.

Parameters:

all_smpls (list or tuple) –

If nested list of lists of tuples:

The outer list needs to provide an entry per repeated sample splitting (length of list is set as n_rep). The inner list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition for each inner list.

If list of tuples:

The list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition. n_rep=1 is always set.

If tuple:

Must be a tuple with two elements train_ind and test_ind. The only viable option is to set both train_ind and test_ind to np.arange(n_obs), which corresponds to no sample splitting. n_folds=1 and n_rep=1 are always set.

Returns:

self

Return type:

object

Examples

>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_plr_CCDDHNR2018
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.base import clone
>>> np.random.seed(3141)
>>> learner = RandomForestRegressor(max_depth=2, n_estimators=10)
>>> ml_g = clone(learner)
>>> ml_m = clone(learner)
>>> obj_dml_data = make_plr_CCDDHNR2018(n_obs=10, alpha=0.5)
>>> dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
>>> # sample splitting with two folds and cross-fitting
>>> smpls = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...          ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
>>> dml_plr_obj.set_sample_splitting(smpls)
>>> # sample splitting with two folds and repeated cross-fitting with n_rep = 2
>>> smpls = [[([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...           ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])],
...          [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
...           ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]
>>> dml_plr_obj.set_sample_splitting(smpls)
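Before passing a hand-written partition to set_sample_splitting, it can help to check the stated requirement that the test indices form a partition of the observations. The helper below is a hypothetical NumPy sketch that checks only that requirement. It is applied to the two-fold and repeated splits from the example.

```python
import numpy as np


def check_test_partition(smpls, n_obs):
    """True if the test indices of the (train_ind, test_ind) tuples cover 0..n_obs-1 exactly once."""
    stacked = np.concatenate([np.asarray(test_ind) for _, test_ind in smpls])
    return len(stacked) == n_obs and np.array_equal(np.sort(stacked), np.arange(n_obs))


two_folds = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
             ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
repeated = [two_folds,
            [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
             ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]
# For the nested form, every repetition must be a partition on its own
ok_single = check_test_partition(two_folds, n_obs=10)
ok_repeated = all(check_test_partition(rep, n_obs=10) for rep in repeated)
# Observation 4 is missing from every test set here
broken = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
          ([5, 6, 7, 8, 9], [0, 1, 2, 3])]
ok_broken = check_test_partition(broken, n_obs=10)
```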