2.2.3. doubleml.irm.DoubleMLAPOS

class doubleml.irm.DoubleMLAPOS(obj_dml_data, ml_g, ml_m, treatment_levels, n_folds=5, n_rep=1, score='APO', weights=None, normalize_ipw=False, trimming_rule='truncate', trimming_threshold=0.01, draw_sample_splitting=True)

Double machine learning for interactive regression models with multiple discrete treatments.
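For a single treatment level d, an average potential outcome can be estimated from a doubly robust (AIPW-type) score. This score combines the outcome regression g(d, X), fitted with ml_g, and the propensity P(D = d | X), fitted with ml_m. The following minimal numpy sketch rests on several assumptions. It takes the cross-fitted nuisance predictions as already available arrays and ignores weights and trimming. The helper apo_estimate is hypothetical and is not DoubleML's implementation.

```python
import numpy as np


def apo_estimate(y, d, level, g_hat, m_hat):
    """Sketch of an AIPW-type estimate of E[Y(level)] (hypothetical helper).

    g_hat: predictions of E[Y | D = level, X] for every observation.
    m_hat: predictions of P(D = level | X) for every observation.
    """
    y, d, g_hat, m_hat = (np.asarray(a, dtype=float) for a in (y, d, g_hat, m_hat))
    indicator = (d == level).astype(float)
    # score: outcome regression plus inverse-propensity-weighted residual correction
    psi = g_hat + indicator * (y - g_hat) / m_hat
    theta = psi.mean()
    # plug-in standard error from the empirical variance of the score
    se = psi.std() / np.sqrt(psi.size)
    return theta, se


theta, se = apo_estimate(
    y=[1.0, 2.0, 3.0, 4.0],
    d=[0, 1, 1, 2],
    level=1,
    g_hat=[2.0, 2.0, 2.0, 2.0],
    m_hat=[0.5, 0.5, 0.5, 0.5],
)
```

Conceptually, this produces one such estimate per entry of treatment_levels, which matches the shape (n_treatment_levels,) of coef.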

Methods

bootstrap([method, n_rep_boot])

Multiplier bootstrap for DoubleML models.

causal_contrast(reference_levels)

Average causal contrasts for DoubleMLAPOS models.

confint([joint, level])

Confidence intervals for DoubleML models.

draw_sample_splitting()

Draw sample splitting for DoubleML models.

fit([n_jobs_models, n_jobs_cv, ...])

Estimate DoubleMLAPOS models.

sensitivity_analysis([cf_y, cf_d, rho, ...])

Performs a sensitivity analysis to account for unobserved confounders.

sensitivity_benchmark(benchmarking_set[, ...])

Computes a benchmark for a given set of features.

sensitivity_plot([idx_treatment, value, ...])

Contour plot of the sensitivity with respect to latent/confounding variables.

set_sample_splitting(all_smpls[, ...])

Set the sample splitting for DoubleML models.

Attributes

all_coef

Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_treatment_levels, n_rep)).

all_se

Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_treatment_levels, n_rep)).

boot_method

The method to construct the bootstrap replications.

boot_t_stat

Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap() (shape (n_rep_boot, n_treatment_levels, n_rep)).

coef

Estimates for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).

framework

The corresponding doubleml.DoubleMLFramework object.

modellist

The list of models, one for each treatment level.

n_folds

Number of folds.

n_rep

Number of repetitions for the sample splitting.

n_rep_boot

The number of bootstrap replications.

n_treatment_levels

The number of treatment levels.

normalize_ipw

Indicates whether the inverse probability weights are normalized.

pval

p-values for the causal parameter(s) (shape (n_treatment_levels,)).

score

The score function.

se

Standard errors for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).

sensitivity_elements

Values of the sensitivity components after calling fit(). If available (e.g., PLR, IRM), a dictionary with entries sigma2, nu2, psi_sigma2, psi_nu2 and riesz_rep.

sensitivity_params

Values of the sensitivity parameters after calling sensitivity_analysis(). If available (e.g., PLR, IRM), a dictionary with entries theta, se, ci, rv and rva.

sensitivity_summary

Returns a summary for the sensitivity analysis after calling sensitivity_analysis().

smpls

The partition used for cross-fitting.

summary

A summary for the estimated causal effect after calling fit().

t_stat

t-statistics for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).

treatment_levels

The evaluated treatment levels.

trimming_rule

Specifies the used trimming rule.

trimming_threshold

Specifies the used trimming threshold.

weights

Specifies the weights for a weighted average potential outcome.
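The constructor arguments trimming_rule, trimming_threshold and normalize_ipw act on the estimated propensity scores. The numpy sketch below illustrates these settings under two assumptions. First, 'truncate' clips propensities to [trimming_threshold, 1 - trimming_threshold]. Second, normalization rescales the inverse probability weights to mean one, a Hajek-type normalization. Both function names are hypothetical.

```python
import numpy as np


def truncate_propensity(m_hat, trimming_threshold=0.01):
    """Clip propensity scores to [threshold, 1 - threshold] (assumed 'truncate' rule)."""
    return np.clip(np.asarray(m_hat, dtype=float), trimming_threshold, 1.0 - trimming_threshold)


def ipw_weights(indicator, m_hat, normalize_ipw=False):
    """Inverse probability weights 1{D=d} / m_d(X), optionally rescaled to mean one."""
    weights = np.asarray(indicator, dtype=float) / np.asarray(m_hat, dtype=float)
    if normalize_ipw:
        # Hajek-type normalization (assumption): weights average to one
        weights = weights / weights.mean()
    return weights


m_hat = truncate_propensity([0.001, 0.4, 0.999])
w = ipw_weights([1, 0, 1], m_hat, normalize_ipw=True)
```

Truncation keeps extreme propensities from producing huge weights. Normalization makes the weights sum to the sample size, so a few observations cannot dominate the estimate.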

DoubleMLAPOS.bootstrap(method='normal', n_rep_boot=500)

Multiplier bootstrap for DoubleML models.

Parameters:
  • method (str) – A str ('Bayes', 'normal' or 'wild') specifying the multiplier bootstrap method. Default is 'normal'.

  • n_rep_boot (int) – The number of bootstrap replications. Default is 500.

Returns:

self

Return type:

object
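The multiplier bootstrap perturbs the estimated score with random weights instead of resampling observations. The minimal numpy sketch below handles a single parameter. It assumes the per-observation influence-function values are available. It also assumes that 'normal' draws standard normal multipliers and that 'Bayes' draws exponential(1) - 1 multipliers. The sketch omits the 'wild' variant, and the helper name is hypothetical.

```python
import numpy as np


def multiplier_bootstrap_t(psi, n_rep_boot=500, method="normal", seed=None):
    """Bootstrapped t-statistics for one parameter (hypothetical helper).

    psi: per-observation influence-function values, assumed to be mean zero.
    """
    rng = np.random.default_rng(seed)
    psi = np.asarray(psi, dtype=float)
    n_obs = psi.size
    if method == "normal":
        multipliers = rng.standard_normal((n_rep_boot, n_obs))
    elif method == "Bayes":
        # exponential(1) minus 1 has mean 0 and variance 1
        multipliers = rng.exponential(1.0, size=(n_rep_boot, n_obs)) - 1.0
    else:
        raise ValueError("this sketch only covers 'normal' and 'Bayes'")
    se = psi.std() / np.sqrt(n_obs)
    # each entry: perturbed mean of the influence function divided by its standard error
    return multipliers @ psi / (n_obs * se)


rng = np.random.default_rng(0)
psi = rng.standard_normal(200)
psi -= psi.mean()
boot_t_stat = multiplier_bootstrap_t(psi, n_rep_boot=500, seed=1)
```

With centered psi, each bootstrapped t-statistic has conditional variance one. Quantiles of boot_t_stat can therefore stand in for normal critical values.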

DoubleMLAPOS.causal_contrast(reference_levels)

Average causal contrasts for DoubleMLAPOS models. Estimates the difference in average potential outcomes between the treatment levels and the reference levels. The reference levels have to be a subset of the treatment levels or a single treatment level.

Parameters:

reference_levels – The reference levels for the difference in average potential outcomes. Has to be a single element or a subset of treatment_levels.

Returns:

acc – A DoubleMLFramework object for the average causal contrast(s).

Return type:

DoubleMLFramework
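Assume each average potential outcome comes with per-observation influence-function values. A contrast against a reference level then follows by differencing the two estimates, and its standard error follows by differencing those values. The numpy sketch below shows the arithmetic for a single reference level. The helper name is hypothetical, and the actual method returns a DoubleMLFramework object rather than plain arrays.

```python
import numpy as np


def contrast_against_reference(thetas, psi, treatment_levels, reference_level):
    """Differences theta_d - theta_ref and their standard errors (hypothetical helper).

    thetas: array of shape (n_levels,) with the estimated average potential outcomes.
    psi: array of shape (n_obs, n_levels) with per-observation influence-function values.
    """
    thetas = np.asarray(thetas, dtype=float)
    psi = np.asarray(psi, dtype=float)
    levels = list(treatment_levels)
    ref = levels.index(reference_level)
    others = [j for j in range(len(levels)) if j != ref]
    contrasts = thetas[others] - thetas[ref]
    # the influence function of a difference is the difference of influence functions
    psi_diff = psi[:, others] - psi[:, [ref]]
    se = psi_diff.std(axis=0) / np.sqrt(psi.shape[0])
    labels = [f"{levels[j]} vs {reference_level}" for j in others]
    return labels, contrasts, se


rng = np.random.default_rng(0)
psi = rng.standard_normal((50, 3))
labels, contrasts, se = contrast_against_reference([1.0, 2.5, 4.0], psi, [0, 1, 2], 0)
```

Differencing the influence functions, rather than adding the two standard errors, accounts for the correlation between estimates computed on the same observations.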

DoubleMLAPOS.confint(joint=False, level=0.95)

Confidence intervals for DoubleML models.

Parameters:
  • joint (bool) – Indicates whether joint confidence intervals are computed. Default is False.

  • level (float) – The confidence level. Default is 0.95.

Returns:

df_ci – A data frame with the confidence interval(s).

Return type:

pd.DataFrame
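Pointwise intervals use a normal critical value. Joint intervals need a critical value that holds simultaneously for all treatment levels, so they presumably require bootstrap() to have been called first. The numpy sketch below takes the joint critical value to be the level-quantile of the maximum absolute bootstrapped t-statistic. It also assumes that boot_t_stat has been reduced to a single repetition of shape (n_rep_boot, n_treatment_levels). The name confint_sketch is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def confint_sketch(coef, se, level=0.95, joint=False, boot_t_stat=None):
    """Pointwise or joint confidence intervals (hypothetical helper).

    boot_t_stat: bootstrapped t-statistics of shape (n_rep_boot, n_params),
    required only when joint=True.
    """
    coef = np.asarray(coef, dtype=float)
    se = np.asarray(se, dtype=float)
    if joint:
        if boot_t_stat is None:
            raise ValueError("joint intervals need bootstrapped t-statistics")
        # critical value: level-quantile of the max absolute t-statistic across parameters
        max_abs_t = np.max(np.abs(np.asarray(boot_t_stat, dtype=float)), axis=1)
        crit = np.quantile(max_abs_t, level)
    else:
        # two-sided normal critical value, about 1.96 for level=0.95
        crit = NormalDist().inv_cdf(0.5 + level / 2)
    return np.column_stack([coef - crit * se, coef + crit * se])


ci = confint_sketch([1.0], [0.5])
```

Joint intervals are wider than pointwise ones because the critical value controls the coverage of all intervals at once.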

DoubleMLAPOS.draw_sample_splitting()

Draw sample splitting for DoubleML models.

The samples are drawn according to the attributes n_folds and n_rep.

Returns:

self

Return type:

object
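Drawing a sample splitting amounts to a repeated K-fold partition of the observation indices. The numpy sketch below produces the nested-list layout that set_sample_splitting() accepts. That layout has n_rep outer entries, each holding n_folds (train_ind, test_ind) tuples. The helper name and the shuffling scheme are assumptions, so DoubleML's own draws will differ.

```python
import numpy as np


def draw_kfold_splits(n_obs, n_folds=5, n_rep=1, seed=None):
    """Repeated K-fold partition as a nested list [rep][fold] -> (train_ind, test_ind)."""
    if n_folds < 2:
        raise ValueError("this sketch needs n_folds >= 2")
    rng = np.random.default_rng(seed)
    all_smpls = []
    for _ in range(n_rep):
        # shuffle once per repetition, then cut into n_folds nearly equal folds
        folds = np.array_split(rng.permutation(n_obs), n_folds)
        smpls = []
        for k in range(n_folds):
            test_ind = np.sort(folds[k])
            train_ind = np.sort(np.concatenate([folds[j] for j in range(n_folds) if j != k]))
            smpls.append((train_ind, test_ind))
        all_smpls.append(smpls)
    return all_smpls


all_smpls = draw_kfold_splits(n_obs=10, n_folds=5, n_rep=2, seed=0)
```

A result of this form could be passed to set_sample_splitting() to fix the folds explicitly, for example to share them across several models.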

DoubleMLAPOS.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False, external_predictions=None)

Estimate DoubleMLAPOS models.

Parameters:
  • n_jobs_models (None or int) – The number of CPUs to use to fit the models for the different treatment levels. None means 1. Default is None.

  • n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Does not speed up computation for quantile models. Default is None.

  • store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.

  • store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows analyzing the fitted models or extracting information such as variable importance. Default is False.

  • external_predictions (dict or None) – A nested dictionary whose keys correspond to the treatment levels. Each value is a dictionary that can contain predictions for that treatment level under the keys 'ml_g0', 'ml_g1' and 'ml_m'. Default is None.

Returns:

self

Return type:

object
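The nesting of external_predictions is easiest to see in code. The sketch below only assembles and checks such a dictionary. The helper name is hypothetical. It assumes that each prediction array has shape (n_obs, n_rep), which this reference does not state.

```python
import numpy as np

ALLOWED_KEYS = {"ml_g0", "ml_g1", "ml_m"}


def build_external_predictions(level_predictions, n_obs, n_rep=1):
    """Assemble the nested dict {treatment_level: {learner_key: array}} (hypothetical helper).

    Assumes every prediction array has shape (n_obs, n_rep).
    """
    external_predictions = {}
    for level, preds in level_predictions.items():
        unknown = set(preds) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"unexpected keys for level {level!r}: {sorted(unknown)}")
        # reshape raises if the number of predictions does not match n_obs * n_rep
        external_predictions[level] = {
            key: np.asarray(values, dtype=float).reshape(n_obs, n_rep)
            for key, values in preds.items()
        }
    return external_predictions


rng = np.random.default_rng(0)
n_obs = 100
ext_preds = build_external_predictions(
    {
        0: {"ml_m": rng.uniform(0.2, 0.8, n_obs)},
        1: {"ml_m": rng.uniform(0.2, 0.8, n_obs), "ml_g1": rng.normal(size=n_obs)},
    },
    n_obs=n_obs,
)
# ext_preds could then be passed as fit(external_predictions=ext_preds)
```

The sketch supplies only some keys for some levels. Missing entries would presumably be estimated with the learners passed to the constructor.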

DoubleMLAPOS.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0, level=0.95, null_hypothesis=0.0)

Performs a sensitivity analysis to account for unobserved confounders.

The evaluated scenario is stored as a dictionary in the property sensitivity_params.

Parameters:
  • cf_y (float) – Percentage of the residual variation of the outcome explained by latent/confounding variables. Default is 0.03.

  • cf_d (float) – Percentage gains in the variation of the Riesz representer generated by latent/confounding variables. Default is 0.03.

  • rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.

  • level (float) – The confidence level. Default is 0.95.

  • null_hypothesis (float or numpy.ndarray) – Null hypothesis for the effect. Determines the robustness values. If a single float is given, the same null hypothesis is used for all estimated parameters; otherwise, the array has to be of shape (n_coefs,). Default is 0.0.

Returns:

self

Return type:

object
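The bounds reported in sensitivity_params can be read as the point estimate shifted by a worst-case omitted-variable bias. The numpy sketch below assumes that this bias is bounded by |rho| * sqrt(sigma2 * nu2) * sqrt(cf_y * cf_d / (1 - cf_d)), with sigma2 and nu2 as in sensitivity_elements. It ignores sampling uncertainty (the ci entry) and the robustness values. The name sensitivity_bounds is hypothetical.

```python
import numpy as np


def sensitivity_bounds(theta, sigma2, nu2, cf_y=0.03, cf_d=0.03, rho=1.0):
    """Lower and upper bounds for theta under the assumed bias bound (hypothetical helper)."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho has to be in [-1, 1]")
    if not (0.0 <= cf_y < 1.0 and 0.0 <= cf_d < 1.0):
        raise ValueError("cf_y and cf_d have to be in [0, 1)")
    # confounding strength scales with |rho|; cf_d enters through cf_d / (1 - cf_d)
    strength = abs(rho) * np.sqrt(cf_y * cf_d / (1.0 - cf_d))
    bias_bound = np.sqrt(sigma2 * nu2) * strength
    return theta - bias_bound, theta + bias_bound


lower, upper = sensitivity_bounds(theta=1.0, sigma2=1.0, nu2=4.0)
```

Setting rho to 0 removes the adversarial component, and the bounds then collapse to the point estimate.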

DoubleMLAPOS.sensitivity_benchmark(benchmarking_set, fit_args=None)

Computes a benchmark for a given set of features. Returns a DataFrame containing the corresponding values for cf_y, cf_d, rho and the change in estimates.

Returns:

benchmark_results – Benchmark results.

Return type:

pandas.DataFrame

DoubleMLAPOS.sensitivity_plot(idx_treatment=0, value='theta', rho=1.0, level=0.95, null_hypothesis=0.0, include_scenario=True, benchmarks=None, fill=True, grid_bounds=(0.15, 0.15), grid_size=100)

Contour plot of the sensitivity with respect to latent/confounding variables.

Parameters:
  • idx_treatment (int) – Index of the treatment to perform the sensitivity analysis. Default is 0.

  • value (str) – Determines which contours to plot. Valid values are 'theta' (refers to the bounds) and 'ci' (refers to the bounds including statistical uncertainty). Default is 'theta'.

  • rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.

  • level (float) – The confidence level. Default is 0.95.

  • null_hypothesis (float) – Null hypothesis for the effect. Determines the direction of the contour lines. Default is 0.0.

  • include_scenario (bool) – Indicates whether to highlight the scenario from the call of sensitivity_analysis(). Default is True.

  • benchmarks (dict or None) – Dictionary of benchmarks to be included in the plot. The keys are cf_y, cf_d and name. Default is None.

  • fill (bool) – Indicates whether to use a heatmap style or only contour lines. Default is True.

  • grid_bounds (tuple) – Determines the evaluation bounds of the grid for cf_d and cf_y. Has to contain two floats in [0, 1). Default is (0.15, 0.15).

  • grid_size (int) – Determines the number of evaluation points of the grid. Default is 100.

Returns:

fig – Plotly figure of the sensitivity contours.

Return type:

object
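The plot evaluates the bounds on a grid of confounding strengths controlled by grid_bounds and grid_size. The numpy sketch below computes the lower bound for value='theta' on such a grid. It assumes the bias bound |rho| * sqrt(sigma2 * nu2) * sqrt(cf_y * cf_d / (1 - cf_d)). It also assumes that the first entry of grid_bounds refers to cf_d and the second to cf_y. The helper name is hypothetical.

```python
import numpy as np


def lower_bound_grid(theta, sigma2, nu2, rho=1.0, grid_bounds=(0.15, 0.15), grid_size=100):
    """Evaluate an assumed lower bound for theta on a (cf_y, cf_d) grid (hypothetical helper)."""
    # first entry of grid_bounds taken as the cf_d bound, second as the cf_y bound (assumption)
    cf_d = np.linspace(0.0, grid_bounds[0], grid_size)
    cf_y = np.linspace(0.0, grid_bounds[1], grid_size)
    cf_d_mesh, cf_y_mesh = np.meshgrid(cf_d, cf_y)  # rows vary cf_y, columns vary cf_d
    bias_bound = abs(rho) * np.sqrt(sigma2 * nu2) * np.sqrt(cf_y_mesh * cf_d_mesh / (1.0 - cf_d_mesh))
    return cf_d, cf_y, theta - bias_bound


cf_d, cf_y, lower = lower_bound_grid(theta=0.5, sigma2=1.0, nu2=4.0, grid_size=50)
```

Drawing contours of lower over (cf_d, cf_y) shows which combinations of confounding strength would push the bound past null_hypothesis.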

DoubleMLAPOS.set_sample_splitting(all_smpls, all_smpls_cluster=None)

Set the sample splitting for DoubleML models.

The attributes n_folds and n_rep are derived from the provided partition.

Parameters:
  • all_smpls (list or tuple) –

    If nested list of lists of tuples:

    The outer list needs to provide an entry per repeated sample splitting (length of list is set as n_rep). The inner list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition for each inner list.

    If list of tuples:

    The list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition. n_rep=1 is always set.

    If tuple:

    Must be a tuple with two elements train_ind and test_ind. The only viable option is to set both train_ind and test_ind to np.arange(n_obs), which corresponds to no sample splitting. n_folds=1 and n_rep=1 are always set.

  • all_smpls_cluster (list or None) – Nested list or None. The first level of nesting corresponds to the number of repetitions. The second level of nesting corresponds to the number of folds. The third level of nesting contains a tuple of training and testing lists. Both training and testing contain an array for each cluster variable, which form a partition of the clusters. Default is None.

Returns:

self

Return type:

object

Examples

>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.plm.datasets import make_plr_CCDDHNR2018
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.base import clone
>>> np.random.seed(3141)
>>> learner = RandomForestRegressor(max_depth=2, n_estimators=10)
>>> ml_g = learner
>>> ml_m = learner
>>> obj_dml_data = make_plr_CCDDHNR2018(n_obs=10, alpha=0.5)
>>> dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
>>> # no sample splitting: train and test indices both cover all observations
>>> smpls = (np.arange(10), np.arange(10))
>>> dml_plr_obj.set_sample_splitting(smpls)
>>> # sample splitting with two folds and cross-fitting
>>> smpls = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...          ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
>>> dml_plr_obj.set_sample_splitting(smpls)
>>> # sample splitting with two folds and repeated cross-fitting with n_rep = 2
>>> smpls = [[([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...           ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])],
...          [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
...           ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]
>>> dml_plr_obj.set_sample_splitting(smpls)
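The rules above for deriving n_rep and n_folds from all_smpls can be checked before calling set_sample_splitting(). The numpy sketch below is a hypothetical helper and is not part of DoubleML. It normalizes the three accepted input shapes into the nested form and verifies that the test indices of every repetition form a partition of the observations. It does not cover all_smpls_cluster.

```python
import numpy as np


def infer_n_rep_n_folds(all_smpls, n_obs):
    """Return (n_rep, n_folds) for an all_smpls input, checking the partition rule (hypothetical helper)."""
    if isinstance(all_smpls, tuple):
        # single (train_ind, test_ind) tuple: no sample splitting
        nested = [[all_smpls]]
    elif len(all_smpls) > 0 and isinstance(all_smpls[0], tuple):
        # list of (train_ind, test_ind) tuples: one repetition
        nested = [all_smpls]
    else:
        # nested list: one inner list per repetition
        nested = all_smpls
    for smpls in nested:
        test_ind = np.concatenate([np.asarray(test) for _, test in smpls])
        if not np.array_equal(np.sort(test_ind), np.arange(n_obs)):
            raise ValueError("test_ind does not form a partition of range(n_obs)")
    n_folds = {len(smpls) for smpls in nested}
    if len(n_folds) != 1:
        raise ValueError("every repetition needs the same number of folds")
    return len(nested), n_folds.pop()


smpls_cf = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]), ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
n_rep, n_folds = infer_n_rep_n_folds(smpls_cf, n_obs=10)
```

Under this check, a tuple such as ([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]) is rejected, because its test indices do not cover all observations.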