2.2.3. doubleml.irm.DoubleMLAPOS#
- class doubleml.irm.DoubleMLAPOS(obj_dml_data, ml_g, ml_m, treatment_levels, n_folds=5, n_rep=1, score='APO', weights=None, normalize_ipw=False, trimming_rule='truncate', trimming_threshold=0.01, ps_processor_config: PSProcessorConfig | None = None, draw_sample_splitting=True)#
Double machine learning for interactive regression models with multiple discrete treatments.
- Parameters:
obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.
ml_g (estimator implementing fit() and predict()) – A machine learner implementing fit() and predict() methods (e.g. sklearn.ensemble.RandomForestRegressor) for the nuisance function \(g_0(D, X) = E[Y | X, D]\). For a binary outcome variable \(Y\) (with values 0 and 1), a classifier implementing fit() and predict_proba() can also be specified. If sklearn.base.is_classifier() returns True, predict_proba() is used, otherwise predict().
ml_m (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance function \(m_0(X) = E[D | X]\).
treatment_levels (iterable of int or float) – The treatment levels for which average potential outcomes are evaluated. Each element must be present in the treatment variable d of obj_dml_data.
n_folds (int) – Number of folds. Default is 5.
n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
score (str) – A str ('APO') specifying the score function. Default is 'APO'.
weights (array, dict or None) – A numpy array of weights for each individual observation. If None, the 'APO' score is applied (corresponds to weights equal to 1). An array has to be of shape (n,), where n is the number of observations. A dictionary can be used to specify weights which depend on the treatment variable. In this case, the dictionary has to contain the two keys weights and weights_bar, whose values have to be arrays of shape (n,) and (n, n_rep), respectively. Default is None.
normalize_ipw (bool) – Indicates whether the inverse probability weights are normalized. Default is False.
trimming_rule (str, optional, deprecated) – (DEPRECATED) A str ('truncate' is the only choice) specifying the trimming approach. Use ps_processor_config instead. Will be removed in a future version.
trimming_threshold (float, optional, deprecated) – (DEPRECATED) The threshold used for trimming. Use ps_processor_config instead. Will be removed in a future version.
ps_processor_config (PSProcessorConfig, optional) – Configuration for propensity score processing (clipping, calibration, etc.).
draw_sample_splitting (bool) – Indicates whether the sample splitting should be drawn during initialization of the object. Default is True.
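A minimal usage sketch (not part of the upstream reference; the synthetic data, the simple outcome equation and the learner settings are illustrative assumptions):

>>> import numpy as np
>>> import pandas as pd
>>> import doubleml as dml
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(42)
>>> n = 500
>>> df = pd.DataFrame(np.random.normal(size=(n, 3)), columns=['x0', 'x1', 'x2'])
>>> df['d'] = np.random.choice([0, 1, 2], size=n)  # discrete treatment with three levels
>>> df['y'] = df['d'] + df['x0'] + np.random.normal(size=n)  # illustrative outcome equation
>>> dml_data = dml.DoubleMLData(df, y_col='y', d_cols='d')
>>> dml_apos = dml.DoubleMLAPOS(dml_data,
...                             ml_g=RandomForestRegressor(n_estimators=20),
...                             ml_m=RandomForestClassifier(n_estimators=20),
...                             treatment_levels=[0, 1, 2])
>>> _ = dml_apos.fit()
>>> # dml_apos.summary then lists one average potential outcome per treatment level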
Methods
bootstrap([method, n_rep_boot]) – Multiplier bootstrap for DoubleML models.
causal_contrast(reference_levels) – Average causal contrasts for DoubleMLAPOS models.
confint([joint, level]) – Confidence intervals for DoubleML models.
draw_sample_splitting() – Draw sample splitting for DoubleML models.
fit([n_jobs_models, n_jobs_cv, ...]) – Estimate DoubleMLAPOS models.
sensitivity_analysis([cf_y, cf_d, rho, ...]) – Performs a sensitivity analysis to account for unobserved confounders.
sensitivity_benchmark(benchmarking_set[, ...]) – Computes a benchmark for a given set of features.
sensitivity_plot([idx_treatment, value, ...]) – Contour plot of the sensitivity with respect to latent/confounding variables.
set_sample_splitting(all_smpls[, ...]) – Set the sample splitting for DoubleML models.
tune_ml_models(ml_param_space[, ...]) – Hyperparameter tuning for DoubleML models using Optuna.
Attributes
all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_treatment_levels, n_rep)).
all_se – Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_treatment_levels, n_rep)).
boot_method – The method to construct the bootstrap replications.
boot_t_stat – Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap() (shape (n_rep_boot, n_treatment_levels, n_rep)).
coef – Estimates for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).
framework – The corresponding doubleml.DoubleMLFramework object.
modellist – The list of models for each treatment level.
n_folds – Number of folds.
n_rep – Number of repetitions for the sample splitting.
n_rep_boot – The number of bootstrap replications.
n_treatment_levels – The number of treatment levels.
normalize_ipw – Indicates whether the inverse probability weights are normalized.
ps_processor – Propensity score processor.
ps_processor_config – Configuration for propensity score processing (clipping, calibration, etc.).
pval – p-values for the causal parameter(s) (shape (n_treatment_levels,)).
score – The score function.
se – Standard errors for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).
sensitivity_elements – Values of the sensitivity components after calling fit(); if available (e.g., PLR, IRM), a dictionary with entries sigma2, nu2, psi_sigma2, psi_nu2 and riesz_rep.
sensitivity_params – Values of the sensitivity parameters after calling sensitivity_analysis(); if available (e.g., PLR, IRM), a dictionary with entries theta, se, ci, rv and rva.
sensitivity_summary – Returns a summary for the sensitivity analysis after calling sensitivity_analysis().
smpls – The partition used for cross-fitting.
summary – A summary for the estimated causal effect after calling fit().
t_stat – t-statistics for the causal parameter(s) after calling fit() (shape (n_treatment_levels,)).
treatment_levels – The evaluated treatment levels.
trimming_rule – Specifies the used trimming rule.
trimming_threshold – Specifies the used trimming threshold.
weights – Specifies the weights for a weighted average potential outcome.
- DoubleMLAPOS.bootstrap(method='normal', n_rep_boot=500)#
Multiplier bootstrap for DoubleML models.
- DoubleMLAPOS.causal_contrast(reference_levels)#
Average causal contrasts for DoubleMLAPOS models. Estimates the difference in average potential outcomes between the treatment levels and the reference levels. The reference levels have to be a subset of the treatment levels or a single treatment level.
- Parameters:
reference_levels – The reference levels for the difference in average potential outcomes. Each reference level has to be an element of treatment_levels.
- Returns:
acc – A DoubleMLFramework object for the average causal contrast(s).
- Return type:
DoubleMLFramework
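A hedged sketch of estimating contrasts against a single reference level, continuing the hypothetical dml_apos object from the class-level sketch above (assumes fit() has been called):

>>> acc = dml_apos.causal_contrast(reference_levels=0)
>>> ci = acc.confint(level=0.95)  # confidence intervals for the contrasts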
- DoubleMLAPOS.confint(joint=False, level=0.95)#
Confidence intervals for DoubleML models.
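For simultaneous inference across treatment levels, bootstrap() and confint(joint=True) can be combined; a hedged sketch reusing the hypothetical dml_apos object:

>>> _ = dml_apos.bootstrap(method='normal', n_rep_boot=500)
>>> joint_ci = dml_apos.confint(joint=True, level=0.95)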
- DoubleMLAPOS.draw_sample_splitting()#
Draw sample splitting for DoubleML models.
The samples are drawn according to the attributes n_folds and n_rep.
- Returns:
self
- Return type:
object
- DoubleMLAPOS.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False, external_predictions=None)#
Estimate DoubleMLAPOS models.
- Parameters:
n_jobs_models (None or int) – The number of CPUs to use to fit the treatment_levels.
Nonemeans1. Default isNone.n_jobs_cv (None or int) – The number of CPUs to use to fit the learners.
Nonemeans1. Does not speed up computation for quantile models. Default isNone.store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in
predictions. Default isTrue.store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in
models. This allows to analyze the fitted models or extract information like variable importance. Default isFalse.external_predictions (dict or None) – A nested dictionary where the keys correspond the the treatment levels and can contain predictions according to each treatment level. The values have to be dictionaries which can contain keys
'ml_g0','ml_g1'and'ml_m'. Default is None.
- Returns:
self
- Return type:
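A hedged sketch of the external_predictions layout described above, continuing the hypothetical session from the class-level sketch (the treatment-level keys, placeholder arrays and expected shapes are assumptions; the call itself is left commented out):

>>> nuis = {'ml_g0': np.zeros(n), 'ml_g1': np.zeros(n), 'ml_m': np.full(n, 0.5)}
>>> ext_preds = {0: dict(nuis), 1: dict(nuis), 2: dict(nuis)}  # one inner dict per treatment level
>>> # dml_apos.fit(external_predictions=ext_preds)  # hypothetical call with placeholder predictions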
- DoubleMLAPOS.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0, level=0.95, null_hypothesis=0.0)#
Performs a sensitivity analysis to account for unobserved confounders.
The evaluated scenario is stored as a dictionary in the property sensitivity_params.
- Parameters:
cf_y (float) – Percentage of the residual variation of the outcome explained by latent/confounding variables. Default is 0.03.
cf_d (float) – Percentage gains in the variation of the Riesz representer generated by latent/confounding variables. Default is 0.03.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1, 1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float or numpy.ndarray) – Null hypothesis for the effect. Determines the robustness values. If it is a single float, the same null hypothesis is used for all estimated parameters. Otherwise, the array has to be of shape (n_coefs,). Default is 0.0.
- Returns:
self
- Return type:
object
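A hedged usage sketch with the default scenario values, reusing the hypothetical fitted dml_apos object:

>>> _ = dml_apos.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0)
>>> print(dml_apos.sensitivity_summary)  # doctest: +SKIP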
- DoubleMLAPOS.sensitivity_benchmark(benchmarking_set, fit_args=None)#
Computes a benchmark for a given set of features. Returns a DataFrame containing the corresponding values for cf_y, cf_d, rho and the change in estimates.
- Returns:
benchmark_results – Benchmark results.
- Return type:
pandas.DataFrame
- DoubleMLAPOS.sensitivity_plot(idx_treatment=0, value='theta', rho=1.0, level=0.95, null_hypothesis=0.0, include_scenario=True, benchmarks=None, fill=True, grid_bounds=(0.15, 0.15), grid_size=100)#
Contour plot of the sensitivity with respect to latent/confounding variables.
- Parameters:
idx_treatment (int) – Index of the treatment to perform the sensitivity analysis for. Default is 0.
value (str) – Determines which contours to plot. Valid values are 'theta' (refers to the bounds) and 'ci' (refers to the bounds including statistical uncertainty). Default is 'theta'.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1, 1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float) – Null hypothesis for the effect. Determines the direction of the contour lines. Default is 0.0.
include_scenario (bool) – Indicates whether to highlight the scenario from the call of sensitivity_analysis(). Default is True.
benchmarks (dict or None) – Dictionary of benchmarks to be included in the plot. The keys are cf_y, cf_d and name. Default is None.
fill (bool) – Indicates whether to use a heatmap style or only contour lines. Default is True.
grid_bounds (tuple) – Determines the evaluation bounds of the grid for cf_d and cf_y. Has to contain two floats in [0, 1). Default is (0.15, 0.15).
grid_size (int) – Determines the number of evaluation points of the grid. Default is 100.
- Returns:
fig – Plotly figure of the sensitivity contours.
- Return type:
object
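A hedged sketch of plotting the contours with a user-supplied benchmark, again reusing the hypothetical dml_apos object (the benchmark values and name are illustrative assumptions):

>>> benchmarks = {'cf_y': [0.1], 'cf_d': [0.1], 'name': ['drop x0']}
>>> fig = dml_apos.sensitivity_plot(idx_treatment=0, value='theta', benchmarks=benchmarks)
>>> # fig.show() would render the Plotly figure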
- DoubleMLAPOS.set_sample_splitting(all_smpls, all_smpls_cluster=None)#
Set the sample splitting for DoubleML models.
The attributes n_folds and n_rep are derived from the provided partition.
- Parameters:
all_smpls (list or tuple) –
- If nested list of lists of tuples:
The outer list needs to provide an entry per repeated sample splitting (length of list is set as n_rep). The inner list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition for each inner list.
- If list of tuples:
The list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition. n_rep=1 is always set.
- If tuple:
Must be a tuple with two elements train_ind and test_ind. The only viable option is to set train_ind and test_ind to np.arange(n_obs), which corresponds to no sample splitting. n_folds=1 and n_rep=1 are always set.
all_smpls_cluster (list or None) – Nested list or None. The first level of nesting corresponds to the number of repetitions. The second level of nesting corresponds to the number of folds. The third level of nesting contains a tuple of training and testing lists. Both training and testing contain an array for each cluster variable, which form a partition of the clusters. Default is None.
- Returns:
self
- Return type:
object
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.plm.datasets import make_plr_CCDDHNR2018
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.base import clone
>>> np.random.seed(3141)
>>> learner = RandomForestRegressor(max_depth=2, n_estimators=10)
>>> ml_g = learner
>>> ml_m = learner
>>> obj_dml_data = make_plr_CCDDHNR2018(n_obs=10, alpha=0.5)
>>> dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
>>> # sample splitting with two folds and cross-fitting
>>> smpls = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...          ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
>>> dml_plr_obj.set_sample_splitting(smpls)
<doubleml.plm.plr.DoubleMLPLR object at 0x...>
>>> # sample splitting with two folds and repeated cross-fitting with n_rep = 2
>>> smpls = [[([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...           ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])],
...          [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
...           ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]
>>> dml_plr_obj.set_sample_splitting(smpls)
<doubleml.plm.plr.DoubleMLPLR object at 0x...>
- DoubleMLAPOS.tune_ml_models(ml_param_space, scoring_methods=None, cv=5, set_as_params=True, return_tune_res=False, optuna_settings=None)#
Hyperparameter-tuning for DoubleML models using Optuna.
The hyperparameter-tuning is performed using Optuna’s Bayesian optimization. Unlike grid/randomized search, Optuna tuning is performed once on the whole dataset using cross-validation, and the same optimal hyperparameters are used for all folds.
- Parameters:
ml_param_space (dict) –
A dict with a parameter grid function for each nuisance model (see attribute params_names) or for each learner (see attribute learner_names). Mixed specifications are allowed, i.e., some nuisance models can share the same learner. For mixed specifications, learner-specific settings will be overwritten by nuisance model-specific settings.
Each parameter grid must be specified as a callable function that takes an Optuna trial and returns a dictionary of hyperparameters.
For PLR models, keys should be: 'ml_l', 'ml_m' (and optionally 'ml_g' for IV-type score). For IRM models, keys should be: 'ml_g0', 'ml_g1' (or just 'ml_g' for both), 'ml_m'.
Example:
def ml_l_params(trial):
    return {
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 100, 500, step=50),
        'num_leaves': trial.suggest_int('num_leaves', 20, 256),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
    }

ml_param_space = {'ml_l': ml_l_params, 'ml_m': ml_m_params}
Note: Optuna tuning is performed globally (not fold-specific) to ensure consistent hyperparameters across all folds.
scoring_methods (None or dict) – The scoring method used to evaluate the predictions. The scoring method must be set per nuisance model via a dict (see attribute params_names for the keys). If None, the estimator's score method is used. Default is None.
cv (int, cross-validation splitter, or iterable of (train_indices, test_indices)) – Cross-validation strategy used for Optuna-based tuning. If an integer is provided, a shuffled sklearn.model_selection.KFold with the specified number of splits and random_state=42 is used. Custom splitters must implement split (and ideally get_n_splits), or be an iterable yielding (train_indices, test_indices) pairs. Default is 5.
set_as_params (bool) – Indicates whether the hyperparameters should be set in order to be used when fit() is called. Default is True.
return_tune_res (bool) – Indicates whether detailed tuning results should be returned. Default is False.
optuna_settings (None or dict) –
Optional configuration passed to the Optuna tuner. Supports global settings as well as learner-specific overrides (using the keys from ml_param_space). The dictionary can contain entries corresponding to Optuna's study and optimize configuration, such as:
- n_trials (int): Number of optimization trials (default: 100)
- timeout (float): Time limit in seconds for the study (default: None)
- direction (str): Optimization direction, 'maximize' or 'minimize'. For sklearn scorers, use 'maximize' for negative metrics like 'neg_mean_squared_error' (since -0.1 > -0.2 means better performance). Can be set globally or per learner. (default: 'maximize')
- sampler (optuna.samplers.BaseSampler): Optuna sampler instance (default: None, uses TPE)
- callbacks (list): List of callback functions (default: None)
- show_progress_bar (bool): Show progress bar during optimization (default: False)
- n_jobs_optuna (int): Number of parallel trials (default: None)
- verbosity (int): Optuna logging verbosity level (default: None)
- study (optuna.study.Study): Pre-created study instance (default: None)
- study_kwargs (dict): Additional kwargs for study creation (default: {})
- optimize_kwargs (dict): Additional kwargs for study.optimize() (default: {})
To set direction per learner (similar to scoring_methods):

optuna_settings = {
    'n_trials': 50,
    'direction': 'maximize',  # Global default
    'ml_g0': {'direction': 'maximize'},  # Per-learner override
    'ml_m': {'n_trials': 100, 'direction': 'maximize'}
}

Defaults to None.
- Returns:
self (object) – Returned if return_tune_res is False.
tune_res (list) – A list containing detailed tuning results and the proposed hyperparameters. Returned if return_tune_res is True.
Examples
>>> import numpy as np
>>> from doubleml import DoubleMLData, DoubleMLPLR
>>> from doubleml.plm.datasets import make_plr_CCDDHNR2018
>>> from lightgbm import LGBMRegressor
>>> import optuna
>>> # Generate data
>>> np.random.seed(42)
>>> data = make_plr_CCDDHNR2018(n_obs=500, dim_x=20, return_type='DataFrame')
>>> dml_data = DoubleMLData(data, 'y', 'd')
>>> # Initialize model
>>> dml_plr = DoubleMLPLR(
...     dml_data,
...     LGBMRegressor(n_estimators=50, verbose=-1, random_state=42),
...     LGBMRegressor(n_estimators=50, verbose=-1, random_state=42)
... )
>>> # Define parameter grid functions
>>> def ml_l_params(trial):
...     return {
...         'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
...     }
>>> def ml_m_params(trial):
...     return {
...         'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
...     }
>>> ml_param_space = {'ml_l': ml_l_params, 'ml_m': ml_m_params}
>>> # Tune with TPE sampler
>>> optuna_settings = {
...     'n_trials': 5,
...     'sampler': optuna.samplers.TPESampler(seed=42),
... }
>>> tune_res = dml_plr.tune_ml_models(ml_param_space, optuna_settings=optuna_settings, return_tune_res=True)
>>> print(tune_res[0]['ml_l'].best_params)
{'learning_rate': 0.03907122389107094}
>>> # Fit and get results
>>> dml_plr.fit().summary
       coef   std err          t         P>|t|     2.5 %    97.5 %
d   0.57436  0.045206  12.705519  5.510257e-37  0.485759  0.662961
>>> # Example with scoring methods and directions
>>> scoring_methods = {
...     'ml_l': 'neg_mean_squared_error',  # Negative metric
...     'ml_m': 'neg_mean_squared_error'
... }
>>> optuna_settings = {
...     'n_trials': 50,
...     'direction': 'maximize',  # Maximize negative MSE (minimize MSE)
...     'sampler': optuna.samplers.TPESampler(seed=42),
... }
>>> tune_res = dml_plr.tune_ml_models(ml_param_space, scoring_methods=scoring_methods,
...                                   optuna_settings=optuna_settings, return_tune_res=True)
>>> print(tune_res[0]['ml_l'].best_params)
{'learning_rate': 0.04300012336462904}
>>> dml_plr.fit().summary
        coef   std err          t         P>|t|     2.5 %    97.5 %
d   0.574796  0.045062  12.755721  2.896820e-37  0.486476  0.663115