doubleml.DoubleMLIRM
- class doubleml.DoubleMLIRM(obj_dml_data, ml_g, ml_m, n_folds=5, n_rep=1, score='ATE', weights=None, normalize_ipw=False, trimming_rule='truncate', trimming_threshold=0.01, draw_sample_splitting=True)
Double machine learning for interactive regression models
- Parameters:
obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.
ml_g (estimator implementing fit() and predict()) – A machine learner implementing fit() and predict() methods (e.g. sklearn.ensemble.RandomForestRegressor) for the nuisance function \(g_0(D,X) = E[Y|X,D]\). For a binary outcome variable \(Y\) (with values 0 and 1), a classifier implementing fit() and predict_proba() can also be specified. If sklearn.base.is_classifier() returns True, predict_proba() is used; otherwise predict() is used.
ml_m (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance function \(m_0(X) = E[D|X]\).
n_folds (int) – Number of folds. Default is 5.
n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
score (str or callable) – A str ('ATE' or 'ATTE') specifying the score function, or a callable object / function with signature psi_a, psi_b = score(y, d, g_hat0, g_hat1, m_hat, smpls). Default is 'ATE'.
weights (array, dict or None) – A numpy array of weights for each individual observation. If None, then the 'ATE' score is applied (corresponds to weights equal to 1). Can only be used with score = 'ATE'. An array has to be of shape (n,), where n is the number of observations. A dictionary can be used to specify weights which depend on the treatment variable. In this case, the dictionary has to contain two keys, weights and weights_bar, where the values have to be arrays of shape (n,) and (n, n_rep), respectively. Default is None.
normalize_ipw (bool) – Indicates whether the inverse probability weights are normalized. Default is False.
trimming_rule (str) – A str ('truncate' is the only choice) specifying the trimming approach. Default is 'truncate'.
trimming_threshold (float) – The threshold used for trimming. Default is 1e-2.
draw_sample_splitting (bool) – Indicates whether the sample splitting should be drawn during initialization of the object. Default is True.
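As a rough illustration of what trimming_rule='truncate' and trimming_threshold control, the sketch below clips estimated propensity scores to the interval [threshold, 1 - threshold] before they enter the inverse probability weights. The helper name truncate_propensities is hypothetical, and the clipping rule is an assumption about the behaviour described above, not code taken from the library.

```python
def truncate_propensities(m_hat, threshold=0.01):
    """Clip propensity scores to [threshold, 1 - threshold].

    Hypothetical helper: assumes 'truncate' means clipping at both ends.
    """
    if not 0.0 < threshold < 0.5:
        raise ValueError("threshold must lie in (0, 0.5)")
    lower, upper = threshold, 1.0 - threshold
    return [min(max(p, lower), upper) for p in m_hat]


m_hat = [0.001, 0.2, 0.5, 0.999]
trimmed = truncate_propensities(m_hat, threshold=0.01)
# Inverse probability weights for treated units stay bounded by 1 / threshold.
ipw_treated = [1.0 / p for p in trimmed]
print(trimmed)
```

With a threshold of 0.01, no treated-unit weight can exceed 100, which is the practical reason for trimming extreme propensity estimates.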
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_irm_data
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(3141)
>>> ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
>>> ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
>>> data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
>>> obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
>>> dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
>>> dml_irm_obj.fit().summary
       coef   std err         t     P>|t|     2.5 %    97.5 %
d  0.414073  0.238529  1.735941  0.082574 -0.053436  0.881581
Notes
Interactive regression (IRM) models take the form
\[ \begin{align}\begin{aligned}Y = g_0(D, X) + U, & &\mathbb{E}(U | X, D) = 0,\\D = m_0(X) + V, & &\mathbb{E}(V | X) = 0,\end{aligned}\end{align} \]
where the treatment variable is binary, \(D \in \lbrace 0,1 \rbrace\). We consider estimation of the average treatment effects when treatment effects are fully heterogeneous. Target parameters of interest in this model are the average treatment effect (ATE),
\[\theta_0 = \mathbb{E}[g_0(1, X) - g_0(0,X)]\]
and the average treatment effect of the treated (ATTE),
\[\theta_0 = \mathbb{E}[g_0(1, X) - g_0(0,X) | D=1].\]
Methods
bootstrap([method, n_rep_boot]) – Multiplier bootstrap for DoubleML models.
cate(basis[, is_gate]) – Calculate conditional average treatment effects (CATE) for a given basis.
confint([joint, level]) – Confidence intervals for DoubleML models.
construct_framework() – Construct a doubleml.DoubleMLFramework object.
draw_sample_splitting() – Draw sample splitting for DoubleML models.
evaluate_learners([learners, metric]) – Evaluate fitted learners for DoubleML models on cross-validated predictions.
fit([n_jobs_cv, store_predictions, ...]) – Estimate DoubleML models.
gate(groups, **kwargs) – Calculate group average treatment effects (GATE) for groups.
get_params(learner) – Get hyperparameters for the nuisance model of DoubleML models.
p_adjust([method]) – Multiple testing adjustment for DoubleML models.
policy_tree(features[, depth]) – Estimate a decision tree for optimal treatment policy by weighted classification.
sensitivity_analysis([cf_y, cf_d, rho, ...]) – Performs a sensitivity analysis to account for unobserved confounders.
sensitivity_benchmark(benchmarking_set[, ...]) – Computes a benchmark for a given set of features.
sensitivity_plot([idx_treatment, value, ...]) – Contour plot of the sensitivity with respect to latent/confounding variables.
set_ml_nuisance_params(learner, treat_var, ...) – Set hyperparameters for the nuisance models of DoubleML models.
set_sample_splitting(all_smpls[, ...]) – Set the sample splitting for DoubleML models.
tune(param_grids[, tune_on_folds, ...]) – Hyperparameter-tuning for DoubleML models.
Attributes
all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit().
all_se – Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit().
boot_method – The method to construct the bootstrap replications.
boot_t_stat – Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap().
coef – Estimates for the causal parameter(s) after calling fit().
framework – The corresponding doubleml.DoubleMLFramework object.
learner – The machine learners for the nuisance functions.
learner_names – The names of the learners.
models – The fitted nuisance models.
n_folds – Number of folds.
n_rep – Number of repetitions for the sample splitting.
n_rep_boot – The number of bootstrap replications.
normalize_ipw – Indicates whether the inverse probability weights are normalized.
nuisance_loss – The losses of the nuisance models (root-mean-squared-errors or logloss).
nuisance_targets – The outcome of the nuisance models.
params – The hyperparameters of the learners.
params_names – The names of the nuisance models with hyperparameters.
predictions – The predictions of the nuisance models in form of a dictionary.
psi – Values of the score function after calling fit(); for models (e.g., PLR, IRM, PLIV, IIVM) with linear score (in the parameter), \(\psi(W; \theta, \eta) = \psi_a(W; \eta) \theta + \psi_b(W; \eta)\).
psi_deriv – Values of the derivative of the score function with respect to the parameter \(\theta\) after calling fit(); for models (e.g., PLR, IRM, PLIV, IIVM) with linear score (in the parameter), \(\psi_a(W; \eta)\).
psi_elements – Values of the score function components after calling fit(); for models (e.g., PLR, IRM, PLIV, IIVM) with linear score (in the parameter), a dictionary with entries psi_a and psi_b for \(\psi_a(W; \eta)\) and \(\psi_b(W; \eta)\).
pval – p-values for the causal parameter(s) after calling fit().
score – The score function.
se – Standard errors for the causal parameter(s) after calling fit().
sensitivity_elements – Values of the sensitivity components after calling fit(); if available (e.g., PLR, IRM), a dictionary with entries sigma2, nu2, psi_sigma2, psi_nu2 and riesz_rep.
sensitivity_params – Values of the sensitivity parameters after calling sensitivity_analysis(); if available (e.g., PLR, IRM), a dictionary with entries theta, se, ci, rv and rva.
sensitivity_summary – Returns a summary for the sensitivity analysis after calling sensitivity_analysis().
smpls – The partition used for cross-fitting.
smpls_cluster – The partition of clusters used for cross-fitting.
summary – A summary for the estimated causal effect after calling fit().
t_stat – t-statistics for the causal parameter(s) after calling fit().
trimming_rule – Specifies the used trimming rule.
trimming_threshold – Specifies the used trimming threshold.
weights – Specifies the weights for a weighted ATE.
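The linear score \(\psi = \psi_a \theta + \psi_b\) behind the psi, psi_deriv and psi_elements attributes can be made concrete with a small sketch. It uses the textbook doubly robust (augmented inverse probability weighting) form of the ATE score and mirrors the callable signature documented for the score parameter. That the library computes exactly these expressions is an assumption, and plain Python lists stand in for the NumPy arrays a real callable would presumably receive; the names ate_score and solve_linear_score are hypothetical.

```python
def ate_score(y, d, g_hat0, g_hat1, m_hat, smpls=None):
    """Doubly robust ATE score components (illustrative, list-based).

    psi_a = -1 and psi_b is the augmented inverse-probability-weighted term.
    `smpls` is accepted only to mirror the documented signature; unused here.
    """
    psi_a = [-1.0 for _ in y]
    psi_b = [
        g1 - g0 + di * (yi - g1) / mi - (1 - di) * (yi - g0) / (1 - mi)
        for yi, di, g0, g1, mi in zip(y, d, g_hat0, g_hat1, m_hat)
    ]
    return psi_a, psi_b


def solve_linear_score(psi_a, psi_b):
    """Solve mean(psi_a) * theta + mean(psi_b) = 0 for theta."""
    mean_a = sum(psi_a) / len(psi_a)
    mean_b = sum(psi_b) / len(psi_b)
    return -mean_b / mean_a


# Toy data in which the outcome regressions fit perfectly.
g_hat0 = [1.0, 2.0, 3.0, 4.0]
g_hat1 = [2.0, 3.0, 4.0, 5.0]
d = [0, 1, 0, 1]
y = [1.0, 3.0, 3.0, 5.0]
m_hat = [0.5, 0.5, 0.5, 0.5]

psi_a, psi_b = ate_score(y, d, g_hat0, g_hat1, m_hat)
theta = solve_linear_score(psi_a, psi_b)
# Score evaluated at the estimate; its sample mean is zero by construction.
psi = [a * theta + b for a, b in zip(psi_a, psi_b)]
print(theta)
```

Because the toy outcome regressions are exact, the residual terms vanish and the estimate reduces to the average of g_hat1 - g_hat0.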
- DoubleMLIRM.bootstrap(method='normal', n_rep_boot=500)
Multiplier bootstrap for DoubleML models.
- DoubleMLIRM.cate(basis, is_gate=False, **kwargs)
Calculate conditional average treatment effects (CATE) for a given basis.
- Parameters:
basis (pandas.DataFrame) – The basis for estimating the best linear predictor. Has to have the shape (n_obs, d), where n_obs is the number of observations and d is the number of predictors.
is_gate (bool) – Indicates whether the basis is constructed for GATEs (dummy-basis). Default is False.
**kwargs (dict) – Additional keyword arguments to be passed to statsmodels.regression.linear_model.OLS.fit(), e.g. cov_type.
- Returns:
model – Best linear predictor model.
- Return type:
doubleml.DoubleMLBLP
- DoubleMLIRM.confint(joint=False, level=0.95)
Confidence intervals for DoubleML models.
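For intuition, a pointwise interval (joint=False) can be approximated by hand from a coefficient and its standard error. The sketch below assumes a plain normal approximation, coef ± z · se, which is an assumption about how the pointwise case works rather than the library's code; joint intervals rely on the bootstrap and are not covered. The function names are hypothetical, and the inputs are the coef and std err values from the summary row in the class-level example.

```python
from statistics import NormalDist


def normal_confint(coef, se, level=0.95):
    """Two-sided interval coef +/- z * se under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return coef - z * se, coef + z * se


def two_sided_pvalue(coef, se):
    """p-value of the t-statistic coef / se against a standard normal."""
    t_stat = coef / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# Values taken from the summary row of the class-level example.
coef, se = 0.414073, 0.238529
lower, upper = normal_confint(coef, se, level=0.95)
p_value = two_sided_pvalue(coef, se)
print(round(lower, 4), round(upper, 4), round(p_value, 4))
```

Up to rounding of the inputs, these numbers agree with the 2.5 %, 97.5 % and P>|t| columns of that summary.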
- DoubleMLIRM.construct_framework()
Construct a doubleml.DoubleMLFramework object. Can be used to construct e.g. confidence intervals.
- Returns:
doubleml_framework
- Return type:
doubleml.DoubleMLFramework
- DoubleMLIRM.draw_sample_splitting()
Draw sample splitting for DoubleML models.
The samples are drawn according to the attributes n_folds and n_rep.
- Returns:
self
- Return type:
object
- DoubleMLIRM.evaluate_learners(learners=None, metric=<function _rmse>)
Evaluate fitted learners for DoubleML models on cross-validated predictions.
- Parameters:
learners (list) – A list of strings which correspond to the nuisance functions of the model.
metric (callable) – A callable function with inputs y_pred and y_true of shape (1, n), where n specifies the number of observations. Note that some models, such as IRM, cannot provide y_true values for all learners, so the target vector may contain some nan values. Default is the root-mean-square error.
- Returns:
dist – A dictionary containing the evaluated metric for each learner.
- Return type:
dict
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from sklearn.metrics import mean_absolute_error
>>> from doubleml.datasets import make_irm_data
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(3141)
>>> ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
>>> ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
>>> data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
>>> obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
>>> dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
>>> dml_irm_obj.fit()
>>> def mae(y_true, y_pred):
...     subset = np.logical_not(np.isnan(y_true))
...     return mean_absolute_error(y_true[subset], y_pred[subset])
>>> dml_irm_obj.evaluate_learners(metric=mae)
{'ml_g0': array([[0.85974356]]), 'ml_g1': array([[0.85280376]]), 'ml_m': array([[0.35365143]])}
- DoubleMLIRM.fit(n_jobs_cv=None, store_predictions=True, external_predictions=None, store_models=False)
Estimate DoubleML models.
- Parameters:
n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Default is None.
store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.
store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows analyzing the fitted models or extracting information like variable importance. Default is False.
external_predictions (None or dict) – If None, all models for the learners are fitted and evaluated. If a dictionary containing predictions for a specific learner is supplied, the model will use the supplied nuisance predictions instead. Has to be a nested dictionary where the keys refer to the treatment and the keys of the nested dictionaries refer to the corresponding learners. Default is None.
- Returns:
self
- Return type:
object
- DoubleMLIRM.gate(groups, **kwargs)
Calculate group average treatment effects (GATE) for groups.
- Parameters:
groups (pandas.DataFrame) – The group indicator for estimating the best linear predictor. Groups should be mutually exclusive. Has to be either dummy coded with shape (n_obs, d), where n_obs is the number of observations and d is the number of groups, or of shape (n_obs, 1) and contain the corresponding groups (as str).
**kwargs (dict) – Additional keyword arguments to be passed to statsmodels.regression.linear_model.OLS.fit(), e.g. cov_type.
- Returns:
model – Best linear predictor model for group effects.
- Return type:
doubleml.DoubleMLBLP
- DoubleMLIRM.get_params(learner)
Get hyperparameters for the nuisance model of DoubleML models.
- DoubleMLIRM.p_adjust(method='romano-wolf')
Multiple testing adjustment for DoubleML models.
- Parameters:
method (str) – A str ('romano-wolf', 'bonferroni', 'holm', etc.) specifying the adjustment method. In addition to 'romano-wolf', all methods implemented in statsmodels.stats.multitest.multipletests() can be applied. Default is 'romano-wolf'.
- Returns:
p_val – A data frame with adjusted p-values.
- Return type:
pd.DataFrame
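To show what two of the named adjustment methods do, here is a minimal sketch of the textbook Bonferroni and Holm corrections in plain Python. It illustrates the definitions only: the functions below are hypothetical stand-ins rather than the statsmodels routine mentioned above, and the Romano-Wolf procedure is left out because it requires bootstrap draws.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


p_values = [0.01, 0.04, 0.03]
p_bonf = bonferroni(p_values)
p_holm = holm(p_values)
print(p_bonf, p_holm)
```

Holm is never more conservative than Bonferroni: each Holm-adjusted value is at most the corresponding Bonferroni-adjusted value.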
- DoubleMLIRM.policy_tree(features, depth=2, **tree_params)
Estimate a decision tree for optimal treatment policy by weighted classification.
- Parameters:
depth (int) – The depth of the estimated decision tree. Has to be larger than 0. Deeper trees derive a more complex decision policy. Default is 2.
features (pandas.DataFrame) – The covariates on which the policy tree is learned. Has to be of shape (n_obs, d), where n_obs is the number of observations and d is the number of covariates to be included.
**tree_params (dict) – Parameters that are forwarded to the sklearn.tree.DecisionTreeClassifier. Note that by default we perform minimal pruning by setting ccp_alpha = 0.01 and min_samples_leaf = 8. This can be adjusted.
- Returns:
model – Policy tree model.
- Return type:
doubleml.DoubleMLPolicyTree
- DoubleMLIRM.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0, level=0.95, null_hypothesis=0.0)
Performs a sensitivity analysis to account for unobserved confounders.
The evaluated scenario is stored as a dictionary in the property sensitivity_params.
- Parameters:
cf_y (float) – Percentage of the residual variation of the outcome explained by latent/confounding variables. Default is 0.03.
cf_d (float) – Percentage gains in the variation of the Riesz representer generated by latent/confounding variables. Default is 0.03.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximizes at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float or numpy.ndarray) – Null hypothesis for the effect. Determines the robustness values. If it is a single float, the same null hypothesis is used for all estimated parameters. Otherwise the array has to be of shape (n_coefs,). Default is 0.0.
- Returns:
self
- Return type:
object
- DoubleMLIRM.sensitivity_benchmark(benchmarking_set, fit_args=None)
Computes a benchmark for a given set of features. Returns a DataFrame containing the corresponding values for cf_y, cf_d, rho and the change in estimates.
- Returns:
benchmark_results – Benchmark results.
- Return type:
pandas.DataFrame
- DoubleMLIRM.sensitivity_plot(idx_treatment=0, value='theta', rho=1.0, level=0.95, null_hypothesis=0.0, include_scenario=True, benchmarks=None, fill=True, grid_bounds=(0.15, 0.15), grid_size=100)
Contour plot of the sensitivity with respect to latent/confounding variables.
- Parameters:
idx_treatment (int) – Index of the treatment to perform the sensitivity analysis. Default is 0.
value (str) – Determines which contours to plot. Valid values are 'theta' (refers to the bounds) and 'ci' (refers to the bounds including statistical uncertainty). Default is 'theta'.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximizes at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float) – Null hypothesis for the effect. Determines the direction of the contour lines.
include_scenario (bool) – Indicates whether to highlight the scenario from the call of sensitivity_analysis(). Default is True.
benchmarks (dict or None) – Dictionary of benchmarks to be included in the plot. The keys are cf_y, cf_d and name. Default is None.
fill (bool) – Indicates whether to use a heatmap style or only contour lines. Default is True.
grid_bounds (tuple) – Determines the evaluation bounds of the grid for cf_d and cf_y. Has to contain two floats in [0, 1). Default is (0.15, 0.15).
grid_size (int) – Determines the number of evaluation points of the grid. Default is 100.
- Returns:
fig – Plotly figure of the sensitivity contours.
- Return type:
- DoubleMLIRM.set_ml_nuisance_params(learner, treat_var, params)
Set hyperparameters for the nuisance models of DoubleML models.
- Parameters:
learner (str) – The nuisance model / learner (see attribute params_names).
treat_var (str) – The treatment variable (hyperparameters can be set treatment-variable specific).
params (dict or list) – A dict with estimator parameters (used for all folds) or a nested list with fold specific parameters. The outer list needs to be of length n_rep and the inner list of length n_folds.
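The two accepted shapes of params can be sketched as follows. The values of n_rep and n_folds, the hyperparameter names, and the helper has_expected_shape are all hypothetical and serve only to make the nesting described above concrete.

```python
n_rep, n_folds = 2, 3  # hypothetical sample-splitting configuration

# Option 1: one dict that is used for all folds.
shared_params = {"n_estimators": 100, "max_depth": 5}

# Option 2: a nested list with outer length n_rep and inner length n_folds.
fold_params = [
    [{"n_estimators": 100, "max_depth": 3 + fold} for fold in range(n_folds)]
    for _ in range(n_rep)
]


def has_expected_shape(params, n_rep, n_folds):
    """Check the documented nesting: n_rep outer entries, n_folds inner dicts."""
    return (
        isinstance(params, list)
        and len(params) == n_rep
        and all(isinstance(inner, list) and len(inner) == n_folds for inner in params)
    )


print(has_expected_shape(fold_params, n_rep, n_folds))
```

Assuming a learner name such as 'ml_g0' and the treatment variable 'd' used in the examples on this page, either object would then be passed as the params argument.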
- Returns:
self
- Return type:
object
- DoubleMLIRM.set_sample_splitting(all_smpls, all_smpls_cluster=None)
Set the sample splitting for DoubleML models.
The attributes n_folds and n_rep are derived from the provided partition.
- Parameters:
all_smpls (list or tuple) –
- If nested list of lists of tuples:
The outer list needs to provide an entry per repeated sample splitting (length of list is set as n_rep). The inner list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition for each inner list.
- If list of tuples:
The list needs to provide a tuple (train_ind, test_ind) per fold (length of list is set as n_folds). test_ind must form a partition. n_rep=1 is always set.
- If tuple:
Must be a tuple with two elements train_ind and test_ind. The only viable option is to set train_ind and test_ind to np.arange(n_obs), which corresponds to no sample splitting. n_folds=1 and n_rep=1 are always set.
all_smpls_cluster (list or None) – Nested list or None. The first level of nesting corresponds to the number of repetitions. The second level of nesting corresponds to the number of folds. The third level of nesting contains a tuple of training and testing lists. Both training and testing contain an array for each cluster variable, which form a partition of the clusters. Default is None.
- Returns:
self
- Return type:
object
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.datasets import make_plr_CCDDHNR2018
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.base import clone
>>> np.random.seed(3141)
>>> learner = RandomForestRegressor(max_depth=2, n_estimators=10)
>>> ml_g = learner
>>> ml_m = learner
>>> obj_dml_data = make_plr_CCDDHNR2018(n_obs=10, alpha=0.5)
>>> dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
>>> # simple sample splitting with two folds and without cross-fitting
>>> smpls = ([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])
>>> dml_plr_obj.set_sample_splitting(smpls)
>>> # sample splitting with two folds and cross-fitting
>>> smpls = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...          ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
>>> dml_plr_obj.set_sample_splitting(smpls)
>>> # sample splitting with two folds and repeated cross-fitting with n_rep = 2
>>> smpls = [[([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
...           ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])],
...          [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
...           ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]
>>> dml_plr_obj.set_sample_splitting(smpls)
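The requirement that test_ind must form a partition can be checked before a hand-built split is passed in. The sketch below is a stand-alone illustration with a hypothetical helper, is_test_partition. It reuses the index lists from the example above and is not the validation the library itself performs.

```python
def is_test_partition(smpls, n_obs):
    """Return True if the test indices of all folds cover 0..n_obs-1 exactly once."""
    seen = []
    for _train_ind, test_ind in smpls:
        seen.extend(test_ind)
    return sorted(seen) == list(range(n_obs))


# Cross-fitting with two folds, as in the example above.
smpls = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
         ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]

# Repeated cross-fitting: every repetition must be a partition on its own.
all_smpls = [smpls,
             [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
              ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]

# Overlapping test folds (index 4 appears twice) do not form a partition.
bad_smpls = [([0, 1, 2, 3, 4], [4, 5, 6, 7, 8, 9]),
             ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]

print(is_test_partition(smpls, 10))
print(all(is_test_partition(rep, 10) for rep in all_smpls))
print(is_test_partition(bad_smpls, 10))
```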
- DoubleMLIRM.tune(param_grids, tune_on_folds=False, scoring_methods=None, n_folds_tune=5, search_mode='grid_search', n_iter_randomized_search=100, n_jobs_cv=None, set_as_params=True, return_tune_res=False)
Hyperparameter-tuning for DoubleML models.
The hyperparameter-tuning is performed using either an exhaustive search over specified parameter values implemented in sklearn.model_selection.GridSearchCV or via a randomized search implemented in sklearn.model_selection.RandomizedSearchCV.
- Parameters:
param_grids (dict) – A dict with a parameter grid for each nuisance model / learner (see attribute learner_names).
tune_on_folds (bool) – Indicates whether the tuning should be done fold-specific or globally. Default is False.
scoring_methods (None or dict) – The scoring method used to evaluate the predictions. The scoring method must be set per nuisance model via a dict (see attribute learner_names for the keys). If None, the estimator’s score method is used. Default is None.
n_folds_tune (int) – Number of folds used for tuning. Default is 5.
search_mode (str) – A str ('grid_search' or 'randomized_search') specifying whether hyperparameters are optimized via sklearn.model_selection.GridSearchCV or sklearn.model_selection.RandomizedSearchCV. Default is 'grid_search'.
n_iter_randomized_search (int) – The number of parameter settings that are sampled. Only used if search_mode == 'randomized_search'. Default is 100.
n_jobs_cv (None or int) – The number of CPUs to use to tune the learners. None means 1. Default is None.
set_as_params (bool) – Indicates whether the hyperparameters should be set in order to be used when fit() is called. Default is True.
return_tune_res (bool) – Indicates whether detailed tuning results should be returned. Default is False.
- Returns:
self (object) – Returned if return_tune_res is False.
tune_res (list) – A list containing detailed tuning results and the proposed hyperparameters. Returned if return_tune_res is True.
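To give a feel for the size of a param_grids argument, the sketch below builds one grid per learner and enumerates the combinations an exhaustive grid search would evaluate. The keys 'ml_g' and 'ml_m' are assumed to match learner_names (they mirror the constructor arguments), the hyperparameter values are made up, and expand_grid is a hypothetical helper, not part of the library or of scikit-learn.

```python
from itertools import product

# Hypothetical grids; the keys 'ml_g' and 'ml_m' are assumed to match learner_names.
param_grids = {
    "ml_g": {"max_depth": [2, 5, 10], "min_samples_leaf": [1, 2]},
    "ml_m": {"max_depth": [2, 5], "min_samples_leaf": [2]},
}


def expand_grid(grid):
    """List every parameter combination an exhaustive grid search would try."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


candidates = {learner: expand_grid(grid) for learner, grid in param_grids.items()}
for learner, combos in candidates.items():
    print(learner, len(combos))
```

Each candidate is refit once per tuning fold, so the cost of 'grid_search' grows with the product of the grid sizes and n_folds_tune. For large grids, 'randomized_search' caps the number of sampled settings at n_iter_randomized_search.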