2.2.5. doubleml.irm.DoubleMLPQ
- class doubleml.irm.DoubleMLPQ(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5, n_folds=5, n_rep=1, score='PQ', normalize_ipw=True, kde=None, trimming_rule='truncate', trimming_threshold=0.01, draw_sample_splitting=True)
- Double machine learning for potential quantiles.
- Parameters:
- obj_dml_data (DoubleMLData object) – The DoubleMLData object providing the data and specifying the variables for the causal model.
- ml_g (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance function \(g_0(X) = E[\mathbb{1}\{Y \le \theta\} | X, D=d]\).
- ml_m (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance function \(m_0(X) = P(D=d | X)\).
- treatment (int) – Binary treatment indicator. Has to be either 0 or 1. Determines the potential outcome to be considered. Default is 1.
- quantile (float) – Quantile of the potential outcome. Has to be between 0 and 1. Default is 0.5.
- n_folds (int) – Number of folds. Default is 5.
- n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
- score (str) – A str ('PQ' is the only choice) specifying the score function for potential quantiles. Default is 'PQ'.
- normalize_ipw (bool) – Indicates whether the inverse probability weights are normalized. Default is True.
- kde (callable or None) – A callable object / function with signature deriv = kde(u, weights) for weighted kernel density estimation. Here deriv should evaluate the density at 0. Default is None, which uses statsmodels.nonparametric.kde.KDEUnivariate with a Gaussian kernel and Silverman's rule for bandwidth selection.
- trimming_rule (str) – A str ('truncate' is the only choice) specifying the trimming approach. Default is 'truncate'.
- trimming_threshold (float) – The threshold used for trimming. Default is 1e-2.
- draw_sample_splitting (bool) – Indicates whether the sample splitting should be drawn during initialization of the object. Default is True.
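The kde parameter expects a callable deriv = kde(u, weights) that returns a weighted density estimate at 0. As a minimal sketch of such a callable, the numpy function below uses a Gaussian kernel and Silverman's rule of thumb, which is close in spirit to the described default but is not the statsmodels code; the name gaussian_kde_at_zero is hypothetical.

```python
import numpy as np


def gaussian_kde_at_zero(u, weights):
    """Hypothetical kde(u, weights) callable: weighted Gaussian KDE evaluated at 0."""
    u = np.asarray(u, dtype=float).ravel()
    w = np.asarray(weights, dtype=float).ravel()
    w = w / w.sum()  # normalize weights so they sum to one
    n = u.size
    # Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.349) * n^(-1/5)
    q75, q25 = np.percentile(u, [75, 25])
    spread = min(np.std(u, ddof=1), (q75 - q25) / 1.349)
    h = 0.9 * spread * n ** (-0.2)
    # Gaussian kernel centred at each u_i, evaluated at the point 0
    z = (0.0 - u) / h
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum(w * kernel) / h)


# Usage: for standard normal draws the estimate should be near 1 / sqrt(2 * pi), about 0.399.
rng = np.random.default_rng(0)
u = rng.standard_normal(5000)
density_at_zero = gaussian_kde_at_zero(u, np.ones_like(u))
```

Because the weights are normalized inside the function, rescaling all weights by a constant leaves the estimate unchanged.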
 
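For intuition about the target, the sketch below computes a plain inverse-probability-weighted (IPW) potential quantile with known propensity scores, mirroring the roles of treatment, quantile, normalize_ipw and the 'truncate' rule with trimming_threshold. It is not the DoubleMLPQ implementation: the class additionally estimates the nuisance functions g_0 and m_0 with cross-fitting, so its estimates will differ. The helper names truncate_propensity and ipw_potential_quantile and the simulated design are assumptions for illustration.

```python
import numpy as np


def truncate_propensity(m_hat, threshold=0.01):
    # 'truncate' rule: clip propensity scores into [threshold, 1 - threshold]
    return np.clip(m_hat, threshold, 1.0 - threshold)


def ipw_potential_quantile(y, d, m_hat, quantile=0.5, treatment=1,
                           normalize_ipw=True, trimming_threshold=0.01):
    """Hypothetical helper: IPW estimate of the `quantile` of the potential outcome Y(treatment)."""
    m = truncate_propensity(np.asarray(m_hat, dtype=float), trimming_threshold)
    p = m if treatment == 1 else 1.0 - m      # P(D = treatment | X)
    w = (np.asarray(d) == treatment) / p      # inverse probability weights
    if normalize_ipw:
        w = w / w.mean()                      # weights now average to exactly one
    # weighted empirical CDF of Y(treatment): mean(w * 1{Y <= t}) at each sorted y
    order = np.argsort(y)
    cdf = np.cumsum(w[order]) / len(y)
    # smallest observed t whose weighted CDF reaches the target quantile
    idx = min(np.searchsorted(cdf, quantile), len(y) - 1)
    return float(np.asarray(y)[order][idx])


# Simulated data with known propensity m(X) = 1 / (1 + exp(-X)) and no treatment effect:
# Y(0) = Y(1) = X + noise, so the true median of each potential outcome is 0.
rng = np.random.default_rng(42)
n = 20_000
x = rng.standard_normal(n)
m_true = 1.0 / (1.0 + np.exp(-x))
d = rng.binomial(1, m_true)
y = x + rng.standard_normal(n)
naive = float(np.median(y[d == 1]))           # biased upward: treated units have larger X
ipw = ipw_potential_quantile(y, d, m_true, quantile=0.5, treatment=1)
```

The naive median among treated units is pulled upward by confounding through X, while the weighted estimate should land near the true value of 0.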
- Examples
    >>> import numpy as np
    >>> import doubleml as dml
    >>> from doubleml.datasets import make_irm_data
    >>> from sklearn.ensemble import RandomForestClassifier
    >>> np.random.seed(3141)
    >>> ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
    >>> ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
    >>> data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
    >>> obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
    >>> dml_pq_obj = dml.DoubleMLPQ(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
    >>> dml_pq_obj.fit().summary
           coef   std err         t     P>|t|     2.5 %    97.5 %
    d  0.553878  0.149858  3.696011  0.000219  0.260161  0.847595
- Methods
- bootstrap([method, n_rep_boot]) – Multiplier bootstrap for DoubleML models.
- confint([joint, level]) – Confidence intervals for DoubleML models.
- construct_framework() – Construct a doubleml.DoubleMLFramework object.
- draw_sample_splitting() – Draw sample splitting for DoubleML models.
- evaluate_learners([learners, metric]) – Evaluate fitted learners for DoubleML models on cross-validated predictions.
- fit([n_jobs_cv, store_predictions, ...]) – Estimate DoubleML models.
- get_params(learner) – Get hyperparameters for the nuisance model of DoubleML models.
- p_adjust([method]) – Multiple testing adjustment for DoubleML models.
- sensitivity_analysis([cf_y, cf_d, rho, ...]) – Performs a sensitivity analysis to account for unobserved confounders.
- sensitivity_benchmark(benchmarking_set[, ...]) – Computes a benchmark for a given set of features.
- sensitivity_plot([idx_treatment, value, ...]) – Contour plot of the sensitivity with respect to latent/confounding variables.
- set_ml_nuisance_params(learner, treat_var, ...) – Set hyperparameters for the nuisance models of DoubleML models.
- set_sample_splitting(all_smpls[, ...]) – Set the sample splitting for DoubleML models.
- tune(param_grids[, tune_on_folds, ...]) – Hyperparameter-tuning for DoubleML models.
- Attributes
- all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit().
- all_se – Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit().
- boot_method – The method to construct the bootstrap replications.
- boot_t_stat – Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap().
- coef – Estimates for the causal parameter(s) after calling fit().
- framework – The corresponding doubleml.DoubleMLFramework object.
- kde – The kernel density estimation of the derivative.
- learner – The machine learners for the nuisance functions.
- learner_names – The names of the learners.
- models – The fitted nuisance models.
- n_folds – Number of folds.
- n_obs – The number of observations used for estimation.
- n_rep – Number of repetitions for the sample splitting.
- n_rep_boot – The number of bootstrap replications.
- normalize_ipw – Indicates whether the inverse probability weights are normalized.
- nuisance_loss – The losses of the nuisance models (root-mean-squared-errors or logloss).
- nuisance_targets – The outcome of the nuisance models.
- params – The hyperparameters of the learners.
- params_names – The names of the nuisance models with hyperparameters.
- predictions – The predictions of the nuisance models in the form of a dictionary.
- psi – Values of the score function after calling fit(); for models (e.g., PLR, IRM, PLIV, IIVM) with linear score (in the parameter) \(\psi(W; \theta, \eta) = \psi_a(W; \eta) \theta + \psi_b(W; \eta)\).
- psi_deriv – Values of the derivative of the score function with respect to the parameter \(\theta\) after calling fit(); for models (e.g., PLR, IRM, PLIV, IIVM) with linear score (in the parameter) \(\psi_a(W; \eta)\).
- psi_elements – Values of the score function components after calling fit(); for models (e.g., PLR, IRM, PLIV, IIVM) with linear score (in the parameter) a dictionary with entries psi_a and psi_b for \(\psi_a(W; \eta)\) and \(\psi_b(W; \eta)\).
- pval – p-values for the causal parameter(s) after calling fit().
- quantile – Quantile of the potential outcome.
- score – The score function.
- se – Standard errors for the causal parameter(s) after calling fit().
- sensitivity_elements – Values of the sensitivity components after calling fit(); if available (e.g., PLR, IRM) a dictionary with entries sigma2, nu2, psi_sigma2, psi_nu2 and riesz_rep.
- sensitivity_params – Values of the sensitivity parameters after calling sensitivity_analysis(); if available (e.g., PLR, IRM) a dictionary with entries theta, se, ci, rv and rva.
- sensitivity_summary – Returns a summary for the sensitivity analysis after calling sensitivity_analysis().
- smpls – The partition used for cross-fitting.
- smpls_cluster – The partition of clusters used for cross-fitting.
- summary – A summary for the estimated causal effect after calling fit().
- t_stat – t-statistics for the causal parameter(s) after calling fit().
- treatment – Treatment indicator for the potential outcome.
- trimming_rule – Specifies the used trimming rule.
- trimming_threshold – Specifies the used trimming threshold.
- DoubleMLPQ.bootstrap(method='normal', n_rep_boot=500)
- Multiplier bootstrap for DoubleML models. 
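As a minimal sketch of the multiplier-bootstrap idea, the code below perturbs the linearized score with random multipliers and returns bootstrapped t-statistics for a single parameter, taking score values (as in psi) and derivative values (as in psi_deriv) as given. The helper multiplier_bootstrap_t is hypothetical; only method='normal' is confirmed by the signature above, and the 'Bayes' branch with centred exponential weights is an assumption. For DoubleMLPQ the score is not linear in the parameter, so the library's internals will differ.

```python
import numpy as np


def multiplier_bootstrap_t(psi, psi_deriv, n_rep_boot=500, method="normal", seed=None):
    """Hypothetical sketch: bootstrapped t-statistics for one parameter from score values."""
    rng = np.random.default_rng(seed)
    psi = np.asarray(psi, dtype=float)
    n = psi.size
    J = np.mean(psi_deriv)                # estimated derivative of the score w.r.t. theta
    phi = -psi / J                        # linearization: theta_hat - theta ~ mean(phi)
    se = np.sqrt(np.mean(phi ** 2) / n)   # standard error implied by phi
    if method == "normal":
        xi = rng.standard_normal((n_rep_boot, n))
    elif method == "Bayes":
        xi = rng.exponential(1.0, size=(n_rep_boot, n)) - 1.0  # mean 0, variance 1
    else:
        raise ValueError("this sketch only implements 'normal' and 'Bayes'")
    boot_coef = xi @ phi / n              # one perturbed mean per bootstrap replication
    return boot_coef / se


# Example: estimating the mean of y has score psi = y - theta, so psi_deriv = -1.
rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=1000)
psi = y - y.mean()
t_boot = multiplier_bootstrap_t(psi, -np.ones_like(y), n_rep_boot=2000, seed=2)
```

With normal multipliers, each bootstrapped t-statistic is exactly standard normal conditional on the data, so t_boot should have mean near 0 and standard deviation near 1.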
- DoubleMLPQ.confint(joint=False, level=0.95)
- Confidence intervals for DoubleML models. 
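A minimal sketch of the two interval types, assuming pointwise intervals use the two-sided normal critical value and joint intervals use the level-quantile of the largest absolute bootstrapped t-statistic (as stored in boot_t_stat). The helper confint_sketch is hypothetical. Applied to the coef and std err from the DoubleMLPQ example summary, the pointwise version reproduces the reported 2.5 % and 97.5 % bounds up to rounding.

```python
import numpy as np
from statistics import NormalDist


def confint_sketch(coef, se, level=0.95, boot_t_stat=None):
    """Hypothetical sketch: pointwise (normal) or joint (bootstrap max-|t|) intervals."""
    coef = np.atleast_1d(np.asarray(coef, dtype=float))
    se = np.atleast_1d(np.asarray(se, dtype=float))
    if boot_t_stat is None:
        # pointwise: two-sided normal critical value (about 1.96 for level = 0.95)
        crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    else:
        # joint: quantile of the largest absolute bootstrapped t-statistic per replication
        max_abs_t = np.max(np.abs(boot_t_stat), axis=1)
        crit = np.quantile(max_abs_t, level)
    return np.column_stack([coef - crit * se, coef + crit * se])


# Pointwise interval from the coef and std err of the DoubleMLPQ example summary.
ci = confint_sketch(0.553878, 0.149858)
```

Joint intervals are wider than pointwise ones whenever several coefficients are covered simultaneously, because the critical value accounts for the maximum over all of them.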
- DoubleMLPQ.construct_framework()
- Construct a doubleml.DoubleMLFramework object. Can be used to construct, e.g., confidence intervals.
- Returns:
- doubleml_framework 
- Return type:
- doubleml.DoubleMLFramework 
 
- DoubleMLPQ.draw_sample_splitting()
- Draw sample splitting for DoubleML models. The samples are drawn according to the attributes n_folds and n_rep.
- Returns:
- self 
- Return type:
 
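As a sketch of what drawing a repeated K-fold split involves, the function below returns a nested list in the form that set_sample_splitting documents: n_rep outer entries, each a list of n_folds (train_ind, test_ind) tuples whose test indices partition the sample. The helper draw_repeated_kfold is hypothetical and uses plain numpy permutations rather than the library's own splitter.

```python
import numpy as np


def draw_repeated_kfold(n_obs, n_folds=5, n_rep=1, seed=None):
    """Hypothetical sketch: n_rep independent K-fold partitions of range(n_obs)."""
    rng = np.random.default_rng(seed)
    all_smpls = []
    for _ in range(n_rep):
        perm = rng.permutation(n_obs)
        test_folds = np.array_split(perm, n_folds)   # folds differ in size by at most one
        smpls = []
        for k, test_ind in enumerate(test_folds):
            # training set: all observations outside the k-th test fold
            train_ind = np.concatenate([f for j, f in enumerate(test_folds) if j != k])
            smpls.append((np.sort(train_ind), np.sort(test_ind)))
        all_smpls.append(smpls)
    return all_smpls


splits = draw_repeated_kfold(23, n_folds=5, n_rep=3, seed=0)
```

Each repetition uses a fresh permutation, so repeated cross-fitting averages over different partitions of the same data.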
- DoubleMLPQ.evaluate_learners(learners=None, metric=<function _rmse>)
- Evaluate fitted learners for DoubleML models on cross-validated predictions.
- Parameters:
- learners (list) – A list of strings which correspond to the nuisance functions of the model. 
- metric (callable) – A callable function with inputs y_pred and y_true of shape (1, n), where n specifies the number of observations. Note that some models, like IRM, cannot provide values of y_true for all learners, so the target vector might contain nan values. Default is the root-mean-square error.
 
- Returns:
- dist – A dictionary containing the evaluated metric for each learner. 
- Return type:
- dict
- Examples
    >>> import numpy as np
    >>> import doubleml as dml
    >>> from sklearn.metrics import mean_absolute_error
    >>> from doubleml.datasets import make_irm_data
    >>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
    >>> np.random.seed(3141)
    >>> ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
    >>> ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
    >>> data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
    >>> obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
    >>> dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
    >>> dml_irm_obj.fit()
    >>> def mae(y_true, y_pred):
    ...     subset = np.logical_not(np.isnan(y_true))
    ...     return mean_absolute_error(y_true[subset], y_pred[subset])
    ...
    >>> dml_irm_obj.evaluate_learners(metric=mae)
    {'ml_g0': array([[0.85974356]]), 'ml_g1': array([[0.85280376]]), 'ml_m': array([[0.35365143]])}
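Since the default metric is the root-mean-square error and targets may contain nan values, a custom metric typically has to mask missing targets first. The sketch below is a nan-aware RMSE in the same spirit as the mae function in the example; the name rmse_ignoring_nan is hypothetical, and calling it with keyword arguments is a precaution, since the exact argument passing of evaluate_learners is not stated here.

```python
import numpy as np


def rmse_ignoring_nan(y_true, y_pred):
    """Hypothetical metric: RMSE over entries where the target is observed (not nan)."""
    y_true = np.asarray(y_true, dtype=float).ravel()   # accepts (1, n) or (n,) inputs
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    observed = ~np.isnan(y_true)
    return float(np.sqrt(np.mean((y_true[observed] - y_pred[observed]) ** 2)))


# The middle target is missing, so only the first and last entries count.
score = rmse_ignoring_nan(y_true=np.array([[1.0, np.nan, 3.0]]),
                          y_pred=np.array([[2.0, 5.0, 3.0]]))
```

After fitting, it could be passed as evaluate_learners(metric=rmse_ignoring_nan).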
- DoubleMLPQ.fit(n_jobs_cv=None, store_predictions=True, external_predictions=None, store_models=False)
- Estimate DoubleML models.
- Parameters:
- n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Default is None.
- store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.
- store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows one to analyze the fitted models or extract information such as variable importance. Default is False.
- external_predictions (None or dict) – If None, all models for the learners are fitted and evaluated. If a dictionary containing predictions for a specific learner is supplied, the model uses the supplied nuisance predictions instead. Has to be a nested dictionary where the keys refer to the treatment and the keys of the nested dictionaries refer to the corresponding learners. Default is None.
 
- Returns:
- self 
- Return type:
 
- DoubleMLPQ.get_params(learner)
- Get hyperparameters for the nuisance model of DoubleML models. 
- DoubleMLPQ.p_adjust(method='romano-wolf')
- Multiple testing adjustment for DoubleML models.
- Parameters:
- method (str) – A str ('romano-wolf', 'bonferroni', 'holm', etc.) specifying the adjustment method. In addition to 'romano-wolf', all methods implemented in statsmodels.stats.multitest.multipletests() can be applied. Default is 'romano-wolf'.
- Returns:
- p_val – A data frame with adjusted p-values. 
- Return type:
- pd.DataFrame 
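To illustrate what two of the listed adjustment methods do, the sketch below implements the Bonferroni and Holm step-down corrections in numpy. These hypothetical helpers follow the textbook definitions; p_adjust itself delegates such methods to statsmodels, and the bootstrap-based 'romano-wolf' default is not covered here.

```python
import numpy as np


def bonferroni_adjust(pvals):
    """Bonferroni: multiply every p-value by the number of tests, capped at one."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def holm_adjust(pvals):
    """Holm step-down: scale the k-th smallest p-value (k = 0, 1, ...) by (m - k)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    stepped = (m - np.arange(m)) * p[order]
    # enforce monotonicity in the sorted order and cap at one
    adj_sorted = np.minimum(np.maximum.accumulate(stepped), 1.0)
    adj = np.empty(m)
    adj[order] = adj_sorted          # map back to the original order
    return adj


pvals = [0.01, 0.04, 0.03]
bonf = bonferroni_adjust(pvals)      # [0.03, 0.12, 0.09]
holm = holm_adjust(pvals)            # [0.03, 0.06, 0.06]
```

Holm's procedure is uniformly less conservative than Bonferroni while still controlling the family-wise error rate.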
 
- DoubleMLPQ.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0, level=0.95, null_hypothesis=0.0)
- Performs a sensitivity analysis to account for unobserved confounders. The evaluated scenario is stored as a dictionary in the property sensitivity_params.
- Parameters:
- cf_y (float) – Percentage of the residual variation of the outcome explained by latent/confounding variables. Default is 0.03.
- cf_d (float) – Percentage gains in the variation of the Riesz representer generated by latent/confounding variables. Default is 0.03.
- rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.
- level (float) – The confidence level. Default is 0.95.
- null_hypothesis (float or numpy.ndarray) – Null hypothesis for the effect. Determines the robustness values. If it is a single float, the same null hypothesis is used for all estimated parameters; otherwise the array has to be of shape (n_coefs,). Default is 0.0.
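The parameters above describe a confounding scenario but not how it translates into bounds. The sketch below assumes an omitted-variable-bias bound in which the confounding strength is |rho| * sqrt(cf_y * cf_d / (1 - cf_d)) and the scale is sqrt(sigma2 * nu2), with sigma2 and nu2 as in the sensitivity_elements attribute. Both the formula and the helper name sensitivity_bounds are assumptions, and the sketch ignores the statistical uncertainty that level adds. The Attributes list states these elements are available for models such as PLR and IRM, so whether DoubleMLPQ supports the analysis is not established here.

```python
import numpy as np


def sensitivity_bounds(theta, sigma2, nu2, cf_y=0.03, cf_d=0.03, rho=1.0):
    """Hypothetical sketch: bias bounds for theta under a (cf_y, cf_d, rho) scenario."""
    # confounding strength implied by the scenario (assumed functional form)
    strength = abs(rho) * np.sqrt(cf_y * cf_d / (1.0 - cf_d))
    # scale of the problem: S = sqrt(sigma2 * nu2)
    bias_bound = strength * np.sqrt(sigma2 * nu2)
    return theta - bias_bound, theta + bias_bound


lower, upper = sensitivity_bounds(theta=0.5, sigma2=1.0, nu2=4.0)
```

Setting rho to 0 removes the adversarial confounding entirely, so the bounds collapse to the point estimate.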
 
- Returns:
- self 
- Return type:
 
- DoubleMLPQ.sensitivity_benchmark(benchmarking_set, fit_args=None)
- Computes a benchmark for a given set of features. Returns a DataFrame containing the corresponding values for cf_y, cf_d, rho and the change in estimates.
- Parameters:
- Returns:
- benchmark_results – Benchmark results. 
- Return type:
- pd.DataFrame
- DoubleMLPQ.sensitivity_plot(idx_treatment=0, value='theta', rho=1.0, level=0.95, null_hypothesis=0.0, include_scenario=True, benchmarks=None, fill=True, grid_bounds=(0.15, 0.15), grid_size=100)
- Contour plot of the sensitivity with respect to latent/confounding variables.
- Parameters:
- idx_treatment (int) – Index of the treatment to perform the sensitivity analysis. Default is 0.
- value (str) – Determines which contours to plot. Valid values are 'theta' (refers to the bounds) and 'ci' (refers to the bounds including statistical uncertainty). Default is 'theta'.
- rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.
- level (float) – The confidence level. Default is 0.95.
- null_hypothesis (float) – Null hypothesis for the effect. Determines the direction of the contour lines. Default is 0.0.
- include_scenario (bool) – Indicates whether to highlight the scenario from the call of sensitivity_analysis(). Default is True.
- benchmarks (dict or None) – Dictionary of benchmarks to be included in the plot. The keys are cf_y, cf_d and name. Default is None.
- fill (bool) – Indicates whether to use a heatmap style or only contour lines. Default is True.
- grid_bounds (tuple) – Determines the evaluation bounds of the grid for cf_d and cf_y. Has to contain two floats in [0, 1). Default is (0.15, 0.15).
- grid_size (int) – Determines the number of evaluation points of the grid. Default is 100.
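The contour plot evaluates the bounds on a grid over cf_d and cf_y, whose extent and resolution are set by grid_bounds and grid_size. The sketch below computes the lower bound of theta on such a grid with numpy, reusing the assumed bias-bound form |rho| * sqrt(cf_y * cf_d / (1 - cf_d)) * sqrt(sigma2 * nu2). The helper lower_bound_grid and that formula are assumptions, and no plotting is done; the resulting array is what a contour routine would draw.

```python
import numpy as np


def lower_bound_grid(theta, sigma2, nu2, rho=1.0, grid_bounds=(0.15, 0.15), grid_size=100):
    """Hypothetical sketch: lower bound of theta over a (cf_d, cf_y) evaluation grid."""
    cf_d = np.linspace(0.0, grid_bounds[0], grid_size)   # first bound applies to cf_d
    cf_y = np.linspace(0.0, grid_bounds[1], grid_size)   # second bound applies to cf_y
    cf_d_mesh, cf_y_mesh = np.meshgrid(cf_d, cf_y)       # rows index cf_y, columns index cf_d
    strength = abs(rho) * np.sqrt(cf_y_mesh * cf_d_mesh / (1.0 - cf_d_mesh))
    return cf_d, cf_y, theta - strength * np.sqrt(sigma2 * nu2)


cf_d, cf_y, z = lower_bound_grid(theta=0.5, sigma2=1.0, nu2=4.0)
```

The bound equals theta along both axes (no confounding in one of the two channels) and decreases as either cf_d or cf_y grows.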
 
- Returns:
- fig – Plotly figure of the sensitivity contours. 
- Return type:
- plotly.graph_objects.Figure
- DoubleMLPQ.set_ml_nuisance_params(learner, treat_var, params)
- Set hyperparameters for the nuisance models of DoubleML models.
- Parameters:
- learner (str) – The nuisance model / learner (see attribute params_names).
- treat_var (str) – The treatment variable (hyperparameters can be set treatment-variable specific).
- params (dict or list) – A dict with estimator parameters (used for all folds) or a nested list with fold-specific parameters. The outer list needs to be of length n_rep and the inner list of length n_folds.
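The fold-specific form of params is a nested list: n_rep outer entries, each holding n_folds parameter dicts. The hypothetical helper below builds that structure from a function of the repetition and fold index. The call in the final comment, including the learner name 'ml_m' and the treatment variable 'd', is an assumption for illustration and is not executed here.

```python
def fold_specific_params(param_fn, n_rep, n_folds):
    """Hypothetical helper: nested list of per-fold parameter dicts (outer n_rep, inner n_folds)."""
    return [[param_fn(i_rep, i_fold) for i_fold in range(n_folds)]
            for i_rep in range(n_rep)]


# Example: vary max_depth across folds while keeping n_estimators fixed.
params = fold_specific_params(
    lambda i_rep, i_fold: {"n_estimators": 100, "max_depth": 5 + i_fold},
    n_rep=2, n_folds=3,
)
# assumed usage: dml_pq_obj.set_ml_nuisance_params('ml_m', 'd', params)
```

A single dict would instead apply the same parameters to every fold and repetition.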
 
- Returns:
- self 
- Return type:
 
- DoubleMLPQ.set_sample_splitting(all_smpls, all_smpls_cluster=None)
- Set the sample splitting for DoubleML models. The attributes n_folds and n_rep are derived from the provided partition.
- Parameters:
- all_smpls (list or tuple) – The sample splitting, given in one of the following forms:
- If nested list of lists of tuples:
- The outer list needs to provide an entry per repeated sample splitting (the length of the list is set as n_rep). The inner list needs to provide a tuple (train_ind, test_ind) per fold (the length of the list is set as n_folds). test_ind must form a partition for each inner list.
- If list of tuples:
- The list needs to provide a tuple (train_ind, test_ind) per fold (the length of the list is set as n_folds). test_ind must form a partition. n_rep=1 is always set.
- If tuple:
- Must be a tuple with two elements train_ind and test_ind. The only viable option is to set both train_ind and test_ind to np.arange(n_obs), which corresponds to no sample splitting. n_folds=1 and n_rep=1 are always set.

- all_smpls_cluster (list or None) – Nested list or None. The first level of nesting corresponds to the number of repetitions. The second level of nesting corresponds to the number of folds. The third level of nesting contains a tuple of training and testing lists. Both training and testing contain an array for each cluster variable, which form a partition of the clusters. Default is None.
 
- Returns:
- self 
- Return type:
- Examples
    >>> import numpy as np
    >>> import doubleml as dml
    >>> from doubleml.datasets import make_plr_CCDDHNR2018
    >>> from sklearn.ensemble import RandomForestRegressor
    >>> from sklearn.base import clone
    >>> np.random.seed(3141)
    >>> learner = RandomForestRegressor(max_depth=2, n_estimators=10)
    >>> ml_g = learner
    >>> ml_m = learner
    >>> obj_dml_data = make_plr_CCDDHNR2018(n_obs=10, alpha=0.5)
    >>> dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
    >>> # no sample splitting: train and test sets both equal the full sample
    >>> smpls = (np.arange(10), np.arange(10))
    >>> dml_plr_obj.set_sample_splitting(smpls)
    >>> # sample splitting with two folds and cross-fitting
    >>> smpls = [([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
    ...          ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])]
    >>> dml_plr_obj.set_sample_splitting(smpls)
    >>> # sample splitting with two folds and repeated cross-fitting with n_rep = 2
    >>> smpls = [[([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
    ...           ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])],
    ...          [([0, 2, 4, 6, 8], [1, 3, 5, 7, 9]),
    ...           ([1, 3, 5, 7, 9], [0, 2, 4, 6, 8])]]
    >>> dml_plr_obj.set_sample_splitting(smpls)
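The three accepted forms of all_smpls can be normalized to the nested-list form and checked against the partition rules stated above. The sketch below is a hypothetical validator written from those rules only; it is not the library's internal check, and its error messages are invented.

```python
import numpy as np


def normalize_all_smpls(all_smpls, n_obs):
    """Hypothetical sketch: bring all_smpls into nested-list form and validate the partitions."""
    if isinstance(all_smpls, tuple):
        train, test = all_smpls
        full = np.arange(n_obs)
        if not (np.array_equal(np.sort(train), full) and np.array_equal(np.sort(test), full)):
            raise ValueError("a single tuple must use np.arange(n_obs) for train and test")
        all_smpls = [[all_smpls]]                 # n_folds = 1 and n_rep = 1
    elif isinstance(all_smpls[0], tuple):
        all_smpls = [all_smpls]                   # list of tuples: n_rep = 1
    for smpls in all_smpls:
        # the test indices of one repetition must cover every observation exactly once
        test_all = np.sort(np.concatenate([np.asarray(test) for _, test in smpls]))
        if not np.array_equal(test_all, np.arange(n_obs)):
            raise ValueError("test indices do not form a partition")
    return all_smpls


cross_fit = normalize_all_smpls([([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]),
                                 ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4])], n_obs=10)
no_split = normalize_all_smpls((np.arange(10), np.arange(10)), n_obs=10)
```

Under these rules, a tuple that splits the sample into two halves is rejected, because a single tuple may only describe the no-splitting case.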
- DoubleMLPQ.tune(param_grids, tune_on_folds=False, scoring_methods=None, n_folds_tune=5, search_mode='grid_search', n_iter_randomized_search=100, n_jobs_cv=None, set_as_params=True, return_tune_res=False)
- Hyperparameter-tuning for DoubleML models. The hyperparameter-tuning is performed using either an exhaustive search over specified parameter values implemented in sklearn.model_selection.GridSearchCV or a randomized search implemented in sklearn.model_selection.RandomizedSearchCV.
- Parameters:
- param_grids (dict) – A dict with a parameter grid for each nuisance model / learner (see attribute learner_names).
- tune_on_folds (bool) – Indicates whether the tuning should be done fold-specific or globally. Default is False.
- scoring_methods (None or dict) – The scoring method used to evaluate the predictions. The scoring method must be set per nuisance model via a dict (see attribute learner_names for the keys). If None, the estimator's score method is used. Default is None.
- n_folds_tune (int) – Number of folds used for tuning. Default is 5.
- search_mode (str) – A str ('grid_search' or 'randomized_search') specifying whether hyperparameters are optimized via sklearn.model_selection.GridSearchCV or sklearn.model_selection.RandomizedSearchCV. Default is 'grid_search'.
- n_iter_randomized_search (int) – If search_mode == 'randomized_search', the number of parameter settings that are sampled. Default is 100.
- n_jobs_cv (None or int) – The number of CPUs to use to tune the learners. None means 1. Default is None.
- set_as_params (bool) – Indicates whether the hyperparameters should be set in order to be used when fit() is called. Default is True.
- return_tune_res (bool) – Indicates whether detailed tuning results should be returned. Default is False.

- Returns:
- self (object) – Returned if return_tune_res is False.
- tune_res (list) – A list containing detailed tuning results and the proposed hyperparameters. Returned if return_tune_res is True.
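With search_mode='grid_search', every combination in a learner's grid is cross-validated on n_folds_tune folds, so the grid sizes drive the tuning cost. The sketch below enumerates the candidates with itertools.product and counts the cross-validation fits per learner. The grids and the learner keys 'ml_g' and 'ml_m' are assumptions for illustration (the actual keys are given by learner_names), and GridSearchCV's final refit of the best candidate is not counted.

```python
from itertools import product


def grid_candidates(param_grid):
    """Enumerate every parameter combination an exhaustive grid search would try."""
    names = list(param_grid)
    return [dict(zip(names, combo)) for combo in product(*(param_grid[k] for k in names))]


# Hypothetical grids; the keys 'ml_g' and 'ml_m' are assumed to match learner_names.
param_grids = {
    "ml_g": {"n_estimators": [50, 100], "max_depth": [3, 5, 10]},
    "ml_m": {"n_estimators": [50, 100], "min_samples_leaf": [1, 2, 5, 10]},
}
n_folds_tune = 5
# each candidate is fitted once per tuning fold
cv_fits = {name: len(grid_candidates(grid)) * n_folds_tune for name, grid in param_grids.items()}
```

If tune_on_folds=True, the tuning is done fold-specific, so roughly this cost is incurred for each fold; search_mode='randomized_search' caps the number of candidates at n_iter_randomized_search instead.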