Note

https://docs.doubleml.org/stable/examples/py_double_ml_sensitivity.ipynb

Python: Sensitivity Analysis#

This notebook illustrates the sensitivity analysis tools with the partiallly linear regression model (PLR). The DoubleML package implements sensitivity analysis based on Chernozhukov et al. (2022).

Simulation Example#

For illustration purposes, we will work with generated data. This enables us to set the counfounding strength, such that we can correctly access quality of e.g. the robustness values.

[1]:

import numpy as np
import pandas as pd

from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

import doubleml as dml
from doubleml.datasets import make_confounded_plr_data, fetch_401K

Data#

The data will be generated via make_confounded_plr_data and set the confounding values to cf_y=0.1 and cf_d=0.1.

Both parameters determine the strength of the confounding

cf_y measures the proportion of residual variance in the outcome explained by unobserved confounders
cf_d measires the porportion of residual variance of the treatment representer generated by unobserved confounders.

For further details, see Chernozhukov et al. (2022) or User guide. Further, the true treatment effect is set to theta=5.0.

[2]:

cf_y = 0.1
cf_d = 0.1
theta = 5.0

[3]:

np.random.seed(42)
dpg_dict = make_confounded_plr_data(n_obs=1000, cf_y=cf_y, cf_d=cf_d, theta=theta)
x_cols = [f'X{i + 1}' for i in np.arange(dpg_dict['x'].shape[1])]
df = pd.DataFrame(np.column_stack((dpg_dict['x'], dpg_dict['y'], dpg_dict['d'])), columns=x_cols + ['y', 'd'])
dml_data = dml.DoubleMLData(df, 'y', 'd')
print(dml_data)

================== DoubleMLData Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4']
Instrument variable(s): None
No. Observations: 1000

------------------ DataFrame info    ------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Columns: 6 entries, X1 to d
dtypes: float64(6)
memory usage: 47.0 KB

DoubleML Object#

We fit a DoubleMLPLR object and use basic random forest to flexibly estimate the nuisance elements.

[4]:

np.random.seed(42)
dml_obj = dml.DoubleMLPLR(dml_data,
                          ml_l=RandomForestRegressor(),
                          ml_m=RandomForestRegressor(),
                          n_folds=5,
                          score='partialling out')
dml_obj.fit()
print(dml_obj)

================== DoubleMLPLR Object ==================

------------------ Data Summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4']
Instrument variable(s): None
No. Observations: 1000


------------------ Score & Algorithm ------------------
Score function: partialling out

------------------ Machine Learner   ------------------
Learner ml_l: RandomForestRegressor()
Learner ml_m: RandomForestRegressor()
Out-of-sample Performance:
Regression:
Learner ml_l RMSE: [[11.82684324]]
Learner ml_m RMSE: [[1.11552911]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1

------------------ Fit Summary       ------------------
       coef   std err        t         P>|t|     2.5 %    97.5 %
d  4.130122  0.455448  9.06827  1.209219e-19  3.237461  5.022783

The effect estimate is biased due to the unobserved confounding and the corresponding confidence interval does not cover the true parameter theta=5.0

[5]:

print(dml_obj.confint())

      2.5 %    97.5 %
d  3.237461  5.022783

Sensitivity Analysis#

To perform a sensitivity analysis with the DoubleML package you can use the sensitivity_analysis() method. The sensitivity analysis is based on the strength of the confounding cf_y and cf_d (default values \(0.03\)) and the parameter rho, which measures the correlation between the difference of the long and short form of the outcome regression and the Riesz representer (the default value \(1.0\) is conservative and considers adversarial counfounding). To additionally incorporate statistical uncertainty, a significance level (default \(0.95\)) is used.

These input parameters are used to calculate upper and lower bounds (including the corresponding confidence level) on the treatment effect estimate.

Further, the sensitivity analysis includes a null hypothesis (default \(0.0\)), which is used to compute robustness values. The robustness value \(RV\) is defined as the required confounding strength (cf_y=rv and cf_d=rv), such that the lower or upper bound of the causal parameter includes the null hypothesis. The robustness value \(RVa\) defined analogous but additionally incorporates statistical uncertainty (as it is based on the confidence intervals of the bounds). For a more detailed explanation see the User Guide or Chernozhukov et al. (2022).

The results of the analysis can be printed via the sensitivity_summary property or directly accessed via the sensitivity_params property.

[6]:

dml_obj.sensitivity_analysis()
print(dml_obj.sensitivity_summary)

================== Sensitivity Analysis ==================

------------------ Scenario          ------------------
Significance Level: level=0.95
Sensitivity parameters: cf_y=0.03; cf_d=0.03, rho=1.0

------------------ Bounds with CI    ------------------
   CI lower  theta lower     theta  theta upper  CI upper
d  3.098308     3.832693  4.130122     4.427551  5.192526

------------------ Robustness Values ------------------
   H_0     RV (%)    RVa (%)
d  0.0  34.287815  29.784405

[7]:

dml_obj.sensitivity_params

[7]:

{'theta': {'lower': array([3.8326928]), 'upper': array([4.42755087])},
 'se': {'lower': array([0.44647451]), 'upper': array([0.46507214])},
 'ci': {'lower': array([3.09830758]), 'upper': array([5.19252647])},
 'rv': array([0.34287815]),
 'rva': array([0.29784405]),
 'input': {'cf_y': 0.03,
  'cf_d': 0.03,
  'rho': 1.0,
  'level': 0.95,
  'null_hypothesis': array([0.])}}

The robustness value \(RV\) means that unobserved counfounders would have to account for \(34\%\) of the residual variation (in treatment and outcome) to explain away the treatment effect.

We can also consider a contour plot to consider different values of confounding cf_y and cf_d. The contour plot and robustness values are based on the upper or lower bounds based on the null hypothesis (in this case this results in the lower bound).

Remark that both, the robustness values and the contour plot use the prespecified value of rho and the null_hypothesis.

[8]:

dml_obj.sensitivity_plot()

To consider a different null hypothesis and add a different scenario to the the contour plot, we can adjust the parameter null_hypothesis.

[9]:

dml_obj.sensitivity_analysis(cf_y=cf_y, cf_d=cf_d, null_hypothesis=theta)
print(dml_obj.sensitivity_summary)

================== Sensitivity Analysis ==================

------------------ Scenario          ------------------
Significance Level: level=0.95
Sensitivity parameters: cf_y=0.1; cf_d=0.1, rho=1.0

------------------ Bounds with CI    ------------------
   CI lower  theta lower     theta  theta upper  CI upper
d  2.397811     3.100858  4.130122     5.159386  5.967467

------------------ Robustness Values ------------------
   H_0    RV (%)   RVa (%)
d  5.0  8.520641  1.168195

[10]:

dml_obj.sensitivity_plot()

Application: 401(k)#

In this real-data example, we illustrate how to use the sensitivity analysis from the DoubleML package to evaluate effect estimates from 401(k) eligibility on accumulated assets. The 401(k) data set has been analyzed in several studies, among others Chernozhukov et al. (2018), see Kallus et al. (2019) for quantile effects.

Remark: This notebook focuses on sensitivity analysis. For a basic introduction to the DoubleML package and a detailed example of the average treatment effect estimation for the 401(k) data set, we refer to the notebook Python: Impact of 401(k) on Financial Wealth.

Data and Effect Estimation#

Define eligiblity as treatment and the net financial assets as outcome to construct a DoubleML object.

[11]:

data = fetch_401K(return_type='DataFrame')

# Set up basic model: Specify variables for data-backend
features_base = ['age', 'inc', 'educ', 'fsize', 'marr',
                 'twoearn', 'db', 'pira', 'hown']

# Initialize DoubleMLData (data-backend of DoubleML)
data_dml = dml.DoubleMLData(data,
                            y_col='net_tfa',
                            d_cols='e401',
                            x_cols=features_base)
print(data_dml)

================== DoubleMLData Object ==================

------------------ Data summary      ------------------
Outcome variable: net_tfa
Treatment variable(s): ['e401']
Covariates: ['age', 'inc', 'educ', 'fsize', 'marr', 'twoearn', 'db', 'pira', 'hown']
Instrument variable(s): None
No. Observations: 9915

------------------ DataFrame info    ------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9915 entries, 0 to 9914
Columns: 14 entries, nifa to hown
dtypes: float32(4), int8(10)
memory usage: 251.9 KB

Use basic random forests to estimate the nuisance elements and fit a DoubleMLPLR model.

[12]:

learner_l = RandomForestRegressor(n_estimators=500, max_depth=10,
                                  max_features=5, min_samples_leaf=10)
learner_m = RandomForestClassifier(n_estimators=500, max_depth=10,
                                   max_features=5, min_samples_leaf=10)

[13]:

np.random.seed(42)
dml_plr_obj = dml.DoubleMLPLR(data_dml,
                              ml_l = learner_l,
                              ml_m = learner_m,
                              n_folds = 5,
                              n_rep=3)
dml_plr_obj.fit()
print(dml_plr_obj)

================== DoubleMLPLR Object ==================

------------------ Data Summary      ------------------
Outcome variable: net_tfa
Treatment variable(s): ['e401']
Covariates: ['age', 'inc', 'educ', 'fsize', 'marr', 'twoearn', 'db', 'pira', 'hown']
Instrument variable(s): None
No. Observations: 9915


------------------ Score & Algorithm ------------------
Score function: partialling out

------------------ Machine Learner   ------------------
Learner ml_l: RandomForestRegressor(max_depth=10, max_features=5, min_samples_leaf=10,
                      n_estimators=500)
Learner ml_m: RandomForestClassifier(max_depth=10, max_features=5, min_samples_leaf=10,
                       n_estimators=500)
Out-of-sample Performance:
Regression:
Learner ml_l RMSE: [[54163.99232145]
 [54716.48029755]
 [54378.6810775 ]]
Classification:
Learner ml_m Log Loss: [[0.57643609]
 [0.57585824]
 [0.57775704]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 3

------------------ Fit Summary       ------------------
             coef      std err         t         P>|t|        2.5 %  \
e401  8843.744236  1328.431437  6.657283  2.789355e-11  6240.066464

            97.5 %
e401  11447.422007

Sensitivity Analysis#

In their paper Chernozhukov et al. (2022) argue that confounding should not account for more than \(4\%\) of the residual variation of the outcome and \(3\%\) of the residual variation of the treatment. Consequently, we set cf_y=0.04 and cf_d=0.03.

[14]:

dml_plr_obj.sensitivity_analysis(cf_y=0.04, cf_d=0.03)
print(dml_plr_obj.sensitivity_summary)

================== Sensitivity Analysis ==================

------------------ Scenario          ------------------
Significance Level: level=0.95
Sensitivity parameters: cf_y=0.04; cf_d=0.03, rho=1.0

------------------ Bounds with CI    ------------------
         CI lower  theta lower        theta   theta upper     CI upper
e401  2308.340274  4567.712503  8843.744236  13119.775969  15354.93458

------------------ Robustness Values ------------------
      H_0    RV (%)   RVa (%)
e401  0.0  7.014681  5.205224

Our robustness values are similar to the ones in the paper. The effect estimate is robust in the sense that unobserved confounders (e.g. latent firm characteristics) would have to account for at least \(7,1\%\) of the residual variation (in the outcome and treatment) to reduce the lower bound on the effect to zero. Including estimation uncertainty unobserved confounders still have to explain \(5.3\%\) of the residual variation to render the effect estimate insignificant.

[15]:

dml_plr_obj.sensitivity_plot()

Benchmarking Analysis#

Usually, it is quite hard to argue which strength of possible confounding cf_y and cf_d is reasonable. A common approach is to use observed confounders to get an informed guess on the strength of possible confounding.

This is implemented via the sensitivity_benchmark() method. For a more detailed description see User Guide or Chernozhukov et al. (2022) Appendix D.

To benchmark the confounding values, one has to specify a set of observed confounders as benchmarking_set. The methods then returns a dataframe with the benchmarked values for cf_y,cf_d,rho and the change in the estimates if the benchmarking_set would be omitted from the set of confounders. To consider the confounding strength of multiple variables one has to repeat the benchmarking procedure for each variable separately (this might take some time as a seperate benchmark model has to be fitted each repetion).

In this example, we will try to consider the same benchmarks as in Chernozhukov et al. (2022) Appendix D. Therefore, we will take a look on the following variables:

income inc
participation individual retirement account pira
two-earner household twoearn

[16]:

benchmark_inc = dml_plr_obj.sensitivity_benchmark(benchmarking_set=["inc"])
benchmark_pira = dml_plr_obj.sensitivity_benchmark(benchmarking_set=["pira"])
benchmark_twoearn = dml_plr_obj.sensitivity_benchmark(benchmarking_set=["twoearn"])

[17]:

print(benchmark_inc)
print(benchmark_pira)
print(benchmark_twoearn)

         cf_y      cf_d       rho  delta_theta
e401  0.13956  0.046728  0.358395  3696.252253
         cf_y      cf_d       rho  delta_theta
e401  0.05039  0.002779  0.214764    338.70583
          cf_y      cf_d       rho  delta_theta
e401  0.011823  0.000743 -0.804284  -278.905951

We can add the scenario to the contour plot via a dictionary (remark that rho will be still based on the original value from sensitivity_params and can be manually adjusted by calling sensitivity_analysis() again).

[18]:

benchmark_dict = {"cf_y" : [benchmark_inc.loc["e401", "cf_y"], benchmark_pira.loc["e401", "cf_y"], benchmark_twoearn.loc["e401", "cf_y"]],
                  "cf_d" : [benchmark_inc.loc["e401", "cf_d"], benchmark_pira.loc["e401", "cf_d"], benchmark_twoearn.loc["e401", "cf_d"]],
                  "name" : ["inc", "pira", "twoearn"]}
dml_plr_obj.sensitivity_plot(benchmarks=benchmark_dict)

Sensitivity Analysis with IRM#

The sensitivity analysis with the IRM model is basically analogous.

[19]:

np.random.seed(42)
dml_irm_obj = dml.DoubleMLIRM(data_dml,
                              ml_g = learner_l,
                              ml_m = learner_m,
                              n_folds = 5,
                              n_rep = 3)
dml_irm_obj.fit()

print(dml_irm_obj)

================== DoubleMLIRM Object ==================

------------------ Data Summary      ------------------
Outcome variable: net_tfa
Treatment variable(s): ['e401']
Covariates: ['age', 'inc', 'educ', 'fsize', 'marr', 'twoearn', 'db', 'pira', 'hown']
Instrument variable(s): None
No. Observations: 9915


------------------ Score & Algorithm ------------------
Score function: ATE

------------------ Machine Learner   ------------------
Learner ml_g: RandomForestRegressor(max_depth=10, max_features=5, min_samples_leaf=10,
                      n_estimators=500)
Learner ml_m: RandomForestClassifier(max_depth=10, max_features=5, min_samples_leaf=10,
                       n_estimators=500)
Out-of-sample Performance:
Regression:
Learner ml_g0 RMSE: [[48296.97470872]
 [48331.30093956]
 [48315.49650883]]
Learner ml_g1 RMSE: [[64723.28425026]
 [64340.98393441]
 [64797.07479278]]
Classification:
Learner ml_m Log Loss: [[0.57599221]
 [0.57572422]
 [0.5776971 ]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 3

------------------ Fit Summary       ------------------
             coef      std err        t         P>|t|        2.5 %  \
e401  7710.567004  1187.889566  6.49098  8.528000e-11  5382.346238

           97.5 %
e401  10038.78777

[20]:

dml_irm_obj.sensitivity_analysis(cf_y=0.04, cf_d=0.03)
print(dml_irm_obj.sensitivity_summary)

================== Sensitivity Analysis ==================

------------------ Scenario          ------------------
Significance Level: level=0.95
Sensitivity parameters: cf_y=0.04; cf_d=0.03, rho=1.0

------------------ Bounds with CI    ------------------
         CI lower  theta lower        theta   theta upper     CI upper
e401  1306.608818   3593.56223  7710.567004  11935.392833  14114.64269

------------------ Robustness Values ------------------
      H_0    RV (%)   RVa (%)
e401  0.0  6.240295  4.339269

[21]:

dml_irm_obj.sensitivity_plot()

[22]:

benchmark_inc = dml_irm_obj.sensitivity_benchmark(benchmarking_set=["inc"])
benchmark_pira = dml_irm_obj.sensitivity_benchmark(benchmarking_set=["pira"])
benchmark_twoearn = dml_irm_obj.sensitivity_benchmark(benchmarking_set=["twoearn"])

[23]:

print(benchmark_inc)
print(benchmark_pira)
print(benchmark_twoearn)

          cf_y  cf_d  rho  delta_theta
e401  0.110681   0.0  1.0  4093.849747
          cf_y      cf_d      rho  delta_theta
e401  0.055165  0.010213  0.07961   205.322404
          cf_y      cf_d       rho  delta_theta
e401  0.013593  0.127563 -0.032953   -68.421793

[24]:

benchmark_dict = {"cf_y" : [benchmark_inc.loc["e401", "cf_y"], benchmark_pira.loc["e401", "cf_y"], benchmark_twoearn.loc["e401", "cf_y"]],
                  "cf_d" : [benchmark_inc.loc["e401", "cf_d"], benchmark_pira.loc["e401", "cf_d"], benchmark_twoearn.loc["e401", "cf_d"]],
                  "name" : ["inc", "pira", "twoearn"]}
dml_irm_obj.sensitivity_plot(benchmarks=benchmark_dict)