# 10. Sensitivity analysis#

The DoubleML package implements sensitivity analysis with respect to omitted variable bias based on Chernozhukov et al. (2022).

## 10.1. General algorithm#

The section Theory contains a general summary and the relevant defintions, whereas Implementation considers the general part of the implementation.

### 10.1.1. Theory#

Assume that we can write the model in the following representation

where usually \(g_0(W) = \mathbb{E}[Y|X, D]\) (currently, the sensitivity analysis is only available for linear models). As long as \(\mathbb{E}[m(W,f)]\) is a continuous linear functional of \(f\), there exists a unique square integrable random variable \(\alpha_0(W)\), called Riesz representer (see Riesz-Fréchet representation theorem), such that

The target parameter \(\theta_0\) has the following representation

which corresponds to a Neyman orthogonal score function (orthogonal with respect to nuisance elements \((g, \alpha)\)). To bound the omitted variable bias, the following further elements are needed. The variance of the outcome regression

and the second moment of the Riesz representer

Both representations are Neyman orthogonal with respect to \(g\) and \(\alpha\), respectively. Further, define the corresponding score functions

Recall that the parameter \(\theta_0\) is identified via the moment condition

If \(W=(Y, D, X)\) does not include all confounding variables, the “true” target parameter \(\tilde{\theta}_0\) would only be identified via the extendend (or “long”) form

where \(\tilde{W}=(Y, D, X, A)\) includes the unobserved counfounders \(A\). In Theorem 2 of their paper Chernozhukov et al. (2022) are able to bound the omitted variable bias

where

denotes the product of additional variations in the outcome regression and Riesz representer generated by omitted confounders and

denotes the correlations between the deviations generated by omitted confounders. The choice \(\rho=1\) is conservative and accounts for adversarial confounding. Further, the bound can be expressed as

where

As \(\sigma_0^2\) and \(\nu_0^2\) do not depend on the unobserved confounders \(A\) they are identified. Further, the other parts have the following interpretations

`cf_y`

\(:=\frac{\mathbb{E}[(\tilde{g}(\tilde{W}) - g(W))^2]}{\mathbb{E}[(Y - g(W))^2]}\) measures the proportion of residual variance in the outcome \(Y\) explained by the latent confounders \(A\)`cf_d`

\(:=1 - \frac{\mathbb{E}\big[\alpha(W)^2\big]}{\mathbb{E}\big[\tilde{\alpha}(\tilde{W})^2\big]}\) measures the proportion of residual variance in the Riesz representer \(\tilde{\alpha}(\tilde{W})\) generated by the latent confounders \(A\)

Note

`cf_y`

has the interpretation as the*nonparametric partial*\(R^2\)*of*\(A\)*with*\(Y\)*given*\((D,X)\)

For model-specific interpretations of

`cf_d`

or \(C_D^2\), see the corresponding chapters (e.g. Partially linear regression model (PLR)).

Consequently, for given values `cf_y`

and `cf_d`

, we can create lower and upper bounds for target parameter \(\tilde{\theta}_0\) of the form

Let \(\psi(W,\theta,\eta)\) the (correctly scaled) score function for the target parameter \(\theta_0\). Then

determines a orthongonal score function for \(\theta_{\pm}\), with nuisance elements \(\eta_\pm:=(g, \alpha, \sigma, \nu)\). The score can be used to calculate the standard deviations of \(\theta_{\pm}\) via

For more detail and interpretations see Chernozhukov et al. (2022).

### 10.1.2. Implementation#

The Partially linear regression model (PLR) will be used as an example

```
In [1]: import numpy as np
In [2]: import doubleml as dml
In [3]: from doubleml.datasets import make_plr_CCDDHNR2018
In [4]: from sklearn.ensemble import RandomForestRegressor
In [5]: from sklearn.base import clone
In [6]: learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
In [7]: ml_l = clone(learner)
In [8]: ml_m = clone(learner)
In [9]: np.random.seed(1111)
In [10]: data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
In [11]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
In [12]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
```

If the sensitivity analysis is implemented (see Model-specific implementations), the corresponding sensitivity elements are estimated
automatically by calling the `fit()`

method. In most cases these elements are based on the following plug-in estimators

where \(\hat{g}(W)\) and \(\hat{\alpha}(W)\) denote the cross-fitted predictions of the outcome regression and the Riesz representer (both are model specific, see Model-specific implementations). Further, the corresponding scores are defined as

After the `fit()`

call, the sensitivity elements are stored in a dictionary and can be accessed via the `sensitivity_elements`

property.

```
In [13]: dml_plr_obj.fit()
Out[13]: <doubleml.plm.plr.DoubleMLPLR at 0x7fe3162a3680>
In [14]: dml_plr_obj.sensitivity_elements.keys()
Out[14]: dict_keys(['sigma2', 'nu2', 'psi_sigma2', 'psi_nu2', 'riesz_rep'])
```

Each value is a \(3\)-dimensional array, with the variances being of form `(1, n_rep, n_coefs)`

and the scores of form `(n_obs, n_rep, n_coefs)`

.
The `sensitivity_analysis()`

method then computes the upper and lower bounds for the estimate, based on the sensitivity parameters
`cf_y`

, `cf_d`

and `rho`

(default is `rho=1.0`

to account for adversarial confounding). Additionally, one-sided confidence bounds are computed
based on a supplied significance level (default `level=0.95`

).
The results are summarized as a formatted string in the `sensitivity_summary`

```
In [15]: dml_plr_obj.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0, level=0.95)
Out[15]: <doubleml.plm.plr.DoubleMLPLR at 0x7fe3162a3680>
In [16]: print(dml_plr_obj.sensitivity_summary)
================== Sensitivity Analysis ==================
------------------ Scenario ------------------
Significance Level: level=0.95
Sensitivity parameters: cf_y=0.03; cf_d=0.03, rho=1.0
------------------ Bounds with CI ------------------
CI lower theta lower theta theta upper CI upper
0 0.408476 0.482461 0.512672 0.542883 0.616698
------------------ Robustness Values ------------------
H_0 RV (%) RVa (%)
0 0.0 40.029364 35.077502
```

or can be directly accessed via the `sensitivity_params`

property.

```
In [17]: dml_plr_obj.sensitivity_params
Out[17]:
{'theta': {'lower': array([0.48246134]), 'upper': array([0.5428834])},
'se': {'lower': array([0.04497975]), 'upper': array([0.04487585])},
'ci': {'lower': array([0.40847623]), 'upper': array([0.61669761])},
'rv': array([0.40029364]),
'rva': array([0.35077502]),
'input': {'cf_y': 0.03,
'cf_d': 0.03,
'rho': 1.0,
'level': 0.95,
'null_hypothesis': array([0.])}}
```

The bounds are saved as a nested dictionary, where the keys `'theta'`

denote the bounds on the parameter \(\hat{\theta}_{\pm}\), `'se'`

denotes the corresponding standard error and `'ci'`

denotes the lower and upper
confidence bounds for \(\hat{\theta}_{\pm}\). Each of the keys refers to a dictionary with keys `'lower'`

and `'upper'`

which refer to the lower or upper bound, e.g. `sensitivity_params['theta']['lower']`

refers to the lower bound \(\hat{\theta}_{-}\) of the estimated cofficient .

Further, the sensitivity analysis has an input parameter `theta`

(with default `theta=0.0`

), which refers to the null hypothesis used for each coefficient.
This null hypothesis is used to calculate the robustness values as displayed in the `sensitivity_params`

.

The robustness value $RV$ is defined as the required confounding strength (`cf_y=rv`

and `cf_d=rv`

), such that the lower or upper bound of the causal parameter includes the null hypothesis.
If the estimated parameter \(\hat{\theta}\) is larger than the null hypothesis the lower bound is used and vice versa.
The robustness value $RVa$ defined analogous, but additionally incorporates statistical uncertainty (as it is based on the confidence intervals of the bounds).

To obtain a more complete overview over the sensitivity one can call the `sensitivity_plot()`

method. The methods creates a contour plot, which calculates estimate of the upper or lower bound for \(\theta\)
(based on the null hypothesis) for each combination of `cf_y`

and `cf_d`

in a grid of values.

By adjusting the parameter `value='ci'`

in the `sensitivity_plot()`

method the bounds are displayed for the corresponding confidence level.

Note

The

`sensitivity_plot()`

requires to call`sensitivity_analysis`

first, since the choice of the bound (upper or lower) is based on the corresponding null hypothesis. Further, the parameters`rho`

and`level`

are used. Both are contained in the`sensitivity_params`

property.The

`sensitivity_plot()`

is created for the first treatment variable. This can be changed via the`idx_treatment`

parameter.The robustness values are given via the intersection countour of the null hypothesis and the identity.

### 10.1.3. Benchmarking#

The input parameters for the sensitivity analysis are quite hard to interpret (depending on the model). Consequently it is challenging to come up with reasonable bounds
for the confounding strength `cf_y`

and `cf_d`

(and `rho`

). To get a grasp on the magnitude of the bounds a popular approach is to rely on observed confounders
to obtain an informed guess on the strength of possibly unobserved confounders.

The underlying principle is relatively simple. If we have an observed confounder \(X_1\), we are able to emulate omitted confounding by purposely omitting
\(X_1\) and refitting the whole model. This enables us to compare the “long” and “short” form with and without omitted confounding.
Considering the `sensitivity_params`

of both models one can estimate the corresponding strength of confounding `cf_y`

and `cf_d`

(and `rho`

).

Note

The benchmarking can also be done with a set of benchmarking variables (e.g. \(X_1, X_2, X_3\)), which tries to emulate the effect of multiple unobserved confounders.

The approach is quite computationally demanding, as the short model that omits the benchmark variables has to be fitted.

The `sensitivity_benchmark()`

method implements this approach.
The method just requires a set of valid covariates, the `benchmarking_set`

, to compute the benchmark. The benchmark variables have to be a subset of the covariates used in the main analysis.

```
In [1]: dml_plr_obj.sensitivity_benchmark(benchmarking_set=["X1"])
Out[1]:
cf_y cf_d rho delta_theta
d 0.024364 0.455981 1.0 0.091406
```

The method returns a `pandas.DataFrame`

, containing the benchmarked values for `cf_y`

, `cf_d`

, `rho`

and the change in the estimates
`delta_theta`

.

Note

The benchmarking results should be used to get an idea of the magnitude/validity of proposed confounding strength of the omitted confounders. Whether these values are close to the real confounding, depends entirely on the setting and choice of the benchmarking variables. A good benchmarking set has a strong justification which refers to the omitted confounders.

If the benchmarking variables are only weak confounders, the estimates of

`rho`

can be slightly unstable (due to small denominators).

The implementation is based on Chernozhukov et al. (2022) Appendix D and corresponds to a generalization of the benchmarking process in the Sensemakr package for regression models to the use with double machine learning. For an introduction to Sensemakr see Cinelli and Hazlett (2020) and the Sensemakr introduction.

The benchmarked estimates are the following:

Let the subscript \(short\), denote the “short” form of the model, where the benchmarking variables are omitted.

\(\hat{\sigma}^2_{short}\) denotes the variance of the outcome regression in the “short” form.

\(\hat{\nu}^2_{short}\) denotes the second moment of the Riesz representer in the “short” form.

Both parameters are contained in the `sensitivity_params`

of the “short” form.
This enables the following estimation of the nonparametric \(R^2\)’s of the outcome regression

\(\hat{R}^2:= 1 - \frac{\hat{\sigma}^2}{\textrm{Var}(Y)}\)

\(\hat{R}^2_{short}:= 1 - \frac{\hat{\sigma}^2_{short}}{\textrm{Var}(Y)}\)

and the correlation ratio of the estimated Riesz representations

The benchmarked estimates are then defined as

`cf_y`

\(:=\frac{\hat{R}^2 - \hat{R}^2_{short}}{1 - \hat{R}^2}\) measures the proportion of residual variance in the outcome \(Y\) explained by adding the purposely omitted`benchmarking_set`

`cf_d`

\(:=\frac{1 - \hat{R}^2_{\alpha}}{\hat{R}^2_{\alpha}}\) measures the proportional gain in variation that the`benchmarking_set`

creates in the Riesz representer

Further, the degree of adversity \(\rho\) can be estimated via

For a more detailed description, see Chernozhukov et al. (2022) Appendix D.

Note

As benchmarking requires the estimation of a seperate model, the use with external predictions is generally not possible, without supplying further predictions.

## 10.2. Model-specific implementations#

This section contains the implementation details for each specific model and model specific interpretations.

### 10.2.1. Partially linear regression model (PLR)#

In the Partially linear regression model (PLR) the confounding strength `cf_d`

can be further be simplified to match the explanation of `cf_y`

.
Given the that the Riesz representer takes the following form

one can show that

Therefore,

`cf_y`

\(:=\frac{\mathbb{E}[(\tilde{g}(\tilde{W}) - g(W))^2]}{\mathbb{E}[(Y - g(W))^2]}\) measures the proportion of residual variance in the outcome \(Y\) explained by the latent confounders \(A\)`cf_d`

\(:=\frac{\mathbb{E}\big[\big(\mathbb{E}[D|X,A] - \mathbb{E}[D|X]\big)^2\big]}{\mathbb{E}\big[\big(D - \mathbb{E}[D|X]\big)^2\big]}\) measures the proportion of residual variance in the treatment \(D\) explained by the latent confounders \(A\)

Note

In the Partially linear regression model (PLR), both `cf_y`

and `cf_d`

can be interpreted as *nonparametric partial* \(R^2\)

`cf_y`

has the interpretation as the*nonparametric partial*\(R^2\)*of*\(A\)*with*\(Y\)*given*\((D,X)\)

`cf_d`

has the interpretation as the*nonparametric partial*\(R^2\)*of*\(A\)*with*\(D\)*given*\(X\)

Using the partially linear regression model with `score='partialling out'`

the `nuisance_elements`

are implemented in the following form

with scores

If `score='IV-type'`

the senstivity elements are instead set to

### 10.2.2. Interactive regression model (IRM)#

In the Binary Interactive Regression Model (IRM) the target parameter can be written as

where \(\omega(Y,D,X)\) are weights (e.g. set to \(1\) for the ATE). This implies the following representations

Note

In the Binary Interactive Regression Model (IRM) with for the ATE (weights equal to \(1\)), the form and interpretation of `cf_y`

is the same as in the Partially linear regression model (PLR).

`cf_y`

has the interpretation as the*nonparametric partial*\(R^2\)*of*\(A\)*with*\(Y\)*given*\((D,X)\)

`cf_d`

takes the following form

where the numerator measures the *gain in average conditional precision to predict* \(D\) *by using* \(A\) *in addition to* \(X\).
The denominator is the *average conditional precision to predict* \(D\) *by using* \(A\) *and* \(X\). Consequently `cf_d`

measures the *relative gain in average conditional precision*.

Remark that \(P(D=1|X,A)(1-P(D=1|X,A))\) denotes the variance of the conditional distribution of \(D\) given \((X,A)\), such that the inverse measures the precision of predicting \(D\) conditional on \((X,A)\).

Since \(C_D^2=\frac{cf_d}{1 - cf_d}\), this corresponds to

which has the same numerator but is instead relative to the *average conditional precision to predict* \(D\) *by using only* \(X\).

Including weights changes only the definition of `cf_d`

to

which has a interpretation as the *relative weighted gain in average conditional precision*.

The `nuisance_elements`

are then computed with plug-in versions according to the general Implementation.
For `score='ATE'`

, the weights are set to one

wheras for `score='ATTE'`

such that

### 10.2.3. Average Potential Outcomes (APOs)#

In the Binary Interactive Regression Model (IRM) the (weighted) average potential outcome for the treatment level \(d\) can be written as

where \(\omega(Y,D,X)\) are weights (e.g. set to \(1\) for the APO). This implies the following representations

Note

In the Binary Interactive Regression Model (IRM) the form and interpretation of `cf_y`

only depends on the conditional expectation \(\mathbb{E}[Y|D,X]\).

`cf_y`

has the interpretation as the*nonparametric partial*\(R^2\)*of*\(A\)*with*\(Y\)*given*\((D,X)\)

`cf_d`

takes the following form

where the numerator measures the *average change in inverse propensity weights for* \(D=d\) *conditional on* \(A\) *in addition to* \(X\).
The denominator is the *average inverse propensity weights for* \(D=d\) *conditional on* \(A\) *and* \(X\). Consequently `cf_d`

measures the *relative change in inverse propensity weights*.
Including weights changes only the definition of `cf_d`

to

which has a interpretation as the *relative weighted change in inverse propensity weights*.

The `nuisance_elements`

are then computed with plug-in versions according to the general Implementation.
The default weights are set to one

whereas

have to be supplied for weights which depend on \(Y\) or \(D\).

### 10.2.4. Difference-in-Differences for Panel Data#

In the Panel data with `score='observational'`

and `in_sample_normalization=True`

the score function implies the following representations

If instead `in_sample_normalization=False`

, the Riesz representer changes to

For `score='experimental'`

implies the score function implies the following representations

The `nuisance_elements`

are then computed with plug-in versions according to the general Implementation.

### 10.2.5. Difference-in-Differences for repeated cross-sections#

In the Repeated cross-sections with `score='observational'`

and `in_sample_normalization=True`

the score function implies the following representations

If instead `in_sample_normalization=False`

, the Riesz representer (after simplifications) changes to

For `score='experimental'`

and `in_sample_normalization=True`

implies the score function implies the following representations

And again, if instead `in_sample_normalization=False`

, the Riesz representer (after simplifications) changes to

The `nuisance_elements`

are then computed with plug-in versions according to the general Implementation.