# 8. Variance estimation and confidence intervals#

## 8.1. Variance estimation#

Under regularity conditions the estimator $$\tilde{\theta}_0$$ concentrates in a $$1/\sqrt{N}$$-neighborhood of $$\theta_0$$ and the sampling error $$\sqrt{N}(\tilde{\theta}_0 - \theta_0)$$ is approximately normal

$\sqrt{N}(\tilde{\theta}_0 - \theta_0) \leadsto N(o, \sigma^2),$

with mean zero and variance given by

$\sigma^2 := J_0^{-2} \mathbb{E}(\psi^2(W; \theta_0, \eta_0)),$

where $$J_0 = \mathbb{E}(\psi_a(W; \eta_0))$$, if the score function is linear in the parameter $$\theta$$. If the score is not linear in the parameter $$\theta$$, then $$J_0 = \partial_\theta\mathbb{E}(\psi(W; \theta, \eta_0)) \big|_{\theta=\theta_0}$$.

Estimates of the variance are obtained by

\begin{align}\begin{aligned}\hat{\sigma}^2 &= \hat{J}_0^{-2} \frac{1}{N} \sum_{k=1}^{K} \sum_{i \in I_k} \big[\psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k})\big]^2,\\\hat{J}_0 &= \frac{1}{N} \sum_{k=1}^{K} \sum_{i \in I_k} \psi_a(W_i; \hat{\eta}_{0,k}),\end{aligned}\end{align}

for score functions being linear in the parameter $$\theta$$. For non-linear score functions, the implementation assumes that derivatives and expectations are interchangeable, so that

$\hat{J}_0 = \frac{1}{N} \sum_{k=1}^{K} \sum_{i \in I_k} \partial_\theta \psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k}).$

An approximate confidence interval is given by

$\big[\tilde{\theta}_0 \pm \Phi^{-1}(1 - \alpha/2) \hat{\sigma} / \sqrt{N}].$

As an example we consider a partially linear regression model (PLR) implemented in DoubleMLPLR.

In : import doubleml as dml

In : from doubleml.datasets import make_plr_CCDDHNR2018

In : from sklearn.ensemble import RandomForestRegressor

In : from sklearn.base import clone

In : np.random.seed(3141)

In : learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In : ml_l = clone(learner)

In : ml_m = clone(learner)

In : data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')

In : obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In : dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)

In : dml_plr_obj.fit();

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn") learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5) ml_l = learner$clone()
ml_m = learner$clone() set.seed(3141) obj_dml_data = make_plr_CCDDHNR2018(alpha=0.5) dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
dml_plr_obj$fit()  The fit() method of DoubleMLPLR stores the estimate $$\tilde{\theta}_0$$ in its coef attribute. In : print(dml_plr_obj.coef) [0.48011605]  print(dml_plr_obj$coef)

        d
0.5443965


The asymptotic standard error $$\hat{\sigma}/\sqrt{N}$$ is stored in its se attribute.

In : print(dml_plr_obj.se)
[0.04051489]

print(dml_plr_obj$se)   d 0.04512331  Additionally, the value of the $$t$$-statistic and the corresponding p-value are provided in the attributes t_stat and pval. In : print(dml_plr_obj.t_stat) [11.85036084] In : print(dml_plr_obj.pval) [2.14266176e-32]  print(dml_plr_obj$t_stat)
print(dml_plr_obj$pval)   d 12.06464   d 1.623681e-33  Note • In Python, an overview of all these estimates, together with a 95 % confidence interval is stored in the attribute summary. • In R, a summary can be obtained by using the method summary(). The confint() method performs estimation of confidence intervals. In : print(dml_plr_obj.summary) coef std err t P>|t| 2.5 % 97.5 % d 0.480116 0.040515 11.850361 2.142662e-32 0.400708 0.559524  dml_plr_obj$summary()
dml_plr_objconfint()  Estimates and significance testing of the effect of target variables Estimate. Std. Error t value Pr(>|t|) d 0.54440 0.04512 12.06 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1  A matrix: 1 × 2 of type dbl 2.5 %97.5 % d0.45595650.6328366 A more detailed overview of the fitted model, its specifications and the summary can be obtained via the string-representation of the object. In : print(dml_plr_obj) ================== DoubleMLPLR Object ================== ------------------ Data summary ------------------ Outcome variable: y Treatment variable(s): ['d'] Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20'] Instrument variable(s): None No. Observations: 500 ------------------ Score & algorithm ------------------ Score function: partialling out DML algorithm: dml2 ------------------ Machine learner ------------------ Learner ml_l: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2) Learner ml_m: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2) Out-of-sample Performance: Learner ml_l RMSE: [[1.12659372]] Learner ml_m RMSE: [[1.04007791]] ------------------ Resampling ------------------ No. folds: 5 No. repeated sample splits: 1 Apply cross-fitting: True ------------------ Fit summary ------------------ coef std err t P>|t| 2.5 % 97.5 % d 0.480116 0.040515 11.850361 2.142662e-32 0.400708 0.559524  print(dml_plr_obj)  ================= DoubleMLPLR Object ================== ------------------ Data summary ------------------ Outcome variable: y Treatment variable(s): d Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20 Instrument(s): No. Observations: 500 ------------------ Score & algorithm ------------------ Score function: partialling out DML algorithm: dml2 ------------------ Machine learner ------------------ ml_l: regr.ranger ml_m: regr.ranger ------------------ Resampling ------------------ No. folds: 5 No. repeated sample splits: 1 Apply cross-fitting: TRUE ------------------ Fit summary ------------------ Estimates and significance testing of the effect of target variables Estimate. Std. Error t value Pr(>|t|) d 0.54440 0.04512 12.06 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1  ## 8.2. Confidence bands and multiplier bootstrap for valid simultaneous inference# DoubleML provides methods to perform valid simultaneous inference for multiple treatment variables. As an example, consider a PLR with $$p_1$$ causal parameters of interest $$\theta_{0,1}, \ldots, \theta_{0,p_1}$$ associated with treatment variables $$D_1, \ldots, D_{p_1}$$. Inference on multiple target coefficients can be performed by iteratively applying the DML inference procedure over the target variables of interests: Each of the coefficients of interest, $$\theta_{0,j}$$, with $$j \in \lbrace 1, \ldots, p_1 \rbrace$$, solves a corresponding moment condition $\mathbb{E}[ \psi_j(W; \theta_{0,j}, \eta_{0,j})] = 0.$ Analogously to the case with a single parameter of interest, the PLR model with multiple treatment variables includes two regression steps to achieve orthogonality. First, the main regression is given by $Y = D_j \theta_{0,j} + g_{0,j}([D_k, X]) + \zeta_j, \quad \mathbb{E}(\zeta_j | D, X) = 0,$ with $$[D_k, X]$$ being a matrix comprising the confounders, $$X$$, and all remaining treatment variables $$D_k$$ with $$k \in \lbrace 1, \ldots, p_1\rbrace \setminus j$$, by default. Second, the relationship between the treatment variable $$D_j$$ and the remaining explanatory variables is determined by the equation $D_j = m_{0,j}([D_k, X]) + V_j, \quad \mathbb{E}(V_j | D_k, X) = 0,$ For further details, we refer to Belloni et al. (2018). Simultaneous inference can be based on a multiplier bootstrap procedure introduced in Chernozhukov et al. (2013, 2014). Alternatively, traditional correction approaches, for example the Bonferroni correction, can be used to adjust p-values. The bootstrap() method provides an implementation of a multiplier bootstrap for double machine learning models. For $$b=1, \ldots, B$$ weights $$\xi_{i, b}$$ are generated according to a normal (Gaussian) bootstrap, wild bootstrap or exponential bootstrap. The number of bootstrap samples is provided as input n_rep_boot and for method one can choose 'Bayes', 'normal' or 'wild'. Based on the estimates of the standard errors $$\hat{\sigma}_j$$ and $$\hat{J}_{0,j} = \mathbb{E}_N(\psi_{a,j}(W; \eta_{0,j}))$$ that are obtained from DML, we construct bootstrap coefficients $$\theta^{*,b}_j$$ and bootstrap t-statistics $$t^{*,b}_j$$ for $$j=1, \ldots, p_1$$ \begin{align}\begin{aligned}\theta^{*,b}_{j} &= \frac{1}{\sqrt{N} \hat{J}_{0,j}}\sum_{k=1}^{K} \sum_{i \in I_k} \xi_{i}^b \cdot \psi_j(W_i; \tilde{\theta}_{0,j}, \hat{\eta}_{0,j;k}),\\t^{*,b}_{j} &= \frac{1}{\sqrt{N} \hat{J}_{0,j} \hat{\sigma}_{j}} \sum_{k=1}^{K} \sum_{i \in I_k} \xi_{i}^b \cdot \psi_j(W_i; \tilde{\theta}_{0,j}, \hat{\eta}_{0,j;k}).\end{aligned}\end{align} The output of the multiplier bootstrap can be used to determine the constant, $$c_{1-\alpha}$$ that is required for the construction of a simultaneous $$(1-\alpha)$$ confidence band $\left[\tilde\theta_{0,j} \pm c_{1-\alpha} \cdot \hat\sigma_j/\sqrt{N} \right].$ To demonstrate the bootstrap, we simulate data from a sparse partially linear regression model. Then we estimate the PLR model and perform the multiplier bootstrap. Joint confidence intervals based on the multiplier bootstrap are then obtained by setting the option joint when calling the method confint. Moreover, a multiple hypotheses testing adjustment of p-values from a high-dimensional model can be obtained with the method p_adjust. DoubleML performs a version of the Romano-Wolf stepdown adjustment, which is based on the multiplier bootstrap, by default. Alternatively, p_adjust allows users to apply traditional corrections via the option method. In : import doubleml as dml In : import numpy as np In : from sklearn.base import clone In : from sklearn.linear_model import LassoCV # Simulate data In : np.random.seed(1234) In : n_obs = 500 In : n_vars = 100 In : X = np.random.normal(size=(n_obs, n_vars)) In : theta = np.array([3., 3., 3.]) In : y = np.dot(X[:, :3], theta) + np.random.standard_normal(size=(n_obs,)) In : dml_data = dml.DoubleMLData.from_arrays(X[:, 10:], y, X[:, :10]) In : learner = LassoCV() In : ml_l = clone(learner) In : ml_m = clone(learner) In : dml_plr = dml.DoubleMLPLR(dml_data, ml_l, ml_m) In : print(dml_plr.fit().bootstrap().confint(joint=True)) 2.5 % 97.5 % d1 2.813342 3.055680 d2 2.815224 3.083258 d3 2.860663 3.109069 d4 -0.141546 0.091391 d5 -0.060845 0.176929 d6 -0.158697 0.078474 d7 -0.172022 0.062964 d8 -0.067721 0.174499 d9 -0.092365 0.139491 d10 -0.110717 0.138698 In : print(dml_plr.p_adjust()) coef pval d1 2.934511 0.000 d2 2.949241 0.000 d3 2.984866 0.000 d4 -0.025077 0.850 d5 0.058042 0.492 d6 -0.040112 0.850 d7 -0.054529 0.850 d8 0.053389 0.514 d9 0.023563 0.802 d10 0.013990 0.850 In : print(dml_plr.p_adjust(method='bonferroni')) coef pval d1 2.934511 0.0 d2 2.949241 0.0 d3 2.984866 0.0 d4 -0.025077 1.0 d5 0.058042 1.0 d6 -0.040112 1.0 d7 -0.054529 1.0 d8 0.053389 1.0 d9 0.023563 1.0 d10 0.013990 1.0  library(DoubleML) library(mlr3) library(mlr3learners) library(data.table) lgr::get_logger("mlr3")set_threshold("warn")

set.seed(3141)
n_obs = 500
n_vars = 100
theta = rep(3, 3)
X = matrix(stats::rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars)
y = X[, 1:3, drop = FALSE] %*% theta  + stats::rnorm(n_obs)
dml_data = double_ml_data_from_matrix(X = X[, 11:n_vars], y = y, d = X[,1:10])

learner = lrn("regr.cv_glmnet", s="lambda.min")
ml_l = learner$clone() ml_m = learner$clone()
dml_plr = DoubleMLPLR$new(dml_data, ml_l, ml_m) dml_plr$fit()
dml_plr$bootstrap() dml_plr$confint(joint=TRUE)
dml_plr$p_adjust() dml_plr$p_adjust(method="bonferroni")

A matrix: 10 × 2 of type dbl
2.5 %97.5 %
d1 2.890273683.14532650
d2 2.907944783.14368145
d3 2.874303353.12752825
d4-0.147909240.07828372
d5-0.097796750.16803512
d6-0.121054720.12539340
d7-0.165362990.09310496
d8-0.101279300.14200098
d9-0.138682380.09980311
d10-0.044449780.19680840
A matrix: 10 × 2 of type dbl
Estimate.pval
d1 3.0178000920.000
d2 3.0258131140.000
d3 3.0009157990.000
d4-0.0348127630.938
d5 0.0351191850.938
d6 0.0021693380.958
d7-0.0361290150.938
d8 0.0203608380.954
d9-0.0194396330.954
d10 0.0761793120.428
A matrix: 10 × 2 of type dbl
Estimate.pval
d1 3.0178000920.0000000
d2 3.0258131140.0000000
d3 3.0009157990.0000000
d4-0.0348127631.0000000
d5 0.0351191851.0000000
d6 0.0021693381.0000000
d7-0.0361290151.0000000
d8 0.0203608381.0000000
d9-0.0194396331.0000000
d10 0.0761793120.8116912