8. Variance estimation and confidence intervals#
8.1. Variance estimation#
Under regularity conditions the estimator \(\tilde{\theta}_0\) concentrates in a \(1/\sqrt{N}\)-neighborhood of \(\theta_0\) and the sampling error \(\sqrt{N}(\tilde{\theta}_0 - \theta_0)\) is approximately normal
with mean zero and variance given by
where \(J_0 = \mathbb{E}(\psi_a(W; \eta_0))\), if the score function is linear in the parameter \(\theta\). If the score is not linear in the parameter \(\theta\), then \(J_0 = \partial_\theta\mathbb{E}(\psi(W; \theta, \eta_0)) \big|_{\theta=\theta_0}\).
Estimates of the variance are obtained by
for score functions being linear in the parameter \(\theta\). For non-linear score functions, the implementation assumes that derivatives and expectations are interchangeable, so that
An approximate confidence interval is given by
As an example we consider a partially linear regression model (PLR)
implemented in DoubleMLPLR
.
In [1]: import doubleml as dml
In [2]: from doubleml.datasets import make_plr_CCDDHNR2018
In [3]: from sklearn.ensemble import RandomForestRegressor
In [4]: from sklearn.base import clone
In [5]: np.random.seed(3141)
In [6]: learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
In [7]: ml_l = clone(learner)
In [8]: ml_m = clone(learner)
In [9]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')
In [10]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
In [11]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
In [12]: dml_plr_obj.fit();
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn")
learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_l = learner$clone()
ml_m = learner$clone()
set.seed(3141)
obj_dml_data = make_plr_CCDDHNR2018(alpha=0.5)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
dml_plr_obj$fit()
The fit()
method of DoubleMLPLR
stores the estimate \(\tilde{\theta}_0\) in its coef
attribute.
In [13]: print(dml_plr_obj.coef)
[0.48069071]
print(dml_plr_obj$coef)
d
0.5448331
The asymptotic standard error \(\hat{\sigma}/\sqrt{N}\) is stored in its se
attribute.
In [14]: print(dml_plr_obj.se)
[0.04053339]
print(dml_plr_obj$se)
d
0.04501612
Additionally, the value of the \(t\)-statistic and the corresponding p-value are provided in the attributes
t_stat
and pval
.
In [15]: print(dml_plr_obj.t_stat)
[11.85912862]
In [16]: print(dml_plr_obj.pval)
[1.92972925e-32]
print(dml_plr_obj$t_stat)
print(dml_plr_obj$pval)
d
12.10307
d
1.017393e-33
Note
In Python, an overview of all these estimates, together with a 95 % confidence interval is stored in the attribute
summary
.In R, a summary can be obtained by using the method
summary()
. Theconfint()
method performs estimation of confidence intervals.
In [17]: print(dml_plr_obj.summary)
coef std err t P>|t| 2.5 % 97.5 %
d 0.480691 0.040533 11.859129 1.929729e-32 0.401247 0.560135
dml_plr_obj$summary()
dml_plr_obj$confint()
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 0.54483 0.04502 12.1 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2.5 % | 97.5 % | |
---|---|---|
d | 0.4566031 | 0.6330631 |
A more detailed overview of the fitted model, its specifications and the summary can be obtained via the string-representation of the object.
In [18]: print(dml_plr_obj)
================== DoubleMLPLR Object ==================
------------------ Data summary ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500
------------------ Score & algorithm ------------------
Score function: partialling out
------------------ Machine learner ------------------
Learner ml_l: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Learner ml_m: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Out-of-sample Performance:
Regression:
Learner ml_l RMSE: [[1.12705095]]
Learner ml_m RMSE: [[1.03917696]]
------------------ Resampling ------------------
No. folds: 5
No. repeated sample splits: 1
------------------ Fit summary ------------------
coef std err t P>|t| 2.5 % 97.5 %
d 0.480691 0.040533 11.859129 1.929729e-32 0.401247 0.560135
print(dml_plr_obj)
================= DoubleMLPLR Object ==================
------------------ Data summary ------------------
Outcome variable: y
Treatment variable(s): d
Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20
Instrument(s):
No. Observations: 500
------------------ Score & algorithm ------------------
Score function: partialling out
DML algorithm: dml2
------------------ Machine learner ------------------
ml_l: regr.ranger
ml_m: regr.ranger
------------------ Resampling ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: TRUE
------------------ Fit summary ------------------
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 0.54483 0.04502 12.1 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
8.2. Confidence bands and multiplier bootstrap for valid simultaneous inference#
DoubleML provides methods to perform valid simultaneous inference for multiple treatment variables. As an example, consider a PLR with \(p_1\) causal parameters of interest \(\theta_{0,1}, \ldots, \theta_{0,p_1}\) associated with treatment variables \(D_1, \ldots, D_{p_1}\). Inference on multiple target coefficients can be performed by iteratively applying the DML inference procedure over the target variables of interests: Each of the coefficients of interest, \(\theta_{0,j}\), with \(j \in \lbrace 1, \ldots, p_1 \rbrace\), solves a corresponding moment condition
Analogously to the case with a single parameter of interest, the PLR model with multiple treatment variables includes two regression steps to achieve orthogonality. First, the main regression is given by
with \([D_k, X]\) being a matrix comprising the confounders, \(X\), and all remaining treatment variables \(D_k\) with \(k \in \lbrace 1, \ldots, p_1\rbrace \setminus j\), by default. Second, the relationship between the treatment variable \(D_j\) and the remaining explanatory variables is determined by the equation
For further details, we refer to Belloni et al. (2018). Simultaneous inference can be based on a multiplier bootstrap procedure introduced in Chernozhukov et al. (2013, 2014). Alternatively, traditional correction approaches, for example the Bonferroni correction, can be used to adjust p-values.
The bootstrap()
method provides an implementation of a multiplier bootstrap for double machine learning models.
For \(b=1, \ldots, B\) weights \(\xi_{i, b}\) are generated according to a normal (Gaussian) bootstrap, wild
bootstrap or exponential bootstrap.
The number of bootstrap samples is provided as input n_rep_boot
and for method
one can choose 'Bayes'
,
'normal'
or 'wild'
.
Based on the estimates of the standard errors \(\hat{\sigma}_j\)
and \(\hat{J}_{0,j} = \mathbb{E}_N(\psi_{a,j}(W; \eta_{0,j}))\)
that are obtained from DML, we construct bootstraped t-statistics \(t^{*,b}_j\)
for \(j=1, \ldots, p_1\)
The output of the multiplier bootstrap can be used to determine the constant, \(c_{1-\alpha}\) that is required for the construction of a simultaneous \((1-\alpha)\) confidence band
To demonstrate the bootstrap, we simulate data from a sparse partially linear regression model.
Then we estimate the PLR model and perform the multiplier bootstrap.
Joint confidence intervals based on the multiplier bootstrap are then obtained by setting the option joint
when calling the method confint
.
Moreover, a multiple hypotheses testing adjustment of p-values from a high-dimensional model can be obtained with
the method p_adjust
. DoubleML performs a version of the Romano-Wolf stepdown adjustment,
which is based on the multiplier bootstrap, by default. Alternatively, p_adjust
allows users to apply traditional corrections
via the option method
.
In [19]: import doubleml as dml
In [20]: import numpy as np
In [21]: from sklearn.base import clone
In [22]: from sklearn.linear_model import LassoCV
# Simulate data
In [23]: np.random.seed(1234)
In [24]: n_obs = 500
In [25]: n_vars = 100
In [26]: X = np.random.normal(size=(n_obs, n_vars))
In [27]: theta = np.array([3., 3., 3.])
In [28]: y = np.dot(X[:, :3], theta) + np.random.standard_normal(size=(n_obs,))
In [29]: dml_data = dml.DoubleMLData.from_arrays(X[:, 10:], y, X[:, :10])
In [30]: learner = LassoCV()
In [31]: ml_l = clone(learner)
In [32]: ml_m = clone(learner)
In [33]: dml_plr = dml.DoubleMLPLR(dml_data, ml_l, ml_m)
In [34]: print(dml_plr.fit().bootstrap().confint(joint=True))
2.5 % 97.5 %
d1 2.813342 3.055680
d2 2.815224 3.083258
d3 2.860663 3.109069
d4 -0.141546 0.091391
d5 -0.060845 0.176929
d6 -0.158697 0.078474
d7 -0.172022 0.062964
d8 -0.067721 0.174499
d9 -0.092365 0.139491
d10 -0.110717 0.138698
In [35]: print(dml_plr.p_adjust())
thetas pval
d1 2.934511 0.000
d2 2.949241 0.000
d3 2.984866 0.000
d4 -0.025077 0.902
d5 0.058042 0.784
d6 -0.040112 0.808
d7 -0.054529 0.784
d8 0.053389 0.784
d9 0.023563 0.902
d10 0.013990 0.902
In [36]: print(dml_plr.p_adjust(method='bonferroni'))
thetas pval
d1 2.934511 0.0
d2 2.949241 0.0
d3 2.984866 0.0
d4 -0.025077 1.0
d5 0.058042 1.0
d6 -0.040112 1.0
d7 -0.054529 1.0
d8 0.053389 1.0
d9 0.023563 1.0
d10 0.013990 1.0
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn")
set.seed(3141)
n_obs = 500
n_vars = 100
theta = rep(3, 3)
X = matrix(stats::rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars)
y = X[, 1:3, drop = FALSE] %*% theta + stats::rnorm(n_obs)
dml_data = double_ml_data_from_matrix(X = X[, 11:n_vars], y = y, d = X[,1:10])
learner = lrn("regr.cv_glmnet", s="lambda.min")
ml_l = learner$clone()
ml_m = learner$clone()
dml_plr = DoubleMLPLR$new(dml_data, ml_l, ml_m)
dml_plr$fit()
dml_plr$bootstrap()
dml_plr$confint(joint=TRUE)
dml_plr$p_adjust()
dml_plr$p_adjust(method="bonferroni")
2.5 % | 97.5 % | |
---|---|---|
d1 | 2.89027368 | 3.14532650 |
d2 | 2.90794478 | 3.14368145 |
d3 | 2.87430335 | 3.12752825 |
d4 | -0.14790924 | 0.07828372 |
d5 | -0.09779675 | 0.16803512 |
d6 | -0.12105472 | 0.12539340 |
d7 | -0.16536299 | 0.09310496 |
d8 | -0.10127930 | 0.14200098 |
d9 | -0.13868238 | 0.09980311 |
d10 | -0.04444978 | 0.19680840 |
Estimate. | pval | |
---|---|---|
d1 | 3.017800092 | 0.000 |
d2 | 3.025813114 | 0.000 |
d3 | 3.000915799 | 0.000 |
d4 | -0.034812763 | 0.938 |
d5 | 0.035119185 | 0.938 |
d6 | 0.002169338 | 0.958 |
d7 | -0.036129015 | 0.938 |
d8 | 0.020360838 | 0.954 |
d9 | -0.019439633 | 0.954 |
d10 | 0.076179312 | 0.428 |
Estimate. | pval | |
---|---|---|
d1 | 3.017800092 | 0.0000000 |
d2 | 3.025813114 | 0.0000000 |
d3 | 3.000915799 | 0.0000000 |
d4 | -0.034812763 | 1.0000000 |
d5 | 0.035119185 | 1.0000000 |
d6 | 0.002169338 | 1.0000000 |
d7 | -0.036129015 | 1.0000000 |
d8 | 0.020360838 | 1.0000000 |
d9 | -0.019439633 | 1.0000000 |
d10 | 0.076179312 | 0.8116912 |
8.3. Simultaneous inference over different DoubleML models (advanced)#
The DoubleML package provides a method to perform valid simultaneous inference over different DoubleML models.
Note
Remark that the confidence intervals will generally only be valid if the stronger (uniform) assumptions on e.g. nuisance estimates are satisfied. Further, the models should be estimated on the same data set.
The doubleml.DoubleML
class contains a framework
attribute which stores a doubleml.DoubleMLFramework
object. This
object contains a scaled version of the score function
which is used to construct confidence intervals. The framework objects can be concatenated using the
doubleml.concat()
function.
In [37]: import doubleml as dml
In [38]: import numpy as np
In [39]: from sklearn.base import clone
In [40]: from sklearn.linear_model import LassoCV
In [41]: from sklearn.ensemble import RandomForestRegressor
In [42]: import doubleml as dml
# Simulate data
In [43]: np.random.seed(1234)
In [44]: n_obs = 500
In [45]: n_vars = 100
In [46]: X = np.random.normal(size=(n_obs, n_vars))
In [47]: theta = np.array([3., 3., 3.])
In [48]: y = np.dot(X[:, :3], theta) + np.random.standard_normal(size=(n_obs,))
In [49]: dml_data = dml.DoubleMLData.from_arrays(X[:, 10:], y, X[:, :10])
In [50]: learner = LassoCV()
In [51]: dml_plr_1 = dml.DoubleMLPLR(dml_data, clone(learner), clone(learner))
In [52]: learner_rf = RandomForestRegressor()
In [53]: dml_plr_2 = dml.DoubleMLPLR(dml_data, clone(learner_rf), clone(learner_rf))
In [54]: dml_plr_1.fit()
Out[54]: <doubleml.plm.plr.DoubleMLPLR at 0x7f5a873f25d0>
In [55]: dml_plr_2.fit()
Out[55]: <doubleml.plm.plr.DoubleMLPLR at 0x7f5a873f2de0>
In [56]: dml_combined = dml.concat([dml_plr_1.framework, dml_plr_2.framework])
In [57]: dml_combined.bootstrap().confint(joint=True)
Out[57]:
2.5 % 97.5 %
0 2.797737 3.071285
1 2.797965 3.100517
2 2.844667 3.125065
3 -0.156545 0.106391
4 -0.076156 0.192240
5 -0.173969 0.093746
6 -0.187153 0.078096
7 -0.083318 0.190096
8 -0.107295 0.154421
9 -0.126777 0.154758
10 2.634577 3.054348
11 2.676534 3.178704
12 2.771157 3.215967
13 -0.360065 0.157091
14 -0.167993 0.358799
15 -0.402113 0.199458
16 -0.313056 0.238225
17 -0.266922 0.318584
18 -0.181446 0.385240
19 -0.280514 0.321686
Frameworks can also be added or subtracted from each other. Of course, this changes the estimated parameter and should be used with caution.
In [58]: import doubleml as dml
In [59]: import numpy as np
In [60]: from sklearn.base import clone
In [61]: from sklearn.linear_model import LassoCV
In [62]: from sklearn.ensemble import RandomForestRegressor
In [63]: import doubleml as dml
# Simulate data
In [64]: np.random.seed(1234)
In [65]: n_obs = 500
In [66]: n_vars = 100
In [67]: X = np.random.normal(size=(n_obs, n_vars))
In [68]: theta = np.array([3., 3., 3.])
In [69]: y = np.dot(X[:, :3], theta) + np.random.standard_normal(size=(n_obs,))
In [70]: dml_data = dml.DoubleMLData.from_arrays(X[:, 10:], y, X[:, :10])
In [71]: learner = LassoCV()
In [72]: dml_plr_1 = dml.DoubleMLPLR(dml_data, clone(learner), clone(learner))
In [73]: learner_rf = RandomForestRegressor()
In [74]: dml_plr_2 = dml.DoubleMLPLR(dml_data, clone(learner_rf), clone(learner_rf))
In [75]: dml_plr_1.fit()
Out[75]: <doubleml.plm.plr.DoubleMLPLR at 0x7f5a87352840>
In [76]: dml_plr_2.fit()
Out[76]: <doubleml.plm.plr.DoubleMLPLR at 0x7f5a87352300>
In [77]: dml_combined = dml_plr_1.framework - dml_plr_2.framework
In [78]: dml_combined.bootstrap().confint(joint=True)
Out[78]:
2.5 % 97.5 %
0 -0.074304 0.254400
1 -0.145748 0.188991
2 -0.164034 0.146641
3 -0.122777 0.275596
4 -0.242815 0.168092
5 -0.178934 0.301366
6 -0.242139 0.207912
7 -0.197484 0.252601
8 -0.292047 0.135379
9 -0.244622 0.231430
One possible use case is to substract the estimates from two average potential outcome models as e.g. in the DoubleMLQTE
example.
This also works for multiple repetitions if both models have the same number of repetitions, as each repetition is treated seperately.
8.4. References#
Belloni, A., Chernozhukov, V., Chetverikov, D., Wei, Y. (2018), Uniformly valid post-regularization confidence regions for many functional parameters in z-estimation framework. The Annals of Statistics, 46 (6B): 3643-75, doi: 10.1214/17-AOS1671.
Chernozhukov, V., Chetverikov, D., Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics 41 (6): 2786-2819, doi: 10.1214/13-AOS1161.
Chernozhukov, V., Chetverikov, D., Kato, K. (2014), Gaussian approximation of suprema of empirical processes. The Annals of Statistics 42 (4): 1564-97, doi: 10.1214/14-AOS1230.