6. Double machine learning algorithms#
The DoubleML package comes with two different algorithms to obtain DML estimates.
Note
The algorithm argument dml_procedure
is deprecated in the Python package. Generally, the second version of the algorithm, DML2, is recommended to obtain
more stable estimates.
6.1. Algorithm DML1#
The algorithm dml_procedure='dml1'
can be summarized as
Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data \((W_i)_{i=1}^{N}\), a Neyman-orthogonal score function \(\psi(W; \theta, \eta)\) and specify machine learning method(s) for the nuisance function(s) \(\eta\).
Train ML predictors on folds: Take a \(K\)-fold random partition \((I_k)_{k=1}^{K}\) of observation indices \([N] = \lbrace 1, \ldots, N\rbrace\) such that the size of each fold \(I_k\) is \(n=N/K\). For each \(k \in [K] = \lbrace 1, \ldots, K\rbrace\), construct a high-quality machine learning estimator
\[\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)\]of \(\eta_0\), where \(x \mapsto \hat{\eta}_{0,k}(x)\) depends only on the subset of data \((W_i)_{i\not\in I_k}\).
Estimate causal parameter: For each \(k \in [K]\), construct the estimator \(\check{\theta}_{0,k}\) as the solution to the equation
\[\frac{1}{n} \sum_{i \in I_k} \psi(W_i; \check{\theta}_{0,k}, \hat{\eta}_{0,k}) = 0.\]The estimate of the causal parameter is obtained via aggregation
\[\tilde{\theta}_0 = \frac{1}{K} \sum_{k=1}^{K} \check{\theta}_{0,k}.\]Outputs: The estimate of the causal parameter \(\tilde{\theta}_0\) as well as the values of the evaluated score function are returned.
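For scores that are linear in \(\theta\), such as the partialling-out score of the PLR model, each fold-wise equation in step 3 has a closed-form solution. The following numpy sketch illustrates the DML1 estimation step on synthetic residuals; the variables u and v (standing in for \(Y - \ell(X)\) and \(D - m(X)\)) and the fold construction are illustrative, not part of the DoubleML API.

```python
import numpy as np

rng = np.random.default_rng(3141)
N, K = 500, 5
theta_true = 0.5

# Synthetic "residuals": v plays the role of D - m(X), u of Y - l(X).
v = rng.normal(size=N)
u = theta_true * v + rng.normal(size=N)

# K-fold random partition of the observation indices.
folds = np.array_split(rng.permutation(N), K)

# DML1: solve 1/n * sum_{i in I_k} psi(W_i; theta, eta_k) = 0 on each fold.
# For the linear PLR score, theta_k = sum(u*v) / sum(v*v) on fold I_k.
theta_k = np.array([np.sum(u[idx] * v[idx]) / np.sum(v[idx] ** 2)
                    for idx in folds])

# Aggregate the fold-wise estimates by averaging.
theta_dml1 = theta_k.mean()
print(theta_dml1)
```

In a real fit, the nuisance estimates \(\hat{\eta}_{0,k}\) differ across folds; a single set of residuals keeps the sketch short.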
6.2. Algorithm DML2#
The algorithm dml_procedure='dml2'
can be summarized as
Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data \((W_i)_{i=1}^{N}\), a Neyman-orthogonal score function \(\psi(W; \theta, \eta)\) and specify machine learning method(s) for the nuisance function(s) \(\eta\).
Train ML predictors on folds: Take a \(K\)-fold random partition \((I_k)_{k=1}^{K}\) of observation indices \([N] = \lbrace 1, \ldots, N\rbrace\) such that the size of each fold \(I_k\) is \(n=N/K\). For each \(k \in [K] = \lbrace 1, \ldots, K\rbrace\), construct a high-quality machine learning estimator
\[\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)\]of \(\eta_0\), where \(x \mapsto \hat{\eta}_{0,k}(x)\) depends only on the subset of data \((W_i)_{i\not\in I_k}\).
Estimate causal parameter: Construct the estimator for the causal parameter \(\tilde{\theta}_0\) as the solution to the equation
\[\frac{1}{N} \sum_{k=1}^{K} \sum_{i \in I_k} \psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k}) = 0.\]Outputs: The estimate of the causal parameter \(\tilde{\theta}_0\) as well as the values of the evaluated score function are returned.
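For a score linear in \(\theta\), the pooled DML2 equation also has a closed form: all fold contributions are summed and a single equation is solved. A self-contained numpy sketch, with synthetic residuals standing in for the cross-fitted nuisance predictions (in a real fit the predictions differ across folds; here one set of residuals keeps the sketch short):

```python
import numpy as np

rng = np.random.default_rng(3141)
N = 500
# Synthetic residuals: v ~ D - m(X), u ~ Y - l(X), with true theta = 0.5.
v = rng.normal(size=N)
u = 0.5 * v + rng.normal(size=N)

# DML2: solve 1/N * sum_k sum_{i in I_k} psi(W_i; theta, eta_k) = 0 once,
# pooling the score contributions from all folds. For the linear PLR score:
theta_dml2 = np.sum(u * v) / np.sum(v ** 2)
print(theta_dml2)
```

Unlike DML1, no fold-wise estimates are averaged; the pooling over folds happens inside the single score equation.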
6.3. Implementation of the double machine learning algorithms#
As an example, we consider a partially linear regression model (PLR)
implemented in DoubleMLPLR
.
The default version of the DoubleML
class is based on the DML2 algorithm.
In [1]: import doubleml as dml
In [2]: from doubleml.datasets import make_plr_CCDDHNR2018
In [3]: from sklearn.ensemble import RandomForestRegressor
In [4]: from sklearn.base import clone
In [5]: import numpy as np; np.random.seed(3141)
In [6]: learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
In [7]: ml_l = clone(learner)
In [8]: ml_m = clone(learner)
In [9]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')
In [10]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
In [11]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
In [12]: dml_plr_obj.fit();
The DML algorithm can be selected via the parameter dml_procedure='dml1'
vs. dml_procedure='dml2'
.
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn")
learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_l = learner$clone()
ml_m = learner$clone()
set.seed(3141)
data = make_plr_CCDDHNR2018(alpha=0.5, return_type='data.table')
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, dml_procedure="dml1")
dml_plr_obj$fit()
The fit()
method of DoubleMLPLR
stores the estimate \(\tilde{\theta}_0\) in its coef
attribute.
In [13]: dml_plr_obj.coef
Out[13]: array([0.48069071])
dml_plr_obj$coef
Let \(k(i) = \lbrace k: i \in I_k \rbrace\).
The values of the score function \((\psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k(i)}))_{i \in [N]}\)
are stored in the attribute psi
.
In [14]: dml_plr_obj.psi[:5]
Out[14]:
array([[[ 0.02052929]],
[[-0.00409412]],
[[ 0.00138944]],
[[-0.11208236]],
[[-0.29678199]]])
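Because \(\tilde{\theta}_0\) solves the pooled score equation, the stored score values average to (numerically) zero. The following numpy sketch reproduces this property for a linear score on synthetic residuals; it is an illustration, not a call against the fitted object above.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=200)             # stands in for D - m(X)
u = 0.5 * v + rng.normal(size=200)   # stands in for Y - l(X)

theta = np.sum(u * v) / np.sum(v ** 2)   # solves the pooled score equation
psi = (u - theta * v) * v                # evaluated score values
print(abs(psi.mean()))                   # numerically zero
```

For a fitted object under DML2, the mean of the values stored in the psi attribute is likewise numerically zero.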
For the DML1 algorithm, the estimates for the different folds
\(\check{\theta}_{0,k}\), \(k \in [K]\), are stored in the attribute all_dml1_coef
.
dml_plr_obj$psi[1:5, ,1]
[1]  0.00950122695463054  0.751712655588833  0.00888458890362062 -0.403626490670169  0.866179899731091
dml_plr_obj$all_dml1_coef
[1] 0.705595810371231 0.5115547181877 0.465965114589023 0.49231564722955 0.541684435562712
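Under DML1, the final coefficient is the average of these fold-wise estimates (step 3 of the algorithm). Averaging the five values printed above recovers it, as a quick numpy check shows:

```python
import numpy as np

# Fold-wise DML1 estimates as printed above.
theta_k = np.array([0.705595810371231, 0.5115547181877,
                    0.465965114589023, 0.49231564722955,
                    0.541684435562712])
theta_dml1 = theta_k.mean()
print(round(theta_dml1, 4))  # -> 0.5434
```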