6. Double machine learning algorithms

The DoubleML package comes with two different algorithms to obtain DML estimates.

Note

The argument dml_procedure is deprecated in the Python package. In general, the second version of the algorithm, DML2, is recommended because it yields more stable estimates.

6.1. Algorithm DML1

The DML1 algorithm (dml_procedure='dml1') can be summarized as follows:

  1. Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data \((W_i)_{i=1}^{N}\), a Neyman-orthogonal score function \(\psi(W; \theta, \eta)\) and specify machine learning method(s) for the nuisance function(s) \(\eta\).

  2. Train ML predictors on folds: Take a \(K\)-fold random partition \((I_k)_{k=1}^{K}\) of observation indices \([N] = \lbrace 1, \ldots, N\rbrace\) such that the size of each fold \(I_k\) is \(n=N/K\). For each \(k \in [K] = \lbrace 1, \ldots, K\rbrace\), construct a high-quality machine learning estimator

    \[\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)\]

    of \(\eta_0\), where \(x \mapsto \hat{\eta}_{0,k}(x)\) depends only on the subset of data \((W_i)_{i\not\in I_k}\).

  3. Estimate causal parameter: For each \(k \in [K]\), construct the estimator \(\check{\theta}_{0,k}\) as the solution to the equation

    \[\frac{1}{n} \sum_{i \in I_k} \psi(W_i; \check{\theta}_{0,k}, \hat{\eta}_{0,k}) = 0.\]

    The estimate of the causal parameter is obtained via aggregation

    \[\tilde{\theta}_0 = \frac{1}{K} \sum_{k=1}^{K} \check{\theta}_{0,k}.\]
  4. Outputs: The estimate of the causal parameter \(\tilde{\theta}_0\) as well as the values of the evaluated score function are returned.
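The steps above can be sketched in plain Python for the special case of a score that is linear in the parameter, \(\psi(W; \theta, \eta) = \psi_a(W; \eta)\,\theta + \psi_b(W; \eta)\). Linearity is an assumption of this sketch, and the helper names `make_folds` and `dml1_estimate` and the toy data are hypothetical; none of them are part of the DoubleML API. The nuisance step is skipped: `psi_a` and `psi_b` stand in for score components that have already been evaluated with the cross-fitted \(\hat{\eta}_{0,k}\).

```python
import random


def make_folds(n_obs, n_folds, seed=0):
    """Random K-fold partition of the indices 0..n_obs-1 (hypothetical helper)."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    # Striding the shuffled indices gives folds whose sizes differ by at most one.
    return [idx[k::n_folds] for k in range(n_folds)]


def dml1_estimate(psi_a, psi_b, folds):
    """DML1 for a linear score: solve the score equation per fold, then average."""
    fold_thetas = []
    for fold in folds:
        sum_a = sum(psi_a[i] for i in fold)
        sum_b = sum(psi_b[i] for i in fold)
        # (1/n) * sum_i (psi_a_i * theta + psi_b_i) = 0  =>  theta = -sum_b / sum_a
        fold_thetas.append(-sum_b / sum_a)
    theta = sum(fold_thetas) / len(fold_thetas)
    return theta, fold_thetas


# Toy score components, made up for illustration only.
rng = random.Random(42)
n_obs, n_folds, true_theta = 500, 5, 0.5
v = [rng.gauss(0, 1) for _ in range(n_obs)]          # stand-in for a treatment residual
u = [true_theta * vi + rng.gauss(0, 1) for vi in v]  # stand-in for an outcome residual
psi_a = [-vi * vi for vi in v]
psi_b = [ui * vi for ui, vi in zip(u, v)]

folds = make_folds(n_obs, n_folds, seed=1)
theta_dml1, fold_thetas = dml1_estimate(psi_a, psi_b, folds)
print(theta_dml1, fold_thetas)
```

Each entry of `fold_thetas` plays the role of \(\check{\theta}_{0,k}\), and `theta_dml1` is their simple average \(\tilde{\theta}_0\).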

6.2. Algorithm DML2

The DML2 algorithm (dml_procedure='dml2') can be summarized as follows:

  1. Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data \((W_i)_{i=1}^{N}\), a Neyman-orthogonal score function \(\psi(W; \theta, \eta)\) and specify machine learning method(s) for the nuisance function(s) \(\eta\).

  2. Train ML predictors on folds: Take a \(K\)-fold random partition \((I_k)_{k=1}^{K}\) of observation indices \([N] = \lbrace 1, \ldots, N\rbrace\) such that the size of each fold \(I_k\) is \(n=N/K\). For each \(k \in [K] = \lbrace 1, \ldots, K\rbrace\), construct a high-quality machine learning estimator

    \[\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)\]

    of \(\eta_0\), where \(x \mapsto \hat{\eta}_{0,k}(x)\) depends only on the subset of data \((W_i)_{i\not\in I_k}\).

  3. Estimate causal parameter: Construct the estimator for the causal parameter \(\tilde{\theta}_0\) as the solution to the equation

    \[\frac{1}{N} \sum_{k=1}^{K} \sum_{i \in I_k} \psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k}) = 0.\]
  4. Outputs: The estimate of the causal parameter \(\tilde{\theta}_0\) as well as the values of the evaluated score function are returned.
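A minimal sketch of the DML2 step, again assuming a score that is linear in \(\theta\), \(\psi = \psi_a\,\theta + \psi_b\), with score components that are already evaluated using the cross-fitted nuisance estimates. The function name `dml2_estimate` and the toy data are hypothetical and not taken from the DoubleML API. A DML1-style estimate on the same inputs is computed for comparison.

```python
import random

# Toy score components, made up for illustration only.
rng = random.Random(7)
n_obs, n_folds, true_theta = 500, 5, 0.5
v = [rng.gauss(0, 1) for _ in range(n_obs)]
u = [true_theta * vi + rng.gauss(0, 1) for vi in v]
psi_a = [-vi * vi for vi in v]
psi_b = [ui * vi for ui, vi in zip(u, v)]

# Random K-fold partition of the observation indices.
idx = list(range(n_obs))
rng.shuffle(idx)
folds = [idx[k::n_folds] for k in range(n_folds)]


def dml2_estimate(psi_a, psi_b, folds):
    """DML2 for a linear score: solve one pooled score equation over all folds."""
    sum_a = sum(psi_a[i] for fold in folds for i in fold)
    sum_b = sum(psi_b[i] for fold in folds for i in fold)
    # (1/N) * sum_k sum_{i in I_k} (psi_a_i * theta + psi_b_i) = 0
    return -sum_b / sum_a


theta_dml2 = dml2_estimate(psi_a, psi_b, folds)

# DML1 on the same inputs: average of the per-fold solutions.
theta_dml1 = sum(
    -sum(psi_b[i] for i in f) / sum(psi_a[i] for i in f) for f in folds
) / n_folds

# The pooled empirical score is zero (up to rounding) at the DML2 estimate.
pooled_score = sum(psi_a[i] * theta_dml2 + psi_b[i] for i in range(n_obs)) / n_obs
print(theta_dml2, theta_dml1, pooled_score)
```

In this linear case the folds enter DML2 only through the nuisance estimates used to evaluate the score, whereas DML1 averages fold-wise ratios, so the two estimates generally differ slightly.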

6.3. Implementation of the double machine learning algorithms

As an example, we consider a partially linear regression (PLR) model implemented in DoubleMLPLR. By default, the DoubleML classes use the DML2 algorithm.

In [1]: import numpy as np
   ...: import doubleml as dml

In [2]: from doubleml.datasets import make_plr_CCDDHNR2018

In [3]: from sklearn.ensemble import RandomForestRegressor

In [4]: from sklearn.base import clone

In [5]: np.random.seed(3141)

In [6]: learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In [7]: ml_l = clone(learner)

In [8]: ml_m = clone(learner)

In [9]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')

In [10]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In [11]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)

In [12]: dml_plr_obj.fit();

The DML algorithm is selected via the parameter dml_procedure, which takes the value 'dml1' or 'dml2'. The R example below sets dml_procedure="dml1".

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn")

learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_l = learner$clone()
ml_m = learner$clone()
set.seed(3141)
data = make_plr_CCDDHNR2018(alpha=0.5, return_type='data.table')
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, dml_procedure="dml1")
dml_plr_obj$fit()

The fit() method of DoubleMLPLR stores the estimate \(\tilde{\theta}_0\) in its coef attribute.

In [13]: dml_plr_obj.coef
Out[13]: array([0.48069071])

dml_plr_obj$coef
d: 0.543423145188043

Let \(k(i) = \lbrace k: i \in I_k \rbrace\). The values of the score function \((\psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k(i)}))_{i \in [N]}\) are stored in the attribute psi.

In [14]: dml_plr_obj.psi[:5]
Out[14]: 
array([[[ 0.02052929]],

       [[-0.00409412]],

       [[ 0.00138944]],

       [[-0.11208236]],

       [[-0.29678199]]])
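To make the role of \(k(i)\) concrete, the following sketch evaluates a score for every observation using the nuisance estimates of the fold that contains it. Several things here are assumptions for illustration: the score has the partialling-out form \(\psi = (y - \hat{l}(x) - \theta (d - \hat{m}(x)))(d - \hat{m}(x))\), the "learners" are deliberately trivial out-of-fold sample means, and all names (`fold_of`, `l_hat`, `m_hat`) and the toy data are hypothetical. It does not reproduce the numbers shown above.

```python
import random

rng = random.Random(3141)
n_obs, n_folds, true_theta = 200, 4, 0.5
d = [rng.gauss(1.0, 1.0) for _ in range(n_obs)]
y = [true_theta * di + rng.gauss(0.0, 1.0) for di in d]

# Random K-fold partition of the observation indices.
idx = list(range(n_obs))
rng.shuffle(idx)
folds = [idx[k::n_folds] for k in range(n_folds)]

# k(i): index of the fold that contains observation i.
fold_of = {i: k for k, fold in enumerate(folds) for i in fold}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Trivial stand-in learners: the nuisance estimates for fold k are sample means
# computed only from observations outside I_k.
l_hat = [mean(y[i] for i in range(n_obs) if fold_of[i] != k) for k in range(n_folds)]
m_hat = [mean(d[i] for i in range(n_obs) if fold_of[i] != k) for k in range(n_folds)]

# Residuals of observation i use the estimates belonging to its own fold k(i).
u = [y[i] - l_hat[fold_of[i]] for i in range(n_obs)]
v = [d[i] - m_hat[fold_of[i]] for i in range(n_obs)]

# Pooled (DML2-style) solution of the score equation, then the score values.
theta = sum(ui * vi for ui, vi in zip(u, v)) / sum(vi * vi for vi in v)
psi = [(ui - theta * vi) * vi for ui, vi in zip(u, v)]
print(psi[:5])
```

Because `theta` solves the pooled score equation, the entries of `psi` sum to zero up to floating-point rounding.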

dml_plr_obj$psi[1:5, ,1]
  1. 0.00950122695463054
  2. 0.751712655588833
  3. 0.00888458890362062
  4. -0.403626490670169
  5. 0.866179899731091

For the DML1 algorithm, the estimates for the different folds \(\check{\theta}_{0,k}\), \(k \in [K]\), are stored in the attribute all_dml1_coef.

dml_plr_obj$all_dml1_coef
  1. 0.705595810371231
  2. 0.5115547181877
  3. 0.465965114589023
  4. 0.49231564722955
  5. 0.541684435562712
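As a quick check of the DML1 aggregation step, the fold-wise estimates printed above can be averaged by hand. The values below are copied from the R output of all_dml1_coef; the variable names are chosen for illustration.

```python
# Fold-wise DML1 estimates, copied from the all_dml1_coef output above.
fold_estimates = [
    0.705595810371231,
    0.5115547181877,
    0.465965114589023,
    0.49231564722955,
    0.541684435562712,
]

# DML1 aggregation: simple average over the K folds.
theta_tilde = sum(fold_estimates) / len(fold_estimates)
print(theta_tilde)
```

The average agrees with the coefficient reported by the R object fitted with dml_procedure="dml1" (d: 0.543423145188043) up to floating-point rounding.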