5. Double machine learning algorithms

The DoubleML package comes with two different algorithms to obtain DML estimates.

5.1. Algorithm DML1

The algorithm dml_procedure='dml1' can be summarized as

  1. Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data \((W_i)_{i=1}^{N}\), a Neyman-orthogonal score function \(\psi(W; \theta, \eta)\) and specify machine learning method(s) for the nuisance function(s) \(\eta\).

  2. Train ML predictors on folds: Take a \(K\)-fold random partition \((I_k)_{k=1}^{K}\) of observation indices \([N] = \lbrace 1, \ldots, N\rbrace\) such that the size of each fold \(I_k\) is \(n=N/K\). For each \(k \in [K] = \lbrace 1, \ldots, K\rbrace\), construct a high-quality machine learning estimator

    \[\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)\]

    of \(\eta_0\), where \(x \mapsto \hat{\eta}_{0,k}(x)\) depends only on the subset of data \((W_i)_{i\not\in I_k}\).

  3. Estimate causal parameter: For each \(k \in [K]\), construct the estimator \(\check{\theta}_{0,k}\) as the solution to the equation

    \[\frac{1}{n} \sum_{i \in I_k} \psi(W_i; \check{\theta}_{0,k}, \hat{\eta}_{0,k}) = 0.\]

    The estimate of the causal parameter is obtain via aggregation

    \[\tilde{\theta}_0 = \frac{1}{K} \sum_{k=1}^{K} \check{\theta}_{0,k}.\]
  4. Outputs: The estimate of the causal parameter \(\tilde{\theta}_0\) as well as the values of the evaluated score function are returned.

5.2. Algorithm DML2

The algorithm dml_procedure='dml2' can be summarized as

  1. Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data \((W_i)_{i=1}^{N}\), a Neyman-orthogonal score function \(\psi(W; \theta, \eta)\) and specify machine learning method(s) for the nuisance function(s) \(\eta\).

  2. Train ML predictors on folds: Take a \(K\)-fold random partition \((I_k)_{k=1}^{K}\) of observation indices \([N] = \lbrace 1, \ldots, N\rbrace\) such that the size of each fold \(I_k\) is \(n=N/K\). For each \(k \in [K] = \lbrace 1, \ldots, K\rbrace\), construct a high-quality machine learning estimator

    \[\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)\]

    of \(\eta_0\), where \(x \mapsto \hat{\eta}_{0,k}(x)\) depends only on the subset of data \((W_i)_{i\not\in I_k}\).

  3. Estimate causal parameter: Construct the estimator for the causal parameter \(\tilde{\theta}_0\) as the solution to the equation

    \[\frac{1}{N} \sum_{k=1}^{K} \sum_{i \in I_k} \psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k}) = 0.\]
  4. Outputs: The estimate of the causal parameter \(\tilde{\theta}_0\) as well as the values of the evaluate score function are returned.

5.3. Implementation of the double machine learning algorithms

As an example we consider a partially linear regression model (PLR) implemented in DoubleMLPLR. The DML algorithm can be selected via parameter dml_procedure='dml1' vs. dml_procedure='dml2'.

In [1]: import doubleml as dml

In [2]: from doubleml.datasets import make_plr_CCDDHNR2018

In [3]: from sklearn.ensemble import RandomForestRegressor

In [4]: from sklearn.base import clone

In [5]: np.random.seed(3141)

In [6]: learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In [7]: ml_g = clone(learner)

In [8]: ml_m = clone(learner)

In [9]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')

In [10]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In [11]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m, dml_procedure='dml1')

In [12]: dml_plr_obj.fit();
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn")

learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_g = learner$clone()
ml_m = learner$clone()
set.seed(3141)
data = make_plr_CCDDHNR2018(alpha=0.5, return_type='data.table')
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m, dml_procedure="dml1")
dml_plr_obj$fit()

The fit() method of DoubleMLPLR stores the estimate \(\tilde{\theta}_0\) in its coef attribute.

In [13]: dml_plr_obj.coef
Out[13]: array([0.45757317])
dml_plr_obj$coef
d: 0.54287532563466

Let \(k(i) = \lbrace k: i \in I_k \rbrace\). The values of the score function \((\psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k(i)}))_{i \in [N]}\) are stored in the attribute psi.

In [14]: dml_plr_obj.psi[:5]
Out[14]: 
array([[[-0.15330081]],

       [[ 0.02065838]],

       [[ 0.00576888]],

       [[ 0.0821825 ]],

       [[-0.23044651]]])
dml_plr_obj$psi[1:5, ,1]
  1. -0.000784623154372457
  2. 0.783124384910379
  3. 0.00902031947837708
  4. -0.403569975514042
  5. 0.867033752141195

For the DML1 algorithm, the estimates for the different folds \(\check{\theta}_{0,k}\), \(k \in [K]\) are stored in attribute all_dml1_coef.

In [15]: dml_plr_obj.all_dml1_coef
Out[15]: array([[[0.47097149, 0.34734707, 0.4689859 , 0.44862244, 0.55193895]]])
dml_plr_obj$all_dml1_coef
  1. 0.708695026860755
  2. 0.509339693389362
  3. 0.465212699957609
  4. 0.495850216426873
  5. 0.535278991538703