# 5. Double machine learning algorithms¶

The DoubleML package comes with two different algorithms to obtain DML estimates.

## 5.1. Algorithm DML1¶

The algorithm dml_procedure='dml1' can be summarized as

1. Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data $$(W_i)_{i=1}^{N}$$, a Neyman-orthogonal score function $$\psi(W; \theta, \eta)$$ and specify machine learning method(s) for the nuisance function(s) $$\eta$$.

2. Train ML predictors on folds: Take a $$K$$-fold random partition $$(I_k)_{k=1}^{K}$$ of observation indices $$[N] = \lbrace 1, \ldots, N\rbrace$$ such that the size of each fold $$I_k$$ is $$n=N/K$$. For each $$k \in [K] = \lbrace 1, \ldots, K\rbrace$$, construct a high-quality machine learning estimator

$\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)$

of $$\eta_0$$, where $$x \mapsto \hat{\eta}_{0,k}(x)$$ depends only on the subset of data $$(W_i)_{i\not\in I_k}$$.

3. Estimate causal parameter: For each $$k \in [K]$$, construct the estimator $$\check{\theta}_{0,k}$$ as the solution to the equation

$\frac{1}{n} \sum_{i \in I_k} \psi(W_i; \check{\theta}_{0,k}, \hat{\eta}_{0,k}) = 0.$

The estimate of the causal parameter is obtain via aggregation

$\tilde{\theta}_0 = \frac{1}{K} \sum_{k=1}^{K} \check{\theta}_{0,k}.$
4. Outputs: The estimate of the causal parameter $$\tilde{\theta}_0$$ as well as the values of the evaluated score function are returned.

## 5.2. Algorithm DML2¶

The algorithm dml_procedure='dml2' can be summarized as

1. Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data $$(W_i)_{i=1}^{N}$$, a Neyman-orthogonal score function $$\psi(W; \theta, \eta)$$ and specify machine learning method(s) for the nuisance function(s) $$\eta$$.

2. Train ML predictors on folds: Take a $$K$$-fold random partition $$(I_k)_{k=1}^{K}$$ of observation indices $$[N] = \lbrace 1, \ldots, N\rbrace$$ such that the size of each fold $$I_k$$ is $$n=N/K$$. For each $$k \in [K] = \lbrace 1, \ldots, K\rbrace$$, construct a high-quality machine learning estimator

$\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)$

of $$\eta_0$$, where $$x \mapsto \hat{\eta}_{0,k}(x)$$ depends only on the subset of data $$(W_i)_{i\not\in I_k}$$.

3. Estimate causal parameter: Construct the estimator for the causal parameter $$\tilde{\theta}_0$$ as the solution to the equation

$\frac{1}{N} \sum_{k=1}^{K} \sum_{i \in I_k} \psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k}) = 0.$
4. Outputs: The estimate of the causal parameter $$\tilde{\theta}_0$$ as well as the values of the evaluate score function are returned.

## 5.3. Implementation of the double machine learning algorithms¶

As an example we consider a partially linear regression model (PLR) implemented in DoubleMLPLR. The DML algorithm can be selected via parameter dml_procedure='dml1' vs. dml_procedure='dml2'.

In [1]: import doubleml as dml

In [2]: from doubleml.datasets import make_plr_CCDDHNR2018

In [3]: from sklearn.ensemble import RandomForestRegressor

In [4]: from sklearn.base import clone

In [5]: np.random.seed(3141)

In [6]: learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In [7]: ml_g = clone(learner)

In [8]: ml_m = clone(learner)

In [9]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')

In [10]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In [11]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m, dml_procedure='dml1')

In [12]: dml_plr_obj.fit();

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn") learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5) ml_g = learner$clone()
ml_m = learner$clone() set.seed(3141) data = make_plr_CCDDHNR2018(alpha=0.5, return_type='data.table') obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m, dml_procedure="dml1") dml_plr_obj$fit()


The fit() method of DoubleMLPLR stores the estimate $$\tilde{\theta}_0$$ in its coef attribute.

In [13]: dml_plr_obj.coef
Out[13]: array([0.45757317])

dml_plr_obj$coef  d: 0.54287532563466 Let $$k(i) = \lbrace k: i \in I_k \rbrace$$. The values of the score function $$(\psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k(i)}))_{i \in [N]}$$ are stored in the attribute psi. In [14]: dml_plr_obj.psi[:5] Out[14]: array([[[-0.15330081]], [[ 0.02065838]], [[ 0.00576888]], [[ 0.0821825 ]], [[-0.23044651]]])  dml_plr_obj$psi[1:5, ,1]

1. -0.000784623154372457
2. 0.783124384910379
3. 0.00902031947837708
4. -0.403569975514042
5. 0.867033752141195

For the DML1 algorithm, the estimates for the different folds $$\check{\theta}_{0,k}$$, $$k \in [K]$$ are stored in attribute all_dml1_coef.

In [15]: dml_plr_obj.all_dml1_coef
Out[15]: array([[[0.47097149, 0.34734707, 0.4689859 , 0.44862244, 0.55193895]]])

dml_plr_obj\$all_dml1_coef

1. 0.708695026860755
2. 0.509339693389362
3. 0.465212699957609
4. 0.495850216426873
5. 0.535278991538703