# 6. Double machine learning algorithms

The DoubleML package comes with two different algorithms to obtain DML estimates.

Note

The argument dml_procedure is deprecated in the Python package. In general, the second version of the algorithm, DML2, is recommended because it yields more stable estimates.

## 6.1. Algorithm DML1

The algorithm dml_procedure='dml1' can be summarized as

1. Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data $$(W_i)_{i=1}^{N}$$, a Neyman-orthogonal score function $$\psi(W; \theta, \eta)$$ and specify machine learning method(s) for the nuisance function(s) $$\eta$$.

2. Train ML predictors on folds: Take a $$K$$-fold random partition $$(I_k)_{k=1}^{K}$$ of observation indices $$[N] = \lbrace 1, \ldots, N\rbrace$$ such that the size of each fold $$I_k$$ is $$n=N/K$$. For each $$k \in [K] = \lbrace 1, \ldots, K\rbrace$$, construct a high-quality machine learning estimator

$\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)$

of $$\eta_0$$, where $$x \mapsto \hat{\eta}_{0,k}(x)$$ depends only on the subset of data $$(W_i)_{i\not\in I_k}$$.

3. Estimate causal parameter: For each $$k \in [K]$$, construct the estimator $$\check{\theta}_{0,k}$$ as the solution to the equation

$\frac{1}{n} \sum_{i \in I_k} \psi(W_i; \check{\theta}_{0,k}, \hat{\eta}_{0,k}) = 0.$

The estimate of the causal parameter is obtained via aggregation

$\tilde{\theta}_0 = \frac{1}{K} \sum_{k=1}^{K} \check{\theta}_{0,k}.$

4. Outputs: The estimate of the causal parameter $$\tilde{\theta}_0$$ as well as the values of the evaluated score function are returned.
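The DML1 steps above can be illustrated with a minimal sketch in plain Python. It is not the package's implementation: it assumes a PLR model with the partialling-out score $$\psi = (Y - \ell(X) - \theta (D - m(X)))(D - m(X))$$, a single covariate, and a simple least-squares line as a stand-in for the machine learning method. The helper names `fit_line` and `dml1_plr` and the simulated data are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y on x with intercept; returns a prediction function."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value


def dml1_plr(y, d, x, n_folds, rng):
    """DML1 sketch for a PLR model with the partialling-out score (assumed)."""
    indices = list(range(len(y)))
    rng.shuffle(indices)
    # Step 2: random K-fold partition I_1, ..., I_K of the observation indices.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    fold_thetas = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        # Nuisance estimates use only observations outside the fold I_k.
        l_hat = fit_line([x[i] for i in train], [y[i] for i in train])
        m_hat = fit_line([x[i] for i in train], [d[i] for i in train])
        # Out-of-fold residuals on I_k.
        res_y = [y[i] - l_hat(x[i]) for i in fold]
        res_d = [d[i] - m_hat(x[i]) for i in fold]
        # Step 3: solve the fold-wise score equation; it is linear in theta.
        numerator = sum(u * v for u, v in zip(res_y, res_d))
        denominator = sum(v * v for v in res_d)
        fold_thetas.append(numerator / denominator)
    # Aggregation: average the K fold-wise estimates.
    return sum(fold_thetas) / n_folds, fold_thetas


# Simulated data (hypothetical): true causal parameter 0.5.
rng = random.Random(3141)
x = [rng.gauss(0, 1) for _ in range(1000)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_tilde, fold_thetas = dml1_plr(y, d, x, n_folds=5, rng=rng)
print(theta_tilde)
print(fold_thetas)
```

The fold-wise estimates vary around the true value, and the returned estimate is their plain average, which is the aggregation rule in step 3.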

## 6.2. Algorithm DML2

The algorithm dml_procedure='dml2' can be summarized as

1. Inputs: Choose a model (PLR, PLIV, IRM, IIVM), provide data $$(W_i)_{i=1}^{N}$$, a Neyman-orthogonal score function $$\psi(W; \theta, \eta)$$ and specify machine learning method(s) for the nuisance function(s) $$\eta$$.

2. Train ML predictors on folds: Take a $$K$$-fold random partition $$(I_k)_{k=1}^{K}$$ of observation indices $$[N] = \lbrace 1, \ldots, N\rbrace$$ such that the size of each fold $$I_k$$ is $$n=N/K$$. For each $$k \in [K] = \lbrace 1, \ldots, K\rbrace$$, construct a high-quality machine learning estimator

$\hat{\eta}_{0,k} = \hat{\eta}_{0,k}\big((W_i)_{i\not\in I_k}\big)$

of $$\eta_0$$, where $$x \mapsto \hat{\eta}_{0,k}(x)$$ depends only on the subset of data $$(W_i)_{i\not\in I_k}$$.

3. Estimate causal parameter: Construct the estimator for the causal parameter $$\tilde{\theta}_0$$ as the solution to the equation

$\frac{1}{N} \sum_{k=1}^{K} \sum_{i \in I_k} \psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k}) = 0.$

4. Outputs: The estimate of the causal parameter $$\tilde{\theta}_0$$ as well as the values of the evaluated score function are returned.
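To make the difference between the two aggregation rules concrete, the following sketch assumes a score that is linear in the parameter, $$\psi = \psi_a \theta + \psi_b$$, so that both score equations have closed-form solutions. The function names and the toy score components are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from the package.

```python
def solve_linear_score(psi_a, psi_b):
    """Solve sum_i (psi_a[i] * theta + psi_b[i]) = 0 for theta."""
    return -sum(psi_b) / sum(psi_a)


def dml1_estimate(psi_a_folds, psi_b_folds):
    """DML1: solve the score equation per fold, then average the solutions."""
    fold_thetas = [
        solve_linear_score(a, b) for a, b in zip(psi_a_folds, psi_b_folds)
    ]
    return sum(fold_thetas) / len(fold_thetas)


def dml2_estimate(psi_a_folds, psi_b_folds):
    """DML2: pool the score components over all folds, then solve once."""
    pooled_a = [a for fold in psi_a_folds for a in fold]
    pooled_b = [b for fold in psi_b_folds for b in fold]
    return solve_linear_score(pooled_a, pooled_b)


# Toy out-of-fold score components for K = 2 folds (hypothetical numbers).
psi_a_folds = [[-1.0, -1.0], [-0.5, -0.5]]
psi_b_folds = [[0.4, 0.6], [0.5, 0.5]]

theta_dml1 = dml1_estimate(psi_a_folds, psi_b_folds)  # (0.5 + 1.0) / 2 = 0.75
theta_dml2 = dml2_estimate(psi_a_folds, psi_b_folds)  # 2.0 / 3.0

# At the DML2 estimate the pooled score sums to zero (up to rounding).
pooled_score = sum(
    a * theta_dml2 + b
    for fold_a, fold_b in zip(psi_a_folds, psi_b_folds)
    for a, b in zip(fold_a, fold_b)
)
print(theta_dml1, theta_dml2, pooled_score)
```

In this toy case DML1 gives each fold equal weight, whereas DML2 implicitly weights each fold by its summed $$\psi_a$$, so a fold with little variation in the score has less influence on the pooled estimate.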

## 6.3. Implementation of the double machine learning algorithms

As an example we consider a partially linear regression model (PLR) implemented in DoubleMLPLR. The default version of the DoubleML class is based on the DML2 algorithm.

In [1]: import numpy as np
   ...: import doubleml as dml

In [2]: from doubleml.datasets import make_plr_CCDDHNR2018

In [3]: from sklearn.ensemble import RandomForestRegressor

In [4]: from sklearn.base import clone

In [5]: np.random.seed(3141)

In [6]: learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In [7]: ml_l = clone(learner)

In [8]: ml_m = clone(learner)

In [9]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')

In [10]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In [11]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)

In [12]: dml_plr_obj.fit();


The DML algorithm can be selected by setting the parameter dml_procedure='dml1' or dml_procedure='dml2'.

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn")

learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_l = learner$clone()
ml_m = learner$clone()
set.seed(3141)
data = make_plr_CCDDHNR2018(alpha=0.5, return_type='data.table')
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, dml_procedure="dml1")
dml_plr_obj$fit()


The fit() method of DoubleMLPLR stores the estimate $$\tilde{\theta}_0$$ in its coef attribute.

In [13]: dml_plr_obj.coef
Out[13]: array([0.48069071])

dml_plr_obj$coef

d: 0.54287532563466

Let $$k(i) = \lbrace k: i \in I_k \rbrace$$. The values of the score function $$(\psi(W_i; \tilde{\theta}_0, \hat{\eta}_{0,k(i)}))_{i \in [N]}$$ are stored in the attribute psi.

In [14]: dml_plr_obj.psi[:5]
Out[14]: 
array([[[ 0.02052929]],
       [[-0.00409412]],
       [[ 0.00138944]],
       [[-0.11208236]],
       [[-0.29678199]]])

dml_plr_obj$psi[1:5, ,1]

1. -0.000784623154372457
2. 0.783124384910379
3. 0.00902031947837708
4. -0.403569975514042
5. 0.867033752141195

For the DML1 algorithm, the estimates for the different folds $$\check{\theta}_{0,k}$$, $$k \in [K]$$ are stored in attribute all_dml1_coef.

dml_plr_obj$all_dml1_coef

1. 0.708695026860755
2. 0.509339693389362
3. 0.465212699957609
4. 0.495850216426873
5. 0.535278991538703
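
As a quick consistency check on the DML1 aggregation step, the five fold-wise estimates printed for all_dml1_coef can be averaged by hand and compared with the reported coefficient 0.54287532563466. The snippet below only redoes this arithmetic with the numbers copied from the output; the variable names are illustrative.

```python
# Fold-wise DML1 estimates, copied from the all_dml1_coef output.
fold_estimates = [
    0.708695026860755,
    0.509339693389362,
    0.465212699957609,
    0.495850216426873,
    0.535278991538703,
]
# Coefficient reported by the R object fitted with dml_procedure="dml1".
reported_coef = 0.54287532563466

# DML1 aggregation: the plain average of the K fold-wise estimates.
aggregated = sum(fold_estimates) / len(fold_estimates)
print(aggregated)
print(abs(aggregated - reported_coef) < 1e-12)
```

The average agrees with the reported coefficient up to the printed precision, as the aggregation formula of DML1 implies.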