7. Learners, hyperparameters and hyperparameter tuning

The estimation of a double/debiased machine learning model involves the estimation of several nuisance functions with machine learning estimators. Such learners are implemented in various Python and R packages. The implementation of DoubleML is based on the meta-packages scikit-learn for Python and mlr3 for R. The interfaces to specify learners, set hyperparameters and tune hyperparameters are described separately for Python and R in the following.

7.1. Python: Learners and hyperparameters

7.1.1. Minimum requirements for learners

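The minimum requirements for a learner to be used for nuisance models in the DoubleML package are:

  • The implementation of a fit() and a predict() method. Some models, like doubleml.DoubleMLIRM and doubleml.DoubleMLIIVM, require classifiers.

  • In case of classifiers, the learner needs to come with a predict_proba() method instead of, or in addition to, a predict() method, see for example sklearn.ensemble.RandomForestClassifier.predict_proba().

  • To be able to use the set_ml_nuisance_params() method of DoubleML classes, the learner additionally needs a set_params() method.

  • DoubleML further relies on sklearn.base.clone(), which requires the learner to provide a get_params() method.

Most learners from scikit-learn satisfy all these minimum requirements.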
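Since most scikit-learn learners meet these requirements, a custom learner can typically be made compatible by following scikit-learn's estimator conventions. The following is a minimal sketch of a hypothetical toy regressor (ShrunkMeanRegressor is made up for illustration and is not part of DoubleML or scikit-learn): inheriting from sklearn.base.BaseEstimator provides get_params() and set_params(), so the learner survives sklearn.base.clone() and can have its hyperparameters changed after construction.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone


class ShrunkMeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical toy regressor: predicts a shrunken mean of y and ignores X."""

    def __init__(self, shrinkage=0.0):
        # Store hyperparameters unchanged so that BaseEstimator.get_params(),
        # set_params() and sklearn.base.clone() work out of the box.
        self.shrinkage = shrinkage

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.mean_ = (1.0 - self.shrinkage) * y.mean()
        return self

    def predict(self, X):
        return np.full(np.asarray(X).shape[0], self.mean_)


learner = ShrunkMeanRegressor(shrinkage=0.5)
learner_copy = clone(learner)           # relies on get_params()
learner_copy.set_params(shrinkage=0.1)  # hyperparameters set after construction
X = np.zeros((4, 2))
y = np.array([1.0, 2.0, 3.0, 4.0])
pred = learner_copy.fit(X, y).predict(X)
print(learner.get_params(), pred)
```

Cloning leaves the original learner untouched, which is why learner still reports shrinkage=0.5 while the fitted copy uses 0.1.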

7.1.2. Specifying learners and setting hyperparameters

The learners are set during initialization of the DoubleML model classes doubleml.DoubleMLPLR, doubleml.DoubleMLPLIV, doubleml.DoubleMLIRM and doubleml.DoubleMLIIVM. Let's simulate some data and consider the partially linear regression model. We need to specify learners for the nuisance functions \(\ell_0(X) = E[Y|X]\) and \(m_0(X) = E[D|X]\), for example sklearn.ensemble.RandomForestRegressor.

In [1]: import numpy as np
   ....: import doubleml as dml
   ....: 

In [2]: from doubleml.datasets import make_plr_CCDDHNR2018

In [3]: from sklearn.ensemble import RandomForestRegressor

In [4]: np.random.seed(1234)

In [5]: ml_l = RandomForestRegressor()

In [6]: ml_m = RandomForestRegressor()

In [7]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')

In [8]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In [9]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)

In [10]: dml_plr_obj.fit().summary
Out[10]: 
       coef   std err          t         P>|t|    2.5 %    97.5 %
d  0.503504  0.045993  10.947466  6.833227e-28  0.41336  0.593648
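To see what the two learners ml_l and ml_m are used for, here is a simplified sketch of the partialling-out estimator with 5-fold cross-fitting, written with plain NumPy and scikit-learn rather than DoubleML. The simulated data, the LinearRegression learners and the seeds are assumptions for illustration; DoubleML additionally handles repeated sample splitting, standard errors and other score functions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
n_obs, theta = 2000, 0.5
X = rng.normal(size=(n_obs, 5))
d = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n_obs)
y = theta * d + X @ np.array([0.0, 1.0, 1.0, 0.0, 0.0]) + rng.normal(size=n_obs)

# Out-of-sample predictions of l_0(X) = E[Y|X] and m_0(X) = E[D|X] via cross-fitting
l_hat = np.empty(n_obs)
m_hat = np.empty(n_obs)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    l_hat[test] = LinearRegression().fit(X[train], y[train]).predict(X[test])
    m_hat[test] = LinearRegression().fit(X[train], d[train]).predict(X[test])

# Partialling out: regress the outcome residual on the treatment residual
v_hat = d - m_hat
theta_hat = np.sum(v_hat * (y - l_hat)) / np.sum(v_hat ** 2)
print(round(theta_hat, 3))
```

The estimate should be close to the simulated effect of 0.5, since both nuisance functions are linear here and are therefore well approximated by LinearRegression.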

Without further specification of the hyperparameters, default values are used. To set hyperparameters:

  • We can use pre-parametrized learners, like RandomForestRegressor(n_estimators=10).

  • Alternatively, hyperparameters can be set after initialization via the method set_ml_nuisance_params(learner, treat_var, params).

In [11]: np.random.seed(1234)

In [12]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data,
   ....:                               RandomForestRegressor(n_estimators=10),
   ....:                               RandomForestRegressor())
   ....: 

In [13]: print(dml_plr_obj.fit().summary)
      coef   std err          t         P>|t|     2.5 %    97.5 %
d  0.53257  0.046922  11.350165  7.402301e-30  0.440605  0.624535

In [14]: np.random.seed(1234)

In [15]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data,
   ....:                               RandomForestRegressor(),
   ....:                               RandomForestRegressor())
   ....: 

In [16]: dml_plr_obj.set_ml_nuisance_params('ml_l', 'd', {'n_estimators': 10});

In [17]: print(dml_plr_obj.fit().summary)
      coef   std err          t         P>|t|     2.5 %    97.5 %
d  0.53257  0.046922  11.350165  7.402301e-30  0.440605  0.624535

Setting treatment-variable-specific or fold-specific hyperparameters:

  • In the multiple-treatment case, the method set_ml_nuisance_params(learner, treat_var, params) can be used to set different hyperparameters for different treatment variables.

  • The method set_ml_nuisance_params(learner, treat_var, params) accepts dicts and lists for params. A dict should be provided if the same hyperparameters are to be used for every fold. Fold-specific parameters are also supported: to set them, provide a nested list as params, where the outer list has length n_rep and each inner list has length n_folds.
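The nested-list structure for fold-specific parameters can be built with a list comprehension. In this sketch, n_rep=2, n_folds=5 and the fold-dependent n_estimators values are hypothetical choices for illustration; the only requirement taken from the text is the shape: n_rep outer entries, each holding n_folds dicts.

```python
n_rep, n_folds = 2, 5

# One dict of hyperparameters per (repetition, fold):
# the outer list has length n_rep, each inner list has length n_folds.
params = [[{"n_estimators": 10 + 10 * i_fold} for i_fold in range(n_folds)]
          for i_rep in range(n_rep)]

# For a model created with n_rep=2 and n_folds=5, this would then be passed as
# dml_plr_obj.set_ml_nuisance_params('ml_l', 'd', params)
print(len(params), [len(inner) for inner in params])
print(params[0][4])
```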

7.1.3. Hyperparameter tuning

Parameter tuning of learners for the nuisance functions of DoubleML models can be done via the tune() method. To illustrate the parameter tuning, we generate data from a sparse partially linear regression model.

In [18]: import doubleml as dml

In [19]: import numpy as np

In [20]: np.random.seed(3141)

In [21]: n_obs = 200

In [22]: n_vars = 200

In [23]: theta = 3

In [24]: X = np.random.normal(size=(n_obs, n_vars))

In [25]: d = np.dot(X[:, :3], np.array([5, 5, 5])) + np.random.standard_normal(size=(n_obs,))

In [26]: y = theta * d + np.dot(X[:, :3], np.array([5, 5, 5])) + np.random.standard_normal(size=(n_obs,))

In [27]: dml_data = dml.DoubleMLData.from_arrays(X, y, d)

Hyperparameter tuning is performed either via an exhaustive search over specified parameter values, as implemented in sklearn.model_selection.GridSearchCV (search_mode='grid_search'), or via a randomized search, as implemented in sklearn.model_selection.RandomizedSearchCV (search_mode='randomized_search').

In [28]: import doubleml as dml

In [29]: from sklearn.linear_model import Lasso

In [30]: ml_l = Lasso()

In [31]: ml_m = Lasso()

In [32]: dml_plr_obj = dml.DoubleMLPLR(dml_data, ml_l, ml_m)

In [33]: par_grids = {'ml_l': {'alpha': np.arange(0.05, 1., 0.1)},
   ....:              'ml_m': {'alpha': np.arange(0.05, 1., 0.1)}}
   ....: 

In [34]: dml_plr_obj.tune(par_grids, search_mode='grid_search');

In [35]: print(dml_plr_obj.params)
{'ml_l': {'d': [[{'alpha': np.float64(0.45000000000000007)}, {'alpha': np.float64(0.45000000000000007)}, {'alpha': np.float64(0.45000000000000007)}, {'alpha': np.float64(0.45000000000000007)}, {'alpha': np.float64(0.45000000000000007)}]]}, 'ml_m': {'d': [[{'alpha': np.float64(0.15000000000000002)}, {'alpha': np.float64(0.15000000000000002)}, {'alpha': np.float64(0.15000000000000002)}, {'alpha': np.float64(0.15000000000000002)}, {'alpha': np.float64(0.15000000000000002)}]]}}

In [36]: print(dml_plr_obj.fit().summary)
       coef   std err          t  P>|t|     2.5 %    97.5 %
d  3.031134  0.071777  42.229759    0.0  2.890454  3.171815

In [37]: np.random.seed(1234)

In [38]: par_grids = {'ml_l': {'alpha': np.arange(0.05, 1., 0.01)},
   ....:              'ml_m': {'alpha': np.arange(0.05, 1., 0.01)}}
   ....: 

In [39]: dml_plr_obj.tune(par_grids, search_mode='randomized_search', n_iter_randomized_search=20);

In [40]: print(dml_plr_obj.params)
{'ml_l': {'d': [[{'alpha': np.float64(0.4000000000000001)}, {'alpha': np.float64(0.4000000000000001)}, {'alpha': np.float64(0.4000000000000001)}, {'alpha': np.float64(0.4000000000000001)}, {'alpha': np.float64(0.4000000000000001)}]]}, 'ml_m': {'d': [[{'alpha': np.float64(0.09000000000000001)}, {'alpha': np.float64(0.09000000000000001)}, {'alpha': np.float64(0.09000000000000001)}, {'alpha': np.float64(0.09000000000000001)}, {'alpha': np.float64(0.09000000000000001)}]]}}

In [41]: print(dml_plr_obj.fit().summary)
      coef   std err          t          P>|t|     2.5 %    97.5 %
d  2.96582  0.086679  34.216207  1.388216e-256  2.795932  3.135707
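For comparison, the same kind of search can be run directly in scikit-learn, independently of DoubleML: GridSearchCV evaluates every alpha in the grid, while RandomizedSearchCV with n_iter=20 evaluates 20 values sampled from the finer grid. This standalone sketch mirrors the sparse data-generating process above; cv=5, max_iter and the random seeds are assumptions, only the outcome learner is tuned for brevity, and the selected alphas need not coincide with those found by tune().

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

rng = np.random.default_rng(3141)
n_obs, n_vars, theta = 200, 200, 3
X = rng.normal(size=(n_obs, n_vars))
d = X[:, :3] @ np.array([5, 5, 5]) + rng.standard_normal(n_obs)
y = theta * d + X[:, :3] @ np.array([5, 5, 5]) + rng.standard_normal(n_obs)

# Exhaustive search over the coarse grid for the outcome regression E[Y|X]
grid = {"alpha": np.arange(0.05, 1.0, 0.1)}
grid_search = GridSearchCV(Lasso(max_iter=10000), grid, cv=5).fit(X, y)

# Randomized search: n_iter candidates are sampled from the finer grid
fine_grid = {"alpha": np.arange(0.05, 1.0, 0.01)}
rand_search = RandomizedSearchCV(Lasso(max_iter=10000), fine_grid, n_iter=20,
                                 cv=5, random_state=1234).fit(X, y)

print(grid_search.best_params_, rand_search.best_params_)
```

The randomized search fits only 20 candidates instead of every grid value, which is the trade-off controlled by n_iter_randomized_search in the DoubleML example above.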

Hyperparameter tuning can also be done with more sophisticated methods, for example iterative fitting along a regularization path as implemented in sklearn.linear_model.LassoCV. In this case, the tuning should be done externally and the tuned parameters can then be set via the set_ml_nuisance_params() method.

In [42]: import doubleml as dml

In [43]: from sklearn.linear_model import LassoCV

In [44]: np.random.seed(1234)

In [45]: ml_l_tune = LassoCV().fit(dml_data.x, dml_data.y)

In [46]: ml_m_tune = LassoCV().fit(dml_data.x, dml_data.d)

In [47]: ml_l = Lasso()

In [48]: ml_m = Lasso()

In [49]: dml_plr_obj = dml.DoubleMLPLR(dml_data, ml_l, ml_m)

In [50]: dml_plr_obj.set_ml_nuisance_params('ml_l', 'd', {'alpha': ml_l_tune.alpha_});

In [51]: dml_plr_obj.set_ml_nuisance_params('ml_m', 'd', {'alpha': ml_m_tune.alpha_});

In [52]: print(dml_plr_obj.params)
{'ml_l': {'d': [[{'alpha': np.float64(0.4311947070055128)}, {'alpha': np.float64(0.4311947070055128)}, {'alpha': np.float64(0.4311947070055128)}, {'alpha': np.float64(0.4311947070055128)}, {'alpha': np.float64(0.4311947070055128)}]]}, 'ml_m': {'d': [[{'alpha': np.float64(0.14281403493938022)}, {'alpha': np.float64(0.14281403493938022)}, {'alpha': np.float64(0.14281403493938022)}, {'alpha': np.float64(0.14281403493938022)}, {'alpha': np.float64(0.14281403493938022)}]]}}

In [53]: print(dml_plr_obj.fit().summary)
       coef   std err          t  P>|t|     2.5 %    97.5 %
d  3.048723  0.075869  40.183855    0.0  2.900021  3.197424

7.1.4. Evaluate learners

To compare different learners, it is possible to evaluate the out-of-sample performance of each learner. The printed summary of a fitted model (print(dml_plr_obj)) already shows the root mean squared error (RMSE) for each learner and each repetition of cross-fitting (n_rep argument).

To illustrate the evaluation of learners, we work with the following example.

In [54]: import doubleml as dml

In [55]: from doubleml.datasets import make_plr_CCDDHNR2018

In [56]: from sklearn.ensemble import RandomForestRegressor

In [57]: np.random.seed(1234)

In [58]: ml_l = RandomForestRegressor()

In [59]: ml_m = RandomForestRegressor()

In [60]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')

In [61]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In [62]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)

In [63]: dml_plr_obj.fit()
Out[63]: <doubleml.plm.plr.DoubleMLPLR at 0x7f31f01ce9d0>

In [64]: print(dml_plr_obj)
================== DoubleMLPLR Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: partialling out

------------------ Machine learner   ------------------
Learner ml_l: RandomForestRegressor()
Learner ml_m: RandomForestRegressor()
Out-of-sample Performance:
Learner ml_l RMSE: [[1.17385178]]
Learner ml_m RMSE: [[1.03244552]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1

------------------ Fit summary       ------------------
       coef   std err          t         P>|t|    2.5 %    97.5 %
d  0.503504  0.045993  10.947466  6.833227e-28  0.41336  0.593648

The RMSEs of each learner are also stored in the rmses attribute. Further, the evaluate_learners() method allows evaluating customized metrics such as the mean absolute error. By default, evaluate_learners() also uses the RMSE.

In [65]: print(dml_plr_obj.rmses)
{'ml_l': array([[1.17385178]]), 'ml_m': array([[1.03244552]])}

In [66]: print(dml_plr_obj.evaluate_learners())
{'ml_l': array([[1.17385178]]), 'ml_m': array([[1.03244552]])}

To evaluate a customized metric, one has to define a callable. For some models (e.g. the IRM model), the metric must be able to handle NaN values, since not all target values are known; for instance, the outcome of a treated unit is not observed under no treatment.

In [67]: from sklearn.metrics import mean_absolute_error

In [68]: def mae(y_true, y_pred):
   ....:     subset = np.logical_not(np.isnan(y_true))
   ....:     return mean_absolute_error(y_true[subset], y_pred[subset])
   ....: 

In [69]: dml_plr_obj.evaluate_learners(learners=['ml_l'], metric=mae)
Out[69]: {'ml_l': array([[0.95559917]])}
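A small illustration of why the NaN filtering in mae matters: for a learner of \(E[Y|D=0,X]\), such as the IRM learner ml_g0, no target is observed for treated units. The toy arrays below are made up, and representing the missing targets as NaN is an assumption that matches the callable above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error


def mae(y_true, y_pred):
    # Evaluate only where the target is observed (not NaN)
    subset = np.logical_not(np.isnan(y_true))
    return mean_absolute_error(y_true[subset], y_pred[subset])


d = np.array([0, 1, 0, 1])
y = np.array([1.0, 5.0, 3.0, 7.0])
# Toy targets for a learner of E[Y|D=0,X]: unknown (NaN) for treated units
y_target_g0 = np.where(d == 0, y, np.nan)
y_pred_g0 = np.array([1.5, 10.0, 2.0, -4.0])

print(mae(y_target_g0, y_pred_g0))  # mean of |1.0-1.5| and |3.0-2.0| -> 0.75
```

Without the filtering, sklearn's mean_absolute_error would reject the NaN entries in y_true instead of ignoring them.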

A more detailed notebook on the choice of learners is available in the example gallery.

7.1.5. Advanced: External Predictions

In some cases, the user may want to use a learner that is not supported by DoubleML or to perform extensive hyperparameter tuning. For such cases, it is possible to use external predictions for the nuisance functions. Note that the user is then responsible for the cross-fitting procedure and for the evaluation of the learners.

To illustrate the use of external predictions, we work with the following example.

In [70]: import numpy as np

In [71]: import doubleml as dml

In [72]: from doubleml.datasets import make_irm_data

In [73]: from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

In [74]: np.random.seed(3333)

In [75]: data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')

In [76]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

# DoubleML with internal predictions
In [77]: ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In [78]: ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In [79]: dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)

In [80]: dml_irm_obj.fit()
Out[80]: <doubleml.irm.irm.DoubleMLIRM at 0x7f31e5f90ac0>

In [81]: print(dml_irm_obj.summary)
       coef   std err         t     P>|t|     2.5 %    97.5 %
d  0.050856  0.664276  0.076559  0.938975 -1.251101  1.352813

The doubleml.DoubleMLIRM model class stores the nuisance predictions in the predictions attribute, a dictionary keyed by learner name whose arrays hold one slice per treatment variable along the last axis (hence the index [:, :, 0] below). To rely on external predictions, the user has to provide a nested dictionary, where the outer-level keys correspond to the treatment variable names and the inner-level keys correspond to the nuisance learner names. Further, the values have to be numpy arrays of shape (n_obs, n_rep). Here, we generate an external predictions dictionary from the internal predictions attribute.

In [82]: pred_dict = {"d": {
   ....:     "ml_g0": dml_irm_obj.predictions["ml_g0"][:, :, 0],
   ....:     "ml_g1": dml_irm_obj.predictions["ml_g1"][:, :, 0],
   ....:     "ml_m": dml_irm_obj.predictions["ml_m"][:, :, 0]
   ....:     }
   ....: }
   ....: 

The external predictions can be passed to the fit() method of the doubleml.DoubleML class via the external_predictions argument.

In [83]: ml_g = dml.utils.DMLDummyRegressor()

In [84]: ml_m = dml.utils.DMLDummyClassifier()

In [85]: dml_irm_obj_ext = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)

In [86]: dml_irm_obj_ext.fit(external_predictions=pred_dict)
Out[86]: <doubleml.irm.irm.DoubleMLIRM at 0x7f31e61fc610>

In [87]: print(dml_irm_obj_ext.summary)
       coef   std err         t     P>|t|     2.5 %    97.5 %
d  0.050856  0.664276  0.076559  0.938975 -1.251101  1.352813

Both models yield identical estimates. Note that the doubleml.DoubleML classes usually require learners for initialization; with external predictions, these learners are not used. DMLDummyRegressor and DMLDummyClassifier are dummy learners that can be used to initialize the doubleml.DoubleML classes in this case. Both dummy learners raise errors if specific methods are called, to safeguard against undesired behavior. Further, the doubleml.DoubleMLData class requires features (e.g. via the x_cols argument), which are not used with external predictions. This can be handled by adding a dummy column to the data.
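The example above reuses DoubleML's own predictions. When predictions come from elsewhere, the user performs the cross-fitting; the following is a minimal sketch with scikit-learn, assuming one repetition (n_rep=1), 5 folds, random forests and simulated data, all chosen for illustration only. Following the IRM nuisance definitions, ml_g0 and ml_g1 are fitted on the control and treated training observations, respectively, but predict for every observation in the held-out fold, while ml_m returns propensity scores \(P(D=1|X)\).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(3333)
n_obs = 500
X = rng.normal(size=(n_obs, 5))
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
y = 0.5 * d + X[:, 1] + rng.normal(size=n_obs)

g0_hat = np.empty(n_obs)
g1_hat = np.empty(n_obs)
m_hat = np.empty(n_obs)
for train, test in KFold(n_splits=5, shuffle=True, random_state=42).split(X):
    ctrl, trt = train[d[train] == 0], train[d[train] == 1]
    # Outcome regressions are fitted on the respective treatment group only
    g0 = RandomForestRegressor(max_depth=5, random_state=0).fit(X[ctrl], y[ctrl])
    g1 = RandomForestRegressor(max_depth=5, random_state=0).fit(X[trt], y[trt])
    m = RandomForestClassifier(max_depth=5, random_state=0).fit(X[train], d[train])
    # ... but predict for all observations of the held-out fold
    g0_hat[test] = g0.predict(X[test])
    g1_hat[test] = g1.predict(X[test])
    m_hat[test] = m.predict_proba(X[test])[:, 1]

# Nested dict: treatment name -> learner name -> array of shape (n_obs, n_rep)
pred_dict = {"d": {"ml_g0": g0_hat.reshape(-1, 1),
                   "ml_g1": g1_hat.reshape(-1, 1),
                   "ml_m": m_hat.reshape(-1, 1)}}
print({k: v.shape for k, v in pred_dict["d"].items()})
```

Such a dictionary could then be passed via fit(external_predictions=pred_dict) to a doubleml.DoubleMLIRM object built from the same observations in the same order, with DMLDummyRegressor and DMLDummyClassifier as placeholder learners. Because the folds are drawn by the user, the resulting estimate will generally differ from one based on DoubleML's internal sample splitting.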

7.2. R: Learners and hyperparameters

7.2.1. Minimum requirements for learners

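The minimum requirements for a learner to be used for nuisance models in the DoubleML package are:

  • The learner is implemented as a learner for regression or classification in the mlr3 package or one of its extension packages, such as mlr3learners or mlr3extralearners. Learners are passed to DoubleML as objects of class Learner, which can be constructed conveniently with the mlr3 function lrn().

  • Alternatively, the learner can be constructed from a pipeline with the mlr3pipelines package.

  • The models DoubleML::DoubleMLIRM and DoubleML::DoubleMLIIVM require classifiers for some of the nuisance components.

An interactive list of provided learners in the mlr3 and extension packages can be found on the website of the mlr3extralearners package.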

7.2.2. Specifying learners and setting hyperparameters

The learners are set during initialization of the DoubleML model classes DoubleML::DoubleMLPLR, DoubleML::DoubleMLPLIV, DoubleML::DoubleMLIRM and DoubleML::DoubleMLIIVM. Let's simulate some data and consider the partially linear regression model. We need to specify learners for the nuisance functions \(\ell_0(X) = E[Y|X]\) and \(m_0(X) = E[D|X]\), for example LearnerRegrRanger (lrn("regr.ranger")) for regression with random forests based on the ranger package for R.

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn")

# set up an mlr3 learner
learner = lrn("regr.ranger")
ml_l = learner$clone()
ml_m = learner$clone()
set.seed(3141)
data = make_plr_CCDDHNR2018(alpha=0.5, return_type='data.table')
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
dml_plr_obj$fit()
dml_plr_obj$summary()
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d   0.57505    0.04458    12.9   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Without further specification of the hyperparameters, default values are used. To set hyperparameters:

  • We can use pre-parametrized learners, like lrn("regr.ranger", num.trees=10).

  • Alternatively, hyperparameters can be set after initialization via the method set_ml_nuisance_params(learner, treat_var, params, set_fold_specific).

set.seed(3141)
ml_l = lrn("regr.ranger", num.trees=10)
ml_m = lrn("regr.ranger")
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
dml_plr_obj$fit()
dml_plr_obj$summary()

set.seed(3141)
ml_l = lrn("regr.ranger")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l , ml_m)
dml_plr_obj$set_ml_nuisance_params("ml_l", "d", list("num.trees"=10))
dml_plr_obj$fit()
dml_plr_obj$summary()
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d   0.58765    0.04532   12.97   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d   0.58765    0.04532   12.97   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Setting treatment-variable-specific or fold-specific hyperparameters:

  • In the multiple-treatment case, the method set_ml_nuisance_params(learner, treat_var, params, set_fold_specific) can be used to set different hyperparameters for different treatment variables.

  • The method set_ml_nuisance_params(learner, treat_var, params, set_fold_specific) accepts lists for params. The structure of the list depends on whether the same parameters should be provided for all folds or separate values are passed for specific folds.

  • Global parameter passing: The values in params are used for estimation on all folds. The named list in the argument params should have entries with names corresponding to the parameters of the learners. It is required that option set_fold_specific is set to FALSE (default).

  • Fold-specific parameter passing: params is a nested list. The outer list needs to be of length n_rep and the inner list of length n_folds. The innermost list must have named entries that correspond to the parameters of the learner. It is required that option set_fold_specific is set to TRUE. Moreover, fold-specific parameter passing is only supported if all parameters are set fold-specific.

  • Externally set parameters override previously set parameters. To verify the choice of parameters, access the fields $learner and $params.

set.seed(3141)
ml_l = lrn("regr.ranger")
ml_m = lrn("regr.ranger")
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")

n_rep = 2
n_folds = 3
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, n_rep=n_rep, n_folds=n_folds)

# Set globally
params = list("num.trees"=10)
dml_plr_obj$set_ml_nuisance_params("ml_l", "d", params=params)
dml_plr_obj$set_ml_nuisance_params("ml_m", "d", params=params)
dml_plr_obj$learner
dml_plr_obj$params
dml_plr_obj$fit()
dml_plr_obj$summary()
$ml_l
<LearnerRegrRanger:regr.ranger>: Random Forest
* Model: -
* Parameters: num.threads=1
* Packages: mlr3, mlr3learners, ranger
* Predict Types:  [response], se
* Feature Types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, oob_error, weights

$ml_m
<LearnerRegrRanger:regr.ranger>: Random Forest
* Model: -
* Parameters: num.threads=1
* Packages: mlr3, mlr3learners, ranger
* Predict Types:  [response], se
* Feature Types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, oob_error, weights
$ml_l
$d = $num.trees = 10
$ml_m
$d = $num.trees = 10
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d    0.5249     0.0459   11.43   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


The following example illustrates how to set parameters for each fold.

learner = lrn("regr.ranger")
ml_l = learner$clone()
ml_m = learner$clone()
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, n_rep=n_rep, n_folds=n_folds)

# Set values for each fold
params_exact = rep(list(rep(list(params), n_folds)), n_rep)
dml_plr_obj$set_ml_nuisance_params("ml_l", "d", params=params_exact,
                                     set_fold_specific=TRUE)
dml_plr_obj$set_ml_nuisance_params("ml_m", "d", params=params_exact,
                                     set_fold_specific=TRUE)
dml_plr_obj$learner
dml_plr_obj$params
dml_plr_obj$fit()
dml_plr_obj$summary()
$ml_l
<LearnerRegrRanger:regr.ranger>: Random Forest
* Model: -
* Parameters: num.threads=1
* Packages: mlr3, mlr3learners, ranger
* Predict Types:  [response], se
* Feature Types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, oob_error, weights

$ml_m
<LearnerRegrRanger:regr.ranger>: Random Forest
* Model: -
* Parameters: num.threads=1
* Packages: mlr3, mlr3learners, ranger
* Predict Types:  [response], se
* Feature Types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, oob_error, weights
$ml_l
$d =
    1. $num.trees = 10
    2. $num.trees = 10
    3. $num.trees = 10
    1. $num.trees = 10
    2. $num.trees = 10
    3. $num.trees = 10
$ml_m
$d =
    1. $num.trees = 10
    2. $num.trees = 10
    3. $num.trees = 10
    1. $num.trees = 10
    2. $num.trees = 10
    3. $num.trees = 10
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d   0.49098    0.04415   11.12   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


7.2.3. Using pipelines to construct learners

Users can also specify learners that have been constructed from a pipeline using the mlr3pipelines package. In general, pipelines can be used to perform data preprocessing and feature selection, to combine learners and even to perform hyperparameter tuning. In the following, we provide two examples: how to construct a single learner and how to stack different learners via a pipeline. For a more detailed introduction to mlr3pipelines, we refer to the Pipelines Chapter in the mlr3book. Moreover, a notebook on how to use mlr3pipelines in combination with DoubleML is available in the example gallery.

library(mlr3pipelines)

set.seed(3141)
# Define random forest learner in a pipeline
single_learner_pipeline = po("learner", lrn("regr.ranger", num.trees = 10))

# Use pipeline to create a new instance of a learner
# (named ml_l and ml_m to match the nuisance names of DoubleMLPLR)
ml_l = as_learner(single_learner_pipeline)
ml_m = as_learner(single_learner_pipeline)

obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")

n_rep = 2
n_folds = 3
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, n_rep=n_rep, n_folds=n_folds)
dml_plr_obj$learner
dml_plr_obj$fit()
dml_plr_obj$summary()

set.seed(3141)
# Define ensemble learner in a pipeline
ensemble_learner_pipeline = gunion(list(
        po("learner", lrn("regr.cv_glmnet", s = "lambda.min")),
        po("learner", lrn("regr.ranger")),
        po("learner", lrn("regr.rpart", cp = 0.01)))) %>>%
    po("regravg", 3)

# Use pipeline to create a new instance of a learner
# (named ml_l and ml_m to match the nuisance names of DoubleMLPLR)
ml_l = as_learner(ensemble_learner_pipeline)
ml_m = as_learner(ensemble_learner_pipeline)

obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")

n_rep = 2
n_folds = 3
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, n_rep=n_rep, n_folds=n_folds)
dml_plr_obj$learner
dml_plr_obj$fit()
dml_plr_obj$summary()
$ml_l
<GraphLearner:regr.ranger>
* Model: -
* Parameters: regr.ranger.num.threads=1, regr.ranger.num.trees=10
* Validate: NULL
* Packages: mlr3, mlr3pipelines, mlr3learners, ranger
* Predict Types:  [response], se, distr
* Feature Types: logical, integer, numeric, character, factor, ordered,
  POSIXct
* Properties: featureless, hotstart_backward, hotstart_forward,
  importance, loglik, marshal, missings, oob_error, selected_features,
  weights

$ml_m
<GraphLearner:regr.ranger>
* Model: -
* Parameters: regr.ranger.num.threads=1, regr.ranger.num.trees=10
* Validate: NULL
* Packages: mlr3, mlr3pipelines, mlr3learners, ranger
* Predict Types:  [response], se, distr
* Feature Types: logical, integer, numeric, character, factor, ordered,
  POSIXct
* Properties: featureless, hotstart_backward, hotstart_forward,
  importance, loglik, marshal, missings, oob_error, selected_features,
  weights
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d    0.5249     0.0459   11.43   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


$ml_l
<GraphLearner:regr.cv_glmnet.regr.ranger.regr.rpart.regravg>
* Model: -
* Parameters: regr.cv_glmnet.family=gaussian,
  regr.cv_glmnet.s=lambda.min, regr.ranger.num.threads=1,
  regr.rpart.cp=0.01, regr.rpart.xval=0, regravg.weights=1
* Validate: NULL
* Packages: mlr3, mlr3pipelines, mlr3learners, glmnet, ranger, rpart
* Predict Types:  [response], se, distr
* Feature Types: logical, integer, numeric, character, factor, ordered,
  POSIXct
* Properties: featureless, hotstart_backward, hotstart_forward,
  importance, loglik, marshal, missings, oob_error, selected_features,
  weights

$ml_m
<GraphLearner:regr.cv_glmnet.regr.ranger.regr.rpart.regravg>
* Model: -
* Parameters: regr.cv_glmnet.family=gaussian,
  regr.cv_glmnet.s=lambda.min, regr.ranger.num.threads=1,
  regr.rpart.cp=0.01, regr.rpart.xval=0, regravg.weights=1
* Validate: NULL
* Packages: mlr3, mlr3pipelines, mlr3learners, glmnet, ranger, rpart
* Predict Types:  [response], se, distr
* Feature Types: logical, integer, numeric, character, factor, ordered,
  POSIXct
* Properties: featureless, hotstart_backward, hotstart_forward,
  importance, loglik, marshal, missings, oob_error, selected_features,
  weights
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d   0.55173    0.04631   11.91   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


7.2.4. Hyperparameter tuning

Parameter tuning of learners for the nuisance functions of DoubleML models can be done via the tune() method. The tune() method passes various options and parameters to the tuning interface provided by the mlr3tuning package. The mlr3 book provides a step-by-step introduction to parameter tuning.

To illustrate the parameter tuning, we generate data from a sparse partially linear regression model.

library(DoubleML)
library(mlr3)
library(data.table)

set.seed(3141)
n_obs = 200
n_vars = 200
theta = 3
X = matrix(stats::rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars)
d = X[, 1:3, drop = FALSE] %*% c(5, 5, 5) + stats::rnorm(n_obs)
y = theta * d + X[, 1:3, drop = FALSE] %*% c(5, 5, 5)  + stats::rnorm(n_obs)
dml_data = double_ml_data_from_matrix(X = X, y = y, d = d)

Hyperparameter tuning is controlled by options passed through a named list tune_settings. The entries in the list specify options for parameter tuning with mlr3tuning:

  • terminator is a Terminator object passed to mlr3tuning that manages the budget to solve the tuning problem.

  • algorithm is an object of class Tuner and specifies the tuning algorithm. Alternatively, algorithm can be a character() that is used as an argument in the wrapper mlr3tuning call tnr(algorithm). The corresponding chapter in the mlr3book illustrates how the Tuner class supports grid search, random search, generalized simulated annealing and non-linear optimization.

  • rsmp_tune is an object of class mlr3 resampling that specifies the resampling method for evaluation; for example, rsmp("cv", folds = 5) implements 5-fold cross-validation, and rsmp("holdout", ratio = 0.8) implements an evaluation based on a hold-out sample that contains 20 percent of the observations (the remaining 80 percent are used for training). By default, 5-fold cross-validation is performed.

  • measure is a named list containing the measures used for tuning of the nuisance components. The names of the entries must match the learner names (see method learner_names()). The entries in the list must either be objects of class Measure or keys passed to msr(). If measure is not provided by the user, default measures are used, i.e., mean squared error for regression models and classification error for binary outcomes.

In the following example, we tune the penalty parameter \(\lambda\) (lambda) for lasso with the R package glmnet. To tune the value of lambda, a grid search is performed over values ranging from 0.05 to 0.1 with a resolution of 10, i.e., over 10 equally spaced values from a minimum of 0.05 to a maximum of 0.1. To evaluate the predictive performance in both nuisance parts, the cross-validated mean squared error is used.
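As a worked check of this grid, assuming (as stated above) that both endpoints are included, a resolution of 10 on the interval from 0.05 to 0.1 corresponds to the candidate values

\[
\lambda_k = 0.05 + k \cdot \frac{0.1 - 0.05}{10 - 1}, \qquad k = 0, 1, \ldots, 9,
\]

so consecutive candidates are \(0.05/9 \approx 0.0056\) apart. For example, \(\lambda_5 = 0.05 + 5 \cdot 0.05/9 \approx 0.0778\), which matches the value 0.0777777777777778 selected for one of the folds in the tuning output below.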

Setting the option tune_on_folds=FALSE, the tuning is performed on the whole sample. Hence, the cross-validated errors are obtained from a random split of the whole sample into 5 folds. As a result, one set of lambda values is obtained, which is later used in the fitting stage for all folds.

Alternatively, setting the option tune_on_folds=TRUE would assign the tuning resampling scheme rsmp_tune to each fold. For example, if we set n_folds=2 at initialization of the DoubleMLPLR object and use a 5-fold cross-validated error for tuning, each of the two folds would be split up into 5 subfolds and the error would be evaluated on these subfolds.

library(DoubleML)
library(mlr3)
library(data.table)
library(mlr3learners)
library(mlr3tuning)
library(paradox)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

set.seed(1234)
ml_l = lrn("regr.glmnet")
ml_m = lrn("regr.glmnet")
dml_plr_obj = DoubleMLPLR$new(dml_data, ml_l, ml_m)

par_grids = list(
  "ml_l" = ps(lambda = p_dbl(lower = 0.05, upper = 0.1)),
  "ml_m" = ps(lambda = p_dbl(lower = 0.05, upper = 0.1)))

tune_settings = list(terminator = trm("evals", n_evals = 100),
                      algorithm = tnr("grid_search", resolution = 10),
                      rsmp_tune = rsmp("cv", folds = 5),
                      measure = list("ml_l" = msr("regr.mse"),
                                     "ml_m" = msr("regr.mse")))
dml_plr_obj$tune(param_set=par_grids, tune_settings=tune_settings, tune_on_folds=TRUE)
dml_plr_obj$params

dml_plr_obj$fit()
dml_plr_obj$summary()
$ml_l
$d =
    1. $family
      'gaussian'
      $lambda
      0.1
    2. $family
      'gaussian'
      $lambda
      0.0777777777777778
    3. $family
      'gaussian'
      $lambda
      0.1
    4. $family
      'gaussian'
      $lambda
      0.1
    5. $family
      'gaussian'
      $lambda
      0.1
$ml_m
$d =
    1. $family
      'gaussian'
      $lambda
      0.1
    2. $family
      'gaussian'
      $lambda
      0.1
    3. $family
      'gaussian'
      $lambda
      0.1
    4. $family
      'gaussian'
      $lambda
      0.1
    5. $family
      'gaussian'
      $lambda
      0.1
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d    3.0425     0.1424   21.37   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Hyperparameter tuning can also be done with more sophisticated methods, for example by using built-in tuning paths of learners. For example, the learner regr.cv_glmnet performs an internal cross-validated choice of the parameter lambda. Alternatively, the powerful functionalities of the mlr3tuning package can be used for external parameter tuning of the nuisance parts. The optimally chosen parameters can then be passed to the DoubleML models using the set_ml_nuisance_params() method.

library(DoubleML)
library(mlr3)
library(data.table)
library(mlr3learners)
library(mlr3tuning)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

set.seed(1234)
ml_l = lrn("regr.cv_glmnet", s="lambda.min")
ml_m = lrn("regr.cv_glmnet", s="lambda.min")
dml_plr_obj = DoubleMLPLR$new(dml_data, ml_l, ml_m)

dml_plr_obj$fit()
dml_plr_obj$summary()
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d   3.08848    0.07366   41.93   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
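regr.cv_glmnet wraps glmnet's cv.glmnet, which fits the lasso along a path of lambda values, estimates the prediction error of each value by K-fold cross-validation, and with s = "lambda.min" uses the value with the smallest mean cross-validated error for a fit on the full data. The Python sketch below illustrates only this selection rule, under simplifying assumptions: a single predictor without intercept or standardization (so the lasso has a closed-form soft-thresholding solution), 5 contiguous folds (cv.glmnet defaults to 10 randomly assigned folds), and a hypothetical lambda path and data-generating process. The function names are illustrative and not part of glmnet or DoubleML.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_1d(xs, ys, lam):
    """Lasso coefficient for one predictor without intercept, minimizing
    (1 / (2n)) * sum((y - b * x)^2) + lam * |b|."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sxx = sum(x * x for x in xs) / n
    return soft_threshold(sxy, lam) / sxx


def cv_error(xs, ys, lam, n_folds=5):
    """Mean squared prediction error of lasso_1d over K contiguous folds."""
    n = len(xs)
    fold_errors = []
    for k in range(n_folds):
        test = set(range(k * n // n_folds, (k + 1) * n // n_folds))
        x_tr = [xs[i] for i in range(n) if i not in test]
        y_tr = [ys[i] for i in range(n) if i not in test]
        b = lasso_1d(x_tr, y_tr, lam)
        fold_errors.append(sum((ys[i] - b * xs[i]) ** 2 for i in test) / len(test))
    return sum(fold_errors) / n_folds


# Hypothetical data: y = 0.5 * x + standard normal noise
rng = random.Random(1234)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

# Hypothetical lambda path (cv.glmnet constructs its own path automatically)
path = [0.0, 0.01, 0.05, 0.1, 0.2, 0.5, 1.0]
errors = {lam: cv_error(xs, ys, lam) for lam in path}
lambda_min = min(errors, key=errors.get)  # analogue of s = "lambda.min"
print("lambda.min:", lambda_min, "coefficient:", lasso_1d(xs, ys, lambda_min))
```

Since the penalty is chosen inside the learner, no separate call to tune() is needed in this variant.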


The following code chunk illustrates another example of global parameter tuning (tune_on_folds = FALSE), i.e., the parameters are tuned once on the full sample rather than separately on each cross-fitting fold. Here we use random forests as provided by the ranger package and run a random search with 20 evaluations to find optimal values of the parameters mtry and max.depth. Evaluation is based on 3-fold cross-validation.

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
library(mlr3tuning)
library(paradox)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

# set up a mlr3 learner
learner = lrn("regr.ranger")
ml_l = learner$clone()
ml_m = learner$clone()

set.seed(3141)
obj_dml_data = make_plr_CCDDHNR2018(alpha=0.5)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)

# set up a list of parameter grids
param_grid = list("ml_l" = ps(mtry = p_int(lower = 2 , upper = 20),
                              max.depth = p_int(lower = 2, upper = 5)),
                  "ml_m" = ps(mtry = p_int(lower = 2 , upper = 20),
                              max.depth = p_int(lower = 2, upper = 5)))

tune_settings = list(terminator = mlr3tuning::trm("evals", n_evals = 20),
                      algorithm = tnr("random_search"),
                      rsmp_tune = rsmp("cv", folds = 3),
                      measure = list("ml_l" = msr("regr.mse"),
                                     "ml_m" = msr("regr.mse")))
dml_plr_obj$tune(param_set=param_grid, tune_settings=tune_settings, tune_on_folds=FALSE)
dml_plr_obj$params

dml_plr_obj$fit()
dml_plr_obj$summary()
$ml_l
$d =
$num.threads
1
$mtry
10
$max.depth
5
$ml_m
$d =
$num.threads
1
$mtry
17
$max.depth
3
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d   0.55348    0.04559   12.14   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
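The random_search tuner draws candidate configurations uniformly from the search space, scores each one by resampling (here 3-fold cross-validation with regr.mse) and stops when the terminator trm("evals", n_evals = 20) is reached; the best configuration is kept. Since the tuning is global, a single parameter set per nuisance function is reported above. The Python sketch below mirrors this loop for the same integer ranges. The objective toy_cv_mse is a hypothetical stand-in for the cross-validated MSE of a random forest and is not part of DoubleML or mlr3tuning.

```python
import random

# Search space from the R example: inclusive integer ranges per parameter
SEARCH_SPACE = {"mtry": (2, 20), "max.depth": (2, 5)}


def toy_cv_mse(params):
    """Hypothetical stand-in for a 3-fold cross-validated MSE.

    In the R example this value comes from resampling regr.ranger; here it
    is an arbitrary function so that the search loop can be run.
    """
    return (params["mtry"] - 10) ** 2 / 100 + (params["max.depth"] - 4) ** 2 / 10


def random_search(objective, space, n_evals, seed):
    rng = random.Random(seed)
    history = []
    for _ in range(n_evals):  # terminator: stop after n_evals evaluations
        # Draw each integer parameter uniformly from its [lower, upper] range
        cand = {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}
        history.append((objective(cand), cand))
    best_score, best_params = min(history, key=lambda item: item[0])
    return best_params, best_score, history


best_params, best_score, history = random_search(
    toy_cv_mse, SEARCH_SPACE, n_evals=20, seed=3141
)
print(best_params, best_score)
```

In contrast to grid search, the number of evaluations is fixed by the terminator rather than by a grid resolution, so candidates can repeat and the optimum of the search space is not guaranteed to be visited.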


7.2.5. Hyperparameter tuning with pipelines

As an alternative to the previously presented tuning approach, the parameter tuning can be based on a pipeline as provided by the mlr3pipelines package. The basic idea of this approach is to define a learner via a pipeline and then perform the tuning via the tune() method. We briefly repeat the lasso example from above. In general, the pipeline-based approach can be used to find optimal values not only for the parameters of one or multiple learners, but also for other parameters, for example those involved in data preprocessing. For more details, we refer to the Pipelines chapter in the mlr3book.

library(DoubleML)
library(mlr3)
library(mlr3tuning)
library(mlr3pipelines)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

# Define learner in a pipeline
set.seed(1234)
lasso_pipe = po("learner",
    learner = lrn("regr.glmnet"))
ml_l = as_learner(lasso_pipe)
ml_m = as_learner(lasso_pipe)

# Instantiate a DoubleML object
dml_plr_obj = DoubleMLPLR$new(dml_data, ml_l, ml_m)

# Parameter grid for lambda
par_grids = ps(regr.glmnet.lambda = p_dbl(lower = 0.05, upper = 0.1))

tune_settings = list(terminator = trm("evals", n_evals = 100),
                     algorithm = tnr("grid_search", resolution = 10),
                     rsmp_tune = rsmp("cv", folds = 5),
                     measure = list("ml_l" = msr("regr.mse"),
                                    "ml_m" = msr("regr.mse")))
dml_plr_obj$tune(param_set = list("ml_l" = par_grids,
                                  "ml_m" = par_grids),
                 tune_settings = tune_settings,
                 tune_on_folds = TRUE)
dml_plr_obj$fit()
dml_plr_obj$summary()
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d    3.0425     0.1424   21.37   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
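Wrapping a learner with po("learner", ...) exposes its hyperparameters under the id of the pipeline step as a prefix; the step id defaults to the learner id, which is why the search space above refers to regr.glmnet.lambda rather than lambda. The short Python sketch below only illustrates this naming scheme on plain dictionaries, including a hypothetical preprocessing step whose parameters could be tuned alongside the learner. The function prefixed_params and the "scale" step are illustrative assumptions, not part of mlr3pipelines.

```python
def prefixed_params(step_id, params):
    """Namespace each hyperparameter with the id of the pipeline step it belongs to."""
    return {f"{step_id}.{name}": value for name, value in params.items()}


# Hypothetical two-step pipeline: a preprocessing step followed by the lasso learner
pipeline = {
    "scale": {"center": True},
    "regr.glmnet": {"lambda": 0.1, "alpha": 1},
}

flat = {}
for step_id, params in pipeline.items():
    flat.update(prefixed_params(step_id, params))
print(flat)
# {'scale.center': True, 'regr.glmnet.lambda': 0.1, 'regr.glmnet.alpha': 1}
```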


7.2.5.1. References

  • Lang, M., Binder, M., Richter, J., Schratz, P., Pfisterer, F., Coors, S., Au, Q., Casalicchio, G., Kotthoff, L., Bischl, B. (2019), mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, doi:10.21105/joss.01903.

  • Becker, M., Binder, M., Bischl, B., Lang, M., Pfisterer, F., Reich, N.G., Richter, J., Schratz, P., Sonabend, R. (2020), mlr3 book, available at https://mlr3book.mlr-org.com.