7. Learners, hyperparameters and hyperparameter tuning
The estimation of a double/debiased machine learning model involves the estimation of several nuisance functions with machine learning estimators. Such learners are implemented in various Python and R packages. The implementation of DoubleML is based on the meta-packages scikit-learn for Python and mlr3 for R. The interfaces for specifying learners, setting hyperparameters and tuning hyperparameters are described in the following, separately for Python and R.
7.1. Python: Learners and hyperparameters
7.1.1. Minimum requirements for learners
The minimum requirements for a learner to be used for nuisance models in the DoubleML package are:
The implementation of a fit() and a predict() method. Some models, like doubleml.DoubleMLIRM and doubleml.DoubleMLIIVM, require classifiers.
In case of classifiers, the learner needs to come with a predict_proba() method instead of, or in addition to, a predict() method; see for example sklearn.ensemble.RandomForestClassifier.predict_proba().
In order to be able to use the set_ml_nuisance_params() method of DoubleML classes, the learner additionally needs to come with a set_params() method; see for example sklearn.ensemble.RandomForestRegressor.set_params().
We further rely on the function sklearn.base.clone(), which adds the requirement of a get_params() method for a learner to be used for nuisance models of DoubleML model classes.
Most learners from scikit-learn satisfy all these minimum requirements.
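The requirements above can be summarized as a simple interface check. The following is a minimal sketch in plain Python; the helper missing_learner_methods and the toy class MeanRegressor are hypothetical names introduced here for illustration and are not part of DoubleML or scikit-learn. The sketch only checks that the required methods exist; it does not verify that DoubleML accepts the learner.

```python
# Hypothetical helper (not part of DoubleML): lists which of the documented
# minimum requirements a learner object does not satisfy.
def missing_learner_methods(learner, classifier=False):
    """Return the names of required methods that `learner` does not provide."""
    required = ["fit", "get_params", "set_params"]
    # Classifiers need predict_proba(); regressors need predict().
    required.append("predict_proba" if classifier else "predict")
    return [name for name in required
            if not callable(getattr(learner, name, None))]


class MeanRegressor:
    """Toy regressor (illustration only) that predicts the training mean of y."""

    def __init__(self, shrink=1.0):
        self.shrink = shrink

    def fit(self, X, y):
        self.mean_ = self.shrink * sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def get_params(self, deep=True):
        return {"shrink": self.shrink}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


print(missing_learner_methods(MeanRegressor()))                   # []
print(missing_learner_methods(MeanRegressor(), classifier=True))  # ['predict_proba']
```

In practice, deriving a custom learner from scikit-learn's base classes is usually the more convenient route, since get_params() and set_params() are then provided automatically.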
7.1.2. Specifying learners and setting hyperparameters
The learners are set during initialization of the DoubleML model classes doubleml.DoubleMLPLR, doubleml.DoubleMLPLIV, doubleml.DoubleMLIRM and doubleml.DoubleMLIIVM.
Let's simulate some data and consider the partially linear regression model.
We need to specify learners for the nuisance functions \(\ell_0(X) = E[Y|X]\) and \(m_0(X) = E[D|X]\), for example sklearn.ensemble.RandomForestRegressor.
In [1]: import numpy as np
....: import doubleml as dml
In [2]: from doubleml.datasets import make_plr_CCDDHNR2018
In [3]: from sklearn.ensemble import RandomForestRegressor
In [4]: np.random.seed(1234)
In [5]: ml_l = RandomForestRegressor()
In [6]: ml_m = RandomForestRegressor()
In [7]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')
In [8]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
In [9]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
In [10]: dml_plr_obj.fit().summary
Out[10]:
coef std err t P>|t| 2.5 % 97.5 %
d 0.503504 0.045993 10.947466 6.833227e-28 0.41336 0.593648
Without further specification of the hyperparameters, default values are used. There are two ways to set hyperparameters:
We can use pre-parametrized learners, like RandomForestRegressor(n_estimators=10).
Alternatively, hyperparameters can be set after initialization via the method set_ml_nuisance_params(learner, treat_var, params).
In [11]: np.random.seed(1234)
In [12]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data,
....: RandomForestRegressor(n_estimators=10),
....: RandomForestRegressor())
....:
In [13]: print(dml_plr_obj.fit().summary)
coef std err t P>|t| 2.5 % 97.5 %
d 0.53257 0.046922 11.350165 7.402301e-30 0.440605 0.624535
In [14]: np.random.seed(1234)
In [15]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data,
....: RandomForestRegressor(),
....: RandomForestRegressor())
....:
In [16]: dml_plr_obj.set_ml_nuisance_params('ml_l', 'd', {'n_estimators': 10});
In [17]: print(dml_plr_obj.fit().summary)
coef std err t P>|t| 2.5 % 97.5 %
d 0.53257 0.046922 11.350165 7.402301e-30 0.440605 0.624535
Setting treatment-variable-specific or fold-specific hyperparameters:
In the multiple-treatment case, the method set_ml_nuisance_params(learner, treat_var, params) can be used to set different hyperparameters for different treatment variables.
The method set_ml_nuisance_params(learner, treat_var, params) accepts dicts and lists for params. A dict should be provided if the same hyperparameters should be used for each fold. Fold-specific parameters are also supported: provide a nested list as params, where the outer list is of length n_rep and the inner list is of length n_folds.
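The nested-list layout for fold-specific hyperparameters can be sketched in plain Python as follows. The values of n_rep and n_folds and the increasing n_estimators values are assumptions chosen for illustration only; the final call is shown as a comment because it requires a fitted-out DoubleML object created with matching n_rep and n_folds.

```python
# Sketch: build hyperparameter containers for set_ml_nuisance_params().
n_rep = 2     # number of repeated sample splits (assumed for illustration)
n_folds = 3   # number of folds per sample split (assumed for illustration)

# Same hyperparameters for every fold: a plain dict is enough.
global_params = {"n_estimators": 10}

# Fold-specific hyperparameters: the outer list has length n_rep,
# each inner list has length n_folds, and each entry is a dict.
fold_params = [
    [{"n_estimators": 10 * (fold + 1)} for fold in range(n_folds)]
    for _ in range(n_rep)
]

print(fold_params[0])
# [{'n_estimators': 10}, {'n_estimators': 20}, {'n_estimators': 30}]

# With a model created as dml.DoubleMLPLR(..., n_folds=n_folds, n_rep=n_rep),
# the nested list would then be passed as:
# dml_plr_obj.set_ml_nuisance_params('ml_l', 'd', fold_params)
```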
7.1.3. Hyperparameter tuning
Parameter tuning of learners for the nuisance functions of DoubleML models can be done via the tune() method.
To illustrate the parameter tuning, we generate data from a sparse partially linear regression model.
In [18]: import doubleml as dml
In [19]: import numpy as np
In [20]: np.random.seed(3141)
In [21]: n_obs = 200
In [22]: n_vars = 200
In [23]: theta = 3
In [24]: X = np.random.normal(size=(n_obs, n_vars))
In [25]: d = np.dot(X[:, :3], np.array([5, 5, 5])) + np.random.standard_normal(size=(n_obs,))
In [26]: y = theta * d + np.dot(X[:, :3], np.array([5, 5, 5])) + np.random.standard_normal(size=(n_obs,))
In [27]: dml_data = dml.DoubleMLData.from_arrays(X, y, d)
Hyperparameter tuning is performed using either an exhaustive search over specified parameter values, implemented in sklearn.model_selection.GridSearchCV, or a randomized search, implemented in sklearn.model_selection.RandomizedSearchCV.
In [28]: import doubleml as dml
In [29]: from sklearn.linear_model import Lasso
In [30]: ml_l = Lasso()
In [31]: ml_m = Lasso()
In [32]: dml_plr_obj = dml.DoubleMLPLR(dml_data, ml_l, ml_m)
In [33]: par_grids = {'ml_l': {'alpha': np.arange(0.05, 1., 0.1)},
....: 'ml_m': {'alpha': np.arange(0.05, 1., 0.1)}}
....:
In [34]: dml_plr_obj.tune(par_grids, search_mode='grid_search');
In [35]: print(dml_plr_obj.params)
{'ml_l': {'d': [[{'alpha': np.float64(0.45000000000000007)}, {'alpha': np.float64(0.45000000000000007)}, {'alpha': np.float64(0.45000000000000007)}, {'alpha': np.float64(0.45000000000000007)}, {'alpha': np.float64(0.45000000000000007)}]]}, 'ml_m': {'d': [[{'alpha': np.float64(0.15000000000000002)}, {'alpha': np.float64(0.15000000000000002)}, {'alpha': np.float64(0.15000000000000002)}, {'alpha': np.float64(0.15000000000000002)}, {'alpha': np.float64(0.15000000000000002)}]]}}
In [36]: print(dml_plr_obj.fit().summary)
coef std err t P>|t| 2.5 % 97.5 %
d 3.031134 0.071777 42.229759 0.0 2.890454 3.171815
In [37]: np.random.seed(1234)
In [38]: par_grids = {'ml_l': {'alpha': np.arange(0.05, 1., 0.01)},
....: 'ml_m': {'alpha': np.arange(0.05, 1., 0.01)}}
....:
In [39]: dml_plr_obj.tune(par_grids, search_mode='randomized_search', n_iter_randomized_search=20);
In [40]: print(dml_plr_obj.params)
{'ml_l': {'d': [[{'alpha': np.float64(0.4000000000000001)}, {'alpha': np.float64(0.4000000000000001)}, {'alpha': np.float64(0.4000000000000001)}, {'alpha': np.float64(0.4000000000000001)}, {'alpha': np.float64(0.4000000000000001)}]]}, 'ml_m': {'d': [[{'alpha': np.float64(0.09000000000000001)}, {'alpha': np.float64(0.09000000000000001)}, {'alpha': np.float64(0.09000000000000001)}, {'alpha': np.float64(0.09000000000000001)}, {'alpha': np.float64(0.09000000000000001)}]]}}
In [41]: print(dml_plr_obj.fit().summary)
coef std err t P>|t| 2.5 % 97.5 %
d 2.96582 0.086679 34.216207 1.388216e-256 2.795932 3.135707
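The printed params attribute above follows the layout params[learner][treat_var][i_rep][i_fold], with one dict of tuned values per fold. As a small sketch of how to read it, the snippet below rebuilds that structure by hand (values shortened to plain floats) and extracts the tuned alpha per repetition and fold; the helper tuned_values is a hypothetical name, not a DoubleML function.

```python
# Structure copied from the printed `dml_plr_obj.params` above, with the
# values shortened: params[learner][treat_var][i_rep][i_fold] is a dict.
params = {
    "ml_l": {"d": [[{"alpha": 0.40}] * 5]},
    "ml_m": {"d": [[{"alpha": 0.09}] * 5]},
}


def tuned_values(params, learner, treat_var, name):
    """Collect one hyperparameter per (repetition, fold) as a nested list."""
    return [[fold[name] for fold in rep] for rep in params[learner][treat_var]]


alphas_l = tuned_values(params, "ml_l", "d", "alpha")
print(len(alphas_l), len(alphas_l[0]))  # 1 5  (one repetition, five folds)
print(alphas_l[0])                      # [0.4, 0.4, 0.4, 0.4, 0.4]
```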
Hyperparameter tuning can also be done with more sophisticated methods, for example an iterative fitting along a regularization path as implemented in sklearn.linear_model.LassoCV.
In this case, the tuning should be done externally and the parameters can then be set via the set_ml_nuisance_params() method.
In [42]: import doubleml as dml
In [43]: from sklearn.linear_model import LassoCV
In [44]: np.random.seed(1234)
In [45]: ml_l_tune = LassoCV().fit(dml_data.x, dml_data.y)
In [46]: ml_m_tune = LassoCV().fit(dml_data.x, dml_data.d)
In [47]: ml_l = Lasso()
In [48]: ml_m = Lasso()
In [49]: dml_plr_obj = dml.DoubleMLPLR(dml_data, ml_l, ml_m)
In [50]: dml_plr_obj.set_ml_nuisance_params('ml_l', 'd', {'alpha': ml_l_tune.alpha_});
In [51]: dml_plr_obj.set_ml_nuisance_params('ml_m', 'd', {'alpha': ml_m_tune.alpha_});
In [52]: print(dml_plr_obj.params)
{'ml_l': {'d': [[{'alpha': np.float64(0.4311947070055128)}, {'alpha': np.float64(0.4311947070055128)}, {'alpha': np.float64(0.4311947070055128)}, {'alpha': np.float64(0.4311947070055128)}, {'alpha': np.float64(0.4311947070055128)}]]}, 'ml_m': {'d': [[{'alpha': np.float64(0.14281403493938022)}, {'alpha': np.float64(0.14281403493938022)}, {'alpha': np.float64(0.14281403493938022)}, {'alpha': np.float64(0.14281403493938022)}, {'alpha': np.float64(0.14281403493938022)}]]}}
In [53]: print(dml_plr_obj.fit().summary)
coef std err t P>|t| 2.5 % 97.5 %
d 3.048723 0.075869 40.183855 0.0 2.900021 3.197424
7.1.4. Evaluate learners
To compare different learners, it is possible to evaluate the out-of-sample performance of each learner. The summary already displays either the root-mean-squared error (for regressions) or the log-loss (for classifications) for each learner and each corresponding repetition of cross-fitting (n_rep argument).
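For reference, the textbook definitions of the two measures on \(n\) held-out observations are given below, with predictions \(\hat{y}_i\) for regressions and predicted probabilities \(\hat{p}_i\) for binary classifications. Whether DoubleML applies exactly these formulas in every model class is an assumption here and is not stated in this section.

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{log\text{-}loss} = -\frac{1}{n}\sum_{i=1}^{n} \left[\, y_i \log \hat{p}_i + (1 - y_i) \log\left(1 - \hat{p}_i\right) \right].
\]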
To illustrate the evaluation of learners, we work with the following example.
In [54]: import doubleml as dml
In [55]: from doubleml.datasets import make_plr_CCDDHNR2018
In [56]: from sklearn.ensemble import RandomForestRegressor
In [57]: np.random.seed(1234)
In [58]: ml_l = RandomForestRegressor()
In [59]: ml_m = RandomForestRegressor()
In [60]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')
In [61]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
In [62]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
In [63]: dml_plr_obj.fit()
Out[63]: <doubleml.plm.plr.DoubleMLPLR at 0x7fb2035ee510>
In [64]: print(dml_plr_obj)
================== DoubleMLPLR Object ==================
------------------ Data summary ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500
------------------ Score & algorithm ------------------
Score function: partialling out
------------------ Machine learner ------------------
Learner ml_l: RandomForestRegressor()
Learner ml_m: RandomForestRegressor()
Out-of-sample Performance:
Regression:
Learner ml_l RMSE: [[1.17385178]]
Learner ml_m RMSE: [[1.03244552]]
------------------ Resampling ------------------
No. folds: 5
No. repeated sample splits: 1
------------------ Fit summary ------------------
coef std err t P>|t| 2.5 % 97.5 %
d 0.503504 0.045993 10.947466 6.833227e-28 0.41336 0.593648
The losses of each learner are also stored in the nuisance_loss attribute.
Further, the evaluate_learners() method allows evaluating customized metrics, e.g. the mean absolute error.
The default evaluation metric is still the root-mean-squared error.
In [65]: print(dml_plr_obj.nuisance_loss)
{'ml_l': array([[1.17385178]]), 'ml_m': array([[1.03244552]])}
In [66]: print(dml_plr_obj.evaluate_learners())
{'ml_l': array([[1.17385178]]), 'ml_m': array([[1.03244552]])}
To evaluate a customized metric, one has to define a callable. For some models (e.g. the IRM model) it is important that the metric can handle nan values, as not all target values are known.
In [67]: from sklearn.metrics import mean_absolute_error
In [68]: def mae(y_true, y_pred):
....: subset = np.logical_not(np.isnan(y_true))
....: return mean_absolute_error(y_true[subset], y_pred[subset])
....:
In [69]: dml_plr_obj.evaluate_learners(learners=['ml_l'], metric=mae)
Out[69]: {'ml_l': array([[0.95559917]])}
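To see what the nan handling does in isolation, the following sketch defines a second nan-aware metric and applies it to made-up toy arrays. The function name nan_rmse and the toy values are assumptions for illustration; only the (y_true, y_pred) call signature is taken from the example above.

```python
import numpy as np


def nan_rmse(y_true, y_pred):
    """Root-mean-squared error over the entries where y_true is observed."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    observed = ~np.isnan(y_true)  # mask out unknown target values
    return float(np.sqrt(np.mean((y_true[observed] - y_pred[observed]) ** 2)))


# Toy targets with two unknown entries (values made up for illustration).
y_true = np.array([1.0, np.nan, 3.0, np.nan])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])
print(nan_rmse(y_true, y_pred))  # 0.5

# Such a callable could then be passed in the same way as `mae` above:
# dml_plr_obj.evaluate_learners(metric=nan_rmse)
```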
A more detailed notebook on the choice of learners is available in the example gallery.
7.1.5. Advanced: External Predictions
Since there might be cases where the user wants to use a learner that is not supported by DoubleML or to do some extensive hyperparameter tuning, it is possible to use external predictions for the nuisance functions. Note that this requires the user to take care of the cross-fitting procedure and learner evaluation.
To illustrate the use of external predictions, we work with the following example.
In [70]: import numpy as np
In [71]: import doubleml as dml
In [72]: from doubleml.datasets import make_irm_data
In [73]: from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
In [74]: np.random.seed(3333)
In [75]: data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
In [76]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
# DoubleML with internal predictions
In [77]: ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
In [78]: ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
In [79]: dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
In [80]: dml_irm_obj.fit()
Out[80]: <doubleml.irm.irm.DoubleMLIRM at 0x7fb20369a750>
In [81]: print(dml_irm_obj.summary)
coef std err t P>|t| 2.5 % 97.5 %
d 0.050856 0.664276 0.076559 0.938975 -1.251101 1.352813
The doubleml.DoubleMLIRM model class saves nuisance predictions in the predictions attribute as a nested dictionary.
To rely on external predictions, the user has to provide a nested dictionary, where the outer-level keys correspond to the treatment variable names and the inner-level keys correspond to the nuisance learner names. Further, the values have to be numpy arrays of shape (n_obs, n_rep). Here we generate an external predictions dictionary from the internal predictions attribute.
In [82]: pred_dict = {"d": {
....: "ml_g0": dml_irm_obj.predictions["ml_g0"][:, :, 0],
....: "ml_g1": dml_irm_obj.predictions["ml_g1"][:, :, 0],
....: "ml_m": dml_irm_obj.predictions["ml_m"][:, :, 0]
....: }
....: }
....:
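When the predictions come from the user's own cross-fitting code rather than from an existing DoubleML object, it can help to check the layout before calling fit(). The following sketch builds a dictionary with placeholder arrays and validates it; check_external_predictions is a hypothetical helper, not a DoubleML function, and the constant predictions are made up.

```python
import numpy as np

n_obs, n_rep = 500, 1  # as in the example above: 500 observations, one repetition

# Placeholder predictions (made-up constants); in practice these would come
# from the user's own cross-fitted learners.
pred_dict_manual = {
    "d": {
        "ml_g0": np.zeros((n_obs, n_rep)),
        "ml_g1": np.ones((n_obs, n_rep)),
        "ml_m": np.full((n_obs, n_rep), 0.5),
    }
}


def check_external_predictions(pred_dict, n_obs, n_rep):
    """Hypothetical helper: verify the nested layout and the array shapes."""
    for treat_var, learners in pred_dict.items():
        for learner, preds in learners.items():
            if not isinstance(preds, np.ndarray) or preds.shape != (n_obs, n_rep):
                raise ValueError(
                    f"{treat_var}/{learner}: expected an array of shape {(n_obs, n_rep)}"
                )
    return True


print(check_external_predictions(pred_dict_manual, n_obs, n_rep))  # True
```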
The external predictions can be passed to the fit() method of the doubleml.DoubleML class via the external_predictions argument.
In [83]: ml_g = dml.utils.DMLDummyRegressor()
In [84]: ml_m = dml.utils.DMLDummyClassifier()
In [85]: dml_irm_obj_ext = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
In [86]: dml_irm_obj_ext.fit(external_predictions=pred_dict)
Out[86]: <doubleml.irm.irm.DoubleMLIRM at 0x7fb2087c2870>
In [87]: print(dml_irm_obj_ext.summary)
coef std err t P>|t| 2.5 % 97.5 %
d 0.050856 0.664276 0.076559 0.938975 -1.251101 1.352813
Both models have identical estimates. Note that doubleml.DoubleML classes usually require learners for initialization.
With external predictions, these learners are not used. The DMLDummyRegressor and DMLDummyClassifier are dummy learners which are used to initialize the doubleml.DoubleML class. Both dummy learners raise errors if specific methods are called, to safeguard against undesired behavior. Further, the doubleml.DoubleMLData class requires features (e.g. via the x_cols argument), which are not used.
This can be handled by adding a dummy column to the data.
7.2. R: Learners and hyperparameters
7.2.1. Minimum requirements for learners
The minimum requirement for a learner to be used for nuisance models in the DoubleML package is its implementation as a learner for regression or classification in the mlr3 package or its extension packages mlr3learners and mlr3extralearners. A guide on how to add a learner is provided in the chapter on extending learners in the mlr3 book.
The mlr3 package makes sure that the learners satisfy some core functionalities. To specify a specific learner in DoubleML users can pass objects of the class Learner. A fast way to construct these objects is to use the mlr3 function lrn(). An introduction to learners in mlr3 is provided in the chapter on learners of the mlr3 book.
It is also possible to pass learners that have been constructed from a pipeline with the mlr3pipelines package.
The models DoubleML::DoubleMLIRM and DoubleML::DoubleMLIIVM require classifiers. Users can also specify classifiers in the DoubleML::DoubleMLPLR in cases with binary treatment variables.
Hyperparameters of learners can either be set at instantiation in mlr3 or after instantiation using the set_ml_nuisance_params() method.
An interactive list of provided learners in the mlr3 and extension packages can be found on the website of the mlr3extralearners package.
7.2.2. Specifying learners and setting hyperparameters
The learners are set during initialization of the DoubleML model classes DoubleML::DoubleMLPLR, DoubleML::DoubleMLPLIV, DoubleML::DoubleMLIRM and DoubleML::DoubleMLIIVM.
Let's simulate some data and consider the partially linear regression model.
We need to specify learners for the nuisance functions \(\ell_0(X) = E[Y|X]\) and \(m_0(X) = E[D|X]\), for example LearnerRegrRanger (lrn("regr.ranger")) for regression with random forests based on the ranger package for R.
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn")
# set up a mlr3 learner
learner = lrn("regr.ranger")
ml_l = learner$clone()
ml_m = learner$clone()
set.seed(3141)
data = make_plr_CCDDHNR2018(alpha=0.5, return_type='data.table')
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
dml_plr_obj$fit()
dml_plr_obj$summary()
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 0.5748 0.0445 12.92 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Without further specification of the hyperparameters, default values are used. There are two ways to set hyperparameters:
We can use pre-parametrized learners, like lrn("regr.ranger", num.trees=10).
Alternatively, hyperparameters can be set after initialization via the method set_ml_nuisance_params(learner, treat_var, params, set_fold_specific).
set.seed(3141)
ml_l = lrn("regr.ranger", num.trees=10)
ml_m = lrn("regr.ranger")
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
dml_plr_obj$fit()
dml_plr_obj$summary()
set.seed(3141)
ml_l = lrn("regr.ranger")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l , ml_m)
dml_plr_obj$set_ml_nuisance_params("ml_l", "d", list("num.trees"=10))
dml_plr_obj$fit()
dml_plr_obj$summary()
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 0.58812 0.04502 13.06 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 0.58812 0.04502 13.06 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Setting treatment-variable-specific or fold-specific hyperparameters:
In the multiple-treatment case, the method set_ml_nuisance_params(learner, treat_var, params, set_fold_specific) can be used to set different hyperparameters for different treatment variables.
The method set_ml_nuisance_params(learner, treat_var, params, set_fold_specific) accepts lists for params. The structure of the list depends on whether the same parameters should be provided for all folds or separate values are passed for specific folds.
Global parameter passing: The values in params are used for estimation on all folds. The named list in the argument params should have entries with names corresponding to the parameters of the learners. It is required that the option set_fold_specific is set to FALSE (default).
Fold-specific parameter passing: params is a nested list. The outer list needs to be of length n_rep and the inner list of length n_folds. The innermost list must have named entries that correspond to the parameters of the learner. It is required that the option set_fold_specific is set to TRUE. Moreover, fold-specific parameter passing is only supported if all parameters are set fold-specific.
External setting of parameters will override previously set parameters. To verify the choice of parameters, access the fields $learner and $params.
set.seed(3141)
ml_l = lrn("regr.ranger")
ml_m = lrn("regr.ranger")
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
n_rep = 2
n_folds = 3
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, n_rep=n_rep, n_folds=n_folds)
# Set globally
params = list("num.trees"=10)
dml_plr_obj$set_ml_nuisance_params("ml_l", "d", params=params)
dml_plr_obj$set_ml_nuisance_params("ml_m", "d", params=params)
dml_plr_obj$learner
dml_plr_obj$params
dml_plr_obj$fit()
dml_plr_obj$summary()
$ml_l
<LearnerRegrRanger:regr.ranger>: Random Forest
* Model: -
* Parameters: num.threads=1
* Packages: mlr3, mlr3learners, ranger
* Predict Types: [response], se, quantiles
* Feature Types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, missings, oob_error,
weights
$ml_m
<LearnerRegrRanger:regr.ranger>: Random Forest
* Model: -
* Parameters: num.threads=1
* Packages: mlr3, mlr3learners, ranger
* Predict Types: [response], se, quantiles
* Feature Types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, missings, oob_error,
weights
- $ml_l
- $d = $num.trees = 10
- $ml_m
- $d = $num.trees = 10
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 0.52732 0.04586 11.5 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The following example illustrates how to set parameters for each fold.
learner = lrn("regr.ranger")
ml_l = learner$clone()
ml_m = learner$clone()
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, n_rep=n_rep, n_folds=n_folds)
# Set values for each fold
params_exact = rep(list(rep(list(params), n_folds)), n_rep)
dml_plr_obj$set_ml_nuisance_params("ml_l", "d", params=params_exact,
set_fold_specific=TRUE)
dml_plr_obj$set_ml_nuisance_params("ml_m", "d", params=params_exact,
set_fold_specific=TRUE)
dml_plr_obj$learner
dml_plr_obj$params
dml_plr_obj$fit()
dml_plr_obj$summary()
$ml_l
<LearnerRegrRanger:regr.ranger>: Random Forest
* Model: -
* Parameters: num.threads=1
* Packages: mlr3, mlr3learners, ranger
* Predict Types: [response], se, quantiles
* Feature Types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, missings, oob_error,
weights
$ml_m
<LearnerRegrRanger:regr.ranger>: Random Forest
* Model: -
* Parameters: num.threads=1
* Packages: mlr3, mlr3learners, ranger
* Predict Types: [response], se, quantiles
* Feature Types: logical, integer, numeric, character, factor, ordered
* Properties: hotstart_backward, importance, missings, oob_error,
weights
- $ml_l
- $d =
- $num.trees = 10
- $num.trees = 10
- $num.trees = 10
- $num.trees = 10
- $num.trees = 10
- $num.trees = 10
- $ml_m
- $d =
- $num.trees = 10
- $num.trees = 10
- $num.trees = 10
- $num.trees = 10
- $num.trees = 10
- $num.trees = 10
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 0.49693 0.04387 11.33 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
7.2.3. Using pipelines to construct learners
Users can also specify learners that have been constructed from a pipeline using the mlr3pipelines package. In general, pipelines can be used to perform data preprocessing, feature selection, combine learners and even to perform hyperparameter tuning. In the following, we provide two examples on how to construct a single learner and how to stack different learners via a pipeline. For a more detailed introduction to mlr3pipelines, we refer to the Pipelines Chapter in the mlr3book. Moreover, a notebook on how to use mlr3pipelines in combination with DoubleML is available in the example gallery.
library(mlr3pipelines)
set.seed(3141)
# Define random forest learner in a pipeline
single_learner_pipeline = po("learner", lrn("regr.ranger", num.trees = 10))
# Use pipeline to create a new instance of a learner
ml_l = as_learner(single_learner_pipeline)
ml_m = as_learner(single_learner_pipeline)
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
n_rep = 2
n_folds = 3
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, n_rep=n_rep, n_folds=n_folds)
dml_plr_obj$learner
dml_plr_obj$fit()
dml_plr_obj$summary()
set.seed(3141)
# Define ensemble learner in a pipeline
ensemble_learner_pipeline = gunion(list(
po("learner", lrn("regr.cv_glmnet", s = "lambda.min")),
po("learner", lrn("regr.ranger")),
po("learner", lrn("regr.rpart", cp = 0.01)))) %>>%
po("regravg", 3)
# Use pipeline to create a new instance of a learner
ml_l = as_learner(ensemble_learner_pipeline)
ml_m = as_learner(ensemble_learner_pipeline)
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
n_rep = 2
n_folds = 3
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m, n_rep=n_rep, n_folds=n_folds)
dml_plr_obj$learner
dml_plr_obj$fit()
dml_plr_obj$summary()
$ml_l
<GraphLearner:regr.ranger>
* Model: -
* Parameters: regr.ranger.num.threads=1, regr.ranger.num.trees=10
* Validate: NULL
* Packages: mlr3, mlr3pipelines, mlr3learners, ranger
* Predict Types: [response], se, quantiles, distr
* Feature Types: logical, integer, numeric, character, factor, ordered,
POSIXct
* Properties: featureless, hotstart_backward, hotstart_forward,
importance, marshal, missings, oob_error, selected_features, weights
$ml_m
<GraphLearner:regr.ranger>
* Model: -
* Parameters: regr.ranger.num.threads=1, regr.ranger.num.trees=10
* Validate: NULL
* Packages: mlr3, mlr3pipelines, mlr3learners, ranger
* Predict Types: [response], se, quantiles, distr
* Feature Types: logical, integer, numeric, character, factor, ordered,
POSIXct
* Properties: featureless, hotstart_backward, hotstart_forward,
importance, marshal, missings, oob_error, selected_features, weights
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 0.52732 0.04586 11.5 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
$ml_l
<GraphLearner:regr.cv_glmnet.regr.ranger.regr.rpart.regravg>
* Model: -
* Parameters: regr.cv_glmnet.family=gaussian,
regr.cv_glmnet.s=lambda.min, regr.ranger.num.threads=1,
regr.rpart.cp=0.01, regr.rpart.xval=0, regravg.weights=1
* Validate: NULL
* Packages: mlr3, mlr3pipelines, mlr3learners, glmnet, ranger, rpart
* Predict Types: [response], se, quantiles, distr
* Feature Types: logical, integer, numeric, character, factor, ordered,
POSIXct
* Properties: featureless, hotstart_backward, hotstart_forward,
importance, marshal, missings, oob_error, selected_features, weights
$ml_m
<GraphLearner:regr.cv_glmnet.regr.ranger.regr.rpart.regravg>
* Model: -
* Parameters: regr.cv_glmnet.family=gaussian,
regr.cv_glmnet.s=lambda.min, regr.ranger.num.threads=1,
regr.rpart.cp=0.01, regr.rpart.xval=0, regravg.weights=1
* Validate: NULL
* Packages: mlr3, mlr3pipelines, mlr3learners, glmnet, ranger, rpart
* Predict Types: [response], se, quantiles, distr
* Feature Types: logical, integer, numeric, character, factor, ordered,
POSIXct
* Properties: featureless, hotstart_backward, hotstart_forward,
importance, marshal, missings, oob_error, selected_features, weights
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 0.55176 0.04625 11.93 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
7.2.4. Hyperparameter tuning
Parameter tuning of learners for the nuisance functions of DoubleML models can be done via the tune() method.
The tune() method passes various options and parameters to the tuning interface provided by the mlr3tuning package. The mlr3 book provides a step-by-step introduction to parameter tuning.
To illustrate the parameter tuning, we generate data from a sparse partially linear regression model.
library(DoubleML)
library(mlr3)
library(data.table)
set.seed(3141)
n_obs = 200
n_vars = 200
theta = 3
X = matrix(stats::rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars)
d = X[, 1:3, drop = FALSE] %*% c(5, 5, 5) + stats::rnorm(n_obs)
y = theta * d + X[, 1:3, drop = FALSE] %*% c(5, 5, 5) + stats::rnorm(n_obs)
dml_data = double_ml_data_from_matrix(X = X, y = y, d = d)
Hyperparameter tuning is performed according to options passed through a named list tune_settings.
The entries in the list specify options during parameter tuning with mlr3tuning:
terminator is a Terminator object passed to mlr3tuning that manages the budget to solve the tuning problem.
algorithm is an object of class Tuner and specifies the tuning algorithm. Alternatively, algorithm can be a character() that is used as an argument in the wrapper mlr3tuning call tnr(algorithm). The corresponding chapter in the mlr3book illustrates how the Tuner class supports grid search, random search, generalized simulated annealing and non-linear optimization.
rsmp_tune is an object of class mlr3 resampling that specifies the resampling method for evaluation. For example, rsmp("cv", folds = 5) implements 5-fold cross-validation, and rsmp("holdout", ratio = 0.8) implements an evaluation based on a hold-out sample that contains 20 percent of the observations. By default, 5-fold cross-validation is performed.
measure is a named list containing the measures used for tuning of the nuisance components. The names of the entries must match the learner names (see method learner_names()). The entries in the list must either be objects of class Measure or keys passed to msr(). If measure is not provided by the user, default measures are used, i.e., mean squared error for regression models and classification error for binary outcomes.
In the following example, we tune the penalty parameter \(\lambda\) (lambda) for lasso with the R package glmnet. To tune the value of lambda, a grid search is performed over a grid of values that range from 0.05 to 0.1 at a resolution of 10. Using a resolution of 10 splits the grid into 10 equally spaced values ranging from a minimum of 0.05 to a maximum of 0.1. To evaluate the predictive performance in both nuisance parts, the cross-validated mean squared error is used.
With the option tune_on_folds=FALSE, the tuning is performed on the whole sample. Hence, the cross-validated errors are obtained from a random split of the whole sample into 5 folds. As a result, one set of lambda values is obtained, which is later used in the fitting stage for all folds.
Alternatively, setting the option tune_on_folds=TRUE, as in the code below, assigns the tuning resampling scheme rsmp_tune to each fold. For example, if we set n_folds=2 at initialization of the DoubleMLPLR object and use a 5-fold cross-validated error for tuning, each of the two folds would be split up into 5 subfolds and the error would be evaluated on these subfolds.
library(DoubleML)
library(mlr3)
library(data.table)
library(mlr3learners)
library(mlr3tuning)
library(paradox)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
set.seed(1234)
ml_l = lrn("regr.glmnet")
ml_m = lrn("regr.glmnet")
dml_plr_obj = DoubleMLPLR$new(dml_data, ml_l, ml_m)
par_grids = list(
"ml_l" = ps(lambda = p_dbl(lower = 0.05, upper = 0.1)),
"ml_m" = ps(lambda = p_dbl(lower = 0.05, upper = 0.1)))
tune_settings = list(terminator = trm("evals", n_evals = 100),
algorithm = tnr("grid_search", resolution = 10),
rsmp_tune = rsmp("cv", folds = 5),
measure = list("ml_l" = msr("regr.mse"),
"ml_m" = msr("regr.mse")))
dml_plr_obj$tune(param_set=par_grids, tune_settings=tune_settings, tune_on_folds=TRUE)
dml_plr_obj$params
dml_plr_obj$fit()
dml_plr_obj$summary()
- $ml_l
- $d =
- $family
- 'gaussian'
- $lambda
- 0.1
- $family
- 'gaussian'
- $lambda
- 0.0777777777777778
- $family
- 'gaussian'
- $lambda
- 0.1
- $family
- 'gaussian'
- $lambda
- 0.1
- $family
- 'gaussian'
- $lambda
- 0.1
- $ml_m
- $d =
- $family
- 'gaussian'
- $lambda
- 0.1
- $family
- 'gaussian'
- $lambda
- 0.1
- $family
- 'gaussian'
- $lambda
- 0.1
- $family
- 'gaussian'
- $lambda
- 0.1
- $family
- 'gaussian'
- $lambda
- 0.1
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d 3.0425 0.1424 21.37 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
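The tuned values of lambda shown above, such as 0.0777777777777778, are points of the search grid. As a quick check of the arithmetic, the following plain Python sketch (not mlr3 code) reproduces a grid of 10 equally spaced values between 0.05 and 0.1, assuming that both end points are included as described in the text.

```python
lower, upper, resolution = 0.05, 0.1, 10

# Equally spaced grid that includes both end points.
step = (upper - lower) / (resolution - 1)
grid = [lower + i * step for i in range(resolution)]

for value in grid:
    print(round(value, 6))
```

The sixth grid point, 0.05 + 5 * 0.05 / 9, is approximately 0.0778 and matches the value selected for one of the folds of ml_l.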
Hyperparameter tuning can also be done with more sophisticated methods, for example by using built-in tuning paths of learners. For example, the learner regr.cv_glmnet performs an internal cross-validated choice of the parameter lambda.
Alternatively, the powerful functionalities of the mlr3tuning package can be used for external parameter tuning of the nuisance parts. The optimally chosen parameters can then be passed to the DoubleML models using the set_ml_nuisance_params() method.
library(DoubleML)
library(mlr3)
library(data.table)
library(mlr3learners)
library(mlr3tuning)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
set.seed(1234)
ml_l = lrn("regr.cv_glmnet", s="lambda.min")
ml_m = lrn("regr.cv_glmnet", s="lambda.min")
dml_plr_obj = DoubleMLPLR$new(dml_data, ml_l, ml_m)
dml_plr_obj$fit()
dml_plr_obj$summary()
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d  3.08848    0.07366   41.93   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The following code chunk illustrates another example of global parameter tuning (tune_on_folds=FALSE), this time with random forests as provided by the ranger package. In this example, we use random search to find optimal values for the parameters mtry and max.depth of a random forest. Evaluation is based on 3-fold cross-validation.
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
library(mlr3tuning)
library(paradox)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
# set up a mlr3 learner
learner = lrn("regr.ranger")
ml_l = learner$clone()
ml_m = learner$clone()
set.seed(3141)
obj_dml_data = make_plr_CCDDHNR2018(alpha=0.5)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
# set up a list of parameter grids
param_grid = list("ml_l" = ps(mtry = p_int(lower = 2, upper = 20),
                              max.depth = p_int(lower = 2, upper = 5)),
                  "ml_m" = ps(mtry = p_int(lower = 2, upper = 20),
                              max.depth = p_int(lower = 2, upper = 5)))
tune_settings = list(terminator = mlr3tuning::trm("evals", n_evals = 20),
                     algorithm = tnr("random_search"),
                     rsmp_tune = rsmp("cv", folds = 3),
                     measure = list("ml_l" = msr("regr.mse"),
                                    "ml_m" = msr("regr.mse")))
dml_plr_obj$tune(param_set=param_grid, tune_settings=tune_settings, tune_on_folds=FALSE)
dml_plr_obj$params
dml_plr_obj$fit()
dml_plr_obj$summary()
- $ml_l
  - $d
    - $num.threads = 1
    - $mtry = 10
    - $max.depth = 5
- $ml_m
  - $d
    - $num.threads = 1
    - $mtry = 17
    - $max.depth = 3
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d  0.55307    0.04563   12.12   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
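The idea behind the random search used above can be sketched in a few lines of plain Python. This is not the mlr3tuning implementation: the function toy_cv_error is a hypothetical stand-in for the 3-fold cross-validated mean squared error, and the seed and variable names are chosen only for this illustration.

```python
import random

rng = random.Random(3141)


def toy_cv_error(mtry, max_depth):
    # Hypothetical stand-in for the cross-validated MSE of a random forest
    return (mtry - 10) ** 2 + (max_depth - 5) ** 2


# Draw 20 configurations uniformly from the integer search space,
# mirroring mtry in [2, 20], max.depth in [2, 5] and n_evals = 20
candidates = [
    {"mtry": rng.randint(2, 20), "max_depth": rng.randint(2, 5)}
    for _ in range(20)
]

# Keep the configuration with the smallest (toy) error
best = min(candidates, key=lambda c: toy_cv_error(c["mtry"], c["max_depth"]))
print(best)
```

In contrast to a grid search, the number of evaluated configurations is fixed by the terminator alone and does not grow with the number of tuned parameters.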
7.2.5. Hyperparameter tuning with pipelines
As an alternative to the previously presented tuning approach, it is possible to base the parameter tuning on a pipeline as provided by the mlr3pipelines package. The basic idea of this approach is to define a learner via a pipeline and then perform the tuning via the tune() method. We briefly repeat the lasso example from above. In general, the pipeline-based approach can be used to find optimal values not only for the parameters of one or multiple learners, but also for other parameters, for example those involved in data preprocessing. For more details, we refer to the Pipelines chapter in the mlr3book.
library(DoubleML)
library(mlr3)
library(mlr3tuning)
library(mlr3pipelines)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
# Define learner in a pipeline
set.seed(1234)
lasso_pipe = po("learner",
                learner = lrn("regr.glmnet"))
ml_l = as_learner(lasso_pipe)
ml_m = as_learner(lasso_pipe)
# Instantiate a DoubleML object
dml_plr_obj = DoubleMLPLR$new(dml_data, ml_l, ml_m)
# Parameter grid for lambda (parameter names are prefixed with the learner id)
par_grids = ps(regr.glmnet.lambda = p_dbl(lower = 0.05, upper = 0.1))
tune_settings = list(terminator = trm("evals", n_evals = 100),
                     algorithm = tnr("grid_search", resolution = 10),
                     rsmp_tune = rsmp("cv", folds = 5),
                     measure = list("ml_l" = msr("regr.mse"),
                                    "ml_m" = msr("regr.mse")))
dml_plr_obj$tune(param_set = list("ml_l" = par_grids,
                                  "ml_m" = par_grids),
                 tune_settings=tune_settings,
                 tune_on_folds=TRUE)
dml_plr_obj$fit()
dml_plr_obj$summary()
Estimates and significance testing of the effect of target variables
  Estimate. Std. Error t value Pr(>|t|)    
d   3.0425     0.1424   21.37   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
7.2.5.1. References
Lang, M., Binder, M., Richter, J., Schratz, P., Pfisterer, F., Coors, S., Au, Q., Casalicchio, G., Kotthoff, L., Bischl, B. (2019), mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, doi:10.21105/joss.01903.
Becker, M., Binder, M., Bischl, B., Lang, M., Pfisterer, F., Reich, N.G., Richter, J., Schratz, P., Sonabend, R. (2020), mlr3 book, available at https://mlr3book.mlr-org.com.