{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "126e2f75", "metadata": {}, "source": [ "# R: Ensemble Learners and More with `mlr3pipelines`" ] }, { "attachments": {}, "cell_type": "markdown", "id": "aa30571a", "metadata": {}, "source": [ "This notebook illustrates how to exploit the powerful tools provided by the [mlr3pipelines](https://mlr3pipelines.mlr-org.com/) package (Binder et al. 2021). For example, [mlr3pipelines](https://mlr3pipelines.mlr-org.com/) can be used in combination with [DoubleML](https://docs.doubleml.org/stable/index.html) for feature engineering, combination of learners (ensemble learners, stacking), subsampling and hyperparameter tuning. The underlying idea of [mlr3pipelines](https://mlr3pipelines.mlr-org.com/) is to define a pipeline that incorporates a user's desired operations. As a result, the pipeline returns an object of class [Learner](https://mlr3.mlr-org.com/reference/Learner.html) which can easily be passed to [DoubleML](https://docs.doubleml.org/stable/index.html). For an introduction to [mlr3pipelines](https://mlr3pipelines.mlr-org.com/), we refer to the [Pipelines Chapter](https://mlr3book.mlr-org.com/pipelines.html) in the [mlr3book](https://mlr3book.mlr-org.com) (Becker et al. 2020) and to the [package website](https://mlr3pipelines.mlr-org.com/).\n", "\n", "\n", "We intend to illustrate the major idea of how to use [mlr3pipelines](https://mlr3pipelines.mlr-org.com/) in combination with [DoubleML](https://docs.doubleml.org/stable/index.html) in very simple examples. We use pipelines that are identical or very similar to the ones in the [Pipelines Chapter](https://mlr3book.mlr-org.com/pipelines.html) in the [mlr3book](https://mlr3book.mlr-org.com). Hence, we do not claim that the proposed learners are optimal in terms of their performance.\n", "\n", "We start with the simple simulated data example and the [Bonus data set](https://docs.doubleml.org/r/stable/reference/fetch_bonus.html) from the [Getting Started Section in the DoubleML user guide](https://docs.doubleml.org/dev/intro/intro.html)." ] }, { "cell_type": "code", "execution_count": 1, "id": "fc314dcf", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/plain": [ "================= DoubleMLData Object ==================\n", "\n", "\n", "------------------ Data summary ------------------\n", "Outcome variable: y\n", "Treatment variable(s): d\n", "Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X29, X30, X31, X32, X33, X34, X35, X36, X37, X38, X39, X40, X41, X42, X43, X44, X45, X46, X47, X48, X49, X50, X51, X52, X53, X54, X55, X56, X57, X58, X59, X60, X61, X62, X63, X64, X65, X66, X67, X68, X69, X70, X71, X72, X73, X74, X75, X76, X77, X78, X79, X80, X81, X82, X83, X84, X85, X86, X87, X88, X89, X90, X91, X92, X93, X94, X95, X96, X97, X98, X99, X100\n", "Instrument(s): \n", "No. Observations: 500" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "library(DoubleML)\n", "\n", "# Simulate data\n", "set.seed(3141)\n", "n_obs = 500\n", "n_vars = 100\n", "theta = 3\n", "X = matrix(rnorm(n_obs*n_vars), nrow=n_obs, ncol=n_vars)\n", "d = X[,1:3]%*%c(5,5,5) + rnorm(n_obs)\n", "y = theta*d + X[, 1:3]%*%c(5,5,5) + rnorm(n_obs)\n", "\n", "\n", "# Specify the data and variables for the causal model\n", "# matrix interface to DoubleMLData\n", "dml_data_sim = double_ml_data_from_matrix(X=X, y=y, d=d)\n", "dml_data_sim" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5913f88c", "metadata": {}, "source": [ "To have an example with a classification learner, we load the [Bonus data set](https://docs.doubleml.org/r/stable/reference/fetch_bonus.html)." ] }, { "cell_type": "code", "execution_count": 2, "id": "dea5e13a", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "
inuidur1 | female | black | othrace | dep1 | dep2 | q2 | q3 | q4 | q5 | q6 | agelt35 | agegt54 | durable | lusd | husd | tg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
2.890372 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
3.295837 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2.197225 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
3.295837 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
3.295837 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |