Installation
Please read the following installation instructions and make sure you have installed the latest release of DoubleML
on your local machine prior to the tutorial.
If you want to learn more about DoubleML
upfront, feel free to read through our user guide.
Install latest release from CRAN
Install the latest release from CRAN via
install.packages("DoubleML")
Install development version from GitHub
The development version of the DoubleML package for R can be installed from GitHub with the following command (the remotes package must be installed first, e.g. via install.packages("remotes")).
remotes::install_github("DoubleML/doubleml-for-r")
Load DoubleML
Load the package after the installation is complete.
library(DoubleML)
Install packages for learners
As described in our user guide section on learners and the corresponding chapter of the mlr3book, we have to install the packages required for the ML learners. In this tutorial, we will use the R packages ranger, glmnet and xgboost.
install.packages("ranger")
install.packages("glmnet")
install.packages("xgboost")
Example
Once you have installed all packages, try to run the following example. Load the DoubleML package.
library(DoubleML)
Load the Bonus data set.
df_bonus = fetch_bonus(return_type="data.table")
head(df_bonus)
## inuidur1 female black othrace dep1 dep2 q2 q3 q4 q5 q6 agelt35 agegt54
## 1: 2.890372 0 0 0 0 1 0 0 0 1 0 0 0
## 2: 0.000000 0 0 0 0 0 0 0 0 1 0 0 0
## 3: 3.295837 0 0 0 0 0 0 0 1 0 0 0 0
## 4: 2.197225 0 0 0 0 0 0 1 0 0 0 1 0
## 5: 3.295837 0 0 0 1 0 0 0 0 1 0 0 1
## 6: 3.295837 1 0 0 0 0 0 0 0 1 0 0 1
## durable lusd husd tg
## 1: 0 0 1 0
## 2: 0 1 0 0
## 3: 0 1 0 0
## 4: 0 0 0 1
## 5: 1 1 0 0
## 6: 0 1 0 0
Create a data backend.
# Specify the data and variables for the causal model
dml_data_bonus = DoubleMLData$new(df_bonus,
y_col = "inuidur1",
d_cols = "tg",
x_cols = c("female", "black", "othrace", "dep1", "dep2",
"q2", "q3", "q4", "q5", "q6", "agelt35", "agegt54",
"durable", "lusd", "husd"))
print(dml_data_bonus)
## ================= DoubleMLData Object ==================
##
##
## ------------------ Data summary ------------------
## Outcome variable: inuidur1
## Treatment variable(s): tg
## Covariates: female, black, othrace, dep1, dep2, q2, q3, q4, q5, q6, agelt35, agegt54, durable, lusd, husd
## Instrument(s):
## No. Observations: 5099
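As an aside, if your data lives in a plain data.frame rather than a data.table, the DoubleML package also ships a convenience wrapper for constructing the data backend. A minimal sketch, assuming the wrapper double_ml_data_from_data_frame() accepts the same column arguments as the R6 constructor (check its help page for the exact signature):

```r
# Alternative: build the data backend from a plain data.frame
# via the convenience wrapper double_ml_data_from_data_frame()
df_plain = as.data.frame(df_bonus)
dml_data_alt = double_ml_data_from_data_frame(df_plain,
    y_col = "inuidur1",
    d_cols = "tg",
    x_cols = c("female", "black", "othrace", "dep1", "dep2",
               "q2", "q3", "q4", "q5", "q6", "agelt35", "agegt54",
               "durable", "lusd", "husd"))
```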
Create two learners for the nuisance components using mlr3 and mlr3learners.
library(mlr3)
library(mlr3learners)
# suppress messages from the mlr3 package during fitting
lgr::get_logger("mlr3")$set_threshold("warn")
learner = lrn("regr.ranger", num.trees=500, max.depth=5, min.node.size=2)
ml_l_bonus = learner$clone()
ml_m_bonus = learner$clone()
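Any mlr3 regression learner can be plugged in here; the random forest is just one choice. For example, a cross-validated lasso based on glmnet could serve as an alternative nuisance learner. A sketch, assuming "regr.cv_glmnet" is the mlr3learners key for cv.glmnet (as listed in the mlr3learners documentation):

```r
# Alternative nuisance learners: cross-validated lasso via glmnet
learner_lasso = lrn("regr.cv_glmnet", s = "lambda.min")
ml_l_lasso = learner_lasso$clone()
ml_m_lasso = learner_lasso$clone()
```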
Create a new instance of a causal model, here a partially linear regression model via DoubleMLPLR.
set.seed(3141)
obj_dml_plr_bonus = DoubleMLPLR$new(dml_data_bonus, ml_l=ml_l_bonus, ml_m=ml_m_bonus)
obj_dml_plr_bonus$fit()
print(obj_dml_plr_bonus)
## ================= DoubleMLPLR Object ==================
##
##
## ------------------ Data summary ------------------
## Outcome variable: inuidur1
## Treatment variable(s): tg
## Covariates: female, black, othrace, dep1, dep2, q2, q3, q4, q5, q6, agelt35, agegt54, durable, lusd, husd
## Instrument(s):
## No. Observations: 5099
##
## ------------------ Score & algorithm ------------------
## Score function: partialling out
## DML algorithm: dml2
##
## ------------------ Machine learner ------------------
## ml_l: regr.ranger
## ml_m: regr.ranger
##
## ------------------ Resampling ------------------
## No. folds: 5
## No. repeated sample splits: 1
## Apply cross-fitting: TRUE
##
## ------------------ Fit summary ------------------
## Estimates and significance testing of the effect of target variables
## Estimate. Std. Error t value Pr(>|t|)
## tg -0.07561 0.03536 -2.139 0.0325 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
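Beyond the printed summary, the fitted object exposes the results programmatically through its fields and methods. A short sketch, assuming the coef and se fields and the confint() method of the DoubleML R6 API:

```r
# Access the fitted results programmatically
obj_dml_plr_bonus$coef      # point estimate for tg
obj_dml_plr_bonus$se        # standard error
obj_dml_plr_bonus$confint() # confidence interval (default 95% level)
```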
Ready to go :-)
Once you are able to run this code, you are ready for our tutorial!