Abstract class DoubleML

Abstract base class that can't be initialized.

Format

R6::R6Class object.

Active bindings

all_coef: (matrix())
Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit().
all_dml1_coef: (array())
Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() with dml_procedure = "dml1".
all_se: (matrix())
Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit().
apply_cross_fitting: (logical(1))
Indicates whether cross-fitting should be applied. Default is TRUE.
boot_coef: (matrix())
Bootstrapped coefficients for the causal parameter(s) after calling fit() and bootstrap().
boot_t_stat: (matrix())
Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap().
coef: (numeric())
Estimates for the causal parameter(s) after calling fit().
data: (data.table)
Data object.
dml_procedure: (character(1))
A character() ("dml1" or "dml2") specifying the double machine learning algorithm. Default is "dml2".
draw_sample_splitting: (logical(1))
Indicates whether the sample splitting should be drawn during initialization of the object. Default is TRUE.
learner: (named list())
The machine learners for the nuisance functions.
n_folds: (integer(1))
Number of folds. Default is 5.
n_rep: (integer(1))
Number of repetitions for the sample splitting. Default is 1.
params: (named list())
The hyperparameters of the learners.
psi: (array())
Value of the score function $\psi(W;\theta, \eta)=\psi_a(W;\eta) \theta + \psi_b (W; \eta)$ after calling fit().
psi_a: (array())
Value of the score function component $\psi_a(W;\eta)$ after calling fit().
psi_b: (array())
Value of the score function component $\psi_b(W;\eta)$ after calling fit().
predictions: (array())
Predictions of the nuisance models after calling fit(store_predictions=TRUE).
models: (array())
The fitted nuisance models after calling fit(store_models=TRUE).
pval: (numeric())
p-values for the causal parameter(s) after calling fit().
score: (character(1), function())
A character(1) or function() specifying the score function.
se: (numeric())
Standard errors for the causal parameter(s) after calling fit().
smpls: (list())
The partition used for cross-fitting.
smpls_cluster: (list())
The partition of clusters used for cross-fitting.
t_stat: (numeric())
t-statistics for the causal parameter(s) after calling fit().
tuning_res: (named list())
Results from hyperparameter tuning.
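
Because the score is linear in $\theta$, an estimate can be recovered from psi_a and psi_b by solving the empirical moment condition $\frac{1}{N}\sum_i \psi(W_i;\theta,\eta) = 0$. The following base R sketch illustrates this on simulated residuals. The partialling-out form psi_a = -v^2, psi_b = v * u and all variable names are assumptions made for illustration; they are not read from a fitted DoubleML object.

```r
set.seed(1)
n = 200
v = rnorm(n)             # hypothetical treatment residuals d - m(X)
u = 0.5 * v + rnorm(n)   # hypothetical outcome residuals y - l(X)

# assumed score components (partialling-out form)
psi_a = -v * v
psi_b = v * u

# solve mean(psi_a) * theta + mean(psi_b) = 0 for theta
theta_hat = -mean(psi_b) / mean(psi_a)

# the score evaluated at theta_hat averages to zero
psi = psi_a * theta_hat + psi_b
mean(psi)
```

With this score, theta_hat coincides with the slope of a no-intercept regression of u on v.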

Methods

Method `new()`

DoubleML is an abstract class that can't be initialized.

Usage

DoubleML$new()

Method `print()`

Print DoubleML objects.

Usage

DoubleML$print()

Method `fit()`

Estimate DoubleML models.

Usage

DoubleML$fit(store_predictions = FALSE, store_models = FALSE)

Arguments

store_predictions: (logical(1))
Indicates whether the predictions for the nuisance functions should be stored in field predictions. Default is FALSE.
store_models: (logical(1))
Indicates whether the fitted models for the nuisance functions should be stored in field models if you want to analyze the models or extract information like variable importance. Default is FALSE.

Returns

self
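
As DoubleML itself cannot be initialized, a minimal sketch of fit() has to go through a subclass. The one below reuses DoubleMLPLR and the rpart learners from the set_sample_splitting() example; the sample size of 100 is an arbitrary choice.

```r
library(DoubleML)
library(mlr3)

set.seed(2)
obj_dml_data = make_plr_CCDDHNR2018(n_obs = 100)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data,
                              lrn("regr.rpart"), lrn("regr.rpart"))

# fit the model and keep the nuisance predictions
dml_plr_obj$fit(store_predictions = TRUE)

dml_plr_obj$coef              # estimate(s) of the causal parameter
dml_plr_obj$se                # standard error(s)
str(dml_plr_obj$predictions)  # stored nuisance predictions
```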

Method `bootstrap()`

Multiplier bootstrap for DoubleML models.

Usage

DoubleML$bootstrap(method = "normal", n_rep_boot = 500)

Arguments

method: (character(1))
A character(1) ("Bayes", "normal" or "wild") specifying the multiplier bootstrap method.
n_rep_boot: (integer(1))
The number of bootstrap replications.

Returns

self
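
A short sketch of the bootstrap workflow, again using the DoubleMLPLR subclass and learners from the set_sample_splitting() example (the sample size is an arbitrary choice). bootstrap() is called after fit(), and the bootstrapped statistics then become available through boot_coef and boot_t_stat.

```r
library(DoubleML)
library(mlr3)

set.seed(2)
obj_dml_data = make_plr_CCDDHNR2018(n_obs = 100)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data,
                              lrn("regr.rpart"), lrn("regr.rpart"))

dml_plr_obj$fit()
dml_plr_obj$bootstrap(method = "normal", n_rep_boot = 500)

dim(dml_plr_obj$boot_t_stat)            # bootstrapped t-statistics
ci_joint = dml_plr_obj$confint(joint = TRUE)
ci_joint
```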

Method `split_samples()`

Draw sample splitting for DoubleML models.

The samples are drawn according to the attributes n_folds, n_rep and apply_cross_fitting.

Usage

DoubleML$split_samples()

Returns

self

Method `set_sample_splitting()`

Set the sample splitting for DoubleML models.

The attributes n_folds and n_rep are derived from the provided partition.

Usage

DoubleML$set_sample_splitting(smpls)

Arguments

smpls: (list())
A nested list(). The outer list needs to provide one entry per repeated sample splitting (the length of the list is set as n_rep). Each inner list is a named list() with names train_ids and test_ids. The entries in train_ids and test_ids must be partitions per fold (the length of train_ids and test_ids is set as n_folds).

Returns

self

Examples

library(DoubleML)
library(mlr3)
set.seed(2)
obj_dml_data = make_plr_CCDDHNR2018(n_obs=10)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data,
                              lrn("regr.rpart"), lrn("regr.rpart"))

# simple sample splitting with two folds and without cross-fitting
smpls = list(list(train_ids = list(c(1, 2, 3, 4, 5)),
                  test_ids = list(c(6, 7, 8, 9, 10))))
dml_plr_obj$set_sample_splitting(smpls)

# sample splitting with two folds and cross-fitting but no repeated cross-fitting
smpls = list(list(train_ids = list(c(1, 2, 3, 4, 5), c(6, 7, 8, 9, 10)),
                  test_ids = list(c(6, 7, 8, 9, 10), c(1, 2, 3, 4, 5))))
dml_plr_obj$set_sample_splitting(smpls)

# sample splitting with two folds and repeated cross-fitting with n_rep = 2
smpls = list(list(train_ids = list(c(1, 2, 3, 4, 5), c(6, 7, 8, 9, 10)),
                  test_ids = list(c(6, 7, 8, 9, 10), c(1, 2, 3, 4, 5))),
             list(train_ids = list(c(1, 3, 5, 7, 9), c(2, 4, 6, 8, 10)),
                  test_ids = list(c(2, 4, 6, 8, 10), c(1, 3, 5, 7, 9))))
dml_plr_obj$set_sample_splitting(smpls)
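
For larger data sets, writing the nested list by hand is impractical. The following base R sketch builds a list in the format described for smpls; make_smpls() is a hypothetical helper introduced here for illustration and is not part of the package.

```r
# hypothetical helper: draw n_rep random partitions of 1..n_obs into n_folds folds
make_smpls = function(n_obs, n_folds, n_rep = 1) {
  lapply(seq_len(n_rep), function(i_rep) {
    # randomly assign each observation to one fold
    fold_id = sample(rep(seq_len(n_folds), length.out = n_obs))
    test_ids = lapply(seq_len(n_folds), function(k) which(fold_id == k))
    # the training set of a fold is everything outside its test set
    train_ids = lapply(test_ids, function(ids) setdiff(seq_len(n_obs), ids))
    list(train_ids = train_ids, test_ids = test_ids)
  })
}

set.seed(2)
smpls = make_smpls(n_obs = 10, n_folds = 2, n_rep = 2)
str(smpls)
```

The resulting list can be passed to set_sample_splitting() for an object whose data has n_obs rows.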

Method `tune()`

Hyperparameter-tuning for DoubleML models.

The hyperparameter-tuning is performed using the tuning methods provided in the mlr3tuning package. For more information on tuning in mlr3, we refer to the section on parameter tuning in the mlr3 book.

Usage

DoubleML$tune(
  param_set,
  tune_settings = list(n_folds_tune = 5, rsmp_tune = mlr3::rsmp("cv", folds = 5), measure
    = NULL, terminator = mlr3tuning::trm("evals", n_evals = 20), algorithm =
    mlr3tuning::tnr("grid_search"), resolution = 5),
  tune_on_folds = FALSE
)

Arguments

param_set

(named list())
A named list with a parameter grid for each nuisance model/learner (see method learner_names()). The parameter grid must be an object of class ParamSet.

tune_settings

(named list())
A named list() with arguments passed to the hyperparameter-tuning with mlr3tuning to set up TuningInstance objects. tune_settings has entries

terminator (Terminator)
A Terminator object. Specification of terminator is required to perform tuning.
algorithm (Tuner or character(1))
A Tuner object (recommended) or key passed to the respective dictionary to specify the tuning algorithm used in tnr(). algorithm is passed as an argument to tnr(). If algorithm is not specified by the user, the default is "grid_search". If set to "grid_search", the additional argument "resolution" is required.
rsmp_tune (Resampling or character(1))
A Resampling object (recommended) or option passed to rsmp() to initialize a Resampling for parameter tuning in mlr3. If not specified by the user, default is set to "cv" (cross-validation).
n_folds_tune (integer(1), optional)
If rsmp_tune = "cv", number of folds used for cross-validation. If not specified by the user, default is set to 5.
measure (NULL, named list(), optional)
Named list containing the measures used for parameter tuning. Entries in the list must either be Measure objects or keys to be passed to msr(). The names of the entries must match the learner names (see method learner_names()). If set to NULL, default measures are used, i.e., "regr.mse" for continuous outcome variables and "classif.ce" for binary outcomes.
resolution (integer(1))
The resolution of the grid used when algorithm is "grid_search". resolution is passed as an argument to tnr().

tune_on_folds

(logical(1))
Indicates whether the tuning should be done fold-specific or globally. Default is FALSE.

Returns

self

Method `summary()`

Summary for DoubleML models after calling fit().

Usage

DoubleML$summary(digits = max(3L, getOption("digits") - 3L))

Arguments

digits: (integer(1))
The number of significant digits to use when printing.

Method `confint()`

Confidence intervals for DoubleML models.

Usage

DoubleML$confint(parm, joint = FALSE, level = 0.95)

Arguments

parm: (numeric() or character())
A specification of which parameters are to be given confidence intervals among the variables for which inference was done, either a vector of numbers or a vector of names. If missing, all parameters are considered (default).
joint: (logical(1))
Indicates whether joint confidence intervals are computed. Default is FALSE.
level: (numeric(1))
The confidence level. Default is 0.95.

Returns

A matrix() with the confidence interval(s).
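
For intuition, a pointwise interval (joint = FALSE) can be reproduced by hand from coef and se. The base R sketch below assumes the usual normal-approximation interval, which this page does not spell out, and uses made-up estimates and standard errors.

```r
coef_hat = c(d1 = 0.48, d2 = -0.10)   # hypothetical estimates
se_hat   = c(d1 = 0.05, d2 = 0.04)    # hypothetical standard errors
level = 0.95

# two-sided normal critical value for the given level
z = qnorm(1 - (1 - level) / 2)

ci = cbind(coef_hat - z * se_hat, coef_hat + z * se_hat)
colnames(ci) = c("2.5 %", "97.5 %")
ci
```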

Method `learner_names()`

Returns the names of the learners.

Usage

DoubleML$learner_names()

Returns

character() with names of learners.

Method `params_names()`

Returns the names of the nuisance models with hyperparameters.

Usage

DoubleML$params_names()

Returns

character() with names of nuisance models with hyperparameters.

Method `set_ml_nuisance_params()`

Set hyperparameters for the nuisance models of DoubleML models.

Note that in the current implementation, either all parameters have to be set globally or all parameters have to be provided fold-specific.

Usage

DoubleML$set_ml_nuisance_params(
  learner = NULL,
  treat_var = NULL,
  params,
  set_fold_specific = FALSE
)

Arguments

learner: (character(1))
The nuisance model/learner (see method params_names()).
treat_var: (character(1))
The treatment variable (hyperparameters can be set treatment-variable specific).
params: (named list())
A named list() with estimator parameters. Parameters are used for all folds by default. Alternatively, parameters can be passed in a fold-specific way if option set_fold_specific is TRUE. In this case, the outer list needs to be of length n_rep and the inner list of length n_folds.
set_fold_specific: (logical(1))
Indicates if the parameters passed in params should be passed in fold-specific way. Default is FALSE. If TRUE, the outer list needs to be of length n_rep and the inner list of length n_folds. Note that in the current implementation, either all parameters have to be set globally or all parameters have to be provided fold-specific.

Returns

self
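
A hedged sketch of setting hyperparameters globally, using the DoubleMLPLR subclass from the set_sample_splitting() example. To avoid hard-coding names, the learner name is taken from params_names() and the treatment variable from the d_cols field of the data object (assumed here to hold the treatment column names). cp and minsplit are rpart hyperparameters chosen arbitrarily.

```r
library(DoubleML)
library(mlr3)

set.seed(2)
obj_dml_data = make_plr_CCDDHNR2018(n_obs = 100)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data,
                              lrn("regr.rpart"), lrn("regr.rpart"))

learner_name = dml_plr_obj$params_names()[1]   # first nuisance model
treat_name = obj_dml_data$d_cols[1]            # assumed field with treatment names

# set the same hyperparameters for all folds (set_fold_specific = FALSE)
dml_plr_obj$set_ml_nuisance_params(learner = learner_name,
                                   treat_var = treat_name,
                                   params = list(cp = 0.01, minsplit = 10))

str(dml_plr_obj$get_params(learner_name))
```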

Method `p_adjust()`

Multiple testing adjustment for DoubleML models.

Usage

DoubleML$p_adjust(method = "romano-wolf", return_matrix = TRUE)

Arguments

method: (character(1))
A character(1) ("romano-wolf", "bonferroni", "holm", etc.) specifying the adjustment method. In addition to "romano-wolf", all methods implemented in p.adjust() can be applied. Default is "romano-wolf".
return_matrix: (logical(1))
Indicates if the output is returned as a matrix with corresponding coefficient names.

Returns

numeric() with adjusted p-values. If return_matrix = TRUE, a matrix() with adjusted p-values.
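
The methods other than "romano-wolf" are those of base R's p.adjust(). The sketch below shows what two of them do to a vector of made-up p-values; it illustrates p.adjust() only, not the Romano-Wolf procedure.

```r
pvals = c(d1 = 0.01, d2 = 0.02, d3 = 0.03)   # hypothetical p-values

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf = p.adjust(pvals, method = "bonferroni")
p_bonf   # 0.03 0.06 0.09

# Holm: step-down version, never larger than Bonferroni
p_holm = p.adjust(pvals, method = "holm")
p_holm   # 0.03 0.04 0.04
```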

Method `get_params()`

Get hyperparameters for the nuisance model of DoubleML models.

Usage

DoubleML$get_params(learner)

Arguments

learner: (character(1))
The nuisance model/learner (see method params_names()).

Returns

named list() with parameters for the nuisance model/learner.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

DoubleML$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

