Causal Machine Learning with DoubleML

.title[
# Causal Machine Learning with DoubleML
]
.subtitle[
## Introduction to the R Package DoubleML
]
.author[
### UseR!2022, June 20, 2022, online
]
.date[
### Philipp Bach, Martin Spindler, Oliver Schacht (Uni Hamburg)
]

---

# Introduction to DoubleML

---

### Building Principles

- **Orthogonal Score**
  - Object-oriented implementation with `R6`
  - Exploits the common structure centered on a (linear) score function `$\psi(\cdot)$`

- **High-quality ML**
  - State-of-the-art ML prediction and tuning methods
  - Provided by `mlr3` ecosystem

- **Sample Splitting**
  - Built-in resampling schemes of `mlr3`


---

## Dependencies and Installation

#### DoubleML package dependencies

- `mlr3`

- `mlr3learners`

- `mlr3tuning`

- `R6`

- `data.table`


---

## Why an Object-Oriented Implementation?

*  Given the components `$\psi^a(\cdot)$` and `$\psi^b(\cdot)$` of a linear Neyman orthogonal score function `$\psi(\cdot)$`, a **general implementation** is possible for
  - The estimation of the **orthogonal parameters**
  - The computation of the **score** `$\psi(W; \theta, \eta)$`
  - The estimation of **standard errors**
  - The computation of **confidence intervals**
  - A **multiplier bootstrap** procedure for simultaneous inference

*  The **sample splitting** can be implemented in general as well

`$\rightarrow$` Implemented in the **abstract base class** `DoubleML`

* The **score components** and the estimation of the **nuisance models** have to be implemented **model-specifically**

`$\rightarrow$` Implemented in **model-specific classes** inherited from `DoubleML`
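
The split between generic and model-specific code can be illustrated with a toy `R6` sketch. The classes `ToyDML` and `ToyPLR` are hypothetical and are not the package's implementation. The sketch assumes a partially linear model with the score components `$\psi^a = -\hat{v}^2$` and `$\psi^b = \hat{v}\,\hat{u}$`, and, as a simplification, fits the nuisance functions with `lm()` on the full sample without sample splitting.

```r
library(R6)

# Hypothetical base class (not the package's `DoubleML` class): everything here
# only needs the score components psi_a and psi_b.
ToyDML = R6Class("ToyDML",
  public = list(
    data = NULL,
    coef = NULL,
    se = NULL,
    initialize = function(data) {
      self$data = data
    },
    fit = function() {
      psi = private$score_elements()
      # Linear score: psi = psi_a * theta + psi_b; solve mean(psi) = 0 for theta
      self$coef = -mean(psi$psi_b) / mean(psi$psi_a)
      # Plug-in standard error based on the evaluated score
      psi_val = psi$psi_a * self$coef + psi$psi_b
      n = length(psi_val)
      self$se = sqrt(mean(psi_val^2) / mean(psi$psi_a)^2 / n)
      invisible(self)
    },
    confint = function(level = 0.95) {
      z = qnorm(1 - (1 - level) / 2)
      c(lower = self$coef - z * self$se, upper = self$coef + z * self$se)
    }
  ),
  private = list(
    # "Abstract" method: model-specific subclasses must override it
    score_elements = function() {
      stop("score_elements() must be implemented by a subclass")
    }
  )
)

# Hypothetical model-specific class for a partially linear regression
ToyPLR = R6Class("ToyPLR",
  inherit = ToyDML,
  private = list(
    score_elements = function() {
      # Simplification: nuisance fits by lm() on the full sample, no sample splitting
      u_hat = residuals(lm(y ~ . - d, data = self$data))  # y - l_hat(X)
      v_hat = residuals(lm(d ~ . - y, data = self$data))  # d - m_hat(X)
      list(psi_a = -v_hat * v_hat, psi_b = v_hat * u_hat)
    }
  )
)

# Simulated data with a true effect of 0.5 (illustrative values only)
set.seed(1234)
n = 500
x1 = rnorm(n)
x2 = rnorm(n)
d = 0.5 * x1 + rnorm(n)
y = 0.5 * d + x1 + x2 + rnorm(n)
toy_data = data.frame(y = y, d = d, x1 = x1, x2 = x2)

toy_plr = ToyPLR$new(toy_data)
toy_plr$fit()
toy_plr$coef
toy_plr$confint()
```

Only `score_elements()` differs between models in this sketch. The estimate, the standard error and the confidence interval are all computed once in the base class.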

---

## Class Structure and Causal Models

---

## Advantages of the Object-Orientation

* `DoubleML` gives the user a **high flexibility** with regard to the specification of DML models:
  - Choice of ML methods for approximating the nuisance functions
  - Different resampling schemes (repeated cross-fitting)
  - DML algorithms DML1 and DML2
  - Different Neyman orthogonal score functions

* `DoubleML` can be **easily extended**
  - New model classes with appropriate Neyman orthogonal score function can be inherited from `DoubleML`
  - Score functions can also be passed as user-defined functions (callables), which makes it easy to extend existing model classes
  - The resampling schemes are customizable in a flexible way
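
A minimal sketch of this flexibility, assuming the simulated data from `make_plr_CCDDHNR2018()` and simple linear learners from `mlr3learners`. The learners are passed by position because the name of the first learner argument may differ across package versions. The seed and the values of `n_folds` and `n_rep` are illustrative choices.

```r
library(DoubleML)
library(mlr3)
library(mlr3learners)

set.seed(3141)
df = make_plr_CCDDHNR2018(n_obs = 500, return_type = "data.table")
obj_dml_data = DoubleMLData$new(df, y_col = "y", d_cols = "d")

# Learners by position: outcome regression first, treatment regression second.
# Any mlr3 regression learner could be used here instead of "regr.lm".
dml_plr = DoubleMLPLR$new(obj_dml_data,
                          lrn("regr.lm"),
                          lrn("regr.lm"),
                          n_folds = 5,  # 5-fold cross-fitting
                          n_rep = 3)    # repeated cross-fitting
dml_plr$fit()
dml_plr$summary()
dml_plr$confint()
```

Swapping the learners or the resampling settings changes only the arguments of `$new()`. The calls to `$fit()`, `$summary()` and `$confint()` stay the same.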

---

# Getting started with DoubleML!

---

## Installation

- **Latest *CRAN* release**

```r
install.packages("DoubleML")
```

- **Development version**

```r
remotes::install_github("DoubleML/doubleml-for-r")
```

- See the **Getting Started** page of the tutorial website for more information on prerequisites.

---

## Data Example: Demand Estimation

#### Data Source

* Data example based on a [**blog post by Lars Roemheld (Roemheld, 2021)**](https://towardsdatascience.com/causal-inference-example-elasticity-de4a3e2e621b)

* The original real data set is publicly available via [**Kaggle**](https://www.kaggle.com/vijayuv/onlineretail); a [**preprocessing notebook is available online**](https://github.com/DoubleML/doubleml-docs/blob/master/doc/examples/py_elasticity_preprocessing.ipynb)


#### Causal Problem

* **Price elasticity of demand:** What is the **effect** of a **price change**, `$dLnP$`, on **demanded quantity**, `$dLnQ$`?

* **Observational study**: Flexibly adjust for confounding variables `$X$`, e.g. product characteristics
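
One way to formalize this question is a partially linear regression model. The model choice is an assumption of this sketch, since the slide itself does not fix it:

`$$dLnQ = \theta_0 \, dLnP + g_0(X) + \zeta, \quad E[\zeta \mid dLnP, X] = 0,$$`

`$$dLnP = m_0(X) + V, \quad E[V \mid X] = 0.$$`

Here `$\theta_0$` would be the price elasticity of demand, and `$g_0$` and `$m_0$` are nuisance functions that can be approximated with ML methods.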

#### Causal Diagram (DAG)


---

# Hands On! Interactive Breakout Sessions

---

## Data Example: A/B Testing

#### Data Source

* Data example based on a randomly chosen DGP created for the [**2019 ACIC Data Challenge**](https://sites.google.com/view/acic2019datachallenge/data-challenge).


#### Causal Problem

* **Online shop:** What is the **effect** of a **new ad design** `$A$` on **sales** `$Y$` (in $100)?

* **Observational study**: Necessary to adjust for confounding variables `$V$`
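
With a binary treatment such as `$A$`, one possible starting point is an interactive regression model that targets the average treatment effect. The sketch below is an assumption for illustration only: it uses simulated data from `make_irm_data()` as a stand-in for the challenge data, and the learners, seed and `theta` value are arbitrary choices.

```r
library(DoubleML)
library(mlr3)
library(mlr3learners)

set.seed(3141)
# Simulated stand-in data: binary treatment d, continuous outcome y
obj_dml_data = make_irm_data(n_obs = 500, theta = 0.5,
                             return_type = "DoubleMLData")

dml_irm = DoubleMLIRM$new(obj_dml_data,
                          ml_g = lrn("regr.lm"),          # outcome regression
                          ml_m = lrn("classif.log_reg"),  # propensity score
                          n_folds = 5)
dml_irm$fit()
dml_irm$summary()
```

The treatment model is a classification learner here because the treatment is binary, while the outcome model remains a regression learner.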

#### Causal Diagram (DAG)


---

## Online Resources

* The notebook is organized according to the [**DoubleML Workflow**](https://docs.doubleml.org/stable/workflow/workflow.html)

* Extensive [**User Guide**](https://docs.doubleml.org/stable/guide/guide.html) available via [**docs.doubleml.org**](https://docs.doubleml.org)

* [**Documentation for the R Package DoubleML**](https://docs.doubleml.org/r/stable/) available via [**docs.doubleml.org/r/stable/**](https://docs.doubleml.org/r/stable/)

* R vignette, Bach et al. (2021), available via [**arXiv**](https://arxiv.org/abs/2103.09603)

---

## Quickstart to R6

* A short introduction to the `R6` package is [**available here**](https://r6.r-lib.org/articles/Introduction.html).

* To create a new instance of a class, call the `$new()` method.

```r
# Example: create a data backend (class DoubleMLData)
library(DoubleML)
df = make_plr_CCDDHNR2018(return_type = "data.table")
obj_dml_data = DoubleMLData$new(df,
                                y_col = "y",
                                d_cols = "d")
```

---

## Quickstart to R6

* Call methods and access fields

```r
obj_dml_data$n_obs
```

```
## [1] 500
```

```r
obj_dml_data$print()
```

```
## ================= DoubleMLData Object ==================
## 
## 
## ------------------ Data summary      ------------------
## Outcome variable: y
## Treatment variable(s): d
## Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20
## Instrument(s): 
## No. Observations: 500
```

---

## Quickstart to R6

* A guide on how to debug with R6 is [**available online**](https://r6.r-lib.org/articles/Debugging.html)

```r
DoubleMLData$debug("initialize")
obj_dml_data = DoubleMLData$new(df,
                                y_col = "y",
                                d_cols = "d")
```

* Debugging methods in individual objects

```r
debug(obj_dml_data$print)
obj_dml_data$print()
```

---

## Quickstart: Creating learners in mlr3

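* Install and load the `mlr3` and `mlr3learners` packages (the learner `regr.lm` is provided by `mlr3learners`)

```r
install.packages(c("mlr3", "mlr3learners"))
library(mlr3)
library(mlr3learners)
```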

* Create a learner, either with the class generator or with the `lrn()` shortcut

```r
lm_learner = LearnerRegrLM$new()
```

```r
lm_learner = lrn("regr.lm")
lm_learner
```

```
## <LearnerRegrLM:regr.lm>
## * Model: -
## * Parameters: list()
## * Packages: mlr3, mlr3learners, stats
## * Predict Type: response
## * Feature types: logical, integer, numeric, factor, character
## * Properties: loglik, weights
```
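
As a hedged follow-up, the sketch below shows what a learner object can do on its own: train, predict, and carry hyperparameters. It uses the `mtcars` task that ships with `mlr3`. The `regr.rpart` learner and its parameter values are illustrative choices and are not taken from the slides.

```r
library(mlr3)
library(mlr3learners)

# Train and predict with the linear model learner on a built-in task
task = tsk("mtcars")
lm_learner = lrn("regr.lm")
lm_learner$train(task)
pred = lm_learner$predict(task)
pred$score(msr("regr.mse"))  # in-sample mean squared error

# Hyperparameters can be set when the learner is created
tree_learner = lrn("regr.rpart", cp = 0.01, maxdepth = 5)
tree_learner$param_set$values
```

Learner objects created this way, including their hyperparameter settings, are what gets passed to the `DoubleML` model classes for the nuisance functions.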