# 3. Models

The DoubleML package includes the following models.

## 3.1. Partially linear regression model (PLR)

Partially linear regression (PLR) models take the form

\begin{align}\begin{aligned}Y = D \theta_0 + g_0(X) + \zeta, & &\mathbb{E}(\zeta | D,X) = 0,\\D = m_0(X) + V, & &\mathbb{E}(V | X) = 0,\end{aligned}\end{align}

where $$Y$$ is the outcome variable and $$D$$ is the policy variable of interest. The high-dimensional vector $$X = (X_1, \ldots, X_p)$$ consists of other confounding covariates, and $$\zeta$$ and $$V$$ are stochastic errors.
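The partialling-out logic behind this model can be illustrated outside the package: cross-fit predictions of $$\mathbb{E}[Y|X]$$ and $$\mathbb{E}[D|X]$$, then regress the outcome residuals on the treatment residuals. The following is a minimal sketch on simulated data (the data-generating process and learner settings are illustrative only, not the DoubleMLPLR implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, theta = 2000, 0.5
X = rng.normal(size=(n, 5))
D = X[:, 0] + rng.normal(size=n)                       # m_0(X) = X_1
Y = theta * D + np.sin(X[:, 1]) + rng.normal(size=n)   # g_0(X) = sin(X_2)

# Cross-fitted nuisance predictions of E[Y|X] and E[D|X]
rf = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0)
l_hat = cross_val_predict(rf, X, Y, cv=5)
m_hat = cross_val_predict(rf, X, D, cv=5)

# Residual-on-residual regression recovers theta_0
u, v = Y - l_hat, D - m_hat
theta_hat = (v @ u) / (v @ v)
```

Because the residual-on-residual score is Neyman orthogonal, moderate errors in the two nuisance predictions enter the estimate only through their product, which is the reason flexible ML learners can be plugged in for the nuisance parts.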

DoubleMLPLR implements PLR models. Estimation is conducted via its fit() method:

In : import numpy as np

In : import doubleml as dml

In : from doubleml.datasets import make_plr_CCDDHNR2018

In : from sklearn.ensemble import RandomForestRegressor

In : from sklearn.base import clone

In : learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In : ml_l = clone(learner)

In : ml_m = clone(learner)

In : np.random.seed(1111)

In : data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20, return_type='DataFrame')

In : obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In : dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)

In : print(dml_plr_obj.fit())
================== DoubleMLPLR Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: partialling out
DML algorithm: dml2

------------------ Machine learner   ------------------
Learner ml_l: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Learner ml_m: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Out-of-sample Performance:
Learner ml_l RMSE: [[1.18168748]]
Learner ml_m RMSE: [[1.06080804]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: True

------------------ Fit summary       ------------------
coef   std err        t         P>|t|     2.5 %   97.5 %
d  0.512189  0.044828  11.4256  3.114954e-30  0.424327  0.60005

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn")

learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_l = learner$clone()
ml_m = learner$clone()

set.seed(1111)
data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20, return_type='data.table')
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")

dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m)
dml_plr_obj$fit()
print(dml_plr_obj)

================= DoubleMLPLR Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): d
Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20
Instrument(s):
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: partialling out
DML algorithm: dml2

------------------ Machine learner   ------------------
ml_l: regr.ranger
ml_m: regr.ranger

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: TRUE

------------------ Fit summary       ------------------
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d   0.47659    0.04166   11.44   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



## 3.2. Partially linear IV regression model (PLIV)

Partially linear IV regression (PLIV) models take the form

\begin{align}\begin{aligned}Y - D \theta_0 = g_0(X) + \zeta, & &\mathbb{E}(\zeta | Z, X) = 0,\\Z = m_0(X) + V, & &\mathbb{E}(V | X) = 0,\end{aligned}\end{align}

where $$Y$$ is the outcome variable, $$D$$ is the policy variable of interest and $$Z$$ denotes one or multiple instrumental variables. The high-dimensional vector $$X = (X_1, \ldots, X_p)$$ consists of other confounding covariates, and $$\zeta$$ and $$V$$ are stochastic errors.
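The role of the instrument can be sketched with a simple simulation: after partialling $$X$$ out of $$Y$$, $$D$$ and $$Z$$, the ratio of residual covariances identifies $$\theta_0$$ even though $$D$$ is confounded. This is an illustrative sketch with linear nuisance learners (not the DoubleMLPLIV implementation):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n, theta = 5000, 0.5
X = rng.normal(size=(n, 3))
Z = X[:, 0] + rng.normal(size=n)                   # instrument, m_0(X) = X_1
U = rng.normal(size=n)                             # unobserved confounder
D = Z + U + rng.normal(size=n)                     # endogenous treatment
Y = theta * D + X[:, 1] + U + rng.normal(size=n)   # g_0(X) = X_2

# Cross-fitted partialling-out of X from Y, D and Z
lin = LinearRegression()
y_res = Y - cross_val_predict(lin, X, Y, cv=5)
d_res = D - cross_val_predict(lin, X, D, cv=5)
z_res = Z - cross_val_predict(lin, X, Z, cv=5)

# Residual IV (Wald-type) ratio identifies theta_0 despite confounding by U
theta_hat = (z_res @ y_res) / (z_res @ d_res)

# For comparison: plain residual-on-residual OLS ignores U and is biased
ols_biased = (d_res @ y_res) / (d_res @ d_res)
```

The biased OLS comparison shows why the instrument residuals, rather than the treatment residuals, must enter the score when the treatment is endogenous.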

DoubleMLPLIV implements PLIV models. Estimation is conducted via its fit() method:

In : import numpy as np

In : import doubleml as dml

In : from doubleml.datasets import make_pliv_CHS2015

In : from sklearn.ensemble import RandomForestRegressor

In : from sklearn.base import clone

In : learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In : ml_l = clone(learner)

In : ml_m = clone(learner)

In : ml_r = clone(learner)

In : np.random.seed(2222)

In : data = make_pliv_CHS2015(alpha=0.5, n_obs=500, dim_x=20, dim_z=1, return_type='DataFrame')

In : obj_dml_data = dml.DoubleMLData(data, 'y', 'd', z_cols='Z1')

In : dml_pliv_obj = dml.DoubleMLPLIV(obj_dml_data, ml_l, ml_m, ml_r)

In : print(dml_pliv_obj.fit())
================== DoubleMLPLIV Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): ['Z1']
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: partialling out
DML algorithm: dml2

------------------ Machine learner   ------------------
Learner ml_l: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Learner ml_m: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Learner ml_r: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Out-of-sample Performance:
Learner ml_l RMSE: [[1.48525966]]
Learner ml_m RMSE: [[0.53220754]]
Learner ml_r RMSE: [[1.25240852]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: True

------------------ Fit summary       ------------------
coef   std err         t         P>|t|     2.5 %    97.5 %
d  0.48025  0.084792  5.663878  1.479901e-08  0.314061  0.646439

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)

learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_l = learner$clone()
ml_m = learner$clone()
ml_r = learner$clone()

set.seed(2222)
data = make_pliv_CHS2015(alpha=0.5, n_obs=500, dim_x=20, dim_z=1, return_type="data.table")
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d", z_cols="Z1")

dml_pliv_obj = DoubleMLPLIV$new(obj_dml_data, ml_l, ml_m, ml_r)
dml_pliv_obj$fit()
print(dml_pliv_obj)

================= DoubleMLPLIV Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): d
Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20
Instrument(s): Z1
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: partialling out
DML algorithm: dml2

------------------ Machine learner   ------------------
ml_l: regr.ranger
ml_m: regr.ranger
ml_r: regr.ranger

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: TRUE

------------------ Fit summary       ------------------
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d   0.66184    0.07786     8.5   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



## 3.3. Interactive regression model (IRM)

Interactive regression (IRM) models take the form

\begin{align}\begin{aligned}Y = g_0(D, X) + U, & &\mathbb{E}(U | X, D) = 0,\\D = m_0(X) + V, & &\mathbb{E}(V | X) = 0,\end{aligned}\end{align}

where the treatment variable is binary, $$D \in \lbrace 0,1 \rbrace$$. We consider estimation of average treatment effects when treatment effects are fully heterogeneous. Target parameters of interest in this model are the average treatment effect (ATE),

$\theta_0 = \mathbb{E}[g_0(1, X) - g_0(0,X)]$

and the average treatment effect of the treated (ATTE),

$\theta_0 = \mathbb{E}[g_0(1, X) - g_0(0,X) | D=1].$
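The ATE can be estimated with a doubly robust (AIPW-type) score built from cross-fitted nuisance estimates $$\hat{g}(0,X)$$, $$\hat{g}(1,X)$$ and $$\hat{m}(X)$$. Below is a simplified sketch with linear learners on simulated data; the data-generating process and learner choices are illustrative only, not the DoubleMLIRM implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n, theta = 4000, 0.5
X = rng.normal(size=(n, 4))
m0 = 1 / (1 + np.exp(-X[:, 0]))          # true propensity m_0(X)
D = rng.binomial(1, m0)
Y = theta * D + X[:, 1] + rng.normal(size=n)

# Cross-fitted nuisance estimates: outcome models per treatment arm, propensity
g0_hat, g1_hat, m_hat = np.zeros(n), np.zeros(n), np.zeros(n)
for tr, te in KFold(n_splits=5).split(X):
    tr0, tr1 = tr[D[tr] == 0], tr[D[tr] == 1]
    g0_hat[te] = LinearRegression().fit(X[tr0], Y[tr0]).predict(X[te])
    g1_hat[te] = LinearRegression().fit(X[tr1], Y[tr1]).predict(X[te])
    m_hat[te] = LogisticRegression().fit(X[tr], D[tr]).predict_proba(X[te])[:, 1]

m_hat = np.clip(m_hat, 0.01, 0.99)       # trim extreme propensities
# Doubly robust score: regression adjustment plus inverse-probability correction
psi = (g1_hat - g0_hat
       + D * (Y - g1_hat) / m_hat
       - (1 - D) * (Y - g0_hat) / (1 - m_hat))
ate_hat = psi.mean()
```

The score is unbiased if either the outcome models or the propensity model is correct, which is what makes ML-based nuisance estimation tolerable here.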

DoubleMLIRM implements IRM models. Estimation is conducted via its fit() method:

In : import numpy as np

In : import doubleml as dml

In : from doubleml.datasets import make_irm_data

In : from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

In : ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In : ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In : np.random.seed(3333)

In : data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')

In : obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In : dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)

In : print(dml_irm_obj.fit())
================== DoubleMLIRM Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: ATE
DML algorithm: dml2

------------------ Machine learner   ------------------
Learner ml_g: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Learner ml_m: RandomForestClassifier(max_depth=5, max_features=20, min_samples_leaf=2)
Out-of-sample Performance:
Learner ml_g0 RMSE: [[1.11796234]]
Learner ml_g1 RMSE: [[1.10906512]]
Learner ml_m RMSE: [[0.41907525]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: True

------------------ Fit summary       ------------------
coef   std err         t     P>|t|    2.5 %    97.5 %
d  0.59284  0.195596  3.030945  0.002438  0.20948  0.976201

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)

set.seed(3333)
ml_g = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_m = lrn("classif.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type="data.table")
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_irm_obj = DoubleMLIRM$new(obj_dml_data, ml_g, ml_m)
dml_irm_obj$fit()
print(dml_irm_obj)

================= DoubleMLIRM Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): d
Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20
Instrument(s):
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: ATE
DML algorithm: dml2

------------------ Machine learner   ------------------
ml_g: regr.ranger
ml_m: classif.ranger

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: TRUE

------------------ Fit summary       ------------------
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d    0.6695     0.2097   3.192  0.00141 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



## 3.4. Interactive IV model (IIVM)

Interactive IV regression (IIVM) models take the form

\begin{align}\begin{aligned}Y = \ell_0(D, X) + \zeta, & &\mathbb{E}(\zeta | Z, X) = 0,\\Z = m_0(X) + V, & &\mathbb{E}(V | X) = 0,\end{aligned}\end{align}

where the treatment variable is binary, $$D \in \lbrace 0,1 \rbrace$$, and the instrument is binary, $$Z \in \lbrace 0,1 \rbrace$$. Consider the functions $$g_0$$, $$r_0$$ and $$m_0$$, where $$g_0$$ maps the support of $$(Z,X)$$ to $$\mathbb{R}$$, and $$r_0$$ and $$m_0$$ respectively map the support of $$(Z,X)$$ and $$X$$ to $$(\varepsilon, 1-\varepsilon)$$ for some $$\varepsilon \in (0, 1/2)$$, such that

\begin{align}\begin{aligned}Y = g_0(Z, X) + \nu, & &\mathbb{E}(\nu | Z, X) = 0,\\D = r_0(Z, X) + U, & &\mathbb{E}(U | Z, X) = 0,\\Z = m_0(X) + V, & &\mathbb{E}(V | X) = 0.\end{aligned}\end{align}

The target parameter of interest in this model is the local average treatment effect (LATE),

$\theta_0 = \frac{\mathbb{E}[g_0(1, X)] - \mathbb{E}[g_0(0,X)]}{\mathbb{E}[r_0(1, X)] - \mathbb{E}[r_0(0,X)]}.$

DoubleMLIIVM implements IIVM models. Estimation is conducted via its fit() method:

In : import numpy as np

In : import doubleml as dml

In : from doubleml.datasets import make_iivm_data

In : from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

In : ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In : ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In : ml_r = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In : np.random.seed(4444)

In : data = make_iivm_data(theta=0.5, n_obs=1000, dim_x=20, alpha_x=1.0, return_type='DataFrame')

In : obj_dml_data = dml.DoubleMLData(data, 'y', 'd', z_cols='z')

In : dml_iivm_obj = dml.DoubleMLIIVM(obj_dml_data, ml_g, ml_m, ml_r)

In : print(dml_iivm_obj.fit())
================== DoubleMLIIVM Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): ['z']
No. Observations: 1000

------------------ Score & algorithm ------------------
Score function: LATE
DML algorithm: dml2

------------------ Machine learner   ------------------
Learner ml_g: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Learner ml_m: RandomForestClassifier(max_depth=5, max_features=20, min_samples_leaf=2)
Learner ml_r: RandomForestClassifier(max_depth=5, max_features=20, min_samples_leaf=2)
Out-of-sample Performance:
Learner ml_g0 RMSE: [[1.12318274]]
Learner ml_g1 RMSE: [[1.1264285]]
Learner ml_m RMSE: [[0.49825336]]
Learner ml_r0 RMSE: [[0.50420135]]
Learner ml_r1 RMSE: [[0.36566158]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: True

------------------ Fit summary       ------------------
coef   std err         t     P>|t|     2.5 %    97.5 %
d  0.44921  0.224522  2.000743  0.04542  0.009156  0.889265

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)

set.seed(4444)
ml_g = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_m = lrn("classif.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5)
ml_r = ml_m$clone()
data = make_iivm_data(theta=0.5, n_obs=1000, dim_x=20, alpha_x=1, return_type="data.table")
obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d", z_cols="z")
dml_iivm_obj = DoubleMLIIVM$new(obj_dml_data, ml_g, ml_m, ml_r)
dml_iivm_obj$fit()
print(dml_iivm_obj)

================= DoubleMLIIVM Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): d
Covariates: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20
Instrument(s): z
No. Observations: 1000

------------------ Score & algorithm ------------------
Score function: LATE
DML algorithm: dml2

------------------ Machine learner   ------------------
ml_g: regr.ranger
ml_m: classif.ranger
ml_r: classif.ranger

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: TRUE

------------------ Fit summary       ------------------
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d    0.3569     0.1990   1.793    0.073 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
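Without covariates, the LATE expression above collapses to the classic Wald ratio of intention-to-treat effects. The small simulation below makes this concrete; it is illustrative only, since the package estimates the covariate-adjusted version with an orthogonal score and cross-fitted nuisances:

```python
import numpy as np

rng = np.random.default_rng(3)
n, theta = 20000, 0.5
Z = rng.binomial(1, 0.5, size=n)             # binary instrument
complier = rng.binomial(1, 0.6, size=n)      # latent complier indicator
# Compliers take D = Z; the rest take treatment with fixed probability 0.3
D = np.where(complier == 1, Z, rng.binomial(1, 0.3, size=n))
Y = theta * D + rng.normal(size=n)

# Wald ratio: ITT effect on Y divided by ITT effect on D
late_hat = (Y[Z == 1].mean() - Y[Z == 0].mean()) / \
           (D[Z == 1].mean() - D[Z == 0].mean())
```

With a homogeneous treatment effect, the LATE equals the overall effect, so the ratio should land near 0.5.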



## 3.5. Difference-in-Differences Models (DID)

Difference-in-Differences Models (DID) implemented in the package focus on the binary treatment case with two treatment periods.

Adopting the notation from Sant’Anna and Zhao (2020), let $$Y_{it}$$ be the outcome of interest for unit $$i$$ at time $$t$$. Further, let $$D_{it}=1$$ indicate if unit $$i$$ is treated before time $$t$$ (otherwise $$D_{it}=0$$). Since all units start as untreated ($$D_{i0}=0$$), define $$D_{i}=D_{i1}.$$ Relying on the potential outcome notation, denote $$Y_{it}(0)$$ as the outcome of unit $$i$$ at time $$t$$ if the unit did not receive treatment up until time $$t$$, and analogously $$Y_{it}(1)$$ with treatment. Consequently, the observed outcome for unit $$i$$ at time $$t$$ is $$Y_{it}=D_{it} Y_{it}(1) + (1-D_{it}) Y_{it}(0)$$. Further, let $$X_i$$ be a vector of pre-treatment covariates.

The target parameter of interest is the average treatment effect on the treated (ATTE)

$\theta_0 = \mathbb{E}[Y_{i1}(1)- Y_{i1}(0)|D_i=1].$

The corresponding identifying assumptions are

• (Cond.) Parallel Trends: $$\mathbb{E}[Y_{i1}(0) - Y_{i0}(0)|X_i, D_i=1] = \mathbb{E}[Y_{i1}(0) - Y_{i0}(0)|X_i, D_i=0]\quad a.s.$$

• Overlap: $$\exists\epsilon > 0$$: $$P(D_i=1) > \epsilon$$ and $$P(D_i=1|X_i) \le 1-\epsilon\quad a.s.$$

Note

For a more detailed introduction to and recent developments in the difference-in-differences literature, see e.g. Roth et al. (2022).

### 3.5.1. Panel data

If panel data are available, the observations are assumed to be i.i.d. of the form $$(Y_{i0}, Y_{i1}, D_i, X_i)$$. Note that the difference $$\Delta Y_i = Y_{i1}-Y_{i0}$$ has to be defined as the outcome y in the DoubleMLData object.
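The observational score for the panel case combines an outcome-trend model fit on untreated units with a propensity model, in the spirit of the doubly robust estimator of Sant’Anna and Zhao (2020). The following simplified numerical sketch (not the DoubleMLDID implementation; the data-generating process is invented for illustration) shows why the adjustment matters when selection into treatment is related to the trend:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
n, att = 3000, -3.0
X = rng.normal(size=(n, 3))
m0 = 1 / (1 + np.exp(-X[:, 0]))              # true propensity
D = rng.binomial(1, m0)
# Outcome difference dY = Y_i1 - Y_i0: trend depends on X_1, which also drives D
dY = X[:, 0] + att * D + rng.normal(size=n)

g_hat, m_hat = np.zeros(n), np.zeros(n)
for tr, te in KFold(n_splits=5).split(X):
    tr0 = tr[D[tr] == 0]                     # trend model uses untreated units only
    g_hat[te] = LinearRegression().fit(X[tr0], dY[tr0]).predict(X[te])
    m_hat[te] = LogisticRegression().fit(X[tr], D[tr]).predict_proba(X[te])[:, 1]

m_hat = np.clip(m_hat, 0.01, 0.99)
w_treat = D / D.mean()
w_ctrl = m_hat * (1 - D) / ((1 - m_hat) * D.mean())
att_hat = np.mean((w_treat - w_ctrl) * (dY - g_hat))

# Naive comparison of mean outcome differences is biased by the X_1-driven trend
naive = dY[D == 1].mean() - dY[D == 0].mean()
```

Here the doubly robust estimate should recover the ATTE while the naive group comparison does not, since treated units systematically sit on a steeper trend.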

DoubleMLDID implements difference-in-differences models for panel data. Estimation is conducted via its fit() method:

In : import numpy as np

In : import doubleml as dml

In : from doubleml.datasets import make_did_SZ2020

In : from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

In : ml_g = RandomForestRegressor(n_estimators=100, max_depth=5, min_samples_leaf=5)

In : ml_m = RandomForestClassifier(n_estimators=100, max_depth=5, min_samples_leaf=5)

In : np.random.seed(42)

In : data = make_did_SZ2020(n_obs=500, return_type='DataFrame')

# y is already defined as the difference of observed outcomes
In : obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In : dml_did_obj = dml.DoubleMLDID(obj_dml_data, ml_g, ml_m)

In : print(dml_did_obj.fit())
================== DoubleMLDID Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['Z1', 'Z2', 'Z3', 'Z4']
Instrument variable(s): None
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: observational
DML algorithm: dml2

------------------ Machine learner   ------------------
Learner ml_g: RandomForestRegressor(max_depth=5, min_samples_leaf=5)
Learner ml_m: RandomForestClassifier(max_depth=5, min_samples_leaf=5)
Out-of-sample Performance:
Learner ml_g0 RMSE: [[16.1683004]]
Learner ml_g1 RMSE: [[14.1492702]]
Learner ml_m RMSE: [[0.48467874]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: True

------------------ Fit summary       ------------------
coef   std err         t     P>|t|    2.5 %    97.5 %
d -3.117346  2.029544 -1.535983  0.124542 -7.09518  0.860488


### 3.5.2. Repeated cross-sections

For repeated cross-sections, the observations are assumed to be i.i.d. of the form $$(Y_{i}, D_i, X_i, T_i)$$, where $$T_i$$ is a dummy variable indicating whether unit $$i$$ is observed in the post-treatment period ($$T_i=1$$) or the pre-treatment period ($$T_i=0$$), such that the observed outcome can be defined as

$Y_i = T_i Y_{i1} + (1-T_i) Y_{i0}.$

Further, treatment and covariates are assumed to be stationary, such that the joint distribution of $$(D,X)$$ is invariant to $$T$$.
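In the simplest case without covariates, the ATTE reduces to the familiar 2x2 difference-in-differences of group-by-period means. A small simulation illustrating this baseline (the package additionally adjusts for $$X$$ via ML nuisance estimates; the data-generating process here is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, att = 4000, -3.0
T = rng.binomial(1, 0.5, size=n)      # pre/post indicator
D = rng.binomial(1, 0.5, size=n)      # treatment group, stationary w.r.t. T
# Group-level difference (2.0*D), common time trend (1.0*T), effect only post-treatment
Y = 2.0 * D + 1.0 * T + att * D * T + rng.normal(size=n)

# Classic 2x2 DID: change over time for the treated minus change for the controls
did = (Y[(D == 1) & (T == 1)].mean() - Y[(D == 1) & (T == 0)].mean()) \
    - (Y[(D == 0) & (T == 1)].mean() - Y[(D == 0) & (T == 0)].mean())
```

The stationarity assumption above is what licenses comparing the pre- and post-period cross-sections as if they sampled the same population of units.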

DoubleMLDIDCS implements difference-in-differences models for repeated cross-sections. Estimation is conducted via its fit() method:

In : import numpy as np

In : import doubleml as dml

In : from doubleml.datasets import make_did_SZ2020

In : from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

In : ml_g = RandomForestRegressor(n_estimators=100, max_depth=5, min_samples_leaf=5)

In : ml_m = RandomForestClassifier(n_estimators=100, max_depth=5, min_samples_leaf=5)

In : np.random.seed(42)

In : data = make_did_SZ2020(n_obs=500, cross_sectional_data=True, return_type='DataFrame')

In : obj_dml_data = dml.DoubleMLData(data, 'y', 'd', t_col='t')

In : dml_did_obj = dml.DoubleMLDIDCS(obj_dml_data, ml_g, ml_m)

In : print(dml_did_obj.fit())
================== DoubleMLDIDCS Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['Z1', 'Z2', 'Z3', 'Z4']
Instrument variable(s): None
Time variable: t
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: observational
DML algorithm: dml2

------------------ Machine learner   ------------------
Learner ml_g: RandomForestRegressor(max_depth=5, min_samples_leaf=5)
Learner ml_m: RandomForestClassifier(max_depth=5, min_samples_leaf=5)
Out-of-sample Performance:
Learner ml_g_d0_t0 RMSE: [[17.66519949]]
Learner ml_g_d0_t1 RMSE: [[43.79590888]]
Learner ml_g_d1_t0 RMSE: [[33.27158699]]
Learner ml_g_d1_t1 RMSE: [[49.65172857]]
Learner ml_m RMSE: [[0.48909902]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: True

------------------ Fit summary       ------------------
coef   std err         t     P>|t|     2.5 %     97.5 %
d -6.606635  8.724015 -0.757293  0.448874 -23.70539  10.492119