# 5. Score functions#

We use method-of-moments estimators for the target parameter $$\theta_0$$ based upon the empirical analog of the moment condition

$\mathbb{E}[ \psi(W; \theta_0, \eta_0)] = 0,$

where we call $$\psi$$ the score function, $$W=(Y,D,X,Z)$$, $$\theta_0$$ is the parameter of interest and $$\eta$$ denotes nuisance functions with population value $$\eta_0$$. We use score functions $$\psi(W; \theta, \eta)$$ that satisfy $$\mathbb{E}[ \psi(W; \theta_0, \eta_0)] = 0$$ with $$\theta_0$$ being the unique solution and that obey the Neyman orthogonality condition

$\partial_{\eta} \mathbb{E}[ \psi(W; \theta_0, \eta)] \bigg|_{\eta=\eta_0} = 0.$

The score functions of many double machine learning models (PLR, PLIV, IRM, IIVM) are linear in the parameter $$\theta$$, i.e.,

$\psi(W; \theta, \eta) = \psi_a(W; \eta) \theta + \psi_b(W; \eta).$

Hence the estimator can be written as

$\tilde{\theta}_0 = - \frac{\mathbb{E}_N[\psi_b(W; \eta)]}{\mathbb{E}_N[\psi_a(W; \eta)]}.$

The linearity of the score function in the parameter $$\theta$$ allows the implementation of key components in a very general way. The methods and algorithms to estimate the causal parameters, to estimate their standard errors, to perform a multiplier bootstrap, to obtain confidence intervals and many more are implemented in the abstract base class DoubleML. The object-oriented architecture therefore allows for easy extension to new model classes for double machine learning. This is doable with very minor effort.

If the linearity of the score function is not satisfied, the computations are more involved. In the Python package DoubleML, the functionality around the score functions is implemented in mixin classes called LinearScoreMixin and NonLinearScoreMixin. The R package currently only comes with an implementation for linear score functions. In case of a non-linear score function, the parameter estimate $$\tilde{\theta}_0$$ is obtained via numerical root search of the empirical analog of the moment condition $$\mathbb{E}[ \psi(W; \theta_0, \eta_0)] = 0$$.

## 5.1. Implementation of the score function and the estimate of the causal parameter#

As an example we consider a partially linear regression model (PLR) implemented in DoubleMLPLR.

In [1]: import doubleml as dml

In [2]: from doubleml.datasets import make_plr_CCDDHNR2018

In [3]: from sklearn.ensemble import RandomForestRegressor

In [4]: from sklearn.base import clone

In [5]: np.random.seed(3141)

In [6]: learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)

In [7]: ml_l = clone(learner)

In [8]: ml_m = clone(learner)

In [9]: data = make_plr_CCDDHNR2018(alpha=0.5, return_type='DataFrame')

In [10]: obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

In [11]: dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)

In [12]: dml_plr_obj.fit();

In [13]: print(dml_plr_obj)
================== DoubleMLPLR Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: partialling out

------------------ Machine learner   ------------------
Learner ml_l: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Learner ml_m: RandomForestRegressor(max_depth=5, max_features=20, min_samples_leaf=2)
Out-of-sample Performance:
Regression:
Learner ml_l RMSE: [[1.12705095]]
Learner ml_m RMSE: [[1.03917696]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1

------------------ Fit summary       ------------------
coef   std err          t         P>|t|     2.5 %    97.5 %
d  0.480691  0.040533  11.859129  1.929729e-32  0.401247  0.560135

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
lgr::get_logger("mlr3")$set_threshold("warn") learner = lrn("regr.ranger", num.trees = 100, mtry = 20, min.node.size = 2, max.depth = 5) ml_l = learner$clone()
ml_m = learner$clone() set.seed(3141) data = make_plr_CCDDHNR2018(alpha=0.5, return_type='data.table') obj_dml_data = DoubleMLData$new(data, y_col="y", d_cols="d")
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_l, ml_m) dml_plr_obj$fit()
print(dml_plr_obj)

================= DoubleMLPLR Object ==================

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): d
Covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20
Instrument(s):
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: partialling out
DML algorithm: dml2

------------------ Machine learner   ------------------
ml_l: regr.ranger
ml_m: regr.ranger

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 1
Apply cross-fitting: TRUE

------------------ Fit summary       ------------------
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
d   0.54483    0.04502    12.1   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



The fit() method of DoubleMLPLR stores the estimate $$\tilde{\theta}_0$$ in its coef attribute.

In [14]: print(dml_plr_obj.coef)
[0.48069071]

print(dml_plr_obj$coef)   d 0.5448331  The values of the score function components $$\psi_a(W_i; \hat{\eta}_0)$$ and $$\psi_b(W_i; \hat{\eta}_0)$$ are stored in the attributes psi_elements['psi_a'] and psi_elements['psi_b'] (Python package DoubleML) and psi_a and psi_b (R package DoubleML). In the attribute psi the values of the score function $$\psi(W_i; \tilde{\theta}_0, \hat{\eta}_0)$$ are stored. In [15]: print(dml_plr_obj.psi[:5]) [[[ 0.02052929]] [[-0.00409412]] [[ 0.00138944]] [[-0.11208236]] [[-0.29678199]]]  print(dml_plr_obj$psi[1:5, ,1])

[1]  0.009329847  0.749854893  0.008883698 -0.403771948  0.863982270


## 5.2. Implemented Neyman orthogonal score functions#

### 5.2.1. Partially linear models (PLM)#

#### 5.2.1.1. Partially linear regression model (PLR)#

For the PLR model implemented in DoubleMLPLR one can choose between score='partialling out' and score='IV-type'.

score='partialling out' implements the score function:

\begin{align}\begin{aligned}\psi(W; \theta, \eta) &:= [Y - \ell(X) - \theta (D - m(X))] [D - m(X)]\\&= - (D - m(X)) (D - m(X)) \theta + (Y - \ell(X)) (D - m(X))\\&= \psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(\ell,m)$$ and where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) &= - (D - m(X)) (D - m(X)),\\\psi_b(W; \eta) &= (Y - \ell(X)) (D - m(X)).\end{aligned}\end{align}

score='IV-type' implements the score function:

\begin{align}\begin{aligned}\psi(W; \theta, \eta) &:= [Y - D \theta - g(X)] [D - m(X)]\\&= - D (D - m(X)) \theta + (Y - g(X)) (D - m(X))\\&= \psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(g,m)$$ and where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) &= - D (D - m(X)),\\\psi_b(W; \eta) &= (Y - g(X)) (D - m(X)).\end{aligned}\end{align}

#### 5.2.1.2. Partially linear IV regression model (PLIV)#

For the PLIV model implemented in DoubleMLPLIV one can choose between score='IV-type' and score='partialling out'.

score='partialling out' implements the score function:

\begin{align}\begin{aligned}\psi(W; \theta, \eta) &:= [Y - \ell(X) - \theta (D - r(X))] [Z - m(X)]\\&= - (D - r(X)) (Z - m(X)) \theta + (Y - \ell(X)) (Z - m(X))\\&= \psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(\ell, m, r)$$ and where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) &= - (D - r(X)) (Z - m(X)),\\\psi_b(W; \eta) &= (Y - \ell(X)) (Z - m(X)).\end{aligned}\end{align}

score='IV-type' implements the score function:

\begin{align}\begin{aligned}\psi(W; \theta, \eta) &:= [Y - D \theta - g(X)] [Z - m(X)]\\&= - D (Z - m(X)) \theta + (Y - g(X)) (Z - m(X))\\&= \psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(g,m)$$ and where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) &= - D (Z - m(X)),\\\psi_b(W; \eta) &= (Y - g(X)) (Z - m(X)).\end{aligned}\end{align}

### 5.2.2. Interactive regression models (IRM)#

#### 5.2.2.1. Binary Interactive Regression Model (IRM)#

For the IRM model implemented in DoubleMLIRM one can choose between score='ATE' and score='ATTE'. Furthermore, weights $$\omega(Y,D,X)$$ and

$\bar{\omega}(X) = \mathbb{E}[\omega(Y,D,X)|X]$

can be specified. The general score function takes the form

\begin{align}\begin{aligned}\psi(W; \theta, \eta) :=\; &\omega(Y,D,X) \cdot (g(1,X) - g(0,X))\\& + \bar{\omega}(X)\cdot \bigg(\frac{D (Y - g(1,X))}{m(X)} - \frac{(1 - D)(Y - g(0,X))}{1 - m(X)}\bigg) - \theta\\=& \psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(g,m)$$ and where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) =& - 1,\\\psi_b(W; \eta) =\; &\omega(Y,D,X) \cdot (g(1,X) - g(0,X))\\& + \bar{\omega}(X)\cdot \bigg(\frac{D (Y - g(1,X))}{m(X)} - \frac{(1 - D)(Y - g(0,X))}{1 - m(X)}\bigg).\end{aligned}\end{align}

If no weights are specified, score='ATE' sets the weights

\begin{align}\begin{aligned}\omega(Y,D,X) &= 1\\\bar{\omega}(X) &= 1\end{aligned}\end{align}

whereas score='ATTE' changes weights to:

\begin{align}\begin{aligned}\omega(Y,D,X) &= \frac{D}{\mathbb{E}_n[D]}\\\omega(Y,D,X) &= \frac{m(X)}{\mathbb{E}_n[D]}.\end{aligned}\end{align}

For more details on other weight specifications, see Weighted Average Treatment Effects.

#### 5.2.2.2. Average Potential Outcomes (APOs)#

For the average potential outcomes (APO) models implemented in DoubleMLAPO and DoubleMLAPOS the score='APO' is implemented. Furthermore, weights $$\omega(Y,D,X)$$ and

$\bar{\omega}(X) = \mathbb{E}[\omega(Y,D,X)|X]$

can be specified. For a given treatment level $$d$$ the general score function takes the form

\begin{align}\begin{aligned}\psi(W; \theta, \eta) :=\; &\omega(Y,D,X) \cdot g(d,X) + \bar{\omega}(X)\cdot \frac{1\lbrace D = d\rbrace }{m(X)}(Y - g(d,X)) - \theta\\=& \psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(g,m)$$, where the true nuisance elements are

\begin{align}\begin{aligned}g_0(D, X) &= \mathbb{E}[Y | D, X],\\m_{0,d}(X) &= \mathbb{E}[1\lbrace D = d\rbrace | X] = P(D=d|X).\end{aligned}\end{align}

The components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) =& - 1,\\\psi_b(W; \eta) =\; &\omega(Y,D,X) \cdot g(d,X) + \bar{\omega}(X)\cdot \frac{1\lbrace D = d\rbrace }{m(X)}(Y - g(d,X)).\end{aligned}\end{align}

If no weights are specified, the weights are set to

\begin{align}\begin{aligned}\omega(Y,D,X) &= 1\\\bar{\omega}(X) &= 1.\end{aligned}\end{align}

#### 5.2.2.3. Interactive IV model (IIVM)#

For the IIVM model implemented in DoubleMLIIVM we employ for score='LATE' the score function:

score='LATE' implements the score function:

\begin{align}\begin{aligned}\psi(W; \theta, \eta) :=\; &g(1,X) - g(0,X) + \frac{Z (Y - g(1,X))}{m(X)} - \frac{(1 - Z)(Y - g(0,X))}{1 - m(X)}\\&- \bigg(r(1,X) - r(0,X) + \frac{Z (D - r(1,X))}{m(X)} - \frac{(1 - Z)(D - r(0,X))}{1 - m(X)} \bigg) \theta\\=\; &\psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(g, m, r)$$ and where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) &= - \bigg(r(1,X) - r(0,X) + \frac{Z (D - r(1,X))}{m(X)} - \frac{(1 - Z)(D - r(0,X))}{1 - m(X)} \bigg),\\\psi_b(W; \eta) &= g(1,X) - g(0,X) + \frac{Z (Y - g(1,X))}{m(X)} - \frac{(1 - Z)(Y - g(0,X))}{1 - m(X)}.\end{aligned}\end{align}

#### 5.2.2.4. Potential quantiles (PQs)#

For DoubleMLPQ the only valid option is score='PQ'. For treatment=d with $$d\in\{0,1\}$$ and a quantile $$\tau\in (0,1)$$ this implements the nonlinear score function:

$\psi(W; \theta, \eta) := g_{d}(X, \tilde{\theta}) + \frac{1\{D=d\}}{m(X)}(1\{Y\le \theta\} - g_d(X, \tilde{\theta})) - \tau$

where $$\eta=(g_d,m)$$ with true values

\begin{align}\begin{aligned}g_{d,0}(X, \theta_0) &= \mathbb{E}[1\{Y\le \theta_0\}|X, D=d]\\m_0(X) &= P(D=d|X).\end{aligned}\end{align}

Remark that $$g_{d,0}(X,\theta_0)$$ depends on the target parameter $$\theta_0$$, such that the score is estimated with a preliminary estimate $$\tilde{\theta}$$. For further details, see Kallus et al. (2019).

#### 5.2.2.5. Local potential quantiles (LPQs)#

For DoubleMLLPQ the only valid option is score='LPQ'. For treatment=d with $$d\in\{0,1\}$$, instrument $$Z$$ and a quantile $$\tau\in (0,1)$$ this implements the nonlinear score function:

\begin{align}\begin{aligned}\psi(W; \theta, \eta) :=& \Big(g_{d, Z=1}(X, \tilde{\theta}) - g_{d, Z=0}(X, \tilde{\theta}) + \frac{Z}{m(X)}(1\{D=d\} \cdot 1\{Y\le \theta\} - g_{d, Z=1}(X, \tilde{\theta}))\\&\quad - \frac{1-Z}{1-m(X)}(1\{D=d\} \cdot 1\{Y\le \theta\} - g_{d, Z=0}(X, \tilde{\theta}))\Big) \cdot \frac{2d -1}{\gamma} - \tau\end{aligned}\end{align}

where $$\eta=(g_{d,Z=1}, g_{d,Z=0}, m, \gamma)$$ with true values

\begin{align}\begin{aligned}g_{d,Z=z,0}(X, \theta_0) &= \mathbb{E}[1\{D=d\} \cdot 1\{Y\le \theta_0\}|X, Z=z],\quad z\in\{0,1\}\\m_{Z=z,0}(X) &= P(D=d|X, Z=z),\quad z\in\{0,1\}\\m_0(X) &= P(Z=1|X)\\\gamma_0 &= \mathbb{E}[P(D=d|X, Z=1) - P(D=d|X, Z=0)].\end{aligned}\end{align}

Further, the compliance probability $$\gamma_0$$ is estimated with the two additional nuisance components

$m_{Z=z,0}(X) = P(D=d|X, Z=z),\quad z\in\{0,1\}.$

Remark that $$g_{d,Z=z,0}(X, \theta_0)$$ depends on the target parameter $$\theta_0$$, such that the score is estimated with a preliminary estimate $$\tilde{\theta}$$. For further details, see Kallus et al. (2019).

#### 5.2.2.6. Conditional value at risk (CVaR)#

For DoubleMLCVAR the only valid option is score='CVaR'. For treatment=d with $$d\in\{0,1\}$$ and a quantile $$\tau\in (0,1)$$ this implements the score function:

$\psi(W; \theta, \eta) := g_{d}(X, \gamma) + \frac{1\{D=d\}}{m(X)}(\max(\gamma, (1 - \tau)^{-1}(Y - \tau \gamma)) - g_d(X, \gamma)) - \theta$

where $$\eta=(g_d,m,\gamma)$$ with true values

\begin{align}\begin{aligned}g_{d,0}(X, \gamma_0) &= \mathbb{E}[\max(\gamma_0, (1 - \tau)^{-1}(Y - \tau \gamma_0))|X, D=d]\\m_0(X) &= P(D=d|X)\end{aligned}\end{align}

and $$\gamma_0$$ being the potential quantile of $$Y(d)$$. As for potential quantiles, the estimate $$g_d$$ is constructed via a preliminary estimate of $$\gamma_0$$. For further details, see Kallus et al. (2019).

### 5.2.3. Difference-in-Differences Models#

#### 5.2.3.1. Panel Data#

For the difference-in-differences model implemented in DoubleMLDID one can choose between score='observational' and score='experimental'.

score='observational' implements the score function (dropping the unit index $$i$$):

\begin{align}\begin{aligned}\psi(W,\theta, \eta) :&= -\frac{D}{\mathbb{E}_n[D]}\theta + \left(\frac{D}{\mathbb{E}_n[D]} - \frac{\frac{m(X) (1-D)}{1-m(X)}}{\mathbb{E}_n\left[\frac{m(X) (1-D)}{1-m(X)}\right]}\right) \left(Y_1 - Y_0 - g(0,X)\right)\\&= \psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) &= - \frac{D}{\mathbb{E}_n[D]},\\\psi_b(W; \eta) &= \left(\frac{D}{\mathbb{E}_n[D]} - \frac{\frac{m(X) (1-D)}{1-m(X)}}{\mathbb{E}_n\left[\frac{m(X) (1-D)}{1-m(X)}\right]}\right) \left(Y_1 - Y_0 - g(0,X)\right)\end{aligned}\end{align}

and the nuisance elements $$\eta=(g, m)$$ are defined as

\begin{align}\begin{aligned}g_{0}(0, X) &= \mathbb{E}[Y_1 - Y_0|D=0, X]\\m_0(X) &= P(D=1|X).\end{aligned}\end{align}

If in_sample_normalization='False', the score is set to

\begin{align}\begin{aligned}\psi(W,\theta,\eta) &= - \frac{D}{p}\theta + \frac{D - m(X)}{p(1-m(X))}\left(Y_1 - Y_0 -g(0,X)\right)\\&= \psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(g, m, p)$$, where $$p_0 = \mathbb{E}[D]$$ is estimated on the cross-fitting folds. Remark that this will result in the same score, but just uses slightly different normalization.

score='experimental' assumes that the treatment probability is independent of the covariates $$X$$ and implements the score function:

\begin{align}\begin{aligned}\psi(W,\theta, \eta) :=\; &-\theta + \left(\frac{D}{\mathbb{E}_n[D]} - \frac{1-D}{\mathbb{E}_n[1-D]}\right)\left(Y_1 - Y_0 -g(0,X)\right)\\&+ \left(1 - \frac{D}{\mathbb{E}_n[D]}\right) \left(g(1,X) - g(0,X)\right)\\=\; &\psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) \;= &- 1,\\\psi_b(W; \eta) \;= &\left(\frac{D}{\mathbb{E}_n[D]} - \frac{1-D}{\mathbb{E}_n[1-D]}\right)\left(Y_1 - Y_0 -g(0,X)\right)\\&+ \left(1 - \frac{D}{\mathbb{E}_n[D]}\right) \left(g(1,X) - g(0,X)\right)\end{aligned}\end{align}

and the nuisance elements $$\eta=(g)$$ are defined as

\begin{align}\begin{aligned}g_{0}(0, X) &= \mathbb{E}[Y_1 - Y_0|D=0, X]\\g_{0}(1, X) &= \mathbb{E}[Y_1 - Y_0|D=1, X]\end{aligned}\end{align}

Analogously, if in_sample_normalization='False', the score is set to

\begin{align}\begin{aligned}\psi(W,\theta, \eta) :=\; &-\theta + \frac{D - p}{p(1-p)}\left(Y_1 - Y_0 -g(0,X)\right)\\&+ \left(1 - \frac{D}{p}\right) \left(g(1,X) - g(0,X)\right)\\=\; &\psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(g, p)$$, where $$p_0 = \mathbb{E}[D]$$ is estimated on the cross-fitting folds. Remark that this will result in the same score, but just uses slightly different normalization.

#### 5.2.3.2. Repeated Cross-Sectional Data#

For the difference-in-differences model implemented in DoubleMLDIDCS one can choose between score='observational' and score='experimental'.

score='observational' implements the score function (dropping the unit index $$i$$):

\begin{align}\begin{aligned}\psi(W,\theta,\eta) :=\; & - \frac{D}{\mathbb{E}_n[D]}\theta + \frac{D}{\mathbb{E}_n[D]}\Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)\\& + \frac{DT}{\mathbb{E}_n[DT]} (Y - g(1,1,X))\\& - \frac{D(1-T)}{\mathbb{E}_n[D(1-T)]}(Y - g(1,0,X))\\& - \frac{m(X) (1-D)T}{1-m(X)} \mathbb{E}_n\left[\frac{m(X) (1-D)T}{1-m(X)}\right]^{-1} (Y-g(0,1,X))\\& + \frac{m(X) (1-D)(1-T)}{1-m(X)} \mathbb{E}_n\left[\frac{m(X) (1-D)(1-T)}{1-m(X)}\right]^{-1} (Y-g(0,0,X))\\=\; &\psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) =\; &- \frac{D}{\mathbb{E}_n[D]},\\\psi_b(W; \eta) =\; &\frac{D}{\mathbb{E}_n[D]}\Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)\\& + \frac{DT}{\mathbb{E}_n[DT]} (Y - g(1,1,X))\\& - \frac{D(1-T)}{\mathbb{E}_n[D(1-T)]}(Y - g(1,0,X))\\& - \frac{m(X) (1-D)T}{1-m(X)} \mathbb{E}_n\left[\frac{m(X) (1-D)T}{1-m(X)}\right]^{-1} (Y-g(0,1,X))\\& + \frac{m(X) (1-D)(1-T)}{1-m(X)} \mathbb{E}_n\left[\frac{m(X) (1-D)(1-T)}{1-m(X)}\right]^{-1} (Y-g(0,0,X))\end{aligned}\end{align}

and the nuisance elements $$\eta=(g)$$ are defined as

$g_{0}(d, t, X) = \mathbb{E}[Y|D=d, T=t, X].$

If in_sample_normalization='False', the score is set to

\begin{align}\begin{aligned}\psi(W,\theta,\eta) :=\; & - \frac{D}{p}\theta + \frac{D}{p}\Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)\\& + \frac{DT}{p\lambda} (Y - g(1,1,X))\\& - \frac{D(1-T)}{p(1-\lambda)}(Y - g(1,0,X))\\& - \frac{m(X) (1-D)T}{p(1-m(X))\lambda} (Y-g(0,1,X))\\& + \frac{m(X) (1-D)(1-T)}{p(1-m(X))(1-\lambda)} (Y-g(0,0,X))\\=\; &\psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(g, p, \lambda)$$, where $$p_0 = \mathbb{E}[D]$$ and $$\lambda_0 = \mathbb{E}[T]$$ are estimated on the cross-fitting folds. Remark that this will result in the same score, but just uses slightly different normalization.

score='experimental' assumes that the treatment probability is independent of the covariates $$X$$ and implements the score function:

\begin{align}\begin{aligned}\psi(W,\theta,\eta) :=\; & - \theta + \Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)\\& + \frac{DT}{\mathbb{E}_n[DT]} (Y - g(1,1,X))\\& - \frac{D(1-T)}{\mathbb{E}_n[D(1-T)]}(Y - g(1,0,X))\\& - \frac{(1-D)T}{\mathbb{E}_n[(1-D)T]} (Y-g(0,1,X))\\& + \frac{(1-D)(1-T)}{\mathbb{E}_n[(1-D)(1-T)]} (Y-g(0,0,X))\\=\; &\psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

where the components of the linear score are

\begin{align}\begin{aligned}\psi_a(W; \eta) \;= &- 1,\\\psi_b(W; \eta) \;= &\Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)\\& + \frac{DT}{\mathbb{E}_n[DT]} (Y - g(1,1,X))\\& - \frac{D(1-T)}{\mathbb{E}_n[D(1-T)]}(Y - g(1,0,X))\\& - \frac{(1-D)T}{\mathbb{E}_n[(1-D)T]} (Y-g(0,1,X))\\& + \frac{(1-D)(1-T)}{\mathbb{E}_n[(1-D)(1-T)]} (Y-g(0,0,X))\end{aligned}\end{align}

and the nuisance elements $$\eta=(g, m)$$ are defined as

\begin{align}\begin{aligned}g_{0}(d, t, X) &= \mathbb{E}[Y|D=d, T=t, X]\\m_0(X) &= P(D=1|X).\end{aligned}\end{align}

Analogously, if in_sample_normalization='False', the score is set to

\begin{align}\begin{aligned}\psi(W,\theta,\eta) :=\; & - \theta + \Big(g(1,1,X) - g(1,0,X) - (g(0,1,X) - g(0,0,X))\Big)\\& + \frac{DT}{p\lambda} (Y - g(1,1,X))\\& - \frac{D(1-T)}{p(1-\lambda)}(Y - g(1,0,X))\\& - \frac{(1-D)T}{(1-p)\lambda} (Y-g(0,1,X))\\& + \frac{(1-D)(1-T)}{(1-p)(1-\lambda)} (Y-g(0,0,X))\\=\; &\psi_a(W; \eta) \theta + \psi_b(W; \eta)\end{aligned}\end{align}

with $$\eta=(g, m, p, \lambda)$$, where $$p_0 = \mathbb{E}[D]$$ and $$\lambda_0 = \mathbb{E}[T]$$ are estimated on the cross-fitting folds. Remark that this will result in the same score, but just uses slightly different normalization.

### 5.2.4. Sample Selection Models#

#### 5.2.4.1. Missingness at Random#

For DoubleMLSSM the score='missing-at-random' implements the score function:

$\psi(W; \theta, \eta) := \tilde{\psi}_1(W; \eta) - \tilde{\psi}_0(W; \eta) - \theta$

where

\begin{align}\begin{aligned}\tilde{\psi}_1(W; \eta) &= \frac{D \cdot S \cdot [Y - g(1,1,X)]}{m(X) \cdot \pi(1, X)} + g(1,1,X)\\\tilde{\psi}_0(W; \eta) &= \frac{(1-D) \cdot S \cdot [Y - g(0,1,X)]}{(1-m(X)) \cdot \pi(0, X)} + g(0,1,X)\end{aligned}\end{align}

for $$d\in\{0,1\}$$ and $$\eta=(g, m, \pi)$$ with true values

\begin{align}\begin{aligned}g_0(d,s,X) &= \mathbb{E}[Y|D=d, S=s, X]\\m_0(X) &= P(D=1|X)\\\pi_0(d, X) &= P(S=1|D=d, X).\end{aligned}\end{align}

For further details, see Bia, Huber and Lafférs (2023).

#### 5.2.4.2. Nonignorable Nonresponse#

For DoubleMLSSM the score='nonignorable' implements the score function:

$\psi(W; \theta, \eta) := \tilde{\psi}_1(W; \eta) - \tilde{\psi}_0(W; \eta) - \theta$

where

\begin{align}\begin{aligned}\tilde{\psi}_1(W; \eta) &= \frac{D \cdot S \cdot [Y - g(1,1,X,\Pi)]}{m(X, \Pi) \cdot \pi(1,X,Z)} + g(1,1,X,\Pi)\\\tilde{\psi}_0(W; \eta) &= \frac{(1-D) \cdot S \cdot [Y - g(0,1,X,\Pi)]}{(1-m(X,\Pi)) \cdot \pi(0,X,Z)} + g(0,1,X,\Pi)\end{aligned}\end{align}

for $$d\in\{0,1\}$$ and $$\eta=(g, m, \pi, \Pi)$$ with true values

\begin{align}\begin{aligned}\pi_0(d, X, Z) &= P(S=1|D=d, X, Z)\\\Pi_0 &:= \pi_0(D, Z, X) = P(S=1|D,X,Z)\\g_0(d,s,X) &= \mathbb{E}[Y|D=d, S=s, X, \Pi_0]\\m_0(X, \Pi_0) &= P(D=1|X, \Pi_0).\end{aligned}\end{align}

The estimate of $$\Pi_0$$ is constructed via a preliminary estimate of $$\pi_0(D,X,Z)$$ via nested cross-fitting.

For further details, see Bia, Huber and Lafférs (2023).

## 5.3. Specifying alternative score functions via callables#

Via callables user-written score functions can be used. This functionality is at the moment only implemented for specific model classes in Python. For the PLR model implemented in DoubleMLPLR an alternative score function can be set via score. Choose a callable object / function with signature score(y, d, g_hat, m_hat, smpls) which returns the two score components $$\psi_a()$$ and $$\psi_b()$$.

For example, the non-orthogonal score function

$\psi(W; \theta, \eta) = [Y - D \theta - g(X)] D$

can be obtained with

In [16]: import numpy as np

In [17]: def non_orth_score(y, d, l_hat, m_hat, g_hat, smpls):
....:     u_hat = y - g_hat
....:     psi_a = -np.multiply(d, d)
....:     psi_b = np.multiply(d, u_hat)
....:     return psi_a, psi_b
....:

non_orth_score = function(y, d, l_hat, m_hat, g_hat, smpls) {
u_hat = y - g_hat
psi_a = -1*d*d
psi_b = d*u_hat
psis = list(psi_a = psi_a, psi_b = psi_b)
return(psis)
}


Use DoubleMLPLR with inf_model=non_orth_score in order to obtain the estimator

$\tilde{\theta}_0 = - \frac{\mathbb{E}_N[D (Y-g(X))]}{\mathbb{E}_N[D^2]}$

when applying fit(). Note that this estimate will in general be prone to a regularization bias, see also Overcoming regularization bias by orthogonalization.