Python: IRM and APO Model Comparison#

In this simple example, we illustrate how the (binary) DoubleMLIRM class relates to the DoubleMLAPOS class.

More specifically, we focus on the causal_contrast() method of DoubleMLAPOS in a binary setting to highlight, when both methods coincide.

[1]:
import numpy as np
import pandas as pd
import doubleml as dml

from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

from matplotlib import pyplot as plt

from doubleml.datasets import make_irm_data

Data#

We rely on the make_irm_data go generate data with a binary treatment.

[2]:
n_obs = 2000

np.random.seed(42)
df = make_irm_data(
    n_obs=n_obs,
    dim_x=10,
    theta=5.0,
    return_type='DataFrame'
)

df.head()
[2]:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 y d
0 0.541060 0.195963 0.835750 1.180271 0.655959 1.044239 -1.214769 -2.160836 -2.196655 -2.037529 4.704558 1.0
1 1.340235 2.319420 2.193375 1.139508 1.458814 0.493195 1.313870 0.127337 0.418400 1.070433 6.113952 1.0
2 -0.563563 -1.480199 0.943548 -0.400113 0.757559 0.131483 -0.574160 0.067212 -0.427654 -1.342117 -0.226479 0.0
3 -0.044176 -2.122421 -1.526582 -1.892828 -1.777867 -1.425325 0.020272 0.487524 -0.579197 -0.752909 0.367366 0.0
4 -1.896263 -1.198493 -1.285483 -1.361623 -2.921778 -2.026966 -1.107156 -0.498122 0.568287 1.004542 0.913585 0.0

First, define the DoubleMLData object.

[3]:
dml_data = dml.DoubleMLData(
    df,
    y_col='y',
    d_cols='d'
)

Learners and Hyperparameters#

To simplify the comparison and keep the variation in learners as small as possible, we will use linear models.

[4]:
n_folds = 5
n_rep = 1

dml_kwargs = {
    "obj_dml_data": dml_data,
    "ml_g": LinearRegression(),
    "ml_m": LogisticRegression(random_state=42),
    "n_folds": n_folds,
    "n_rep": n_rep,
    "normalize_ipw": False,
    "trimming_threshold": 1e-2,
    "draw_sample_splitting": False,
}

Remark: All results rely on the exact same predictions for the machine learning algorithms. If the more than two treatment levels exists the DoubleMLAPOS model fit multiple binary models such that the combined model might differ.

Further, to remove all uncertainty from sample splitting, we will rely on externally provided sample splits.

[5]:
from doubleml.utils import DoubleMLResampling

rskf = DoubleMLResampling(
    n_folds=n_folds,
    n_rep=n_rep,
    n_obs=n_obs,
    stratify=df['d'],
)
all_smpls = rskf.split_samples()

Average Treatment Effect#

Comparing the effect estimates for the DoubleMLIRM and causal_contrasts of the DoubleMLAPOS model, we can numerically equivalent results for the ATE.

[6]:
dml_irm = dml.DoubleMLIRM(**dml_kwargs)
dml_irm.set_sample_splitting(all_smpls)
print("Training IRM Model")
dml_irm.fit()

print(dml_irm.summary)
Training IRM Model
       coef   std err          t  P>|t|     2.5 %    97.5 %
d  5.001907  0.066295  75.449677    0.0  4.871972  5.131842
[7]:
dml_apos = dml.DoubleMLAPOS(treatment_levels=[0,1], **dml_kwargs)
dml_apos.set_sample_splitting(all_smpls)
print("Training APOS Model")
dml_apos.fit()
print(dml_apos.summary)

print("Evaluate Causal Contrast")
causal_contrast = dml_apos.causal_contrast(reference_levels=[0])
print(causal_contrast.summary)
Training APOS Model
       coef   std err           t    P>|t|     2.5 %    97.5 %
0  0.038103  0.045984    0.828618  0.40732 -0.052023  0.128229
1  5.040010  0.047873  105.278454  0.00000  4.946180  5.133839
Evaluate Causal Contrast
            coef   std err          t  P>|t|     2.5 %    97.5 %
1 vs 0  5.001907  0.066295  75.449677    0.0  4.871972  5.131842

For a direct comparison, see

[8]:
print("IRM Model")
print(dml_irm.summary)
print("Causal Contrast")
print(causal_contrast.summary)
IRM Model
       coef   std err          t  P>|t|     2.5 %    97.5 %
d  5.001907  0.066295  75.449677    0.0  4.871972  5.131842
Causal Contrast
            coef   std err          t  P>|t|     2.5 %    97.5 %
1 vs 0  5.001907  0.066295  75.449677    0.0  4.871972  5.131842

Average Treatment Effect on the Treated#

For the average treatment effect on the treated we can adjust the score in DoubleMLIRM model to score="ATTE".

[9]:
dml_irm_atte = dml.DoubleMLIRM(score="ATTE", **dml_kwargs)
dml_irm_atte.set_sample_splitting(all_smpls)
print("Training IRM Model")
dml_irm_atte.fit()

print(dml_irm_atte.summary)
Training IRM Model
       coef  std err          t  P>|t|     2.5 %    97.5 %
d  5.540549  0.08364  66.242427    0.0  5.376617  5.704482

In order to consider weighted effects in the DoubleMLAPOS model, we have to specify the correct weight, see User Guide.

As these weights include the propensity score, we will use the predicted propensity score from the previous DoubleMLIRM model.

[10]:
p_hat = df["d"].mean()
m_hat = dml_irm_atte.predictions["ml_m"][:, :, 0]

weights_dict = {
    "weights": df["d"] / p_hat,
    "weights_bar": m_hat / p_hat,
}

dml_apos_atte = dml.DoubleMLAPOS(treatment_levels=[0,1], weights=weights_dict, **dml_kwargs)
dml_apos_atte.set_sample_splitting(all_smpls)
print("Training APOS Model")
dml_apos_atte.fit()
print(dml_apos_atte.summary)

print("Evaluate Causal Contrast")
causal_contrast_atte = dml_apos_atte.causal_contrast(reference_levels=[0])
print(causal_contrast_atte.summary)
Training APOS Model
       coef   std err           t     P>|t|     2.5 %    97.5 %
0  0.044929  0.073384    0.612246  0.540375 -0.098901  0.188760
1  5.585479  0.039895  140.003045  0.000000  5.507285  5.663672
Evaluate Causal Contrast
            coef  std err          t  P>|t|     2.5 %    97.5 %
1 vs 0  5.540549  0.08364  66.242427    0.0  5.376617  5.704482

The same results can be achieved by specifying the weights for DoubleMLIRM class with score='ATE'.

[11]:
dml_irm_weighted_atte = dml.DoubleMLIRM(score="ATE", weights=weights_dict, **dml_kwargs)
dml_irm_weighted_atte.set_sample_splitting(all_smpls)
print("Training IRM Model")
dml_irm_weighted_atte.fit()

print(dml_irm_weighted_atte.summary)
Training IRM Model
       coef  std err          t  P>|t|     2.5 %    97.5 %
d  5.540549  0.08364  66.242427    0.0  5.376617  5.704482

In summary, see

[12]:
print("IRM Model ATTE Score")
print(dml_irm_atte.summary.round(4))
print("IRM Model (Weighted)")
print(dml_irm_weighted_atte.summary.round(4))
print("Causal Contrast (Weighted)")
print(causal_contrast_atte.summary.round(4))
IRM Model ATTE Score
     coef  std err        t  P>|t|   2.5 %  97.5 %
d  5.5405   0.0836  66.2424    0.0  5.3766  5.7045
IRM Model (Weighted)
     coef  std err        t  P>|t|   2.5 %  97.5 %
d  5.5405   0.0836  66.2424    0.0  5.3766  5.7045
Causal Contrast (Weighted)
          coef  std err        t  P>|t|   2.5 %  97.5 %
1 vs 0  5.5405   0.0836  66.2424    0.0  5.3766  5.7045

Sensitivity Analysis#

The sensitvity analysis gives identical results.

[13]:
dml_irm.sensitivity_analysis()
print(dml_irm.sensitivity_summary)
================== Sensitivity Analysis ==================

------------------ Scenario          ------------------
Significance Level: level=0.95
Sensitivity parameters: cf_y=0.03; cf_d=0.03, rho=1.0

------------------ Bounds with CI    ------------------
   CI lower  theta lower     theta  theta upper  CI upper
d  4.773339     4.890665  5.001907     5.113149  5.217684

------------------ Robustness Values ------------------
   H_0     RV (%)    RVa (%)
d  0.0  72.206256  66.239019
[14]:
causal_contrast.sensitivity_analysis()
print(causal_contrast.sensitivity_summary)
================== Sensitivity Analysis ==================

------------------ Scenario          ------------------
Significance Level: level=0.95
Sensitivity parameters: cf_y=0.03; cf_d=0.03, rho=1.0

------------------ Bounds with CI    ------------------
        CI lower  theta lower     theta  theta upper  CI upper
1 vs 0  4.773339     4.890665  5.001907     5.113149  5.217684

------------------ Robustness Values ------------------
        H_0     RV (%)    RVa (%)
1 vs 0  0.0  72.206256  66.239019

Effect Heterogeneity#

For conditional treatment effects the exact same methods do not exist. Nevertheless, we will compare the capo() variant of the DoubleMLAPO class to the corresponding cate() method of the DoubleMLIRM class.

For a simple case we will just use a polynomial basis of the first feature X1. To plot the data we will evaluate the methods on the corresponding grid of basis values.

[15]:
X = df[["X1"]]
poly = PolynomialFeatures(degree=2, include_bias=True)

basis_matrix = poly.fit_transform(X)
basis_df = pd.DataFrame(basis_matrix, columns=poly.get_feature_names_out(["X1"]))

grid = pd.DataFrame({"X1": np.linspace(np.quantile(df["X1"], 0.1), np.quantile(df["X1"], 0.9), 100)})
grid_basis = pd.DataFrame( poly.transform(grid), columns=poly.get_feature_names_out(["X1"]))

Apply the cate() method to the basis and evaluate on the transformed grid values.

[16]:
cate = dml_irm.cate(basis_df)
print(cate)
np.random.seed(42)
df_cate = cate.confint(grid_basis, level=0.95, joint=True, n_rep_boot=2000)
================== DoubleMLBLP Object ==================

------------------ Fit summary ------------------
          coef   std err          t         P>|t|    [0.025    0.975]
1     4.979966  0.065976  75.481713  0.000000e+00  4.850656  5.109277
X1    0.921198  0.072516  12.703325  5.666959e-37  0.779068  1.063327
X1^2  0.001698  0.042804   0.039661  9.683637e-01 -0.082197  0.085592

The corresponding apo() method can be used for the treatment levels \(0\) and \(1\).

[17]:
capo0 = dml_apos.modellist[0].capo(basis_df)
print(capo0)
np.random.seed(42)
df_capo0 = capo0.confint(grid_basis, level=0.95, joint=True, n_rep_boot=2000)

capo1 = dml_apos.modellist[1].capo(basis_df)
print(capo1)
np.random.seed(42)
df_capo1 = capo1.confint(grid_basis, level=0.95, joint=True, n_rep_boot=2000)
================== DoubleMLBLP Object ==================

------------------ Fit summary ------------------
          coef   std err         t     P>|t|    [0.025    0.975]
1     0.013450  0.046451  0.289555  0.772157 -0.077592  0.104492
X1    0.001471  0.056915  0.025841  0.979384 -0.110081  0.113022
X1^2  0.024266  0.037114  0.653829  0.513222 -0.048476  0.097009
================== DoubleMLBLP Object ==================

------------------ Fit summary ------------------
          coef   std err           t         P>|t|    [0.025    0.975]
1     4.993416  0.047239  105.704896  0.000000e+00  4.900829  5.086004
X1    0.922668  0.045172   20.425636  9.896182e-93  0.834133  1.011204
X1^2  0.025964  0.022258    1.166517  2.434054e-01 -0.017660  0.069589

In this example the average potential outcome of the control group is zero (as can be seen in the outcome definition, see documentation). Let us visualize the effects

[18]:
df_cate['x'] = grid_basis['X1']

plt.rcParams['figure.figsize'] = 10., 7.5
fig, (ax1, ax2) = plt.subplots(1, 2)

# Plot CATE
ax1.plot(df_cate['x'], df_cate['effect'], label='Estimated Effect')
ax1.fill_between(df_cate['x'], df_cate['2.5 %'], df_cate['97.5 %'], alpha=.3, label='Confidence Interval')
ax1.legend()
ax1.set_title('CATE')
ax1.set_xlabel('X1')
ax1.set_ylabel('Effect and 95%-CI')

# Plot Average Potential Outcomes
ax2.plot(df_cate['x'], df_capo0['effect'], label='APO(0)')
ax2.fill_between(df_cate['x'], df_capo0['2.5 %'], df_capo0['97.5 %'], alpha=.3, label='Confidence Interval')
ax2.plot(df_cate['x'], df_capo1['effect'], label='APO(1)')
ax2.fill_between(df_cate['x'], df_capo1['2.5 %'], df_capo1['97.5 %'], alpha=.3, label='Confidence Interval')
ax2.legend()
ax2.set_title('Average Potential Outcomes')
ax2.set_xlabel('X1')
ax2.set_ylabel('Effect and 95%-CI')

# Ensure the same scale on y-axis
ax1.set_ylim(min(ax1.get_ylim()[0], ax2.get_ylim()[0]), max(ax1.get_ylim()[1], ax2.get_ylim()[1]))
ax2.set_ylim(min(ax1.get_ylim()[0], ax2.get_ylim()[0]), max(ax1.get_ylim()[1], ax2.get_ylim()[1]))

plt.show()
../_images/examples_py_double_ml_irm_vs_apo_34_0.png

The causal_contrast() method does not currently not have a cate() method implemented. But the cate can be manually constructed via the the correct score function.

[19]:
orth_signal = -1.0 * (causal_contrast.scaled_psi.reshape(-1) - causal_contrast.thetas)

causal_contrast_cate = dml.utils.DoubleMLBLP(orth_signal, basis_df)
causal_contrast_cate.fit()
print(causal_contrast_cate.summary)
np.random.seed(42)
df_causal_contrast_cate = causal_contrast_cate.confint(grid_basis, level=0.95, joint=True, n_rep_boot=2000)

print("CATE (IRM) as comparison:")
print(cate.summary)
          coef   std err          t         P>|t|    [0.025    0.975]
1     4.979966  0.065976  75.481713  0.000000e+00  4.850656  5.109277
X1    0.921198  0.072516  12.703325  5.666959e-37  0.779068  1.063327
X1^2  0.001698  0.042804   0.039661  9.683637e-01 -0.082197  0.085592
CATE (IRM) as comparison:
          coef   std err          t         P>|t|    [0.025    0.975]
1     4.979966  0.065976  75.481713  0.000000e+00  4.850656  5.109277
X1    0.921198  0.072516  12.703325  5.666959e-37  0.779068  1.063327
X1^2  0.001698  0.042804   0.039661  9.683637e-01 -0.082197  0.085592
[20]:
plt.rcParams['figure.figsize'] = 10., 7.5
fig, (ax1, ax2) = plt.subplots(1, 2)

# Plot CATE
ax1.plot(df_cate['x'], df_cate['effect'], label='Estimated Effect')
ax1.fill_between(df_cate['x'], df_cate['2.5 %'], df_cate['97.5 %'], alpha=.3, label='Confidence Interval')
ax1.legend()
ax1.set_title('CATE (IRM)')
ax1.set_xlabel('X1')
ax1.set_ylabel('Effect and 95%-CI')

# Plot Average Potential Outcomes
ax2.plot(df_cate['x'], df_causal_contrast_cate['effect'], label='Estimated Effect')
ax2.fill_between(df_cate['x'], df_causal_contrast_cate['2.5 %'], df_causal_contrast_cate['97.5 %'], alpha=.3, label='Confidence Interval')
ax2.legend()
ax2.set_title('CATE (Causal Contrast)')
ax2.set_xlabel('X1')
ax2.set_ylabel('Effect and 95%-CI')

# Ensure the same scale on y-axis
ax1.set_ylim(min(ax1.get_ylim()[0], ax2.get_ylim()[0]), max(ax1.get_ylim()[1], ax2.get_ylim()[1]))
ax2.set_ylim(min(ax1.get_ylim()[0], ax2.get_ylim()[0]), max(ax1.get_ylim()[1], ax2.get_ylim()[1]))

plt.show()
../_images/examples_py_double_ml_irm_vs_apo_37_0.png