Python: First Stage and Causal Estimation#

This notebook illustrates the results from a simulation study. It shows insights on the relationship between the first stage ML predictive quality and the performance of the corresponding causal estimator. The data generating process (DGP) is based on Belloni et al. (2013). This DGP implements a high-dimensional sparse and linear model. We consider the case of \(n=100\) observations and \(p=200\) covariates. The covariates are correlated via a Toeplitz covariate structures. More details and the code are available from the GitHub repository DoubleML/DML-Hyperparameter-Tuning-Replication.

We employ a lasso learner for the first stage predictions, i.e., to predict the outcome variable \(Y\) as based on \(X\) in a partially linear model as well as predicting the (continuous) treatment variable \(D\) based on \(X\). As we employ a linear learner, this is equivalent to a linear model.

We are interested to what extent, the choice of the lasso penalty affects the first-stage predictions, e.g, as measured by the root mean squared error for the corresponding nuisance parameter, and the combined loss. Moreover, we would like to investigate how the predictive quality for the first stage translates into estimation quality of the causal parameter \(\theta_0\).

import doubleml as dml
from doubleml import DoubleMLData
import numpy as np
import pandas as pd
from itertools import product

from importlib.machinery import SourceFileLoader
from sklearn.linear_model import Lasso
from sklearn.linear_model import LassoCV

import as px
import plotly.graph_objects as go

# Load result files
path_to_res = ""

df_results = pd.read_csv(path_to_res, index_col=0)

Causal estimation vs. lasso penalty \(\lambda\)#

Estimation quality vs. \(\lambda\)#

We plot the mean squared error for the causal estimator \(MSE(\theta) = \frac{1}{R} (\hat{\theta}_0 - \theta_0)^2\) over a grid of \(\lambda=(\lambda_{\ell_0}, \lambda_{m_0})\) values for the nuisance component \(\ell_0(X) = E[Y|X]\) (predict \(Y\) based on \(X\)) and \(m_0(X) = E[D|X]\) (predict \(D\) based on \(X\)). \(R\) is the number of repetitions.

df_agg = df_results.groupby(['alpha_ml_l', 'alpha_ml_m']).mean(numeric_only = True).reset_index()
# Take logarithm of alphas
df_agg['ln_alpha_ml_l'] = np.log(df_agg['alpha_ml_l'])
df_agg['ln_alpha_ml_m'] = np.log(df_agg['alpha_ml_m'])

measure_col = 'sq_error'

# Choose columns for x-axis and y-axis (either alpha values or nuisance error)
(x_col, y_col) = 'ln_alpha_ml_m', 'ln_alpha_ml_l'

camera = dict(
    up=dict(x=0, y=0, z=1),
    center=dict(x=0, y=0, z=-0.28),
    eye=dict(x=-1.25, y=-1.25, z=1.25)

layout = go.Layout(autosize = False, width = 700, height = 700)

this_df = df_agg
val_list = this_df.loc[:, [x_col, y_col, measure_col]].pivot(index=x_col, columns=y_col, values=measure_col).values

fig = go.Figure(data=[go.Surface(contours = {"z": {"show": True, "start": 0.95, "end": 1, "size": 0.05}},
                                z=val_list, showscale = False)])
    scene = dict(xaxis_title=f"-{x_col}",
                zaxis_title=measure_col), scene_camera=camera, title=f'Mean squared error and choice of lasso penalty')