{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python: PLM and IRM for Multiple Treatments\n", "\n", "In this example, we show the behavior of the PLM and IRM estimators when the treatment effect is heterogeneous across different treatments. We show that since the PLM estimates a convex-weighted average of treatment effects, it is not able to recover the true treatment effects when they are heterogeneous. On the other hand, the IRM estimator is able to recover the true treatment effects when they are heterogeneous. This is shown to matter substantially when ranking treatments by their estimated treatment effects.\n", "\n", "\n", "We assume basic knowledge of the potential outcomes framework and the assumptions of the PLM and IRM estimators. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "from doubleml import DoubleMLData, DoubleMLPLR, DoubleMLIRM\n", "from xgboost import XGBRegressor, XGBClassifier\n", "\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'retina'\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ranking Treatment Effects under Treatment Propensity and Treatment Effect Heterogeneity\n", "\n", "We consider a setting with two treatments $D_1$ and $D_2$, a single binary covariate $X$ (with $p=0.5$), and a continuous outcome $Y$. The propensity scores for the two treatments are given by:\n", "\n", "| X \\ D | $D_1$ = 1 | $D_2$ = 1 |\n", "|-------|--------|--------|\n", "| X = 0 | 0.01 | 0.5 |\n", "| X = 1 | 0.5 | 0.01 |\n", "\n", "And the underlying heterogeneous treatment effects are given by:\n", "\n", "| X \\ $\\tau$ | $\\tau_1$ | $\\tau_2$ |\n", "|-------|----|----|\n", "| X = 0 | -3 | -2 |\n", "| X = 1 | 3 | 3 |\n", "| ATE | 0 | 0.5|\n", "\n", "The researcher seeks to rank the two treatments based on their average treatment effects. We compare the behavior of the PLM and IRM estimators in this setting.\n", "\n", "We implement the DGP below." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "def dgp(\n", " n = 100_000,\n", " treat1_params = {\n", " \"p_x0\": 0.01, \"p_x1\": 0.50, \"τ_x0\": -3, \"τ_x1\": 3,\n", " },\n", " treat2_params = {\n", " \"p_x0\": 0.5, \"p_x1\": 0.01, \"τ_x0\": -2, \"τ_x1\": 3,\n", " },\n", " ):\n", " # utility functions to transform between the probability scale and the logit scale\n", " sigmoid = lambda x: 1 / (1 + np.exp(-x))\n", " inv_sigmoid = lambda p: np.log(p / (1 - p))\n", "\n", " df = pd.DataFrame()\n", " # noise terms\n", " eta1 = np.random.normal(0, 0.1, size=n)\n", " eta2 = np.random.normal(0, 0.1, size=n)\n", " eps = np.random.normal(0, 1, size=n)\n", " # binary covariate\n", " df[\"x\"] = np.random.randint(2, size=n) # half and half\n", " df[[\"x0\", \"x1\"]] = pd.get_dummies(df.x)\n", " # treatment 1 propensity score\n", " pscore1 = df.x0 * (inv_sigmoid(treat1_params[\"p_x0\"]) + eta1) + df.x1 * (\n", " inv_sigmoid(treat1_params[\"p_x1\"]) + eta2\n", " )\n", " # treatment 2 propensity score\n", " pscore2= df.x0 * (inv_sigmoid(treat2_params[\"p_x0\"]) + eta2) + df.x1 * (\n", " inv_sigmoid(treat2_params[\"p_x1\"]) + eta2\n", " )\n", " df['d1'] = np.random.binomial(1, sigmoid(pscore1))\n", " df['d2'] = np.random.binomial(1, sigmoid(pscore2))\n", " # outcome 1\n", " df[\"y\"] = (\n", " 0\n", " + 1 * df.x0 # arbitrary covariate coefs\n", " - 2 * df.x1\n", " + df.x0 * df.d1 * (treat1_params[\"τ_x0\"])\n", " + df.x1 * df.d1 * (treat1_params[\"τ_x1\"])\n", " + df.x0 * df.d2 * (treat2_params[\"τ_x0\"])\n", " + df.x1 * df.d2 * (treat2_params[\"τ_x1\"])\n", " + eps\n", " )\n", " return df.drop([\"x0\", \"x1\"], axis=1)\n", "df = dgp()\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(pd.crosstab(df.x, df.d1, normalize='index'))\n", "print(pd.crosstab(df.x, df.d2, normalize='index'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Partially Linear Model performs overlap weighting\n", "\n", "The partially linear model inherits the properties of linear regression under treatment effect heterogeneity (Angrist (1998), Angrist and Krueger(1999), Aronow and Samii (2016)). The probability limit of the treatment coefficient in the partially linear model $Y_i = \\tau D_i + g(X_i) + \\epsilon_i$ is given by:\n", "\n", "$$\n", "p\\!\\!-\\!\\!\\lim \\hat{\\tau} = \\mathbb{E} [\\gamma(X) \\tau(X)]\n", "$$\n", "\n", "where \n", "\n", "$$\n", "\\gamma(X) = \\frac{\\mathbb{V}(D \\mid X)}{\\mathbb{E} [\\mathbb{V}(D \\mid X)]}\n", "$$\n", "\n", "This simplifies to \n", "\n", "$$\n", "\\gamma(X) = \\frac{p(X)(1-p(X))}{\\mathbb{E}[p(X)(1-p(X))]}\n", "$$\n", "\n", "when $D$ is binary.\n", "\n", "We have deliberately generated a dataset with varying propensity scores across treatments and covariates. Since we have a single binary covariate, we can analytically estimate the plim of the above estimator. 
The implied regression coefficients are:\n", "\n", "$$\n", "\\begin{align*}\n", "\\tilde{\\tau}_1 & = \\frac{-3 \\cdot 0.01 \\cdot 0.99 + 3 \\cdot 0.5 \\cdot 0.5}{0.01 \\cdot 0.99 + 0.5 \\cdot 0.5} = 2.7714 \\\\\n", "\\tilde{\\tau}_2 & = \\frac{-2 \\cdot 0.5 \\cdot 0.5 + 3 \\cdot 0.01 \\cdot 0.99}{0.01 \\cdot 0.99 + 0.5 \\cdot 0.5} = -1.8095\n", "\\end{align*}\n", "$$\n", "\n", "So the PLM gets the ranking wrong: even though the ATE of $D_1$ (0) is lower than that of $D_2$ (0.5), the PLM estimates the effect of $D_1$ to be higher than that of $D_2$. This is because the overlap weights $p(X)(1-p(X))$ concentrate on the stratum with the large positive effect ($X = 1$) for $D_1$, and on the stratum with the negative effect ($X = 0$) for $D_2$.\n", "\n", "We verify this numerically with `DoubleML`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = dgp()\n", "dml_data = DoubleMLData(df, 'y', ['d1', 'd2'], 'x')\n", "plr_obj = DoubleMLPLR(dml_data,\n", "                      ml_l=XGBRegressor(),\n", "                      ml_m=XGBClassifier(),\n", "                      n_folds=5)\n", "plr_obj.fit()\n", "plr_est = plr_obj.summary\n", "print(plr_est)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Augmented Inverse Propensity Weighting estimator recovers the ATE under arbitrary effect and propensity score heterogeneity\n", "\n", "The AIPW estimator (implemented as the interactive regression model, IRM, in `DoubleML`) is given by:\n", "\n", "$$\n", "\\tau_{\\text{AIPW}} = \\frac{1}{n} \\sum_i \\left( \\left[ \\hat{g}(1, X_i) - \\hat{g}(0, X_i) \\right] + \\frac{D_i(Y_i - \\hat{g}(1, X_i))}{\\hat{m}(X_i)} - \\frac{(1-D_i)(Y_i - \\hat{g}(0, X_i))}{1 - \\hat{m}(X_i)} \\right)\n", "$$\n", "\n", "It estimates the ATE consistently under arbitrary treatment effect and propensity score heterogeneity, and therefore ranks the two treatments correctly (i.e. $D_2$ has a higher ATE than $D_1$).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = dgp()\n", "aipw_obj_1 = DoubleMLIRM(DoubleMLData(df, 'y', ['d1'], 'x'),\n", "                         ml_g=XGBRegressor(),\n", "                         ml_m=XGBClassifier(),\n", "                         n_folds=5)\n", "aipw_obj_1.fit()\n", "aipw_obj_2 = DoubleMLIRM(DoubleMLData(df, 'y', ['d2'], 'x'),\n", "                         ml_g=XGBRegressor(),\n", "                         ml_m=XGBClassifier(),\n", "                         n_folds=5)\n", "aipw_obj_2.fit()\n", "aipw_est_1, aipw_est_2 = aipw_obj_1.summary, aipw_obj_2.summary\n", "print(aipw_est_1)\n", "print(aipw_est_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary Figure\n", "\n", "We summarize the true treatment effects, the PLM estimates, and the IRM estimates in a figure.\n"
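, "\n", "Before plotting, the cell below reproduces the analytic overlap-weighted probability limits from the PLM section in plain `numpy` (a minimal sketch; the weight arrays `w1`, `w2` and the CATE arrays `tau1`, `tau2` are our own shorthand for the DGP parameters, not `DoubleML` output):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# analytic plims of the PLM coefficients: CATEs averaged with (unnormalized)\n", "# overlap weights p(X)(1 - p(X)); the P(X=0) = P(X=1) = 0.5 factors cancel\n", "w1 = np.array([0.01 * 0.99, 0.5 * 0.5])  # D1: p(X)(1-p(X)) at X=0, X=1\n", "w2 = np.array([0.5 * 0.5, 0.01 * 0.99])  # D2: p(X)(1-p(X)) at X=0, X=1\n", "tau1 = np.array([-3, 3])  # CATEs of D1 at X=0, X=1\n", "tau2 = np.array([-2, 3])  # CATEs of D2 at X=0, X=1\n", "print(\"plim of PLM coefficient on d1:\", w1 @ tau1 / w1.sum())  # ~ 2.7714\n", "print(\"plim of PLM coefficient on d2:\", w2 @ tau2 / w2.sum())  # ~ -1.8095"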
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "colors = ['#da1440', '#e45228', '#ec973f', '#a4a147', '#008e80', '#008dbd',\n", " '#f6ebce', '#838235', '#3f5d93', '#50768b', '#133f5a', '#afd9e4']\n", "\n", "# Plot the data\n", "x = np.array([1, 2])\n", "y = [(-3+3)/2, (-2+3)/2]\n", "fig, ax = plt.subplots(dpi = 100, figsize = (6, 4))\n", "# plm\n", "plt.errorbar(x,\n", " plr_est.coef,\n", " yerr=1.96 * plr_est['std err'], markersize = 1,\n", " fmt='o', color=colors[0], ecolor=colors[0], capsize=5, label = 'PLM Estimate')\n", "plt.errorbar(x+.05,\n", " np.array([aipw_est_1.coef, aipw_est_2.coef]).flatten(),\n", " yerr = 1.96 * np.array([aipw_est_1.iloc[:,1].values[0],\n", " aipw_est_2.iloc[:,1].values[0]]),\n", " markersize = 3,\n", " fmt='o', color=colors[8], ecolor=colors[8], capsize=5, label = 'AIPW Estimate'\n", " )\n", "# true values\n", "plt.scatter(x-.05, y, color = colors[2], label = 'ATE')\n", "plt.scatter(x-.1, [-3, -2], color = colors[5], label=r\"CATE: $x=0$\")\n", "plt.scatter(x-.12, [3, 3], color = colors[4], label=r\"CATE: $x=1$\")\n", "plt.legend(loc = 0)\n", "plt.title(\"PLMs do not estimate ATEs \\n This is bad for ranking treatments\")\n", "plt.axis((.5, 3, -3.5, 4))\n", "ax.set_xticks([])\n", "string_labels = ['Treatment 1', 'Treatment 2']\n", "ax.set_xticks(x)\n", "ax.set_xticklabels(string_labels)\n", "ax.grid(True)\n", "ax.set_ylabel(\"Treatment Effect\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional Results: CATE estimates\n", "\n", "As an additional comparison, we can add the CATE estimates based on the previous PLR and IRM examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# pandas data frame with one-hot encoded values for x\n", "groups = pd.get_dummies(df.x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Repeat estimation for each treatment var separately\n", "# d1\n", "dml_data = DoubleMLData(df, 'y', 'd1', ['d2', 'x'])\n", "plr_obj_1 = DoubleMLPLR(dml_data,\n", " ml_l = XGBRegressor(),\n", " ml_m = XGBClassifier(),\n", " n_folds=5)\n", "plr_obj_1.fit()\n", "plr_est1 = plr_obj_1.summary\n", "plr_obj_1.gate(groups=groups).summary.round(3)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# d2\n", "dml_data = DoubleMLData(df, 'y', 'd2', ['d1', 'x'])\n", "plr_obj_2 = DoubleMLPLR(dml_data,\n", " ml_l = XGBRegressor(),\n", " ml_m = XGBClassifier(),\n", " n_folds=5)\n", "plr_obj_2.fit()\n", "plr_est2 = plr_obj_2.summary\n", "plr_obj_2.gate(groups=groups).summary.round(3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# IRM models\n", "# d1\n", "aipw_obj_1.gate(groups=groups).summary.round(3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# IRM models\n", "# d2\n", "aipw_obj_2.gate(groups=groups).summary.round(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "- Lal, A., Chou, W., & Schaefer, J. (2024). Using Double Machine Learning to Rank Treatments, Working Paper. Poster available at [https://apoorvalal.github.io/files/slides/acic_2024_poster.pdf](https://apoorvalal.github.io/files/slides/acic_2024_poster.pdf). Thread available at [https://x.com/Apoorva__Lal/status/1798913180930109556](https://x.com/Apoorva__Lal/status/1798913180930109556).\n", "- Angrist, J. D. (1998). 
Estimating the labor market impact of voluntary military service using social security data on military applicants. Econometrica, 66(2), 249-288.\n", "- Angrist, J. D., & Krueger, A. B. (1999). Empirical strategies in labor economics. Handbook of labor economics, 3, 1277-1366.\n", "- Aronow, P. M., & Samii, C. (2016). Does regression produce representative estimates of causal effects?. American Journal of Political Science, 60(1), 250-267.\n" ] } ], "metadata": { "kernelspec": { "display_name": "metrics", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 2 }