{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python: GATE Sensitivity Analysis\n", "\n", "In this simple example, we illustrate how the [DoubleML](https://docs.doubleml.org/stable/index.html) package can be used to perform a sensitivity analysis for group average treatment effects in the [DoubleMLIRM](https://docs.doubleml.org/stable/guide/models.html#interactive-regression-model-irm) model.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "\n", "For illustration purposes, we will work with generated data where the true individual treatment effects are known, so that we can assess the performance of the effect estimates." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import doubleml as dml\n", "\n", "from doubleml.datasets import make_heterogeneous_data\n", "from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will add an independent feature `V`, which is treated as a confounder in the analysis. This is a simple way to test the robustness of the effect estimates against this \"confounder\" and to evaluate the sensitivity analysis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_obs = 1000\n", "p = 5\n", "\n", "np.random.seed(42)\n", "data_dict = make_heterogeneous_data(n_obs, p, binary_treatment=True, n_x=2)\n", "data = data_dict['data']\n", "\n", "# add random covariate\n", "data['V'] = np.random.normal(size=(n_obs, 1))\n", "ite = data_dict['effects']\n", "\n", "print(\"Average Treatment Effect: {:.2f}\".format(np.mean(ite)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As mentioned before, the independent feature `'V'` will be added to the ``DoubleMLData`` object as a possible confounder." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_data = dml.DoubleMLData(data, 'y', 'd')\n", "print(dml_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ATE Estimation and Sensitivity\n", "\n", "We first estimate the ATE and perform a sensitivity analysis. Throughout this example, we will rely on random forests for nuisance estimation. Further, we use $5$ repetitions to increase the stability of the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_rep = 5\n", "\n", "ml_g = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=42)\n", "ml_m = RandomForestClassifier(n_estimators=100, min_samples_leaf=20, random_state=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the weights to be explicitly `None` and fit the model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_irm_ate = dml.DoubleMLIRM(\n", "    dml_data,\n", "    ml_g,\n", "    ml_m,\n", "    n_rep=n_rep,\n", "    weights=None)\n", "dml_irm_ate.fit()\n", "print(dml_irm_ate.summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results appear very robust (partly because the effect size is very large)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_irm_ate.sensitivity_analysis()\n", "print(dml_irm_ate.sensitivity_summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Benchmarking the covariate `'V'` reveals very low confounding (which is expected, since the covariate `'V'` is independent of the treatment and outcome)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "benchmarking_variable = 'V'\n", "print(dml_irm_ate.sensitivity_benchmark(benchmarking_set=[benchmarking_variable]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GATE Estimation and Sensitivity\n", "\n", "Next, we will estimate a GATE and perform a sensitivity analysis. Here, we base the group $G$ on the feature `'X_0'`\n", "\n", "$$1\\{X\\in G\\} = 1\\{X_0 \\ge 0.5\\},$$\n", "\n", "as the treatment effect is heterogeneous in this feature." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "group = data['X_0'] >= 0.5\n", "\n", "true_group_effect = ite[group].mean()\n", "print(\"True group effect: {:.2f}\".format(true_group_effect))\n", "print(\"Group vector:\\n{}\".format(group[:4]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To identify the GATE, the weights still have to be normalized by the group probabilities, such that\n", "\n", "$$\\omega(X) = \\frac{1\\{X_0 \\ge 0.5\\}}{P(X_0 \\ge 0.5)}.$$\n", "\n", "Since the weights only depend on the features $X$, we can supply them as a numpy vector." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "weights = group.to_numpy() / group.mean()\n", "print(\"Weights:\\n{}\".format(weights[:4]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_irm_gate = dml.DoubleMLIRM(\n", "    dml_data,\n", "    ml_g,\n", "    ml_m,\n", "    n_rep=n_rep,\n", "    weights=weights)\n", "dml_irm_gate.fit()\n", "print(dml_irm_gate.summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we can repeat the sensitivity analysis for the GATE. The results are very similar to the ATE case (with slightly smaller robustness values)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_irm_gate.sensitivity_analysis()\n", "print(dml_irm_gate.sensitivity_summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Still, the benchmarking shows little evidence for confounding via the covariate `'V'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "benchmarking_variable = 'V'\n", "print(dml_irm_gate.sensitivity_benchmark(benchmarking_set=[benchmarking_variable]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GATET Estimation and Sensitivity\n", "\n", "Finally, we will estimate a group average treatment effect on the treated (GATET) and perform a sensitivity analysis. Here, we will use the same group as in the previous section.\n", "\n", "Instead of considering the effect for all units with\n", "$X\\in G$, the GATET only refers to the effect for the treated units, i.e. those with $X\\in G$ and $D=1$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "group = data['X_0'] >= 0.5\n", "group_treated = group & (data['d'] == 1)\n", "\n", "true_gatet_effect = ite[group_treated].mean()\n", "print(\"True GATET effect: {:.2f}\".format(true_gatet_effect))\n", "print(\"Group vector for treated units:\\n{}\".format(group_treated[:4]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As for GATEs, the weights that identify the GATET still have to be normalized by the group probabilities, such that\n", "\n", "$$\\omega(D,X) = \\frac{D\\cdot1\\{X_0 \\ge 0.5\\}}{P(D=1, X_0 \\ge 0.5)}.$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As mentioned in the [User Guide](https://docs.doubleml.org/stable/guide/guide.html), the estimation and sensitivity analysis also rely on the conditional expectation of the weights\n", "$$\\bar{\\omega}(D,X) = \\mathbb{E}[\\omega(D,X)|X].$$\n", "\n", "Since for GATETs, the weights depend on 
the treatment $D$, the conditional expectation $\\bar{\\omega}(D,X)$ differs from $\\omega(D, X)$.\n", "\n", "Due to the form of the weights, the conditional expectation $\\bar{\\omega}(D,X)$ reduces to\n", "\n", "$$\\bar{\\omega}(D,X) = \\frac{\\mathbb{E}[D|X]\\cdot 1\\{X_0 \\ge 0.5\\}}{P(D=1, X_0 \\ge 0.5)} = \\frac{m_0(X)\\cdot 1\\{X_0 \\ge 0.5\\}}{P(D=1, X_0 \\ge 0.5)},$$\n", "which can be estimated by plugging in the estimated propensity score $\\hat{m}(X)$.\n", "\n", "All the previous steps of estimation are performed automatically by the `DoubleMLIRM` class if the score is set to `'ATTE'`.\n", "\n", "**Remark**: This requires the `weights` argument to be binary and to refer to the group indicator $1\\{X\\in G\\} = 1\\{X_0 \\ge 0.5\\}$, not the actual group of treated individuals $1\\{D = 1, X\\in G\\} = D\\cdot 1\\{X\\in G\\}$.\n", "Further, the normalization by the group probabilities is then performed automatically by the `DoubleMLIRM` class.\n", "\n", "Consequently, we can just set `weights` to the group indicator $1\\{X\\in G\\}$ and fit the model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "group_indicator = group.to_numpy()\n", "print(\"Group indicator:\\n{}\".format(group_indicator[:4]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_irm_gatet = dml.DoubleMLIRM(\n", "    dml_data,\n", "    ml_g,\n", "    ml_m,\n", "    n_rep=n_rep,\n", "    score='ATTE',\n", "    weights=group_indicator)\n", "dml_irm_gatet.fit()\n", "print(dml_irm_gatet.summary)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_irm_gatet.sensitivity_analysis()\n", "print(dml_irm_gatet.sensitivity_summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in a general sensitivity analysis, contour plots can be used to visualize the results." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_irm_gatet.sensitivity_plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, benchmarking the \"confounding\" feature `'V'` shows little evidence for confounding." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "benchmarking_variable = 'V'\n", "print(dml_irm_gatet.sensitivity_benchmark(benchmarking_set=[benchmarking_variable]))" ] } ], "metadata": { "kernelspec": { "display_name": "dml_dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.2" } }, "nbformat": 4, "nbformat_minor": 2 }