{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python: Panel Data Introduction\n", "\n", "In this example, we replicate the results from the guide [Getting Started with the did Package](https://bcallaway11.github.io/did/articles/did-basics.html) of the [did-R-package](https://bcallaway11.github.io/did/index.html).\n", "\n", "As in the [did-R-package](https://bcallaway11.github.io/did/index.html), the implementation of [DoubleML](https://docs.doubleml.org/stable/index.html) is based on [Callaway and Sant'Anna (2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).\n", "\n", "The notebook requires the following packages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "from sklearn.linear_model import LinearRegression, LogisticRegression\n", "\n", "from doubleml.data import DoubleMLPanelData\n", "from doubleml.did import DoubleMLDIDMulti" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "\n", "The data we use is simulated and is part of the [CSDID-Python-Package](https://d2cml-ai.github.io/csdid/index.html).\n", "\n", "A description of the data-generating process can be found in the [CSDID-documentation](https://d2cml-ai.github.io/csdid/examples/csdid_basic.html#Examples-with-simulated-data).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dta = pd.read_csv(\"https://raw.githubusercontent.com/d2cml-ai/csdid/main/data/sim_data.csv\")\n", "dta.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To work with the [DoubleML-package](https://docs.doubleml.org/stable/index.html), we initialize a ``DoubleMLPanelData`` object.\n", "\n", "First, we set the group value of the *never-treated* units (coded as `0` in column `G`) to `np.inf`, which requires changing the datatype of `G` to `float`."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# convert G to float so that it can hold np.inf\n", "dta[\"G\"] = dta[\"G\"].astype(float)\n", "# never-treated units (G == 0) get G = np.inf\n", "dta.loc[dta[\"G\"] == 0, \"G\"] = np.inf\n", "dta.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can initialize the ``DoubleMLPanelData`` object, specifying:\n", "\n", " - `y_col` : the outcome\n", " - `d_cols`: the group variable indicating the first treated period for each unit\n", " - `id_col`: the column with the unique identifier of each unit\n", " - `t_col` : the time column\n", " - `x_cols`: the additional pre-treatment controls\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_data = DoubleMLPanelData(\n", "    data=dta,\n", "    y_col=\"Y\",\n", "    d_cols=\"G\",\n", "    id_col=\"id\",\n", "    t_col=\"period\",\n", "    x_cols=[\"X\"]\n", ")\n", "print(dml_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ATT Estimation\n", "\n", "The [DoubleML-package](https://docs.doubleml.org/stable/index.html) implements the estimation of group-time average treatment effects via the `DoubleMLDIDMulti` class (see the [model documentation](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did)).\n", "\n", "The class behaves like other `DoubleML` classes and requires the specification of two learners, `ml_g` for the outcome regression and `ml_m` for the propensity score (for more details on the regression elements, see the [score documentation](https://docs.doubleml.org/stable/guide/scores.html#difference-in-differences-models)). The model is estimated via the `fit()` method."
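] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To build intuition for what a single $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ compares, the next cell gives a minimal sketch of the unconditional $2 \\times 2$ difference-in-differences comparison with never-treated controls: the average change in outcomes from $t_\\text{pre}$ to $t_\\text{eval}$ in group $\\mathrm{g}$ minus the same change among the never-treated units. The helper `simple_att_gt` and the toy data are hypothetical and made up for illustration. The sketch ignores the covariate `X` and is not the score-based estimator with learners `ml_g` and `ml_m` used by `DoubleMLDIDMulti`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hypothetical helper, for intuition only: unconditional 2x2 DiD for one ATT(g, t_pre, t_eval).\n", "# It ignores the covariate X and is NOT the estimator implemented in DoubleMLDIDMulti.\n", "import numpy as np\n", "import pandas as pd\n", "\n", "\n", "def simple_att_gt(df, g, t_eval, y_col=\"Y\", g_col=\"G\", id_col=\"id\", t_col=\"period\"):\n", "    # base period without anticipation: t_pre = min(g, t_eval) - 1\n", "    t_pre = min(g, t_eval) - 1\n", "    # wide format: one row per unit, one column per period\n", "    wide = df.pivot(index=id_col, columns=t_col, values=y_col)\n", "    groups = df.groupby(id_col)[g_col].first()\n", "    change = wide[t_eval] - wide[t_pre]\n", "    # mean change of group g minus mean change of never-treated units (G = inf)\n", "    att = change[groups == g].mean() - change[np.isinf(groups)].mean()\n", "    return t_pre, float(att)\n", "\n", "\n", "# made-up toy panel: units 1-2 are first treated in period 2, units 3-4 are never treated\n", "toy = pd.DataFrame({\n", "    \"id\": [1, 1, 2, 2, 3, 3, 4, 4],\n", "    \"period\": [1, 2] * 4,\n", "    \"G\": [2.0] * 4 + [np.inf] * 4,\n", "    \"Y\": [1.0, 4.0, 2.0, 5.0, 1.0, 2.0, 2.0, 3.0],\n", "})\n", "print(simple_att_gt(toy, g=2, t_eval=2))  # prints (1, 2.0)"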
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_obj = DoubleMLDIDMulti(\n", "    obj_dml_data=dml_data,\n", "    ml_g=LinearRegression(),\n", "    ml_m=LogisticRegression(),\n", "    control_group=\"never_treated\",\n", ")\n", "\n", "dml_obj.fit()\n", "print(dml_obj)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The summary displays estimates of the $ATT(\\mathrm{g},t_\\text{eval})$ effects for different combinations of $(\\mathrm{g},t_\\text{eval})$ via $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$, where\n", " - $\\mathrm{g}$ specifies the group\n", " - $t_\\text{pre}$ specifies the corresponding pre-treatment period\n", " - $t_\\text{eval}$ specifies the evaluation period\n", "\n", "This corresponds to the estimates returned by the `att_gt` function of the [did-R-package](https://bcallaway11.github.io/did/index.html), where the standard choice is $t_\\text{pre} = \\min(\\mathrm{g}, t_\\text{eval}) - 1$ (without anticipation).\n", "\n", "Note that this includes pre-test effects if $\\mathrm{g} > t_\\text{eval}$, e.g. $ATT(4,2)$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual in the DoubleML-package, joint confidence intervals can be obtained by running the `bootstrap()` method before calling `confint(joint=True)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "level = 0.95\n", "\n", "ci = dml_obj.confint(level=level)\n", "dml_obj.bootstrap(n_rep_boot=5000)\n", "ci_joint = dml_obj.confint(level=level, joint=True)\n", "ci_joint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A visualization of the effects can be obtained via the `plot_effects()` method.\n", "\n", "Note that the plot uses joint confidence intervals by default." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nbsphinx-thumbnail" ] }, "outputs": [], "source": [ "fig, ax = dml_obj.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Effect Aggregation\n", "\n", "As in the [did-R-package](https://bcallaway11.github.io/did/index.html), the $ATT$s can be aggregated to summarize multiple effects.\n", "For details on the different aggregations and their interpretation, see [Callaway and Sant'Anna (2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).\n", "\n", "The aggregations are implemented via the `aggregate()` method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Group Aggregation\n", "\n", "To obtain group-specific effects, the $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ values can be aggregated based on the group $\\mathrm{g}$ by setting `aggregation=\"group\"`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregated = dml_obj.aggregate(aggregation=\"group\")\n", "print(aggregated)\n", "_ = aggregated.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output is a `DoubleMLDIDAggregation` object, which also includes an overall aggregation summary weighted by group size." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time Aggregation\n", "\n", "This aggregates the $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ values based on $t_\\text{eval}$, weighted by group size. It corresponds to the *Calendar Time Effects* of the [did-R-package](https://bcallaway11.github.io/did/index.html).\n", "\n", "To obtain calendar time effects, set `aggregation=\"time\"`."
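] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a rough illustration of group-size weighting, the next cell sketches calendar time and event time aggregation for a small table of made-up $ATT(\\mathrm{g},t_\\text{eval})$ values and group sizes. Following Callaway and Sant'Anna (2021), it assumes that each effect is weighted by the share of its group among all groups contributing to the same $t_\\text{eval}$ (or to the same exposure time $e = t_\\text{eval} - \\mathrm{g}$), and that only post-treatment cells with $\\mathrm{g} \\le t_\\text{eval}$ enter the calendar time aggregation. It is a simplified sketch, not the internal implementation of `aggregate()`, and the helper `aggregate_by` is hypothetical." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hypothetical sketch of group-size weighted aggregation (made-up numbers, illustration only)\n", "import pandas as pd\n", "\n", "att = pd.DataFrame({\n", "    \"g\": [2, 2, 2, 3, 3, 3, 4, 4, 4],\n", "    \"t_eval\": [2, 3, 4, 2, 3, 4, 2, 3, 4],\n", "    \"att\": [1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0],\n", "})\n", "group_size = {2: 100, 3: 150, 4: 250}\n", "att[\"n\"] = att[\"g\"].map(group_size)\n", "att[\"e\"] = att[\"t_eval\"] - att[\"g\"]  # exposure time\n", "\n", "\n", "def aggregate_by(df, key):\n", "    # weight of each ATT: its group size relative to all groups sharing the same key value\n", "    weights = df[\"n\"] / df.groupby(key)[\"n\"].transform(\"sum\")\n", "    return (weights * df[\"att\"]).groupby(df[key]).sum()\n", "\n", "\n", "# calendar time: only post-treatment cells (g <= t_eval)\n", "print(aggregate_by(att[att[\"g\"] <= att[\"t_eval\"]], \"t_eval\"))\n", "# event time: all exposures, including pre-treatment exposures e < 0\n", "print(aggregate_by(att, \"e\"))"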
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregated_time = dml_obj.aggregate(\"time\")\n", "print(aggregated_time)\n", "fig, ax = aggregated_time.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Event Study Aggregation\n", "\n", "Finally, `aggregation=\"eventstudy\"` aggregates the $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ values based on the exposure time $e = t_\\text{eval} - \\mathrm{g}$, weighted by group size." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregated_eventstudy = dml_obj.aggregate(\"eventstudy\")\n", "print(aggregated_eventstudy)\n", "fig, ax = aggregated_eventstudy.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Aggregation Details\n", "\n", "The `DoubleMLDIDAggregation` objects include several `DoubleMLFramework` objects, which support methods like `bootstrap()` or `confint()`.\n", "Further, the weights can be accessed via the following properties:\n", "\n", " - ``overall_aggregation_weights``: weights for the overall aggregation\n", " - ``aggregation_weights``: weights for each of the aggregated effects\n", "\n", "To illustrate, consider the event study aggregation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(aggregated_eventstudy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, the overall effect aggregation aggregates all effects with positive exposure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(aggregated_eventstudy.overall_aggregation_weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see how the aggregated effect for $e=0$ is computed, look at the third set of weights (index `2`) in the ``aggregation_weights`` property:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregated_eventstudy.aggregation_weights[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at the original `dml_obj`, one can see that this combines the following estimates:\n", "\n", " - $\\widehat{ATT}(2,1,2)$\n", " - $\\widehat{ATT}(3,2,3)$\n", " - $\\widehat{ATT}(4,3,4)$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(dml_obj.summary)" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 2 }