{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python: Panel Data with Multiple Time Periods\n", "\n", "In this example, a detailed guide on Difference-in-Differences with multiple time periods using the [DoubleML-package](https://docs.doubleml.org/stable/index.html). The implementation is based on [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).\n", "\n", "The notebook requires the following packages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "\n", "from lightgbm import LGBMRegressor, LGBMClassifier\n", "from sklearn.linear_model import LinearRegression, LogisticRegression\n", "\n", "from doubleml.did import DoubleMLDIDMulti\n", "from doubleml.data import DoubleMLPanelData\n", "\n", "from doubleml.did.datasets import make_did_CS2021" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "\n", "We will rely on the `make_did_CS2021` DGP, which is inspired by [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001) (Appendix SC) and [Sant'Anna and Zhao (2020)](https://doi.org/10.1016/j.jeconom.2020.06.003).\n", "\n", "We will observe `n_obs` units over `n_periods`. Remark that the dataframe includes observations of the potential outcomes `y0` and `y1`, such that we can use oracle estimates as comparisons. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_obs = 5000\n", "n_periods = 6\n", "\n", "df = make_did_CS2021(n_obs, dgp_type=4, n_periods=n_periods, n_pre_treat_periods=3, time_type=\"datetime\")\n", "df[\"ite\"] = df[\"y1\"] - df[\"y0\"]\n", "\n", "print(df.shape)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Details\n", "\n", "*Here, we slightly abuse the definition of the potential outcomes. $Y_{i,t}(1)$ corresponds to the (potential) outcome if unit $i$ would have received treatment at time period $\\mathrm{g}$ (where the group $\\mathrm{g}$ is drawn with probabilities based on $Z$).*\n", "\n", "More specifically\n", "\n", "$$\n", "\\begin{align*}\n", "Y_{i,t}(0)&:= f_t(Z) + \\delta_t + \\eta_i + \\varepsilon_{i,t,0}\\\\\n", "Y_{i,t}(1)&:= Y_{i,t}(0) + \\theta_{i,t,\\mathrm{g}} + \\epsilon_{i,t,1} - \\epsilon_{i,t,0}\n", "\\end{align*}\n", "$$\n", "\n", "where\n", " - $f_t(Z)$ depends on pre-treatment observable covariates $Z_1,\\dots, Z_4$ and time $t$\n", " - $\\delta_t$ is a time fixed effect\n", " - $\\eta_i$ is a unit fixed effect\n", " - $\\epsilon_{i,t,\\cdot}$ are time varying unobservables (iid. $N(0,1)$)\n", " - $\\theta_{i,t,\\mathrm{g}}$ correponds to the exposure effect of unit $i$ based on group $\\mathrm{g}$ at time $t$\n", "\n", "For the pre-treatment periods the exposure effect is set to\n", "$$\n", "\\theta_{i,t,\\mathrm{g}}:= 0 \\text{ for } t<\\mathrm{g}\n", "$$\n", "such that \n", "\n", "$$\n", "\\mathbb{E}[Y_{i,t}(1) - Y_{i,t}(0)] = \\mathbb{E}[\\epsilon_{i,t,1} - \\epsilon_{i,t,0}]=0 \\text{ for } t<\\mathrm{g}\n", "$$\n", "\n", "The [DoubleML Coverage Repository](https://docs.doubleml.org/doubleml-coverage/) includes coverage simulations based on this DGP." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Description" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data is a balanced panel where each unit is observed over `n_periods` starting Janary 2025." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.groupby(\"t\").size()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The treatment column `d` indicates first treatment period of the corresponding unit, whereas `NaT` units are never treated.\n", "\n", "*Generally, never treated units should take either on the value `np.inf` or `pd.NaT` depending on the data type (`float` or `datetime`).*\n", "\n", "The individual units are roughly uniformly divided between the groups, where treatment assignment depends on the pre-treatment covariates `Z1` to `Z4`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.groupby(\"d\", dropna=False).size()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, the group indicates the first treated period and `NaT` units are never treated. To simplify plotting and pands" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.groupby(\"d\", dropna=False).size()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get a better understanding of the underlying data and true effects, we will compare the unconditional averages and the true effects based on the oracle values of individual effects `ite`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# rename for plotting\n", "df[\"First Treated\"] = df[\"d\"].dt.strftime(\"%Y-%m\").fillna(\"Never Treated\")\n", "\n", "# Create aggregation dictionary for means\n", "def agg_dict(col_name):\n", " return {\n", " f'{col_name}_mean': (col_name, 'mean'),\n", " f'{col_name}_lower_quantile': (col_name, lambda x: x.quantile(0.05)),\n", " f'{col_name}_upper_quantile': (col_name, lambda x: x.quantile(0.95))\n", " }\n", "\n", "# Calculate means and confidence intervals\n", "agg_dictionary = agg_dict(\"y\") | agg_dict(\"ite\")\n", "\n", "agg_df = df.groupby([\"t\", \"First Treated\"]).agg(**agg_dictionary).reset_index()\n", "agg_df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_data(df, col_name='y'):\n", " \"\"\"\n", " Create an improved plot with colorblind-friendly features\n", " \n", " Parameters:\n", " -----------\n", " df : DataFrame\n", " The dataframe containing the data\n", " col_name : str, default='y'\n", " Column name to plot (will use '{col_name}_mean')\n", " \"\"\"\n", " plt.figure(figsize=(12, 7))\n", " n_colors = df[\"First Treated\"].nunique()\n", " color_palette = sns.color_palette(\"colorblind\", n_colors=n_colors)\n", "\n", " sns.lineplot(\n", " data=df,\n", " x='t',\n", " y=f'{col_name}_mean',\n", " hue='First Treated',\n", " style='First Treated',\n", " palette=color_palette,\n", " markers=True,\n", " dashes=True,\n", " linewidth=2.5,\n", " alpha=0.8\n", " )\n", " \n", " plt.title(f'Average Values {col_name} by Group Over Time', fontsize=16)\n", " plt.xlabel('Time', fontsize=14)\n", " plt.ylabel(f'Average Value {col_name}', fontsize=14)\n", " \n", "\n", " plt.legend(title='First Treated', title_fontsize=13, fontsize=12, \n", " frameon=True, framealpha=0.9, loc='best')\n", " \n", " plt.grid(alpha=0.3, linestyle='-')\n", " plt.tight_layout()\n", "\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So let us take a look at the average values over time" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_data(agg_df, col_name='y')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead the true average treatment treatment effects can be obtained by averaging (usually unobserved) the `ite` values.\n", "\n", "The true effect just equals the exposure time (in months):\n", "\n", "$$\n", "ATT(\\mathrm{g}, t) = \\min(\\mathrm{t} - \\mathrm{g} + 1, 0) =: e\n", "$$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "source_hidden": true } }, "outputs": [], "source": [ "plot_data(agg_df, col_name='ite')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DoubleMLPanelData\n", "\n", "Finally, we can construct our `DoubleMLPanelData`, specifying\n", "\n", " - `y_col` : the outcome\n", " - `d_cols`: the group variable indicating the first treated period for each unit\n", " - `id_col`: the unique identification column for each unit\n", " - `t_col` : the time column\n", " - `x_cols`: the additional pre-treatment controls\n", " - `datetime_unit`: unit required for `datetime` columns and plotting" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_data = DoubleMLPanelData(\n", " data=df,\n", " y_col=\"y\",\n", " d_cols=\"d\",\n", " id_col=\"id\",\n", " t_col=\"t\",\n", " x_cols=[\"Z1\", \"Z2\", \"Z3\", \"Z4\"],\n", " datetime_unit=\"M\"\n", ")\n", "print(dml_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ATT Estimation\n", "\n", "The [DoubleML-package](https://docs.doubleml.org/stable/index.html) implements estimation of group-time average treatment effect via the `DoubleMLDIDMulti` class (see [model documentation](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basics\n", "\n", "The class basically behaves like other `DoubleML` classes and requires the specification of two learners (for more details on the regression elements, see [score documentation](https://docs.doubleml.org/stable/guide/scores.html#difference-in-differences-models)).\n", "\n", "The basic arguments of a `DoubleMLDIDMulti` object include\n", "\n", " - `ml_g` \"outcome\" regression learner\n", " - `ml_m` propensity Score learner\n", " - `control_group` the control group for the parallel trend assumption\n", " - `gt_combinations` combinations of $(\\mathrm{g},t_\\text{pre}, t_\\text{eval})$\n", " - `anticipation_periods` number of anticipation periods\n", "\n", "We will construct a `dict` with \"default\" arguments." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "default_args = {\n", " \"ml_g\": LGBMRegressor(n_estimators=500, learning_rate=0.01, verbose=-1, random_state=123),\n", " \"ml_m\": LGBMClassifier(n_estimators=500, learning_rate=0.01, verbose=-1, random_state=123),\n", " \"control_group\": \"never_treated\",\n", " \"gt_combinations\": \"standard\",\n", " \"anticipation_periods\": 0,\n", " \"n_folds\": 5,\n", " \"n_rep\": 1,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " The model will be estimated using the `fit()` method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_obj = DoubleMLDIDMulti(dml_data, **default_args)\n", "dml_obj.fit()\n", "print(dml_obj)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The summary displays estimates of the $ATT(g,t_\\text{eval})$ effects for different combinations of $(g,t_\\text{eval})$ via $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$, where\n", " - $\\mathrm{g}$ specifies the group\n", " - $t_\\text{pre}$ specifies the corresponding pre-treatment period\n", " - $t_\\text{eval}$ specifies the evaluation period\n", "\n", "The choice `gt_combinations=\"standard\"`, used estimates all possible combinations of $ATT(g,t_\\text{eval})$ via $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$,\n", "where the standard choice is $t_\\text{pre} = \\min(\\mathrm{g}, t_\\text{eval}) - 1$ (without anticipation).\n", "\n", "Remark that this includes pre-tests effects if $\\mathrm{g} > t_{eval}$, e.g. $\\widehat{ATT}(g=\\text{2025-04}, t_{\\text{pre}}=\\text{2025-01}, t_{\\text{eval}}=\\text{2025-02})$ which estimates the pre-trend from January to February even if the actual treatment occured in April." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual for the DoubleML-package, you can obtain joint confidence intervals via bootstrap." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "level = 0.95\n", "\n", "ci = dml_obj.confint(level=level)\n", "dml_obj.bootstrap(n_rep_boot=5000)\n", "ci_joint = dml_obj.confint(level=level, joint=True)\n", "ci_joint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A visualization of the effects can be obtained via the `plot_effects()` method.\n", "\n", "Remark that the plot used joint confidence intervals per default. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nbsphinx-thumbnail" ] }, "outputs": [], "source": [ "dml_obj.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sensitivity Analysis\n", "\n", "As descripted in the [Sensitivity Guide](https://docs.doubleml.org/stable/guide/sensitivity.html), robustness checks on omitted confounding/parallel trend violations are available, via the standard `sensitivity_analysis()` method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_obj.sensitivity_analysis()\n", "print(dml_obj.sensitivity_summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example one can clearly, distinguish the robustness of the non-zero effects vs. the pre-treatment periods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Control Groups\n", "\n", "The current implementation support the following control groups\n", "\n", " - ``\"never_treated\"``\n", " - ``\"not_yet_treated\"``\n", "\n", "Remark that the ``\"not_yet_treated\" depends on anticipation.\n", "\n", "For differences and recommendations, we refer to [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_obj_nyt = DoubleMLDIDMulti(dml_data, **(default_args | {\"control_group\": \"not_yet_treated\"}))\n", "dml_obj_nyt.fit()\n", "dml_obj_nyt.bootstrap(n_rep_boot=5000)\n", "dml_obj_nyt.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linear Covariate Adjustment\n", "\n", "Remark that we relied on boosted trees to adjust for conditional parallel trends which allow for a nonlinear adjustment. In comparison to linear adjustment, we could rely on linear learners.\n", "\n", "**Remark that the DGP (`dgp_type=4`) is based on nonlinear conditional expectations such that the estimates will be biased**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "linear_learners = {\n", " \"ml_g\": LinearRegression(),\n", " \"ml_m\": LogisticRegression(),\n", "}\n", "\n", "dml_obj_linear = DoubleMLDIDMulti(dml_data, **(default_args | linear_learners))\n", "dml_obj_linear.fit()\n", "dml_obj_linear.bootstrap(n_rep_boot=5000)\n", "dml_obj_linear.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aggregated Effects\n", "As the [did-R-package](https://bcallaway11.github.io/did/index.html), the $ATT$'s can be aggregated to summarize multiple effects.\n", "For details on different aggregations and details on their interpretations see [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).\n", "\n", "The aggregations are implemented via the `aggregate()` method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Group Aggregation\n", "\n", "\n", "To obtain group-specific effects one can would like to average $ATT(\\mathrm{g}, t_\\text{eval})$ over $t_\\text{eval}$.\n", "As a sample oracle we will combine all `ite`'s based on group $\\mathrm{g}$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_post_treatment = df[df[\"t\"] >= df[\"d\"]]\n", "df_post_treatment.groupby(\"d\")[\"ite\"].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To obtain group-specific effects it is possible to aggregate several $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ values based on the group $\\mathrm{g}$ by setting the `aggregation=\"group\"` argument." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregated_group = dml_obj.aggregate(aggregation=\"group\")\n", "print(aggregated_group)\n", "_ = aggregated_group.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output is a `DoubleMLDIDAggregation` object which includes an overall aggregation summary based on group size." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time Aggregation\n", "\n", "To obtain time-specific effects one can would like to average $ATT(\\mathrm{g}, t_\\text{eval})$ over $\\mathrm{g}$ (respecting group size).\n", "As a sample oracle we will combine all `ite`'s based on group $\\mathrm{g}$. As oracle values, we obtain" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_post_treatment.groupby(\"t\")[\"ite\"].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To aggregate $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$, based on $t_\\text{eval}$, but weighted with respect to group size. Corresponds to *Calendar Time Effects* from the [did-R-package](https://bcallaway11.github.io/did/index.html).\n", "\n", "For calendar time effects set `aggregation=\"time\"`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregated_time = dml_obj.aggregate(\"time\")\n", "print(aggregated_time)\n", "fig, ax = aggregated_time.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Event Study Aggregation\n", "\n", "To obtain event-study-type effects one can would like to aggregate $ATT(\\mathrm{g}, t_\\text{eval})$ over $e = t_\\text{eval} - \\mathrm{g}$ (respecting group size).\n", "As a sample oracle we will combine all `ite`'s based on group $\\mathrm{g}$. As oracle values, we obtain" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[\"e\"] = pd.to_datetime(df[\"t\"]).values.astype(\"datetime64[M]\") - \\\n", " pd.to_datetime(df[\"d\"]).values.astype(\"datetime64[M]\")\n", "df.groupby(\"e\")[\"ite\"].mean()[1:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analogously, `aggregation=\"eventstudy\"` aggregates $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ based on exposure time $e = t_\\text{eval} - \\mathrm{g}$ (respecting group size)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aggregated_eventstudy = dml_obj.aggregate(\"eventstudy\")\n", "print(aggregated_eventstudy)\n", "aggregated_eventstudy.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Aggregation Details\n", "\n", "The `DoubleMLDIDAggregation` objects include several `DoubleMLFrameworks` which support methods like `bootstrap()` or `confint()`.\n", "Further, the weights can be accessed via the properties\n", "\n", " - ``overall_aggregation_weights``: weights for the overall aggregation\n", " - ``aggregation_weights``: weights for the aggregation\n", "\n", "To clarify, e.g. for the eventstudy aggregation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(aggregated_eventstudy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, the overall effect aggregation aggregates each effect with positive exposure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(aggregated_eventstudy.overall_aggregation_weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If one would like to consider how the aggregated effect with $e=0$ is computed, one would have to look at the corresponding set of weights within the ``aggregation_weights`` property" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the weights for e=0 correspond to the fifth element of the aggregation weights\n", "aggregated_eventstudy.aggregation_weights[4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Taking a look at the original `dml_obj`, one can see that this combines the following estimates (only show month):\n", "\n", " - $\\widehat{ATT}(04,03,04)$\n", " - $\\widehat{ATT}(05,04,05)$\n", " - $\\widehat{ATT}(06,05,06)$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(dml_obj.summary[\"coef\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Anticipation\n", "\n", "As described in the [Model Guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did), one can include anticipation periods $\\delta>0$ by setting the `anticipation_periods` parameter." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data with Anticipation\n", "\n", "The DGP allows to include anticipation periods via the `anticipation_periods` parameter.\n", "In this case the observations will be \"shifted\" such that units anticipate the effect earlier and the exposure effect is increased by the number of periods where the effect is anticipated." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_obs = 4000\n", "n_periods = 6\n", "\n", "df_anticipation = make_did_CS2021(n_obs, dgp_type=4, n_periods=n_periods, n_pre_treat_periods=3, time_type=\"datetime\", anticipation_periods=1)\n", "\n", "print(df_anticipation.shape)\n", "df_anticipation.head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To visualize the anticipation, we will again plot the \"oracle\" values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_anticipation[\"ite\"] = df_anticipation[\"y1\"] - df_anticipation[\"y0\"]\n", "df_anticipation[\"First Treated\"] = df_anticipation[\"d\"].dt.strftime(\"%Y-%m\").fillna(\"Never Treated\")\n", "agg_df_anticipation = df_anticipation.groupby([\"t\", \"First Treated\"]).agg(**agg_dictionary).reset_index()\n", "agg_df_anticipation.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can see that the effect is already anticipated one period before the actual treatment assignment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_data(agg_df_anticipation, col_name='ite')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize a corresponding `DoubleMLPanelData` object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_data_anticipation = DoubleMLPanelData(\n", " data=df_anticipation,\n", " y_col=\"y\",\n", " d_cols=\"d\",\n", " id_col=\"id\",\n", " t_col=\"t\",\n", " x_cols=[\"Z1\", \"Z2\", \"Z3\", \"Z4\"],\n", " datetime_unit=\"M\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ATT Estimation\n", "\n", "Let us take a look at the estimation without anticipation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_obj_anticipation = DoubleMLDIDMulti(dml_data_anticipation, **default_args)\n", "dml_obj_anticipation.fit()\n", "dml_obj_anticipation.bootstrap(n_rep_boot=5000)\n", "dml_obj_anticipation.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effects are obviously biased. To include anticipation periods, one can adjust the `anticipation_periods` parameter. Correspondingly, the outcome regression (and not yet treated units) are adjusted." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_obj_anticipation = DoubleMLDIDMulti(dml_data_anticipation, **(default_args| {\"anticipation_periods\": 1}))\n", "dml_obj_anticipation.fit()\n", "dml_obj_anticipation.bootstrap(n_rep_boot=5000)\n", "dml_obj_anticipation.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Group-Time Combinations\n", "\n", "The default option `gt_combinations=\"standard\"` includes all group time values with the specific choice of $t_\\text{pre} = \\min(\\mathrm{g}, t_\\text{eval}) - 1$ (without anticipation) which is the weakest possible parallel trend assumption.\n", "\n", "Other options are possible or only specific combinations of $(\\mathrm{g},t_\\text{pre},t_\\text{eval})$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### All combinations\n", "\n", "The option `gt_combinations=\"all\"` includes all relevant group time values with $t_\\text{pre} < \\min(\\mathrm{g}, t_\\text{eval})$, including longer parallel trend assumptions.\n", "This can result in multiple estimates for the same $ATT(\\mathrm{g},t)$, which have slightly different assumptions (length of parallel trends)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dml_obj_all = DoubleMLDIDMulti(dml_data, **(default_args| {\"gt_combinations\": \"all\"}))\n", "dml_obj_all.fit()\n", "dml_obj_all.bootstrap(n_rep_boot=5000)\n", "dml_obj_all.plot_effects()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selected Combinations\n", "\n", "Instead it is also possible to just submit a list of tuples containing $(\\mathrm{g}, t_\\text{pre}, t_\\text{eval})$ combinations. E.g. only two combinations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gt_dict = {\n", " \"gt_combinations\": [\n", " (np.datetime64('2025-04'),\n", " np.datetime64('2025-01'),\n", " np.datetime64('2025-02')),\n", " (np.datetime64('2025-04'),\n", " np.datetime64('2025-02'),\n", " np.datetime64('2025-03')),\n", " ]\n", "}\n", "\n", "dml_obj_all = DoubleMLDIDMulti(dml_data, **(default_args| gt_dict))\n", "dml_obj_all.fit()\n", "dml_obj_all.bootstrap(n_rep_boot=5000)\n", "dml_obj_all.plot_effects()" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 2 }