{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python: Panel Data with Multiple Time Periods\n",
    "\n",
    "In this example, a detailed guide on Difference-in-Differences with multiple time periods using the [DoubleML-package](https://docs.doubleml.org/stable/index.html). The implementation is based on [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).\n",
    "\n",
    "The notebook requires the following packages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from lightgbm import LGBMRegressor, LGBMClassifier\n",
    "from sklearn.linear_model import LinearRegression, LogisticRegression\n",
    "\n",
    "from doubleml.did import DoubleMLDIDMulti\n",
    "from doubleml.data import DoubleMLPanelData\n",
    "\n",
    "from doubleml.did.datasets import make_did_CS2021"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data\n",
    "\n",
    "We will rely on the `make_did_CS2021` DGP, which is inspired by [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001) (Appendix SC) and [Sant'Anna and Zhao (2020)](https://doi.org/10.1016/j.jeconom.2020.06.003).\n",
    "\n",
    "We will observe `n_obs` units over `n_periods`. Remark that the dataframe includes observations of the potential outcomes `y0` and `y1`, such that we can use oracle estimates as comparisons. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_obs = 5000\n",
    "n_periods = 6\n",
    "\n",
    "df = make_did_CS2021(n_obs, dgp_type=4, n_periods=n_periods, n_pre_treat_periods=3, time_type=\"datetime\")\n",
    "df[\"ite\"] = df[\"y1\"] - df[\"y0\"]\n",
    "\n",
    "print(df.shape)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Details\n",
    "\n",
    "*Here, we slightly abuse the definition of the potential outcomes. $Y_{i,t}(1)$ corresponds to the (potential) outcome if unit $i$ would have received treatment at time period $\\mathrm{g}$ (where the group $\\mathrm{g}$ is drawn with probabilities based on $Z$).*\n",
    "\n",
    "More specifically\n",
    "\n",
    "$$\n",
    "\\begin{align*}\n",
    "Y_{i,t}(0)&:= f_t(Z) + \\delta_t + \\eta_i + \\varepsilon_{i,t,0}\\\\\n",
    "Y_{i,t}(1)&:= Y_{i,t}(0) + \\theta_{i,t,\\mathrm{g}} + \\epsilon_{i,t,1} - \\epsilon_{i,t,0}\n",
    "\\end{align*}\n",
    "$$\n",
    "\n",
    "where\n",
    " - $f_t(Z)$ depends on pre-treatment observable covariates $Z_1,\\dots, Z_4$ and time $t$\n",
    " - $\\delta_t$ is a time fixed effect\n",
    " - $\\eta_i$ is a unit fixed effect\n",
    " - $\\epsilon_{i,t,\\cdot}$ are time varying unobservables (iid. $N(0,1)$)\n",
    " - $\\theta_{i,t,\\mathrm{g}}$ correponds to the exposure effect of unit $i$ based on group $\\mathrm{g}$ at time $t$\n",
    "\n",
    "For the pre-treatment periods the exposure effect is set to\n",
    "$$\n",
    "\\theta_{i,t,\\mathrm{g}}:= 0 \\text{ for } t<\\mathrm{g}\n",
    "$$\n",
    "such that \n",
    "\n",
    "$$\n",
    "\\mathbb{E}[Y_{i,t}(1) - Y_{i,t}(0)] = \\mathbb{E}[\\epsilon_{i,t,1} - \\epsilon_{i,t,0}]=0  \\text{ for } t<\\mathrm{g}\n",
    "$$\n",
    "\n",
    "The [DoubleML Coverage Repository](https://docs.doubleml.org/doubleml-coverage/) includes coverage simulations based on this DGP."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Description"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is a balanced panel where each unit is observed over `n_periods` starting Janary 2025."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby(\"t\").size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The treatment column `d` indicates first treatment period of the corresponding unit, whereas `NaT` units are never treated.\n",
    "\n",
    "*Generally, never treated units should take either on the value `np.inf` or `pd.NaT` depending on the data type (`float` or `datetime`).*\n",
    "\n",
    "The individual units are roughly uniformly divided between the groups, where treatment assignment depends on the pre-treatment covariates `Z1` to `Z4`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby(\"d\", dropna=False).size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, the group indicates the first treated period and `NaT` units are never treated. To simplify plotting and pands"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby(\"d\", dropna=False).size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a better understanding of the underlying data and true effects, we will compare the unconditional averages and the true effects based on the oracle values of individual effects `ite`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# rename for plotting\n",
    "df[\"First Treated\"] = df[\"d\"].dt.strftime(\"%Y-%m\").fillna(\"Never Treated\")\n",
    "\n",
    "# Create aggregation dictionary for means\n",
    "def agg_dict(col_name):\n",
    "    return {\n",
    "        f'{col_name}_mean': (col_name, 'mean'),\n",
    "        f'{col_name}_lower_quantile': (col_name, lambda x: x.quantile(0.05)),\n",
    "        f'{col_name}_upper_quantile': (col_name, lambda x: x.quantile(0.95))\n",
    "    }\n",
    "\n",
    "# Calculate means and confidence intervals\n",
    "agg_dictionary = agg_dict(\"y\") | agg_dict(\"ite\")\n",
    "\n",
    "agg_df = df.groupby([\"t\", \"First Treated\"]).agg(**agg_dictionary).reset_index()\n",
    "agg_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_data(df, col_name='y'):\n",
    "    \"\"\"\n",
    "    Create an improved plot with colorblind-friendly features\n",
    "    \n",
    "    Parameters:\n",
    "    -----------\n",
    "    df : DataFrame\n",
    "        The dataframe containing the data\n",
    "    col_name : str, default='y'\n",
    "        Column name to plot (will use '{col_name}_mean')\n",
    "    \"\"\"\n",
    "    plt.figure(figsize=(12, 7))\n",
    "    n_colors = df[\"First Treated\"].nunique()\n",
    "    color_palette = sns.color_palette(\"colorblind\", n_colors=n_colors)\n",
    "\n",
    "    sns.lineplot(\n",
    "        data=df,\n",
    "        x='t',\n",
    "        y=f'{col_name}_mean',\n",
    "        hue='First Treated',\n",
    "        style='First Treated',\n",
    "        palette=color_palette,\n",
    "        markers=True,\n",
    "        dashes=True,\n",
    "        linewidth=2.5,\n",
    "        alpha=0.8\n",
    "    )\n",
    "    \n",
    "    plt.title(f'Average Values {col_name} by Group Over Time', fontsize=16)\n",
    "    plt.xlabel('Time', fontsize=14)\n",
    "    plt.ylabel(f'Average Value {col_name}', fontsize=14)\n",
    "    \n",
    "\n",
    "    plt.legend(title='First Treated', title_fontsize=13, fontsize=12, \n",
    "               frameon=True, framealpha=0.9, loc='best')\n",
    "    \n",
    "    plt.grid(alpha=0.3, linestyle='-')\n",
    "    plt.tight_layout()\n",
    "\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So let us take a look at the average values over time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_data(agg_df, col_name='y')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead the true average treatment treatment effects can be obtained by averaging (usually unobserved) the `ite` values.\n",
    "\n",
    "The true effect just equals the exposure time (in months):\n",
    "\n",
    "$$\n",
    "ATT(\\mathrm{g}, t) = \\min(\\mathrm{t} - \\mathrm{g} + 1, 0) =: e\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "plot_data(agg_df, col_name='ite')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### DoubleMLPanelData\n",
    "\n",
    "Finally, we can construct our `DoubleMLPanelData`, specifying\n",
    "\n",
    " - `y_col` : the outcome\n",
    " - `d_cols`: the group variable indicating the first treated period for each unit\n",
    " - `id_col`: the unique identification column for each unit\n",
    " - `t_col` : the time column\n",
    " - `x_cols`: the additional pre-treatment controls\n",
    " - `datetime_unit`: unit required for `datetime` columns and plotting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_data = DoubleMLPanelData(\n",
    "    data=df,\n",
    "    y_col=\"y\",\n",
    "    d_cols=\"d\",\n",
    "    id_col=\"id\",\n",
    "    t_col=\"t\",\n",
    "    x_cols=[\"Z1\", \"Z2\", \"Z3\", \"Z4\"],\n",
    "    datetime_unit=\"M\"\n",
    ")\n",
    "print(dml_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ATT Estimation\n",
    "\n",
    "The [DoubleML-package](https://docs.doubleml.org/stable/index.html) implements estimation of group-time average treatment effect via the `DoubleMLDIDMulti` class (see [model documentation](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did))."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Basics\n",
    "\n",
    "The class basically behaves like other `DoubleML` classes and requires the specification of two learners (for more details on the regression elements, see [score documentation](https://docs.doubleml.org/stable/guide/scores.html#difference-in-differences-models)).\n",
    "\n",
    "The basic arguments of a `DoubleMLDIDMulti` object include\n",
    "\n",
    " - `ml_g` \"outcome\" regression learner\n",
    " - `ml_m` propensity Score learner\n",
    " - `control_group` the control group for the parallel trend assumption\n",
    " - `gt_combinations` combinations of $(\\mathrm{g},t_\\text{pre}, t_\\text{eval})$\n",
    " - `anticipation_periods` number of anticipation periods\n",
    "\n",
    "We will construct a `dict` with \"default\" arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "default_args = {\n",
    "    \"ml_g\": LGBMRegressor(n_estimators=500, learning_rate=0.01, verbose=-1, random_state=123),\n",
    "    \"ml_m\": LGBMClassifier(n_estimators=500, learning_rate=0.01, verbose=-1, random_state=123),\n",
    "    \"control_group\": \"never_treated\",\n",
    "    \"gt_combinations\": \"standard\",\n",
    "    \"anticipation_periods\": 0,\n",
    "    \"n_folds\": 5,\n",
    "    \"n_rep\": 1,\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " The model will be estimated using the `fit()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_obj = DoubleMLDIDMulti(dml_data, **default_args)\n",
    "dml_obj.fit()\n",
    "print(dml_obj)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The summary displays estimates of the $ATT(g,t_\\text{eval})$ effects for different combinations of $(g,t_\\text{eval})$ via $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$, where\n",
    " - $\\mathrm{g}$ specifies the group\n",
    " - $t_\\text{pre}$ specifies the corresponding pre-treatment period\n",
    " - $t_\\text{eval}$ specifies the evaluation period\n",
    "\n",
    "The choice `gt_combinations=\"standard\"`, used estimates all possible combinations of $ATT(g,t_\\text{eval})$ via $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$,\n",
    "where the standard choice is $t_\\text{pre} = \\min(\\mathrm{g}, t_\\text{eval}) - 1$ (without anticipation).\n",
    "\n",
    "Remark that this includes pre-tests effects if $\\mathrm{g} > t_{eval}$, e.g. $\\widehat{ATT}(g=\\text{2025-04}, t_{\\text{pre}}=\\text{2025-01}, t_{\\text{eval}}=\\text{2025-02})$ which estimates the pre-trend from January to February even if the actual treatment occured in April."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As usual for the DoubleML-package, you can obtain joint confidence intervals via bootstrap."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "level = 0.95\n",
    "\n",
    "ci = dml_obj.confint(level=level)\n",
    "dml_obj.bootstrap(n_rep_boot=5000)\n",
    "ci_joint = dml_obj.confint(level=level, joint=True)\n",
    "ci_joint"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A visualization of the effects can be obtained via the `plot_effects()` method.\n",
    "\n",
    "Remark that the plot used joint confidence intervals per default. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbsphinx-thumbnail"
    ]
   },
   "outputs": [],
   "source": [
    "dml_obj.plot_effects()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sensitivity Analysis\n",
    "\n",
    "As descripted in the [Sensitivity Guide](https://docs.doubleml.org/stable/guide/sensitivity.html), robustness checks on omitted confounding/parallel trend violations are available, via the standard `sensitivity_analysis()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_obj.sensitivity_analysis()\n",
    "print(dml_obj.sensitivity_summary)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example one can clearly, distinguish the robustness of the non-zero effects vs. the pre-treatment periods."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Control Groups\n",
    "\n",
    "The current implementation support the following control groups\n",
    "\n",
    " - ``\"never_treated\"``\n",
    " - ``\"not_yet_treated\"``\n",
    "\n",
    "Remark that the ``\"not_yet_treated\" depends on anticipation.\n",
    "\n",
    "For differences and recommendations, we refer to [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_obj_nyt = DoubleMLDIDMulti(dml_data, **(default_args | {\"control_group\": \"not_yet_treated\"}))\n",
    "dml_obj_nyt.fit()\n",
    "dml_obj_nyt.bootstrap(n_rep_boot=5000)\n",
    "dml_obj_nyt.plot_effects()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Linear Covariate Adjustment\n",
    "\n",
    "Remark that we relied on boosted trees to adjust for conditional parallel trends which allow for a nonlinear adjustment. In comparison to linear adjustment, we could rely on linear learners.\n",
    "\n",
    "**Remark that the DGP (`dgp_type=4`) is based on nonlinear conditional expectations such that the estimates will be biased**\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "linear_learners = {\n",
    "    \"ml_g\": LinearRegression(),\n",
    "    \"ml_m\": LogisticRegression(),\n",
    "}\n",
    "\n",
    "dml_obj_linear = DoubleMLDIDMulti(dml_data, **(default_args | linear_learners))\n",
    "dml_obj_linear.fit()\n",
    "dml_obj_linear.bootstrap(n_rep_boot=5000)\n",
    "dml_obj_linear.plot_effects()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Aggregated Effects\n",
    "As the [did-R-package](https://bcallaway11.github.io/did/index.html), the $ATT$'s can be aggregated to summarize multiple effects.\n",
    "For details on different aggregations and details on their interpretations see [Callaway and Sant'Anna(2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).\n",
    "\n",
    "The aggregations are implemented via the `aggregate()` method."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Group Aggregation\n",
    "\n",
    "\n",
    "To obtain group-specific effects one can would like to average $ATT(\\mathrm{g}, t_\\text{eval})$ over $t_\\text{eval}$.\n",
    "As a sample oracle we will combine all `ite`'s based on group $\\mathrm{g}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_post_treatment = df[df[\"t\"] >= df[\"d\"]]\n",
    "df_post_treatment.groupby(\"d\")[\"ite\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To obtain group-specific effects it is possible to aggregate several $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ values based on the group $\\mathrm{g}$ by setting the `aggregation=\"group\"` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aggregated_group = dml_obj.aggregate(aggregation=\"group\")\n",
    "print(aggregated_group)\n",
    "_ = aggregated_group.plot_effects()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output is a `DoubleMLDIDAggregation` object which includes an overall aggregation summary based on group size."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Time Aggregation\n",
    "\n",
    "To obtain time-specific effects one can would like to average $ATT(\\mathrm{g}, t_\\text{eval})$ over $\\mathrm{g}$ (respecting group size).\n",
    "As a sample oracle we will combine all `ite`'s based on group $\\mathrm{g}$. As oracle values, we obtain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_post_treatment.groupby(\"t\")[\"ite\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To aggregate $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$, based on $t_\\text{eval}$, but weighted with respect to group size. Corresponds to *Calendar Time Effects* from the [did-R-package](https://bcallaway11.github.io/did/index.html).\n",
    "\n",
    "For calendar time effects set `aggregation=\"time\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aggregated_time = dml_obj.aggregate(\"time\")\n",
    "print(aggregated_time)\n",
    "fig, ax = aggregated_time.plot_effects()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Event Study Aggregation\n",
    "\n",
    "To obtain event-study-type effects one can would like to aggregate $ATT(\\mathrm{g}, t_\\text{eval})$ over $e = t_\\text{eval} - \\mathrm{g}$ (respecting group size).\n",
    "As a sample oracle we will combine all `ite`'s based on group $\\mathrm{g}$. As oracle values, we obtain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df[\"e\"] = pd.to_datetime(df[\"t\"]).values.astype(\"datetime64[M]\") - \\\n",
    "    pd.to_datetime(df[\"d\"]).values.astype(\"datetime64[M]\")\n",
    "df.groupby(\"e\")[\"ite\"].mean()[1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Analogously, `aggregation=\"eventstudy\"` aggregates $\\widehat{ATT}(\\mathrm{g},t_\\text{pre},t_\\text{eval})$ based on exposure time $e = t_\\text{eval} - \\mathrm{g}$ (respecting group size)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aggregated_eventstudy = dml_obj.aggregate(\"eventstudy\")\n",
    "print(aggregated_eventstudy)\n",
    "aggregated_eventstudy.plot_effects()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Aggregation Details\n",
    "\n",
    "The `DoubleMLDIDAggregation` objects include several `DoubleMLFrameworks` which support methods like `bootstrap()` or `confint()`.\n",
    "Further, the weights can be accessed via the properties\n",
    "\n",
    " - ``overall_aggregation_weights``: weights for the overall aggregation\n",
    " - ``aggregation_weights``: weights for the aggregation\n",
    "\n",
    "To clarify, e.g. for the eventstudy aggregation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(aggregated_eventstudy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, the overall effect aggregation aggregates each effect with positive exposure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(aggregated_eventstudy.overall_aggregation_weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If one would like to consider how the aggregated effect with $e=0$ is computed, one would have to look at the corresponding set of weights within the ``aggregation_weights`` property"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the weights for e=0 correspond to the fifth element of the aggregation weights\n",
    "aggregated_eventstudy.aggregation_weights[4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Taking a look at the original `dml_obj`, one can see that this combines the following estimates (only show month):\n",
    "\n",
    " - $\\widehat{ATT}(04,03,04)$\n",
    " - $\\widehat{ATT}(05,04,05)$\n",
    " - $\\widehat{ATT}(06,05,06)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(dml_obj.summary[\"coef\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Anticipation\n",
    "\n",
    "As described in the [Model Guide](https://docs.doubleml.org/stable/guide/models.html#difference-in-differences-models-did), one can include anticipation periods $\\delta>0$ by setting the `anticipation_periods` parameter."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data with Anticipation\n",
    "\n",
    "The DGP allows to include anticipation periods via the `anticipation_periods` parameter.\n",
    "In this case the observations will be \"shifted\" such that units anticipate the effect earlier and the exposure effect is increased by the number of periods where the effect is anticipated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_obs = 4000\n",
    "n_periods = 6\n",
    "\n",
    "df_anticipation = make_did_CS2021(n_obs, dgp_type=4, n_periods=n_periods, n_pre_treat_periods=3, time_type=\"datetime\", anticipation_periods=1)\n",
    "\n",
    "print(df_anticipation.shape)\n",
    "df_anticipation.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To visualize the anticipation, we will again plot the \"oracle\" values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_anticipation[\"ite\"] = df_anticipation[\"y1\"] - df_anticipation[\"y0\"]\n",
    "df_anticipation[\"First Treated\"] = df_anticipation[\"d\"].dt.strftime(\"%Y-%m\").fillna(\"Never Treated\")\n",
    "agg_df_anticipation = df_anticipation.groupby([\"t\", \"First Treated\"]).agg(**agg_dictionary).reset_index()\n",
    "agg_df_anticipation.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One can see that the effect is already anticipated one period before the actual treatment assignment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_data(agg_df_anticipation, col_name='ite')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Initialize a corresponding `DoubleMLPanelData` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_data_anticipation = DoubleMLPanelData(\n",
    "    data=df_anticipation,\n",
    "    y_col=\"y\",\n",
    "    d_cols=\"d\",\n",
    "    id_col=\"id\",\n",
    "    t_col=\"t\",\n",
    "    x_cols=[\"Z1\", \"Z2\", \"Z3\", \"Z4\"],\n",
    "    datetime_unit=\"M\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ATT Estimation\n",
    "\n",
    "Let us take a look at the estimation without anticipation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_obj_anticipation = DoubleMLDIDMulti(dml_data_anticipation, **default_args)\n",
    "dml_obj_anticipation.fit()\n",
    "dml_obj_anticipation.bootstrap(n_rep_boot=5000)\n",
    "dml_obj_anticipation.plot_effects()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effects are obviously biased. To include anticipation periods, one can adjust the `anticipation_periods` parameter. Correspondingly, the outcome regression (and not yet treated units) are adjusted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_obj_anticipation = DoubleMLDIDMulti(dml_data_anticipation, **(default_args| {\"anticipation_periods\": 1}))\n",
    "dml_obj_anticipation.fit()\n",
    "dml_obj_anticipation.bootstrap(n_rep_boot=5000)\n",
    "dml_obj_anticipation.plot_effects()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Group-Time Combinations\n",
    "\n",
    "The default option `gt_combinations=\"standard\"` includes all group time values with the specific choice of $t_\\text{pre} = \\min(\\mathrm{g}, t_\\text{eval}) - 1$ (without anticipation) which is the weakest possible parallel trend assumption.\n",
    "\n",
    "Other options are possible or only specific combinations of $(\\mathrm{g},t_\\text{pre},t_\\text{eval})$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### All combinations\n",
    "\n",
    "The  option `gt_combinations=\"all\"` includes all relevant group time values with $t_\\text{pre} < \\min(\\mathrm{g}, t_\\text{eval})$, including longer parallel trend assumptions.\n",
    "This can result in multiple estimates for the same $ATT(\\mathrm{g},t)$, which have slightly different assumptions (length of parallel trends)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_obj_all = DoubleMLDIDMulti(dml_data, **(default_args| {\"gt_combinations\": \"all\"}))\n",
    "dml_obj_all.fit()\n",
    "dml_obj_all.bootstrap(n_rep_boot=5000)\n",
    "dml_obj_all.plot_effects()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selected Combinations\n",
    "\n",
    "Instead it is also possible to just submit a list of tuples containing $(\\mathrm{g}, t_\\text{pre}, t_\\text{eval})$ combinations. E.g. only two combinations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gt_dict = {\n",
    "    \"gt_combinations\": [\n",
    "        (np.datetime64('2025-04'),\n",
    "         np.datetime64('2025-01'),\n",
    "         np.datetime64('2025-02')),\n",
    "        (np.datetime64('2025-04'),\n",
    "         np.datetime64('2025-02'),\n",
    "         np.datetime64('2025-03')),\n",
    "    ]\n",
    "}\n",
    "\n",
    "dml_obj_all = DoubleMLDIDMulti(dml_data, **(default_args| gt_dict))\n",
    "dml_obj_all.fit()\n",
    "dml_obj_all.bootstrap(n_rep_boot=5000)\n",
    "dml_obj_all.plot_effects()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}