Note
Download the Jupyter notebook: https://docs.doubleml.org/stable/examples/did/py_panel_data_example.ipynb
Python: Real-Data Example for Multi-Period Difference-in-Differences#
In this example, we replicate a real-data demo notebook from the did-R-package in order to illustrate the use of DoubleML for multi-period difference-in-differences (DiD) models.
The notebook requires the following packages:
[1]:
import pyreadr
import pandas as pd
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.linear_model import LassoCV, LinearRegression, LogisticRegression, LogisticRegressionCV
from doubleml.data import DoubleMLPanelData
from doubleml.did import DoubleMLDIDMulti
Causal Research Question#
Callaway and Sant’Anna (2021) study the causal effect of raising the minimum wage on teen employment in the US using county data over the period from 2001 to 2007. A county is defined as treated if its minimum wage is above the federal minimum wage. We focus on a preprocessed balanced panel data set as provided by the did-R-package. The corresponding documentation for the mpdta data is available from the did package website. We use this data solely as a demonstration example to help readers understand the differences between the DoubleML and did packages. An analogous notebook using the same data is available from the did documentation.
We follow the original notebook and provide results under identification based on unconditional and conditional parallel trends. For the Double Machine Learning (DML) Difference-in-Differences estimator, we demonstrate two different specifications, one based on linear and logistic regression and one based on their \(\ell_1\) penalized variants Lasso and logistic regression with cross-validated penalty choice. The results for the former are expected to be very similar to those in the did data example. Minor differences might arise due to the use of sample-splitting in the DML estimation.
Data#
We will download and read a preprocessed data file as provided by the did-R-package.
[2]:
# download file from did package for R
url = "https://github.com/bcallaway11/did/raw/refs/heads/master/data/mpdta.rda"
pyreadr.download_file(url, "mpdta.rda")
mpdta = pyreadr.read_r("mpdta.rda")["mpdta"]
mpdta.head()
[2]:
   year  countyreal      lpop      lemp  first.treat  treat
0  2003      8001.0  5.896761  8.461469       2007.0    1.0
1  2004      8001.0  5.896761  8.336870       2007.0    1.0
2  2005      8001.0  5.896761  8.340217       2007.0    1.0
3  2006      8001.0  5.896761  8.378161       2007.0    1.0
4  2007      8001.0  5.896761  8.487352       2007.0    1.0
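Since the analysis relies on a balanced panel, a quick sanity check that every county is observed in the same years can be helpful; a minimal sketch using pandas:

# sketch: check that the panel is balanced, i.e., every county is observed
# in the same number of distinct years
obs_per_county = mpdta.groupby("countyreal")["year"].nunique()
print(obs_per_county.value_counts())  # a single value indicates a balanced panel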
To work with DoubleML, we initialize a DoubleMLPanelData object. The input data has to satisfy some requirements: it should be in long format, with every row containing the information of one unit at one time period, and it should include a column with the unit identifier and a column with the time period. These requirements are virtually identical to those of the did-R-package, as listed in their data example. In line with the naming conventions of DoubleML, the treatment group indicator is passed to DoubleMLPanelData by the d_cols argument. To flexibly handle different time period formats, the time variable t_col can be of float, int or datetime type. More information is available in the user guide. To indicate never-treated units, we set their value of the treatment group variable to np.inf.
Now, we can initialize the DoubleMLPanelData object, specifying

y_col: the outcome
d_cols: the group variable indicating the first treated period for each unit
id_col: the unique identification column for each unit
t_col: the time column
x_cols: the additional pre-treatment controls
[3]:
# Set values for treatment group indicator for never-treated to np.inf
mpdta.loc[mpdta['first.treat'] == 0, 'first.treat'] = np.inf
dml_data = DoubleMLPanelData(
data=mpdta,
y_col="lemp",
d_cols="first.treat",
id_col="countyreal",
t_col="year",
x_cols=['lpop']
)
print(dml_data)
================== DoubleMLPanelData Object ==================
------------------ Data summary ------------------
Outcome variable: lemp
Treatment variable(s): ['first.treat']
Covariates: ['lpop']
Instrument variable(s): None
Time variable: year
Id variable: countyreal
No. Observations: 500
------------------ DataFrame info ------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Columns: 6 entries, year to treat
dtypes: float64(5), int32(1)
memory usage: 107.6 KB
Note that we specified a pre-treatment confounding variable lpop through the x_cols argument. To consider identification under unconditional parallel trends, we can use dummy learners that ignore this pre-treatment confounder, as illustrated below.
ATT Estimation: Unconditional Parallel Trends#
We start with identification under the unconditional parallel trends assumption. To do so, we initialize a DoubleMLDIDMulti object (see the model documentation), which takes the previously initialized DoubleMLPanelData object as input. We use scikit-learn’s DummyRegressor and DummyClassifier (see the scikit-learn documentation) to ignore the pre-treatment confounding variable. At this stage, we can also pass further options, for example the number of folds and repetitions used for cross-fitting.
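As a brief illustration of why this emulates unconditional parallel trends: DummyRegressor (with its default "mean" strategy) predicts the training mean regardless of the covariates, so the outcome regression collapses to an unconditional mean; DummyClassifier behaves analogously for the propensity score. A minimal sketch:

import numpy as np
from sklearn.dummy import DummyRegressor

# DummyRegressor ignores the features and always predicts the training mean
rng = np.random.default_rng(42)
X, y = rng.normal(size=(100, 1)), rng.normal(size=100)
dummy = DummyRegressor().fit(X, y)
print(np.allclose(dummy.predict(X), y.mean()))  # True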
When calling the fit() method, the model estimates standard combinations of \(ATT(g,t)\) parameters, which correspond to the defaults in the did-R-package. These combinations can also be customized through the gt_combinations argument; see the user guide and the sketch below.
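As a sketch (assuming the list-of-tuples format \((\mathrm{g}, t_\text{pre}, t_\text{eval})\) described in the user guide), the estimation could be restricted to selected combinations:

# sketch: estimate only selected (g, t_pre, t_eval) combinations
# (list-of-tuples format as described in the user guide)
dml_obj_custom = DoubleMLDIDMulti(
    obj_dml_data=dml_data,
    ml_g=DummyRegressor(),
    ml_m=DummyClassifier(),
    gt_combinations=[(2004.0, 2003, 2004), (2004.0, 2003, 2005)],
    control_group="never_treated",
)
dml_obj_custom.fit()
print(dml_obj_custom.summary.round(4))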
[4]:
dml_obj = DoubleMLDIDMulti(
obj_dml_data=dml_data,
ml_g=DummyRegressor(),
ml_m=DummyClassifier(),
control_group="never_treated",
n_folds=10
)
dml_obj.fit()
print(dml_obj.summary.round(4))
coef std err t P>|t| 2.5 % 97.5 %
ATT(2004.0,2003,2004) -0.0105 0.0232 -0.4526 0.6508 -0.0560 0.0350
ATT(2004.0,2003,2005) -0.0704 0.0310 -2.2700 0.0232 -0.1312 -0.0096
ATT(2004.0,2003,2006) -0.1372 0.0363 -3.7763 0.0002 -0.2085 -0.0660
ATT(2004.0,2003,2007) -0.1008 0.0343 -2.9381 0.0033 -0.1680 -0.0335
ATT(2006.0,2003,2004) 0.0065 0.0232 0.2811 0.7786 -0.0390 0.0520
ATT(2006.0,2004,2005) -0.0027 0.0195 -0.1400 0.8886 -0.0410 0.0355
ATT(2006.0,2005,2006) -0.0046 0.0179 -0.2585 0.7960 -0.0397 0.0304
ATT(2006.0,2005,2007) -0.0412 0.0202 -2.0404 0.0413 -0.0807 -0.0016
ATT(2007.0,2003,2004) 0.0304 0.0151 2.0197 0.0434 0.0009 0.0600
ATT(2007.0,2004,2005) -0.0027 0.0165 -0.1645 0.8693 -0.0350 0.0296
ATT(2007.0,2005,2006) -0.0310 0.0179 -1.7348 0.0828 -0.0660 0.0040
ATT(2007.0,2006,2007) -0.0260 0.0167 -1.5628 0.1181 -0.0587 0.0066
The summary displays estimates of the \(ATT(g,t_\text{eval})\) effects for different combinations of \((g,t_\text{eval})\) via \(\widehat{ATT}(\mathrm{g},t_\text{pre},t_\text{eval})\), where
\(\mathrm{g}\) specifies the group
\(t_\text{pre}\) specifies the corresponding pre-treatment period
\(t_\text{eval}\) specifies the evaluation period
This corresponds to the estimates of the att_gt function in the did-R-package, where the standard choice is \(t_\text{pre} = \min(\mathrm{g}, t_\text{eval}) - 1\) (without anticipation).
Note that this includes pre-test effects for \(\mathrm{g} > t_\text{eval}\), e.g. \(ATT(2007,2005)\).
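For reference, the group-time average treatment effect of Callaway and Sant’Anna (2021) underlying these estimates is

\[ATT(\mathrm{g}, t_\text{eval}) = \mathbb{E}[Y_{t_\text{eval}}(\mathrm{g}) - Y_{t_\text{eval}}(0) \mid G_{\mathrm{g}} = 1],\]

i.e., the average effect in period \(t_\text{eval}\) for the group of units first treated in period \(\mathrm{g}\).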
As usual for the DoubleML-package, you can obtain joint confidence intervals via bootstrap.
[5]:
level = 0.95
ci = dml_obj.confint(level=level)
dml_obj.bootstrap(n_rep_boot=5000)
ci_joint = dml_obj.confint(level=level, joint=True)
print(ci_joint)
2.5 % 97.5 %
ATT(2004.0,2003,2004) -0.075498 0.054493
ATT(2004.0,2003,2005) -0.157304 0.016469
ATT(2004.0,2003,2006) -0.239029 -0.035446
ATT(2004.0,2003,2007) -0.196831 -0.004706
ATT(2006.0,2003,2004) -0.058475 0.071524
ATT(2006.0,2004,2005) -0.057417 0.051949
ATT(2006.0,2005,2006) -0.054715 0.045468
ATT(2006.0,2005,2007) -0.097724 0.015351
ATT(2007.0,2003,2004) -0.011777 0.072669
ATT(2007.0,2004,2005) -0.048822 0.043404
ATT(2007.0,2005,2006) -0.081068 0.019054
ATT(2007.0,2006,2007) -0.072675 0.020621
A visualization of the effects can be obtained via the plot_effects() method.
Note that the plot uses joint confidence intervals by default.
[6]:
fig, ax = dml_obj.plot_effects()

Effect Aggregation#
As in the did-R-package, the \(ATT\)s can be aggregated to summarize multiple effects. For details on the different aggregations and their interpretation, see Callaway and Sant’Anna (2021).
The aggregations are implemented via the aggregate() method. We follow the structure of the did package notebook and start with an aggregation relative to the treatment timing.
Event Study Aggregation#
We can aggregate the \(ATT\)s relative to the treatment timing. This is done by setting aggregation="eventstudy" in the aggregate() method, which aggregates \(\widehat{ATT}(\mathrm{g},t_\text{pre},t_\text{eval})\) based on the exposure time \(e = t_\text{eval} - \mathrm{g}\) (weighting by group size).
[7]:
# rerun the bootstrap for valid simultaneous inference (the draws are not saved)
dml_obj.bootstrap(n_rep_boot=5000)
aggregated_eventstudy = dml_obj.aggregate("eventstudy")
print(aggregated_eventstudy)
fig, ax = aggregated_eventstudy.plot_effects()
================== DoubleMLDIDAggregation Object ==================
Event Study Aggregation
------------------ Overall Aggregated Effects ------------------
coef std err t P>|t| 2.5 % 97.5 %
-0.077214 0.019951 -3.870174 0.000109 -0.116317 -0.038111
------------------ Aggregated Effects ------------------
coef std err t P>|t| 2.5 % 97.5 %
-3.0 0.030446 0.015075 2.019662 0.043418 0.000900 0.059992
-2.0 -0.000549 0.013317 -0.041223 0.967118 -0.026650 0.025552
-1.0 -0.024393 0.014200 -1.717808 0.085832 -0.052226 0.003439
0.0 -0.019919 0.011816 -1.685694 0.091855 -0.043079 0.003241
1.0 -0.050930 0.016783 -3.034679 0.002408 -0.083824 -0.018037
2.0 -0.137238 0.036342 -3.776254 0.000159 -0.208467 -0.066008
3.0 -0.100768 0.034297 -2.938126 0.003302 -0.167989 -0.033548
------------------ Additional Information ------------------
Score function: observational
Control group: never_treated
Anticipation periods: 0

Alternatively, the \(ATT\)s can also be aggregated according to (calendar) time periods or treatment groups; see the user guide and the sketch below.
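A minimal sketch of these alternatives, assuming the "group" and "time" options of the aggregate() method as described in the user guide:

# sketch: aggregate by treatment group or by calendar time
# ("group" and "time" options as described in the user guide)
aggregated_group = dml_obj.aggregate(aggregation="group")
aggregated_time = dml_obj.aggregate(aggregation="time")
print(aggregated_group)
print(aggregated_time)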
Aggregation Details#
The DoubleMLDIDAggregation objects include several DoubleMLFramework objects, which support methods like bootstrap() or confint(). Further, the aggregation weights can be accessed via the properties

overall_aggregation_weights: weights for the overall aggregation
aggregation_weights: weights for the individual aggregated effects

For example, in the event study aggregation, the aggregated effect with \(e=0\) is computed from the third set of weights within the aggregation_weights property:
[8]:
aggregated_eventstudy.aggregation_weights[2]
[8]:
array([0. , 0. , 0. , 0. , 0. ,
0.23391813, 0. , 0. , 0. , 0. ,
0.76608187, 0. ])
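These weights can be used to reproduce an aggregated estimate by hand; a sketch, assuming the weights are ordered like the rows of the fit summary:

# sketch: reconstruct the e=0 event-study estimate as a weighted sum of the
# underlying ATT(g, t_pre, t_eval) estimates (assuming matching ordering)
weights_e0 = aggregated_eventstudy.aggregation_weights[2]
print(np.sum(weights_e0 * dml_obj.coef))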
ATT Estimation: Conditional Parallel Trends#
We briefly demonstrate how to use the DoubleMLDIDMulti model under conditional parallel trends. As the rationale behind DML is to flexibly model nuisance components as prediction problems, the DML DiD estimator includes pre-treatment covariates by default; in DiD, the nuisance components are the outcome regression and the propensity score for the treatment group variable. This is why we had to enforce dummy learners in the unconditional parallel trends case to ignore the pre-treatment covariates. Now, we can replicate the classical doubly robust DiD estimator of Callaway and Sant’Anna (2021) by using linear and logistic regression for the nuisance components, i.e., by setting ml_g to LinearRegression() and ml_m to LogisticRegression(). Similarly, we can choose other learners, for example ml_g=LassoCV() and ml_m=LogisticRegressionCV(). We present the results for the ATTs and their event-study aggregation in the corresponding effect plots.
Please note that this example is meant to illustrate the usage of the DoubleMLDIDMulti model in combination with ML learners. In real-data applications, careful choice and empirical evaluation of the learners are required. Default measures for the prediction quality of the nuisance components are printed in the model summary, as briefly illustrated below.
[9]:
dml_obj_linear_logistic = DoubleMLDIDMulti(
obj_dml_data=dml_data,
ml_g=LinearRegression(),
ml_m=LogisticRegression(penalty=None),
control_group="never_treated",
n_folds=10
)
dml_obj_linear_logistic.fit()
dml_obj_linear_logistic.bootstrap(n_rep_boot=5000)
dml_obj_linear_logistic.plot_effects(title="Estimated ATTs by Group, Linear and Logistic Regression")
[9]:
(<Figure size 1200x800 with 4 Axes>,
[<Axes: title={'center': 'First Treated: 2004.0'}, ylabel='Effect'>,
<Axes: title={'center': 'First Treated: 2006.0'}, ylabel='Effect'>,
<Axes: title={'center': 'First Treated: 2007.0'}, xlabel='Evaluation Period', ylabel='Effect'>])

We briefly look at the model summary, which includes some standard diagnostics for the prediction of the nuisance components.
[10]:
print(dml_obj_linear_logistic)
================== DoubleMLDIDMulti Object ==================
------------------ Data summary ------------------
Outcome variable: lemp
Treatment variable(s): ['first.treat']
Covariates: ['lpop']
Instrument variable(s): None
Time variable: year
Id variable: countyreal
No. Observations: 500
------------------ Score & algorithm ------------------
Score function: observational
Control group: never_treated
Anticipation periods: 0
------------------ Machine learner ------------------
Learner ml_g: LinearRegression()
Learner ml_m: LogisticRegression(penalty=None)
Out-of-sample Performance:
Regression:
Learner ml_g0 RMSE: [[0.17197022 0.18219482 0.25977582 0.25762107 0.1726177 0.15159716
0.20238368 0.20650302 0.17381994 0.15150495 0.20118645 0.16352671]]
Learner ml_g1 RMSE: [[0.10293317 0.12500233 0.13673155 0.1356723 0.13881924 0.1126528
0.08473008 0.10271377 0.13282935 0.16430589 0.15938565 0.1613265 ]]
Classification:
Learner ml_m Log Loss: [[0.23186483 0.23061763 0.23153821 0.23146489 0.34981033 0.34925181
0.35002383 0.34858068 0.60836257 0.60569912 0.60630968 0.60700137]]
------------------ Resampling ------------------
No. folds: 10
No. repeated sample splits: 1
------------------ Fit summary ------------------
coef std err t P>|t| 2.5 % \
ATT(2004.0,2003,2004) -0.013506 0.022282 -0.606161 0.544408 -0.057178
ATT(2004.0,2003,2005) -0.077125 0.028778 -2.680033 0.007361 -0.133528
ATT(2004.0,2003,2006) -0.132188 0.035710 -3.701704 0.000214 -0.202178
ATT(2004.0,2003,2007) -0.104989 0.033066 -3.175173 0.001497 -0.169797
ATT(2006.0,2003,2004) -0.000609 0.022310 -0.027312 0.978211 -0.044336
ATT(2006.0,2004,2005) -0.005410 0.018285 -0.295867 0.767332 -0.041248
ATT(2006.0,2005,2006) 0.003109 0.020571 0.151158 0.879851 -0.037209
ATT(2006.0,2005,2007) -0.041588 0.019824 -2.097880 0.035916 -0.080442
ATT(2007.0,2003,2004) 0.027959 0.014092 1.983968 0.047259 0.000338
ATT(2007.0,2004,2005) -0.004452 0.015648 -0.284504 0.776024 -0.035121
ATT(2007.0,2005,2006) -0.028741 0.018211 -1.578214 0.114517 -0.064434
ATT(2007.0,2006,2007) -0.027488 0.016248 -1.691824 0.090679 -0.059333
97.5 %
ATT(2004.0,2003,2004) 0.030165
ATT(2004.0,2003,2005) -0.020722
ATT(2004.0,2003,2006) -0.062197
ATT(2004.0,2003,2007) -0.040182
ATT(2006.0,2003,2004) 0.043118
ATT(2006.0,2004,2005) 0.030428
ATT(2006.0,2005,2006) 0.043428
ATT(2006.0,2005,2007) -0.002734
ATT(2007.0,2003,2004) 0.055580
ATT(2007.0,2004,2005) 0.026217
ATT(2007.0,2005,2006) 0.006952
ATT(2007.0,2006,2007) 0.004357
[11]:
es_linear_logistic = dml_obj_linear_logistic.aggregate("eventstudy")
es_linear_logistic.plot_effects(title="Event-Study Aggregated ATTs, Linear and Logistic Regression")
[11]:
(<Figure size 1200x600 with 1 Axes>,
<Axes: title={'center': 'Event-Study Aggregated ATTs, Linear and Logistic Regression'}, ylabel='Effect'>)

[12]:
dml_obj_lasso = DoubleMLDIDMulti(
obj_dml_data=dml_data,
ml_g=LassoCV(),
ml_m=LogisticRegressionCV(),
control_group="never_treated",
n_folds=10
)
dml_obj_lasso.fit()
dml_obj_lasso.bootstrap(n_rep_boot=5000)
dml_obj_lasso.plot_effects(title="Estimated ATTs by Group, LassoCV and LogisticRegressionCV")
[12]:
(<Figure size 1200x800 with 4 Axes>,
[<Axes: title={'center': 'First Treated: 2004.0'}, ylabel='Effect'>,
<Axes: title={'center': 'First Treated: 2006.0'}, ylabel='Effect'>,
<Axes: title={'center': 'First Treated: 2007.0'}, xlabel='Evaluation Period', ylabel='Effect'>])

[13]:
# Model summary
print(dml_obj_lasso)
================== DoubleMLDIDMulti Object ==================
------------------ Data summary ------------------
Outcome variable: lemp
Treatment variable(s): ['first.treat']
Covariates: ['lpop']
Instrument variable(s): None
Time variable: year
Id variable: countyreal
No. Observations: 500
------------------ Score & algorithm ------------------
Score function: observational
Control group: never_treated
Anticipation periods: 0
------------------ Machine learner ------------------
Learner ml_g: LassoCV()
Learner ml_m: LogisticRegressionCV()
Out-of-sample Performance:
Regression:
Learner ml_g0 RMSE: [[0.17242002 0.18142511 0.25935844 0.25958681 0.17338302 0.15220244
0.20145569 0.20566302 0.17235633 0.15194071 0.20103747 0.16455826]]
Learner ml_g1 RMSE: [[0.09911511 0.13069581 0.13900982 0.15099534 0.13918785 0.11262764
0.08873154 0.1102721 0.131316 0.16060588 0.15919193 0.16009149]]
Classification:
Learner ml_m Log Loss: [[0.22914236 0.22913907 0.22913769 0.22913938 0.35596113 0.35595921
0.35596421 0.35595231 0.60886635 0.60885348 0.60885687 0.60885362]]
------------------ Resampling ------------------
No. folds: 10
No. repeated sample splits: 1
------------------ Fit summary ------------------
coef std err t P>|t| 2.5 % \
ATT(2004.0,2003,2004) -0.016264 0.023168 -0.702029 0.482661 -0.061672
ATT(2004.0,2003,2005) -0.077256 0.028982 -2.665626 0.007685 -0.134060
ATT(2004.0,2003,2006) -0.136480 0.035719 -3.820990 0.000133 -0.206487
ATT(2004.0,2003,2007) -0.104836 0.033835 -3.098432 0.001945 -0.171152
ATT(2006.0,2003,2004) -0.001649 0.023339 -0.070667 0.943663 -0.047394
ATT(2006.0,2004,2005) -0.005104 0.019313 -0.264274 0.791569 -0.042956
ATT(2006.0,2005,2006) -0.003129 0.017887 -0.174943 0.861124 -0.038187
ATT(2006.0,2005,2007) -0.041235 0.020320 -2.029240 0.042434 -0.081062
ATT(2007.0,2003,2004) 0.028086 0.015158 1.852874 0.063900 -0.001623
ATT(2007.0,2004,2005) -0.004803 0.016425 -0.292438 0.769951 -0.036996
ATT(2007.0,2005,2006) -0.030093 0.017879 -1.683182 0.092340 -0.065134
ATT(2007.0,2006,2007) -0.028444 0.016791 -1.693985 0.090268 -0.061354
97.5 %
ATT(2004.0,2003,2004) 0.029143
ATT(2004.0,2003,2005) -0.020452
ATT(2004.0,2003,2006) -0.066473
ATT(2004.0,2003,2007) -0.038520
ATT(2006.0,2003,2004) 0.044095
ATT(2006.0,2004,2005) 0.032748
ATT(2006.0,2005,2006) 0.031928
ATT(2006.0,2005,2007) -0.001408
ATT(2007.0,2003,2004) 0.057795
ATT(2007.0,2004,2005) 0.027389
ATT(2007.0,2005,2006) 0.004948
ATT(2007.0,2006,2007) 0.004466
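With both specifications fitted, a side-by-side comparison of the point estimates can be instructive; a sketch using the coef and summary attributes:

# sketch: compare point estimates across the two learner specifications
comparison = pd.DataFrame(
    {
        "linear/logistic": dml_obj_linear_logistic.coef,
        "lasso/logistic CV": dml_obj_lasso.coef,
    },
    index=dml_obj_linear_logistic.summary.index,
)
print(comparison.round(4))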
[14]:
es_lasso = dml_obj_lasso.aggregate("eventstudy")
es_lasso.plot_effects(title="Event-Study Aggregated ATTs, LassoCV and LogisticRegressionCV")
[14]:
(<Figure size 1200x600 with 1 Axes>,
<Axes: title={'center': 'Event-Study Aggregated ATTs, LassoCV and LogisticRegressionCV'}, ylabel='Effect'>)
