2.3.1. doubleml.did.DoubleMLDIDMulti#
- class doubleml.did.DoubleMLDIDMulti(obj_dml_data, ml_g, ml_m=None, gt_combinations='standard', control_group='never_treated', anticipation_periods=0, n_folds=5, n_rep=1, score='observational', panel=True, in_sample_normalization=True, trimming_rule='truncate', trimming_threshold=0.01, ps_processor_config: PSProcessorConfig | None = None, draw_sample_splitting=True, print_periods=False)#
Double machine learning for multi-period difference-in-differences models.
- Parameters:
obj_dml_data (DoubleMLPanelData object) – The DoubleMLPanelData object providing the data and specifying the variables for the causal model.
ml_g (estimator implementing fit() and predict()) – A machine learner implementing fit() and predict() methods (e.g. sklearn.ensemble.RandomForestRegressor) for the nuisance function \(g_0(0,X) = E[Y_{t_{\text{eval}}}-Y_{t_{\text{pre}}}|X, C_{t_{\text{eval}} + \delta} = 1]\). For a binary outcome variable \(Y\) (with values 0 and 1), a classifier implementing fit() and predict_proba() can also be specified.
ml_m (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance function \(m_0(X) = E[D=1|X]\). Only relevant for score='observational'. Default is None.
gt_combinations (array-like or str) – A list of tuples with the group-time combinations to be evaluated. Can be a string with the value 'standard', 'all' or 'universal', which constructs the corresponding combinations automatically. Default is 'standard'.
control_group (str) – Specifies the control group. Either 'never_treated' or 'not_yet_treated'. Default is 'never_treated'.
anticipation_periods (int) – Number of anticipation periods. Default is 0.
n_folds (int) – Number of folds for cross-fitting. Default is 5.
n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
score (str) – A str ('observational' or 'experimental') specifying the score function. The 'experimental' score refers to an A/B setting, where the treatment is independent of the pre-treatment covariates. Default is 'observational'.
panel (bool) – Indicates whether to rely on the panel data structure (True) or repeated cross sections (False). Default is True.
in_sample_normalization (bool) – Indicates whether to use in-sample normalization of weights. Default is True.
trimming_rule (str, optional, deprecated) – (DEPRECATED) A str ('truncate' is the only choice) specifying the trimming approach. Use ps_processor_config instead. Will be removed in a future version. Default is 'truncate'.
trimming_threshold (float, optional, deprecated) – (DEPRECATED) The threshold used for trimming. Use ps_processor_config instead. Will be removed in a future version. Default is 0.01.
ps_processor_config (PSProcessorConfig, optional) – Configuration for propensity score processing (clipping, calibration, etc.).
draw_sample_splitting (bool) – Indicates whether the sample splitting should be drawn during initialization. Default is True.
print_periods (bool) – Indicates whether to print information about the evaluated periods. Default is False.
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.did.datasets import make_did_CS2021
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(42)
>>> df = make_did_CS2021(n_obs=500)
>>> dml_data = dml.data.DoubleMLPanelData(
...     df,
...     y_col="y",
...     d_cols="d",
...     id_col="id",
...     t_col="t",
...     x_cols=["Z1", "Z2", "Z3", "Z4"],
...     datetime_unit="M"
... )
>>> ml_g = RandomForestRegressor(n_estimators=100, max_depth=5)
>>> ml_m = RandomForestClassifier(n_estimators=100, max_depth=5)
>>> dml_did_obj = dml.did.DoubleMLDIDMulti(
...     obj_dml_data=dml_data,
...     ml_g=ml_g,
...     ml_m=ml_m,
...     gt_combinations="standard",
...     control_group="never_treated",
... )
>>> print(dml_did_obj.fit().summary)
                                  coef   std err  ...     2.5 %    97.5 %
ATT(2025-03,2025-01,2025-02) -0.797617  0.459617  ... -1.698450  0.103215
ATT(2025-03,2025-02,2025-03)  0.270311  0.456453  ... -0.624320  1.164941
ATT(2025-03,2025-02,2025-04)  0.628213  0.895275  ... -1.126494  2.382919
ATT(2025-03,2025-02,2025-05)  1.281360  1.327121  ... -1.319750  3.882470
ATT(2025-04,2025-01,2025-02) -0.078095  0.407758  ... -0.877287  0.721097
ATT(2025-04,2025-02,2025-03)  0.223625  0.479288  ... -0.715764  1.163013
ATT(2025-04,2025-03,2025-04)  1.008674  0.455564  ...  0.115785  1.901563
ATT(2025-04,2025-03,2025-05)  2.941047  0.832991  ...  1.308415  4.573679
ATT(2025-05,2025-01,2025-02) -0.102282  0.454129  ... -0.992359  0.787795
ATT(2025-05,2025-02,2025-03)  0.108742  0.547794  ... -0.964914  1.182399
ATT(2025-05,2025-03,2025-04)  0.253610  0.422984  ... -0.575423  1.082643
ATT(2025-05,2025-04,2025-05)  1.264255  0.487934  ...  0.307923  2.220587
[12 rows x 6 columns]
Methods
aggregate([aggregation]) – Aggregates treatment effects.
bootstrap([method, n_rep_boot]) – Multiplier bootstrap for DoubleML models.
confint([joint, level]) – Confidence intervals for DoubleML models.
fit([n_jobs_models, n_jobs_cv, ...]) – Estimate DoubleMLDIDMulti models.
p_adjust([method]) – Multiple testing adjustment for DoubleML models.
plot_effects([level, result_type, joint, ...]) – Plots coefficient estimates with confidence intervals over time, grouped by first treated period.
sensitivity_analysis([cf_y, cf_d, rho, ...]) – Performs a sensitivity analysis to account for unobserved confounders.
sensitivity_benchmark(benchmarking_set[, ...]) – Computes a benchmark for a given set of features.
sensitivity_plot([idx_treatment, value, ...]) – Contour plot of the sensitivity with respect to latent/confounding variables.
tune_ml_models(ml_param_space[, ...]) – Hyperparameter tuning for DoubleML models using Optuna.
Attributes
all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_gt_atts, n_rep)).
all_se – Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_gt_atts, n_rep)).
anticipation_periods – The number of anticipation periods.
boot_method – The method to construct the bootstrap replications.
boot_t_stat – Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap() (shape (n_rep_boot, n_gt_atts, n_rep)).
coef – Estimates for the causal parameter(s) after calling fit() (shape (n_gt_atts,)).
control_group – The control group.
framework – The corresponding doubleml.DoubleMLFramework object.
g_values – The values of the treatment variable.
gt_combinations – The combinations of g and t values.
gt_index – The index of the combinations of g and t values.
gt_labels – The evaluated labels of the treatment effects 'ATT(g, t_pre, t_eval)' and the period.
in_sample_normalization – Indicates whether in-sample normalization of weights is used.
modellist – The list of DoubleMLDIDBinary models.
n_folds – Number of folds.
n_gt_atts – The number of evaluated combinations of the treatment variable and the period.
n_rep – Number of repetitions for the sample splitting.
n_rep_boot – The number of bootstrap replications.
never_treated_value – The value indicating that a unit was never treated.
nuisance_loss – The losses of the nuisance models (root-mean-squared errors or log loss).
panel – Indicates whether to rely on the panel data structure (True) or repeated cross sections (False).
ps_processor – Propensity score processor.
ps_processor_config – Configuration for propensity score processing (clipping, calibration, etc.).
pval – p-values for the causal parameter(s) (shape (n_gt_atts,)).
score – The score function.
se – Standard errors for the causal parameter(s) after calling fit() (shape (n_gt_atts,)).
sensitivity_elements – Values of the sensitivity components after calling fit(); if available (e.g., PLR, IRM), a dictionary with entries sigma2, nu2, psi_sigma2, psi_nu2 and riesz_rep.
sensitivity_params – Values of the sensitivity parameters after calling sensitivity_analysis(); if available (e.g., PLR, IRM), a dictionary with entries theta, se, ci, rv and rva.
sensitivity_summary – Returns a summary for the sensitivity analysis after calling sensitivity_analysis().
summary – A summary for the estimated causal effect after calling fit().
t_stat – t-statistics for the causal parameter(s) after calling fit() (shape (n_gt_atts,)).
t_values – The values of the time periods.
trimming_rule – Specifies the used trimming rule.
trimming_threshold – Specifies the used trimming threshold.
- DoubleMLDIDMulti.aggregate(aggregation='group')#
Aggregates treatment effects.
- Parameters:
aggregation (str or dict) – Method to aggregate treatment effects or dictionary with aggregation weights (masked numpy array). Has to be one of 'group', 'time', 'eventstudy' or a masked numpy array. Default is 'group'.
- Returns:
Aggregated treatment effects framework
- Return type:
DoubleMLFramework
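A minimal usage sketch, assuming the fitted dml_did_obj from the class example above; both aggregation choices shown are among the documented options:
>>> agg_group = dml_did_obj.aggregate(aggregation='group')
>>> agg_es = dml_did_obj.aggregate(aggregation='eventstudy')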
- DoubleMLDIDMulti.bootstrap(method='normal', n_rep_boot=500)#
Multiplier bootstrap for DoubleML models.
- DoubleMLDIDMulti.confint(joint=False, level=0.95)#
Confidence intervals for DoubleML models.
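A minimal sketch combining bootstrap() and confint(), assuming the fitted dml_did_obj from the class example above; joint confidence intervals rely on bootstrap replications, so bootstrap() is called first:
>>> dml_did_obj.bootstrap(method='normal', n_rep_boot=500)
>>> ci = dml_did_obj.confint(joint=True, level=0.95)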
- DoubleMLDIDMulti.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False, external_predictions=None)#
Estimate DoubleMLDIDMulti models.
- Parameters:
n_jobs_models (None or int) – The number of CPUs to use to fit the group-time ATTs. None means 1. Default is None.
n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Does not speed up computation for quantile models. Default is None.
store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.
store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows analyzing the fitted models or extracting information like variable importance. Default is False.
external_predictions (dict or None) – A nested dictionary where the keys correspond to the treatment levels and can contain predictions according to each treatment level. The values have to be dictionaries which can contain keys 'ml_g0', 'ml_g1' and 'ml_m'. Default is None.
- Returns:
self
- Return type:
DoubleMLDIDMulti
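A minimal sketch of a parallelized fit, assuming the dml_did_obj from the class example above:
>>> dml_did_obj.fit(n_jobs_models=2, n_jobs_cv=2, store_models=True)
>>> # fitted nuisance models are now accessible via the models attribute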
- DoubleMLDIDMulti.p_adjust(method='romano-wolf')#
Multiple testing adjustment for DoubleML models.
- Parameters:
method (str) – A str ('romano-wolf', 'bonferroni', 'holm', etc.) specifying the adjustment method. In addition to 'romano-wolf', all methods implemented in statsmodels.stats.multitest.multipletests() can be applied. Default is 'romano-wolf'.
- Returns:
p_val – A data frame with adjusted p-values.
- Return type:
pd.DataFrame
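A minimal sketch, assuming the fitted dml_did_obj from the class example above; the Romano-Wolf adjustment is bootstrap-based, so a prior bootstrap() call is assumed here:
>>> dml_did_obj.bootstrap()
>>> p_val = dml_did_obj.p_adjust(method='romano-wolf')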
- DoubleMLDIDMulti.plot_effects(level=0.95, result_type='effect', joint=True, figsize=(12, 8), color_palette='colorblind', date_format=None, y_label=None, title=None, jitter_value=None, default_jitter=0.1)#
Plots coefficient estimates with confidence intervals over time, grouped by first treated period.
- Parameters:
level (float) – The confidence level for the intervals. Default is 0.95.
result_type (str) – Type of result to plot. Either 'effect' for point estimates, 'rv' for robustness values, 'est_bounds' for estimate bounds, or 'ci_bounds' for confidence interval bounds. Default is 'effect'.
joint (bool) – Indicates whether joint confidence intervals are computed. Default is True.
figsize (tuple) – Figure size as (width, height). Default is (12, 8).
color_palette (str) – Name of the seaborn color palette used to distinguish pre- and post-treatment effects. Default is "colorblind".
date_format (str) – Format string for date ticks if the x-axis contains datetime values. Default is None.
y_label (str) – Label for the y-axis. Default is None.
title (str) – Title for the entire plot. Default is None.
jitter_value (float) – Amount of jitter to apply to points. Default is None.
default_jitter (float) – Default amount of jitter to apply to points. Default is 0.1.
- Returns:
fig (matplotlib.figure.Figure) – The created figure object
axes (list) – List of matplotlib axis objects for further customization
Notes
If joint=True and bootstrapping hasn’t been performed, this method will automatically perform bootstrapping with default parameters and issue a warning.
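A minimal plotting sketch, assuming the fitted dml_did_obj from the class example above; the output file name is hypothetical:
>>> fig, axes = dml_did_obj.plot_effects(level=0.95, title="Group-time ATTs")
>>> fig.savefig("did_effects.png")  # hypothetical output path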
- DoubleMLDIDMulti.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0, level=0.95, null_hypothesis=0.0)#
Performs a sensitivity analysis to account for unobserved confounders. The evaluated scenario is stored as a dictionary in the property sensitivity_params.
- Parameters:
cf_y (float) – Percentage of the residual variation of the outcome explained by latent/confounding variables. Default is 0.03.
cf_d (float) – Percentage gains in the variation of the Riesz representer generated by latent/confounding variables. Default is 0.03.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float or numpy.ndarray) – Null hypothesis for the effect. Determines the robustness values. If it is a single float, the same null hypothesis is used for all estimated parameters. Otherwise the array has to be of shape (n_coefs,). Default is 0.0.
- Returns:
self
- Return type:
DoubleMLDIDMulti
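A minimal sketch, assuming the fitted dml_did_obj from the class example above:
>>> dml_did_obj.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0)
>>> print(dml_did_obj.sensitivity_summary)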
- DoubleMLDIDMulti.sensitivity_benchmark(benchmarking_set, fit_args=None)#
Computes a benchmark for a given set of features. Returns a DataFrame containing the corresponding values for cf_y, cf_d, rho and the change in estimates.
- Returns:
benchmark_results – Benchmark results.
- Return type:
pandas.DataFrame
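A minimal sketch, assuming the fitted dml_did_obj from the class example above; "Z2" is one of the covariates generated by make_did_CS2021:
>>> bench = dml_did_obj.sensitivity_benchmark(benchmarking_set=["Z2"])
>>> print(bench)  # DataFrame with cf_y, cf_d, rho and the change in estimates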
- DoubleMLDIDMulti.sensitivity_plot(idx_treatment=0, value='theta', rho=1.0, level=0.95, null_hypothesis=0.0, include_scenario=True, benchmarks=None, fill=True, grid_bounds=(0.15, 0.15), grid_size=100)#
Contour plot of the sensitivity with respect to latent/confounding variables.
- Parameters:
idx_treatment (int) – Index of the treatment to perform the sensitivity analysis. Default is 0.
value (str) – Determines which contours to plot. Valid values are 'theta' (refers to the bounds) and 'ci' (refers to the bounds including statistical uncertainty). Default is 'theta'.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float) – Null hypothesis for the effect. Determines the direction of the contour lines. Default is 0.0.
include_scenario (bool) – Indicates whether to highlight the scenario from the call of sensitivity_analysis(). Default is True.
benchmarks (dict or None) – Dictionary of benchmarks to be included in the plot. The keys are cf_y, cf_d and name. Default is None.
fill (bool) – Indicates whether to use a heatmap style or only contour lines. Default is True.
grid_bounds (tuple) – Determines the evaluation bounds of the grid for cf_d and cf_y. Has to contain two floats in [0, 1). Default is (0.15, 0.15).
grid_size (int) – Determines the number of evaluation points of the grid. Default is 100.
- Returns:
fig – Plotly figure of the sensitivity contours.
- Return type:
plotly.graph_objects.Figure
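A minimal sketch, assuming sensitivity_analysis() has already been called on the fitted dml_did_obj from the class example above:
>>> fig = dml_did_obj.sensitivity_plot(idx_treatment=0, value='theta')
>>> fig.show()  # renders the interactive Plotly contour plot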
- DoubleMLDIDMulti.tune_ml_models(ml_param_space, scoring_methods=None, cv=5, set_as_params=True, return_tune_res=False, optuna_settings=None)#
Hyperparameter tuning for DoubleML models using Optuna.
The hyperparameter tuning is performed using Optuna’s Bayesian optimization. Unlike grid/randomized search, Optuna tuning is performed once on the whole dataset using cross-validation, and the same optimal hyperparameters are used for all folds.
- Parameters:
ml_param_space (dict) – A dict with a parameter grid function for each nuisance model (see attribute params_names) or for each learner (see attribute learner_names). Mixed specifications are allowed, i.e., some nuisance models can share the same learner. For mixed specifications, learner-specific settings will be overwritten by nuisance model-specific settings. Each parameter grid must be specified as a callable function that takes an Optuna trial and returns a dictionary of hyperparameters.
For PLR models, keys should be: 'ml_l', 'ml_m' (and optionally 'ml_g' for the IV-type score). For IRM models, keys should be: 'ml_g0', 'ml_g1' (or just 'ml_g' for both), 'ml_m'.
Example:
def ml_l_params(trial):
    return {
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 100, 500, step=50),
        'num_leaves': trial.suggest_int('num_leaves', 20, 256),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
    }

ml_param_space = {'ml_l': ml_l_params, 'ml_m': ml_m_params}
Note: Optuna tuning is performed globally (not fold-specific) to ensure consistent hyperparameters across all folds.
scoring_methods (None or dict) – The scoring method used to evaluate the predictions. The scoring method must be set per nuisance model via a dict (see attribute params_names for the keys). If None, the estimator’s score method is used. Default is None.
cv (int, cross-validation splitter, or iterable of (train_indices, test_indices)) – Cross-validation strategy used for Optuna-based tuning. If an integer is provided, a shuffled sklearn.model_selection.KFold with the specified number of splits and random_state=42 is used. Custom splitters must implement split (and ideally get_n_splits), or be an iterable yielding (train_indices, test_indices) pairs. Default is 5.
set_as_params (bool) – Indicates whether the hyperparameters should be set in order to be used when fit() is called. Default is True.
return_tune_res (bool) – Indicates whether detailed tuning results should be returned. Default is False.
optuna_settings (None or dict) – Optional configuration passed to the Optuna tuner. Supports global settings as well as learner-specific overrides (using the keys from ml_param_space). The dictionary can contain entries corresponding to Optuna’s study and optimize configuration, such as:
n_trials (int): Number of optimization trials (default: 100)
timeout (float): Time limit in seconds for the study (default: None)
direction (str): Optimization direction, 'maximize' or 'minimize'. For sklearn scorers, use 'maximize' for negative metrics like 'neg_mean_squared_error' (since -0.1 > -0.2 means better performance). Can be set globally or per learner. (default: 'maximize')
sampler (optuna.samplers.BaseSampler): Optuna sampler instance (default: None, uses TPE)
callbacks (list): List of callback functions (default: None)
show_progress_bar (bool): Show progress bar during optimization (default: False)
n_jobs_optuna (int): Number of parallel trials (default: None)
verbosity (int): Optuna logging verbosity level (default: None)
study (optuna.study.Study): Pre-created study instance (default: None)
study_kwargs (dict): Additional kwargs for study creation (default: {})
optimize_kwargs (dict): Additional kwargs for study.optimize() (default: {})
To set direction per learner (similar to scoring_methods):
optuna_settings = {
    'n_trials': 50,
    'direction': 'maximize',  # Global default
    'ml_g0': {'direction': 'maximize'},  # Per-learner override
    'ml_m': {'n_trials': 100, 'direction': 'maximize'}
}
Defaults to None.
- Returns:
self (object) – Returned if return_tune_res is False.
tune_res (list) – A list containing detailed tuning results and the proposed hyperparameters. Returned if return_tune_res is True.
Examples
>>> import numpy as np
>>> from doubleml import DoubleMLData, DoubleMLPLR
>>> from doubleml.plm.datasets import make_plr_CCDDHNR2018
>>> from lightgbm import LGBMRegressor
>>> import optuna
>>> # Generate data
>>> np.random.seed(42)
>>> data = make_plr_CCDDHNR2018(n_obs=500, dim_x=20, return_type='DataFrame')
>>> dml_data = DoubleMLData(data, 'y', 'd')
>>> # Initialize model
>>> dml_plr = DoubleMLPLR(
...     dml_data,
...     LGBMRegressor(n_estimators=50, verbose=-1, random_state=42),
...     LGBMRegressor(n_estimators=50, verbose=-1, random_state=42)
... )
>>> # Define parameter grid functions
>>> def ml_l_params(trial):
...     return {
...         'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
...     }
>>> def ml_m_params(trial):
...     return {
...         'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
...     }
>>> ml_param_space = {'ml_l': ml_l_params, 'ml_m': ml_m_params}
>>> # Tune with TPE sampler
>>> optuna_settings = {
...     'n_trials': 5,
...     'sampler': optuna.samplers.TPESampler(seed=42),
... }
>>> tune_res = dml_plr.tune_ml_models(ml_param_space, optuna_settings=optuna_settings, return_tune_res=True)
>>> print(tune_res[0]['ml_l'].best_params)
{'learning_rate': 0.03907122389107094}
>>> # Fit and get results
>>> dml_plr.fit().summary
       coef   std err          t         P>|t|     2.5 %    97.5 %
d   0.57436  0.045206  12.705519  5.510257e-37  0.485759  0.662961
>>> # Example with scoring methods and directions
>>> scoring_methods = {
...     'ml_l': 'neg_mean_squared_error',  # Negative metric
...     'ml_m': 'neg_mean_squared_error'
... }
>>> optuna_settings = {
...     'n_trials': 50,
...     'direction': 'maximize',  # Maximize negative MSE (minimize MSE)
...     'sampler': optuna.samplers.TPESampler(seed=42),
... }
>>> tune_res = dml_plr.tune_ml_models(ml_param_space, scoring_methods=scoring_methods,
...                                   optuna_settings=optuna_settings, return_tune_res=True)
>>> print(tune_res[0]['ml_l'].best_params)
{'learning_rate': 0.04300012336462904}
>>> dml_plr.fit().summary
        coef   std err          t         P>|t|     2.5 %    97.5 %
d   0.574796  0.045062  12.755721  2.896820e-37  0.486476  0.663115