2.3.1. doubleml.did.DoubleMLDIDMulti#
- class doubleml.did.DoubleMLDIDMulti(obj_dml_data, ml_g, ml_m=None, gt_combinations='standard', control_group='never_treated', anticipation_periods=0, n_folds=5, n_rep=1, score='observational', panel=True, in_sample_normalization=True, trimming_rule='truncate', trimming_threshold=0.01, ps_processor_config: PSProcessorConfig | None = None, draw_sample_splitting=True, print_periods=False)#
Double machine learning for multi-period difference-in-differences models.
- Parameters:
obj_dml_data (DoubleMLPanelData object) – The DoubleMLPanelData object providing the data and specifying the variables for the causal model.
ml_g (estimator implementing fit() and predict()) – A machine learner implementing fit() and predict() methods (e.g. sklearn.ensemble.RandomForestRegressor) for the nuisance function \(g_0(0,X) = E[Y_{t_{\text{eval}}}-Y_{t_{\text{pre}}}|X, C_{t_{\text{eval}} + \delta} = 1]\). For a binary outcome variable \(Y\) (with values 0 and 1), a classifier implementing fit() and predict_proba() can also be specified.
ml_m (classifier implementing fit() and predict_proba()) – A machine learner implementing fit() and predict_proba() methods (e.g. sklearn.ensemble.RandomForestClassifier) for the nuisance function \(m_0(X) = E[D=1|X]\). Only relevant for score='observational'. Default is None.
gt_combinations (array-like or str) – A list of tuples with the group-time combinations to be evaluated. Can be a string with the value 'standard', 'all' or 'universal', which constructs the corresponding combinations automatically. Default is 'standard'.
control_group (str) – Specifies the control group. Either 'never_treated' or 'not_yet_treated'. Default is 'never_treated'.
anticipation_periods (int) – Number of anticipation periods. Default is 0.
n_folds (int) – Number of folds for cross-fitting. Default is 5.
n_rep (int) – Number of repetitions for the sample splitting. Default is 1.
score (str) – A str ('observational' or 'experimental') specifying the score function. The 'experimental' score refers to an A/B setting, where the treatment is independent of the pre-treatment covariates. Default is 'observational'.
panel (bool) – Indicates whether to rely on the panel data structure (True) or repeated cross sections (False). Default is True.
in_sample_normalization (bool) – Indicates whether to use in-sample normalization of weights. Default is True.
trimming_rule (str, optional, deprecated) – (DEPRECATED) A str ('truncate' is the only choice) specifying the trimming approach. Use ps_processor_config instead. Will be removed in a future version. Default is 'truncate'.
trimming_threshold (float, optional, deprecated) – (DEPRECATED) The threshold used for trimming. Use ps_processor_config instead. Will be removed in a future version. Default is 0.01.
ps_processor_config (PSProcessorConfig, optional) – Configuration for propensity score processing (clipping, calibration, etc.).
draw_sample_splitting (bool) – Indicates whether the sample splitting should be drawn during initialization. Default is True.
print_periods (bool) – Indicates whether to print information about the evaluated periods. Default is False.
Examples
>>> import numpy as np
>>> import doubleml as dml
>>> from doubleml.did.datasets import make_did_CS2021
>>> from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
>>> np.random.seed(42)
>>> df = make_did_CS2021(n_obs=500)
>>> dml_data = dml.data.DoubleMLPanelData(
...     df,
...     y_col="y",
...     d_cols="d",
...     id_col="id",
...     t_col="t",
...     x_cols=["Z1", "Z2", "Z3", "Z4"],
...     datetime_unit="M"
... )
>>> ml_g = RandomForestRegressor(n_estimators=100, max_depth=5)
>>> ml_m = RandomForestClassifier(n_estimators=100, max_depth=5)
>>> dml_did_obj = dml.did.DoubleMLDIDMulti(
...     obj_dml_data=dml_data,
...     ml_g=ml_g,
...     ml_m=ml_m,
...     gt_combinations="standard",
...     control_group="never_treated",
... )
>>> print(dml_did_obj.fit().summary)
                                  coef   std err  ...     2.5 %    97.5 %
ATT(2025-03,2025-01,2025-02) -0.797617  0.459617  ... -1.698450  0.103215
ATT(2025-03,2025-02,2025-03)  0.270311  0.456453  ... -0.624320  1.164941
ATT(2025-03,2025-02,2025-04)  0.628213  0.895275  ... -1.126494  2.382919
ATT(2025-03,2025-02,2025-05)  1.281360  1.327121  ... -1.319750  3.882470
ATT(2025-04,2025-01,2025-02) -0.078095  0.407758  ... -0.877287  0.721097
ATT(2025-04,2025-02,2025-03)  0.223625  0.479288  ... -0.715764  1.163013
ATT(2025-04,2025-03,2025-04)  1.008674  0.455564  ...  0.115785  1.901563
ATT(2025-04,2025-03,2025-05)  2.941047  0.832991  ...  1.308415  4.573679
ATT(2025-05,2025-01,2025-02) -0.102282  0.454129  ... -0.992359  0.787795
ATT(2025-05,2025-02,2025-03)  0.108742  0.547794  ... -0.964914  1.182399
ATT(2025-05,2025-03,2025-04)  0.253610  0.422984  ... -0.575423  1.082643
ATT(2025-05,2025-04,2025-05)  1.264255  0.487934  ...  0.307923  2.220587
[12 rows x 6 columns]
Methods
aggregate([aggregation]) – Aggregates treatment effects.
bootstrap([method, n_rep_boot]) – Multiplier bootstrap for DoubleML models.
confint([joint, level]) – Confidence intervals for DoubleML models.
fit([n_jobs_models, n_jobs_cv, ...]) – Estimate DoubleMLDIDMulti models.
p_adjust([method]) – Multiple testing adjustment for DoubleML models.
plot_effects([level, result_type, joint, ...]) – Plots coefficient estimates with confidence intervals over time, grouped by first treated period.
sensitivity_analysis([cf_y, cf_d, rho, ...]) – Performs a sensitivity analysis to account for unobserved confounders.
sensitivity_benchmark(benchmarking_set[, ...]) – Computes a benchmark for a given set of features.
sensitivity_plot([idx_treatment, value, ...]) – Contour plot of the sensitivity with respect to latent/confounding variables.
tune_ml_models(ml_param_space[, ...]) – Hyperparameter tuning for DoubleML models using Optuna.
Attributes
all_coef – Estimates of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_gt_atts, n_rep)).
all_se – Standard errors of the causal parameter(s) for the n_rep different sample splits after calling fit() (shape (n_gt_atts, n_rep)).
anticipation_periods – The number of anticipation periods.
boot_method – The method to construct the bootstrap replications.
boot_t_stat – Bootstrapped t-statistics for the causal parameter(s) after calling fit() and bootstrap() (shape (n_rep_boot, n_gt_atts, n_rep)).
coef – Estimates for the causal parameter(s) after calling fit() (shape (n_gt_atts,)).
control_group – The control group.
framework – The corresponding doubleml.DoubleMLFramework object.
g_values – The values of the treatment variable.
gt_combinations – The combinations of g and t values.
gt_index – The index of the combinations of g and t values.
gt_labels – The evaluated labels of the treatment effects 'ATT(g, t_pre, t_eval)' and the period.
in_sample_normalization – Indicates whether in-sample normalization of weights is used.
modellist – The list of DoubleMLDIDBinary models.
n_folds – Number of folds.
n_gt_atts – The number of evaluated combinations of the treatment variable and the period.
n_rep – Number of repetitions for the sample splitting.
n_rep_boot – The number of bootstrap replications.
never_treated_value – The value indicating that a unit was never treated.
nuisance_loss – The losses of the nuisance models (root-mean-squared errors or log loss).
panel – Indicates whether to rely on the panel data structure (True) or repeated cross sections (False).
ps_processor – Propensity score processor.
ps_processor_config – Configuration for propensity score processing (clipping, calibration, etc.).
pval – p-values for the causal parameter(s) (shape (n_gt_atts,)).
score – The score function.
se – Standard errors for the causal parameter(s) after calling fit() (shape (n_gt_atts,)).
sensitivity_elements – Values of the sensitivity components after calling fit(); if available (e.g., PLR, IRM), a dictionary with entries sigma2, nu2, psi_sigma2, psi_nu2 and riesz_rep.
sensitivity_params – Values of the sensitivity parameters after calling sensitivity_analysis(); if available (e.g., PLR, IRM), a dictionary with entries theta, se, ci, rv and rva.
sensitivity_summary – Returns a summary for the sensitivity analysis after calling sensitivity_analysis().
summary – A summary for the estimated causal effect after calling fit().
t_stat – t-statistics for the causal parameter(s) after calling fit() (shape (n_gt_atts,)).
t_values – The values of the time periods.
trimming_rule – Specifies the used trimming rule.
trimming_threshold – Specifies the used trimming threshold.
- DoubleMLDIDMulti.aggregate(aggregation='group')#
Aggregates treatment effects.
- Parameters:
aggregation (str or dict) – Method to aggregate treatment effects or dictionary with aggregation weights (masked numpy array). Has to be one of 'group', 'time', 'eventstudy' or a masked numpy array. Default is 'group'.
- Returns:
Aggregated treatment effects framework
- Return type:
DoubleMLFramework
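A minimal usage sketch, assuming the fitted dml_did_obj from the class example above; both aggregation choices shown are among the documented options:
>>> agg_group = dml_did_obj.aggregate(aggregation='group')
>>> agg_es = dml_did_obj.aggregate(aggregation='eventstudy')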
- DoubleMLDIDMulti.bootstrap(method='normal', n_rep_boot=500)#
Multiplier bootstrap for DoubleML models.
- DoubleMLDIDMulti.confint(joint=False, level=0.95)#
Confidence intervals for DoubleML models.
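A minimal sketch combining bootstrap() and confint(), assuming the fitted dml_did_obj from the class example above; joint confidence intervals rely on bootstrap replications, so bootstrap() is called first:
>>> dml_did_obj.bootstrap(method='normal', n_rep_boot=500)
>>> ci = dml_did_obj.confint(joint=True, level=0.95)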
- DoubleMLDIDMulti.fit(n_jobs_models=None, n_jobs_cv=None, store_predictions=True, store_models=False, external_predictions=None)#
Estimate DoubleMLDIDMulti models.
- Parameters:
n_jobs_models (None or int) – The number of CPUs to use to fit the group-time ATTs. None means 1. Default is None.
n_jobs_cv (None or int) – The number of CPUs to use to fit the learners. None means 1. Does not speed up computation for quantile models. Default is None.
store_predictions (bool) – Indicates whether the predictions for the nuisance functions should be stored in predictions. Default is True.
store_models (bool) – Indicates whether the fitted models for the nuisance functions should be stored in models. This allows analyzing the fitted models or extracting information like variable importance. Default is False.
external_predictions (dict or None) – A nested dictionary where the keys correspond to the treatment levels and can contain predictions according to each treatment level. The values have to be dictionaries which can contain keys 'ml_g0', 'ml_g1' and 'ml_m'. Default is None.
- Returns:
self
- Return type:
DoubleMLDIDMulti
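A minimal sketch of a parallelized fit, assuming the dml_did_obj from the class example above:
>>> dml_did_obj.fit(n_jobs_models=2, n_jobs_cv=2, store_models=True)
>>> # fitted nuisance models are now accessible via the models attribute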
- DoubleMLDIDMulti.p_adjust(method='romano-wolf')#
Multiple testing adjustment for DoubleML models.
- Parameters:
method (str) – A str ('romano-wolf', 'bonferroni', 'holm', etc.) specifying the adjustment method. In addition to 'romano-wolf', all methods implemented in statsmodels.stats.multitest.multipletests() can be applied. Default is 'romano-wolf'.
- Returns:
p_val – A data frame with adjusted p-values.
- Return type:
pd.DataFrame
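A minimal sketch, assuming the fitted dml_did_obj from the class example above; the Romano-Wolf adjustment is bootstrap-based, so a prior bootstrap() call is assumed here:
>>> dml_did_obj.bootstrap()
>>> p_val = dml_did_obj.p_adjust(method='romano-wolf')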
- DoubleMLDIDMulti.plot_effects(level=0.95, result_type='effect', joint=True, figsize=(12, 8), color_palette='colorblind', date_format=None, y_label=None, title=None, jitter_value=None, default_jitter=0.1)#
Plots coefficient estimates with confidence intervals over time, grouped by first treated period.
- Parameters:
level (float) – The confidence level for the intervals. Default is 0.95.
result_type (str) – Type of result to plot. Either 'effect' for point estimates, 'rv' for robustness values, 'est_bounds' for estimate bounds, or 'ci_bounds' for confidence interval bounds. Default is 'effect'.
joint (bool) – Indicates whether joint confidence intervals are computed. Default is True.
figsize (tuple) – Figure size as (width, height). Default is (12, 8).
color_palette (str) – Name of the seaborn color palette used to distinguish pre- and post-treatment effects. Default is "colorblind".
date_format (str) – Format string for date ticks if the x-axis contains datetime values. Default is None.
y_label (str) – Label for the y-axis. Default is None.
title (str) – Title for the entire plot. Default is None.
jitter_value (float) – Amount of jitter to apply to points. Default is None.
default_jitter (float) – Default amount of jitter to apply to points. Default is 0.1.
- Returns:
fig (matplotlib.figure.Figure) – The created figure object
axes (list) – List of matplotlib axis objects for further customization
Notes
If joint=True and bootstrapping hasn’t been performed, this method will automatically perform bootstrapping with default parameters and issue a warning.
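A minimal plotting sketch, assuming the fitted dml_did_obj from the class example above; the output file name is hypothetical:
>>> fig, axes = dml_did_obj.plot_effects(level=0.95, title="Group-time ATTs")
>>> fig.savefig("did_effects.png")  # hypothetical output path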
- DoubleMLDIDMulti.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0, level=0.95, null_hypothesis=0.0)#
Performs a sensitivity analysis to account for unobserved confounders. The evaluated scenario is stored as a dictionary in the property sensitivity_params.
- Parameters:
cf_y (float) – Percentage of the residual variation of the outcome explained by latent/confounding variables. Default is 0.03.
cf_d (float) – Percentage gains in the variation of the Riesz representer generated by latent/confounding variables. Default is 0.03.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float or numpy.ndarray) – Null hypothesis for the effect. Determines the robustness values. If it is a single float, the same null hypothesis is used for all estimated parameters. Otherwise the array has to be of shape (n_coefs,). Default is 0.0.
- Returns:
self
- Return type:
DoubleMLDIDMulti
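A minimal sketch, assuming the fitted dml_did_obj from the class example above:
>>> dml_did_obj.sensitivity_analysis(cf_y=0.03, cf_d=0.03, rho=1.0)
>>> print(dml_did_obj.sensitivity_summary)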
- DoubleMLDIDMulti.sensitivity_benchmark(benchmarking_set, fit_args=None)#
Computes a benchmark for a given set of features. Returns a DataFrame containing the corresponding values for cf_y, cf_d, rho and the change in estimates.
- Returns:
benchmark_results – Benchmark results.
- Return type:
pandas.DataFrame
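A minimal sketch, assuming the fitted dml_did_obj from the class example above; "Z2" is one of the covariates generated by make_did_CS2021:
>>> bench = dml_did_obj.sensitivity_benchmark(benchmarking_set=["Z2"])
>>> print(bench)  # DataFrame with cf_y, cf_d, rho and the change in estimates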
- DoubleMLDIDMulti.sensitivity_plot(idx_treatment=0, value='theta', rho=1.0, level=0.95, null_hypothesis=0.0, include_scenario=True, benchmarks=None, fill=True, grid_bounds=(0.15, 0.15), grid_size=100)#
Contour plot of the sensitivity with respect to latent/confounding variables.
- Parameters:
idx_treatment (int) – Index of the treatment to perform the sensitivity analysis. Default is 0.
value (str) – Determines which contours to plot. Valid values are 'theta' (refers to the bounds) and 'ci' (refers to the bounds including statistical uncertainty). Default is 'theta'.
rho (float) – The correlation between the differences in short and long representations in the main regression and Riesz representer. Has to be in [-1,1]. The absolute value determines the adversarial strength of the confounding (maximal at 1.0). Default is 1.0.
level (float) – The confidence level. Default is 0.95.
null_hypothesis (float) – Null hypothesis for the effect. Determines the direction of the contour lines. Default is 0.0.
include_scenario (bool) – Indicates whether to highlight the scenario from the call of sensitivity_analysis(). Default is True.
benchmarks (dict or None) – Dictionary of benchmarks to be included in the plot. The keys are cf_y, cf_d and name. Default is None.
fill (bool) – Indicates whether to use a heatmap style or only contour lines. Default is True.
grid_bounds (tuple) – Determines the evaluation bounds of the grid for cf_d and cf_y. Has to contain two floats in [0, 1). Default is (0.15, 0.15).
grid_size (int) – Determines the number of evaluation points of the grid. Default is 100.
- Returns:
fig – Plotly figure of the sensitivity contours.
- Return type:
plotly.graph_objects.Figure
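A minimal sketch, assuming sensitivity_analysis() has already been called on the fitted dml_did_obj from the class example above:
>>> fig = dml_did_obj.sensitivity_plot(idx_treatment=0, value='theta')
>>> fig.show()  # renders the interactive Plotly contour plot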
- DoubleMLDIDMulti.tune_ml_models(ml_param_space, scoring_methods=None, cv=5, set_as_params=True, return_tune_res=False, optuna_settings=None)#
Hyperparameter tuning for DoubleML models using Optuna.
The hyperparameter tuning is performed using Optuna’s Bayesian optimization. Unlike grid/randomized search, Optuna tuning is performed once on the whole dataset using cross-validation, and the same optimal hyperparameters are used for all folds.
- Parameters:
ml_param_space (dict) – A dict with a parameter grid function for each nuisance model (see attribute params_names) or for each learner (see attribute learner_names). Mixed specifications are allowed, i.e., some nuisance models can share the same learner. For mixed specifications, learner-specific settings will be overwritten by nuisance model-specific settings. Each parameter grid must be specified as a callable function that takes an Optuna trial and returns a dictionary of hyperparameters.
For PLR models, keys should be: 'ml_l', 'ml_m' (and optionally 'ml_g' for the IV-type score). For IRM models, keys should be: 'ml_g0', 'ml_g1' (or just 'ml_g' for both), 'ml_m'.
Example:
def ml_l_params(trial):
    return {
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 100, 500, step=50),
        'num_leaves': trial.suggest_int('num_leaves', 20, 256),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
    }

ml_param_space = {'ml_l': ml_l_params, 'ml_m': ml_m_params}
Note: Optuna tuning is performed globally (not fold-specific) to ensure consistent hyperparameters across all folds.
scoring_methods (None or dict) – The scoring method used to evaluate the predictions. The scoring method must be set per nuisance model via a dict (see attribute params_names for the keys). If None, the estimator’s score method is used. Default is None.
cv (int, cross-validation splitter, or iterable of (train_indices, test_indices)) – Cross-validation strategy used for Optuna-based tuning. If an integer is provided, a shuffled sklearn.model_selection.KFold with the specified number of splits and random_state=42 is used. Custom splitters must implement split (and ideally get_n_splits), or be an iterable yielding (train_indices, test_indices) pairs. Default is 5.
set_as_params (bool) – Indicates whether the hyperparameters should be set in order to be used when fit() is called. Default is True.
return_tune_res (bool) – Indicates whether detailed tuning results should be returned. Default is False.
optuna_settings (None or dict) – Optional configuration passed to the Optuna tuner. Supports global settings as well as learner-specific overrides (using the keys from ml_param_space). The dictionary can contain entries corresponding to Optuna’s study and optimize configuration, such as:
n_trials (int): Number of optimization trials (default: 100)
timeout (float): Time limit in seconds for the study (default: None)
direction (str): Optimization direction, 'maximize' or 'minimize'. For sklearn scorers, use 'maximize' for negative metrics like 'neg_mean_squared_error' (since -0.1 > -0.2 means better performance). Can be set globally or per learner. (default: 'maximize')
sampler (optuna.samplers.BaseSampler): Optuna sampler instance (default: None, uses TPE)
callbacks (list): List of callback functions (default: None)
show_progress_bar (bool): Show progress bar during optimization (default: False)
n_jobs_optuna (int): Number of parallel trials (default: None)
verbosity (int): Optuna logging verbosity level (default: None)
study (optuna.study.Study): Pre-created study instance (default: None)
study_kwargs (dict): Additional kwargs for study creation (default: {})
optimize_kwargs (dict): Additional kwargs for study.optimize() (default: {})
To set direction per learner (similar to scoring_methods):
optuna_settings = {
    'n_trials': 50,
    'direction': 'maximize',  # Global default
    'ml_g0': {'direction': 'maximize'},  # Per-learner override
    'ml_m': {'n_trials': 100, 'direction': 'maximize'}
}
Defaults to None.
- Returns:
self (object) – Returned if return_tune_res is False.
tune_res (list) – A list containing detailed tuning results and the proposed hyperparameters. Returned if return_tune_res is True.
Examples
>>> import numpy as np
>>> from doubleml import DoubleMLData, DoubleMLPLR
>>> from doubleml.plm.datasets import make_plr_CCDDHNR2018
>>> from lightgbm import LGBMRegressor
>>> import optuna
>>> # Generate data
>>> np.random.seed(42)
>>> data = make_plr_CCDDHNR2018(n_obs=500, dim_x=20, return_type='DataFrame')
>>> dml_data = DoubleMLData(data, 'y', 'd')
>>> # Initialize model
>>> dml_plr = DoubleMLPLR(
...     dml_data,
...     LGBMRegressor(n_estimators=50, verbose=-1, random_state=42),
...     LGBMRegressor(n_estimators=50, verbose=-1, random_state=42)
... )
>>> # Define parameter grid functions
>>> def ml_l_params(trial):
...     return {
...         'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
...     }
>>> def ml_m_params(trial):
...     return {
...         'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
...     }
>>> ml_param_space = {'ml_l': ml_l_params, 'ml_m': ml_m_params}
>>> # Tune with TPE sampler
>>> optuna_settings = {
...     'n_trials': 5,
...     'sampler': optuna.samplers.TPESampler(seed=42),
... }
>>> tune_res = dml_plr.tune_ml_models(ml_param_space, optuna_settings=optuna_settings, return_tune_res=True)
>>> print(tune_res[0]['ml_l'].best_params)
{'learning_rate': 0.03907122389107094}
>>> # Fit and get results
>>> dml_plr.fit().summary
       coef   std err          t         P>|t|     2.5 %    97.5 %
d   0.57436  0.045206  12.705519  5.510257e-37  0.485759  0.662961
>>> # Example with scoring methods and directions
>>> scoring_methods = {
...     'ml_l': 'neg_mean_squared_error',  # Negative metric
...     'ml_m': 'neg_mean_squared_error'
... }
>>> optuna_settings = {
...     'n_trials': 50,
...     'direction': 'maximize',  # Maximize negative MSE (minimize MSE)
...     'sampler': optuna.samplers.TPESampler(seed=42),
... }
>>> tune_res = dml_plr.tune_ml_models(ml_param_space, scoring_methods=scoring_methods,
...                                   optuna_settings=optuna_settings, return_tune_res=True)
>>> print(tune_res[0]['ml_l'].best_params)
{'learning_rate': 0.04300012336462904}
>>> dml_plr.fit().summary
        coef   std err          t         P>|t|     2.5 %    97.5 %
d   0.574796  0.045062  12.755721  2.896820e-37  0.486476  0.663115