API reference#

Double machine learning data class#

DoubleMLData(data, y_col, d_cols[, x_cols, ...])

Double machine learning data-backend.
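To illustrate what a data-backend does with `y_col`, `d_cols`, and the optional `x_cols`, here is a minimal sketch of role assignment on plain column names. `RoleSpec` and `resolve_roles` are hypothetical names used only for this illustration and are not part of the package. The fallback for a missing `x_cols` (treat every column without another role as a covariate) is an assumption about the default behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class RoleSpec:
    """Hypothetical container for column roles (not the DoubleMLData class)."""
    y_col: str
    d_cols: List[str]
    x_cols: List[str]


def resolve_roles(columns: Sequence[str], y_col: str,
                  d_cols: Union[str, Sequence[str]],
                  x_cols: Optional[Sequence[str]] = None) -> RoleSpec:
    # Accept a single treatment name or a list of names.
    d_list = [d_cols] if isinstance(d_cols, str) else list(d_cols)
    missing = [c for c in [y_col, *d_list] if c not in columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    if x_cols is None:
        # Assumed default: every column without another role is a covariate.
        x_list = [c for c in columns if c != y_col and c not in d_list]
    else:
        x_list = list(x_cols)
    # Each column may carry only one role.
    overlap = set(x_list) & ({y_col} | set(d_list))
    if overlap:
        raise ValueError(f"columns with more than one role: {sorted(overlap)}")
    return RoleSpec(y_col=y_col, d_cols=d_list, x_cols=x_list)


roles = resolve_roles(["y", "d", "X1", "X2"], y_col="y", d_cols="d")
print(roles.x_cols)
```

Keeping the roles in one object means every model receives the same, already validated view of the data.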

DoubleMLClusterData(data, y_col, d_cols, ...)

Double machine learning data-backend for data with cluster variables.

Double machine learning models#

DoubleMLPLR(obj_dml_data, ml_l, ml_m[, ...])

Double machine learning for partially linear regression models
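The idea behind the partially linear regression model can be sketched without the package. The example below assumes that `ml_l` targets E[Y|X] and `ml_m` targets E[D|X], and it replaces both learners with simple one-variable least-squares fits. The helper names `ols_fit` and `plr_partialling_out`, the simulated data, and the fold assignment are all hypothetical choices for this illustration, not the library's implementation.

```python
import random


def ols_fit(x, t):
    """Fit t ~ a + b*x by least squares and return (a, b)."""
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxt = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t))
    b = sxt / sxx
    return mt - b * mx, b


def plr_partialling_out(y, d, x, n_folds=2):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + e."""
    n = len(y)
    res_y, res_d = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        x_tr = [x[i] for i in train]
        # Nuisance fits on the training fold only: l(x) = E[y|x], m(x) = E[d|x].
        a_l, b_l = ols_fit(x_tr, [y[i] for i in train])
        a_m, b_m = ols_fit(x_tr, [d[i] for i in train])
        # Residuals are computed on the held-out fold.
        for i in test:
            res_y[i] = y[i] - (a_l + b_l * x[i])
            res_d[i] = d[i] - (a_m + b_m * x[i])
    # Regress outcome residuals on treatment residuals.
    return sum(u * v for u, v in zip(res_d, res_y)) / sum(v * v for v in res_d)


# Simulated data with a true effect of 0.5 (illustrative values only).
rng = random.Random(42)
n_obs, theta_true = 2000, 0.5
x = [rng.gauss(0, 1) for _ in range(n_obs)]
d = [xi + rng.gauss(0, 1) for xi in x]
y = [theta_true * di + 2.0 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = plr_partialling_out(y, d, x)
print(round(theta_hat, 3))
```

Fitting the nuisance functions on one fold and computing residuals on the other is what keeps the learners' overfitting out of the final estimate.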

DoubleMLPLIV(obj_dml_data, ml_l, ml_m, ml_r)

Double machine learning for partially linear IV regression models

DoubleMLIRM(obj_dml_data, ml_g, ml_m[, ...])

Double machine learning for interactive regression models
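For a binary treatment, the average treatment effect in an interactive regression model is commonly estimated from a doubly robust score. The sketch below evaluates that score for given nuisance predictions. It assumes that `ml_g` supplies the outcome predictions `g0_hat` and `g1_hat` and that `ml_m` supplies the propensity `m_hat`. The function name `irm_ate` and the toy numbers are hypothetical.

```python
def irm_ate(y, d, g0_hat, g1_hat, m_hat):
    """Average the doubly robust score over observations to get the ATE."""
    psi_b = []
    for yi, di, g0, g1, m in zip(y, d, g0_hat, g1_hat, m_hat):
        # Outcome-model difference plus inverse-propensity-weighted residuals.
        psi_b.append(g1 - g0
                     + di * (yi - g1) / m
                     - (1 - di) * (yi - g0) / (1 - m))
    return sum(psi_b) / len(psi_b)


# Toy inputs: two treated and two control units with constant predictions.
y = [3.0, 1.0, 4.0, 2.0]
d = [1, 0, 1, 0]
g0_hat = [1.5] * 4   # assumed predictions of E[Y | D=0, X]
g1_hat = [3.5] * 4   # assumed predictions of E[Y | D=1, X]
m_hat = [0.5] * 4    # assumed predictions of P(D=1 | X)

ate = irm_ate(y, d, g0_hat, g1_hat, m_hat)
print(ate)
```

In this toy case the result matches the difference between the treated and control means, because the predictions equal the group means and the propensity is constant.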

DoubleMLAPO(obj_dml_data, ml_g, ml_m, ...[, ...])

Double machine learning average potential outcomes for interactive regression models.

DoubleMLAPOS(obj_dml_data, ml_g, ml_m, ...)

Double machine learning for interactive regression models with multiple discrete treatments.

DoubleMLIIVM(obj_dml_data, ml_g, ml_m, ml_r)

Double machine learning for interactive IV regression models

DoubleMLDID(obj_dml_data, ml_g[, ml_m, ...])

Double machine learning for difference-in-differences models with panel data (two time periods).

DoubleMLDIDCS(obj_dml_data, ml_g[, ml_m, ...])

Double machine learning for difference-in-differences models with repeated cross-sections.

DoubleMLSSM(obj_dml_data, ml_g, ml_pi, ml_m)

Double machine learning for sample selection models

DoubleMLPQ(obj_dml_data, ml_g, ml_m[, ...])

Double machine learning for potential quantiles

DoubleMLLPQ(obj_dml_data, ml_g, ml_m[, ...])

Double machine learning for local potential quantiles

DoubleMLCVAR(obj_dml_data, ml_g, ml_m[, ...])

Double machine learning for conditional value at risk for potential outcomes

DoubleMLQTE(obj_dml_data, ml_g[, ml_m, ...])

Double machine learning for quantile treatment effects

Other models#

rdd.RDFlex(obj_dml_data, ml_g[, ml_m, ...])

Flexible adjustment with double machine learning for regression discontinuity designs

Datasets module#

Dataset loaders#

datasets.fetch_401K([return_type, ...])

Data set on financial wealth and 401(k) plan participation.

datasets.fetch_bonus([return_type, ...])

Data set on the Pennsylvania Reemployment Bonus experiment.

Dataset generators#

datasets.make_plr_CCDDHNR2018([n_obs, ...])

Generates data from a partially linear regression model used in Chernozhukov et al. (2018) for Figure 1.

datasets.make_pliv_CHS2015(n_obs[, alpha, ...])

Generates data from a partially linear IV regression model used in Chernozhukov, Hansen and Spindler (2015).

datasets.make_irm_data([n_obs, dim_x, ...])

Generates data from an interactive regression (IRM) model.

datasets.make_iivm_data([n_obs, dim_x, ...])

Generates data from an interactive IV regression (IIVM) model.

datasets.make_plr_turrell2018([n_obs, ...])

Generates data from a partially linear regression model used in a blog article by Turrell (2018).

datasets.make_pliv_multiway_cluster_CKMS2021([...])

Generates data from a partially linear IV regression model with multiway cluster sample used in Chiang et al. (2021).

datasets.make_did_SZ2020([n_obs, dgp_type, ...])

Generates data from a difference-in-differences model used in Sant'Anna and Zhao (2020).

datasets.make_ssm_data([n_obs, dim_x, ...])

Generates data from a sample selection model (SSM).

datasets.make_confounded_plr_data([n_obs, ...])

Generates confounded data from a partially linear regression model.

datasets.make_confounded_irm_data([n_obs, ...])

Generates confounded data from an interactive regression model.

datasets.make_heterogeneous_data([n_obs, p, ...])

Creates a simple synthetic example for heterogeneous treatment effects.

datasets.make_irm_data_discrete_treatments([...])

Generates data from an interactive regression (IRM) model with multiple treatment levels (based on an underlying continuous treatment).

rdd.datasets.make_simple_rdd_data([n_obs, ...])

Generates synthetic data for a regression discontinuity design (RDD) analysis.

Utility classes and functions#

Utility classes#

utils.DMLDummyRegressor()

A dummy regressor that raises an AttributeError when attempting to access its fit, predict, or set_params methods.

utils.DMLDummyClassifier()

A dummy classifier that raises an AttributeError when attempting to access its fit, predict, set_params, or predict_proba methods.

utils.DoubleMLBLP(orth_signal, basis[, is_gate])

Best linear predictor (BLP) for DoubleML with orthogonal signals.
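A best linear predictor projects the orthogonal signal onto a basis by least squares. As a minimal sketch, assume that `is_gate` indicates a basis of indicator columns for disjoint groups. In that special case the least-squares coefficients reduce to per-group means of the signal. The function name `gate_from_signal` and the use of group labels instead of a basis matrix are hypothetical simplifications.

```python
from collections import defaultdict


def gate_from_signal(orth_signal, groups):
    """Group means of the signal, i.e. OLS on disjoint group indicators."""
    sums, counts = defaultdict(float), defaultdict(int)
    for s, g in zip(orth_signal, groups):
        sums[g] += s
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


# Toy orthogonal signal and group labels (illustrative values only).
orth_signal = [1.0, 3.0, 2.0, 6.0, 4.0]
groups = ["a", "a", "b", "b", "b"]

gates = gate_from_signal(orth_signal, groups)
print(gates)
```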

utils.DoubleMLPolicyTree(orth_signal, features)

Policy Tree fitting for DoubleML.

utils.GlobalRegressor(base_estimator)

A global regressor that ignores the attribute sample_weight when being fit to ensure a global fit.
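The behaviour described here can be sketched with a thin wrapper that accepts `sample_weight` and drops it before delegating to the wrapped estimator. `IgnoreWeights` and `MeanRegressor` are hypothetical classes written for this illustration. They are not the library's implementation.

```python
class MeanRegressor:
    """Toy base estimator: predicts the (optionally weighted) mean of y."""

    def fit(self, X, y, sample_weight=None):
        w = sample_weight if sample_weight is not None else [1.0] * len(y)
        self.mean_ = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class IgnoreWeights:
    """Hypothetical wrapper that always fits on the full, unweighted sample."""

    def __init__(self, base_estimator):
        self.base_estimator = base_estimator

    def fit(self, X, y, sample_weight=None):
        # sample_weight is accepted for interface compatibility, then dropped.
        self.base_estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.base_estimator.predict(X)


X = [[0.0], [1.0], [2.0]]
y = [1.0, 2.0, 6.0]
weights = [0.0, 0.0, 1.0]  # weights that would localize the fit to the last point

local_pred = MeanRegressor().fit(X, y, sample_weight=weights).predict(X)[0]
global_pred = IgnoreWeights(MeanRegressor()).fit(X, y, sample_weight=weights).predict(X)[0]
print(local_pred, global_pred)
```

The weighted fit follows only the last observation, while the wrapped fit returns the mean of all three.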

utils.GlobalClassifier(base_estimator)

A global classifier that ignores the attribute sample_weight when being fit to ensure a global fit.

Utility functions#

utils.gain_statistics(dml_long, dml_short)

Compute gain statistics as benchmark values for sensitivity parameters cf_d and cf_y.

Score mixin classes for double machine learning models#

double_ml_score_mixins.LinearScoreMixin()

Mixin class implementing DML estimation for score functions being linear in the target parameter
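When the score is linear in the target parameter, psi = psi_a * theta + psi_b, setting its sample mean to zero has a closed-form solution. The sketch below shows that step on toy residuals, using the partialling-out form psi_a = -res_d^2 and psi_b = res_d * res_y as an assumed example. The function name `solve_linear_score` is hypothetical.

```python
def solve_linear_score(psi_a, psi_b):
    """Solve mean(psi_a * theta + psi_b) = 0 for theta."""
    return -sum(psi_b) / sum(psi_a)


# Toy residuals (illustrative values only).
res_d = [1.0, -1.0, 2.0, -2.0]
res_y = [0.5, -0.5, 1.0, -1.0]

psi_a = [-v * v for v in res_d]
psi_b = [v * u for v, u in zip(res_d, res_y)]

theta_hat = solve_linear_score(psi_a, psi_b)
# The summed score vanishes at the solution.
score_sum = sum(a * theta_hat + b for a, b in zip(psi_a, psi_b))
print(theta_hat, score_sum)
```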

double_ml_score_mixins.NonLinearScoreMixin()

Mixin class implementing DML estimation for score functions being nonlinear in the target parameter
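When the score is nonlinear in the target parameter, no closed form is available and the zero of the mean score has to be found numerically. The sketch below uses plain bisection on a simplified, unweighted quantile score. Both the solver and the score are stand-ins chosen for illustration, and the names `solve_score_bisection` and `quantile_score` are hypothetical.

```python
def solve_score_bisection(mean_score, lo, hi, tol=1e-8):
    """Find where a nondecreasing mean score crosses zero inside [lo, hi]."""
    if mean_score(lo) > 0 or mean_score(hi) < 0:
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_score(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


# Toy outcome sample and quantile level (illustrative values only).
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
tau = 0.5


def quantile_score(theta):
    # Share of observations at or below theta, minus the quantile level.
    return sum(1.0 if yi <= theta else 0.0 for yi in y) / len(y) - tau


theta_hat = solve_score_bisection(quantile_score, lo=0.0, hi=10.0)
print(round(theta_hat, 6))
```

Because the indicator-based score is a step function, the solver returns the point where the score changes sign, not an exact root.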