{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DoubleML meets FLAML - How to tune learners automatically within `DoubleML`\n", "\n", "Recent advances in automated machine learning make it easier to tune hyperparameters of ML estimators automatically. These optimized learners can be used for the estimation part within DoubleML. In this notebook, we explore how to tune learners with AutoML for the DoubleML framework.\n", "\n", "This notebook will use [`FLAML`](https://github.com/microsoft/FLAML), but there are also many other AutoML frameworks. Particularly useful for DoubleML are packages that provide some way to export the model in `sklearn` style.\n", "\n", "Examples are: [`TPOT`](https://epistasislab.github.io/tpot/), [`autosklearn`](https://automl.github.io/auto-sklearn/master/), [`H2O`](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html) or [`Gama`](https://openml-labs.github.io/gama/master/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Generation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create synthetic data using the [``make_plr_CCDDHNR2018()``](https://docs.doubleml.org/stable/api/generated/doubleml.datasets.make_plr_CCDDHNR2018.html) process, with $1000$ observations of $50$ covariates as well as $1$ treatment variable and an outcome. We calibrate the process such that hyperparameter tuning becomes more important." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr><th></th><th>X1</th><th>X2</th><th>X3</th><th>X4</th><th>X5</th><th>X6</th><th>X7</th><th>X8</th><th>X9</th><th>X10</th><th>...</th><th>X43</th><th>X44</th><th>X45</th><th>X46</th><th>X47</th><th>X48</th><th>X49</th><th>X50</th><th>y</th><th>d</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>1.065368</td><td>1.162593</td><td>1.089964</td><td>0.824657</td><td>0.157733</td><td>-1.228404</td><td>-0.675775</td><td>-0.223928</td><td>0.166238</td><td>0.124480</td><td>...</td><td>-2.021823</td><td>-1.662975</td><td>-2.100385</td><td>-1.225670</td><td>-1.223158</td><td>0.397536</td><td>-0.450031</td><td>0.511257</td><td>0.845534</td><td>-0.784792</td></tr>\n",
"    <tr><th>1</th><td>0.214458</td><td>1.699616</td><td>3.222882</td><td>3.550242</td><td>2.692460</td><td>1.821970</td><td>1.223617</td><td>-0.100154</td><td>-0.234431</td><td>0.375844</td><td>...</td><td>-0.695711</td><td>-0.819507</td><td>-1.465424</td><td>-0.341472</td><td>-0.023537</td><td>0.436016</td><td>-0.503374</td><td>-1.342632</td><td>1.987307</td><td>0.835035</td></tr>\n",
"    <tr><th>2</th><td>0.725820</td><td>-0.310145</td><td>-0.586921</td><td>-0.879058</td><td>0.239267</td><td>0.638461</td><td>0.131024</td><td>0.459436</td><td>-1.140081</td><td>-0.583692</td><td>...</td><td>-0.002388</td><td>0.716801</td><td>0.075942</td><td>1.439958</td><td>0.674747</td><td>-0.268343</td><td>0.682122</td><td>0.978303</td><td>0.154890</td><td>-0.168089</td></tr>\n",
"    <tr><th>3</th><td>0.265744</td><td>0.479655</td><td>0.013313</td><td>1.417736</td><td>0.908767</td><td>1.786090</td><td>0.996892</td><td>-0.026822</td><td>-0.867201</td><td>0.433753</td><td>...</td><td>-0.482616</td><td>-0.172628</td><td>-0.309539</td><td>-0.609522</td><td>-0.830263</td><td>-0.883953</td><td>-1.249986</td><td>-2.688641</td><td>1.254035</td><td>0.161288</td></tr>\n",
"    <tr><th>4</th><td>1.581827</td><td>0.926901</td><td>2.302382</td><td>0.803112</td><td>-0.152896</td><td>-0.389164</td><td>-0.569590</td><td>-0.124306</td><td>0.055439</td><td>-0.383531</td><td>...</td><td>0.048220</td><td>-0.698751</td><td>-0.754678</td><td>-0.689600</td><td>0.726658</td><td>0.780068</td><td>1.475517</td><td>0.777718</td><td>1.773769</td><td>1.786563</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>5 rows × 52 columns</p>\n", "</div>
" ], "text/plain": [ " X1 X2 X3 X4 X5 X6 X7 \\\n", "0 1.065368 1.162593 1.089964 0.824657 0.157733 -1.228404 -0.675775 \n", "1 0.214458 1.699616 3.222882 3.550242 2.692460 1.821970 1.223617 \n", "2 0.725820 -0.310145 -0.586921 -0.879058 0.239267 0.638461 0.131024 \n", "3 0.265744 0.479655 0.013313 1.417736 0.908767 1.786090 0.996892 \n", "4 1.581827 0.926901 2.302382 0.803112 -0.152896 -0.389164 -0.569590 \n", "\n", " X8 X9 X10 ... X43 X44 X45 X46 \\\n", "0 -0.223928 0.166238 0.124480 ... -2.021823 -1.662975 -2.100385 -1.225670 \n", "1 -0.100154 -0.234431 0.375844 ... -0.695711 -0.819507 -1.465424 -0.341472 \n", "2 0.459436 -1.140081 -0.583692 ... -0.002388 0.716801 0.075942 1.439958 \n", "3 -0.026822 -0.867201 0.433753 ... -0.482616 -0.172628 -0.309539 -0.609522 \n", "4 -0.124306 0.055439 -0.383531 ... 0.048220 -0.698751 -0.754678 -0.689600 \n", "\n", " X47 X48 X49 X50 y d \n", "0 -1.223158 0.397536 -0.450031 0.511257 0.845534 -0.784792 \n", "1 -0.023537 0.436016 -0.503374 -1.342632 1.987307 0.835035 \n", "2 0.674747 -0.268343 0.682122 0.978303 0.154890 -0.168089 \n", "3 -0.830263 -0.883953 -1.249986 -2.688641 1.254035 0.161288 \n", "4 0.726658 0.780068 1.475517 0.777718 1.773769 1.786563 \n", "\n", "[5 rows x 52 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "from doubleml.datasets import make_plr_CCDDHNR2018\n", "import doubleml as dml\n", "from flaml import AutoML\n", "from xgboost import XGBRegressor\n", "\n", "# Generate synthetic data\n", "data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=1000, dim_x=50, return_type=\"DataFrame\", a0=0, a1=1, s1=0.25, s2=0.25)\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuning on the full Sample\n", "\n", "In this section, we manually tune two [XGBoost](https://xgboost.readthedocs.io/en/stable/) models using FLAML for a [Partially Linear Regression 
Model](https://docs.doubleml.org/stable/guide/models.html#partially-linear-regression-model-plr). In the PLR (using the default score) we have to estimate a nuisance parameter $\\eta$ consisting of\n", "\n", "$$\\eta := \\{m_0(x), \\ell_0(x)\\} = \\{\\mathbb{E}[D|X], \\mathbb{E}[Y|X]\\}.$$\n", "\n", "We initialize two `FLAML` AutoML objects and fit them accordingly. Once the tuning has been completed, we pass the learners to `DoubleML`.\n", "\n", "#### Step 1: Initialize and Train the AutoML Models\n", "\n", "*Note: This cell will optimize the nuisance models for two minutes each, i.e. 4 minutes in total.*" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Initialize AutoML for outcome model (ml_l): Predict Y based on X\n", "automl_l = AutoML()\n", "settings_l = {\n", "    \"time_budget\": 120,\n", "    \"metric\": 'rmse',\n", "    \"estimator_list\": ['xgboost'],\n", "    \"task\": 'regression',\n", "}\n", "automl_l.fit(X_train=data.drop(columns=[\"y\", \"d\"]).values, y_train=data[\"y\"].values, verbose=2, **settings_l)\n", "\n", "# Initialize AutoML for treatment model (ml_m): Predict D based on X\n", "automl_m = AutoML()\n", "settings_m = {\n", "    \"time_budget\": 120,\n", "    \"metric\": 'rmse',\n", "    \"estimator_list\": ['xgboost'],\n", "    \"task\": 'regression',\n", "}\n", "automl_m.fit(X_train=data.drop(columns=[\"y\", \"d\"]).values, y_train=data[\"d\"].values, verbose=2, **settings_m)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 2: Evaluate the Tuned Models\n", "\n", "`FLAML` reports the best loss during training as the `best_loss` attribute. For more details, we refer to the [FLAML documentation](https://microsoft.github.io/FLAML/docs/Getting-Started)." 
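, "\n", "\n", "With `metric` set to `'rmse'`, the value in `best_loss` is a root mean squared error on data held out during tuning. As a reminder of what that number measures, here is a minimal, library-free sketch. The helper name `rmse` and the toy numbers are illustrative assumptions and not part of `FLAML`.

```python
import math


def rmse(y_true, y_pred):
    # Root mean squared error between two equally long, non-empty sequences.
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy hold-out targets and predictions (made-up numbers).
y_holdout = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 2.0]
print(rmse(y_holdout, y_hat))
```

A lower value means predictions that are closer to the held-out targets, which is the quantity the tuner minimizes in this notebook."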
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best RMSE during tuning (ml_m): 1.0078540263583833\n", "Best RMSE during tuning (ml_l): 1.1155142425200442\n" ] } ], "source": [ "rmse_oos_ml_m = automl_m.best_loss\n", "rmse_oos_ml_l = automl_l.best_loss\n", "print(\"Best RMSE during tuning (ml_m):\",rmse_oos_ml_m)\n", "print(\"Best RMSE during tuning (ml_l):\",rmse_oos_ml_l)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 3: Create and Fit DoubleML Model\n", "\n", "We create a `DoubleMLData` object with the dataset, specifying $y$ as the outcome variable and $d$ as the treatment variable. We then initialize a `DoubleMLPLR` model using the tuned `FLAML` estimators for both the treatment and outcome components. `DoubleML` will use copies with identical configurations on each fold." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " coef std err t P>|t| 2.5 % 97.5 %\n", "d 0.498286 0.032738 15.220407 2.589147e-52 0.434121 0.562452\n" ] } ], "source": [ "obj_dml_data = dml.DoubleMLData(data, \"y\", \"d\")\n", "\n", "obj_dml_plr_fullsample = dml.DoubleMLPLR(obj_dml_data, ml_m=automl_m.model.estimator,\n", " ml_l=automl_l.model.estimator)\n", "\n", "print(obj_dml_plr_fullsample.fit().summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`DoubleML`'s built-in learner evaluation reports the out-of-sample error during cross-fitting. We can compare this measure to the best loss during training from above." 
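, "\n", "\n", "To make the role of the two nuisance learners concrete: with the default (partialling-out) score, the PLR coefficient is the slope of a regression of the outcome residuals on the treatment residuals. The sketch below assumes that out-of-fold nuisance predictions are already available. The function name `plr_partialling_out` and the toy data are illustrative assumptions, not `DoubleML` API.

```python
def plr_partialling_out(y, d, l_hat, m_hat):
    # Residualize outcome and treatment with the nuisance predictions.
    u = [yi - li for yi, li in zip(y, l_hat)]  # y - l_hat(x)
    v = [di - mi for di, mi in zip(d, m_hat)]  # d - m_hat(x)
    # Slope of a no-intercept regression of u on v.
    return sum(ui * vi for ui, vi in zip(u, v)) / sum(vi * vi for vi in v)


# Toy data constructed so that the true effect is 0.5.
m_hat = [0.0, 1.0, -1.0, 2.0]    # stands in for E[D|X]
v_true = [1.0, -1.0, 2.0, -2.0]  # treatment noise
d = [m + v for m, v in zip(m_hat, v_true)]
g = [0.3, -0.2, 0.1, 0.4]        # part of y driven by x only
l_hat = [gi + 0.5 * mi for gi, mi in zip(g, m_hat)]  # stands in for E[Y|X]
y = [li + 0.5 * vi for li, vi in zip(l_hat, v_true)]

theta_hat = plr_partialling_out(y, d, l_hat, m_hat)
print(theta_hat)
```

The better the two learners approximate the conditional means, the cleaner these residuals are, which is why tuning the nuisance learners matters for the final estimate."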
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RMSE evaluated by DoubleML (ml_m): 1.0156853566737638\n", "RMSE evaluated by DoubleML (ml_l): 1.1309844442144665\n" ] } ], "source": [ "rmse_dml_ml_l_fullsample = obj_dml_plr_fullsample.evaluate_learners()['ml_l'][0][0]\n", "rmse_dml_ml_m_fullsample = obj_dml_plr_fullsample.evaluate_learners()['ml_m'][0][0]\n", "\n", "print(\"RMSE evaluated by DoubleML (ml_m):\", rmse_dml_ml_m_fullsample)\n", "print(\"RMSE evaluated by DoubleML (ml_l):\", rmse_dml_ml_l_fullsample)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The best RMSE during automated tuning and the out-of-sample error in nuisance prediction are similar, which hints that there is no overfitting. We don't expect large amounts of overfitting, since FLAML uses cross-validation internally and reports the best loss on a hold-out sample." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuning on the Folds\n", "\n", "Instead of externally tuning the `FLAML` learners, it is also possible to tune the AutoML learners internally. We have to define custom classes for integrating `FLAML` into `DoubleML`. The tuning is started automatically when calling `DoubleML`'s `fit()` method. Training will occur $K$ times, once per fold, so each fold gets its own set of tuned hyperparameters.\n", "\n", "#### Step 1: Custom API for FLAML Models within `DoubleML`\n", "\n", "The following API is designed to facilitate automated machine learning model tuning for both regression and classification tasks. In this example, however, we only need the regressor API, as the treatment is continuous." 
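, "\n", "\n", "Before the full wrapper classes, a stripped-down sketch of the parameter contract may help. It assumes that copies are made in the `sklearn` style, i.e. by calling the constructor with the output of `get_params()`, which is why the wrappers need to report their own constructor arguments there. `MeanRegressor` and `clone_like` are hypothetical names used only for illustration.

```python
class MeanRegressor:
    # Hypothetical toy learner: predicts the training mean times a scale.
    def __init__(self, scale=1.0):
        self.scale = scale

    def get_params(self, deep=True):
        # Must return everything the constructor needs.
        return {'scale': self.scale}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.scale * self.mean_ for _ in X]


def clone_like(estimator):
    # Simplified stand-in for an sklearn-style clone: rebuild from parameters.
    return type(estimator)(**estimator.get_params())


template = MeanRegressor(scale=2.0)
fold_copy = clone_like(template).fit([[0], [1]], [1.0, 3.0])
print(fold_copy.predict([[5]]))    # prediction of the fitted copy
print(hasattr(template, 'mean_'))  # whether the template itself was fitted
```

The template stays unfitted while each copy is trained separately, which is the behaviour the on-the-folds tuning relies on."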
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from sklearn.utils.multiclass import unique_labels\n", "\n", "# sklearn-style wrappers: FLAML tunes inside fit(), predictions come from the tuned model\n", "class FlamlRegressorDoubleML:\n", "    _estimator_type = 'regressor'\n", "\n", "    def __init__(self, time, estimator_list, metric, *args, **kwargs):\n", "        self.auto_ml = AutoML(*args, **kwargs)\n", "        self.time = time\n", "        self.estimator_list = estimator_list\n", "        self.metric = metric\n", "\n", "    def set_params(self, **params):\n", "        self.auto_ml.set_params(**params)\n", "        return self\n", "\n", "    def get_params(self, deep=True):\n", "        # Report the wrapper's own constructor arguments as well\n", "        params = self.auto_ml.get_params(deep)\n", "        params[\"time\"] = self.time\n", "        params[\"estimator_list\"] = self.estimator_list\n", "        params[\"metric\"] = self.metric\n", "        return params\n", "\n", "    def fit(self, X, y):\n", "        self.auto_ml.fit(X, y, task=\"regression\", time_budget=self.time, estimator_list=self.estimator_list, metric=self.metric, verbose=False)\n", "        self.tuned_model = self.auto_ml.model.estimator\n", "        return self\n", "\n", "    def predict(self, x):\n", "        preds = self.tuned_model.predict(x)\n", "        return preds\n", "\n", "\n", "class FlamlClassifierDoubleML:\n", "    _estimator_type = 'classifier'\n", "\n", "    def __init__(self, time, estimator_list, metric, *args, **kwargs):\n", "        self.auto_ml = AutoML(*args, **kwargs)\n", "        self.time = time\n", "        self.estimator_list = estimator_list\n", "        self.metric = metric\n", "\n", "    def set_params(self, **params):\n", "        self.auto_ml.set_params(**params)\n", "        return self\n", "\n", "    def get_params(self, deep=True):\n", "        # Report the wrapper's own constructor arguments as well\n", "        params = self.auto_ml.get_params(deep)\n", "        params[\"time\"] = self.time\n", "        params[\"estimator_list\"] = self.estimator_list\n", "        params[\"metric\"] = self.metric\n", "        return params\n", "\n", "    def fit(self, X, y):\n", "        self.classes_ = unique_labels(y)\n", "        self.auto_ml.fit(X, y, task=\"classification\", time_budget=self.time, estimator_list=self.estimator_list, metric=self.metric, verbose=False)\n", "        self.tuned_model = 
self.auto_ml.model.estimator\n", "        return self\n", "\n", "    def predict_proba(self, x):\n", "        preds = self.tuned_model.predict_proba(x)\n", "        return preds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 2: Using the API when calling `DoubleML`'s `.fit()` Method\n", "\n", "We initialize a `FlamlRegressorDoubleML` and pass it, unfitted, to the DoubleML object. When calling `.fit()` on the DoubleML object, copies of the API object are created on the folds and a separate set of hyperparameters is tuned on each of them. Since we fit $K$ times, we reduce the time budget per fit accordingly (here from 120 to 24 seconds for $K=5$ folds) to ensure comparability to the full sample case." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " coef std err t P>|t| 2.5 % 97.5 %\n", "d 0.502016 0.033265 15.091263 1.848688e-51 0.436817 0.567215\n" ] } ], "source": [ "# Define the FlamlRegressorDoubleML\n", "ml_l = FlamlRegressorDoubleML(time=24, estimator_list=['xgboost'], metric='rmse')\n", "ml_m = FlamlRegressorDoubleML(time=24, estimator_list=['xgboost'], metric='rmse')\n", "\n", "# Create DoubleMLPLR object using the new regressors (keywords avoid mixing up ml_l and ml_m)\n", "dml_plr_obj_onfolds = dml.DoubleMLPLR(obj_dml_data, ml_l=ml_l, ml_m=ml_m)\n", "\n", "# Fit the DoubleMLPLR model\n", "print(dml_plr_obj_onfolds.fit(store_models=True).summary)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best RMSE during tuning (ml_m): 1.0068101213851626\n", "Best RMSE during tuning (ml_l): 1.1151610541568202\n", "RMSE evaluated by DoubleML (ml_m): 1.0084871742256079\n", "RMSE evaluated by DoubleML (ml_l): 1.1272404618426184\n" ] } ], "source": [ "rmse_oos_onfolds_ml_l = np.mean([dml_plr_obj_onfolds.models[\"ml_l\"][\"d\"][0][i].auto_ml.best_loss for i in range(5)])\n", "rmse_oos_onfolds_ml_m = np.mean([dml_plr_obj_onfolds.models[\"ml_m\"][\"d\"][0][i].auto_ml.best_loss for i in range(5)])\n", 
"print(\"Best RMSE during tuning (ml_m):\",rmse_oos_onfolds_ml_m)\n", "print(\"Best RMSE during tuning (ml_l):\",rmse_oos_onfolds_ml_l)\n", "\n", "rmse_dml_ml_l_onfolds = dml_plr_obj_onfolds.evaluate_learners()['ml_l'][0][0]\n", "rmse_dml_ml_m_onfolds = dml_plr_obj_onfolds.evaluate_learners()['ml_m'][0][0]\n", "\n", "print(\"RMSE evaluated by DoubleML (ml_m):\", rmse_dml_ml_m_onfolds)\n", "print(\"RMSE evaluated by DoubleML (ml_l):\", rmse_dml_ml_l_onfolds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similar to the above case, we see no hints for overfitting." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparison to AutoML with Less Computation Time and Untuned XGBoost Learners\n", "\n", "#### AutoML with Less Computation Time\n", "\n", "As a baseline, we can compare the learners above, which were tuned using two minutes of training time each, with ones that use only ten seconds.\n", "\n", "Note: These tuning times are examples. For this setting, we found 10s to be insufficient and 120s to be sufficient. In general, the necessary tuning time can depend on data complexity, data set size, computational power of the machine used, etc. For more information on how to use ``FLAML`` properly, please refer to [the documentation](https://microsoft.github.io/FLAML/docs/Getting-Started/) and [the paper](https://arxiv.org/pdf/1911.04706)." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Initialize AutoML for outcome model similar to above, but use a smaller time budget.\n", "automl_l_lesstime = AutoML()\n", "settings_l = {\n", " \"time_budget\": 10,\n", " \"metric\": 'rmse',\n", " \"estimator_list\": ['xgboost'],\n", " \"task\": 'regression',\n", "}\n", "automl_l_lesstime.fit(X_train=data.drop(columns=[\"y\", \"d\"]).values, y_train=data[\"y\"].values, verbose=2, **settings_l)\n", "\n", "# Initialize AutoML for treatment model similar to above, but use a smaller time budget.\n", "automl_m_lesstime = AutoML()\n", "settings_m = {\n", " \"time_budget\": 10,\n", " \"metric\": 'rmse',\n", " \"estimator_list\": ['xgboost'],\n", " \"task\": 'regression',\n", "}\n", "automl_m_lesstime.fit(X_train=data.drop(columns=[\"y\", \"d\"]).values, y_train=data[\"d\"].values, verbose=2, **settings_m)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " coef std err t P>|t| 2.5 % 97.5 %\n", "d 0.436394 0.031007 14.073929 5.493102e-45 0.375621 0.497168\n" ] } ], "source": [ "obj_dml_plr_lesstime = dml.DoubleMLPLR(obj_dml_data, ml_m=automl_m_lesstime.model.estimator,\n", " ml_l=automl_l_lesstime.model.estimator)\n", "\n", "print(obj_dml_plr_lesstime.fit().summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check the performance again." 
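, "\n", "\n", "One simple way to summarize such a check is the relative gap between the loss reported during tuning and the RMSE from cross-fitting. The helper `relative_gap` below is an illustrative assumption and not part of `FLAML` or `DoubleML`. The example values are the full-sample `ml_m` results printed earlier in this notebook.

```python
def relative_gap(tuning_rmse, crossfit_rmse):
    # Positive values mean the cross-fitted error is larger than the tuning loss.
    return (crossfit_rmse - tuning_rmse) / tuning_rmse


# Full-sample ml_m values copied from the outputs above.
gap_fullsample_m = relative_gap(1.0078540263583833, 1.0156853566737638)
print(round(100 * gap_fullsample_m, 2), 'percent')
```

The same function can be applied to the values printed by the next cell to compare the short time budget against the full-sample case."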
] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best RMSE during tuning (ml_m): 0.9158080176561963\n", "Best RMSE during tuning (ml_l): 1.2197237644227434\n", "RMSE evaluated by DoubleML (ml_m): 1.0739130271918385\n", "RMSE evaluated by DoubleML (ml_l): 1.1362430723104844\n" ] } ], "source": [ "rmse_dml_ml_l_lesstime = obj_dml_plr_lesstime.evaluate_learners()['ml_l'][0][0]\n", "rmse_dml_ml_m_lesstime = obj_dml_plr_lesstime.evaluate_learners()['ml_m'][0][0]\n", "\n", "\n", "print(\"Best RMSE during tuning (ml_m):\", automl_m_lesstime.best_loss)\n", "print(\"Best RMSE during tuning (ml_l):\", automl_l_lesstime.best_loss)\n", "print(\"RMSE evaluated by DoubleML (ml_m):\", rmse_dml_ml_m_lesstime)\n", "print(\"RMSE evaluated by DoubleML (ml_l):\", rmse_dml_ml_l_lesstime)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The gap between the RMSE reported by AutoML during tuning and the out-of-sample RMSE evaluated by DoubleML is now larger than before. This could hint that the learner underfits, i.e. that the tuning time was not sufficient." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Untuned (default parameter) XGBoost\n", "\n", "As another baseline, we set up DoubleML with an XGBoost learner that has not been tuned at all, i.e. using the default set of hyperparameters." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "xgb_untuned_m, xgb_untuned_l = XGBRegressor(), XGBRegressor()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " coef std err t P>|t| 2.5 % 97.5 %\n", "d 0.431253 0.03258 13.236884 5.373218e-40 0.367398 0.495108\n" ] } ], "source": [ "# Create DoubleMLPLR object using the untuned XGBoost models\n", "dml_plr_obj_untuned = dml.DoubleMLPLR(obj_dml_data, ml_l=xgb_untuned_l, ml_m=xgb_untuned_m)\n", "print(dml_plr_obj_untuned.fit().summary)\n", "\n", "rmse_dml_ml_l_untuned = dml_plr_obj_untuned.evaluate_learners()['ml_l'][0][0]\n", "rmse_dml_ml_m_untuned = dml_plr_obj_untuned.evaluate_learners()['ml_m'][0][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparison and summary\n", "\n", "We combine the summaries from the four models: AutoML tuned on the full sample, AutoML tuned on the folds, untuned XGBoost, and AutoML with the smaller time budget." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr><th></th><th></th><th>coef</th><th>std err</th><th>t</th><th>P&gt;|t|</th><th>2.5 %</th><th>97.5 %</th></tr>\n",
"    <tr><th>Model Type</th><th>Metric</th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>Full Sample</th><th>d</th><td>0.498286</td><td>0.032738</td><td>15.220407</td><td>2.589147e-52</td><td>0.434121</td><td>0.562452</td></tr>\n",
"    <tr><th>On the folds</th><th>d</th><td>0.502016</td><td>0.033265</td><td>15.091263</td><td>1.848688e-51</td><td>0.436817</td><td>0.567215</td></tr>\n",
"    <tr><th>Default</th><th>d</th><td>0.431253</td><td>0.032580</td><td>13.236884</td><td>5.373218e-40</td><td>0.367398</td><td>0.495108</td></tr>\n",
"    <tr><th>Less time</th><th>d</th><td>0.436394</td><td>0.031007</td><td>14.073929</td><td>5.493102e-45</td><td>0.375621</td><td>0.497168</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
\n", "
" ], "text/plain": [ " coef std err t P>|t| 2.5 % \\\n", "Model Type Metric \n", "Full Sample d 0.498286 0.032738 15.220407 2.589147e-52 0.434121 \n", "On the folds d 0.502016 0.033265 15.091263 1.848688e-51 0.436817 \n", "Default d 0.431253 0.032580 13.236884 5.373218e-40 0.367398 \n", "Less time d 0.436394 0.031007 14.073929 5.493102e-45 0.375621 \n", "\n", " 97.5 % \n", "Model Type Metric \n", "Full Sample d 0.562452 \n", "On the folds d 0.567215 \n", "Default d 0.495108 \n", "Less time d 0.497168 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summary = pd.concat([obj_dml_plr_fullsample.summary, dml_plr_obj_onfolds.summary, dml_plr_obj_untuned.summary, obj_dml_plr_lesstime.summary],\n", " keys=['Full Sample', 'On the folds', 'Default', 'Less time'])\n", "summary.index.names = ['Model Type', 'Metric']\n", "\n", "summary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Plot Coefficients and 95% Confidence Intervals\n", "\n", "This section generates a plot comparing the coefficients and 95% confidence intervals for each model type." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Extract model labels and coefficient values\n", "model_labels = summary.index.get_level_values('Model Type')\n", "coef_values = summary['coef'].values\n", "\n", "# Calculate errors\n", "errors = np.full((2, len(coef_values)), np.nan)\n", "errors[0, :] = summary['coef'] - summary['2.5 %']\n", "errors[1, :] = summary['97.5 %'] - summary['coef']\n", "\n", "# Plot Coefficients and 95% Confidence Intervals\n", "plt.figure(figsize=(10, 6))\n", "plt.errorbar(model_labels, coef_values, fmt='o', yerr=errors, capsize=5)\n", "plt.axhline(0.5, color='red', linestyle='--')\n", "plt.xlabel('Model')\n", "plt.ylabel('Coefficients and 95%-CI')\n", "plt.title('Comparison of Coefficients and 95% Confidence Intervals')\n", "plt.xticks(rotation=45)\n", "plt.tight_layout()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Compare Metrics for Nuisance Estimation\n", "\n", "In this section, we compare the out-of-sample RMSE of the nuisance learners across the tuning approaches and plot a bar chart to visualize the differences in their performance." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, axs = plt.subplots(1,2,figsize=(10,4))\n", "axs = axs.flatten()\n", "\n", "axs[0].bar(x = ['Full Sample', 'On the folds', 'Default', 'Less time'],\n", " height=[rmse_dml_ml_m_fullsample, rmse_dml_ml_m_onfolds, rmse_dml_ml_m_untuned, rmse_dml_ml_m_lesstime])\n", "\n", "axs[1].bar(x = ['Full Sample', 'On the folds', 'Default', 'Less time'],\n", " height=[rmse_dml_ml_l_fullsample, rmse_dml_ml_l_onfolds, rmse_dml_ml_l_untuned, rmse_dml_ml_l_lesstime])\n", "\n", "axs[0].set_xlabel(\"Tuning Method\")\n", "axs[0].set_ylim((1,1.12))\n", "axs[0].set_ylabel(\"RMSE\")\n", "axs[0].set_title(\"OOS RMSE for Different Tuning Methods (ml_m)\")\n", "\n", "axs[1].set_xlabel(\"Tuning Method\")\n", "axs[1].set_ylim((1.1,1.22))\n", "axs[1].set_ylabel(\"RMSE\")\n", "axs[1].set_title(\"OOS RMSE for Different Tuning Methods (ml_l)\")\n", "\n", "fig.suptitle(\"Out of Sample RMSE in Nuisance Estimation by Tuning Method\")\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "This notebook highlights that tuning plays an important role and can be easily done using FLAML AutoML. In our [recent study](https://arxiv.org/abs/2402.04674) we provide more evidence for tuning with AutoML, in particular that tuning on the full sample performed similarly to tuning on the folds in all investigated cases, so tuning time and complexity can be saved by tuning externally.\n", "\n", "See also our fully automated API for tuning DoubleML objects using AutoML, called [``AutoDoubleML``](https://github.com/OliverSchacht/AutoDoubleML), which can be installed from GitHub for Python." 
] } ], "metadata": { "kernelspec": { "display_name": "flaml", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 2 }