{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Sensitivity Analysis for Causal ML\n",
    "\n",
    "This notebook complements the introductory paper \"*Sensitivity Analysis for Causal ML: A Use Case at Booking.com*\" by Bach et al. (2024) (forthcoming). It illustrates the causal analysis and sensitivity considerations in a simplified example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import doubleml as dml\n",
    "from doubleml.datasets import make_confounded_irm_data\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "from sklearn.linear_model import LinearRegression, LogisticRegression\n",
    "from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier\n",
    "from sklearn.isotonic import IsotonicRegression\n",
    "from lightgbm import LGBMRegressor, LGBMClassifier\n",
    "import plotly.graph_objects as go\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Data\n",
    "\n",
    "We will simulate a data set according to a data generating process (DGP), which is available and documented in the [DoubleML](https://docs.doubleml.org/stable/api/generated/doubleml.datasets.make_confounded_irm_data.html#doubleml.datasets.make_confounded_irm_data) for Python. We will parametrize the DGP in a way that it roughly mimics patterns of the data used in the original analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use smaller number of observations in demo example to reduce computational time\n",
    "n_obs = 75000\n",
    "\n",
    "# Parameters for the data generating process\n",
    "# True average treatment effect (very similar to ATT in this example)\n",
    "theta = 0.07 \n",
    "# Coefficient of the unobserved confounder in the outcome regression.\n",
    "beta_a = 0.25\n",
    "# Coefficient of the unobserved confounder in the propensity score.\n",
    "gamma_a = 0.123\n",
    "# Variance for outcome regression error\n",
    "var_eps = 1.5\n",
    "# Threshold being applied on trimming propensity score on the population level\n",
    "trimming_threshold = 0.05\n",
    "\n",
    "# Number of observations\n",
    "np.random.seed(42)\n",
    "dgp_dict = make_confounded_irm_data(n_obs=n_obs, theta=theta, beta_a=beta_a, gamma_a=gamma_a, var_eps=var_eps, trimming_threshold=trimming_threshold)\n",
    "\n",
    "x_cols = [f'X{i + 1}' for i in np.arange(dgp_dict['x'].shape[1])]\n",
    "df = pd.DataFrame(np.column_stack((dgp_dict['x'], dgp_dict['y'], dgp_dict['d'])), columns=x_cols + ['y', 'd'])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Causal Analysis with DoubleML\n",
    "\n",
    "### 1. Formulation of Causal Model & Identification Assumptions\n",
    "\n",
    "In the use case, we focused on a nonparametric model for the treatment effect, also called [Interactive Regression Model (IRM)](https://docs.doubleml.org/stable/guide/models.html#interactive-regression-models-irm). Under the assumptions of consistency, overlap and unconfoundedness, this model can be used to identify the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT). The identification strategy was based on a DAG.\n",
    "\n",
    "![DAG underlying to the causal analysis.](figures/dag_usecase_revised.png)\n",
    "\n",
    "\n",
    "In the use case of consideration, the key causal quantity was the ATT as it quantifies by how much ancillary products increase follow-up bookings on average for customers who purchased an ancillary product.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set up the data backend with treatment variable d, outcome variable y, and covariates x\n",
    "dml_data = dml.DoubleMLData(df, 'y', 'd', x_cols)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Estimation of Causal Effect\n",
    "\n",
    "For estimation, we employed the [DoubleML](https://docs.doubleml.org/stable/index.html) package in Python. The nuisance functions (including the outcome regression and the propensity score) have been used with [LightGBM](https://lightgbm.readthedocs.io/en/stable/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize LightGBM learners\n",
    "n_estimators = 150\n",
    "learning_rate = 0.05\n",
    "ml_g = LGBMRegressor(n_estimators=n_estimators, learning_rate = 0.05, verbose=-1)\n",
    "ml_m = LGBMClassifier(n_estimators=n_estimators, learning_rate = 0.05, verbose=-1)\n",
    "\n",
    "# Initialize the DoubleMLIRM model, specify score \"ATTE\" for average treatment effect on the treated\n",
    "dml_obj = dml.DoubleMLIRM(dml_data, score = \"ATTE\", ml_g = ml_g, ml_m = ml_m, n_folds = 5, n_rep = 2)\n",
    "\n",
    "\n",
    "# fit the model\n",
    "dml_obj.fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's summarize the estimation results for the ATT."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_obj.summary.round(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results point at a sizeable positive effect. However, we are concerned that this effect might be driven by unobserved confounders: The large positive effect might represent selection into treatment mechanisms rather than the *pure* causal effect of the treatment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Sensitivity Analysis\n",
    "\n",
    "To address the concerns with regard to the confounding bias, sensitivity analysis has been employed. The literature has developed various approaches, which differ in terms of their applicability to the specific estimation approach (among others). In the context of the use case, the approaches of [VanderWeele and Arah (2011)](https://doi.org/10.1097%2FEDE.0b013e3181f74493) and [Chernozhukov et al. (2023)](https://arxiv.org/abs/2112.13398) have been employed. Here, we won't go into the details of the methods but rather illustrate the application of the sensitivity analysis.\n",
    "\n",
    "[VanderWeele and Arah (2011)](https://doi.org/10.1097%2FEDE.0b013e3181f74493) provide a general formula for the omitted variable bias that is applicable irrespective of the estimation framework. The general formula is based on explicit parametrization of the model in terms of the distribution of the unobserved confounder. Such a specification might be difficult to achieve in practice. Hence, the authors also offer a simplified version that employs additional assumptions. For the ATT, these assumptions impose a binary confounding variable that has an effect on $D$ and $Y$ which does not vary with the observed confounding variables. Under these scenarios, the bias formula arises as\n",
    "$$\n",
    "\\theta_s - \\theta_0 = \\delta \\cdot \\gamma\n",
    "$$\n",
    "where $\\theta_s$ refers to the short parameter (= the ATT that is identfiable from the available data, i.e., under unobserved confounding) and $\\theta_0$ the long or true parameter (that would be identifiable if the unobserved confounder was observed). $\\delta$ and $\\gamma$ denote the sensitivity parameters in this framework and refer to difference in the prevalence of the (binary) confounder in the treated and the untreated group (after accounting for $X$): $\\delta = P(U|D=1, X) - P(U|D=0,X)$. $\\gamma$ refers to the confounding effect in the main equation, i.e., $\\gamma = E[Y|D,X, U=1] - E[Y|D,X, U=0]$, which describes the average expected difference in the outcome variable due to a change in the confounding variable $U$ (given $D$ and $X$). For a more detailed treatment, we refer to the original paper by [VanderWeele and Arah (2011)](https://doi.org/10.1097%2FEDE.0b013e3181f74493). This sensitivity approach is appealing because of its simplicity and applicability. We can specify various scenarios in terms of values for $\\gamma$ and $\\delta$ and compute the bias. This could also be illustrated in a contour plot.\n",
    "\n",
    "We would like to note that in the context of the original analysis, we experimented with various sensitivity frameworks. Hence, the presentation here is mainly for illustrative purposes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Implement vanderWeele and Arah corrected estimate\n",
    "def adj_vanderWeeleArah(coef, gamma, delta, downward = True):\n",
    "    bias = gamma * delta\n",
    "\n",
    "    if downward:\n",
    "        adj = coef - bias\n",
    "    else:\n",
    "        adj = coef + bias\n",
    "    return adj"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We set up a grid of values for $\\gamma$ and $\\delta$ and compute the bias for each combination. We then illustrate the bias in a contour plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gamma_val, delta_val = np.linspace(0, 1, 100), np.linspace(0, 5, 100)\n",
    "\n",
    "# all combinations of gamma_val and delta_val\n",
    "gamma_val, delta_val = np.meshgrid(gamma_val, delta_val)\n",
    "\n",
    "# Set \"downward = True\": We are worried that selection into the treatment leads to an upward bias of the ATT\n",
    "adj_est = adj_vanderWeeleArah(dml_obj.coef, gamma_val, delta_val, downward = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set up a contour plot based on the values for gamma_val, delta_val, and adj_est\n",
    "fig = go.Figure(data=go.Contour(z=adj_est, x=gamma_val[0], y=delta_val[:, 0],\n",
    "                                contours=dict(coloring = 'heatmap', showlabels = True)))\n",
    "\n",
    "fig.update_layout(title='Adjusted ATT estimate based on VanderWeele and Arah (downward bias)',\n",
    "                  xaxis_title='gamma',\n",
    "                  yaxis_title='delta')\n",
    "\n",
    "# highlight the contour line at the level of zero\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The contour plot shows the corresponding bias for different values of $\\gamma$ and $\\delta$. A critical contour line is the line at $0$ as it indicates all combinations for $\\delta$ and $\\gamma$ that would suffice to render the ATT down to a value of $0$. The sensitivity parameters are not defined on the same scale and also not bounded in their range. To obtain a relation to the data set, we can perform a benchmarking analysis presented in the next section.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we will start with applying the sensitivity analysis of [Chernozhukov et al. (2023)](https://arxiv.org/abs/2112.13398). Without further specifications, a default scenario with `cf_d` = `cf_y` = `0.03` is applied, i.e., the bias formula in [Chernozhukov et al. (2023)](https://arxiv.org/abs/2112.13398) is computed for these values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dml_obj.sensitivity_analysis()\n",
    "\n",
    "# Summary\n",
    "print(dml_obj.sensitivity_summary)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From this analysis, we can conclude that the ATT is robust to the default setting with `cf_d` = `cf_y` = `0.03`. The corresponding robustness value is $7.58\\%$, which means that a confounding scenario with `cf_d` = `cf_y` = `0.0758` would be sufficient to set the ATT down to a value of exactly zero. An important question is whether the considered confounding scenario is plausible in the use case. To get a better understanding of the sensitivity of the ATT with regard to other confounding scenarios, we can generate a contour plot, which displays all lower bounds for the ATT for different values of the sensitivity parameters `cf_d` and `cf_y`. We can also add the robustness value (white cross)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "contour_plot = dml_obj.sensitivity_plot(grid_bounds = (0.08, 0.08))\n",
    "\n",
    "# Add robustness value to the plot: Intersection of diagonal with contour line at zero\n",
    "rv = dml_obj.sensitivity_params['rv']\n",
    "\n",
    "# Add the point with an \"x\" shape at coordinates (4.97, 4.97)\n",
    "contour_plot.add_trace(go.Scatter(\n",
    "    x=rv,\n",
    "    y=rv,\n",
    "    mode='markers+text',\n",
    "    marker=dict(\n",
    "        size=12,\n",
    "        symbol='x',\n",
    "        color='white'\n",
    "    ),\n",
    "    name=\"RV\"\n",
    "))\n",
    "\n",
    "contour_plot.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Benchmarking Analysis\n",
    "\n",
    "Benchmarking helps to bridge the gap between the hypothetical values for the sensitivity parameters and the empirical relationships in the data. The idea of benchmarking is as follows: We can leave out one of the observed confounders and then re-compute the values for the sensitivity parameters. \n",
    "\n",
    "#### VanderWeele and Arah (2011): Benchmarking\n",
    "\n",
    "One way to obtain benchmarking values for $\\gamma$ and $\\delta$, would be to transform one of the candidate variables into a dummy variable (if it was not binary before). This type of benchmarking analysis is not presented in the original paper by [VanderWeele and Arah (2011)](https://doi.org/10.1097%2FEDE.0b013e3181f74493) and is mainly motivated by practical considerations. We can compute the average difference in the prevalence of the binary version of $X_2$ in the group of treated and untreated, i.e, $\\delta^* = P(X_2|D=1, X_{-2}) - P(X_2|D=0,X_{-2})$ and $\\gamma^* = E[Y|D,X_{-2}, X_2=1] - E[Y|D,X_{-2}, X_2=0]$, where $X_{-2}$ indicates all observed confounders except for $X_2$. To keep it simple, we focus on linear models for estimation of the conditional probabilities and expectations. As a rough approximation we can obtain a benchmark value for $\\gamma^*$ from a linear regression of the outcome variable on the covariates and the treatment variables. There are basically several approaches to approach the benchmark quantities here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a dummy variable for X2\n",
    "df_binary = df.copy()\n",
    "df_binary['X2_dummy'] = (df_binary['X2'] > df_binary['X2'].median()).astype(int)\n",
    "\n",
    "# replace X2 by binary versoin\n",
    "x_cols_binary = x_cols.copy()\n",
    "x_cols_binary[1] = 'X2_dummy'\n",
    "\n",
    "df_binary = df_binary[x_cols_binary + ['d'] + ['y']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# logistic regression: predict X_2 based on all X's for treated\n",
    "ml_m_bench_treated = LogisticRegression(max_iter=1000, penalty = None)\n",
    "x_binary_treated = df_binary.loc[df_binary['d'] == 1, x_cols_binary]\n",
    "ml_m_bench_treated.fit(x_binary_treated, df_binary.loc[df_binary['d'] == 1, 'X2_dummy'])\n",
    "# predictions for treated\n",
    "X2_preds_treated = ml_m_bench_treated.predict(x_binary_treated)\n",
    "\n",
    "# logistic regression: predict X_2 based on all X's for control\n",
    "ml_m_bench_control = LogisticRegression(max_iter=1000, penalty = None)\n",
    "x_binary_control = df_binary.loc[df_binary['d'] == 0, x_cols_binary]\n",
    "ml_m_bench_control.fit(x_binary_control, df_binary.loc[df_binary['d'] == 0, 'X2_dummy'])\n",
    "# predictions for control\n",
    "X2_preds_control = ml_m_bench_control.predict(x_binary_control)\n",
    "\n",
    "# Difference in prevalence\n",
    "gamma_bench = np.mean(X2_preds_treated) - np.mean(X2_preds_control)\n",
    "print(gamma_bench)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Linear regression based on df_binary\n",
    "lm = LinearRegression()\n",
    "lm.fit(df_binary[x_cols_binary + ['d']], df_binary['y'])\n",
    "delta_bench = lm.coef_[1]\n",
    "print(delta_bench)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now compute the corresponding bias and add this benchmarking scenario to the contour plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adj_coef_bench = adj_vanderWeeleArah(dml_obj.coef, gamma_bench, delta_bench, downward = True)\n",
    "adj_coef_bench"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set up a contour plot based on the values for gamma_val, delta_val, and adj_est\n",
    "fig = go.Figure(data=go.Contour(z=adj_est, x=gamma_val[0], y=delta_val[:, 0],\n",
    "                                contours=dict(coloring = 'heatmap', showlabels = True)))\n",
    "\n",
    "# Add the benchmark values gamma_bench, delta_bench and bias_bench\n",
    "fig.add_trace(go.Scatter(x=[gamma_bench], y=[delta_bench], mode='markers+text', name='Benchmark',text=['<b>Benchmark Scenario</b>'],\n",
    "                                 textposition=\"top right\",))\n",
    "\n",
    "fig.update_layout(title='Adjusted ATT estimate based on VanderWeele and Arah (downward bias)',\n",
    "                  xaxis_title='gamma',\n",
    "                  yaxis_title='delta')\n",
    "\n",
    "# highlight the contour line at the level of zero\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is, of course, also possible to re-estimate the ATT in the benchmark scenario. We can see that omitting $X_2$ leads to an upward bias of the ATT, which reflects our major concerns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set up the data backend with treatment variable d, outcome variable y, and covariates x\n",
    "x_cols_bench = x_cols_binary.copy()\n",
    "x_cols_bench.remove('X2_dummy')\n",
    "\n",
    "df_bench = df_binary[x_cols_bench + ['d'] + ['y']]\n",
    "dml_data_bench = dml.DoubleMLData(df_bench, 'y', 'd', x_cols_bench)\n",
    "\n",
    "np.random.seed(42)\n",
    "dml_obj_bench = dml.DoubleMLIRM(dml_data_bench, score = \"ATTE\", ml_g = ml_g, ml_m = ml_m, n_folds = 5, n_rep = 2)\n",
    "dml_obj_bench.fit()\n",
    "\n",
    "dml_obj_bench.summary.round(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Chernozhukov et al. (2023): Benchmarking\n",
    "\n",
    "A caveat to the benchmarking procedure presented above is its informal nature, see [Cinelli and Hazlett (2020)](https://doi.org/10.1111/rssb.12348). A more benchmarking approach is presented in [Cinelli and Hazlett (2020)](https://doi.org/10.1111/rssb.12348), which in turn was generalized in [Chernozhukov et al. (2023)](https://arxiv.org/abs/2112.13398), which we will demonstrate later. To have a better relation to the use case data, we can now perform a benchmarking analysis, i.e., we can leave out one or multiple observed confounding variables and re-compute the values for the sensitivity parameters `cf_d` and `cf_y`. We can then compute the corresponding bias and add this benchmarking scenario to the contour plot. Let's consider two confounding variables $X_1$ and $X_2$. $X_1$ is known to be the most important predictor for $Y$ and $D$ whereas $X_2$ is less important. $X_2$ is a measure for customers' membership. Usually, it's recommended to consider multiple scenarios and not solely base all conclusions on one benchmarking scenario.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Benchmarking\n",
    "bench_x1 = dml_obj.sensitivity_benchmark(benchmarking_set=['X1'])\n",
    "bench_x2 = dml_obj.sensitivity_benchmark(benchmarking_set=['X2'])\n",
    "print(bench_x1)\n",
    "print(bench_x2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can perform the sensitivity analysis in both scenarios. Note that here, we start with a value for $\\rho=1$, although the benchmarking resulted in a smaller calibrated value. The reason to do so is that $\\rho=1$ is the most conservative setting and that the theoretical results in [Chernozhukov et al. (2023)](https://arxiv.org/abs/2112.13398) are based on this setting. We can later investigate how lowering $\\rho$ would change the conclusions from the sensitivity analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# benchmarking scenario\n",
    "dml_obj.sensitivity_analysis(cf_y = bench_x1.loc[\"d\", \"cf_y\"], cf_d = bench_x1.loc[\"d\", \"cf_d\"], rho = 1.0)\n",
    "print(dml_obj.sensitivity_summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# benchmarking scenario\n",
    "dml_obj.sensitivity_analysis(cf_y = bench_x2.loc[\"d\", \"cf_y\"], cf_d = bench_x2.loc[\"d\", \"cf_d\"], rho = 1.0)\n",
    "print(dml_obj.sensitivity_summary)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can conclude that the ATT would not be robust to the benchmarking scenario based on $X_1$: A confounder that would be comparable in terms of the explanatory power for the outcome and the treatment variable to the observed confounder $X_1$ would be sufficient to switch the sign of the ATT estimate. This is different for the less pessimistic scenario based on $X_2$. Here, the ATT is robust to the benchmarking scenario. This is in line with the overall intuition in the business case under consideration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sensitivity analysis with benchmark scenario for X2 (which is supposed to be \"not unlike the omitted confounder\")\n",
    "contour_plot = dml_obj.sensitivity_plot(include_scenario=True, grid_bounds = (0.08, 0.12))\n",
    "\n",
    "# Add robustness value to the plot: Intersection of diagonal with contour line at zero\n",
    "rv = dml_obj.sensitivity_params['rv']\n",
    "\n",
    "# Add the point with an \"x\" shape at coordinates (rv, rv)\n",
    "contour_plot.add_trace(go.Scatter(\n",
    "    x=rv,\n",
    "    y=rv,\n",
    "    mode='markers+text',\n",
    "    marker=dict(\n",
    "        size=12,\n",
    "        symbol='x',\n",
    "        color='white'\n",
    "    ),\n",
    "    name=\"RV\",\n",
    "    showlegend = False\n",
    "))\n",
    "\n",
    "# Set smaller margin for better visibility (for paper version of the plot)\n",
    "contour_plot.update_layout(\n",
    "    margin=dict(l=1, r=1, t=5, b=5) \n",
    ")\n",
    "\n",
    "contour_plot.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the benchmarking scenarios has been based on the conservative value for $\\rho=1$, we can now investigate how the conclusions would change if we would gradually lower $\\rho$ to the calibrated value of $0.50967$. Let's first set the value for $\\rho$ to an intermediate value of $\\rho=0.7548$.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# benchmarking scenario\n",
    "rho_val = (1+bench_x2.loc[\"d\", \"rho\"])/2\n",
    "dml_obj.sensitivity_analysis(cf_y = bench_x2.loc[\"d\", \"cf_y\"], cf_d = bench_x2.loc[\"d\", \"cf_d\"], rho = rho_val)\n",
    "print(dml_obj.sensitivity_summary)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected from using a less conservative value for $\\rho<1$, the robustness values $RV$ and $RVa$ are larger now. We can draw the contour plot again, which is now scaled according to the smaller value for $\\rho$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "contour_plot = dml_obj.sensitivity_plot(include_scenario=True, grid_bounds = (0.12, 0.12))\n",
    "\n",
    "# Add robustness value to the plot: Intersection of diagonal with contour line at zero\n",
    "rv = dml_obj.sensitivity_params['rv']\n",
    "\n",
    "# Add the point with an \"x\" shape at coordinates (rv, rv)\n",
    "contour_plot.add_trace(go.Scatter(\n",
    "    x=rv,\n",
    "    y=rv,\n",
    "    mode='markers+text',\n",
    "    marker=dict(\n",
    "        size=12,\n",
    "        symbol='x',\n",
    "        color='white'\n",
    "    ),\n",
    "    name=\"RV\",\n",
    "    showlegend = False\n",
    "))\n",
    "\n",
    "# Set smaller margin for better visibility (for paper version of the plot)\n",
    "contour_plot.update_layout(\n",
    "    margin=dict(l=1, r=1, t=5, b=5) \n",
    ")\n",
    "\n",
    "contour_plot.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By setting $\\rho$ to the benchmarked value of $\\rho=0.5097$, the results suggest a rather robust causal effect of the treatment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# benchmarking scenario\n",
    "rho_val = bench_x2.loc[\"d\", \"rho\"]\n",
    "dml_obj.sensitivity_analysis(cf_y = bench_x2.loc[\"d\", \"cf_y\"], cf_d = bench_x2.loc[\"d\", \"cf_d\"], rho = rho_val)\n",
    "print(dml_obj.sensitivity_summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sensitivity analysis with benchmark scenario for X2 (which is supposed to be \"not unlike the omitted confounder\")\n",
    "contour_plot = dml_obj.sensitivity_plot(include_scenario=True, grid_bounds = (0.15, 0.15))\n",
    "\n",
    "# Add robustness value to the plot: Intersection of diagonal with contour line at zero\n",
    "rv = dml_obj.sensitivity_params['rv']\n",
    "\n",
    "# Add the point with an \"x\" shape at coordinates (rv, rv)\n",
    "contour_plot.add_trace(go.Scatter(\n",
    "    x=rv,\n",
    "    y=rv,\n",
    "    mode='markers+text',\n",
    "    marker=dict(\n",
    "        size=12,\n",
    "        symbol='x',\n",
    "        color='white'\n",
    "    ),\n",
    "    name=\"RV\",\n",
    "    showlegend = False\n",
    "))\n",
    "\n",
    "# Set smaller margin for better visibility (for paper version of the plot)\n",
    "contour_plot.update_layout(\n",
    "    margin=dict(l=1, r=1, t=5, b=5) \n",
    ")\n",
    "\n",
    "contour_plot.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Conclusion\n",
    "\n",
    "In this notebook, we presented a simplified example of a sensitivity analysis for a causal effect inspired by a use case at Booking.com. Sensitivity analysis is intended to quantify the strength of an ommited (confounding) variable bias, which is inherently based on untestable identification assumptions. Hence, the results from a applying sensitivity analysis will generally not be unambiguous (as for example from common statistical test procedures) and have to be considered in the context of the use case. Hence, the integration of use-case-related expertise is essential to interpret the results of a sensitivity analysis, specifically regarding the plausibility of the assumed confounding scenarios."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Disclaimer\n",
    "\n",
    "We would like to note that the use case is presented in a stylized way. Due to confidentiality concerns and for the sake of replicability, the empirical analysis is based on a simulated data example. The results cannot be used to obtain any insights to the exact findings from the actual use case at Booking.com.\n",
    "\n",
    "Please note that we will link to the paper once it is published."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "* Philipp Bach, Victor Chernozhukov, Carlos Cinelli, Lin Jia, Sven Klaassen, Nils Skotara, and Martin Spindler. 2024. Sensitivity Analysis for Causal ML: A Use Case at Booking.com.\n",
    "\n",
    "* Chernozhukov, Victor, Cinelli, Carlos, Newey, Whitney, Sharma, Amit, and Syrgkanis, Vasilis. (2022). Long Story Short: Omitted Variable Bias in Causal Machine Learning. National Bureau of Economic Research. https://doi.org/10.48550/arXiv.2112.13398\n",
    "\n",
    "* Carlos Cinelli, Chad Hazlett, Making Sense of Sensitivity: Extending Omitted Variable Bias, Journal of the Royal Statistical Society Series B: Statistical Methodology, Volume 82, Issue 1, February 2020, Pages 39–67, https://doi.org/10.1111/rssb.12348"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}