{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# R: Basics of Double Machine Learning\n", "\n", "**Remark**: This notebook has a long computation time due to the large number of simulations.\n", "\n", "This notebooks contains the detailed simulations according to the introduction to double machine learning in the [User Guide](https://docs.doubleml.org/stable/guide/basics.html) of the DoubleML package." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "library(data.table)\n", "library(ggplot2)\n", "library(mlr3)\n", "library(mlr3learners)\n", "library(data.table)\n", "library(DoubleML)\n", "\n", "lgr::get_logger(\"mlr3\")$set_threshold(\"warn\")\n", "options(repr.plot.width=5, repr.plot.height=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Generating Process (DGP)\n", "\n", "We consider the following partially linear model:\n", "\n", "$$\n", "\\begin{align*}\n", "y_i &= \\theta_0 d_i + g_0(x_i) + \\zeta_i, & \\zeta_i \\sim \\mathcal{N}(0,1), \\\\\n", "d_i &= m_0(x_i) + v_i, & v_i \\sim \\mathcal{N}(0,1),\n", "\\end{align*}\n", "$$\n", "\n", "with covariates $x_i \\sim \\mathcal{N}(0, \\Sigma)$, where $\\Sigma$ is a matrix with entries $\\Sigma_{kj} = 0.7^{|j-k|}$. We are interested in performing valid inference on the causal parameter $\\theta_0$. The true parameter $\\theta_0$ is set to $0.5$ in our simulation experiment.\n", "\n", "The nuisance functions are given by:\n", "\n", "$$\n", "\\begin{align*}\n", "m_0(x_i) &= x_{i,1} + \\frac{1}{4} \\frac{\\exp(x_{i,3})}{1+\\exp(x_{i,3})}, \\\\\n", "g_0(x_i) &= \\frac{\\exp(x_{i,1})}{1+\\exp(x_{i,1})} + \\frac{1}{4} x_{i,3}.\n", "\\end{align*}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We generate ``n_rep`` replications of the data generating process with sample size ``n_obs`` and compare the performance of different estimators." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "set.seed(1234)\n", "n_rep = 1000\n", "n_obs = 500\n", "n_vars = 5\n", "alpha = 0.5\n", "\n", "data = list()\n", "for (i_rep in seq_len(n_rep)) {\n", " data[[i_rep]] = make_plr_CCDDHNR2018(alpha=alpha, n_obs=n_obs, dim_x=n_vars,\n", " return_type=\"data.frame\")\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regularization Bias in Simple ML-Approaches\n", "\n", "Naive inference that is based on a direct application of machine learning methods to estimate the causal parameter, $\\theta_0$, is generally invalid. The use of machine learning methods introduces a bias that arises due to regularization. A simple ML approach is given by randomly splitting the sample into two parts. On the auxiliary sample indexed by $i \\in I^C$ the nuisance function $g_0(X)$ is estimated with an ML method, for example a random forest learner. Given the estimate $\\hat{g}_0(X)$, the final estimate of $\\theta_0$ is obtained as ($n=N/2$) using the other half of observations indexed with $i \\in I$\n", "\n", "$$\n", "\\hat{\\theta}_0 = \\left(\\frac{1}{n} \\sum_{i\\in I} D_i^2\\right)^{-1} \\frac{1}{n} \\sum_{i\\in I} D_i (Y_i - \\hat{g}_0(X_i)).\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As this corresponds to a \"non-orthogonal\" score, which is not implemented in the DoubleML package, we need to define a custom callable." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "non_orth_score = function(y, d, l_hat, m_hat, g_hat, smpls) {\n", "u_hat = y - g_hat\n", "psi_a = -1*d*d\n", "psi_b = d*u_hat\n", "psis = list(psi_a = psi_a, psi_b = psi_b)\n", "return(psis)\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remark that the estimator is not able to estimate $\\hat{g}_0(X)$ directly, but has to be based on a preliminary estimate of $\\hat{m}_0(X)$. All following estimators with ``score=\"IV-type\"`` are based on the same preliminary procedure." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Replication 1000/1000\r" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Warning message in geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill = \"N(0, 1)\")):\n", "\"\u001b[1m\u001b[22mAll aesthetics have length 1, but the data has 1000 rows.\n", "\u001b[36mℹ\u001b[39m Did you mean to use `annotate()`?\"\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAHgCAMAAABOyeNrAAAAFVBMVEUAAAAAAItNTU2lpc6zs9zr6+v///9Pjet6AAAACXBIWXMAABJ0AAASdAHeZh94AAAPyUlEQVR4nO3di3ravBJGYQFp7v+SGzAyPshgS/NZGrHW3n9DyJQx4X1oQg6EXyJBofYBUJ8BiyQBiyQBiyQBiyQBiyQBiyQBiyRZwvrX4NCu69figXsPWKbrgBUDluk6YMWAZboOWDFgma4DVgxYpuuAFQOW6TpgxYBlug5YMWCZrgNWDFim64AVA5bpOmDFgGW6DlgxYJmuA1Zsxzv++tf01c3BFm8fYFXq8zv+Ov4xvAqsU4a8dxTWlXusc4a8dxDWlX8KTxryXhGsf+0Xah/A5xQ3a/2Owbr+co910pD3DsFafBy/rMHbJ4Q9sho8cPcdgzW0Ndne7RMe/zttHbBeHX64wdM9Voj/P2ed6ZD3gGW3znTIewceeZ98AJ+utdsnTP47YZ3tkPd6/lohsCrWMawQ/wBWhb4B1scr2diBdxGwrNYZD3mvX1hh9ad0nfWQ94BltM56yHvAMlpnPeQ9YBmtsx7yXrewQuqFbp35kPeAZbPOfMh7wLJZZz7kPWDZrDMf8l6vsMLGS9E6+yHvActknf2Q94Blss5+yHvAMllnP+S9TmGF7ROKdYIh7wHLYp1gyHvAslgnGPIesCzWCYa8ByyLdYIh7/UJK6xPAevcvgbWu2vazoH3E7AM1imGvAcsg3WKIe8By2CdYsh7wDJYpxjyXpewQuoksE7te2C9uarNHHhHAat8nWTIe8AqXycZ8h6wytdJhrwHrPJ1kiHv9QgrpE8D68y+CNb2dW3lwHsKWMXrNEPeA1bxOs2Q94BVvE4z5D1gFa/TDHkPWMXrNEPe6xBW2PeK1TrRkPeKYNV+pr90Yd8rzWR1U7YV91il60RD3gNW6TrRkPeAVbpONOQ9YJWuEw15D1il60RD3usPVtj9msk61ZD3gFW4TjXkPWAVrlMNeQ9YhetUQ94DVuE61ZD3gFW4TjXkPWAVrlMNea87WOHIq+XrZEPeA1bZOtmQ94BVtk425D1gla2TDXkPWGXrZEPeA1bZOtmQ94BVtk425L3eYC2vD7Aq9WWw0te3hQPvLWAVrdMNeQ9YRet0Q94DVtE63ZD3gFW0TjfkPWAVrdMNeQ9YRet0Q97rDNbq6gCrUt8GK3mFGzjw7gJWyTrhkPeAVbJOOOQ9YJWsEw55D1gl64RD3gNWyTrhkPeAVbJOOOS9vmCtrw2wKvV1sFLXuP6B9xewCtYph7wHrIJ1yiHvAatgnXLIe8AqWKcc8h6wCtYph7y3A9b1r9TpVfVvH2A102dY1/GP+el19W8fYDVTV7ASV2bXWdUPvMOOwfpdnp5V/fYBVjsVwar9hGzLEk8bt++sqpnfpk10FFbTH7xzj9VO/FOYv0465D1g5a+TDnmPzwrz10mHvAes/HXSIe8deOT9OjmdrPbtk7ouwKpUT18r3AlrfV7tA+8xYGWv0w55D1jZ67RD3gNW9jrtkPeAlb1OO+Q9YGWv0w55D1jZ67RD3gNW9jrtkPc6gpW8KrvOBJZ9wMpdJx7yHrBy14mHvAes3HXiIe8BK3edeMh7wMpdJx7yHrBy14mHvAes3HXiIe8BK3edeMh7/cBKX5Nd5wLLPmBlrlMPeQ9YmevUQ94DVuY69ZD3gJW5Tj3kPWBlrlMPeQ9YmevUQ94DVuY69ZD3gJW5Tj3kvW5gbVyRXWcDyz5g5a2TD3kPWHnr5EPeA1beOvmQ94CVt04+5D1g5a2TD3kPWHnr5EPeA1beOvmQ94CVt04+5L1eYG1dj13nA8s+YGWt0w95D1hZ6/RD3gNW1jr9kPeAlbVOP+S9Ili1n5Bt0tZzxR09v0JWN2VbdXKPdQmX2HyIe6xKdQPr5xmw2ghYsmMqG/Let8KavQFY9gFLdkxlQ97rBNboCliNBCzZMZUNeQ9YsmMqG/IesGTHVDbkvf5gXWYPlQKrUv3Bmt93AatSwJIdU9mQ974W1vQtwLIPWLJjKhvyHrBkx1Q25D1gyY6pbMh7wJIdU9mQ94AlO6ayIe8BS3ZMZUPe6wNWuACrsYClOqbCIe95hvX6qmAOrMmbgGWfa1ijIWA1F7Bsj8lsyHvAsj0msyHvAcv2mMyGvAcs22MyG/IesGyPyWzIe8CyPSazIe8By/aYzIa81wWs8JMD6/U2YNkHLNtjMhvy3lfAWvxIWAxYwr4D1vy8mANYlzftv5QaAavCge8ZGnod+SpgyYaA1XDAqnDge4aGgHWvNVjDd2rdA9bp9QxreNP6XiwGLGE9wAorTvtgjVceWPYBq8KB7xkaAtY9YAFrDFgVDnzP0NA+WGH24nNHb/TF5Yc9l7Bjx/Wv1OlVwKoFa7giu7yEvYOLvzNZYwTrOv4xP70OWNXuscLwYkfA+jwErGcTWCGEgU4Ir5t2eub95PjW8HqZ/Nvj65M1weqfwhUmYJ0xNLT7Y6zwevF4Ob1jWp0ZltO/k/uhkH7775mwKj/R3+U2FG63eHJ1KizOm11AG09a+O6dnw1retuuzlyesfVyeuG/cljNffAe1vdTO++x4rXv4B7reWtbwxr+LXx9Vhime94GrAoHvmdo6MDDDSFBY+IiC9b0flANa9sVsJqDNXvzcVirj7GEsN64AlbdB0iTd07jy0+wUn97dn7qPuxNB2G9cwWsBh55nz5AMN62szPD+sGJ9cMNr7f/pmE9H7jY7jOs8dH263By+6F3YHn+WuEOCbUuDlhOYc3v3wwv0yhgOYX1++GftayLNLwsYHmFJcg/rCWdI7CeVx9Y9gGrwoHvGRoC1j1gAWsMWBUOfM/QED9ifw9Y5rD8BqwKB75nyHvAqnDge4a8B6wKB75nyHvuYYUUJ2BV78thDdcfWPYBq8KB7xnyHrAqHPieIe99G6zlA4zAEvVtsH5eJx8XAixRwKpw4HuGjjT51uHZdxwvJqy/5WrPMZkErFqwnj9NMflvfsMe+a0hVsdkeFnAqnaPNfsBnBWs8AusI0OX0VU+rMc7oB9YL1NhMfG7PlMasCoc+J6hWEg3n5n/vB+wCoe+AtaegGU7BKxnL1XAshgC1rOnJGAZDQHrGbBsh4D17PlgArCMhoD17MVm8atFFxPmP+78+ZhMqgArpDl9GaxJYfOVxOvKvhjWs3C5AEt8IIU5g/V8GX56gvXuBuWL0G+HJt9RBax2cwgryrgBq+GABSxJwAKWJGABS5JvWGENBlhtBKyfACxBwAKWJGABS1IRrCrPwjZ93riw8WRy208rd1ufDJfPS5VZ3ZRtxT0W91iSgAUsScACliTXsEICDLDaCFh/M8CyD1jAkgSs+7+Fe34pP7AOBay/mdt4XvVr103AApYkYAFLErCAJckzrJBSkgHrBRRYZgELWJK8wJr90Bew2s8NrDUIYLUcsIAlCVjAkgQsYElyDCuklQCriYD1d2q87wOWWcACliRgAUsSsIAlCVjAkgQsYEkCFrAk+YUVbnawxl8XX+nadRiwgCUJWMCSBCxgSQIWsCQBC1iSgAUsSW5hhR9LWE9ZwDILWMCSBCxgSQIWsCQBC1iS2oaV+DFVYPmocVibcsIPsJoOWAOs5+Cp167rgAUsSTtgXf+avLY9CCxgjX2GdR3/eJwAltm167qDsK7cY9ldu647eo8FLLtr13VFsOTPuLb32eKKnlbueeo+WeUJ5sxv0yZyeo91v3+xvcd6THKPZRawgCUJWDNYb578BFiHAtYM1uu8E65d1wELWJIOPPI+kGoC1oOLMaz7KLDM8vm1QmA1H7CAJQlYwJIELGBJAhawJLmENXwfsTWsv1lgmQUsYEkCFrAkAQtYkoAFLEnAApYkj7Cev8zKHNb0F3wDqzBgAUsSsIAlCVjAkgQsYElyCCs+8Y09rEgWWOUBC1iSgAUsScACliRgAUuSP1jjjS+ANaJN/OgqsA4FrA1Yr7fKrl3XAQtYkoAFLEnAApYkd7BeX3ZRwBrZAqswYAFLErCAJQlYwJIELGBJ8gZr8gMPwGo5YM1hRVnAKgxYwJIELGBJcgZr+quGgNVywAKWJGABSxKwFrCesoBVmC9Ys1/yD6yWK4Ilf8a15FO/7XmOuIynlYunwvI89ZW0uinbinss7rEkAQtYklzBmisRwVr9Fdm16zpgAUsSsIAlyROsxUc/wGq5FmFNfgq5BqyNp8QE1qGahPVzW9/4wPIVsIAlyRGs5ZeHZbDSzwMGrEM1A+sy/ciqKVjrXzwjehd0VTuwpjd5U7Diup/EL80yfRd0lR9Yq5+f0cGaP/kJsHICFrAkAQtYktzAWv+CISGs2VMJACsnYAFLErCAJckLrLC6oaWwEvuAdShgAUsSsIAlyQmsxMc8Wljrj+mAdShgAUuSD1ipByzFsFYPyALrUJVhpb5ZFFg9VBtW+iZfwEp+t4Ea1vK7KYB1KGABS5IHWItvFT4L1uJboYF1KGAdgfXxe0mBFXMAa8uGHNb8L99mb7Z7F3QasIAlqX1Yq9/+ch6s2T/CwDpU87DWvxP0RFjTz0eBdagqsLZ+hj4BK/EUJE3A2vgwHlixOrDegmgJ1uQx/9vGlrx3Qfc1Div1VLrxxCmwXl/+3oI1u+8CVqxtWInvwjob1ngMW7Bep468C7qvaVipb5Y5HVa81wTWoVqGlfyehvNhPe+zgHWohmGlv/RcAdZwIcA6VLuwNr5CWAPW41KAdahmYW2KqAHrfjHAOpQc1iXV2xvoDitsfiGnDqyfEPJgpR5JBdbQ9a/U6VVpWIub4PbJy/3zsDcPi1aCdaeVBWv25jfvp976DOs6/jE/ve71Dntz7/QRVvi7BdO/g3Rx6mRYt8eRvRtM3TsBa7MsWNs3+RtY4dH4WeGHG/p8WH8nX4eYHnxeErCqwwqTXoPNwhqaHjSw0hXBWjzxXsgo+bG9YfoFx+P5Ch/l3GO96eShXZ/1tnjg3gOW6TpgxYBlug5YMWCZrgNWDFim64AVO/DI+3VyOlmLtw+wKtXMrzESDQGrUsAyXQesGLBM1wErBizTdcCKAct0HbBiwDJdB6wYsEzXASsGLNN1wIoBy3QdsGLAMl0HrBiwTNcBKwYs03XAilnCIhoDFkkCFkkCFkkCFkkCFkkCFkkCFkkCFkkCFkmyhPX2Z8OqXFJ7R2R5SU1nCOv9T7PWuKT2jsjyktoOWG4vqe2sYTV1ccY3oh2sL8gUlulHD93C4mOsgz3eX3Yf4VpcyviHwWUZcbB8L7Vckx9jvS7O4CKaOiI+xjrU8Bm0xbts8rk4sFzX5D0WnxX6D1huL6nteOTd7yU1HV8rJEnAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIknAIkn/AXSrOrBWmALpAAAAAElFTkSuQmCC", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 240, "width": 300 } }, "output_type": "display_data" } ], "source": [ "set.seed(1111)\n", "\n", "ml_l = lrn(\"regr.xgboost\", nrounds = 300, eta = 0.1)\n", "ml_m = lrn(\"regr.xgboost\", nrounds = 300, eta = 0.1)\n", "ml_g = ml_l$clone()\n", "\n", "theta_nonorth = rep(NA, n_rep)\n", "se_nonorth = rep(NA, n_rep)\n", "\n", "for (i_rep in seq_len(n_rep)) {\n", " cat(sprintf(\"Replication %d/%d\", i_rep, n_rep), \"\\r\", sep=\"\")\n", " flush.console()\n", " df = data[[i_rep]]\n", " obj_dml_data = double_ml_data_from_data_frame(df, y_col = \"y\", d_cols = \"d\")\n", " obj_dml_plr_nonorth = DoubleMLPLR$new(obj_dml_data,\n", " ml_l, ml_m, ml_g,\n", " n_folds=2,\n", " score=non_orth_score,\n", " apply_cross_fitting=FALSE)\n", " obj_dml_plr_nonorth$fit()\n", " theta_nonorth[i_rep] = obj_dml_plr_nonorth$coef\n", " se_nonorth[i_rep] = obj_dml_plr_nonorth$se\n", "}\n", "\n", "g_nonorth = ggplot(data.frame(theta_rescaled=(theta_nonorth - alpha)/se_nonorth)) +\n", " geom_histogram(aes(y=after_stat(density), x=theta_rescaled, colour = \"Non-orthogonal ML\", fill=\"Non-orthogonal ML\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_vline(aes(xintercept = 0), col = \"black\") +\n", " suppressWarnings(geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill=\"N(0, 1)\"))) +\n", " scale_color_manual(name='',\n", " breaks=c(\"Non-orthogonal ML\", \"N(0, 1)\"),\n", " values=c(\"Non-orthogonal ML\"=\"dark blue\", \"N(0, 1)\"='black')) +\n", " scale_fill_manual(name='',\n", " breaks=c(\"Non-orthogonal ML\", \"N(0, 1)\"),\n", " values=c(\"Non-orthogonal ML\"=\"dark blue\", \"N(0, 1)\"=NA)) +\n", " xlim(c(-6.0, 6.0)) + xlab(\"\") + ylab(\"\") + theme_minimal()\n", "\n", "g_nonorth" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The regularization bias in the simple ML-approach is caused by the slow convergence of $\\hat{\\theta}_0$\n", "\n", "$$\n", "|\\sqrt{n} (\\hat{\\theta}_0 - \\theta_0) | \\rightarrow_{P} \\infty\n", "$$\n", "\n", "i.e., slower than $1/\\sqrt{n}$.\n", "The driving factor is the bias that arises by learning $g$ with a random forest or any other ML technique.\n", "A heuristic illustration is given by\n", "\n", "$$\n", "\\sqrt{n}(\\hat{\\theta}_0 - \\theta_0) = \\underbrace{\\left(\\frac{1}{n} \\sum_{i\\in I} D_i^2\\right)^{-1} \\frac{1}{n} \\sum_{i\\in I} D_i \\zeta_i}_{=:a}\n", "+ \\underbrace{\\left(\\frac{1}{n} \\sum_{i\\in I} D_i^2\\right)^{-1} \\frac{1}{n} \\sum_{i\\in I} D_i (g_0(X_i) - \\hat{g}_0(X_i))}_{=:b}.\n", "$$\n", "\n", "$a$ is approximately Gaussian under mild conditions.\n", "However, $b$ (the regularization bias) diverges in general." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overcoming regularization bias by orthogonalization\n", "\n", "To overcome the regularization bias we can partial out the effect of $X$ from $D$ to obtain the orthogonalized regressor $V = D - m(X)$. We then use the final estimate\n", "\n", "$$\n", "\\check{\\theta}_0 = \\left(\\frac{1}{n} \\sum_{i\\in I} \\hat{V}_i D_i\\right)^{-1} \\frac{1}{n} \\sum_{i\\in I} \\hat{V}_i (Y_i - \\hat{g}_0(X_i)).\n", "$$\n", "\n", "The following figure shows the distribution of the resulting estimates $\\hat{\\theta}_0$ without sample-splitting. Again, we are using external predictions to avoid cross-fitting (for demonstration purposes)." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Replication 1/1000\r" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Replication 1000/1000\r" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Warning message:\n", "\"\u001b[1m\u001b[22mRemoved 927 rows containing non-finite outside the scale range (`stat_bin()`).\"\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAHgCAMAAABOyeNrAAAAFVBMVEUAAABNTU3r6+vxz6X/jAD/3bP///+dazXEAAAACXBIWXMAABJ0AAASdAHeZh94AAANgklEQVR4nO3dAVuq6hKA0c/Tzv//k0/tTAFhhJFBcK917yl1ACnfh8ra0s5QoL16B3hPwqKEsCghLEoIixLCooSwKCEsSqwV1n8bjoJ93nI3iAgrPSIirPSIiLDSIyLCSo+ICCs9IiKs9IiIsNIjIsJKj4gIKz0iIqz0iIiw0iMiwkqPiMwI6/Sle3V0IWHR8zis0/XNz1VhMcPSsE6OWMyxMKyTL4XM8lRY/71Ge83dVnz639eysE5nRyxmWRTW4Pv4rg0f0damd1pYe7EsrB9jSwmLnsVPN7z8iNX+/m+b+xJWmrDSIyILnnnvfAN/b7NHtJ2FdQSH+11h+/3/a3eDB4SVHhERVnpE5GhhtbOwDuGgYU3tt7D2QljpERFhpUdEhJUeERFWekTkYGG1/ptX7QYPCSs9IiKs9IiIsNIjIsJKj4gIKz0icqyw2t3bl+wGjwkrPSIirPSIiLDSIyLCSo+ICCs9InKosFr3nbB27bhhje66sPZCWOkREWGlR0SElR4REVZ6RORIYbX+e2Ht2YHDGtt3Ye2FsNIjIsJKj4gIKz0iIqz0iMiBwmrDC8LasSOHNbLzwtoLYaVHRISVHhERVnpERFjpERFhpUdEhJUeEXkqrG3PF9imL2xirU/5v+E4R6wWXtpsN5hHWOkREWGlR0SElR4REVZ6RERY6RERYaVHRISVHhERVnpERFjpEZHDhNUeXdxmN5hJWOkREWGlR0SElR4REVZ6RERY6RERYaVHRISVHhERVnpERFjpERFhpUdEjhJWG78srL06eFjD/RfWXggrPSIirPSIiLDSIyLCSo+ICCs9IiKs9IiIsNIjIsJKj4gIKz0iIqz0iMhBwmpTV4S1U0cPa/ABCGsvhJUeERFWekREWOkREWGlR0SElR4REVZ6RERY6RGRGWGdvoxd7hEWPY/DOl3f9C/3CYseYaVHRJaFdR5evhIWPU+FtdkZ3drsa4VW/9y/taVhveab9/5eBtccsfbiGF8KhXU4wkqPiBzjp0JhHY6w0iMiC555P3Uu3xEWPcf4XaGwDkdY6RERYaVHRISVHhERVnpERFjpEZFDhDXYyeCqsPZCWOkREWGlR0SElR4REVZ6RERY6RERYaVHRISVHhERVnpERFjpERFhpUdEhJUeERFWekTkCGEN9zG4Lqy9EFZ6RERY6RERYaVHRISVHhERVnpERFjpERFhpUdEhJUeERFWekREWOkREWGlR0SElR4REVZ6ROQAYd3tYnCDsPZCWOkREWGlR0SElR4REVZ6RERY6RERYaVHRJ4Ka5vTud2dN+7xDSXW+pT/Gxyx0iMiwkqPiAgrPSIirPSIiLDSIyL7D+t+D4NbhLUXwkqPiAgrPSIirPSIiLDSIyLCSo+ICCs9IiKs9IiIsNIjIsJKj4gIKz0iIqz0iIiw0iMiwkqPiOw+rJEdDG4S1l4IKz0iIqz0iIiw0iMiwkqPiAgrPSIirPSIiLDSIyLCSo+ICCs9IiKs9IjI3sMa27/gNmHthbDSIyLCSo+ICCs96voTmLeFdyOs9Kjrz+ckYT1FWMLqEVZ61CWsIWGlR13CGhJWetQlrKGdhzW6e8GNwtoLYaVHXcIaElZ61CWsIWGlR13zwmo/xjbQBu/vb26jg4wH6/bva94fAaQWOn0Zu9wjrDlh9d71TIXVru/a2OIpj8O6vJ1acM69P17mdH3Tv9wnrPlhRUcBYQ0Ia2FYl6+J7XK9nW/XO18u2+UBHoR1WeuyXPfLa+vfdlmmd2V4H4Nlr3vz82W79e/ousOPLAvrPLx8JaxlYbXh++tD2nvowrDaeWT5sc22B/cxNr4W2L+jtklY9ecIHD0T4fwbVxV9kp4P6/Z+dliDVe8ezjax4MQ6/XE3rLGNPbI8rPib98Hfijz405FHh4rxvQtuPe4R6/a++6Nju1U1K6zz7Ytoa8MWJu7j+o1cZ7zDsPqfze+rH9PP5Qhr6mjSXXJJWN0vVo+OWOf+Oi8Oa7wrYe0mrOkWpsMaGW8d1kRXwloQ1uDw8PNkwu1BbCOLzg3rbrO9G6fv4348EVbVN+9TXQlrVlidb21a/0f4yacbbg9vb+VzJ46JpxtGjljzn2742cBgI79PN4z/8qBvxiK/z7affi6OP/UurBlhrWfGw7Z4pXnbvCb93J3NJqz3D+t25Hr+zmYT1qZhpR63p49Y3d8WPHlnswlr27B2T1jpUZewhoSVHnUJa2jXYU3sXHCzsPZCWOlRl39iPySs9IiIsNIjIsJKj4gIKz0iIqz0iIiw0iMiwkqPiAgrPSIirPSIiLDSIyLCSo+Wa513w78PHowOb89hTe1bcPvOw7r8xXrnv/4H0/0XfgcnrPRouc4rH4yGdZkLq0NYM9zCujXVBkuc7288JGGlR4M7HzfcwXYW1iL/fFhzCGs5Yc1wq0pYcwlrhktJwlrgQVgTf6W7h7DivyAWVtJGYU38u4JdhPV3D6cOqgVPkN69WMfdEsK6EdYMt2z6Lx7aBkvMedGNvRPWhmF1tMkrI9ePSFjCKrHjsCZ37R3Cij7x79DVW4X1NThOWO9OWMIqISxhlRCWsEoIS1glhCWsEsISVglhCauEsIRV4qmwRs699ufjrz+Dq/1b55k8UdzyQWT2vq31Kf83OGI5YpXYb1jTeyasA3irsM5NWHshLGGVEJawSghLWCWEJawSwhJWCWEJq4SwhFVCWMIqISxhlRCWsErsNqzcv7sLRsLalLCEVUJYwipxmLA658Jt0y9nJay9OE5Yl018/demOxDWXghLWCWEJawSwhJWCWEJq4SwhFVCWMIqsdew7n7rNy+s4DeMwtqUsIRVQljCKiEsYZUQlrBKCEtYJYQlrBLCElYJYQmrhLCEVWKnYd2/hJqwjuXdwpr+gIS1KWEJq4SwhFVCWMIqISxhlRCWsEoIS1glZoR1+tK5Nr6QsOh5HNbp+ubvBWExx8KwTtscsdr9SFjHsvSItfuwJj8iYW3qqbAen6QteVq5kfPDdbbUwk0tPbec08qVWOuIdX2Noa2OWOP6R6zfWy/31bva3fBHd4ODj2diD3lgtbA+bw/8JmH17u56dRBW927/G36Zvi300d3S4OOZ2EMeEJawSghLWCWEJawSC555/0lKWMyx1u8KhUXPLsNqI6PZYXU+JGG9jrCEVUJYwiohLGGVEJawSghLWCWEJawSwhJWiT2G1cZG88O6fUzCeh1hCauEsIRVQljCKiEsYZUQlrBKCEtYJXYYVhsdCetY3jGs6wclrNcRlrBKCEtYJYQlrBL7C6uNj4R1LMISVglhCauEsIRVoiqszqsa3Yc18RpEvT16Jqw/beQ1le7Cuu1iL6zeqsLKqgqrc3UkrN/hx/1R7neHngvr837D92FdF/oY2ZKwniMsYZUQlrBKCEtYJYQlrBLvGdZnE9aL7S2s6/4I69iEJawSwhJWiTcN66csYb2OsIRVQljCKrGzsG67I6xje9ewegtdNiisDT0V1v3J2T7655H7GD+t3MRC4TnhhqeVm9zEZdl2d+uDU95NfABOK5fjiOWIVUJYwiohLGGV2FdYnb15NqzvpYT1OsISVglhCauEsIRVQljCKrGrsLo783RYX4sJ63WEJawSwhJWCWEJq4SwhFViT2H19uX5sD6bsF5HWMIqsW1Y/dcIugsreHGjp8Ia3O3dFoW1vm3DGi40CKs7/ZjMYHlYPxucPG4Kq8COwmq96QphffZLFdaWhCWsEsISVglhCavEfsJq/ekaYf3pvoSDsDYlLGGVEJawSghLWCV2E1b7LAir+xIOwtqUsIRVQljCKrGXsNpwuk5YnWWFtSlhCavETsJqd9OVwroVK6xNCUtYJfYRVrufCuvY3j6s64+bwtrULsJqI1NhHdv7h/X7uyJhbWoPYbWx6XphXbYvrE3tIKw2Ol0xrJ8jorA29fqw2vh0zbD+riGsTb08rDYxXTWs71WEtSlhCavEq8OajGTdsL7WEdamXhxWm1hn9bA+m7A2NSOs05exyz2psFqbWmf9sIavaiSsWo/DOl3f9C/3JcJqbfDaDbVhfbTf58vutiis9b0orPblc/iiIMVhfR8h/97t3RaFtb5tw2q/LsONw/r7/ncXhFXqqbA6pxNss4y+ANZi62xm5h47X2HOWkes4NO++ijY5y13g4iw0iMiwkqPiAgrPSIirPSIyIJn3k+dy3eERc9avysUFj3CSo+ICCs9IiKs9IiIsNIjIsJKj4gIKz0iIqz0iIiw0iMiwkqPiAgrPSIirPSIyFphQY+wKCEsSgiLEsKihLAoISxKCIsSwqKEsCixVliT/y5s9bW2u6fsWpxXC2v6X7KuvdZ295Rdi2/CWn0tvq0Z1iarPvFg58IiZbWw0t+N7Dos32NlrRXW3xd2yKy3dKVsWJlEsh8VL/4e67bqHu/J91jPeDqsn5/Ilz4EnZ/jhfWW/FS4+lp8E9bqa/HNM+/rr8XZ7wopIixKCIsSwqKEsCghLEoIixLCooSwKCEsSgiLEsKihLAoISxKCIsSwqKEsCghLEoIixLCooSwKCEsSgiLEsKihLAoISxKCIsSwqKEsCghLEoIixLCooSwKCEsSgiLEv8DQZGOOtUKVCkAAAAASUVORK5CYII=", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 240, "width": 300 } }, "output_type": "display_data" } ], "source": [ "set.seed(2222)\n", "\n", "theta_orth_nosplit = rep(NA, n_rep)\n", "se_orth_nosplit = rep(NA, n_rep)\n", "\n", "for (i_rep in seq_len(n_rep)){\n", " cat(sprintf(\"Replication %d/%d\", i_rep, n_rep), \"\\r\", sep=\"\")\n", " flush.console()\n", " df = data[[i_rep]]\n", " obj_dml_data = double_ml_data_from_data_frame(df, y_col = \"y\", d_cols = \"d\")\n", " obj_dml_plr_orth_nosplit = DoubleMLPLR$new(obj_dml_data,\n", " ml_l, ml_m, ml_g,\n", " n_folds=1,\n", " score='IV-type',\n", " apply_cross_fitting=FALSE)\n", " obj_dml_plr_orth_nosplit$fit()\n", " theta_orth_nosplit[i_rep] = obj_dml_plr_orth_nosplit$coef\n", " se_orth_nosplit[i_rep] = obj_dml_plr_orth_nosplit$se\n", "}\n", "\n", "g_nosplit = ggplot(data.frame(theta_rescaled=(theta_orth_nosplit - alpha)/se_orth_nosplit), aes(x = theta_rescaled)) +\n", " geom_histogram(aes(y=after_stat(density), x=theta_rescaled, colour = \"Double ML (no sample splitting)\", fill=\"Double ML (no sample splitting)\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_vline(aes(xintercept = 0), col = \"black\") +\n", " suppressWarnings(geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill=\"N(0, 1)\"))) +\n", " scale_color_manual(name='',\n", " breaks=c(\"Double ML (no sample splitting)\", \"N(0, 1)\"),\n", " values=c(\"Double ML (no sample splitting)\"=\"dark orange\", \"N(0, 1)\"='black')) +\n", " scale_fill_manual(name='',\n", " breaks=c(\"Double ML (no sample splitting)\", \"N(0, 1)\"),\n", " values=c(\"Double ML (no sample splitting)\"=\"dark orange\", \"N(0, 1)\"=NA)) +\n", " xlim(c(-6.0, 6.0)) + xlab(\"\") + ylab(\"\") + theme_minimal()\n", "\n", "g_nosplit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the nuisance models $\\hat{g}_0()$ and $\\hat{m}()$ are estimated on the whole dataset, which is also used for obtaining the final estimate $\\check{\\theta}_0$, another bias is observed.\n", "\n", "## Sample splitting to remove bias induced by overfitting\n", "\n", "Using sample splitting, i.e., estimate the nuisance models $\\hat{g}_0()$ and $\\hat{m}()$ on one part of the data (training data) and estimate $\\check{\\theta}_0$ on the other part of the data (test data), overcomes the bias induced by overfitting. We can exploit the benefits of cross-fitting by switching the role of the training and test sample. Cross-fitting performs well empirically because the entire sample can be used for estimation.\n", "\n", "The following figure shows the distribution of the resulting estimates $\\hat{\\theta}_0$ with orthogonal score and sample-splitting." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Replication 1/1000\r" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Replication 1000/1000\r" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAHgCAMAAABOyeNrAAAAFVBMVEUAAAAAZABNTU2lw6Wz0bPr6+v///83EGXEAAAACXBIWXMAABJ0AAASdAHeZh94AAAM50lEQVR4nO3di1LqSBhG0QYZ3v+RhzsJBgwn+ZJuWLtqDugP2D2sQkUlZS8FKmsvQJ8ZWIoEliKBpUhgKRJYigSWIoGlSLPB+m+d6ev1r7QogZWZCqzIVGBFpgIrMhVYkanAikwFVmQqsCJTgRWZCqzIVGBFpgIrMtUYWNtD3TeHLwWWuv0Na3v75/wmWH9P9TasrUesEVO9C2vrU+GYqabB+m/9ytoLOJS4X5rvPVjbvUesMVO9B+vh6/he69yHpbzcAFir9R6sc4MXWwvWyx2AtVpvP91Q1SPWidWLLYC1WmAlpnrnmffOF/ADrXEflj1Ytdb0zwovqp7vAazVAisxFViRqZqGVfZgVdsnwHq+CbBWC6zEVGBFpgIrMlXLsDqmnu0CrNUCKzEVWJGpwIpM1TCsHqkn2wBrtcBKTAVWZKrPgLXZlM2p1Rela83C6j5UbXa7sjsEVj2BlZgKrMhUYEWmAisyFViRqT4M1rW1FqVbrcIq3dMDrJOs42nvkQus1QIrMRVYkanAikwFVmQqsCJTgRWZCqzIVGBFpmoVVumdOYEqYNUUWImpwIpMBVZkKrAiU4EVmQqsyFRgRaYCKzLVNFjrHSKw9M5sfg6Vn5/T6aHNsouZ6774qNp8xCr9c+dHquIRq6LASkwFVmQqsCJTgRWZCqzIVGBFpgIrMlXTsI6v0nB+jUiwqqtpWLvri4GAVV1Nwros+gHWroBVT2AlpgIrMhVYkanAikwFVmQqsCJTgRWZCqzIVE3Cuq4ZrIr7JFjnN8CqIrASU4EVmQqsyFRgRaYCKzIVWJGpWoR1WzJYFfdRsDYFrFoCKzEVWJGpwIpMBVZkKrAiU4EVmapBWPcVg1VxnwXr8hQpWOvXIqzNNbDqrUlYN0dgVRtYianAikwFVmSqMbC2h4bO9wNL3f6Gtb390z//EFjqBlZiqjdh7R/P3wNL3SbBWvbobefK5ShyP+ejyJXrmctpcVi5OnoX1upfvJf9q0esy9tLL0q/au5TIVhtBFZiqva+KwSrjcBKTPXOM+/bzvnfgaVuzf2sEKw2AisxFViRqcCKTAVWZCqwIlM1B+u43lewTu8Aa/3ASkwFVmQqsCJTgRWZCqzIVGBFpgIrMhVYkanAikwFVmSq1mCdlvsS1vE9YK0fWImpwIpMBVZkKrAiU4EVmQqsyFRgRaYCKzJVY7DOq30N6/AusNYPrMRUYEWmAisyFViRqcCKTAVWZCqwIlOBFZmqLViXxf4Ba1fAWj+wElOBFZkKrMhUYEWmmgZr2aO3nQ4pd+zVYeWOFYeVWz+PWImpwIpMBVZkKrAiU4EVmaopWGVz6Q9Yu/LrqgsvWW3BevADVsWBlZgKrMhUYEWmAisyFViRqcCKTAVWZCqwIlOBFZkKrMhUYEWmAisyVUuwyqOfp7A25eGqSy9ZYEWmAisyFViRqcCKTAVWZCqwIlOBFZkKrMhUYEWmAisyFViRqcCKTNUQrLIfD+u6LbBWC6zE9N7mRSNvos3ASkzvXZc0EFijAmswsKYG1mBgTQ2swcCaGliDgTU1sAYDa2pgDQbW1MAaDKypgTXYOFjl3NANlIfT3+9+cheWweuN7bCYzjX/4UZGXGV7aOh8P7AGGwmrd9LrGaxyO3l+F5Ynt/l3nQX9q8w/L7G9/dM//1BVsDaXV+5uDNbQvQHWyxtZGNb53bsmYV0+J5bL22V/f7vz6bJc7vQ+rNL5p5wvfr3+eX69rRPIy+Dx9Hq2XD9Dl+469qX/WfJZ78HaP56/l74Py/4dWJf3NwirPJ6W29ulc9HnsB4md6W/b3PwA3Zu6PaI1V/HfvirwX6TYC15UMDSP07hq+MVHk/P719maa/+702HdT/9G9bloaV7hf6VOqe9D7B/kDAA62E9f/U+rJW+eP/aR6z7afdbx76g7g29C+v2+NN9HALrq2Dte9cYhtV9IBsHq0/rbHcFWE9cgTVcE7CGT5eF9cwVWMO9/TxW9w48P4Lc3n400r3L+7f04KIHq3dDz8S8hhX44v2pK7CGe/+Z99L/tv7p0w33u/zha6/e5MFL56mMV0839G6glJ672Z5uuD3bvj2fffLUO1iDfebPCueBNS6wBvs4WPN9jTUysAb7OFi9z5ivLjbXxwNrsM+DNbJWYB3X+Q6s0wCs9QIrMb0H1tTAGgysqYE1mD+xnxpY6gZWYiqwIlOBFZmqFVinZYLVUB8L6zgBa73ASkwFVmQqsCJTgRWZCqzI9N0efid94Heexv5+XTWBlZi+2/XX3Tv/PfyNxP0CjQRWYvpunb95GIR1/fMFsGaenlf5HqzDqEFYvb/M6l1i//udNQdWYtqtDNe/zK8/3wPrGlgTAut5YE3orgqsx8Ca0EUSWAOBNSGwngfWhC5PJoA1EFgTurN5eJ2Oh0uM+hvkOgIrMZ1QefrGwNsVB1ZiOiGwHkreh5dFfgWsV/dIQ64+GdauNAnrQwIrMRVYkanAikzVBKzrIZfAaqg2YA26AavmwEpMBVZkKrAiU02DtcxR2w6wfh9O7q/Dyh0rFRxW7mvziJWYqglYZdgNWDX3ybB2BazVAisxFViRqcCKTAVWZCqwIlOBFZkKrMhUHw7r9eGQwAr22bB+Lm8vvWSBFVmyWoBVnrn5E9btp9dLL1lgRZYssCJLFliRJQusyJIFVmTJAiuyZIEVWbLAiixZYEWWLLAiS1YDsMp//w7rp4C1UmAlNiSwIhsSWJENCazIhgRWZEMCK7IhgRXZkMCKbEj1wyr7KbAu82WXrD1YmQ0JrMiGBFZkQwIrsiGBFdmQqod1WB9YLfbpsE4XAGv5wEpsSGNgbQ913npyKbDU7W9Y29s/pzNgjZjqXVhbj1hjpnr7EQusMVNNg5U/Yls5/vPkqHF/HlbufHq4RPjwcrPfKZ9Q5Y9Yx+VNe8Q6XsIj1vKBNfeSdQqsuZesU2DNvWSdAmvuJevUG8+8n0mBNWKq2n9WCFar1Q3rtLqJsA4XAWv5wEpsSGBFNiSwIhsSWJENCazIhgRWZEP6ElibwWP4ghWsaljnxU2Ftbu9nvcOrMUCK7EhgRXZkMCKbEhgRTYksCIbEliRDalqWJe1TYa1K2AtXtWwrk9rgtVeVcM6a/gBq8HASmxIYEU2pJphXb+Zmw7reghfsJYLrMSGBFZkQwIrsiGBFdmQKoZV9vPBun5fCNZigZXYkMCKbEhgRTYksCIbEliRDaleWIeFzQhrN3TUcbCCgZXYkMCKbEhgRTYksCIb0rfB6r04CFjBaoV1XNecsB4umFiyuoGV2JCmwQoequ14PLmXR4sbfVi5wQvOuta57ouPyiNWYkOqEdbpq+ty/0NVsFqsRli7iwOwGq5SWF0G88DqUgUrH1iJDQmsyIYEVmRDAiuyIVUKq3R9zATrdERMsJYKrMSGBFZkQwIrsiHVCav0fMwF63i0JrCWCqzEhgRWZEOqElbp+5gN1v1IAmDFAyuxIYEV2ZBqhHV/8ey5Yd1elxuseGAlNiSwIhtShbA6B4GbHdb1yzew4oGV2JDAimxIYEU2pPpg3V4hMgKr96ozYAUDK7EhVQfr/upFGViXK8y5ZA0FVmJDqg1W57VAQrDOf60z45I1GFiJDakyWKfVhGGdfvd9viVruKpgnRcD1if0hbCOv6I825L1pJpgXdYC1if0jbB2Bax4FcG6LiUPa1dmWrKeBlZiQ6oI1m0lC8DalFmWrOdVA+u+kCVgnT4cWMEqgHV7+e3r2wvAurze9+bFssCaVA2wdj+dn7MsA+t0zZ8dWLHqgNX5lYPFYO0KWMFGwNoeGjrfbwKs0j+Y4FKwbk+Uzr0h7cfA2t7+6Z9/aPT9sOkfNHBfjp+TTvf2taVg/ZTyYvtgTWoFWLd7+WDqeNdurrB2wwxysG5LmLQhDbUCrHLr8o41YZ3/J/QX9OaGNNQkWL0DDI5uM2cz39roHK/wj5Z/xJp3+nr9Ky1KYGWmAisyFViRqcCKTPXOM+/bzvnfgaVuFfysEKxPDKzEVGBFpgIrMhVYkanAikwFVmQqsCJTgRWZCqzIVGBFpgIrMtV8sKRuYCkSWIoEliKBpUhgKRJYigSWIoGlSGAp0mywnv9hWPC6q3zQidf9kuaC9eJPWXPXXeWDTrzutwTWwtf9lmaFtfQNTL6Dp8DSy+aDNfHLjrZg+Rrrr2aDdXplh3+/9j9ddRqsf8cxbbPfUQ1fY91voIUPOsfH/YKmwzp/6/1v/68737aD9VnV8Ijlu8IPDKyFr/steeZ96et+SX5WqEhgKRJYigSWIoGlSGApEliKBJYigaVIYCkSWIoEliKBpUhgKRJYigSWIoGlSGApEliKBJYigaVIYCkSWIoEliKBpUhgKRJYigSWIoGlSGApEliKBJYigaVIYCkSWIr0P8klX3zuIi8qAAAAAElFTkSuQmCC", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 240, "width": 300 } }, "output_type": "display_data" } ], "source": [ "set.seed(3333)\n", "\n", "theta_dml = rep(NA, n_rep)\n", "se_dml = rep(NA, n_rep)\n", "\n", "for (i_rep in seq_len(n_rep)) {\n", " cat(sprintf(\"Replication %d/%d\", i_rep, n_rep), \"\\r\", sep=\"\")\n", " flush.console()\n", " df = data[[i_rep]]\n", " obj_dml_data = double_ml_data_from_data_frame(df, y_col = \"y\", d_cols = \"d\")\n", " obj_dml_plr = DoubleMLPLR$new(obj_dml_data,\n", " ml_l, ml_m, ml_g,\n", " n_folds=2,\n", " score='IV-type')\n", " obj_dml_plr$fit()\n", " theta_dml[i_rep] = obj_dml_plr$coef\n", " se_dml[i_rep] = obj_dml_plr$se\n", "}\n", "\n", "g_dml = ggplot(data.frame(theta_rescaled=(theta_dml - alpha)/se_dml), aes(x = theta_rescaled)) +\n", " geom_histogram(aes(y=after_stat(density), x=theta_rescaled, colour = \"Double ML with cross-fitting\", fill=\"Double ML with cross-fitting\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_vline(aes(xintercept = 0), col = \"black\") +\n", " suppressWarnings(geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill=\"N(0, 1)\"))) +\n", " scale_color_manual(name='',\n", " breaks=c(\"Double ML with cross-fitting\", \"N(0, 1)\"),\n", " values=c(\"Double ML with cross-fitting\"=\"dark green\", \"N(0, 1)\"='black')) +\n", " scale_fill_manual(name='',\n", " breaks=c(\"Double ML with cross-fitting\", \"N(0, 1)\"),\n", " values=c(\"Double ML with cross-fitting\"=\"dark green\", \"N(0, 1)\"=NA)) +\n", " xlim(c(-6.0, 6.0)) + xlab(\"\") + ylab(\"\") + theme_minimal()\n", "\n", "g_dml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Double/debiased machine learning\n", "\n", "To illustrate the benefits of the auxiliary prediction step in the DML framework we write the error as\n", "\n", "$$\n", "\\sqrt{n}(\\check{\\theta}_0 - \\theta_0) = a^* + b^* + c^*\n", "$$\n", "\n", "Chernozhukov et al. (2018) argues that:\n", "\n", "The first term\n", "\n", "$$\n", "a^* := (EV^2)^{-1} \\frac{1}{\\sqrt{n}} \\sum_{i\\in I} V_i \\zeta_i\n", "$$\n", "\n", "will be asymptotically normally distributed.\n", "\n", "The second term\n", "\n", "$$\n", "b^* := (EV^2)^{-1} \\frac{1}{\\sqrt{n}} \\sum_{i\\in I} (\\hat{m}(X_i) - m(X_i)) (\\hat{g}_0(X_i) - g_0(X_i))\n", "$$\n", "\n", "vanishes asymptotically for many data generating processes.\n", "\n", "The third term $c^*$ vanishes in probability if sample splitting is applied. Finally, let us compare all distributions." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [ "nbsphinx-thumbnail" ], "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message in geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill = \"N(0, 1)\")):\n", "\"\u001b[1m\u001b[22mAll aesthetics have length 1, but the data has 1000 rows.\n", "\u001b[36mℹ\u001b[39m Did you mean to use `annotate()`?\"\n", "Warning message:\n", "\"\u001b[1m\u001b[22mRemoved 927 rows containing non-finite outside the scale range (`stat_bin()`).\"\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAHgCAMAAABOyeNrAAAAQlBMVEUAAAAAAIsAHmIAZABMKmJNTU10kpF+m5qHjWaOk2ylpc6lw6WzgACzs9yz0bPAnpHKp5rr6+vxz6X/jAD/3bP///9+XbiiAAAACXBIWXMAABJ0AAASdAHeZh94AAARQklEQVR4nO3dC2ObyJJA4c5oFMfOdeLYw///q1e8RHcLSk1BNd3knN0NlkoPxvoWy7IEriEyyB29AnTOgEUmAYtMAhaZBCwyCVhkErDIJGCRSXvB+pNxJKxzztUgKWCpRyQFLPWIpIClHpEUsNQjkgKWekRSwFKPSApY6hFJAUs9IilgqUckBSz1iKSApR6RFLDUI5JKgHW95Z+cvRCwKOg5rOv9n/4ksCihtbCubLEopZWwrvwopKQ2wfpzTO6Yu7X49p+3dbCuDVssSmoVrOh5vF/GR9S55ZUGVimtg9U3dylgUdDqlxsO32K57n/y3Bew1AFLPSKpFa+8e0/gH8v2iLoGWDVU3d8K3fi/x64GPQlY6hFJAUs9IqnaYLkGWFVUKayl9QZWKQFLPSIpYKlHJAUs9YikgKUekVRlsFz4z1GrQU8DlnpEUsBSj0gKWOoRSQFLPSIpYKlHJFUXLHf/93Jxl7ZDVoOeVy2sX+7XLWCVGrDUI5IClnpEUsBSj0gKWOoRSVUFy00LYBVevbA6WZexnKtBCdUO61cfsEoLWOoRSQFLPSIpYKlHJFUTLOctgVV4FcNqZQGr1IClHpEUsNQjkgKWekRSwFKPSKoiWM7/AliFVzOsmyxglRqw1COSApZ6RFLAUo9ICljqEUkBSz0iKWCpRyS1CVbe4wU6/4vLz1vuZ7e4dclw/3t9y/+O6tliueCrflPl2GKVGrDUI5IClnpEUsBSj0gKWOoRSQFLPSIpYKlHJAUs9YikaoPV7qeh20kksMquOli//A8UAqvYqoE1rGgIq9+XEbAKDFjqEUkBSz0iKWCpRyQFLPWIpIClHpEUsNQjkgKWekRSwFKPSApY6hFJAUs9IqlaYI3rCaxKqhzWxQGrzIClHpEUsNQjkgKWekRSwFKPSApY6hFJAUs9IilgqUckBSz1iKSApR6RFLDUI5KqBNZ9NYFVSbXDGv4MDazSApZ6RFLAUo9IqhpYlyFg1VE9sEZJwKoiYKlHJAUs9YikgKUekVQCrOutua+DgEVBz2Fd7/+EX4cBi4KApR6R1DpYTfz1PWBR0CZYGY7l1ueG48j97I4j54Zlv3AcVq7E1sI65sm7axa3WP0miy1WadXxoxBY1QUs9Yik6vitEFjVBSz1iKRWvPJ+9b5+CFgUVMffCoFVXcBSj0gKWOoRSQFLPSIpYKlHJAUs9YikqoA1HQR6BlZ/Msdq0IqApR6RFLDUI5IClnpEUsBSj0gKWOoRSQFLPSIpYKlHJAUs9YikgKUekRSw1COSApZ6RFLAUo9IqgZY3Touw2pPA6u0gKUekRSw1COSApZ6RFLAUo9ICljqEUkBSz0iKWCpRyQFLPWIpIClHpEUsNQjkgKWekRSwFKPSApY6hFJVQCrX0UB1u0MYJUWsNQjkgKWekRSwFKPSApY6hFJAUs9IilgqUcktQlWhmO5/emOKfdHOKxcdwaHlSut4rdY3765b21sseqqfFjv7r0NWHUFLPWIpIClHpEUsNQjv29CabdwtoqHdXGXPgHWL3c4rPfFgLUpQ1gvXcCqK2CpR37AigOWeuQHrLh6YP3TBaxKqgfWtOECVgUBSz3yA1YcsNQjP2DFAUs98gNWHLDUI780WC5YPG/tgxPdvtvv4V0fsNQjv0RY/Xc76XvuUi8YXce7G2BJo1PB6r/dwFoRsNbBcs71dJybHgL/zPbL+9RNy9lr3097d+P4USiOzgVr3BCNwHwPj2e6+NKNtx1y8/MGWGmjwZUM65eLrpVjDf3UsIZlc1/6Z8ZnLC39G2+AlTY6Gazh0d4bVv+zcPqt0Pn3c0jlwhreJnc2WN6WZFp6LlSw/O0gsJ6Mhsfqr4AVjNfDeniOBSxhdFZYzezG6b58Bmvu2sH5c9uwAwKWeuS3+pV3/wWC+2MQnOkeX5x4fLlhmjfzsIYXLvIHLPXIL+PfCo/bCK0KWOqRXx5Y4fat7IClHvll2mId9GNNE7DUIz/eNhMHLPXID1hxpcNylyRYF2e1GssjP2DFAUs98uMj9nHVwpo+DVYCLIqrFtZ0sgRYX0Jpt3C2gKUe+X39txiwNgUsYAUBSz3yA1ZcdbBuz9hd+6wdWGVXHazb/7lwAwasEisclntPhDX8hwCrlIClHvkBKw5Y6pEfsOKApR75pcES3s/pouXj2W52oOnJdcP7mrlw0n0nXOh6a+7rIGClwAoWQUuworfJxxdX9RxW473nfv0NJF7mev8n/DrsaFjDXrurgCVtBYAVdTSs7lXTXxXBij4PMe6Gwf/oRDN8jKt5+FDzcK1ozw7+1+HeHcJdPbj563jjce8R42c0os9ypMhaB6uJv74HrHWwXLy8P6TBQyfCcs3M5edu1j25j7nxXWB4Ry4LLNPDA357u+XeLt/7LtPCeSd/jsctdN3CcoWkb9J2WNMyGVZ01YeH0y1ccOE64diHNXdjz1oPS37yHr1X5MlbR55tsdx78har/wtivVusaen/6ugmVUmwmumHaPCpfek+gv3BlQsr/G62Jz+XX8sB1tLWxL/kGlj+D6tnW6wmvM7BsOZdAasYWMsWlmHNjHPDWnAFrBWwos1D/2LC9CC6mYumwnq42eDM5ft4HC/AsnryvuQKWEmwvKc20e4YFl9umB7e4MqNh2Ph5YaZLVb6yw39DUQ3Mu064vl3JOEi46vt1/7L+ZfegZUAa78SHrbVV0q7zTvpbXeWHLDOD2vacm2/s+SAlRWW6nHbvMUKd8689dbSAlZeWMUHLPXID1hxwFKP/IAVVzSsm6t0WK0sYBUTsNQjPz5iHwcs9YikgKUe+V2E0m7hbAFLPfIb12MmYG0KWMAKApZ65AesOGCpR37AigOWeuQHrDhgqUd+wIqrHdY/005ugVVStcMap8AqLGCpR35psHbcKUh8CeXjOH60edONzN/yTrdjAat1tQLWTVbpsIJF0BKs6T3uy49U8pvvlu5VfwNPb3pzwEqHNfdNB9ZCwFoJa8tOQfzP3gxHKr/vuKPxb3vYIYjzzgxu/X5NYQcgyoClHvmthuXipffBK++iy7Ciif9m9Pg2Z+/Qu6H7Fitcj6RPeS0GLPXIbzusafkc1rBp8a8QXslbBnfQRA/4DKxofdQBSz3y2xVW0k5B1sK6b3/87RCw/ipYTXCNeVj+hiwNVkhr2hlN3bAW3qVbAiz5HcRngjW/rBvWwucKioDVreHSRtXqdSz/AVy/U5BxE9QELgJYwQ0tiZFh1fDkHVhtu+0UJIIV7xo0+MSy8HJDcAMLOwDRBqyMsGoLWP27HNw/wNqrMz/H6lylwnrpzgfWbi39tTz5+jutx46wxqO/r4b14oBVTCXC6kC9AavqgAUsk4C1Dyw+CR1VLqzhsEt1wKK4kmHN0AFWLQELWCYBC1gmAQtYJm2CNXPsta/Prq/oZHiuXHc8ubcbLP9wckuHlQumTnNYueR12+tb/nfEFostlknlwnJzdIBVS6eC9eKAVUrAApZJwAKWScDKCCt6T/rMe542v7+umE4Ga3grl7SGcTlhBW9Vn1HkH/a78k4G6627brGwvI1VMwNr/PgCsKaAldAEK/hkVnCJ5vHMKgPWTrDcfOFlHj6+B6xnFQJLeG9dEVssYK2tEFjf+1PSGsZl/q3QAWtVwEpokASsFe0Aa/zYVw/LzdORYY1/uJbWMA5YJpUE673v5LCmHYoCKyFgJTSxifbTEV1i22eQy6geWP/raheuXdYIy8stnpg5XWP1wHrtaheuXQKr7IB10LsbhG/8GVwBi7fN2AQsYJkELGCZBCxgmQQsYJlUKiz3poL13QGrjIAFLJOABSyTgAUsk4AFLJOABSyTgAUsk4AFLJOABSyTgAUskwqF5d6BVXdng9W/IV5awzhgmQQsYJkELGCZVAKs4IOqwDpHRcDy34cFrHMELGCZBCxgmZQA63rLOzV/IWBR0HNY1/s/3RfAopRWwrrm2WK5d2BV3totVvGwWlnAOr5NsJ4fpC3psHLDceSGRbt07WHlogPH/ehqF65dzh1Wbjzo3Kqjy3FYOZP22mJ9jeXaYk17NfKW0RZrXKfhvoKT05oPW6xw+jV/YUptN1j/DXZywZpOessYln+3f+If09OFPqcVH6fRSVrbOWENfyMC1nGdE9ZwS8A6LmABy6QVr7z3pIBFKe31t0JgUVCRsG6u9LBusoB1fMAClknAApZJwAKWScAClknAApZJwAKWSeeENQaswyoRVutqA6wXd78OsA4LWMAyCVjAMglYwDIJWMAyCVjAMglYwDKpQFidK2BVXomwhlc31bBeHLAOr0RYHYrvwKo6YAHLJGABy6TyYA1PvXeB9W8XsA7o3LA+uoB1QMAClknAApZJVrC8vRo9wvqabzdYFzfs1UiENa1iACvYHROwtFnB8k7OwBqHn/1yePyH1933gDWclGHdV/EzWuPHDRitDVjAMglYwDIJWMAyCVjAMumcsF4dsA6uNFjuHVinCFjAMglYwDLppLB6WcA6LmABy6Rzwxrq7hRYWSsMVvsJnR1hDed2dwqsrJ0VVnCh7k6BlbVNsB4PzvYZHkfuc/6wcvGFfnd9vc0dT27psHLTSW95mS7rXyjpkHcL/wEcVk4XWyy2WCYVBKt9mu3un1QFVt2VBOulf1MxsM5QWbDcy26w2ksB67iABSyTgAUskw6F9e/wSWVgna9DYX387l9nANb5KgqWe9kR1u1iwDouYAHLpGNgjc+tMsH61gesjB0Da/gbTi5Y/U5N34GVMWABy6SSYLmXXWG9OmAdF7CAZVJeWGMLsIadD4WLHWCFuyYCVo7ywhrPXYDlI/kxqtgBVnfzvz+AlbGCYLkAyQ6wXoNdOAAra8AClknAApZJwAKWSeXAciGSPWBd/F04ACtrwAKWScAClkl5YYVvGQXWicsL6+PDkxTCch8GsPydzgAra8AClknAApZJpcByHyaw+suOsPzDFwLLtr8K1nC34SoDy6RCYN1c2cDqPgUGrPwBC1gmlQGrdQWsU5UJVvDCaFZY7WcqIljRp8GAZVEmWAGoB1idq3yw+iO5vgPLsvPDenVLsIbNKLAsKgFW78oM1qtbgDWcBJZFBcByX8awLg5Y2Tselvswh9VtE4GVtcNhuY8MsNp7AVbWjoR1GXfAbQ1r2M33BVj5OhTW6w/vb3mGsLrr/HgFVsasYI3vFY3eMhrD8o/+Zgrr1aXCur8IAawtJcC63pr7OiiG5RH67T2pCmG54ECotrDuL5Q+g3U/F1hbeg7rev8n/Dos/n/4WVjj8QP7R9i1P566B37IFtYP5+ZhRZsoYO3S7rAet02/H3/vu6FyrnuO9fD428F6vbjubmNY3Rp+AGvf8sJyY+NzrLywhg1lH7BM2wTLO16hS+qyS/vcTNoac7xCZXttsYRv++4jYZ1zrgZJAUs9IilgqUckBSz1iKSApR6R1IpX3q/e1w8Bi4L2+lshsCgIWOoRSQFLPSIpYKlHJAUs9YikgKUekRSw1COSApZ6RFLAUo9ICljqEUkBSz0iKWCpRyS1FyyiIGCRScAik4BFJgGLTAIWmQQsMglYZBKwyCRgkUl7wVr8XNju18p3T9prUbMbrOVPsu59rXz3pL0WtQFr92tR256wslx1w4Otg0WqdoOlfjZSNCyeY2nbC1a3YwfN9dZeSQtLQ0T7X0UHP8earlriPfEca0ubYfW/ka99CLzf44F1yvitcPdrURuwdr8WtfHK+/7Xooa/FZJRwCKTgEUmAYtMAhaZBCwyCVhkErDIJGCRScAik4BFJgGLTAIWmQQsMglYZBKwyCRgkUnAIpOARSYBi0wCFpkELDIJWGQSsMgkYJFJwCKTgEUmAYtMAhaZBCwyCVhkErDIJGCRScAik4BFJv0fr7/1Pq0fzzMAAAAASUVORK5CYII=", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 240, "width": 300 } }, "output_type": "display_data" } ], "source": [ "g_all = ggplot(data.frame(t_nonorth=(theta_nonorth - alpha)/se_nonorth,\n", " t_orth_nosplit=(theta_orth_nosplit - alpha)/se_orth_nosplit,\n", " t_dml=(theta_dml - alpha)/se_dml)) +\n", " geom_histogram(aes(x = t_nonorth, y=after_stat(density), colour = \"Non-orthogonal ML\", fill=\"Non-orthogonal ML\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_histogram(aes(x = t_orth_nosplit, y=after_stat(density), colour = \"Double ML (no sample splitting)\", fill=\"Double ML (no sample splitting)\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_histogram(aes(x = t_dml, y=after_stat(density), colour = \"Double ML with cross-fitting\", fill=\"Double ML with cross-fitting\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_vline(aes(xintercept = 0), col = \"black\") +\n", " suppressWarnings(geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill=\"N(0, 1)\"))) +\n", " scale_color_manual(name='',\n", " breaks=c(\"Non-orthogonal ML\", \"Double ML (no sample splitting)\", \"Double ML with cross-fitting\", \"N(0, 1)\"),\n", " values=c(\"Non-orthogonal ML\"=\"dark blue\",\n", " \"Double ML (no sample splitting)\"=\"dark orange\",\n", " \"Double ML with cross-fitting\"=\"dark green\",\n", " \"N(0, 1)\"='black')) +\n", " scale_fill_manual(name='',\n", " breaks=c(\"Non-orthogonal ML\", \"Double ML (no sample splitting)\", \"Double ML with cross-fitting\", \"N(0, 1)\"),\n", " values=c(\"Non-orthogonal ML\"=\"dark blue\",\n", " \"Double ML (no sample splitting)\"=\"dark orange\",\n", " \"Double ML with cross-fitting\"=\"dark green\",\n", " \"N(0, 1)\"=NA)) +\n", " xlim(c(-6.0, 6.0)) + xlab(\"\") + ylab(\"\") + theme_minimal()\n", "\n", "print(g_all)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Partialling out score\n", "\n", "Another debiased estimator, based on the partialling-out approach of Robinson(1988), is\n", "\n", "$$\n", "\\check{\\theta}_0 = \\left(\\frac{1}{n} \\sum_{i\\in I} \\hat{V}_i \\hat{V}_i \\right)^{-1} \\frac{1}{n} \\sum_{i\\in I} \\hat{V}_i (Y_i - \\hat{\\ell}_0(X_i)),\n", "$$\n", "\n", "with $\\ell_0(X_i) = E(Y|X)$.\n", "All nuisance parameters for the estimator with `score='partialling out'` are conditional mean functions, which can be directly estimated using ML methods. This is a minor advantage over the estimator with `score='IV-type'`.\n", "In the following, we repeat the above analysis with `score='partialling out'`. In a first part of the analysis, we estimate $\\theta_0$ without sample splitting. Again we observe a bias from overfitting.\n", "\n", "The following figure shows the distribution of the resulting estimates $\\hat{\\theta}_0$ without sample-splitting." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Replication 1/1000\r" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Replication 1000/1000\r" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Warning message:\n", "\"\u001b[1m\u001b[22mRemoved 27 rows containing non-finite outside the scale range (`stat_bin()`).\"\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAHgCAMAAABOyeNrAAAAFVBMVEUAAABNTU3r6+vxz6X/jAD/3bP///+dazXEAAAACXBIWXMAABJ0AAASdAHeZh94AAANcUlEQVR4nO3dgVriugJF4TBK3/+RBwFp06abdjcJSV3/vUeBDQK6PnScUcIAFBA+fQNwToSFIggLRRAWiiAsFEFYKIKwUARhoYhcYf2rOInbXPNmQCEse4JCWPYEhbDsCQph2RMUwrInKIRlT1AIy56gEJY9QSEse4JCWPYEhbDsCQph2ROUDWFdbqZHk2ciLETeh3V5vXgcJSxssDesC49Y2GJnWBc+FWKTQ2H9+4zwmast8e4/r31hXQYesbDJrrBmX8dPVfyIhrB+owmrFfvCekidi7AQ2f3tho8/YoX7/+pcF2HZCMueoOz4zvvkC/ilah/RMBBWD7r7u8Lw+//P3gy8QVj2BIWw7AlKb2GFgbC60GlYa7ebsFpBWPYEhbDsCQph2RMUwrInKJ2FFeIXn7oZeIuw7AkKYdkTFMKyJyiEZU9QCMueoPQVVli8/MjNwHuEZU9QCMueoBCWPUEhLHuCQlj2BKWrsML0FWE1rd+wkjedsFpBWPYEhbDsCQph2RMUwrInKD2FFeLXhNWyjsNK3XbCagVh2RMUwrInKIRlT1AIy56gdBRWmB8grIb1HFbixhNWKwjLnqAQlj1BISx7gkJY9gSFsOwJCmHZE5RDYdV9vsCwfqCKXO/yv6GfR6wgD1W7GdiGsOwJCmHZExTCsicohGVPUAjLnqAQlj1BISx7gkJY9gSFsOwJSjdhhXcH69wMbERY9gSFsOwJCmHZExTCsicohGVPUAjLnqAQlj1BISx7gkJY9gSFsOwJSi9hhfRhwmpV52HNbz9htYKw7AkKYdkTFMKyJyiEZU9QCMueoBCWPUEhLHuCQlj2BIWw7AkKYdkTlE7CCmtHCKtRvYc1uwOE1QrCsicohGVPUAjLnqAQlj1BISx7gkJY9gSFsOwJyoawLjepwxHCQuR9WJfXi/hwjLAQISx7grIvrGF++IWwEDkUVrVndAubjxWU/X1/anvD+swX7/GtFMd4xGpFH58KCas7hGVPUPr4UyFhdYew7AnKju+8XyaHFwgLkT7+rpCwukNY9gSFsOwJCmHZExTCsicohGVPULoIa3YjxVHCagVh2RMUwrInKIRlT1AIy56gEJY9QSEse4JCWPYEhbDsCQph2RMUwrInKIRlT1AIy56g9BDW/DaK44TVCsKyJyiEZU9QCMueoBCWPUEhLHuCQlj2BIWw7AkKYdkTFMKyJyiEZU9QCMueoBCWPUEhLHuC0kFYi5soTiCsVhCWPUEhLHuCQlj2BIWw7AkKYdkTFMKyJyiHwqrzdG6L5417f0IRud7lfwOPWPYEhbDsCQph2RMUwrInKIRlT1DaD2t5C8UphNUKwrInKIRlT1AIy56gEJY9QSEse4JCWPYEhbDsCQph2RMUwrInKIRlT1AIy56gEJY9QSEse4LSfFiJGyhOIqxWEJY9QSEse4JCWPYEhbDsCQph2RMUwrInKIRlT1AIy56gEJY9QSEse4LSelip2ydOI6xWEJY9QSEse4JCWPY09S1sewtnQ1j2NPV9XUVYhxAWYUVaCGv2OSM6SlidaiKs+CPwOPpFWF1rPqzE18CE1YHGwwqvD9mbsJ4nElYrCMuepghrjrDsaYqw5gjLnqa2hRUeUm8gzF4vTw7JwfHmsvF1bftHANaZLjepw5ECYf24fe1+prCiV5G1sMLrVUid3fI+rOfLtTNuufb357m8XsSHYwXC+nkZricMSz0KENZMjrB+/Ymwnp8Tw/N4GMbjk0+X4fkBnoX1vNTzfNNPryE+7Xme6Mj8Ombnfd2ax6ftEF/R6wa/sy+sYX74JUdY1+nrk4cV5q9fH9LoQyfDCkPi/Kk3G95cR2p+FRhfUagSVqYnAvz+unu++poeDeOp4/mTT09Y/jkL1TvpeFjj681hzS66+HCGlTOuXCaep2Gl3tg7+8Mq98X78hErXDc/Yj1O7fcRa3w9/aNjGKvaFNYwfhINYd7CynW8vpCbzIQ1O/UUYQ3RJfaFNf1k9e4Ra4gv8+Gw0l0RVjNhrbewHlZirh3WSleEtSOs2cPD45sJ4wcxJM66NazFm41OXL+O5bwSVqkv3te6IqxNYU2+tAnxH+FXv90wfnijCw+TOFa+3ZB4xNr+7YbHG5i9kd9vN6T/8iC24Sy/322/PA6mv/VOWBvCymfDh233hba9zVfSx65sM8I6f1jjI9fxK9uMsKqGZX3cDj9iTf+24OCVbUZYdcNq3ifDiv9ykLBO5aNhXb+mRRHWmTQdVrjuCOt+MmG1grDsaYofsZ8jLHuCQlj2BIWw7AkKYdkTFMKyJyiEZU9QCMueoBCWPUEhLHuCQlj2BIWw7Gm/2T8yj/598GzqXsthheuusH5Obzys579Yn/wX35npT/h1jrDsab/Jbz5IhvXcCWuCsDYYw4p+eiY6x7A8sUuEZU+zK0+b38Dx56sIa5M/H9YWhLUfYW0wVkVYW5UPa/z3mF2HNSx/Qnl+DsKaKB/W+MBFWB0gLHva7/nNBMLagbA2GLOJf3lomJ1jyy/daB1h2dMhYfVI4niPCMueDiGsjQqEFRKnniYs9Y4/Q1enCus29BPW2RGWPUEhLHuCUjWs2a8zIKwTqxvWdaxjIKxTIyx7gkJY9gSFsOwJCmHZE5RDYe19YrbZU8SlnkducjQkT/0nnkGu7FPL5XqX/w08YtkTlHbDCqlTCasXpwprCITVCsKyJyiEZU9QCMueoBCWPUEhLHuCQlj2BOUzYf0+nRxhndZnwvp9RVinRVj2BIWw7AlKs2GF5KlvwlITYVVFWPYEhbDsCQph2RMUwrInKIRlT1AIy56gEJY9QSEse4JCWPYEhbDsCUqrYYUvK6x/6xthVUVY9gSFsOwJCmHZE5Q6Yb3+LTJh/RWVwkpHQljnRVj2BIWw7AkKYdkTFMKyJyiNhhWuhNW3s4W1PhJWVYRlT1AIy56gEJY9QSEse4LSXVg3YXxmuuV1EVYbugvrPl6vhNW4DWFdbibH0mciLETeh3V5vbgfICxssTOsS51HrHAlrM7tfcRqPqzVe0RYVR0Ka/Mzsq08cdzq0SDPFMZnpkso9txy2d/3p8Yjlj1BISx7gkJY9gSFsOwJCmHZE5Qd33l/JEVY2KLNvyskrO41GdZPOe/CekrcjJW7RFhVdRvW82jiZhBWCwjLnqAQlj1BISx7gkJY9gSFsOwJCmHZE5QWw7o/t6of1sp9IqyqCMueoBCWPUEhLHuCQlj2BIWw7AkKYdkTlAbDundFWJ0rFNb3TNWw0neKsKoqFdYkh6/E99YJ6+wIy56gEJY9QWkvrOdz9hJW33oPa/ZDFYTVit7D+l2jm0FYn0dY9gTllGEl7xVhVdVcWOFKWGdAWPYEhbDsCQph2RMUwrInKOcMK3W3CKuq1sIKV8I6BcKyJyiEZU9QThpW4n4RVlWEZU9QCMueoDQW1tgMYfXtrGEt7xhhVXUorPWnYEs/Y9z7p5VbPp/cpjMlnmUu/7PL5XqX/w08YtkTFMKyJyiEZU9Q2gorjEePhrW4Z4RVFWHZExTCsicohGVPUHKFlf69RYT1Z2ULa/nx3x9WmBw9HNb8rhFWVYRlT1AIy56gnCWsJ8JqxVnCer4irFa0FFaYrsfDmt03wqrqZGFFv4aNsD7oZGE932DqvhFWVYRlT1AaCitEa4aw4jtHWFURlj1BISx7gkJY9gSlnbBCvOYIK7p3hFUVYdkTFMKyJyiEZU9QmgkrXAuENb17hFUVYdkTFMKyJyithBXma56wJvePsKoiLHuC0khYYbFmCmu8g4RVFWHZE5Q2wgrLlbD6ds6wpv9E+fceElZVTYQVEuuhsH5Pnd5Dwqrq3GHdhcdnRsKqqoWwQmrNEtbj1eM+ElZVDYQVkmvGsL7vd5Kwqvp8WCG95gzrfi8Jq6qPhxVW1qxh/dxNwqqKsAbCKuHTYa1Gkjes2/0krKo+HFZYuUz2sIbwb4h/s8MEYeW3IazLTepwxAorhLXL5A/rdk+jo1OEld/7sC6vF/HhmBHWLavl8xWWC+tf+P1+2QxhlfChsMLNNfVEmAXDGr7D82rnK2HlVzes8Os5Vg7rfux1EwirpENhTZ8ecJP5ZyFPnjez7RbzfIWmXI9Y2f8wLyZxm2veDCiEZU9QCMueoBCWPUEhLHuCsuM775fJ4QXCQqTQ8xUWnQirA4RlT1AIy56gEJY9QSEse4JCWPYEhbDsCQph2RMUwrInKIRlT1AIy56gEJY9QckVFhAhLBRBWCiCsFAEYaEIwkIRhIUiCAtFEBaKICwUkSus1Z8Ly36petfkXgpDtrDWf5I196XqXZN7KfwgrOyXwo+cYVW56IEPthcWLNnCsr8aaTosvsZy5Qrr/osdnMvtvZAblpOIe6/w4a+xxou2eE18jXXE4bAefyLf+yGY/DmesE6JPxVmvxR+EFb2S+EH33nPfykM/F0hCiEsFEFYKIKwUARhoQjCQhGEhSIIC0UQFoogLBRBWCiCsFAEYaEIwkIRhIUiCAtFEBaKICwUQVgogrBQBGGhCMJCEYSFIggLRRAWiiAsFEFYKIKwUARhoQjCQhGEhSIIC0UQFoogLBRBWCjiP60ukGO+PblgAAAAAElFTkSuQmCC", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 240, "width": 300 } }, "output_type": "display_data" } ], "source": [ "set.seed(4444)\n", "\n", "theta_orth_po_nosplit = rep(NA, n_rep)\n", "se_orth_po_nosplit = rep(NA, n_rep)\n", "\n", "for (i_rep in seq_len(n_rep)){\n", " cat(sprintf(\"Replication %d/%d\", i_rep, n_rep), \"\\r\", sep=\"\")\n", " flush.console()\n", " df = data[[i_rep]]\n", " obj_dml_data = double_ml_data_from_data_frame(df, y_col = \"y\", d_cols = \"d\")\n", " obj_dml_plr_orth_nosplit = DoubleMLPLR$new(obj_dml_data,\n", " ml_l, ml_m,\n", " n_folds=1,\n", " score='partialling out',\n", " apply_cross_fitting=FALSE)\n", " obj_dml_plr_orth_nosplit$fit()\n", " theta_orth_po_nosplit[i_rep] = obj_dml_plr_orth_nosplit$coef\n", " se_orth_po_nosplit[i_rep] = obj_dml_plr_orth_nosplit$se\n", "}\n", "\n", "g_nosplit_po = ggplot(data.frame(theta_rescaled=(theta_orth_po_nosplit - alpha)/se_orth_po_nosplit), aes(x = theta_rescaled)) +\n", " geom_histogram(aes(y=after_stat(density), x=theta_rescaled, colour = \"Double ML (no sample splitting)\", fill=\"Double ML (no sample splitting)\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_vline(aes(xintercept = 0), col = \"black\") +\n", " suppressWarnings(geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill=\"N(0, 1)\"))) +\n", " scale_color_manual(name='',\n", " breaks=c(\"Double ML (no sample splitting)\", \"N(0, 1)\"),\n", " values=c(\"Double ML (no sample splitting)\"=\"dark orange\", \"N(0, 1)\"='black')) +\n", " scale_fill_manual(name='',,\n", " breaks=c(\"Double ML (no sample splitting)\", \"N(0, 1)\"),\n", " values=c(\"Double ML (no sample splitting)\"=\"dark orange\", \"N(0, 1)\"=NA)) +\n", " xlim(c(-6.0, 6.0)) + xlab(\"\") + ylab(\"\") + theme_minimal()\n", "g_nosplit_po" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using sample splitting, overcomes the bias induced by overfitting.\n", "Again, the implementation automatically applies cross-fitting." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Replication 1000/1000\r" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAHgCAMAAABOyeNrAAAAFVBMVEUAAAAAZABNTU2lw6Wz0bPr6+v///83EGXEAAAACXBIWXMAABJ0AAASdAHeZh94AAANJklEQVR4nO3di1biSABF0RKZ/P8njy0ClVBAMLnk4T5rTfu4alvDXrRtqymdFKgs/Q5on4GlSGApEliKBJYigaVIYCkSWIo0G6z/llkfv/8LvVMCK7MKrMgqsCKrwIqsAiuyCqzIKrAiq8CKrAIrsgqsyCqwIqvAiqwaA+vwVf1k+6XAUt1zWIfLL6cnwXq+6mVYB/dYI1a9Cuvgj8Ixq6bB+m/5ytLvwFeJ22XzvQbr0LnHGrPqNViDj+N7LXMblvLwAGAt1muwTjVfbClYD08A1mK9/OmGVd1jfbN6cASwFgusxKpXPvNefQDfaInbsHRgrbVN/1vhj6r7ZwBrscBKrAIrsmrTsEoH1mrbA6z7hwBrscBKrAIrsgqsyKotw6pM3TsFWIsFVmIVWJFVYEVWbRhWj9SdY4C1WGAlVoEVWQVWZNV2YQ1Etc8B1mKBlVgFVmQVWJFVYEVWgRVZBVZk1WZh3YBqHgSsxQIrsQqsyCqwIqvAiqwCK7IKrMgqsCKrwIqs2iqshqfWScBaLLASq8CKrAIrsgqsyCqwIqvAiqwCK7IKrMiqabCWu0RguXlkwSsXznVb7Kpt3mM176caR3GPtVhgJVaBFVkFVmQVWJFVYEVWgRVZBVZkFViRVWBFVm0T1h1Nt2cBa7HASqwCK7IKrMgqsCKrwIqsAiuyCqzIKrAiqzYJ6x4msFbUnmDdHgasxdoSrI+fwNpAm4J1PAXWBgIrsQqsyCqwIqvAiqzaIqzyUY1grbRdwbo5DViLBVZiFViRVWBFVoEVWQVWZBVYkVWbgHX+qoYTrHIEawNtAtbPZ9yfwxo+CdZigZVYBVZkFViRVWBFVoEVWTUG1uGr1uP9wFLdc1iHyy/9xweBpTqwEqtehNUNH78GluomwXrTZds+Pk99PyxfD6tteDG5JS4uN/uNsodehbX4B+/l+Ogea/C0e6zF2twfhWBtI7ASq7b3t0KwttEmYZ3rwFptL3zm/VA9fts7YZ2f7sBabZv7t0KwthFYiVVgRVaBFVkFVmQVWJFVm4NVjo9h9Z8B1mKBlVgFVmQVWJFVYEVWgRVZBVZkFViRVWBFVoEVWQVWZNXWYJXjM1i954C1WGAlVoEVWQVWZBVYkVVgRVaBFVkFVmQVWJFVG4NVjs9h1c8Ca7HASqwCK7IKrMgqsCKrwIqsAiuyCqzIKrAiq7YFqxzHwKqeB9ZigZVYtWpYlx+7Ddb2WjWsk59PsDYYWIlV02CFL9fWu5zcv4dl8PR/7cvIvfvScnPdFrvKPVZiFViRVWBFVoEVWQVWZNWmYJXjOFjXZ4K1WGAlVm0b1lflfKnVOrCWb9Owzs8Da32BlVgFVmQVWJFVYEVWgRVZBVZkFViRVWBFVoEVWQVWZNWWYJWPsbAupwJrscBKrAIrsgqsyCqwIqvAiqwCK7IKrMgqsCKrwIqsAiuyCqzIKrAiqzYEqxzHwzofC6zFAiuxXvt40Mg3sc3ASqzXzu9rI7BGBVYzsKYGVjOwpgZWM7CmBlYzsKYGVjOwpgZWM7CmBlYzsKYGVrNxsMqp1hsog4e3z75zE5bm643t652pXvMXb2TEqxy+aj3eD6xmI2H1HvS6B6tcHty/Ccudt/m86h36rcynL3G4/NJ/fBBYzV6C1bo1wHr4RsAaDevnz8Ty83Tprk9Xf1yWnxu9D6tUv5TTi59f/7Sf39Y3yJ9h+PD8aDn/CV3q96Mr/T8l7/UarG74+LX5bqXrv9LWsMrxFVg/59ogrDJ8WC5Pl+pF78MaLFelt2+z+RtWb+hyj9V/P7r2R4P9JsHKXP7vfDnCz971Csvg6Z+H5XzhwkHvvGbho/9702FdHz6H9XPXUr9C/5Wqh73foBtIaMAavD/Peh1W/IP3y23hHmsAq/qrY19Q/YZehXW5/6nvh8D6U7C63mu0YdV3ZONg9Wmd7C4A644rsNptAlb74Xth3XMFVruXP49V34Cne5DL00Mj9U3ef0sDFz1YvTd0T8xjWIEP3u+6Aqvd6595L/2/1t/9dMP1Jh987NVbBl6qT2U8+nRD7w2U0nM326cbLp9tP5wevfOpd7Ca7fPfCueBNS6wmu0O1nwfY40MrGa7g9X7E/PRi831+4HVbH+wRrYVWOX4GqzTwcBaLLAS6zWwpgZWM7CmBlYz32I/NbBUB1ZiFViRVWBFVm0FVhmCAmvl7RbW98nAWqx9wGr9BR6sRdsHrPMz6jcD1qKBlVgFVmQVWJH11QZfk974mqexX1+3msBKrK92/nL36r/B90hcX2AjgZVYX636nocmrPO3L4A1ZW3AKjegnsP6d7Ttwep9Z1bvJbrbZ645sBJr791o13+Zm2/fA+scWBMC635gTeiqCqxhYE3oRxJYjcCaEFj3A2tCP59MAKsRWBO6shn8nI7BS4z6HuR1BFZinVC5+0Tj6RUHVmKdEFiDkrDKLajdwnp0i2zI1Z5hfZ1ti7B2EliJVWBFVoEVWQVWZBVYkVVgRVaBFVkFVmTVNFiZq7TdXlauNC4z17us3PkZ/Tf0rkvLzXVb7Cr3WIlVm4BVGqDAWnl7htUVsBYLrMSqNcEaXGMcrE23JlgDN2BtObASq8CKrAIrsgqsyCqwIqvAiqwCK7JqC7BKC9QoWE9OB1YwsBKrwIqsAiuyCqzIKrAiq8CKrAIrsgqsyCqwIqvAiqzaAKzyOR7W8NrQ/z08HljB9gVreM8F1mKBlVgFVmQVWJFVYEVWgRVZBVZkFViRVeuHVY5TYD08H1jBwEqsAiuyCqzIKrAiq8CKrFo9rHIEa5PtHdajA4IVDKzEqjGwDl9VT915KbBU9xzW4fLL9yNgjVj1KqyDe6wxq16+xwrAGl44AKw9NAnWPFdla10t7vxIaQ33LyvXurxc/uJys98oe2gN91jHz+Yd0r9H/t0hTbvHenBC91jBwEqsAiuyCqzIKrAiq175zPuJFFgjVq3h3wrB2mPrhvXNZiKs+0cEKxhYiVVgBQ6kDqzEgdSBlTiQOrASB1IHVuJA6sBKHEjdymGdLq06FdbdM4IVDKz5D6QOrMSB1IGVOJA6sBIHUgdW4kDqwEocSN26YZ1cTYd175BgBQNr/gOpAytxIHVgJQ6kDqzEgdStGtb5kr3TYd05JVjBwJr/QOrAShxIHViJA6kDK3EgdWuGVY7zwWofE6xgYM1/IHVgJQ6kDqzEgdSBlTiQukVhXX8MN1j7a0lYFw4tWOU4J6zmOcEKtk9Y58BarH3COj8Ea7HAmu9AqgJrvgOpCqz5DqSqtcIqx3lhtQ4KVjCw5juQqibBmnY1tkdXi/u+XNzDF3h4WbnG5eWCV5eb67bYVe6x5juQqsCa70CqAmu+A6lqpbBqL/PAapwUrGBgzXcgVYE134FUBdZ8B1IVWPMdSFXrhFXqZ8wE6/aoYAUDa74DqQqs+Q6kKrDmO5CqVgmr9HzMBevmrGAFA2u+A6kKrPkOpKo1wip9HxNgnWsfFqxg+4b1+fOwfViwgoE134FUtUJYZ1fzwxqcFqxgYM13IFWBNd+BVLU+WOefaZuA1T8uWMHeD+vj+nOxwNpvC8A63jAAa3+BNflAarU6WJefEBmB1TsvWMHAmnwgtVobrOtPL8rAqg8MVjCwJh9IrVYGq/rO+hCs6sRgBQNr8oHUal2wekxmhNX7wqzSeKemHEitVgWr9z0Uc8K6vMXekcEK9gdhXc4MVrA1wSpDBmBttz8G61T5eO1dfnnVmmANviI5AuvnYXntXX55FVjTD6RW74M1/DKsIawyZBGE9VFGvcu/XvVOWHfc/MAqNyySsL7PDVawPKzbLxhtwSq3LKKw/h0crGBvgPWN476bb1jl+G5YXycHK9j8sD6GjYDV5JGEdfqkw8fHmAP9YtUYWIevWo/3u8Jq3coPYZVy+wphWN+/liNYsZ7DOlx+6T8+6Newvlg1r2Kfh/VZyoPjgzWpRWGVfx0bP9HvTbD+/e2w3NMF1qQWgFUunV9gQVin/wnnmgd6fFy1mwSrvmhfGd3NR/dTmvmtjc71Cp80/z3We9fH7/9C75TAyqwCK7IKrMgqsCKrXvnM+6F6/DawVLfgD16bZQVrpYGVWAVWZBVYkVVgRVaBFVkFVmQVWJFVYEVWgRVZBVZkFViRVfPBkurAUiSwFAksRQJLkcBSJLAUCSxFAkuRwFKk2WDd/8aw4Osu8ptOfN0/0lywHnwra+51F/lNJ77uXwmsN7/uX2lWWO9+A5Nv4Cmw9LD5YE38sGNbsHyM9azZYH3/ZIffv/avXnUarN/jmHbYv9EaPsa6voEt/KZz/L5/oOmwTn/1/t3/6+qv7WDtqzXcY/lb4Q4D682v+1fymfd3v+4fyb8VKhJYigSWIoGlSGApEliKBJYigaVIYCkSWIoEliKBpUhgKRJYigSWIoGlSGApEliKBJYigaVIYCkSWIoEliKBpUhgKRJYigSWIoGlSGApEliKBJYigaVIYCkSWIoEliL9D8WrXoc6Ef/rAAAAAElFTkSuQmCC", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 240, "width": 300 } }, "output_type": "display_data" } ], "source": [ "set.seed(5555)\n", "\n", "theta_dml_po = rep(NA, n_rep)\n", "se_dml_po = rep(NA, n_rep)\n", "\n", "for (i_rep in seq_len(n_rep)) {\n", " cat(sprintf(\"Replication %d/%d\", i_rep, n_rep), \"\\r\", sep=\"\")\n", " flush.console()\n", " df = data[[i_rep]]\n", " obj_dml_data = double_ml_data_from_data_frame(df, y_col = \"y\", d_cols = \"d\")\n", " obj_dml_plr = DoubleMLPLR$new(obj_dml_data,\n", " ml_l, ml_m,\n", " n_folds=2,\n", " score='partialling out')\n", " obj_dml_plr$fit()\n", " theta_dml_po[i_rep] = obj_dml_plr$coef\n", " se_dml_po[i_rep] = obj_dml_plr$se\n", "}\n", "\n", "g_dml_po = ggplot(data.frame(theta_rescaled=(theta_dml_po - alpha)/se_dml_po), aes(x = theta_rescaled)) +\n", " geom_histogram(aes(y=after_stat(density), x=theta_rescaled, colour = \"Double ML with cross-fitting\", fill=\"Double ML with cross-fitting\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_vline(aes(xintercept = 0), col = \"black\") +\n", " suppressWarnings(geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill=\"N(0, 1)\"))) +\n", " scale_color_manual(name='',\n", " breaks=c(\"Double ML with cross-fitting\", \"N(0, 1)\"),\n", " values=c(\"Double ML with cross-fitting\"=\"dark green\", \"N(0, 1)\"='black')) +\n", " scale_fill_manual(name='',,\n", " breaks=c(\"Double ML with cross-fitting\", \"N(0, 1)\"),\n", " values=c(\"Double ML with cross-fitting\"=\"dark green\", \"N(0, 1)\"=NA)) +\n", " xlim(c(-6.0, 6.0)) + xlab(\"\") + ylab(\"\") + theme_minimal()\n", "g_dml_po" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let us compare all distributions." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message in geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill = \"N(0, 1)\")):\n", "\"\u001b[1m\u001b[22mAll aesthetics have length 1, but the data has 1000 rows.\n", "\u001b[36mℹ\u001b[39m Did you mean to use `annotate()`?\"\n", "Warning message:\n", "\"\u001b[1m\u001b[22mRemoved 27 rows containing non-finite outside the scale range (`stat_bin()`).\"\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAHgCAMAAABOyeNrAAAASFBMVEUAAAAAAIsAHmIAZAA1O0VMKmJNTU10kpF+m5qHjWaOk2ylpc6lw6Wpr3SzgACzs9yzuX6z0bPKp5rr6+vxz6X/jAD/3bP////BsPMoAAAACXBIWXMAABJ0AAASdAHeZh94AAARYklEQVR4nO3djXqiyBZG4co4x+mkZzrpmU5z/3d6FFCqStjCRxVUkfWeM0Hd/iWuh9hGwTVABm7vO4BjIixkQVjIgrCQBWEhC8JCFoSFLAgLWaQK678NR8Z93vJuwEJY8ggWwpJHsBCWPIKFsOQRLIQlj2AhLHkEC2HJI1gISx7BQljyCBbCkkewEJY8goWw5BEsM8I6X/hHR89EWAg8D+t8/9IdJSzMsDSsM2sszLEwrDO/CjHLqrD+24fb52Zz/PiPa1lY54Y1FmZZFFb0PN634SPq3PSdJqxSLAurM3YuwkJg8csNu6+xXPu/bW6LsGSEJY9gWfDKu/cE/tFmj6hrCKsG1f2t0N3+v+/dwBOEJY9gISx5BEttYbmGsKpQaVhT95uwSkFY8ggWwpJHsBCWPIKFsOQRLJWF5cIve90NPEVY8ggWwpJHsBCWPIKFsOQRLIQlj2CpKyz38HWXu4HnCEsewUJY8ggWwpJHsBCWPIKFsOQRLFWF5fwFYRWt3rBG7zphlYKw5BEshCWPYCEseQQLYckjWGoKy4VLwipZxWGN3XfCKgVhySNYCEsewUJY8ggWwpJHsFQUlosPEFbBag5r5M4TVikISx7BQljyCBbCkkewEJY8goWw5BEshCWPYFkV1rb7C3TTBzaR6kf+NdSzxnLmoc3uBuYhLHkEC2HJI1gISx7BQljyCBbCkkewEJY8goWw5BEshCWPYCEseQRLNWG5Zwe3uRuYibDkESyEJY9gISx5BAthySNYCEsewUJY8ggWwpJHsBCWPIKFsOQRLIQlj2ApPqxTx52GEWFVoPywfrYmworvP2GVgrDkESyEJY9gISx5BAthySNYCEsewUJY8ggWwpJHsBCWPIKFsOQRLIQlj2CpJCz3k7DqUntY0TdAWKUoN6z+bQ2EVaeCw+qeXBFWnQhLHsFCWPIIFsKSR7AQljyChbDkESwzwjpfjB0OEBYCz8M637+Eh0OEhQBhySNYloXVxIfvCAuBVWFl3Yvb6e9Wu3CX5TAJ9ya32b7lkv/sD21pWPs8eXc/J9dY4THWWKWo41chYVWHsOQRLHX8q5CwqlNPWL2GsKqw4JX3s3f4Qf6w+qMP95mwilTH3woJqzqEJY9gISx5BAthySNYCEsewUJY8giWKsJyP42wgqOEVQrCkkewEJY8goWw5BEshCWPYCEseQQLYckjWAhLHsFCWPIIFsKSR7AQljyChbDkESyEJY9gqSEs99MMyz9OWKUgLHkEC2HJI1gISx7BQljyCBbCkkewEJY8goWw5BEshCWPYCEseQQLYckjWAhLHsFCWPIIFsKSR7BUEJb7+SQs7wTCKgVhySNYCEsewUJY8giWAsO6bXibsGpWYlhtQX8TVtUISx7BsiqsPLtv8/cnd1244Oh/IzuS22bPcql+5F9D+WusP9wfV6yx6lJ+WN/ctyvCqgthySNYCEsewUJY8sj3Yph3DUdTflhdV2ZYwym7hfU+ibBWISzCClQX1oXr97Q6IKziVBfWz36rRsFFCKs49YT1x/11UsKqQD1hDSsuwqoAYckjH2HFCEse+QgrRljyyEdYMcKSR755Ybno3j6z9MGJrt+le3iXIyx55JsZlgvvrcXNPWN0Ge9mCCt04LC6O0pYCxDWsrCcc106zg0PgX/i9eB96obl6KXvx72bcfwqjBw5rNuK6BaY38PjiS4+d+Oth9z4vCGsqVEYVt+VHdb926guLP/OP5wYnzC19K+8Iayp0aHD6h/t1GF1vwuHfxU6/3Z2UXBY3R8HjxaWtyYZll4XUlj+epCwpka3sLqSvkRYwXh5WA/PsQhr7MSDh9WMrpzuy2dhjV06OH1sHbYDwpJHvsWvvPsvENwfg+BE9/jixOPLDcO8GQ+rf+Fie4Qlj3wb/q1wv5XQIoQlj3zbhBWu38pGWPLIt9Eaa6dfawrCkkc+3jYTIyx55COsWOlhudOssG7fB2GVgrDkkY+P2MeqDSt61HYOC7Fqw4p+z+wc1qdh3jUcDWHJI9/n70mEtQphEVaghLCi3xmnHy3CqloRYYWPwOnj6p+JsK5v0bpuw4GwylZ+WD/8FVjb1/U1U8IqXOFhuW+nt86TsPpvhLBKQVjyyEdYMcKSRz7CihGWPPLNC8t4P6eLlo8nu9GB4sllw9saOfOs255xpvPF2OFAhrCuT9ndj0OFFSwCU2FFb5OPzy55Hlbjved++RXMPM/5/iU8HMoQ1jUn93bAsKy1AGFFUoTV+xphRZ+HuG2Gwf/oRHvU3b76j1R/qWjLDv7hcOsO4aYe3PhlvPFt6xG3z2hEn+WYU9aysJr48F2CsG57Vv0SYbl4eX9Ig4fODMs1I+cfu1r35DbGxvcCwxtym4SVaEeAn79ap386/fL0euFe28XF6a/WdeGuy++tl/uV5N9nofVDWh/WsJwdVnTRh4fTTZxx4jLh2A9r7MqeWR5Wtifv3Zrq48NbY7m32Wus7jupd401LP1/OrqhqllhNcMv0eBT+9ZtBNuDI6wDh9UEl1gWlv/L6tkaqwkvs3NY410RVjFhTbcwHdbIeOuwJroirAVhRauH7sWE4UF0I2edG9bD1QYnTt/G43girFxP3qe6IqxZYXlPbaLNMUy+3DA8vMGFGy+OiZcbRtZY819u6K4gupJh0xHPfyIzznJ7tf3cHRx/6Z2wZoSVzoyHbfGF5l3nPel1NzYbYR0/rGHNtf7GZiOsTcOSHrfVa6xw48xrr20ewto2rOLtGdb9k3eEdTy7hvX7V/ejJ6zjKTos97YgrPZbIaxSEJY8Cr4VPmIfISx5BAthySPfyTDvGo6GsOSRr98/yxjCWoWwCCtAWPLIR1gxwpJHPsKKVRvW4zNjwipJtWH1R6NvhbBKUXtY3qaJCasktYf1PjzXIqySEJY88s0LK+FGQeJziI/j7aPNq65k/JoTXU+OsNzborCu30vhYQWLwFRYw3vcpx+p2W++m7pV/QqeXvVqhDU/rLEfOmFNIKyFYa3ZKIj/2Zt+T+X3DXc0/nX3GwRx3onBtd8vaWwARERY8si3OCwXL939uPPOOh1WNPHfjB5f5+gNeld0X2OF92PWp7wmEZY88q0Pa1g+D6tftfgXCC/kLYMbaKIHfCSs6P7ICEse+ZKGNWujIEvDuq9//PUQYV2Ww+bejx5WE1xiPCx/RTYvrDCtYWM0Xz6sYcVFWFpY40vCOkxYfgi3X1LLNgpyWwU1QRdBWMEVTRVjh/VFnrwfIaxUGwWJwoo3DRp8Ytl4uSG4gokNgKgISx75jvm3QsJ6b9/h4F5eCCuRIz/Hcm8Lwrp+ce+ElczUX8tnXz7R/UgZ1p+t5WG9O8IqRolhtQW9ElbVCEse+QgrtmlY0eYMjhQWn4SObBtWV9Dv44WFGGHJI1gISx7BQljyCBbCkkewrApr6Y7Z+v3H/fqMjka7lev3I+e6xW23csPe5Vy/e7lh73LfX9qv7sW+/XVS/ci/BtZY8giWcsNyb4RVsUOF9e4IqxSEJY9gKSGsYCfjhHUMRYTlvw/ryGFF70kfec/T6vfXFeNgYQ1by9Lu4ZLRct5eveMPQDycoXYHC+t7d0y+h0tGy3mfeRgN6/bxBcIaENYMQ1jBJ7OCczSPJ1Zpn7Buu5M7UFhuXHieh4/vEdYzy8K6LQ4U1hyEtRxhzTBURVhzEdYMfUmEtUCKsP7sP/fVFuROQlgvjrAKUVJYQUKHDWvYoChhzUBYMwzZRNvpiM6x7jPIZSAsebSKmzwycrxGhCWPViGsmQhrIeMHf4SuCEsfwUJY8ggWwpJHsBCWPIKFsOQRLKWG5V7nhXXTh/XdEVYZag/rW7DmIqxiEJY8goWw5BEs24R1fy8yYX0VG4UVBEVYXwBhySNYCEsewUJY8ggWwpJHsBQalnsjrLodLax3R1hFICx5BAthySNYigjL/6AqYR1DEWE9JkRYtasurB8/frjLf4RVuOrCaofdmouwCjYjrPOFd2z8TISFwPOwzvcv7QHCwhwLwzpvs8Zyb4RVuaVrrJRh3T8JkTKsa1mEtb9VYc3eI9tnsP+4qf3IDTuQc95+5KLdyr320363cn8Fu5e77l3OXRbsVm53u66xPv7xX3JnjXUkhCWPYDlmWC8juz4hrE0dM6z3/uj8eyiMYCEseQTLglfeu6QIC3Ps+bdCwjqwIsO6lvMsrB+tkbAuZRHW/qoNqz9KWIUiLHkEC2HJI1gISx7BQljyCBbCkkewEJY8gqXEsNp9q+phvTvC2h9hySNYCEsewUJY8ggWwpJHsBCWPIKFsOQRLAWG1XZFWJXLFNZnZNOw3h1h7S5XWF5Cv7zP1BPWV0FY8ggWwpJHsJQXVr/P3iRh+Z9bJaxN1R7Wj/5DFeNh+SsuwtpU7WHdpoRVGMKSR7AcMqwXR1h7Ky4s90ZYR0BY8ggWwpJHsBCWPIKFsOQRLLuEddsMd7aw+hccCGs/+4TV76BwJCz3RliHcJSweoRViqOE1f8tmrBKcdCwurIIaz+EJY9gISx5BEthYQ3NEFbdjhpWt99CwtrNqrCe7kfu1+fY4rY/uZHdyj3uT25qt3LRdGzvcu3ye7K9zKX6kX8NrLHkESyEJY9gISx5BEtZYbkhlrVhefvUIawdEJY8goWw5BEsu4b1Z4ewDihVWOPbLXoS1ttr1AxhHUaysH4H7WhhOe/o6rCGHQkQ1g4ISx7Bsk9Yt+dWG4UVbHVmBGGlt09YtyrShdVvdGY8rPbr93fC2tBRwuoXhFWKksJyfiTrw7pvPZmwdnCwsPoXOwhrdwcLq0dYuztYWP0VEtbuCgrLBZEkCOvdEdZuCKshrBwIqyGsHAirIawcygnLhZGkCOvFEdZeNg3r9moAYR3ftmF9fIRvGSWsw9o2rPgto4R1WNuGFUfiheXeMoTlb8KBsDZFWA1h5UBYDWHlUEpYLo4kTVjeB6K/v0++kZSw0vtSYfWnPtx5wkqvkLDcQySJwho+BUZYmzpmWN77sghrH2WE5R4jWRWWt6dowtrHRmH9r7VZWP6eot1DWA9P4gkrvY3C+rc1FZYbiWRVWLdTx8N6j9dchJXescNqudNEWPcVF2GlV0JYwXvdk4bVLdxEWPcVF2GlV0BYbjSShGGdHGFtbpuwwvdhhWG58UhShtW+TEpYm9oorIlIXv2X3DOGdS2LsDaVOaz4LaOE9VXkDuvj47qh0emwJiNJG9alLMLaVOqwTpF4K6NRWC44mius7kWHU7hlh4mwoldPCUs0I6zzxdjhgL+KGt7V3j60r5MZuOAFrHxhtX/fcd9O3b9Nn4QVvnpKWKLnYZ3vX8LDoXtYwSrKDOuSVbxx25xh/eVcH1YkXEURVhrJw3p8wMfCchdvDxteyxvWt9P1Vt2//Ytqt+Xn9Xn9UBJhpbFtWO6mH24cVhvS7S4QVlarwvJ25udmiZ/aa9Jczbx7zP4KRanWWMk3dm2MjPu85d2AhbDkESyEJY9gISx5BAthySNYFrzyfvYOPyAsBDLtrzDriLAqQFjyCBbCkkewEJY8goWw5BEshCWPYCEseQQLYckjWAhLHsFCWPIIFsKSR7AQljyCJVVYQICwkAVhIQvCQhaEhSwIC1kQFrIgLGRBWMiCsJBFqrAmPxeW/FLb3ZJ6KTTJwpr+JGvqS213S+qlcEVYyS+Fq5RhbXLRFQ+2FhYkycKSn40UHRbPsVSpwmo37KBcbumF1LCURNTvCjs/xxouWuIt8RxrjdVhdf8iX/oQeP+OJ6xD4l+FyS+FK8JKfilc8cp7+kuh4W+FyISwkAVhIQvCQhaEhSwIC1kQFrIgLGRBWMiCsJAFYSELwkIWhIUsCAtZEBayICxkQVjIgrCQBWEhC8JCFoSFLAgLWRAWsiAsZEFYyIKwkAVhIQvCQhaEhSwIC1kQFrIgLGRBWMiCsJAFYSGL/wNxoHv5BP0/UAAAAABJRU5ErkJggg==", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 240, "width": 300 } }, "output_type": "display_data" } ], "source": [ "g_all_po = ggplot(data.frame(t_nonorth=(theta_nonorth - alpha)/se_nonorth,\n", " t_orth_nosplit=(theta_orth_po_nosplit - alpha)/se_orth_po_nosplit,\n", " t_dml=(theta_dml_po - alpha)/se_dml_po)) +\n", " geom_histogram(aes(x = t_nonorth, y=after_stat(density), colour = \"Non-orthogonal ML\", fill=\"Non-orthogonal ML\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_histogram(aes(x = t_orth_nosplit, y=after_stat(density), colour = \"Double ML (no sample splitting)\", fill=\"Double ML (no sample splitting)\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_histogram(aes(x = t_dml, y=after_stat(density), colour = \"Double ML with cross-fitting\", fill=\"Double ML with cross-fitting\"),\n", " bins = 30, alpha = 0.3) +\n", " geom_vline(aes(xintercept = 0), col = \"black\") +\n", " suppressWarnings(geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill=\"N(0, 1)\"))) +\n", " scale_color_manual(name='',\n", " breaks=c(\"Non-orthogonal ML\", \"Double ML (no sample splitting)\", \"Double ML with cross-fitting\", \"N(0, 1)\"),\n", " values=c(\"Non-orthogonal ML\"=\"dark blue\",\n", " \"Double ML (no sample splitting)\"=\"dark orange\",\n", " \"Double ML with cross-fitting\"=\"dark green\",\n", " \"N(0, 1)\"='black')) +\n", " scale_fill_manual(name='',\n", " breaks=c(\"Non-orthogonal ML\", \"Double ML (no sample splitting)\", \"Double ML with cross-fitting\", \"N(0, 1)\"),\n", " values=c(\"Non-orthogonal ML\"=\"dark blue\",\n", " \"Double ML (no sample splitting)\"=\"dark orange\",\n", " \"Double ML with cross-fitting\"=\"dark green\",\n", " \"N(0, 1)\"=NA)) +\n", " xlim(c(-6.0, 6.0)) + xlab(\"\") + ylab(\"\") + theme_minimal()\n", "g_all_po" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message in geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill = \"N(0, 1)\")):\n", "\"\u001b[1m\u001b[22mAll aesthetics have length 1, but the data has 1000 rows.\n", "\u001b[36mℹ\u001b[39m Did you mean to use `annotate()`?\"\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Warning message:\n", "\"\u001b[1m\u001b[22mRemoved 927 rows containing non-finite outside the scale range (`stat_bin()`).\"\n", "Warning message in geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill = \"N(0, 1)\")):\n", "\"\u001b[1m\u001b[22mAll aesthetics have length 1, but the data has 1000 rows.\n", "\u001b[36mℹ\u001b[39m Did you mean to use `annotate()`?\"\n", "Warning message:\n", "\"\u001b[1m\u001b[22mRemoved 927 rows containing non-finite outside the scale range (`stat_bin()`).\"\n", "Warning message:\n", "\"\u001b[1m\u001b[22mRemoved 27 rows containing non-finite outside the scale range (`stat_bin()`).\"\n", "Warning message in geom_function(fun = dnorm, aes(colour = \"N(0, 1)\", fill = \"N(0, 1)\")):\n", "\"\u001b[1m\u001b[22mAll aesthetics have length 1, but the data has 1000 rows.\n", "\u001b[36mℹ\u001b[39m Did you mean to use `annotate()`?\"\n", "Warning message:\n", "\"\u001b[1m\u001b[22mRemoved 27 rows containing non-finite outside the scale range (`stat_bin()`).\"\n" ] } ], "source": [ "# save all plots\n", "ggsave(filename = \"../guide/figures/r_non_orthogonal.svg\", plot = g_nonorth, dpi = 300, units = \"in\", width = 6, height = 4)\n", "ggsave(filename = \"../guide/figures/r_dml_nosplit.svg\", plot = g_nosplit, dpi = 300, units = \"in\", width = 6, height = 4)\n", "ggsave(filename = \"../guide/figures/r_dml.svg\", plot = g_dml, dpi = 300, units = \"in\", width = 6, height = 4)\n", "ggsave(filename = \"../guide/figures/r_all.svg\", plot = g_all, dpi = 300, units = \"in\", width = 6, height = 4)\n", "\n", "ggsave(filename = \"../guide/figures/r_dml_po_nosplit.svg\", plot = g_nosplit_po, dpi = 300, units = \"in\", width = 6, height = 4)\n", "ggsave(filename = \"../guide/figures/r_dml_po.svg\", plot = g_dml_po, dpi = 300, units = \"in\", width = 6, height = 4)\n", "ggsave(filename = \"../guide/figures/r_po_all.svg\", plot = g_all_po, dpi = 300, units = \"in\", width = 6, height = 4)" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.3.3" } }, "nbformat": 4, "nbformat_minor": 2 }