In this example, we will demonstrate the use of the DoubleML package in a real-data industry example: Estimation of price elasticity of demand. This notebook is based on a blogpost by Lars Roemheld (Roemheld, 2021) with code and preprocessed data being available from GitHub. The original data file is made available as a public domain (CC0 1.0 Universal) data set and shared on kaggle. It contains data on sales from an online retailer in the period of December 2010 until December 2011.
The data preprocessing is performed in a separate notebook that is available online. To keep the computational effort at a moderate level, we will only use a subset of the data that is used in Roemheld (2021). Our main goal is to illustrate the main steps of elasticity estimation with DoubleML.
The following case study is organized according to the steps of the DoubleML workflow.
“Supply” and “demand” are probably the very first terms that economics and business students hear in their studies. In industry, the price elasticity of demand is a very important quantity: It indicates how much the demand for a product (= the quantity sold by the firm) changes due to a change in its price. As a retailer, this quantity is of great interest because it makes it possible to increase revenues, and eventually profits, by optimally adjusting prices according to elasticities.
The price elasticity of demand is formally defined as the relative change of the demanded quantity (\(q\)) of a product given a percent-change of the price (\(p\))
\[\theta_0 = \frac{\partial q/q}{\partial p/p}.\]
In words, the parameter \(\theta_0\) can be interpreted as follows: Provided the price for a product increases by \(1\%\), the demanded quantity changes by \(\theta_0\%\).
In general, it would be possible to estimate \(\theta_0\) based on an experiment or A/B test. However, this is not possible in our case as the data set only contains information on actual purchases in the period of consideration.
The causal problem of price estimation based on an observational study is quite complex: It involves many (simultaneous) decisions made by the customers and the sellers. One approach for estimation of the causal parameter \(\theta_0\) would be to account for confounding variables, that might have an effect to both the price and the quantity sold. The approach taken in Roemheld (2021) is to flexibly account for and construct confounding variables, for example including similarities in their product description or seasonal patterns, and thereby justifying identification of \(\theta_0\).
We can use a partially linear regression (PLR) model for estimation of \(\theta_0\)
\[\log Q = \theta_0 \log P + g_0(X) + \zeta,\]
with \(\mathbb{E}(\zeta|D,X)=0\). The confounders can enter the regression equation nonlinearily via the function \(g_0(X)\). In order to equip \(\theta_0\) (approximately) with the interpretation of a price elasticity, we applied the \(\log()\) to both the demanded quantity (\(Q\)) and the prices (\(P\)), i.e., we set up a \(\log\)-\(\log\)-regression.
Before we proceed with the data analysis, it is important to mention a potential drawback to our analysis: The data only contains information on sales, not on stock days. Hence, based on this data set, it is not possible to assess what happened on days without sales (sales = 0). This drawback must be kept in mind when we draw causal conclusions from this analysis.
To give an idea on the general setting we briefly load an exemplary data excerpt from the original data set. We can see that the data lists the transaction of a (online) retailer selling products like inflatable political globes or fancy pens.
# load required packages
library(data.table)
library(mlr3)
library(mlr3learners)
library(DoubleML)
library(ggplot2)
# suppress messages during fitting
lgr::get_logger("mlr3")$set_threshold("warn")
# Load example data set from URL
url = 'https://raw.githubusercontent.com/DoubleML/doubleml-docs/master/doc/examples/data/orig_demand_data_example.csv'
data_example = fread(url)
data_example
## V1 Date StockCode Country Description
## 1: 0 2010-12-01 10002 France INFLATABLE POLITICAL GLOBE
## 2: 1 2010-12-01 10002 United Kingdom INFLATABLE POLITICAL GLOBE
## 3: 2 2010-12-01 10125 United Kingdom MINI FUNKY DESIGN TAPES
## 4: 3 2010-12-01 10133 United Kingdom COLOURING PENCILS BROWN TUBE
## 5: 4 2010-12-01 10135 United Kingdom COLOURING PENCILS BROWN TUBE
## 6: 5 2010-12-01 11001 United Kingdom ASSTD DESIGN RACING CAR PEN
## 7: 6 2010-12-01 15044B United Kingdom BLUE PAPER PARASOL
## 8: 7 2010-12-01 15056BL United Kingdom EDWARDIAN PARASOL BLACK
## 9: 8 2010-12-01 15056N United Kingdom EDWARDIAN PARASOL NATURAL
## 10: 9 2010-12-01 15056P United Kingdom EDWARDIAN PARASOL PINK
## Quantity revenue UnitPrice
## 1: 48 40.80 0.850
## 2: 12 10.20 0.850
## 3: 2 1.70 0.850
## 4: 5 4.25 0.850
## 5: 1 2.51 2.510
## 6: 3 10.08 3.360
## 7: 1 2.95 2.950
## 8: 20 113.00 5.650
## 9: 50 236.30 4.726
## 10: 48 220.80 4.600
In our analysis, we will use a preprocessed data set. Each row corresponds to the sales of a product at a specific date \(t\).
In the data we have,
Quantity
: Quantity demandedrevenue
: RevenueUnitPrice
: Price per unitmonth
: MonthDoM
: Day of monthDoW
: Day of weekstock_age_days
: Number of days product has been sold /
observed in the datasku_avg_p
: Average (=median) price of the product2010-12-01
, …: Date dummiesAustralia
, …: Country dummies1
, 2
, … : Numerical variables constructed
to capture similarities in product descriptions (n-grams)dLnP
: Change in PricedLnQ
: Change in QuantityNote that we do not include product dummies as the price and quantity variables have been demeaned to account for product characteristics.
# Load data set from URL
url2 = 'https://raw.githubusercontent.com/DoubleML/doubleml-docs/master/doc/examples/data/elasticity_subset.csv'
demand_data = fread(url2)
# Replace column names by names that are conform to R
names(demand_data) = make.names(names(demand_data), unique = TRUE)
# replace
names(demand_data)[1:20]
## [1] "V1" "Quantity" "revenue"
## [4] "UnitPrice" "month" "DoM"
## [7] "DoW" "stock_age_days" "sku_avg_p"
## [10] "X2010.12.01.00.00.00" "X2010.12.02.00.00.00" "X2010.12.03.00.00.00"
## [13] "X2010.12.05.00.00.00" "X2010.12.06.00.00.00" "X2010.12.07.00.00.00"
## [16] "X2010.12.08.00.00.00" "X2010.12.09.00.00.00" "X2010.12.10.00.00.00"
## [19] "X2010.12.12.00.00.00" "X2010.12.13.00.00.00"
# Print dimensions of data set
dim(demand_data)
## [1] 10000 906
# Glimpse at first rows of data set
head(demand_data[,1:10])
## V1 Quantity revenue UnitPrice month DoM DoW stock_age_days sku_avg_p
## 1: 189628 5 8.15 1.630000 9 6 1 278 0.85
## 2: 37914 19 41.93 2.206842 1 25 1 55 2.10
## 3: 80103 24 20.40 0.850000 3 31 3 120 0.85
## 4: 75019 12 23.40 1.950000 3 24 3 113 2.08
## 5: 99878 4 39.80 9.950000 5 5 3 155 9.95
## 6: 20013 1 1.25 1.250000 12 23 3 22 1.25
## X2010.12.01.00.00.00
## 1: 0
## 2: 0
## 3: 0
## 4: 0
## 5: 0
## 6: 0
To initiate the data backend, we create a new DoubleMLData
object. During instantiation, we assign the roles of the variables,
i.e., dLnQ
as the dependent var, dLnP
as the
treatment variable and the remaining variables as confounders.
feature_names = names(demand_data)[5:(dim(demand_data)[2]-2)]
data_dml = DoubleMLData$new(demand_data,
y_col = "dLnQ",
d_cols = 'dLnP',
x_cols = feature_names)
data_dml
## ================= DoubleMLData Object ==================
##
##
## ------------------ Data summary ------------------
## Outcome variable: dLnQ
## Treatment variable(s): dLnP
## Covariates: month, DoM, DoW, stock_age_days, sku_avg_p, X2010.12.01.00.00.00, X2010.12.02.00.00.00, X2010.12.03.00.00.00, X2010.12.05.00.00.00, X2010.12.06.00.00.00, X2010.12.07.00.00.00, X2010.12.08.00.00.00, X2010.12.09.00.00.00, X2010.12.10.00.00.00, X2010.12.12.00.00.00, X2010.12.13.00.00.00, X2010.12.14.00.00.00, X2010.12.15.00.00.00, X2010.12.16.00.00.00, X2010.12.17.00.00.00, X2010.12.19.00.00.00, X2010.12.20.00.00.00, X2010.12.21.00.00.00, X2010.12.22.00.00.00, X2010.12.23.00.00.00, X2011.01.04.00.00.00, X2011.01.05.00.00.00, X2011.01.06.00.00.00, X2011.01.07.00.00.00, X2011.01.09.00.00.00, X2011.01.10.00.00.00, X2011.01.11.00.00.00, X2011.01.12.00.00.00, X2011.01.13.00.00.00, X2011.01.14.00.00.00, X2011.01.16.00.00.00, X2011.01.17.00.00.00, X2011.01.18.00.00.00, X2011.01.19.00.00.00, X2011.01.20.00.00.00, X2011.01.21.00.00.00, X2011.01.23.00.00.00, X2011.01.24.00.00.00, X2011.01.25.00.00.00, X2011.01.26.00.00.00, X2011.01.27.00.00.00, X2011.01.28.00.00.00, X2011.01.30.00.00.00, X2011.01.31.00.00.00, X2011.02.01.00.00.00, X2011.02.02.00.00.00, X2011.02.03.00.00.00, X2011.02.04.00.00.00, X2011.02.06.00.00.00, X2011.02.07.00.00.00, X2011.02.08.00.00.00, X2011.02.09.00.00.00, X2011.02.10.00.00.00, X2011.02.11.00.00.00, X2011.02.13.00.00.00, X2011.02.14.00.00.00, X2011.02.15.00.00.00, X2011.02.16.00.00.00, X2011.02.17.00.00.00, X2011.02.18.00.00.00, X2011.02.20.00.00.00, X2011.02.21.00.00.00, X2011.02.22.00.00.00, X2011.02.23.00.00.00, X2011.02.24.00.00.00, X2011.02.25.00.00.00, X2011.02.27.00.00.00, X2011.02.28.00.00.00, X2011.03.01.00.00.00, X2011.03.02.00.00.00, X2011.03.03.00.00.00, X2011.03.04.00.00.00, X2011.03.06.00.00.00, X2011.03.07.00.00.00, X2011.03.08.00.00.00, X2011.03.09.00.00.00, X2011.03.10.00.00.00, X2011.03.11.00.00.00, X2011.03.13.00.00.00, X2011.03.14.00.00.00, X2011.03.15.00.00.00, X2011.03.16.00.00.00, X2011.03.17.00.00.00, X2011.03.18.00.00.00, X2011.03.20.00.00.00, X2011.03.21.00.00.00, X2011.03.22.00.00.00, X2011.03.23.00.00.00, X2011.03.24.00.00.00, X2011.03.25.00.00.00, X2011.03.27.00.00.00, X2011.03.28.00.00.00, X2011.03.29.00.00.00, X2011.03.30.00.00.00, X2011.03.31.00.00.00, X2011.04.01.00.00.00, X2011.04.03.00.00.00, X2011.04.04.00.00.00, X2011.04.05.00.00.00, X2011.04.06.00.00.00, X2011.04.07.00.00.00, X2011.04.08.00.00.00, X2011.04.10.00.00.00, X2011.04.11.00.00.00, X2011.04.12.00.00.00, X2011.04.13.00.00.00, X2011.04.14.00.00.00, X2011.04.15.00.00.00, X2011.04.17.00.00.00, X2011.04.18.00.00.00, X2011.04.19.00.00.00, X2011.04.20.00.00.00, X2011.04.21.00.00.00, X2011.04.26.00.00.00, X2011.04.27.00.00.00, X2011.04.28.00.00.00, X2011.05.01.00.00.00, X2011.05.03.00.00.00, X2011.05.04.00.00.00, X2011.05.05.00.00.00, X2011.05.06.00.00.00, X2011.05.08.00.00.00, X2011.05.09.00.00.00, X2011.05.10.00.00.00, X2011.05.11.00.00.00, X2011.05.12.00.00.00, X2011.05.13.00.00.00, X2011.05.15.00.00.00, X2011.05.16.00.00.00, X2011.05.17.00.00.00, X2011.05.18.00.00.00, X2011.05.19.00.00.00, X2011.05.20.00.00.00, X2011.05.22.00.00.00, X2011.05.23.00.00.00, X2011.05.24.00.00.00, X2011.05.25.00.00.00, X2011.05.26.00.00.00, X2011.05.27.00.00.00, X2011.05.29.00.00.00, X2011.05.31.00.00.00, X2011.06.01.00.00.00, X2011.06.02.00.00.00, X2011.06.03.00.00.00, X2011.06.05.00.00.00, X2011.06.06.00.00.00, X2011.06.07.00.00.00, X2011.06.08.00.00.00, X2011.06.09.00.00.00, X2011.06.10.00.00.00, X2011.06.12.00.00.00, X2011.06.13.00.00.00, X2011.06.14.00.00.00, X2011.06.15.00.00.00, X2011.06.16.00.00.00, X2011.06.17.00.00.00, X2011.06.19.00.00.00, X2011.06.20.00.00.00, X2011.06.21.00.00.00, X2011.06.22.00.00.00, X2011.06.23.00.00.00, X2011.06.24.00.00.00, X2011.06.26.00.00.00, X2011.06.27.00.00.00, X2011.06.28.00.00.00, X2011.06.29.00.00.00, X2011.06.30.00.00.00, X2011.07.01.00.00.00, X2011.07.03.00.00.00, X2011.07.04.00.00.00, X2011.07.05.00.00.00, X2011.07.06.00.00.00, X2011.07.07.00.00.00, X2011.07.08.00.00.00, X2011.07.10.00.00.00, X2011.07.11.00.00.00, X2011.07.12.00.00.00, X2011.07.13.00.00.00, X2011.07.14.00.00.00, X2011.07.15.00.00.00, X2011.07.17.00.00.00, X2011.07.18.00.00.00, X2011.07.19.00.00.00, X2011.07.20.00.00.00, X2011.07.21.00.00.00, X2011.07.22.00.00.00, X2011.07.24.00.00.00, X2011.07.25.00.00.00, X2011.07.26.00.00.00, X2011.07.27.00.00.00, X2011.07.28.00.00.00, X2011.07.29.00.00.00, X2011.07.31.00.00.00, X2011.08.01.00.00.00, X2011.08.02.00.00.00, X2011.08.03.00.00.00, X2011.08.04.00.00.00, X2011.08.05.00.00.00, X2011.08.07.00.00.00, X2011.08.08.00.00.00, X2011.08.09.00.00.00, X2011.08.10.00.00.00, X2011.08.11.00.00.00, X2011.08.12.00.00.00, X2011.08.14.00.00.00, X2011.08.15.00.00.00, X2011.08.16.00.00.00, X2011.08.17.00.00.00, X2011.08.18.00.00.00, X2011.08.19.00.00.00, X2011.08.21.00.00.00, X2011.08.22.00.00.00, X2011.08.23.00.00.00, X2011.08.24.00.00.00, X2011.08.25.00.00.00, X2011.08.26.00.00.00, X2011.08.28.00.00.00, X2011.08.30.00.00.00, X2011.08.31.00.00.00, X2011.09.01.00.00.00, X2011.09.02.00.00.00, X2011.09.04.00.00.00, X2011.09.05.00.00.00, X2011.09.06.00.00.00, X2011.09.07.00.00.00, X2011.09.08.00.00.00, X2011.09.09.00.00.00, X2011.09.11.00.00.00, X2011.09.12.00.00.00, X2011.09.13.00.00.00, X2011.09.14.00.00.00, X2011.09.15.00.00.00, X2011.09.16.00.00.00, X2011.09.18.00.00.00, X2011.09.19.00.00.00, X2011.09.20.00.00.00, X2011.09.21.00.00.00, X2011.09.22.00.00.00, X2011.09.23.00.00.00, X2011.09.25.00.00.00, X2011.09.26.00.00.00, X2011.09.27.00.00.00, X2011.09.28.00.00.00, X2011.09.29.00.00.00, X2011.09.30.00.00.00, X2011.10.02.00.00.00, X2011.10.03.00.00.00, X2011.10.04.00.00.00, X2011.10.05.00.00.00, X2011.10.06.00.00.00, X2011.10.07.00.00.00, X2011.10.09.00.00.00, X2011.10.10.00.00.00, X2011.10.11.00.00.00, X2011.10.12.00.00.00, X2011.10.13.00.00.00, X2011.10.14.00.00.00, X2011.10.16.00.00.00, X2011.10.17.00.00.00, X2011.10.18.00.00.00, X2011.10.19.00.00.00, X2011.10.20.00.00.00, X2011.10.21.00.00.00, X2011.10.23.00.00.00, X2011.10.24.00.00.00, X2011.10.25.00.00.00, X2011.10.26.00.00.00, X2011.10.27.00.00.00, X2011.10.28.00.00.00, X2011.10.30.00.00.00, X2011.10.31.00.00.00, X2011.11.01.00.00.00, X2011.11.02.00.00.00, X2011.11.03.00.00.00, X2011.11.04.00.00.00, X2011.11.06.00.00.00, X2011.11.07.00.00.00, X2011.11.08.00.00.00, X2011.11.09.00.00.00, X2011.11.10.00.00.00, X2011.11.11.00.00.00, X2011.11.13.00.00.00, X2011.11.14.00.00.00, X2011.11.15.00.00.00, X2011.11.16.00.00.00, X2011.11.17.00.00.00, X2011.11.18.00.00.00, X2011.11.20.00.00.00, X2011.11.21.00.00.00, X2011.11.22.00.00.00, X2011.11.23.00.00.00, X2011.11.24.00.00.00, X2011.11.25.00.00.00, X2011.11.27.00.00.00, X2011.11.28.00.00.00, X2011.11.29.00.00.00, X2011.11.30.00.00.00, X2011.12.01.00.00.00, X2011.12.02.00.00.00, X2011.12.04.00.00.00, X2011.12.05.00.00.00, X2011.12.06.00.00.00, X2011.12.07.00.00.00, X2011.12.08.00.00.00, X2011.12.09.00.00.00, Australia, Austria, Bahrain, Belgium, Brazil, Canada, Channel.Islands, Cyprus, Czech.Republic, Denmark, EIRE, European.Community, Finland, France, Germany, Greece, Hong.Kong, Iceland, Israel, Italy, Japan, Lebanon, Lithuania, Malta, Netherlands, Norway, Poland, Portugal, RSA, Saudi.Arabia, Singapore, Spain, Sweden, Switzerland, USA, United.Arab.Emirates, United.Kingdom, Unspecified, X0, X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X29, X30, X31, X32, X33, X34, X35, X36, X37, X38, X39, X40, X41, X42, X43, X44, X45, X46, X47, X48, X49, X50, X51, X52, X53, X54, X55, X56, X57, X58, X59, X60, X61, X62, X63, X64, X65, X66, X67, X68, X69, X70, X71, X72, X73, X74, X75, X76, X77, X78, X79, X80, X81, X82, X83, X84, X85, X86, X87, X88, X89, X90, X91, X92, X93, X94, X95, X96, X97, X98, X99, X100, X101, X102, X103, X104, X105, X106, X107, X108, X109, X110, X111, X112, X113, X114, X115, X116, X117, X118, X119, X120, X121, X122, X123, X124, X125, X126, X127, X128, X129, X130, X131, X132, X133, X134, X135, X136, X137, X138, X139, X140, X141, X142, X143, X144, X145, X146, X147, X148, X149, X150, X151, X152, X153, X154, X155, X156, X157, X158, X159, X160, X161, X162, X163, X164, X165, X166, X167, X168, X169, X170, X171, X172, X173, X174, X175, X176, X177, X178, X179, X180, X181, X182, X183, X184, X185, X186, X187, X188, X189, X190, X191, X192, X193, X194, X195, X196, X197, X198, X199, X200, X201, X202, X203, X204, X205, X206, X207, X208, X209, X210, X211, X212, X213, X214, X215, X216, X217, X218, X219, X220, X221, X222, X223, X224, X225, X226, X227, X228, X229, X230, X231, X232, X233, X234, X235, X236, X237, X238, X239, X240, X241, X242, X243, X244, X245, X246, X247, X248, X249, X250, X251, X252, X253, X254, X255, X256, X257, X258, X259, X260, X261, X262, X263, X264, X265, X266, X267, X268, X269, X270, X271, X272, X273, X274, X275, X276, X277, X278, X279, X280, X281, X282, X283, X284, X285, X286, X287, X288, X289, X290, X291, X292, X293, X294, X295, X296, X297, X298, X299, X300, X301, X302, X303, X304, X305, X306, X307, X308, X309, X310, X311, X312, X313, X314, X315, X316, X317, X318, X319, X320, X321, X322, X323, X324, X325, X326, X327, X328, X329, X330, X331, X332, X333, X334, X335, X336, X337, X338, X339, X340, X341, X342, X343, X344, X345, X346, X347, X348, X349, X350, X351, X352, X353, X354, X355, X356, X357, X358, X359, X360, X361, X362, X363, X364, X365, X366, X367, X368, X369, X370, X371, X372, X373, X374, X375, X376, X377, X378, X379, X380, X381, X382, X383, X384, X385, X386, X387, X388, X389, X390, X391, X392, X393, X394, X395, X396, X397, X398, X399, X400, X401, X402, X403, X404, X405, X406, X407, X408, X409, X410, X411, X412, X413, X414, X415, X416, X417, X418, X419, X420, X421, X422, X423, X424, X425, X426, X427, X428, X429, X430, X431, X432, X433, X434, X435, X436, X437, X438, X439, X440, X441, X442, X443, X444, X445, X446, X447, X448, X449, X450, X451, X452, X453, X454, X455, X456, X457, X458, X459, X460, X461, X462, X463, X464, X465, X466, X467, X468, X469, X470, X471, X472, X473, X474, X475, X476, X477, X478, X479, X480, X481, X482, X483, X484, X485, X486, X487, X488, X489, X490, X491, X492, X493, X494, X495, X496, X497, X498, X499, X500, X501, X502, X503, X504, X505, X506, X507, X508, X509, X510, X511, X512, X513, X514, X515, X516, X517, X518, X519, X520, X521, X522, X523, X524, X525, X526, X527, X528, X529, X530, X531, X532, X533, X534, X535, X536, X537, X538, X539, X540, X541, X542, X543, X544, X545, X546, X547, X548, X549, X550, X551
## Instrument(s):
## No. Observations: 10000
We already stated that a partially linear regression model in a \(\log\)-\(\log\)-specification will allow us to interpret the regression coefficient \(\theta_0\) as the price elasticity of demand. We restate the main regression as well as the auxiliary regression that is required for orthogonality
\[\begin{aligned}\log Q &= \theta_0 \log P + g_0(X) + \zeta,\\ \log P &= m_0(X) + V\end{aligned},\]
with \(\mathbb{E}(\zeta|D,X)=0\) and \(\mathbb{E}(V|X)=0\). As stated above, we hope to justify the assumption \(\mathbb{E}(\zeta|D,X)=0\) by sufficiently accounting for the confounding variables \(X\).
We start with the linear regression model as a benchmark lerner for learning nuisance parameters \(g_0(X)\) and \(m_0(X)\). We additionally set up two models based on a lasso learner as well as a random forest learner and compare our results.
ml_l_lin_reg = lrn("regr.lm")
ml_m_lin_reg = lrn("regr.lm")
ml_l_lasso = lrn("regr.cv_glmnet", s = "lambda.min")
ml_m_lasso = lrn("regr.cv_glmnet", s = "lambda.min")
ml_l_forest = lrn("regr.ranger", num.trees = 50,
min.node.size = 3)
ml_m_forest = lrn("regr.ranger", num.trees = 50,
min.node.size = 3)
For each learner configuration, we initialize a new DoubleMLPLR
object. We stick to the default options, i.e.,
dml_procedure = 'dml2'
,
score = "partialling out"
, n_folds = 5.
set.seed(123)
dml_plr_lin_reg = DoubleMLPLR$new(data_dml,
ml_l = ml_l_lin_reg,
ml_m = ml_m_lin_reg)
set.seed(123)
dml_plr_lasso = DoubleMLPLR$new(data_dml,
ml_l = ml_l_lasso,
ml_m = ml_m_lasso)
set.seed(123)
dml_plr_forest = DoubleMLPLR$new(data_dml,
ml_l = ml_l_forest,
ml_m = ml_m_forest)
To estimate our target parameter \(\theta_0\), we call the fit()
method. The results can be summarized by calling the
summary()
method.
dml_plr_lin_reg$fit(store_predictions = TRUE)
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen
dml_plr_lin_reg$summary()
## Estimates and significance testing of the effect of target variables
## Estimate. Std. Error t value Pr(>|t|)
## dLnP -1.83549 0.04343 -42.26 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
dml_plr_lasso$fit(store_predictions = TRUE)
dml_plr_lasso$summary()
## Estimates and significance testing of the effect of target variables
## Estimate. Std. Error t value Pr(>|t|)
## dLnP -1.81615 0.04381 -41.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
dml_plr_forest$fit(store_predictions = TRUE)
dml_plr_forest$summary()
## Estimates and significance testing of the effect of target variables
## Estimate. Std. Error t value Pr(>|t|)
## dLnP -1.80559 0.04359 -41.43 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Define function for RMSE of nuisance components
pred_acc_plr = function(obj, nuis) {
# A function to calculate prediction accuracy values for every repetition
# of a Double Machine Learning model using PLR, DoubleMLPLR
# Parameters
# DoubleML: DoubleML::DoubleMLPLR
# A DoubleML PLR model object
# nuis: character (1l)
# Indicates nuisance component for evaluation of RMSE, either
# 'ml_l' or 'ml_m'
# Export data, fitted coefficient and predictions of the DoubleML model
y = obj$data$data_model[[obj$data$y_col]]
d = obj$data$data_model[[obj$data$treat_col]]
theta = obj$coef
ml_nuis = obj$predictions[[nuis]]
# Dimensions of prediction array
h = obj$data$n_obs
if (nuis == "ml_l") {
export_pred = theta*d + ml_nuis
} else if (nuis == "ml_m") {
export_pred = ml_nuis
}
rmse = mlr3measures::rmse(y, ml_nuis)
return(rmse)
}
rmse_lin_reg_ml_l = pred_acc_plr(dml_plr_lin_reg, 'ml_l')
rmse_lin_reg_ml_m = pred_acc_plr(dml_plr_lin_reg, 'ml_m')
rmse_lasso_ml_l = pred_acc_plr(dml_plr_lasso, 'ml_l')
rmse_lasso_ml_m = pred_acc_plr(dml_plr_lasso, 'ml_m')
rmse_forest_ml_l = pred_acc_plr(dml_plr_forest, 'ml_l')
rmse_forest_ml_m = pred_acc_plr(dml_plr_forest, 'ml_m')
estimators = c("regression", "lasso", "forest")
estimators = ordered(estimators, levels = estimators)
plr_rmse = data.table(
"ML" = estimators,
"RMSE" = c(rmse_lin_reg_ml_l, rmse_lasso_ml_l,
rmse_forest_ml_l,
rmse_lin_reg_ml_m, rmse_lasso_ml_m,
rmse_forest_ml_m),
"nuis" = c(rep("ml_l", 3), rep("ml_m", 3)))
plr_rmse
## ML RMSE nuis
## 1: regression 1.133228 ml_l
## 2: lasso 1.104634 ml_l
## 3: forest 1.087215 ml_l
## 4: regression 1.295019 ml_m
## 5: lasso 1.287731 ml_m
## 6: forest 1.289340 ml_m
g_rmse_ml_l = ggplot(plr_rmse[nuis == 'ml_l',], aes(x = ML, y = RMSE,
fill = ML)) +
geom_point(size = 5, color = "darkblue") +
theme_minimal() + ylab("RMSE") +
ggtitle("RMSE, ml_l") +
xlab("learner") + theme(legend.position = "none")
g_rmse_ml_l
g_rmse_ml_m = ggplot(plr_rmse[nuis == 'ml_m',], aes(x = ML, y = RMSE,
fill = ML)) +
geom_point(size = 5, color = "darkblue") +
theme_minimal() + ylab("RMSE") +
ggtitle("RMSE, ml_m") +
xlab("learner") + theme(legend.position = "none")
g_rmse_ml_m
We can visualize and summarize our findings so far. We can conclude that the price elasticity of demand, as indicated by the causal parameter \(\theta_0\), is around \(-1.8\). In all models, the coefficient is significantly different from zero.
models = list(dml_plr_lin_reg, dml_plr_lasso, dml_plr_forest)
plr_summary_list = lapply(models,
function(x) {
ci = x$confint()
return(list("coef" = x$coef,
"lower" = ci[1],
"upper" = ci[2]))
})
plr_summary = data.table::rbindlist(plr_summary_list)
plr_summary[, "ML" := estimators]
plr_summary
## coef lower upper ML
## 1: -1.835487 -1.920613 -1.750361 regression
## 2: -1.816147 -1.902015 -1.730280 lasso
## 3: -1.805594 -1.891020 -1.720169 forest
g_ci = ggplot(plr_summary, aes(x = ML, y = coef)) +
geom_point() +
geom_errorbar(aes(ymin = lower, ymax = upper), color = "grey") +
theme_minimal() + ylab("Coefficients and 0.95-CI") +
xlab("learner") +
theme(axis.text.x = element_text(angle = 90), legend.position = "none",
text = element_text(size = 20)) +
geom_hline(yintercept = 0, color = "darkgrey") +
ylim(-3, 0.1)
g_ci
Acknowledgement
We would like to thank Lars Roemheld for setting up the blog post on demand estimation using double machine learning as well as for sharing the code and preprocessed data set. We hope that with this notebook, we illustrate how to run such an analysis using DoubleML. Moreover, we would like to thank Anzony Quispe for excellent assistance in creating this notebook.