R: Estimation of Price Elasticities with Double/Debiased Machine Learning

In this example, we will demonstrate the use of the DoubleML package in a real-data industry example: Estimation of price elasticity of demand. This notebook is based on a blogpost by Lars Roemheld (Roemheld, 2021) with code and preprocessed data being available from GitHub. The original data file is made available as a public domain (CC0 1.0 Universal) data set and shared on kaggle. It contains data on sales from an online retailer in the period of December 2010 until December 2011.

The data preprocessing is performed in a separate notebook that is available online. To keep the computational effort at a moderate level, we will only use a subset of the data that is used in Roemheld (2021). Our main goal is to illustrate the main steps of elasticity estimation with DoubleML.

The following case study is organized according to the steps of the DoubleML workflow.

0. Problem Formulation: Estimation of Price Elasticity of Demand

“Supply” and “demand” are probably the very first terms that economics and business students hear in their studies. In industry, the price elasticity of demand is a very important quantity: It indicates how much the demand for a product (= the quantity sold by the firm) changes due to a change in its price. As a retailer, this quantity is of great interest because it makes it possible to increase revenues, and eventually profits, by optimally adjusting prices according to elasticities.

The price elasticity of demand is formally defined as the relative change of the demanded quantity (\(q\)) of a product given a percent-change of the price (\(p\))

\[\theta_0 = \frac{\partial q/q}{\partial p/p}.\]

In words, the parameter \(\theta_0\) can be interpreted as follows: Provided the price for a product increases by \(1\%\), the demanded quantity changes by \(\theta_0\%\).

In general, it would be possible to estimate \(\theta_0\) based on an experiment or A/B test. However, this is not possible in our case as the data set only contains information on actual purchases in the period of consideration.

The causal problem of price estimation based on an observational study is quite complex: It involves many (simultaneous) decisions made by the customers and the sellers. One approach for estimation of the causal parameter \(\theta_0\) would be to account for confounding variables, that might have an effect to both the price and the quantity sold. The approach taken in Roemheld (2021) is to flexibly account for and construct confounding variables, for example including similarities in their product description or seasonal patterns, and thereby justifying identification of \(\theta_0\).

We can use a partially linear regression (PLR) model for estimation of \(\theta_0\)

\[\log Q = \theta_0 \log P + g_0(X) + \zeta,\]

with \(\mathbb{E}(\zeta|D,X)=0\). The confounders can enter the regression equation nonlinearily via the function \(g_0(X)\). In order to equip \(\theta_0\) (approximately) with the interpretation of a price elasticity, we applied the \(\log()\) to both the demanded quantity (\(Q\)) and the prices (\(P\)), i.e., we set up a \(\log\)-\(\log\)-regression.

Before we proceed with the data analysis, it is important to mention a potential drawback to our analysis: The data only contains information on sales, not on stock days. Hence, based on this data set, it is not possible to assess what happened on days without sales (sales = 0). This drawback must be kept in mind when we draw causal conclusions from this analysis.

1. Data-Backend

To give an idea on the general setting we briefly load an exemplary data excerpt from the original data set. We can see that the data lists the transaction of a (online) retailer selling products like inflatable political globes or fancy pens.

# load required packages
library(data.table)
library(mlr3)
library(mlr3learners)
library(DoubleML)
library(ggplot2)

# suppress messages during fitting
lgr::get_logger("mlr3")$set_threshold("warn")

# Load example data set from URL
url = 'https://raw.githubusercontent.com/DoubleML/doubleml-docs/master/doc/examples/data/orig_demand_data_example.csv'
data_example = fread(url)
data_example

##     V1       Date StockCode        Country                  Description
##  1:  0 2010-12-01     10002         France   INFLATABLE POLITICAL GLOBE
##  2:  1 2010-12-01     10002 United Kingdom   INFLATABLE POLITICAL GLOBE
##  3:  2 2010-12-01     10125 United Kingdom      MINI FUNKY DESIGN TAPES
##  4:  3 2010-12-01     10133 United Kingdom COLOURING PENCILS BROWN TUBE
##  5:  4 2010-12-01     10135 United Kingdom COLOURING PENCILS BROWN TUBE
##  6:  5 2010-12-01     11001 United Kingdom  ASSTD DESIGN RACING CAR PEN
##  7:  6 2010-12-01    15044B United Kingdom           BLUE PAPER PARASOL
##  8:  7 2010-12-01   15056BL United Kingdom      EDWARDIAN PARASOL BLACK
##  9:  8 2010-12-01    15056N United Kingdom    EDWARDIAN PARASOL NATURAL
## 10:  9 2010-12-01    15056P United Kingdom       EDWARDIAN PARASOL PINK
##     Quantity revenue UnitPrice
##  1:       48   40.80     0.850
##  2:       12   10.20     0.850
##  3:        2    1.70     0.850
##  4:        5    4.25     0.850
##  5:        1    2.51     2.510
##  6:        3   10.08     3.360
##  7:        1    2.95     2.950
##  8:       20  113.00     5.650
##  9:       50  236.30     4.726
## 10:       48  220.80     4.600

In our analysis, we will use a preprocessed data set. Each row corresponds to the sales of a product at a specific date \(t\).

In the data we have,

Quantity: Quantity demanded
revenue: Revenue
UnitPrice: Price per unit
month: Month
DoM: Day of month
DoW: Day of week
stock_age_days: Number of days product has been sold / observed in the data
sku_avg_p: Average (=median) price of the product
2010-12-01, …: Date dummies
Australia, …: Country dummies
1, 2, … : Numerical variables constructed to capture similarities in product descriptions (n-grams)
dLnP: Change in Price
dLnQ: Change in Quantity

Note that we do not include product dummies as the price and quantity variables have been demeaned to account for product characteristics.

# Load data set from URL
url2 = 'https://raw.githubusercontent.com/DoubleML/doubleml-docs/master/doc/examples/data/elasticity_subset.csv'
demand_data = fread(url2)

# Replace column names by names that are conform to R
names(demand_data) = make.names(names(demand_data), unique = TRUE)

# replace
names(demand_data)[1:20]

##  [1] "V1"                   "Quantity"             "revenue"             
##  [4] "UnitPrice"            "month"                "DoM"                 
##  [7] "DoW"                  "stock_age_days"       "sku_avg_p"           
## [10] "X2010.12.01.00.00.00" "X2010.12.02.00.00.00" "X2010.12.03.00.00.00"
## [13] "X2010.12.05.00.00.00" "X2010.12.06.00.00.00" "X2010.12.07.00.00.00"
## [16] "X2010.12.08.00.00.00" "X2010.12.09.00.00.00" "X2010.12.10.00.00.00"
## [19] "X2010.12.12.00.00.00" "X2010.12.13.00.00.00"

# Print dimensions of data set
dim(demand_data)

## [1] 10000   906

# Glimpse at first rows of data set
head(demand_data[,1:10])

##        V1 Quantity revenue UnitPrice month DoM DoW stock_age_days sku_avg_p
## 1: 189628        5    8.15  1.630000     9   6   1            278      0.85
## 2:  37914       19   41.93  2.206842     1  25   1             55      2.10
## 3:  80103       24   20.40  0.850000     3  31   3            120      0.85
## 4:  75019       12   23.40  1.950000     3  24   3            113      2.08
## 5:  99878        4   39.80  9.950000     5   5   3            155      9.95
## 6:  20013        1    1.25  1.250000    12  23   3             22      1.25
##    X2010.12.01.00.00.00
## 1:                    0
## 2:                    0
## 3:                    0
## 4:                    0
## 5:                    0
## 6:                    0

To initiate the data backend, we create a new DoubleMLData object. During instantiation, we assign the roles of the variables, i.e., dLnQ as the dependent var, dLnP as the treatment variable and the remaining variables as confounders.

feature_names = names(demand_data)[5:(dim(demand_data)[2]-2)]

data_dml = DoubleMLData$new(demand_data,
                            y_col = "dLnQ",
                            d_cols = 'dLnP',
                            x_cols = feature_names)
data_dml

## ================= DoubleMLData Object ==================
## 
## 
## ------------------ Data summary      ------------------
## Outcome variable: dLnQ
## Treatment variable(s): dLnP
## Covariates: month, DoM, DoW, stock_age_days, sku_avg_p, X2010.12.01.00.00.00, X2010.12.02.00.00.00, X2010.12.03.00.00.00, X2010.12.05.00.00.00, X2010.12.06.00.00.00, X2010.12.07.00.00.00, X2010.12.08.00.00.00, X2010.12.09.00.00.00, X2010.12.10.00.00.00, X2010.12.12.00.00.00, X2010.12.13.00.00.00, X2010.12.14.00.00.00, X2010.12.15.00.00.00, X2010.12.16.00.00.00, X2010.12.17.00.00.00, X2010.12.19.00.00.00, X2010.12.20.00.00.00, X2010.12.21.00.00.00, X2010.12.22.00.00.00, X2010.12.23.00.00.00, X2011.01.04.00.00.00, X2011.01.05.00.00.00, X2011.01.06.00.00.00, X2011.01.07.00.00.00, X2011.01.09.00.00.00, X2011.01.10.00.00.00, X2011.01.11.00.00.00, X2011.01.12.00.00.00, X2011.01.13.00.00.00, X2011.01.14.00.00.00, X2011.01.16.00.00.00, X2011.01.17.00.00.00, X2011.01.18.00.00.00, X2011.01.19.00.00.00, X2011.01.20.00.00.00, X2011.01.21.00.00.00, X2011.01.23.00.00.00, X2011.01.24.00.00.00, X2011.01.25.00.00.00, X2011.01.26.00.00.00, X2011.01.27.00.00.00, X2011.01.28.00.00.00, X2011.01.30.00.00.00, X2011.01.31.00.00.00, X2011.02.01.00.00.00, X2011.02.02.00.00.00, X2011.02.03.00.00.00, X2011.02.04.00.00.00, X2011.02.06.00.00.00, X2011.02.07.00.00.00, X2011.02.08.00.00.00, X2011.02.09.00.00.00, X2011.02.10.00.00.00, X2011.02.11.00.00.00, X2011.02.13.00.00.00, X2011.02.14.00.00.00, X2011.02.15.00.00.00, X2011.02.16.00.00.00, X2011.02.17.00.00.00, X2011.02.18.00.00.00, X2011.02.20.00.00.00, X2011.02.21.00.00.00, X2011.02.22.00.00.00, X2011.02.23.00.00.00, X2011.02.24.00.00.00, X2011.02.25.00.00.00, X2011.02.27.00.00.00, X2011.02.28.00.00.00, X2011.03.01.00.00.00, X2011.03.02.00.00.00, X2011.03.03.00.00.00, X2011.03.04.00.00.00, X2011.03.06.00.00.00, X2011.03.07.00.00.00, X2011.03.08.00.00.00, X2011.03.09.00.00.00, X2011.03.10.00.00.00, X2011.03.11.00.00.00, X2011.03.13.00.00.00, X2011.03.14.00.00.00, X2011.03.15.00.00.00, X2011.03.16.00.00.00, X2011.03.17.00.00.00, X2011.03.18.00.00.00, X2011.03.20.00.00.00, X2011.03.21.00.00.00, X2011.03.22.00.00.00, X2011.03.23.00.00.00, X2011.03.24.00.00.00, X2011.03.25.00.00.00, X2011.03.27.00.00.00, X2011.03.28.00.00.00, X2011.03.29.00.00.00, X2011.03.30.00.00.00, X2011.03.31.00.00.00, X2011.04.01.00.00.00, X2011.04.03.00.00.00, X2011.04.04.00.00.00, X2011.04.05.00.00.00, X2011.04.06.00.00.00, X2011.04.07.00.00.00, X2011.04.08.00.00.00, X2011.04.10.00.00.00, X2011.04.11.00.00.00, X2011.04.12.00.00.00, X2011.04.13.00.00.00, X2011.04.14.00.00.00, X2011.04.15.00.00.00, X2011.04.17.00.00.00, X2011.04.18.00.00.00, X2011.04.19.00.00.00, X2011.04.20.00.00.00, X2011.04.21.00.00.00, X2011.04.26.00.00.00, X2011.04.27.00.00.00, X2011.04.28.00.00.00, X2011.05.01.00.00.00, X2011.05.03.00.00.00, X2011.05.04.00.00.00, X2011.05.05.00.00.00, X2011.05.06.00.00.00, X2011.05.08.00.00.00, X2011.05.09.00.00.00, X2011.05.10.00.00.00, X2011.05.11.00.00.00, X2011.05.12.00.00.00, X2011.05.13.00.00.00, X2011.05.15.00.00.00, X2011.05.16.00.00.00, X2011.05.17.00.00.00, X2011.05.18.00.00.00, X2011.05.19.00.00.00, X2011.05.20.00.00.00, X2011.05.22.00.00.00, X2011.05.23.00.00.00, X2011.05.24.00.00.00, X2011.05.25.00.00.00, X2011.05.26.00.00.00, X2011.05.27.00.00.00, X2011.05.29.00.00.00, X2011.05.31.00.00.00, X2011.06.01.00.00.00, X2011.06.02.00.00.00, X2011.06.03.00.00.00, X2011.06.05.00.00.00, X2011.06.06.00.00.00, X2011.06.07.00.00.00, X2011.06.08.00.00.00, X2011.06.09.00.00.00, X2011.06.10.00.00.00, X2011.06.12.00.00.00, X2011.06.13.00.00.00, X2011.06.14.00.00.00, X2011.06.15.00.00.00, X2011.06.16.00.00.00, X2011.06.17.00.00.00, X2011.06.19.00.00.00, X2011.06.20.00.00.00, X2011.06.21.00.00.00, X2011.06.22.00.00.00, X2011.06.23.00.00.00, X2011.06.24.00.00.00, X2011.06.26.00.00.00, X2011.06.27.00.00.00, X2011.06.28.00.00.00, X2011.06.29.00.00.00, X2011.06.30.00.00.00, X2011.07.01.00.00.00, X2011.07.03.00.00.00, X2011.07.04.00.00.00, X2011.07.05.00.00.00, X2011.07.06.00.00.00, X2011.07.07.00.00.00, X2011.07.08.00.00.00, X2011.07.10.00.00.00, X2011.07.11.00.00.00, X2011.07.12.00.00.00, X2011.07.13.00.00.00, X2011.07.14.00.00.00, X2011.07.15.00.00.00, X2011.07.17.00.00.00, X2011.07.18.00.00.00, X2011.07.19.00.00.00, X2011.07.20.00.00.00, X2011.07.21.00.00.00, X2011.07.22.00.00.00, X2011.07.24.00.00.00, X2011.07.25.00.00.00, X2011.07.26.00.00.00, X2011.07.27.00.00.00, X2011.07.28.00.00.00, X2011.07.29.00.00.00, X2011.07.31.00.00.00, X2011.08.01.00.00.00, X2011.08.02.00.00.00, X2011.08.03.00.00.00, X2011.08.04.00.00.00, X2011.08.05.00.00.00, X2011.08.07.00.00.00, X2011.08.08.00.00.00, X2011.08.09.00.00.00, X2011.08.10.00.00.00, X2011.08.11.00.00.00, X2011.08.12.00.00.00, X2011.08.14.00.00.00, X2011.08.15.00.00.00, X2011.08.16.00.00.00, X2011.08.17.00.00.00, X2011.08.18.00.00.00, X2011.08.19.00.00.00, X2011.08.21.00.00.00, X2011.08.22.00.00.00, X2011.08.23.00.00.00, X2011.08.24.00.00.00, X2011.08.25.00.00.00, X2011.08.26.00.00.00, X2011.08.28.00.00.00, X2011.08.30.00.00.00, X2011.08.31.00.00.00, X2011.09.01.00.00.00, X2011.09.02.00.00.00, X2011.09.04.00.00.00, X2011.09.05.00.00.00, X2011.09.06.00.00.00, X2011.09.07.00.00.00, X2011.09.08.00.00.00, X2011.09.09.00.00.00, X2011.09.11.00.00.00, X2011.09.12.00.00.00, X2011.09.13.00.00.00, X2011.09.14.00.00.00, X2011.09.15.00.00.00, X2011.09.16.00.00.00, X2011.09.18.00.00.00, X2011.09.19.00.00.00, X2011.09.20.00.00.00, X2011.09.21.00.00.00, X2011.09.22.00.00.00, X2011.09.23.00.00.00, X2011.09.25.00.00.00, X2011.09.26.00.00.00, X2011.09.27.00.00.00, X2011.09.28.00.00.00, X2011.09.29.00.00.00, X2011.09.30.00.00.00, X2011.10.02.00.00.00, X2011.10.03.00.00.00, X2011.10.04.00.00.00, X2011.10.05.00.00.00, X2011.10.06.00.00.00, X2011.10.07.00.00.00, X2011.10.09.00.00.00, X2011.10.10.00.00.00, X2011.10.11.00.00.00, X2011.10.12.00.00.00, X2011.10.13.00.00.00, X2011.10.14.00.00.00, X2011.10.16.00.00.00, X2011.10.17.00.00.00, X2011.10.18.00.00.00, X2011.10.19.00.00.00, X2011.10.20.00.00.00, X2011.10.21.00.00.00, X2011.10.23.00.00.00, X2011.10.24.00.00.00, X2011.10.25.00.00.00, X2011.10.26.00.00.00, X2011.10.27.00.00.00, X2011.10.28.00.00.00, X2011.10.30.00.00.00, X2011.10.31.00.00.00, X2011.11.01.00.00.00, X2011.11.02.00.00.00, X2011.11.03.00.00.00, X2011.11.04.00.00.00, X2011.11.06.00.00.00, X2011.11.07.00.00.00, X2011.11.08.00.00.00, X2011.11.09.00.00.00, X2011.11.10.00.00.00, X2011.11.11.00.00.00, X2011.11.13.00.00.00, X2011.11.14.00.00.00, X2011.11.15.00.00.00, X2011.11.16.00.00.00, X2011.11.17.00.00.00, X2011.11.18.00.00.00, X2011.11.20.00.00.00, X2011.11.21.00.00.00, X2011.11.22.00.00.00, X2011.11.23.00.00.00, X2011.11.24.00.00.00, X2011.11.25.00.00.00, X2011.11.27.00.00.00, X2011.11.28.00.00.00, X2011.11.29.00.00.00, X2011.11.30.00.00.00, X2011.12.01.00.00.00, X2011.12.02.00.00.00, X2011.12.04.00.00.00, X2011.12.05.00.00.00, X2011.12.06.00.00.00, X2011.12.07.00.00.00, X2011.12.08.00.00.00, X2011.12.09.00.00.00, Australia, Austria, Bahrain, Belgium, Brazil, Canada, Channel.Islands, Cyprus, Czech.Republic, Denmark, EIRE, European.Community, Finland, France, Germany, Greece, Hong.Kong, Iceland, Israel, Italy, Japan, Lebanon, Lithuania, Malta, Netherlands, Norway, Poland, Portugal, RSA, Saudi.Arabia, Singapore, Spain, Sweden, Switzerland, USA, United.Arab.Emirates, United.Kingdom, Unspecified, X0, X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X29, X30, X31, X32, X33, X34, X35, X36, X37, X38, X39, X40, X41, X42, X43, X44, X45, X46, X47, X48, X49, X50, X51, X52, X53, X54, X55, X56, X57, X58, X59, X60, X61, X62, X63, X64, X65, X66, X67, X68, X69, X70, X71, X72, X73, X74, X75, X76, X77, X78, X79, X80, X81, X82, X83, X84, X85, X86, X87, X88, X89, X90, X91, X92, X93, X94, X95, X96, X97, X98, X99, X100, X101, X102, X103, X104, X105, X106, X107, X108, X109, X110, X111, X112, X113, X114, X115, X116, X117, X118, X119, X120, X121, X122, X123, X124, X125, X126, X127, X128, X129, X130, X131, X132, X133, X134, X135, X136, X137, X138, X139, X140, X141, X142, X143, X144, X145, X146, X147, X148, X149, X150, X151, X152, X153, X154, X155, X156, X157, X158, X159, X160, X161, X162, X163, X164, X165, X166, X167, X168, X169, X170, X171, X172, X173, X174, X175, X176, X177, X178, X179, X180, X181, X182, X183, X184, X185, X186, X187, X188, X189, X190, X191, X192, X193, X194, X195, X196, X197, X198, X199, X200, X201, X202, X203, X204, X205, X206, X207, X208, X209, X210, X211, X212, X213, X214, X215, X216, X217, X218, X219, X220, X221, X222, X223, X224, X225, X226, X227, X228, X229, X230, X231, X232, X233, X234, X235, X236, X237, X238, X239, X240, X241, X242, X243, X244, X245, X246, X247, X248, X249, X250, X251, X252, X253, X254, X255, X256, X257, X258, X259, X260, X261, X262, X263, X264, X265, X266, X267, X268, X269, X270, X271, X272, X273, X274, X275, X276, X277, X278, X279, X280, X281, X282, X283, X284, X285, X286, X287, X288, X289, X290, X291, X292, X293, X294, X295, X296, X297, X298, X299, X300, X301, X302, X303, X304, X305, X306, X307, X308, X309, X310, X311, X312, X313, X314, X315, X316, X317, X318, X319, X320, X321, X322, X323, X324, X325, X326, X327, X328, X329, X330, X331, X332, X333, X334, X335, X336, X337, X338, X339, X340, X341, X342, X343, X344, X345, X346, X347, X348, X349, X350, X351, X352, X353, X354, X355, X356, X357, X358, X359, X360, X361, X362, X363, X364, X365, X366, X367, X368, X369, X370, X371, X372, X373, X374, X375, X376, X377, X378, X379, X380, X381, X382, X383, X384, X385, X386, X387, X388, X389, X390, X391, X392, X393, X394, X395, X396, X397, X398, X399, X400, X401, X402, X403, X404, X405, X406, X407, X408, X409, X410, X411, X412, X413, X414, X415, X416, X417, X418, X419, X420, X421, X422, X423, X424, X425, X426, X427, X428, X429, X430, X431, X432, X433, X434, X435, X436, X437, X438, X439, X440, X441, X442, X443, X444, X445, X446, X447, X448, X449, X450, X451, X452, X453, X454, X455, X456, X457, X458, X459, X460, X461, X462, X463, X464, X465, X466, X467, X468, X469, X470, X471, X472, X473, X474, X475, X476, X477, X478, X479, X480, X481, X482, X483, X484, X485, X486, X487, X488, X489, X490, X491, X492, X493, X494, X495, X496, X497, X498, X499, X500, X501, X502, X503, X504, X505, X506, X507, X508, X509, X510, X511, X512, X513, X514, X515, X516, X517, X518, X519, X520, X521, X522, X523, X524, X525, X526, X527, X528, X529, X530, X531, X532, X533, X534, X535, X536, X537, X538, X539, X540, X541, X542, X543, X544, X545, X546, X547, X548, X549, X550, X551
## Instrument(s): 
## No. Observations: 10000

2. Causal Model

We already stated that a partially linear regression model in a \(\log\)-\(\log\)-specification will allow us to interpret the regression coefficient \(\theta_0\) as the price elasticity of demand. We restate the main regression as well as the auxiliary regression that is required for orthogonality

\[\begin{aligned}\log Q &= \theta_0 \log P + g_0(X) + \zeta,\\ \log P &= m_0(X) + V\end{aligned},\]

with \(\mathbb{E}(\zeta|D,X)=0\) and \(\mathbb{E}(V|X)=0\). As stated above, we hope to justify the assumption \(\mathbb{E}(\zeta|D,X)=0\) by sufficiently accounting for the confounding variables \(X\).

3. ML Methods

We start with the linear regression model as a benchmark lerner for learning nuisance parameters \(g_0(X)\) and \(m_0(X)\). We additionally set up two models based on a lasso learner as well as a random forest learner and compare our results.

ml_l_lin_reg = lrn("regr.lm")
ml_m_lin_reg = lrn("regr.lm")

ml_l_lasso = lrn("regr.cv_glmnet", s = "lambda.min")
ml_m_lasso = lrn("regr.cv_glmnet", s = "lambda.min")

ml_l_forest = lrn("regr.ranger", num.trees = 50,
                  min.node.size = 3)
ml_m_forest = lrn("regr.ranger", num.trees = 50,
                  min.node.size = 3)

4. DML Specifications

For each learner configuration, we initialize a new DoubleMLPLR object. We stick to the default options, i.e., dml_procedure = 'dml2', score = "partialling out", n_folds = 5.

set.seed(123)
dml_plr_lin_reg = DoubleMLPLR$new(data_dml,
                                  ml_l = ml_l_lin_reg,
                                  ml_m = ml_m_lin_reg)

set.seed(123)
dml_plr_lasso = DoubleMLPLR$new(data_dml,
                                ml_l = ml_l_lasso,
                                ml_m = ml_m_lasso)

set.seed(123)
dml_plr_forest = DoubleMLPLR$new(data_dml,
                                 ml_l = ml_l_forest,
                                 ml_m = ml_m_forest)

5. Estimation

To estimate our target parameter \(\theta_0\), we call the fit() method. The results can be summarized by calling the summary() method.

dml_plr_lin_reg$fit(store_predictions = TRUE)

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

## Warning in predict.lm(object = self$model, newdata = newdata, se.fit = se_fit):
## Vorhersage durch Fit ohne vollen Rang mag täuschen

dml_plr_lin_reg$summary()

## Estimates and significance testing of the effect of target variables
##      Estimate. Std. Error t value Pr(>|t|)    
## dLnP  -1.83549    0.04343  -42.26   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

dml_plr_lasso$fit(store_predictions = TRUE)
dml_plr_lasso$summary()

## Estimates and significance testing of the effect of target variables
##      Estimate. Std. Error t value Pr(>|t|)    
## dLnP  -1.81615    0.04381  -41.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

dml_plr_forest$fit(store_predictions = TRUE)
dml_plr_forest$summary()

## Estimates and significance testing of the effect of target variables
##      Estimate. Std. Error t value Pr(>|t|)    
## dLnP  -1.80559    0.04359  -41.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Define function for RMSE of nuisance components
pred_acc_plr = function(obj, nuis) {
    # A function to calculate prediction accuracy values for every repetition
    # of a Double Machine Learning model using PLR, DoubleMLPLR
    
    # Parameters
    # DoubleML: DoubleML::DoubleMLPLR 
        # A DoubleML PLR model object
    # nuis: character (1l)
        # Indicates nuisance component for evaluation of RMSE, either
        # 'ml_l' or 'ml_m'
    
    # Export data, fitted coefficient and predictions of the DoubleML model
    y = obj$data$data_model[[obj$data$y_col]]
    d = obj$data$data_model[[obj$data$treat_col]]
    theta = obj$coef
    ml_nuis = obj$predictions[[nuis]]
    
    # Dimensions of prediction array
    h = obj$data$n_obs
    
    if (nuis == "ml_l") {
        export_pred = theta*d + ml_nuis
    } else if (nuis == "ml_m") {
        export_pred = ml_nuis
    }    
    rmse = mlr3measures::rmse(y, ml_nuis)
    return(rmse)
}

rmse_lin_reg_ml_l = pred_acc_plr(dml_plr_lin_reg, 'ml_l')
rmse_lin_reg_ml_m = pred_acc_plr(dml_plr_lin_reg, 'ml_m')

rmse_lasso_ml_l = pred_acc_plr(dml_plr_lasso, 'ml_l')
rmse_lasso_ml_m = pred_acc_plr(dml_plr_lasso, 'ml_m')

rmse_forest_ml_l = pred_acc_plr(dml_plr_forest, 'ml_l')
rmse_forest_ml_m = pred_acc_plr(dml_plr_forest, 'ml_m')

estimators = c("regression", "lasso", "forest")
estimators = ordered(estimators, levels = estimators)

plr_rmse = data.table(
    "ML" = estimators,
    "RMSE" = c(rmse_lin_reg_ml_l, rmse_lasso_ml_l,
               rmse_forest_ml_l,              
               rmse_lin_reg_ml_m, rmse_lasso_ml_m,
               rmse_forest_ml_m),
    "nuis" = c(rep("ml_l", 3), rep("ml_m", 3)))

plr_rmse

##            ML     RMSE nuis
## 1: regression 1.133228 ml_l
## 2:      lasso 1.104634 ml_l
## 3:     forest 1.087215 ml_l
## 4: regression 1.295019 ml_m
## 5:      lasso 1.287731 ml_m
## 6:     forest 1.289340 ml_m

g_rmse_ml_l = ggplot(plr_rmse[nuis == 'ml_l',], aes(x = ML, y = RMSE,
                                               fill = ML)) +
        geom_point(size = 5, color = "darkblue") +
        theme_minimal() + ylab("RMSE") +
        ggtitle("RMSE, ml_l") +
        xlab("learner") + theme(legend.position = "none")
g_rmse_ml_l

g_rmse_ml_m = ggplot(plr_rmse[nuis == 'ml_m',], aes(x = ML, y = RMSE,
                                               fill = ML)) +
        geom_point(size = 5, color = "darkblue") +
        theme_minimal() + ylab("RMSE") +
        ggtitle("RMSE, ml_m") +
        xlab("learner") + theme(legend.position = "none")
g_rmse_ml_m

7. Inference

We can visualize and summarize our findings so far. We can conclude that the price elasticity of demand, as indicated by the causal parameter \(\theta_0\), is around \(-1.8\). In all models, the coefficient is significantly different from zero.

models = list(dml_plr_lin_reg, dml_plr_lasso, dml_plr_forest)

plr_summary_list = lapply(models,
                    function(x) {
                        ci = x$confint()
                        return(list("coef" = x$coef,
                                    "lower" = ci[1],
                                    "upper" = ci[2]))
                        })

plr_summary = data.table::rbindlist(plr_summary_list)
plr_summary[, "ML" := estimators]

plr_summary

##         coef     lower     upper         ML
## 1: -1.835487 -1.920613 -1.750361 regression
## 2: -1.816147 -1.902015 -1.730280      lasso
## 3: -1.805594 -1.891020 -1.720169     forest

g_ci = ggplot(plr_summary, aes(x = ML, y = coef)) +
    geom_point() +
    geom_errorbar(aes(ymin = lower, ymax = upper), color = "grey") +
    theme_minimal() + ylab("Coefficients and 0.95-CI") +
    xlab("learner") +
    theme(axis.text.x = element_text(angle = 90), legend.position = "none",
          text = element_text(size = 20)) + 
    geom_hline(yintercept = 0, color = "darkgrey") +
    ylim(-3, 0.1)

g_ci

Acknowledgement

We would like to thank Lars Roemheld for setting up the blog post on demand estimation using double machine learning as well as for sharing the code and preprocessed data set. We hope that with this notebook, we illustrate how to run such an analysis using DoubleML. Moreover, we would like to thank Anzony Quispe for excellent assistance in creating this notebook.