doubleml.datasets.make_plr_turrell2018
- doubleml.datasets.make_plr_turrell2018(n_obs=100, dim_x=20, theta=0.5, return_type='DoubleMLData', **kwargs)
Generates data from a partially linear regression model used in a blog article by Turrell (2018). The data generating process is defined as
\[ \begin{align}\begin{aligned}d_i &= m_0(x_i' b) + v_i, & &v_i \sim \mathcal{N}(0,1),\\y_i &= \theta d_i + g_0(x_i' b) + u_i, & &u_i \sim \mathcal{N}(0,1),\end{aligned}\end{align} \]
with covariates \(x_i \sim \mathcal{N}(0, \Sigma)\), where \(\Sigma\) is a random symmetric, positive-definite matrix generated with sklearn.datasets.make_spd_matrix(). \(b\) is a vector with entries \(b_j=\frac{1}{j}\) and the nuisance functions are given by
\[ \begin{align}\begin{aligned}m_0(x_i) &= \frac{1}{2 \pi} \frac{\sinh(\gamma)}{\cosh(\gamma) - \cos(x_i-\nu)},\\g_0(x_i) &= \sin(x_i)^2.\end{aligned}\end{align} \]
- Parameters:
n_obs – The number of observations to simulate.
dim_x – The number of covariates.
theta – The value of the causal parameter.
return_type –
If 'DoubleMLData' or DoubleMLData, returns a DoubleMLData object.
If 'DataFrame', 'pd.DataFrame' or pd.DataFrame, returns a pd.DataFrame.
If 'array', 'np.ndarray', 'np.array' or np.ndarray, returns the np.ndarrays (x, y, d).
**kwargs – Additional keyword arguments that override the default values of the parameters \(\nu\) (default 0) and \(\gamma\) (default 1).
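The data generating process above can be sketched directly in NumPy. This is an illustrative sketch, not the library's implementation: the names `m0`, `g0` and `simulate_plr` are hypothetical, and \(\Sigma\) is built as \(A A' + I\) as a simple stand-in for sklearn.datasets.make_spd_matrix(), so the draws will not match those of make_plr_turrell2018.

```python
import numpy as np


def m0(z, nu=0.0, gamma=1.0):
    # Treatment nuisance function: sinh(gamma) / (2*pi*(cosh(gamma) - cos(z - nu))).
    return 0.5 / np.pi * np.sinh(gamma) / (np.cosh(gamma) - np.cos(z - nu))


def g0(z):
    # Outcome nuisance function: sin(z)^2.
    return np.sin(z) ** 2


def simulate_plr(n_obs=100, dim_x=20, theta=0.5, nu=0.0, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: A A' + I is used as a symmetric positive-definite stand-in
    # for the matrix that sklearn.datasets.make_spd_matrix() would return.
    a = rng.standard_normal((dim_x, dim_x))
    sigma = a @ a.T + np.eye(dim_x)
    b = 1.0 / np.arange(1, dim_x + 1)  # b_j = 1/j
    x = rng.multivariate_normal(np.zeros(dim_x), sigma, size=n_obs)
    index = x @ b  # the scalar index x_i' b passed to both nuisance functions
    d = m0(index, nu, gamma) + rng.standard_normal(n_obs)       # v_i ~ N(0, 1)
    y = theta * d + g0(index) + rng.standard_normal(n_obs)      # u_i ~ N(0, 1)
    return x, y, d


x, y, d = simulate_plr()
print(x.shape, y.shape, d.shape)  # (100, 20) (100,) (100,)
```

Both nuisance functions act on the scalar index \(x_i' b\), so the covariates enter the treatment and outcome equations only through one linear combination.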
References
Turrell, A. (2018), Econometrics in Python part I - Double machine learning, Markov Wanderer: A blog on economics, science, coding and data. https://aeturrell.com/blog/posts/econometrics-in-python-parti-ml/.