doubleml.datasets.make_plr_turrell2018

doubleml.datasets.make_plr_turrell2018(n_obs=100, dim_x=20, theta=0.5, return_type='DoubleMLData', **kwargs)

Generates data from a partially linear regression model used in a blog article by Turrell (2018). The data generating process is defined as

\[ \begin{align}\begin{aligned}d_i &= m_0(x_i' b) + v_i, & &v_i \sim \mathcal{N}(0,1),\\y_i &= \theta d_i + g_0(x_i' b) + u_i, & &u_i \sim \mathcal{N}(0,1),\end{aligned}\end{align} \]

with covariates \(x_i \sim \mathcal{N}(0, \Sigma)\), where \(\Sigma\) is a random symmetric, positive-definite matrix generated with sklearn.datasets.make_spd_matrix(). The vector \(b\) has entries \(b_j=\frac{1}{j}\). The nuisance functions, written below with a scalar argument (in the model above they are evaluated at the index \(x_i' b\)), are given by

\[ \begin{align}\begin{aligned}m_0(x_i) &= \frac{1}{2 \pi} \frac{\sinh(\gamma)}{\cosh(\gamma) - \cos(x_i-\nu)},\\g_0(x_i) &= \sin(x_i)^2.\end{aligned}\end{align} \]
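
The data generating process above can be sketched in plain NumPy. This is an illustrative reconstruction from the formulas, not the library's source: the helper names `m0`, `g0` and `simulate_plr`, the `seed` argument, and the construction of \(\Sigma\) as \(A A' + I\) (a stand-in for sklearn.datasets.make_spd_matrix()) are assumptions made for this example.

```python
import numpy as np


def m0(z, nu=0.0, gamma=1.0):
    """Nuisance function of the treatment equation, evaluated at a scalar index z."""
    return (1.0 / (2.0 * np.pi)) * np.sinh(gamma) / (np.cosh(gamma) - np.cos(z - nu))


def g0(z):
    """Nuisance function of the outcome equation, evaluated at a scalar index z."""
    return np.sin(z) ** 2


def simulate_plr(n_obs=100, dim_x=20, theta=0.5, nu=0.0, gamma=1.0, seed=0):
    """Hypothetical helper that draws (x, y, d) following the formulas above."""
    rng = np.random.default_rng(seed)

    # Assumption: A A' + I is used here as a simple random symmetric,
    # positive-definite matrix in place of sklearn's make_spd_matrix().
    a = rng.standard_normal((dim_x, dim_x))
    sigma = a @ a.T + np.eye(dim_x)

    b = 1.0 / np.arange(1, dim_x + 1)  # b_j = 1 / j
    x = rng.multivariate_normal(np.zeros(dim_x), sigma, size=n_obs)
    index = x @ b  # x_i' b for every observation

    d = m0(index, nu, gamma) + rng.standard_normal(n_obs)   # treatment equation
    y = theta * d + g0(index) + rng.standard_normal(n_obs)  # outcome equation
    return x, y, d


x, y, d = simulate_plr(n_obs=500, dim_x=5, theta=0.5)
print(x.shape, y.shape, d.shape)
```

With the default \(\gamma=1\), the denominator \(\cosh(\gamma) - \cos(\cdot)\) is strictly positive, so \(m_0\) is well defined for every value of the index.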
Parameters:
  • n_obs – The number of observations to simulate.

  • dim_x – The number of covariates.

  • theta – The value of the causal parameter.

  • return_type

    If 'DoubleMLData' or DoubleMLData, returns a DoubleMLData object.

    If 'DataFrame', 'pd.DataFrame' or pd.DataFrame, returns a pd.DataFrame.

    If 'array', 'np.ndarray', 'np.array' or np.ndarray, returns the np.ndarray objects (x, y, d).

  • **kwargs – Additional keyword arguments to set non-default values for the parameters \(\nu\) (default 0) and \(\gamma\) (default 1).

References

Turrell, A. (2018), Econometrics in Python part I - Double machine learning, Markov Wanderer: A blog on economics, science, coding and data. https://aeturrell.com/blog/posts/econometrics-in-python-parti-ml/.