3.2.8. doubleml.plm.datasets.make_plr_turrell2018
- doubleml.plm.datasets.make_plr_turrell2018(n_obs=100, dim_x=20, theta=0.5, return_type='DoubleMLData', **kwargs)
- Generates data from a partially linear regression model used in a blog article by Turrell (2018). The data generating process is defined as

\[ \begin{align}\begin{aligned}d_i &= m_0(x_i' b) + v_i, & &v_i \sim \mathcal{N}(0,1),\\y_i &= \theta d_i + g_0(x_i' b) + u_i, & &u_i \sim \mathcal{N}(0,1),\end{aligned}\end{align} \]

with covariates \(x_i \sim \mathcal{N}(0, \Sigma)\), where \(\Sigma\) is a random symmetric, positive-definite matrix generated with sklearn.datasets.make_spd_matrix(). \(b\) is a vector with entries \(b_j=\frac{1}{j}\), and the nuisance functions, which are evaluated at the scalar index \(x_i' b\), are given by

\[ \begin{align}\begin{aligned}m_0(x_i) &= \frac{1}{2 \pi} \frac{\sinh(\gamma)}{\cosh(\gamma) - \cos(x_i-\nu)},\\g_0(x_i) &= \sin(x_i)^2.\end{aligned}\end{align} \]

- Parameters:
- n_obs – The number of observations to simulate. 
- dim_x – The number of covariates. 
- theta – The value of the causal parameter. 
- return_type – The type of the returned object:
  - If 'DoubleMLData' or DoubleMLData, returns a DoubleMLData object.
  - If 'DataFrame', 'pd.DataFrame' or pd.DataFrame, returns a pd.DataFrame.
  - If 'array', 'np.ndarray', 'np.array' or np.ndarray, returns np.ndarray's (x, y, d).
- **kwargs – Additional keyword arguments to set non-default values for the parameters \(\nu\) (default \(\nu=0\)) and \(\gamma\) (default \(\gamma=1\)).
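
The data generating process described above can be sketched in plain NumPy. This is an illustrative approximation, not the library's implementation: the helper names m0, g0 and simulate_plr, the seed argument, and the way \(\Sigma\) is built (a random matrix product plus a scaled identity, standing in for sklearn.datasets.make_spd_matrix()) are all assumptions made for this example.

```python
import numpy as np


def m0(z, nu=0.0, gamma=1.0):
    """Nuisance function for the treatment equation, evaluated at the index z."""
    return np.sinh(gamma) / (2.0 * np.pi * (np.cosh(gamma) - np.cos(z - nu)))


def g0(z):
    """Nuisance function for the outcome equation, evaluated at the index z."""
    return np.sin(z) ** 2


def simulate_plr(n_obs=100, dim_x=20, theta=0.5, nu=0.0, gamma=1.0, seed=0):
    """Hypothetical helper that draws (x, y, d) from the model described above."""
    rng = np.random.default_rng(seed)

    # Assumption: a simple random symmetric positive-definite matrix,
    # used here in place of sklearn.datasets.make_spd_matrix().
    a = rng.standard_normal((dim_x, dim_x))
    sigma = a @ a.T + dim_x * np.eye(dim_x)
    sigma = (sigma + sigma.T) / 2.0  # remove floating-point asymmetry

    b = 1.0 / np.arange(1, dim_x + 1)  # b_j = 1 / j
    x = rng.multivariate_normal(np.zeros(dim_x), sigma, size=n_obs)
    index = x @ b  # scalar index x_i' b for each observation

    d = m0(index, nu=nu, gamma=gamma) + rng.standard_normal(n_obs)
    y = theta * d + g0(index) + rng.standard_normal(n_obs)
    return x, y, d


x, y, d = simulate_plr(n_obs=100, dim_x=20, theta=0.5)
```

The returned tuple mirrors the (x, y, d) ordering that the documented function uses for return_type='array'. Because the covariance matrix and the random number generator differ from the library's, the draws will not match those of make_plr_turrell2018.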
 
 - References
   - Turrell, A. (2018), Econometrics in Python part I - Double machine learning, Markov Wanderer: A blog on economics, science, coding and data. https://aeturrell.com/blog/posts/econometrics-in-python-parti-ml/.
 
    
  
  
    