1.6. doubleml.data.DoubleMLDIDData
- class doubleml.data.DoubleMLDIDData(data, y_col, d_cols, x_cols=None, z_cols=None, t_col=None, cluster_cols=None, use_other_treat_as_covariate=True, force_all_x_finite=True, force_all_d_finite=True)
- Double machine learning data-backend for Difference-in-Differences models. DoubleMLDIDData objects can be initialized from pandas.DataFrame objects as well as from numpy.ndarray objects (via from_arrays).
- Parameters:
- data (pandas.DataFrame) – The data.
- y_col (str) – The outcome variable.
- d_cols (str or list) – The treatment variable(s).
- t_col (str) – The time variable for DiD models.
- x_cols (None, str or list) – The covariates. If None, all variables (columns of data) which are neither specified as outcome variable y_col, nor as treatment variables d_cols, nor as instrumental variables z_cols, nor as time variable t_col are used as covariates. Default is None.
- z_cols (None, str or list) – The instrumental variable(s). Default is None.
- cluster_cols (None, str or list) – The cluster variable(s). Default is None.
- use_other_treat_as_covariate (bool) – Indicates whether in the multiple-treatment case the other treatment variables should be added as covariates. Default is True.
- force_all_x_finite (bool or str) – Indicates whether to raise an error on infinite values and / or missings in the covariates x. Possible values are: True (neither missings np.nan, pd.NA nor infinite values np.inf are allowed), False (missings and infinite values are allowed), 'allow-nan' (only missings are allowed). Note that the choices False and 'allow-nan' are only reasonable if the machine learning methods used for the nuisance functions are capable of providing valid predictions with missings and / or infinite values in the covariates x. Default is True.
- force_all_d_finite (bool) – Indicates whether to raise an error on infinite values and / or missings in the treatment variables d. Default is True.
Examples
>>> from doubleml import DoubleMLDIDData
>>> from doubleml.did.datasets import make_did_SZ2020
>>> # initialization from pandas.DataFrame
>>> df = make_did_SZ2020(return_type='DataFrame')
>>> obj_dml_data_from_df = DoubleMLDIDData(df, 'y', 'd', t_col='t')
>>> # initialization from np.ndarray
>>> (x, y, d, t) = make_did_SZ2020(return_type='array')
>>> obj_dml_data_from_array = DoubleMLDIDData.from_arrays(x, y, d, t=t)
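The default covariate selection described for x_cols (every column that has not been assigned another role) can be pictured with a few lines of plain Python. This is only a sketch of the documented rule, not DoubleML's implementation: the helper name default_x_cols and the example column names are hypothetical, and the sketch considers exactly the roles named in the x_cols description.

```python
def default_x_cols(columns, y_col, d_cols, z_cols=None, t_col=None):
    """Hypothetical helper: columns left over after removing all assigned roles."""

    def as_list(value):
        # Roles may be given as None, a single name, or a list of names.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    assigned = set(
        as_list(y_col) + as_list(d_cols) + as_list(z_cols) + as_list(t_col)
    )
    # Preserve the original column order of the data.
    return [c for c in columns if c not in assigned]


# Illustrative column names (assumed, not taken from a real dataset).
columns = ["Z1", "Z2", "y", "d", "t"]
x_cols = default_x_cols(columns, y_col="y", d_cols="d", t_col="t")
print(x_cols)  # ['Z1', 'Z2']
```

Passing x_cols explicitly is the safer choice whenever the data contains identifier or bookkeeping columns that should not enter the nuisance models.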
 
Methods
- from_arrays(x, y, d[, z, t, cluster_vars, ...]) – Initialize DoubleMLDIDData object from numpy.ndarray's.
- set_x_d(treatment_var) – Function that assigns the role for the treatment variables in the multiple-treatment case.
Attributes
- all_variables – All variables available in the dataset.
- binary_outcome – Logical indicating whether the outcome variable is binary with values 0 and 1.
- binary_treats – Series with logical(s) indicating whether the treatment variable(s) are binary with values 0 and 1.
- cluster_cols – The cluster variable(s).
- cluster_vars – Array of cluster variable(s).
- d – Array of treatment variable. Dynamic: depends on the currently set treatment variable. To get an array of all treatment variables (independent of the currently set treatment variable), call obj.data[obj.d_cols].values.
- d_cols – The treatment variable(s).
- data – The data.
- force_all_d_finite – Indicates whether to raise an error on infinite values and / or missings in the treatment variables d.
- force_all_x_finite – Indicates whether to raise an error on infinite values and / or missings in the covariates x.
- is_cluster_data – Flag indicating whether this data object is being used for cluster data.
- n_cluster_vars – The number of cluster variables.
- n_coefs – The number of coefficients to be estimated.
- n_instr – The number of instruments.
- n_obs – The number of observations.
- n_treat – The number of treatment variables.
- t – Array of time variable.
- t_col – The time variable.
- use_other_treat_as_covariate – Indicates whether in the multiple-treatment case the other treatment variables should be added as covariates.
- x – Array of covariates. Dynamic: may depend on the currently set treatment variable. To get an array of all covariates (independent of the currently set treatment variable), call obj.data[obj.x_cols].values.
- x_cols – The covariates.
- y – Array of outcome variable.
- y_col – The outcome variable.
- z – Array of instrumental variables.
- z_cols – The instrumental variable(s).
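The "dynamic" behaviour of x and d in the multiple-treatment case can be illustrated without the library. The sketch below uses plain pandas; the helper covariates_for, the column names and the column ordering are assumptions made for illustration and are not claimed to match DoubleML's internals.

```python
import pandas as pd

# Toy data with two treatment columns (all names are illustrative).
df = pd.DataFrame(
    {
        "y": [1.0, 2.0, 3.0],
        "d1": [0, 1, 0],
        "d2": [1, 1, 0],
        "x1": [0.5, 0.1, 0.9],
    }
)
x_cols, d_cols = ["x1"], ["d1", "d2"]


def covariates_for(treatment_var, use_other_treat_as_covariate=True):
    """Hypothetical helper: covariate columns while `treatment_var` is active."""
    others = [d for d in d_cols if d != treatment_var]
    # Assumed ordering: original covariates first, then the other treatments.
    return x_cols + others if use_other_treat_as_covariate else list(x_cols)


print(covariates_for("d1"))  # ['x1', 'd2']
print(covariates_for("d1", use_other_treat_as_covariate=False))  # ['x1']

# All treatment columns at once, independent of the active treatment.
all_d = df[d_cols].values
print(all_d.shape)  # (3, 2)
```

The last lines mirror the pattern obj.data[obj.d_cols].values from the attribute description, applied here to a bare DataFrame.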
- classmethod DoubleMLDIDData.from_arrays(x, y, d, z=None, t=None, cluster_vars=None, use_other_treat_as_covariate=True, force_all_x_finite=True, force_all_d_finite=True)
- Initialize DoubleMLDIDData object from numpy.ndarray's.
- Parameters:
- x (numpy.ndarray) – Array of covariates.
- y (numpy.ndarray) – Array of the outcome variable.
- d (numpy.ndarray) – Array of treatment variables.
- t (numpy.ndarray) – Array of the time variable for DiD models.
- z (None or numpy.ndarray) – Array of instrumental variables. Default is None.
- cluster_vars (None or numpy.ndarray) – Array of cluster variables. Default is None.
- use_other_treat_as_covariate (bool) – Indicates whether in the multiple-treatment case the other treatment variables should be added as covariates. Default is True.
- force_all_x_finite (bool or str) – Indicates whether to raise an error on infinite values and / or missings in the covariates x. Possible values are: True (neither missings np.nan, pd.NA nor infinite values np.inf are allowed), False (missings and infinite values are allowed), 'allow-nan' (only missings are allowed). Note that the choices False and 'allow-nan' are only reasonable if the machine learning methods used for the nuisance functions are capable of providing valid predictions with missings and / or infinite values in the covariates x. Default is True.
- force_all_d_finite (bool) – Indicates whether to raise an error on infinite values and / or missings in the treatment variables d. Default is True.
 
Examples
>>> from doubleml import DoubleMLDIDData
>>> from doubleml.did.datasets import make_did_SZ2020
>>> (x, y, d, t) = make_did_SZ2020(return_type='array')
>>> obj_dml_data_from_array = DoubleMLDIDData.from_arrays(x, y, d, t=t)
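The three documented modes of force_all_x_finite can be read as the following check on a covariate array. This is a minimal NumPy sketch of the described semantics, assuming a purely numeric array; the function check_x_finite is hypothetical, does not handle pd.NA, and is not the validation code DoubleML actually runs.

```python
import numpy as np


def check_x_finite(x, force_all_x_finite=True):
    """Hypothetical helper mirroring the three documented modes.

    Returns None if `x` passes and raises ValueError otherwise.
    """
    x = np.asarray(x, dtype=float)
    if force_all_x_finite is False:
        return  # missings and infinite values are both allowed
    if force_all_x_finite == "allow-nan":
        if np.isinf(x).any():
            raise ValueError("infinite values are not allowed in x")
        return  # only missings are allowed
    if force_all_x_finite is True:
        if not np.isfinite(x).all():
            raise ValueError("missings and infinite values are not allowed in x")
        return
    raise ValueError("force_all_x_finite must be True, False or 'allow-nan'")


x_nan = np.array([[1.0, np.nan], [2.0, 3.0]])
x_inf = np.array([[1.0, np.inf], [2.0, 3.0]])

check_x_finite(x_nan, "allow-nan")  # passes: only missings present
check_x_finite(x_inf, False)        # passes: no check at all

for x_bad, mode in [(x_nan, True), (x_inf, "allow-nan")]:
    try:
        check_x_finite(x_bad, mode)
    except ValueError as err:
        print(repr(mode), "->", err)
```

Relaxing the check only moves the problem downstream: the learners used for the nuisance functions must then cope with the non-finite entries themselves.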
 
    
  
  
    