Stratifying in train_test_split and time-series data

Mohammad_Amin_Saghizadeh_4LHMB · June 15, 2020, 5:06pm

In Section 3, Unit 1 of the decision tree models course of the ML track, it is said that we should use the stratify parameter of the 'train_test_split' method in order to preserve the ratio of label classes in the train set after splitting.

Doesn't this violate the temporal order of time-series data? I also think it will lead to look-ahead bias.

Akshay_Nautiyal_2ycOh · June 16, 2020, 8:25am

Hello Mohammad,

Thanks for pointing that out. In fact, it is not just the stratify parameter: using the train_test_split method at all, with its default settings, will lead to look-ahead bias, because it samples indices randomly, so an index from the end of the series may be used in training and an index from the beginning may be used in testing. We'll rectify this and let you know. Thanks.
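
To illustrate the point about random sampling, here is a minimal sketch, not the course's own code. It assumes scikit-learn and NumPy are installed and uses made-up data (100 rows already sorted by time, where the value in X is simply the time index). It compares the default shuffled split with an order-preserving split (shuffle=False) and with TimeSeriesSplit for cross-validation. Note that scikit-learn does not allow stratify together with shuffle=False, so class ratios cannot be enforced on a chronological split this way.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Hypothetical data: 100 observations already sorted by time.
# The single feature is just the time index, so leaks are easy to spot.
X = np.arange(100).reshape(-1, 1)
y = np.arange(100) % 2

# Default behaviour (shuffle=True): rows are sampled at random,
# so observations from the end of the series can land in the train set.
X_tr_rand, X_te_rand, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=0
)
leaks = X_tr_rand.max() > X_te_rand.min()  # train contains "future" rows

# Order-preserving split: first 80% for training, last 20% for testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# For cross-validation, TimeSeriesSplit keeps every training fold
# strictly before its test fold.
tscv = TimeSeriesSplit(n_splits=4)
folds_ok = all(
    train_idx.max() < test_idx.min() for train_idx, test_idx in tscv.split(X)
)

print("shuffled split leaks future data:", bool(leaks))
print("chronological split, last train index:", int(X_tr.max()))
print("chronological split, first test index:", int(X_te.min()))
print("all TimeSeriesSplit folds ordered:", folds_ok)
```

With the chronological split, every training row precedes every test row, which is what removes the look-ahead bias discussed above.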

Mohammad_Amin_Saghizadeh_4LHMB · June 18, 2020, 4:03pm

Thanks.