3. Trading with Machine Learning: Regression
3.22 Cross Validation, Test and Train (and in a previous exercise)
In some examples you normalise the whole data sample before splitting it into train and test datasets. Isn't that a mistake? If the scaling statistics (mean and standard deviation) are computed on the whole sample, information from the test set is used to normalise the train set.
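To make the concern concrete, here is a minimal pure-Python sketch comparing the statistics a scaler would learn from the whole sample with those learned from the training rows only. The price series is hypothetical (a simple uptrend), and the helper names fit_scaler and transform are illustrative, not from the course.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Return (mean, population std) estimated from `values` only."""
    return mean(values), pstdev(values)

def transform(values, mu, sigma):
    """Standardise `values` with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

# Hypothetical trending price series (illustrative numbers only).
prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
split = int(0.80 * len(prices))          # 8 training rows, 2 test rows
train, test = prices[:split], prices[split:]

# Leaky: statistics computed on the whole sample, test rows included.
leaky_mu, leaky_sigma = fit_scaler(prices)
# Leak-free: statistics computed on the training rows only...
train_mu, train_sigma = fit_scaler(train)
# ...and then reused, unchanged, to scale the test rows.
train_scaled = transform(train, train_mu, train_sigma)
test_scaled = transform(test, train_mu, train_sigma)

print(leaky_mu, train_mu)  # 104.5 103.5
```

With trending data the full-sample mean is pulled upward by the later (test) prices, so the training features end up scaled with knowledge of the future. Fitting on the training rows alone avoids that; the test rows may then fall outside the training range, which is exactly what the model would face live.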
Step 1: Scale the data
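# Imports needed for the snippet below
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# First we put scaling and then linear regression in the pipeline.
steps = [('scaler', StandardScaler()),
         ('linear', LinearRegression())]
# Define pipeline
pipeline = Pipeline(steps)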
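A minimal sketch of why the order of operations matters with this pipeline, assuming scikit-learn and NumPy are installed and using synthetic stand-ins for X and yU (the data, coefficients and seed are hypothetical). Calling fit on a Pipeline fits the StandardScaler on whatever rows are passed in, and predict reuses those fitted statistics, so as long as the split happens first and only X_train goes into fit, no test-set information reaches the scaler.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the course's X and yU.
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(100, 3)), axis=0)   # trending, price-like columns
yU = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.1, size=100)

# Chronological 80/20 split BEFORE any fitting.
split = int(0.80 * len(X))
X_train, X_test = X[:split], X[split:]
yU_train, yU_test = yU[:split], yU[split:]

pipeline = Pipeline([('scaler', StandardScaler()),
                     ('linear', LinearRegression())])
# fit() fits the scaler (and then the regression) on the training rows only.
pipeline.fit(X_train, yU_train)
# predict() scales X_test with the training mean and std, then predicts.
predictions = pipeline.predict(X_test)

scaler = pipeline.named_steps['scaler']
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))  # True
```

The flip side: calling something like StandardScaler().fit_transform(X) on the full array before splitting would bake the test rows into mean_ and scale_, which is the leak described in the question.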
Later step: Split the data
# We are using 80%-20% split, therefore splitting ratio will be 0.80
splitting_ratio = .80
# Split the data into two parts
# Use int to ensure that result is of integer data type.
split = int(splitting_ratio*len(gold_prices))
# Define train dataset
X_train = X[:split]
yU_train = yU[:split]
yD_train = yD[:split]
# Define test data
X_test = X[split:]
yU_test = yU[split:]
yD_test = yD[split:]
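Since the section is about cross validation as well as the single train/test split, here is a minimal pure-Python sketch of an expanding-window (walk-forward) split, similar in spirit to scikit-learn's TimeSeriesSplit. The function name walk_forward_splits and its parameters are hypothetical, chosen for illustration. Each test window lies strictly after its training window, so if the scaler is fitted inside each fold on the training indices only, no future rows leak into the scaling.

```python
def walk_forward_splits(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices) for expanding-window validation.

    Hypothetical helper: every test window starts right after its
    training window ends, and the training window grows each fold.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        start = first_test_start + k * test_size
        yield list(range(start)), list(range(start, start + test_size))

for train_idx, test_idx in walk_forward_splits(n_samples=10, n_folds=3, test_size=2):
    # Last training index, then the test indices for this fold.
    print(train_idx[-1], test_idx)
# 3 [4, 5]
# 5 [6, 7]
# 7 [8, 9]
```

With NumPy arrays, each fold would fit the pipeline on X[train_idx] and yU[train_idx] and evaluate on X[test_idx], so the scaler is re-fitted per fold on past data only.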