MinMaxScaler problem

Hi everyone!



I was wondering how to solve this "issue".



When I normalize features by using MinMaxScaler, what I do is:


  • Split the data into train and validation sets
  • Fit MinMaxScaler(0,1) on the train dataset (fit_transform)
  • Use the fitted scaler to normalize the validation dataset (transform)



My first question is whether this process is correct, or whether it should be:


  • Fit MinMaxScaler(0,1) on the full dataset
  • Split the data into train and validation sets



Then, after training the NN, I save the scaler object to use on incoming future data.
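(Roughly like this, using joblib to persist the fitted scaler; the file name and the toy fit data are just examples:)

```python
import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit on the training features only, then persist the fitted scaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(np.array([[0.0], [10.0]]))  # stand-in for the real train data

joblib.dump(scaler, "minmax_scaler.pkl")  # example file name

# Later, when a new bar forms, reload and reuse the same scaler
scaler = joblib.load("minmax_scaler.pkl")
```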

My question here is: what should I do when the incoming data falls outside the original scaler's range and produces negative scaled values? I ask because the NN's prediction then automatically becomes 0, and I don't know how to handle this. Should I always rescale the new incoming data (i.e., each time a new bar forms) rather than using the scaler object from training?



The NN I'm using has 2 LSTM layers and 2 Dense layers, with the last Dense layer as the output with 1 neuron. The Dense layers use the ReLU activation function and the LSTM layers use their default activation functions.



How can I solve these issues?



PS: The same issue happens when I start getting values greater than the scaler's original maximum, except there I don't get 0; the output I get is almost 1.
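To illustrate with a toy example (the 0–10 fit range is made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(np.array([[0.0], [10.0]]))  # training range: 0 to 10

print(scaler.transform(np.array([[-2.0]])))  # [[-0.2]] -> below the fitted min
print(scaler.transform(np.array([[12.0]])))  # [[ 1.2]] -> above the fitted max
```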

Hello Mario,

- Split the data into train and validation sets

- Fit MinMaxScaler(0,1) on the train dataset (fit_transform)

- Use the fitted scaler to normalize the validation dataset (transform)

This is the correct way of approaching the problem. Otherwise, you would inadvertently introduce look-ahead bias into the scaling: that is, you would be fitting the scaler with data (the validation dataset) that you wouldn't otherwise have access to.
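Here is a minimal sketch of the difference, with toy data standing in for your features:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.arange(100, dtype=float).reshape(-1, 1)  # toy feature
train, val = data[:80], data[80:]

# Correct: the scaler only ever "sees" the training data
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)
val_scaled = scaler.transform(val)  # values above 1 here are expected

# Look-ahead bias: fitting on the full dataset leaks the validation min/max
leaky_scaler = MinMaxScaler(feature_range=(0, 1)).fit(data)
```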
 

However, MinMaxScaler is very sensitive to outliers, and that is why you are getting undesired outputs.
 

The way to solve this problem is to make the features stationary using fractional differentiation, which loses minimal information compared to taking a plain difference. This is covered in this notebook.
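For reference, below is a minimal sketch of fixed-width-window fractional differentiation; the notebook's implementation may differ, and the d and weight-threshold values are examples only:

```python
import numpy as np
import pandas as pd

def frac_diff_weights(d, threshold=1e-4):
    # Binomial-series weights: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k
    w = [1.0]
    k = 1
    while True:
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:  # truncate once weights become negligible
            break
        w.append(w_k)
        k += 1
    return np.array(w)

def frac_diff(series, d=0.4, threshold=1e-4):
    # Fixed-width-window fractional differentiation of a pandas Series
    w = frac_diff_weights(d, threshold)
    width = len(w)
    values = series.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    for i in range(width - 1, len(values)):
        window = values[i - width + 1 : i + 1][::-1]  # newest first, matches w
        out[i] = np.dot(w, window)
    return pd.Series(out, index=series.index)
```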
 

Additional reading (optional): 

1. https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html

2. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html

Thank you for your answer, Ishan.



In this case, would fractional differentiation replace MinMaxScaler?



And if that is the case, how can I do the inverse operation to get back the original values and predictions?

Hello Mario,



MinMaxScaler does not alter the relative values of the data within a given feature. So, you can apply the scaler after getting the new time series from fractional differentiation. The reason we apply MinMaxScaler is to get all columns within the same range; the reason we apply fractional differentiation is to keep the price-level memory while simultaneously passing the stationarity test.
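As a rough sketch of that order of operations (reusing a frac_diff helper like the one sketched earlier in the thread; the data and d value are placeholders):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

close = pd.Series(100 + np.cumsum(np.random.randn(500)))  # stand-in prices

# 1. Make the series stationary first
fd_close = frac_diff(close, d=0.4).dropna()

# 2. Then scale, fitting on the training portion only
split = int(len(fd_close) * 0.8)
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(fd_close.iloc[:split].to_frame())
val_scaled = scaler.transform(fd_close.iloc[split:].to_frame())
```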



You can reverse the output of the scaler using the inverse_transform method.
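For example, a minimal round-trip with toy prices:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

prices = np.array([[100.0], [105.0], [110.0], [120.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(prices)

# inverse_transform maps scaled values back to the original units;
# apply it the same way to model predictions in the scaled space
recovered = scaler.inverse_transform(scaled)
print(np.allclose(prices, recovered))  # True
```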



In case you're wondering: fractional differentiation can't be reversed, just as you can't get the original price series back from its daily percentage change series.

Hi Akshay!

Thank you for your answer.



So, the correct way to go would be to do fractional differentiation and then use MinMaxScaler, right? And if I'm doing FD, is MinMaxScaler really necessary? I noticed that FD already gives me inputs in the 0–1 range.



Also, should I apply FD and/or MinMaxScaler to my target? For example, let's say my target is the close price of the next candle. All my features would be FD'ed, but should my target (the next candle's close) also be FD'ed/scaled?



And I guess that I should recalculate all features from the FD'ed series, rather than applying FD to each existing column, right?



 

You're welcome, Mario!



"So, the correct way to go would be to do a fractional differentiation and then use minmaxscaler, right?" - Correct. It gives you a new series. Think of it as the price data on which you do all standard operations.



"Because I noticed that FD already give me inputs in the 0-1 range" - This won't be the case for all price data. Also, you might have some indicators as your input features. So, it's better to get all of them in a standard range.



"And also, should I also apply FD and/or minmaxscaler to my target?" - You can try both and see the results empirically. But intuitively, I think they target should be FDed as well.



"And I guess that I should recalculate all features from the FD'ed series, rather than applying FD to each existing column, right?" - Apply the FD once at the beginning. I am assuming most columns would derive from this…

Hello again, Akshay.



I followed the suggestions above: I did the fractional differentiation and then applied MinMaxScaler. I still have this problem.



[Plot: train dataset]

[Plot: validation dataset]



When an input goes below the original minimum, the output of the neuron is 0.



The NN has 2 LSTM layers followed by 2 Dense layers; both Dense layers use ReLU as the activation function. How do I handle these cases? This is almost 2 years of data. Should I use batch normalization between layers, or change the activation function?

Hello Mario,



That is right: the output range of ReLU is [0, ∞), so for any negative input it returns a strict 0. You can retry with other activation functions like tanh or sigmoid.
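A minimal sketch of that change (the layer sizes, window length and feature count below are placeholders, not your exact architecture):

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

timesteps, n_features = 30, 5  # placeholders: use your own window/feature count

model = Sequential([
    Input(shape=(timesteps, n_features)),
    LSTM(64, return_sequences=True),
    LSTM(32),
    Dense(16, activation="tanh"),  # tanh instead of relu: range (-1, 1)
    Dense(1, activation="tanh"),   # the prediction can now go below 0
])
model.compile(optimizer="adam", loss="mse")
```

With tanh on the output, the prediction is bounded to (-1, 1), so it still covers a 0–1 scaled target while allowing values below 0.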



You can try batch normalization too, but I suggest you make one change at a time so you can gauge each change's effect in isolation.



Do tell me if further help is needed.