Target variable for Close Price Feature that uses Fractional Differentiation?

Want: implement a neural network using only one feature (the 1-minute currency close price, stationarised using fractional differentiation)





Situation:

Quantra by QuantInsti | Courses on Algorithmic and Quantitative Trading

This course (Feature Engineering, Section 16, Unit 5) shows how we can turn the closing price into a stationary series: we take the log of the price and then apply fractional differentiation. A traditional NN uses the daily % return as one of the features, and uses that same quantity at the next timestamp as the target variable.

Unfortunately, in this case we can't do the same thing.
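For reference, the stationarisation step I mean looks roughly like this. This is only a sketch of fixed-window fractional differentiation; the `d=0.4` and `window=20` values are illustrative, not the course's, and the price series is synthetic:

```python
import numpy as np
import pandas as pd

def frac_diff_weights(d, size):
    """Binomial weights for fractional differentiation of order d:
    w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(series, d=0.4, window=20):
    """Fixed-window fractional differentiation of a (log-)price series."""
    w = frac_diff_weights(d, window)
    x = series.values
    out = np.full(len(x), np.nan)
    for i in range(window - 1, len(x)):
        # reverse the window so the newest observation gets weight w[0]
        out[i] = np.dot(w, x[i - window + 1:i + 1][::-1])
    return pd.Series(out, index=series.index)

# synthetic log prices, just to show the call
rng = np.random.default_rng(0)
prices = pd.Series(np.log(100 + np.cumsum(rng.normal(size=500))))
fd = frac_diff(prices, d=0.4, window=20)
```

With `d=1` and `window=2` this reduces to the ordinary first difference, which is a quick sanity check on the weights.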



Problem:

In this case, what would the best target variable be?

Or, if there are better feature ideas that are also stationary, I am open to new suggestions.

 

Hey Dwi



Can you please elaborate on the issue you are facing with daily % returns? If we can predict them as a target variable, we can use that prediction to recover the closing price as well.



Other than that, you can refer to the Triple Barrier Method in Section 15, Unit 15 of the course for a stationary process as an alternative.
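A minimal sketch of the Triple Barrier idea (the barrier levels and horizon here are illustrative, and the course's exact implementation may differ): each bar is labelled by which barrier the forward return path touches first.

```python
import numpy as np
import pandas as pd

def triple_barrier_labels(close, horizon=10, upper=0.01, lower=0.01):
    """Label each bar +1 (profit-take barrier hit first), -1 (stop-loss
    barrier hit first) or 0 (vertical barrier: neither level reached
    within `horizon` bars). Barriers are defined on simple returns."""
    c = close.to_numpy()
    labels = np.zeros(len(c), dtype=int)
    for i in range(len(c) - 1):
        end = min(i + horizon, len(c) - 1)
        path = c[i + 1:end + 1] / c[i] - 1.0   # forward cumulative returns
        up_hits = np.where(path >= upper)[0]
        dn_hits = np.where(path <= -lower)[0]
        t_up = up_hits[0] if up_hits.size else np.inf
        t_dn = dn_hits[0] if dn_hits.size else np.inf
        if t_up < np.inf or t_dn < np.inf:
            labels[i] = 1 if t_up <= t_dn else -1
    return pd.Series(labels, index=close.index)
```

Note the barriers act on returns, not on log prices, which also touches on the later question in this thread: the labelling is a classification target, so it can sit alongside a log-price-based feature without both having to be in the same units.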



If you wish to work with a non-stationary process, you can look into more advanced techniques like Long short-term memory (LSTM) networks.



Thanks,

Rishabh

I have several questions:


  1. If we choose to use an LSTM, is it completely fine to use non-stationary data for all of the features?


  2. Right now we are confused about which features and target variable to use for the NN in Blueshift. For simplicity, we were thinking of using just:

        - feature: the log of the closing price, stationarised using this method

        - target variable: the Triple Barrier Method (but it uses returns, not the log price like my proposed feature; does that still make sense, or must we set both to log prices or both to returns?)

        Perhaps you can give us input on how to select the best pair of features and target variable? (Or is it completely fine if there is no relationship/connection between the features and the target variable?)


  3. Can you share the Blueshift template for LSTM to my email at dwihdyn789@gmail.com?


  4. Does feeding more different kinds of features into the NN MLPClassifier (or LSTM) translate to better predictions?





    Thank you!

Hello Dwi,


  1. The LSTM layer can handle non-stationary data, but, as with all other machine learning models, it is good practice to use stationary data. Here is a good reading resource for the same.


  2. There is no set rule for choosing feature and target variables; rather, there are general guidelines and empirical assumptions one makes to try to improve the model, such as using uncorrelated features that are weakly predictive. It also helps to have stationary data for a good model fit. For machine learning models, the relationship between features and target may not be exactly known before building the model; only after model validation can we understand whether there was a relationship and, if so, how strong it was. Generally, people use a feature-target combination for which they have some intuition or financial basis for a possible relationship, and then try to model that relationship using ML models.
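The validation step mentioned above can be as simple as walk-forward cross-validation on a candidate feature-target pair. A sketch on toy data (the feature construction and model settings are illustrative, not from the course): if the cross-validated accuracy is consistently above the no-skill baseline, the pair has some relationship worth modelling.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
n = 600
signal = rng.normal(size=n)
# column 0 is a noisy (modestly predictive) version of the signal,
# column 1 is pure noise
X = np.column_stack([signal + rng.normal(size=n), rng.normal(size=n)])
y = (signal > 0).astype(int)  # target related only to column 0

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
cv = TimeSeriesSplit(n_splits=5)  # walk-forward splits for time-ordered data
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean())
```

`TimeSeriesSplit` is used instead of plain k-fold so that the model is always validated on data that comes after its training window, which mimics live trading.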


  3. The blueshift template for LSTM is not available at the moment as it might be resource-intensive to run on the server. Instead, we have attached the live trading template for IBridgePy, which can be adapted with any other model from the course codes to run locally.


  4. You can try different combinations of uncorrelated weakly predictive input features. Maybe start with moving averages of different time frames or other technical indicators of your choice.
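For instance, moving-average features on different lookbacks can be expressed as ratios so that they stay roughly stationary. A sketch on synthetic prices (the lookback lengths are arbitrary choices, not recommendations):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
close = pd.Series(100 + np.cumsum(rng.normal(size=300)), name="close")

# ratio-style features stay near zero regardless of the price level,
# unlike raw moving averages, which inherit the trend of the price
features = pd.DataFrame({
    "ma_ratio_5": close / close.rolling(5).mean() - 1,    # short-term stretch
    "ma_ratio_20": close / close.rolling(20).mean() - 1,  # medium-term stretch
    "ma_cross": close.rolling(5).mean() / close.rolling(20).mean() - 1,
    "vol_10": close.pct_change().rolling(10).std(),       # realised volatility
}).dropna()
```

Each column can then be checked for stationarity and mutual correlation before being passed to the classifier.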



    Hope this helped!

    Please reach out should you have any more questions!



    Thanks,

    Gaurav

Hi Gaurav,



Appreciate the help! I do have a few questions from that:


  • How can we implement & deploy an LSTM in Blueshift? (Where should we put the LSTM-weights-best.hdf5 file?)


  • As per your answer no. 3, the only file we can find in the course is here. Where can we find the template that we can immediately run in IBridgePy?


  • As per answers no. 2 & 4, is a good guideline for feature selection to have different, uncorrelated features? (i.e., when we plot all of the features in a correlation heatmap, we should see most values close to 0)


  • As for no. 4, have you (or anyone you know) tried using data from different timeframes, and did it actually help the model become more accurate?


  • As for the model validation you mentioned in answer 2, are you referring to this? If so, where should we put the CV_weights-best.hdf5 file if we wish to deploy it in Blueshift?


  • What do you mean by the 'uncorrelated weakly predictive input features' you mentioned in answers 2 & 4?





    Thank you!

    Dwi

- How can we implement & deploy an LSTM in Blueshift? (Where should we put the LSTM-weights-best.hdf5 file?)

Blueshift currently doesn’t allow importing HDF5 files, and this feature is not expected to be available in the immediate future. We recommend using IBridgePy until it is added to Blueshift.

- As per your answer no. 3, the only file we can find in the course is here. Where can we find the template that we can immediately run in IBridgePy?

The template file can be found at the link below. Before using this file, you have to train, test and cross-validate your model. Once the model is ready for paper or live trading, you import it to generate trading signals and place orders using IBridgePy.
https://quantra.quantinsti.com/startCourseDetails?cid=70&section_no=10&unit_no=2#course_type=paid&unit_type=ZipFiles
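The train-offline-then-import workflow can be sketched as follows. Since the Keras `.hdf5` weights can't be used in Blueshift, this example persists a scikit-learn model with `joblib` as a stand-in; for an actual Keras LSTM you would use `model.save(...)` / `keras.models.load_model(...)` instead, and the "trading stage" section would live inside your IBridgePy handler (the feature values here are synthetic placeholders):

```python
import joblib
import numpy as np
from sklearn.neural_network import MLPClassifier

# --- research/training stage (run offline, ahead of time) ---
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)               # toy target for illustration
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                      random_state=0).fit(X, y)
joblib.dump(model, "model.joblib")          # persist the trained model

# --- trading stage (e.g. inside the local IBridgePy script) ---
loaded = joblib.load("model.joblib")        # reload without retraining
latest_features = X[-1:]                    # stand-in for live features
signal = int(loaded.predict(latest_features)[0])  # e.g. 1 = long, 0 = flat
```

The key point is that the heavy training never happens in the trading script; the script only loads the saved artefact and calls `predict` on the latest bar's features.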


- As per answers no. 2 & 4, is a good guideline for feature selection to have different, uncorrelated features? (i.e., when we plot all of the features in a correlation heatmap, we should see most values close to 0)

That is correct. In this way, we reduce the bias towards a particular set of features that show a high correlation.
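The heatmap check reduces to inspecting the correlation matrix. A sketch on synthetic features (the feature definitions are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
close = pd.Series(100 + np.cumsum(rng.normal(size=500)))

features = pd.DataFrame({
    "ret_1": close.pct_change(),
    "ma_ratio_10": close / close.rolling(10).mean() - 1,
    "vol_20": close.pct_change().rolling(20).std(),
}).dropna()

corr = features.corr()
print(corr.round(2))
# off-diagonal values near 0 suggest non-redundant features;
# pairs with |corr| near 1 should be pruned to one representative
redundant_pairs = (corr.abs() > 0.9) & (corr.abs() < 1.0)
```

Passing `corr` to a plotting library (e.g. `seaborn.heatmap`) gives exactly the heatmap described in the question.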

- As for no. 4, have you (or anyone you know) tried using data from different timeframes, and did it actually help the model become more accurate?
We have used features from different timeframes and it does help in prediction. However, it is recommended that you try this, backtest and see the performance on the test dataset to validate the effectiveness of different timeframes for your trading strategy.

- As for the model validation you mentioned in answer 2, are you referring to this? If so, where should we put the CV_weights-best.hdf5 file if we wish to deploy it in Blueshift?
Blueshift doesn’t currently allow importing HDF5 files; this feature is planned for a future addition to Blueshift. In the meantime, we suggest you use IBridgePy to automate your trades. The template shared above in answer number 2 would be helpful.

- What do you mean by the 'uncorrelated weakly predictive input features' you mentioned in answers 2 & 4?
  • For the ML model, we would like to input features that can accurately predict the target variable. These are called strongly predictive inputs or features. With such features, we get good accuracy, precision and F1 score for the model. But instead of using only strongly predictive features, you can also pass weakly predictive features as input. This helps the model better adapt to unseen data in the future.
  • The uncorrelated point is the same as above. It helps to reduce bias while training a machine learning model, i.e. the ML model doesn’t over-learn a particular set of features that are highly correlated.
  • Therefore, you should pass uncorrelated and weakly predictive features to train the ML model.

I hope this helps.