How to Use Machine Learning in Trading?

Did you know that machine learning algorithms can add value to your trading strategies or investment portfolio?

Machine learning is like having a financial Sherlock Holmes on your team, sifting through heaps of historical data to uncover hidden patterns and practical insights. The ML model doesn't stop at pointing out trading opportunities; it can also suggest exit points and help fine-tune trade sizes.

In this post, we will look at the step-by-step process of creating and backtesting a machine learning-based trading system. If you are new to machine learning, the Python for Machine Learning course would be helpful.

Note that backtesting results do not guarantee future performance. The presented strategy results are intended solely for educational purposes and should not be interpreted as investment advice.

The following topics are covered in this Quantra Classroom:

The Need for Machine Learning in Trading
Real-World Applications of Machine Learning in Trading
Anatomy of a Machine Learning-Based Trading System
Backtesting
Analysing performance

The Need for Machine Learning in Trading

The need for machine learning in trading has grown significantly in recent years, as financial markets have become increasingly complex and competitive.

Machine learning techniques offer powerful tools for traders and financial institutions to make sense of vast amounts of data, detect patterns, and make informed decisions in real time. These algorithms can analyse historical market data, news sentiment, and a wide array of economic indicators to identify potential trading opportunities and manage risks more effectively.

As markets become more data-driven and sophisticated, the integration of machine learning in trading is no longer a luxury but a necessity for those seeking a competitive edge.

Real-World Applications of Machine Learning in Trading

One of the most prominent applications of machine learning is algorithmic trading, where machine learning models are employed to automate the execution of trades, optimise strategies, and respond to market changes at lightning speed.

Sentiment analysis, another crucial application, leverages natural language processing to process news articles, social media, and other textual data to gauge market sentiment and make informed trading decisions.

Risk management is also greatly enhanced through machine learning, as predictive models can assess and mitigate potential risks in real time. Machine learning has undoubtedly become a cornerstone of modern trading, providing invaluable insights, risk mitigation, and efficiency in a dynamic and data-rich financial landscape.

Anatomy of a Machine Learning-Based Trading System

Figure: Anatomy of Machine Learning Model

This is a general structure for using machine learning for trading. It involves the following steps:

Data Collection
Data Preprocessing
Feature Engineering
Data Split
Train Model with Training Data
Fit Test Data on Model
Evaluate the Model

Continuous monitoring and optimisation are essential to keep the system adaptive and profitable in evolving market conditions.

Let’s try to make this classroom interesting. In this classroom, you will try to answer the problem statement, “Should you buy JP Morgan stock or not?”

You will need some data before you decide to buy or sell JP Morgan. This leads us to the first step.

Step 1: Data Collection

You can gather historical market data, such as stock prices, volume, and relevant economic indicators.

How do you collect data?

You should access data from trusted sources or APIs, ensuring data quality and accuracy.

Let’s take JP Morgan’s price data from January 3, 2017 to December 31, 2019.

Figure: JP Morgan’s OHLCV Data

This is the 15-minute price and volume data of JP Morgan, which includes the Open, High, Low, and Close prices along with the Volume data.
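As a rough sketch of what loading such data could look like in Python, the snippet below reads OHLCV bars with pandas. The column names, the helper name load_ohlcv and the two sample rows are assumptions made up for illustration; they are not real JP Morgan prices and not the course's actual data file.

```python
import io

import pandas as pd


def load_ohlcv(source):
    """Read OHLCV bars from a CSV path or file-like object, indexed by time."""
    df = pd.read_csv(source, parse_dates=["datetime"], index_col="datetime")
    # Sort by timestamp so later steps can rely on chronological order.
    df = df.sort_index()
    return df[["open", "high", "low", "close", "volume"]]


# Made-up rows (deliberately out of order) that only show the expected layout.
sample_csv = io.StringIO(
    "datetime,open,high,low,close,volume\n"
    "2017-01-03 09:45:00,87.0,87.4,86.9,87.2,120000\n"
    "2017-01-03 09:30:00,86.8,87.1,86.7,87.0,150000\n"
)

data = load_ohlcv(sample_csv)
print(data)
```

In practice, you would pass the path of a file downloaded from your data vendor instead of the in-memory sample.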

Step 2: Data Preprocessing

Before you input the data into the machine learning model, you should make sure that the data is correct.

For example, in the table below, from 10:45 to 11:00 am, the close price was $87. But from 11:00 to 11:15 am it was recorded as $890. This means that there is some error in the data.

Also, from 11:30 to 11:45 am, the close price was 0, which seems odd.

Date-Time	Close
2024-03-04 10:45:00-11:00:00	$87
2024-03-04 11:00:00-11:15:00	$890
2024-03-04 11:15:00-11:30:00	$89
2024-03-04 11:30:00-11:45:00	$0

These are a few things you should check before using the data further. Python libraries such as pandas provide functions that handle most of these pre-processing tasks efficiently.
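The checks described above can be sketched with pandas as follows. The helper clean_close, the 50% deviation threshold and the choice to forward-fill flagged bars are all assumptions for illustration; the right rule depends on your data.

```python
import pandas as pd

# Close prices from the table above (each bar labelled by its start time).
close = pd.Series(
    [87.0, 890.0, 89.0, 0.0],
    index=pd.to_datetime(
        [
            "2024-03-04 10:45",
            "2024-03-04 11:00",
            "2024-03-04 11:15",
            "2024-03-04 11:30",
        ]
    ),
    name="close",
)


def clean_close(close, max_deviation=0.5):
    """Flag non-positive prices and prices far from the median, then forward-fill."""
    median = close[close > 0].median()
    bad = (close <= 0) | ((close / median - 1).abs() > max_deviation)
    # Replace flagged values with NaN and carry the last good price forward.
    cleaned = close.mask(bad).ffill()
    return cleaned, bad


cleaned, bad = clean_close(close)
print(pd.DataFrame({"close": close, "flagged": bad, "cleaned": cleaned}))
```

Here the $890 and $0 bars are flagged and replaced by the previous valid close. On a longer series, a rolling median would be a more sensible reference than a single global median.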

Step 3: Feature Engineering

Before we started talking about the machine learning model, we posed the question, “Should you buy JP Morgan or not?”

A simple answer to this would be, “If I am sure that JP Morgan's price will increase in the next 15 minutes or next time period, then I will buy.”

This means we are trying to predict whether the stock price will move up or not in the next 15 minutes or next time period.

Let’s introduce the concept of the target variable here. The target variable is what the machine learning model tries to predict to solve the problem statement. It is referred to as y.

How to create the target variable?

We will create a column, target. First, we will calculate the percentage change of the close prices. If the percentage change is positive, we will set the target variable to 1, otherwise 0. Since we are creating a model to predict future price movements, we will also shift the target variable back by 1 time period, so that each row is labelled with the move of the next period.

Thus, the target column will have two labels, 1 and 0. Whenever the label is 1, the model indicates a buy signal. And whenever the label is 0, the model indicates do not buy.

Let’s say on January 3, from 10 am to 10:15 am, if the close price moves up, then you will keep the target variable as 1 for January 3, from 9:45 to 10 am.
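A minimal sketch of this target construction with pandas is shown below. The five close prices are made up for illustration, and the column names are assumptions.

```python
import pandas as pd

# Made-up close prices for five consecutive 15-minute bars.
df = pd.DataFrame({"close": [100.0, 101.0, 100.5, 102.0, 101.0]})

# Percentage change of each bar relative to the previous bar.
df["pct_change"] = df["close"].pct_change()

# shift(-1) pulls the NEXT bar's change onto the current row,
# so the label describes what happens after the current bar.
df["target"] = (df["pct_change"].shift(-1) > 0).astype(int)

# The last bar has no following bar, so its label is unknown; drop it.
df = df.iloc[:-1]
print(df)
```

The first row gets a target of 1 because the price rises from 100.0 to 101.0 in the following bar.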

You can learn more about the target variable by watching the video in the course Python for Machine Learning in Finance.

All the concepts covered in this email are taken from the Quantra course Python for Machine Learning in Finance. You can preview the concepts taught in this classroom by clicking on the free preview button and going to Section 5 and Unit 1 of the course.

Note: The links in this tutorial will be accessible only after logging into quantra.quantinsti.com.

Figure: Target Variable

Now comes the interesting part. In order to predict the signal, you will create the input variables for the ML model. These input variables are called features, and they are collectively referred to as X. You can create features on the basis of:

Percentage change in the last n time periods: 1-period, 2-period.
Technical indicators: RSI, ADX
Volatility and other parameters.
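The feature ideas listed above could be sketched as follows. This is an illustration under assumptions: the prices are a synthetic random walk, the 14-period window is a common default rather than the course's setting, the RSI uses simple rolling averages instead of Wilder's smoothing, and ADX is left out because it also needs high and low prices.

```python
import numpy as np
import pandas as pd


def rsi(close, window=14):
    """RSI from simple rolling averages of gains and losses (simplified variant)."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)


def make_features(close, window=14):
    """Build the feature matrix X from a series of close prices."""
    X = pd.DataFrame(index=close.index)
    X["pct_change_1"] = close.pct_change(1)
    X["pct_change_2"] = close.pct_change(2)
    X["rsi"] = rsi(close, window)
    # Rolling standard deviation of returns as a simple volatility measure.
    X["volatility"] = close.pct_change().rolling(window).std()
    # Drop the warm-up rows where the rolling windows are not yet full.
    return X.dropna()


# Synthetic prices, only to make the sketch runnable.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.001, 200))))

X = make_features(close)
print(X.head())
```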

Figure: Input Features

Step 4: Data Split

Before you can apply machine learning to a trading strategy, you have to train the model first. This is similar to learning about a certain topic and then taking an exam on it.

But many machine learning models learn better when the input features are on comparable scales. Thus, you would apply a technique called scaling, which maps each input feature into a fixed range. There are different methods available, such as Min-Max scaling and normalisation.
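As an illustration, Min-Max scaling can be sketched with scikit-learn's MinMaxScaler, which maps each column to the range 0 to 1 by default. The feature names and values below are made up.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up feature values on very different scales.
X = pd.DataFrame(
    {
        "pct_change_1": [0.002, -0.001, 0.004, 0.000],
        "rsi": [35.0, 50.0, 70.0, 55.0],
    }
)

scaler = MinMaxScaler()  # default feature_range is (0, 1)
X_scaled = pd.DataFrame(
    scaler.fit_transform(X), columns=X.columns, index=X.index
)
print(X_scaled)
```

A common precaution, not part of the walkthrough here, is to fit the scaler on the training rows only and reuse it on the test rows, so that no information from the test period leaks into training.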

After scaling, the features would look something like this.

Figure: Scaled Data

Now that our data is scaled, we will split the data.

In our case, you will split the X and y datasets in the ratio 80:20 for “train” and “test”. This means that you will use 80% of your data to train the model and the remaining 20% to test how well the model has learned.

Figure: Visualisation of Train and Test

You can see that the dataset is from 2017 to 2019. The first 80% of the dataset, i.e. from January 2017 to 28 May 2019, is in the train dataset, X_train, and the rest of the dataset is in the test dataset, X_test.
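Because the train period comes before the test period, the split is by position in time rather than random. A minimal sketch, with a hypothetical helper chronological_split and placeholder data:

```python
import numpy as np
import pandas as pd


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling: the earliest rows train, the latest rows test."""
    split = int(len(X) * train_fraction)
    return X.iloc[:split], X.iloc[split:], y.iloc[:split], y.iloc[split:]


# Placeholder data: 100 consecutive 15-minute timestamps.
index = pd.date_range("2017-01-03 09:30", periods=100, freq="15min")
X = pd.DataFrame({"feature": np.arange(100.0)}, index=index)
y = pd.Series(np.arange(100) % 2, index=index)

X_train, X_test, y_train, y_test = chronological_split(X, y)
print(len(X_train), len(X_test))
print(X_train.index.max(), X_test.index.min())
```

Keeping the order intact means the model is always tested on data that comes after everything it was trained on.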

Step 5: Train Model with Training Data

Choose a machine learning algorithm such as a decision tree, random forest, or neural network. Train the model using the training dataset, optimising hyperparameters as necessary.

For this classroom, you will use the Random Forest Classifier model.

Figure: Train the ML Model

In essence, the model will check the features for each time period and will try to learn which feature values lead to a target variable of 1, which represents buy.
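A minimal training sketch with scikit-learn's RandomForestClassifier is shown below. The training data is synthetic, and the hyperparameters (n_estimators, max_depth, random_state) are assumptions for illustration, not the values used in the course.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data: three features, and a target that depends
# mostly on the first feature plus some noise.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Hyperparameter values are illustrative assumptions.
model = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=42)
model.fit(X_train, y_train)

print(model.classes_)
# Importances sum to 1 and hint at which features the forest relies on.
print(model.feature_importances_)
```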

Step 6: Fit Test Data on Model

Once the model is trained, you use the trained model to make predictions on the test dataset. Remember that when we split the dataset, the test dataset covered the remaining 20% of the data, from the end of May 2019 to December 31, 2019.

Now, the model will check the input feature values in the test dataset and try to predict whether the stock will move up or not. This prediction will be saved in the column “y_pred”.
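The prediction step could look roughly like this. The features f1 and f2, the synthetic target and the model settings are assumptions; only the column name y_pred comes from the text.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features and target, split chronologically 80:20.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(250, 2)), columns=["f1", "f2"])
target = (features["f1"] > 0).astype(int)

X_train, X_test = features.iloc[:200], features.iloc[200:]
y_train = target.iloc[:200]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Store the predicted label (1 = buy, 0 = do not buy) next to the test features.
results = X_test.copy()
results["y_pred"] = model.predict(X_test)
print(results.head())
```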

Figure: Predict on Test Data

Step 7: Evaluate the Model

Assess model performance using various metrics like accuracy, precision, recall, and F1-score.

Accuracy measures how often the model predicts the correct class. For example, if the model correctly predicts the target variable 90 out of 100 times, then the model’s accuracy is 90%. Similarly, there are other measures to help us understand the model’s performance.
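To make these metrics concrete, the sketch below computes them by hand from counts of correct and incorrect predictions. The labels are made up, and the helper classification_metrics is hypothetical; scikit-learn's sklearn.metrics module offers equivalent functions (accuracy_score, precision_score, recall_score, f1_score).

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correct buy signals
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct no-buy signals
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false buy signals
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed up-moves

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up actual and predicted labels for ten time periods.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision answers "when the model said buy, how often was it right?", while recall answers "of all the up-moves, how many did the model catch?".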

Backtesting

Once you are confident that your model is reasonably good at predicting whether JP Morgan should be bought or not, you will deploy the model to make real-time predictions based on current market data.

Interpret model predictions to generate buy, sell, or hold signals.

Here, if the model predicts that the stock price would move up in the next time period, we buy the stock. If the model predicts that the stock price will not move up, then we take no position, or sell the stock if we have already bought the stock.

Further, we will implement risk management rules, such as setting stop-loss and take-profit levels.

Finally, you will simulate the trading strategy using historical data to assess its performance over time. Calculate returns, drawdowns, and other relevant statistics. Make adjustments to the strategy based on backtesting results.
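A bare-bones version of such a simulation is sketched below. It assumes a long-only rule (hold the stock for one period after a prediction of 1, otherwise stay out), uses made-up prices and predictions, and ignores transaction costs, stop-loss and take-profit rules. The helper name backtest is hypothetical.

```python
import pandas as pd


def backtest(close, y_pred):
    """Long-only backtest: hold the stock in the bar after a buy prediction."""
    returns = close.pct_change().fillna(0.0)
    # A prediction made at bar t is traded in bar t+1, hence the shift by 1.
    position = y_pred.shift(1).fillna(0)
    strategy_returns = position * returns
    cumulative = (1 + strategy_returns).cumprod()
    # Drawdown: percentage drop from the highest cumulative value so far.
    drawdown = cumulative / cumulative.cummax() - 1
    return strategy_returns, cumulative, drawdown


# Made-up prices and predictions for five bars.
close = pd.Series([100.0, 102.0, 101.0, 103.0, 100.0])
y_pred = pd.Series([1, 0, 1, 1, 0])

strategy_returns, cumulative, drawdown = backtest(close, y_pred)
print("Final value of 1 unit invested:", round(cumulative.iloc[-1], 4))
print("Maximum drawdown:", round(drawdown.min(), 4))
```

Shifting the predictions by one bar matters: without it, the strategy would trade on information it could not have had at the time.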

Figure: Strategy returns of machine learning-based trading strategy

In a backtest on JP Morgan's stock price from June 2019 to January 2020, the cumulative returns of this strategy amounted to 1.28 times the initial investment. This translates to a CAGR of 52%.
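The CAGR figure can be sanity-checked by annualising the growth multiple. The sketch below assumes the backtest covers roughly seven months (June 2019 to the start of January 2020); the exact number of trading days is not stated, so the result is only approximate.

```python
def cagr(growth_multiple, years):
    """Compound annual growth rate implied by a growth multiple over a period."""
    return growth_multiple ** (1 / years) - 1


# Assumption: the test period is about 7 months long.
years = 7 / 12
annualised = cagr(1.28, years)
print(f"Approximate CAGR: {annualised:.1%}")
```

Under this assumption, the result lands close to the 52% quoted above.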

It is important to note that backtesting results do not guarantee future performance. The presented strategy results are intended solely for educational purposes and should not be interpreted as investment advice. A comprehensive evaluation of the strategy across multiple parameters is necessary to assess its effectiveness.

The creation of target and feature variables presented in the classroom has been covered in detail along with the Python code in this unit. You need to take a Free Preview of the course by clicking on the green-coloured Free Preview button on the right corner of the screen next to the FAQs tab and go to Section 6 and Unit 8 of the course.


IMPORTANT DISCLAIMER: This email is for educational purposes only and is not a solicitation or recommendation to buy or sell any securities. Investing in financial markets involves risks and you should seek the advice of a licensed financial advisor before making any investment decisions. Your investment decisions are solely your responsibility. The information provided is based on publicly available data and our own analysis, and we do not guarantee its accuracy or completeness. By no means is this communication sent as the licensed equity analysts or financial advisors and it should not be construed as professional advice or a recommendation to buy or sell any securities or any other kind of asset.
