A set of doubts concerning the course "Trading with Machine learning: Regression"

Ghery_Cardenas · May 5, 2023, 2:39am

Hello there:

I have a question concerning the course "Trading with Machine learning: Regression". In this course, an OHLCV dataframe covering a range of dates is used to compute some input variables, which are then used to predict the High and Low of the next period…

From the OHLCV data, a set of other variables is constructed (moving averages, correlations, and others) to be used as input variables… I wonder: how do you determine that these features are suitable for predicting gold prices?

Next, the course introduces new variables, yU and yD, defined as the difference between the next high and the next open, and the difference between the next open and the next low… I was wondering:

Why are these variables used? Couldn't we just use the next low and next high as the output variables?

I was also wondering: which input variables should be considered for American stocks, and why?

I am planning to evaluate a machine learning algorithm for predicting the next for stocks… I intend to buy the dip based on this prediction. Could that be a good strategy?

I would sincerely appreciate your help…

_Rushda_Ansari · May 5, 2023, 11:01am

Hi Ghery,

In order to ensure that I answer all your questions, I have addressed each of the questions individually.

Q.1 How do you determine that these features are suitable for predicting gold prices?

Determining which features are suitable for predicting gold prices requires a combination of domain knowledge, statistical analysis, and experimentation.

To determine features, you can start by brainstorming a list of potential features that may be related to gold prices. These can include technical indicators, price and volume data. Once you have a list of potential features, you can conduct statistical analysis to determine which features have the strongest correlation with gold prices. Finally, you can experiment with different combinations of features and select those combinations that result in the most accurate predictions.
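As a rough illustration of the statistical-analysis step, here is a minimal Python sketch, assuming pandas and NumPy. The function name rank_features_by_correlation, the candidate features, and the synthetic price series are hypothetical, not taken from the course; the sketch simply ranks candidate features by the absolute Pearson correlation between each feature and the target.

```python
import numpy as np
import pandas as pd


def rank_features_by_correlation(features: pd.DataFrame, target: pd.Series) -> pd.Series:
    """Return |Pearson r| of each feature column with the target, highest first."""
    # Align on the index and drop rows with missing values
    # (rolling indicators produce NaNs at the start of the series).
    data = features.join(target.rename("target"), how="inner").dropna()
    corr = data.drop(columns="target").corrwith(data["target"], method="pearson")
    return corr.abs().sort_values(ascending=False)


# Hypothetical example with synthetic prices (not gold data).
rng = np.random.default_rng(0)
n = 500
close = pd.Series(100 + rng.normal(0, 1, n).cumsum())
target = close.shift(-1) - close  # next-period change, a hypothetical target
features = pd.DataFrame({
    "ret_1": close.diff(),
    "sma_10_gap": close - close.rolling(10).mean(),
    "noise": rng.normal(0, 1, n),
})
ranking = rank_features_by_correlation(features, target)
print(ranking)
```

Correlation only measures linear association with the target, so this ranking is best treated as a first filter; the combinations that survive should still be compared on out-of-sample prediction accuracy.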

If you want to understand why we selected these specific features, you’ll find a detailed answer here.

Q.2 Why are these variables (yD and yU) used? Couldn't we just use the next low and next high as the output variables?

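While using the next low and next high directly as output variables may seem like a simpler approach, yU and yD describe how far the price moves above and below the next open. Measuring the targets relative to the open removes the overall price level, which makes them more comparable across periods with very different prices, and it captures the magnitude of the upward and downward moves separately. Once the open is known, the predicted distances can be added to or subtracted from it to recover the predicted high and low.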
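To make the difference concrete, here is a minimal pandas sketch of how such targets could be built from an OHLC dataframe. It assumes yU = next High - next Open and yD = next Open - next Low (consistent with the names "up" and "down"); the column names and the add_targets helper are hypothetical and may differ from the course notebook.

```python
import pandas as pd


def add_targets(ohlc: pd.DataFrame) -> pd.DataFrame:
    """Add yU/yD targets measured from the next period's open (assumed definition)."""
    out = ohlc.copy()
    next_open = out["Open"].shift(-1)
    out["yU"] = out["High"].shift(-1) - next_open  # distance the next bar rose above its open
    out["yD"] = next_open - out["Low"].shift(-1)   # distance the next bar fell below its open
    return out


ohlc = pd.DataFrame({
    "Open":  [100.0, 102.0, 101.0],
    "High":  [103.0, 105.0, 104.0],
    "Low":   [99.0, 100.0, 98.0],
    "Close": [102.0, 101.0, 103.0],
})
targets = add_targets(ohlc)
print(targets[["yU", "yD"]])
```

With this toy data the first row gets yU = 3 and yD = 2, and the last row has no next bar, so its targets are NaN and would be dropped before training. A predicted high can then be recovered as open + predicted yU, and a predicted low as open - predicted yD.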

Q.3 Which input variables should be considered for American stocks, and why?

The concepts taught in this course are applicable to any stock of your choice and are not specific to a certain geographical area.

Q.4 I am planning to evaluate a machine learning algorithm for predicting the next for stocks… I intend to buy the dip with this prediction… Could that be a good strategy?

Please note that the effectiveness of a strategy cannot be guaranteed: the algorithm may not make accurate predictions, and its predictions may not result in profitable trades.

Therefore, before going live with any strategy, it is very important to backtest it. Whether or not you go live should depend on your backtest results. Apart from this, setting up risk management measures such as a stop-loss is also important. You can enrol in the "Backtesting Trading Strategies" course on Quantra to get a detailed understanding of these concepts.
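As a starting point, here is a minimal sketch of a backtest for the dip-buying idea, assuming pandas and NumPy, OHLC bars, and a model's predicted low for each bar made before that bar opens. The rule itself (limit buy at the predicted low, fixed-percentage stop-loss, exit at the close of the same bar) and the backtest_dip_buy name are illustrative assumptions, not a strategy from the course, and transaction costs and slippage are ignored.

```python
import numpy as np
import pandas as pd


def backtest_dip_buy(bars: pd.DataFrame, predicted_low: pd.Series, stop_pct: float = 0.01) -> pd.Series:
    """Per-bar returns of a hypothetical dip-buying rule (no costs, no slippage).

    * place a limit buy at the predicted low before the bar opens;
    * if the bar trades at or below that level, buy (at the open if it gaps lower);
    * exit at the stop price if the low reaches entry * (1 - stop_pct),
      otherwise exit at the close of the same bar.
    """
    limit = predicted_low.reindex(bars.index)
    filled = bars["Low"] <= limit              # NaN predictions compare as False
    entry = np.minimum(limit, bars["Open"])
    stop = entry * (1 - stop_pct)
    # Prices are treated as continuous within the bar: to reach the stop, price
    # must first trade through the entry, so a low at or below the stop means
    # the position was stopped out.
    stopped = bars["Low"] <= stop
    exit_price = bars["Close"].where(~stopped, stop)
    returns = (exit_price / entry - 1).where(filled, 0.0)
    return returns.fillna(0.0)


bars = pd.DataFrame({
    "Open":  [100.0, 100.0, 100.0],
    "High":  [101.0, 101.0, 101.0],
    "Low":   [99.0, 97.0, 99.5],
    "Close": [100.5, 98.0, 100.0],
})
pred_low = pd.Series([99.0, 99.0, 99.0])
rets = backtest_dip_buy(bars, pred_low, stop_pct=0.01)
total_return = (1 + rets).prod() - 1
print(rets)
print(total_return)
```

Here the first bar fills at 99 and exits at the close, the second fills and is stopped out for about -1%, and the third never reaches the limit. A meaningful backtest would need a much longer history and realistic costs.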

Hope this answers all your questions!

Thanks,

Rushda Ansari

Ghery_Cardenas · May 5, 2023, 7:08pm

Thank you very much; nevertheless, I still have some doubts about this…

When I asked, for example, about American stocks, I didn't mean that they are in a "certain geographical area." What I meant is that they have a particularity: the data provided by most data providers covers only 9:30 to 16:00 New York time, that is, regular market hours. But stocks can also be traded pre-market (from 4:00 a.m. to 9:30 a.m.) or after-hours (from 16:00 to 20:00), and the OHLCV data provided by most sites usually takes into account only the regular session…

Maybe pre-market and after-hours trading can have some effect… right?

_Rushda_Ansari · May 8, 2023, 9:13am

Hey Ghery,

Yes, the fact that stocks can be traded in premarket and after-hours sessions can have an effect on the stock prices. It's important to note that premarket and after-hours trading can be less liquid and have wider bid-ask spreads than regular market hours, which can result in greater volatility and potentially larger price movements. So it may be useful to include data from these sessions in the analysis to capture this additional market activity.
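If your data provider supplies intraday bars that include the extended sessions, a minimal pandas sketch could split them into pre-market, regular, and after-hours sessions and derive extended-hours features. It assumes pandas 1.4 or later (for the inclusive argument of between_time), bar timestamps that mark the start of each bar, and the session times quoted in this thread; the function names and toy data are hypothetical.

```python
import pandas as pd

AGG = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}


def session_summary(bars: pd.DataFrame) -> pd.DataFrame:
    """Per-day OHLCV for the pre-market, regular, and after-hours sessions.

    Assumes a tz-aware DatetimeIndex whose timestamps mark the start of each bar.
    Session boundaries (New York time) follow the hours quoted in the thread.
    """
    ny = bars.tz_convert("America/New_York")
    sessions = {
        "pre": ny.between_time("04:00", "09:30", inclusive="left"),
        "regular": ny.between_time("09:30", "16:00", inclusive="left"),
        "after": ny.between_time("16:00", "20:00", inclusive="left"),
    }
    daily = {name: part.groupby(part.index.date).agg(AGG) for name, part in sessions.items()}
    # Columns become a MultiIndex such as ("pre", "Close") or ("regular", "Volume").
    return pd.concat(daily, axis=1)


def premarket_features(summary: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical extended-hours features, all known before the regular open."""
    prev_regular_close = summary[("regular", "Close")].shift(1)
    return pd.DataFrame({
        # Move from yesterday's regular close to today's last pre-market price.
        "premarket_gap": summary[("pre", "Close")] / prev_regular_close - 1,
        "premarket_volume": summary[("pre", "Volume")],
    })


# Toy data: a few bars on each of two days.
idx = pd.DatetimeIndex([
    "2023-05-01 08:00", "2023-05-01 09:30", "2023-05-01 15:59", "2023-05-01 17:00",
    "2023-05-02 08:00", "2023-05-02 09:30", "2023-05-02 15:59", "2023-05-02 17:00",
]).tz_localize("America/New_York")
px = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0]
bars = pd.DataFrame(
    {"Open": px, "High": px, "Low": px, "Close": px,
     "Volume": [10, 100, 200, 5, 20, 150, 250, 8]},
    index=idx,
)
summary = session_summary(bars)
feats = premarket_features(summary)
print(feats)
```

Features such as the pre-market gap are known before the regular open, so they can be used to predict that day's regular-session high and low without look-ahead bias, provided the prediction is made at the open.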

However, it's also important to keep in mind that premarket and after-hours trading can be more difficult for retail traders to access and may require specialized trading accounts or platforms. Additionally, the trading volume during these sessions may not be representative of the broader market, and any analysis should be done with caution and with a full understanding of the risks involved.

Thanks,

Rushda

Ghery_Cardenas · May 14, 2023, 2:56pm

Hello there:

Thanks for the info; I greatly appreciate your help. But even if trading during these hours is not representative of the broader market, there are times when there is enough volume for stocks to move significantly during those sessions…

Also, you said "Once you have a list of potential features, you can conduct statistical analysis to determine which features have the strongest correlation…". Well, yes, but I am having a hard time trying to find those features. I am evaluating the correlation with the Pearson correlation coefficient… is that OK? Or should I consider other metrics to evaluate correlation?

However… I am having a hard time trying to find those input values… I don't know which ones to consider. I have already evaluated exponential moving averages, and I also tested the difference between the open and the previous open, as well as the difference between the previous close and the open… The results were disappointing: none of them had a Pearson correlation coefficient greater than 0.6…

Thank you very much…

Anudeep_Verma_EJZu7 · May 15, 2023, 12:39pm

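Yes, you can use the Pearson correlation coefficient to evaluate this. Keep in mind that it only measures linear relationships; a rank-based measure such as Spearman's correlation also picks up monotonic non-linear relationships.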
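A small Python sketch (NumPy and pandas assumed, synthetic data) comparing the two measures on a relationship that is monotonic but strongly non-linear. Spearman's coefficient is computed here as the Pearson correlation of the ranks, which is its definition (ties receive average ranks).

```python
import numpy as np
import pandas as pd

# Synthetic data: y rises with x everywhere, but along a curve, not a line.
rng = np.random.default_rng(1)
x = pd.Series(rng.uniform(0, 5, 1_000))
y = np.exp(x) + rng.normal(0, 0.1, 1_000)

pearson = x.corr(y)                  # default method="pearson": linear association
spearman = x.rank().corr(y.rank())   # Pearson on ranks = Spearman: monotonic association
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

On data like this the rank correlation is noticeably higher than Pearson's. If a feature scores low on both, it has little monotonic relationship with the target, although a non-monotonic relationship can still be missed by both measures.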

There are some standard input features you can use in your prediction model in addition to the ones you are already using: moving averages (try different lookback periods, e.g., 9, 14, and 21 days), Bollinger Bands, the Average True Range (ATR), and Chaikin indicators. These indicators can help capture trends and volatility.
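A minimal pandas sketch of these indicators, assuming an OHLCV dataframe with columns Open, High, Low, Close, and Volume. The window lengths, the choice of Chaikin Money Flow as the Chaikin indicator, the simple-mean ATR, and the add_indicators name are assumptions for illustration; libraries and trading platforms may define these indicators slightly differently.

```python
import numpy as np
import pandas as pd


def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    close, high, low, vol = out["Close"], out["High"], out["Low"], out["Volume"]

    # Simple moving averages over several lookbacks, expressed as the relative
    # distance of the close from each average so the feature ignores price level.
    for n in (9, 14, 21):
        out[f"sma_gap_{n}"] = close / close.rolling(n).mean() - 1

    # Bollinger Bands: 20-period mean +/- 2 standard deviations; %B locates the
    # close relative to the bands (0 = lower band, 1 = upper band).
    mid = close.rolling(20).mean()
    std = close.rolling(20).std()
    upper, lower = mid + 2 * std, mid - 2 * std
    out["bb_pctb"] = (close - lower) / (upper - lower)

    # Average True Range using a simple 14-period mean of the true range
    # (Wilder's original formulation uses a smoothed average instead).
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)
    out["atr_14"] = true_range.rolling(14).mean()

    # Chaikin Money Flow over 20 periods; the multiplier is set to 0 on bars
    # where high == low to avoid dividing by zero.
    mf_multiplier = (((close - low) - (high - close)) / (high - low).replace(0, np.nan)).fillna(0.0)
    out["cmf_20"] = (mf_multiplier * vol).rolling(20).sum() / vol.rolling(20).sum()
    return out


# Hypothetical synthetic OHLCV data.
rng = np.random.default_rng(42)
n = 100
close = 100 + rng.normal(0, 1, n).cumsum()
df = pd.DataFrame({
    "Open": close,
    "High": close + rng.uniform(0, 1, n),
    "Low": close - rng.uniform(0, 1, n),
    "Close": close,
    "Volume": rng.integers(1_000, 5_000, n),
})
res = add_indicators(df)
print(res.tail())
```

Each indicator needs a warm-up period, so the first rows contain NaNs and should be dropped before the features are correlated with the target or fed to a model.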

Remember that no selection of input features guarantees accurate results. It's essential to perform thorough feature selection and experimentation to identify the most relevant and predictive features for your asset.

Ghery_Cardenas · June 9, 2023, 12:43pm

To Anudeep Verma:

Thanks a lot for the info… But how do I know if the input features that I am using are good enough??

And by the way… how many features do I have to use…??

Anudeep_Verma_EJZu7 · June 13, 2023, 3:44am

Hello Ghery,

There's no fixed number of features that works universally for all problems, and the quality of a feature depends on the specific task, dataset, and algorithm used. To judge how good your features are, consider the following:

Your features should be relevant to the problem, capturing meaningful information about the target variable.
In classification tasks, features are more informative when they show low variance within each class and high variance across classes, that is, when they separate the classes well.
Features that correlate strongly with the target variable are more informative, and ideally they should be largely independent of each other, so that each one adds new information.
Domain experts have insights into which features are likely to be essential for the task.
Adding more features can lead to overfitting and increased computational complexity, so aim for a balance between having enough features and keeping the feature space manageable.
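One way to act on the independence and balance points above is a greedy redundancy filter: visit features from most to least correlated with the target and skip any feature that is highly correlated with one already kept. This is a common heuristic rather than a method from the course; the 0.9 threshold and the drop_redundant name are arbitrary assumptions, and pandas and NumPy are assumed.

```python
import numpy as np
import pandas as pd


def drop_redundant(features: pd.DataFrame, target: pd.Series, threshold: float = 0.9) -> list[str]:
    """Keep features in order of |corr| with the target, skipping near-duplicates.

    A feature is kept only if its |corr| with every already-kept feature is
    below `threshold`.
    """
    data = features.join(target.rename("target"), how="inner").dropna()
    X = data.drop(columns="target")
    relevance = X.corrwith(data["target"]).abs()
    pairwise = X.corr().abs()
    kept: list[str] = []
    for name in relevance.sort_values(ascending=False).index:
        if all(pairwise.loc[name, k] < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical example: x2 is almost a copy of x1, x3 carries separate information.
rng = np.random.default_rng(7)
n = 1_000
x1 = rng.normal(size=n)
feats = pd.DataFrame({"x1": x1, "x2": x1 + rng.normal(0, 0.05, n), "x3": rng.normal(size=n)})
y = pd.Series(feats["x1"] + feats["x3"] + rng.normal(0, 0.5, n))
kept = drop_redundant(feats, y)
print(kept)
```

Only one of x1 and x2 survives, because they carry almost the same information, while x3 is kept because it is both related to the target and unrelated to the others.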

It's important to note that feature engineering and selection is an iterative process. You can start with a good set of features, evaluate their performance, and then iterate by adding, removing, or transforming features based on the results.
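The iterative process described here can be sketched as greedy forward selection: repeatedly add the feature that most improves out-of-sample performance, and stop when the improvement becomes negligible. Judging features by walk-forward (time-ordered) out-of-sample R^2, rather than by in-sample correlation alone, is one practical way to tell whether they are good enough. The sketch assumes NumPy and pandas and uses plain linear regression via numpy.linalg.lstsq; the helper names and the min_gain threshold are hypothetical.

```python
import numpy as np
import pandas as pd


def walk_forward_r2(X: np.ndarray, y: np.ndarray, n_splits: int = 5) -> float:
    """Mean out-of-sample R^2 of least-squares regression over expanding windows.

    The rows are cut into n_splits + 1 consecutive blocks; each fold trains on all
    rows before a block and tests on that block, so the model never sees the future.
    R^2 is measured against the mean of each test block.
    """
    n = len(y)
    edges = np.linspace(0, n, n_splits + 2, dtype=int)
    scores = []
    for k in range(1, n_splits + 1):
        start, stop = edges[k], edges[k + 1]
        A_train = np.column_stack([np.ones(start), X[:start]])  # intercept + features
        coef, *_ = np.linalg.lstsq(A_train, y[:start], rcond=None)
        A_test = np.column_stack([np.ones(stop - start), X[start:stop]])
        resid = y[start:stop] - A_test @ coef
        ss_tot = np.sum((y[start:stop] - y[start:stop].mean()) ** 2)
        scores.append(1.0 - np.sum(resid ** 2) / ss_tot)
    return float(np.mean(scores))


def forward_select(features: pd.DataFrame, target: pd.Series, min_gain: float = 0.001) -> list[str]:
    """Greedily add the feature that most improves walk-forward R^2."""
    data = features.join(target.rename("target"), how="inner").dropna()
    X_all = data.drop(columns="target")
    y = data["target"].to_numpy(dtype=float)
    selected: list[str] = []
    best = walk_forward_r2(np.empty((len(y), 0)), y)  # intercept-only baseline
    while len(selected) < X_all.shape[1]:
        candidates = [c for c in X_all.columns if c not in selected]
        scores = {
            c: walk_forward_r2(X_all[selected + [c]].to_numpy(dtype=float), y)
            for c in candidates
        }
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break  # no remaining feature helps enough out of sample
        selected.append(name)
        best = scores[name]
    return selected


# Hypothetical example: y depends on x1 and x2; x3 is pure noise.
rng = np.random.default_rng(3)
n = 1_000
feats = pd.DataFrame(rng.normal(size=(n, 3)), columns=["x1", "x2", "x3"])
y = pd.Series(2 * feats["x1"] - feats["x2"] + rng.normal(size=n))
sel = forward_select(feats, y)
print(sel)
```

Because every fold is trained only on earlier data, a feature that looks good here is more likely to help in live trading than one that only shows a high in-sample correlation. With real prices, the same loop can also be run in reverse (removing features) or with a different model.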