Trading with Machine Learning : Regression

Hello Anil sir, I need some guidance on how to get historical stock options data in Python. Please share a Gemini prompt or the code.

Hi Iyrics
I use MQL5 and get data from my broker through the MetaTrader5 library, loaded into pandas.
Sorry I won’t be able to help, as I am not aware of other data collection methods.

Thank you for replying, sir.


[A] Your assumption is correct. Using a basic LinearRegression model with K-Fold cross-validation and evaluating it using MSE and R² can help you decide whether the regression model fits your data well. If you later want to fine-tune hyperparameters or test more complex models, you might consider using GridSearchCV, but for an initial evaluation, your approach is valid.
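
The evaluation described in [A] can be sketched roughly as follows. The data here is synthetic and the fold count is an assumption; substitute your own feature matrix and target. This is only an illustration, not the code from the attached notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (an assumption; replace with your own features/target).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=200)

# Five contiguous folds, no shuffling.
kf = KFold(n_splits=5, shuffle=False)

# Score every fold with both metrics in one pass.
scores = cross_validate(
    LinearRegression(), X, y, cv=kf,
    scoring=("neg_mean_squared_error", "r2"),
)

# scikit-learn reports MSE negated so that "higher is better"; flip the sign.
mse_per_fold = -scores["test_neg_mean_squared_error"]
r2_per_fold = scores["test_r2"]

print("MSE per fold:", mse_per_fold.round(4))
print("R2 per fold:", r2_per_fold.round(4))
```

If R² varies a lot from fold to fold, that is usually worth investigating before moving on to GridSearchCV.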

[B] With K-Fold, you don’t create separate training and test datasets outside of the loop, so the challenge comes when you try to merge predictions from the folds back into your original dataset. The error you’re encountering is likely due to a misalignment between the indices of the predictions and those of your original DataFrame.
Can you try it this way:
X['yU_predict'] = yU_predict
X['yD_predict'] = yD_predict
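
One way to keep fold predictions aligned with the original index is to pre-allocate a Series on that index and fill it fold by fold. The sketch below uses synthetic data and hypothetical feature names (f1, f2); only the yU_predict name comes from this thread.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in with a datetime index (hypothetical data and column names).
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=100, freq="D")
X = pd.DataFrame(rng.normal(size=(100, 2)), index=idx, columns=["f1", "f2"])
yU = X["f1"] * 2.0 + rng.normal(scale=0.1, size=100)

# Pre-allocate a Series that shares X's index; each fold fills its own test rows.
yU_predict = pd.Series(np.nan, index=X.index, name="yU_predict")

for train_pos, test_pos in KFold(n_splits=5).split(X):
    # split() yields integer positions, so use .iloc on both sides.
    model = LinearRegression().fit(X.iloc[train_pos], yU.iloc[train_pos])
    yU_predict.iloc[test_pos] = model.predict(X.iloc[test_pos])

# Every row now has exactly one out-of-fold prediction, on the original index.
print(yU_predict.head())
```

Because yU_predict carries the same index as X, assigning it as a new column cannot misalign rows.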

[C] You can use Spyder to run the Python scripts as well.

Hi Rekhit
Thanks for the reply.

The warning with
X['yU_predict'] = yU_predict
X['yD_predict'] = yD_predict

C:\Users\anilh\AppData\Local\Temp\ipykernel_4540\2336015915.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: Indexing and selecting data (pandas 2.2.3 documentation)
  X['yU_predict'] = yU_predict
C:\Users\anilh\AppData\Local\Temp\ipykernel_4540\2336015915.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: Indexing and selecting data (pandas 2.2.3 documentation)
  X['yD_predict'] = yD_predict
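
For context, this warning typically appears when X was derived from another DataFrame by slicing or selecting, and a column is then assigned to it. A minimal sketch of one common way around it, assuming X is taken from rates_df (the values and the single-column selection are placeholders):

```python
import numpy as np
import pandas as pd

# Tiny placeholder frame standing in for rates_df.
rates_df = pd.DataFrame(
    {"Open": [1.10, 1.11, 1.12], "High": [1.12, 1.13, 1.15], "Low": [1.09, 1.10, 1.11]}
)

# Selecting from rates_df and then assigning to the result is the pattern that
# can trigger SettingWithCopyWarning. An explicit .copy() makes X independent.
X = rates_df[["Open"]].copy()

yU_predict = np.array([0.02, 0.02, 0.03])  # placeholder predictions

# Safe: X is its own DataFrame, so this does not touch rates_df.
X["yU_predict"] = yU_predict
print(X)
```

The trade-off is that the new column lives only in X, not in rates_df.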

Just a thought: I could add the predicted yU/yD columns to the original rates_df (rather than to a slice of it), since these values are only added to or subtracted from 'Open' to arrive at the predicted High and Low, from which the strategy 'pnl' is calculated.

Let me know if this is a workable option and, if so, how to implement it.

Hello Anil,

Thank you for your reply, let me go through it and reply.

Hi LL,

This query has been answered at: How to get stocks option historcal data python i times many times with gemini-i-get-not-right-code - #2 by Ajay_Pawar

Hello Anil,

Yes, you can directly add to the rates_df dataframe as well.

rates_df['yU_predict'] = pd.Series(yU_predict, index=rates_df.index)
rates_df['yD_predict'] = pd.Series(yD_predict, index=rates_df.index)

Once you have the values, you can create the new columns of Predicted prices and calculate strategy pnl.
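
A small sketch of that step, assuming yU is the predicted move above 'Open' and yD the predicted move below it (as described earlier in the thread). The High_predict and Low_predict column names and all numbers are placeholders; the pnl logic is left out because it depends on your strategy rules.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="D")
rates_df = pd.DataFrame({"Open": [100.0, 101.0, 102.0, 103.0]}, index=idx)

# Placeholder predictions; in the thread these come from the fitted models.
yU_predict = np.array([1.5, 1.0, 2.0, 0.5])
yD_predict = np.array([1.0, 0.5, 1.5, 1.0])

# Wrap the arrays in Series on rates_df's index so rows line up explicitly.
rates_df["yU_predict"] = pd.Series(yU_predict, index=rates_df.index)
rates_df["yD_predict"] = pd.Series(yD_predict, index=rates_df.index)

# Assumed definitions: predicted High = Open + yU, predicted Low = Open - yD.
rates_df["High_predict"] = rates_df["Open"] + rates_df["yU_predict"]
rates_df["Low_predict"] = rates_df["Open"] - rates_df["yD_predict"]

print(rates_df)
```

Note that pd.Series(array, index=...) requires the array length to match the index length, so a mismatch shows up immediately as an error rather than as silent NaNs.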

Hello Rekhit

The suggested solution worked without any error. I will carry the code forward to the next calculations.

With this issue in mind, do you suggest using an integer-based index and keeping datetime as a normal datetime column?

Do you mind if I move the code from Jupyter Notebook to Spyder? I find it more user-friendly. If you are okay with that, I will share the code in Spyder format from now on.

Hi Anil,
Usually, the date column is kept in datetime format and used as the index, which makes data analysis easier.

Sure, Spyder format also works.


Hello Rekhit

I have changed the code to Spyder (.py) format.

I did notice some errors in training the model with K-Fold, which I have now rectified.

There are no calculation errors now, and I need your help to improve the predictions (R²). Please take enough time to read the code, and highlight any logical or fundamental errors I am making.

PredictHighLow_v1.02

Hello Anil,

Let me go through it and reply

Hi
Please review the K-Fold splits in particular, and whether I am doing them correctly.
I have finished the code in Chapter 3 of ML In Trading. Nice book, co-authored by you.


Hello,

The code seems correct. However, since this is time-series data, can you perform the scaling within each fold to prevent data leakage?
I see that you are now using a random forest model, which gives comparatively better R-squared values.
In addition to K-Fold, you could try a time-series split, which is designed specifically for time-series data.
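
Both suggestions can be combined in one sketch: putting the scaler inside a pipeline means it is re-fitted on the training part of each fold only, and TimeSeriesSplit keeps every training window strictly before its test window. The data and model settings below are assumptions. Tree-based models are largely insensitive to feature scaling, so the scaler matters more if you go back to a linear model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, time-ordered stand-in data (an assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.2, size=300)

# The scaler is part of the estimator, so cross-validation fits it on each
# fold's training rows only and never sees the test rows.
pipe = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=50, random_state=0),
)

tscv = TimeSeriesSplit(n_splits=5)

# Show that each fold trains on the past and tests on the future.
for train_idx, test_idx in tscv.split(X):
    print(f"train 0..{train_idx.max()}  test {test_idx.min()}..{test_idx.max()}")

r2_scores = cross_val_score(pipe, X, y, cv=tscv, scoring="r2")
print("R2 per fold:", r2_scores.round(3))
```

Early folds train on fewer rows, so their scores are often lower than later ones.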
Now we come to a basic concept of any machine learning model: you are using certain features to help predict the target variable. One reason for low performance can be that the features are not very good at predicting the target variable. You can check feature importance in the random forest model.
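
Checking feature importance might look like the sketch below. The feature names and data are made up for illustration; with real data, use the columns of your own feature DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with one strong, one weak, and one irrelevant feature.
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["informative", "weak", "noise"])
y = 3.0 * X["informative"] + 0.3 * X["weak"] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ is impurity-based and sums to 1 across features.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Features with importance near zero are candidates for removal or replacement, though impurity-based importance can be misleading when features are strongly correlated.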

And finally, try to keep part of the data aside as a verification set. Once the K-Fold CV is done, apply the predictions to it and check whether the performance stays the same.
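
A rough sketch of that check, assuming the most recent 20% of rows are held back (the split ratio, data, and model settings are all assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score

# Synthetic, time-ordered stand-in data (an assumption).
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.2, size=400)

# Reserve the most recent 20% as a verification set; no shuffling.
split = int(len(X) * 0.8)
X_dev, X_hold = X[:split], X[split:]
y_dev, y_hold = y[:split], y[split:]

model = RandomForestRegressor(n_estimators=50, random_state=0)

# K-Fold CV on the development portion only.
cv_r2 = cross_val_score(model, X_dev, y_dev, cv=KFold(n_splits=5), scoring="r2").mean()

# Refit on all development data, then score once on the untouched rows.
model.fit(X_dev, y_dev)
holdout_r2 = r2_score(y_hold, model.predict(X_hold))

print(f"CV R2: {cv_r2:.3f}  hold-out R2: {holdout_r2:.3f}")
```

A hold-out R² far below the CV average suggests the CV estimate was optimistic, for example because of leakage between folds.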

Hope this helps