Course Name: Financial Time Series Analysis for Trading, Section No: 24, Unit No: 11, Unit type: Notebook
Hello, why does this model use the previous 252 data points to predict the next one, instead of using 70% of the data to train and 30% to test as in the previous models?
https://imgur.com/NiNKUtg
Hello Daniel,
AR, ARMA and ARIMA-based models predict the current value of a variable from its past values. They assume that the future value of a variable depends linearly on its past values and a random error term.
The ARCH model is used to model the volatility of a time series, particularly in financial markets where volatility clustering is observed (periods of high volatility tend to be followed by more high volatility, and periods of low volatility by more low volatility).
The two families have different objectives, and therefore different approaches to prediction.
Unlike AR and ARIMA models that forecast future values, ARCH models focus on the conditional variance or volatility of the data. The goal is not to predict future values but rather to model the behaviour of volatility over time.
Also, as mentioned in the notebook, the sliding window method is used to fit the ARCH model on historical data. In the sliding window method, the ARCH model constants and coefficients are calculated every day by fitting the ARCH model on the latest data of a fixed number of periods. After completing the notebook, you will be able to fit the ARCH model using the sliding window method and predict the volatility.
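To make the sliding window idea concrete, here is a minimal sketch. It is not the notebook's code: it assumes an ARCH(1) model, zero-mean returns, and a simple least-squares fit of squared returns on lagged squared returns instead of the maximum-likelihood fit a dedicated library would normally use. The function names `fit_arch1_ols` and `rolling_arch1_forecast` and the synthetic data are made up for illustration.

```python
import numpy as np

def fit_arch1_ols(returns):
    """Estimate ARCH(1) parameters (omega, alpha) by regressing r_t^2 on r_{t-1}^2.

    Assumption: least squares stands in for maximum likelihood here.
    """
    sq = np.asarray(returns, dtype=float) ** 2
    y, x = sq[1:], sq[:-1]
    X = np.column_stack([np.ones_like(x), x])
    omega, alpha = np.linalg.lstsq(X, y, rcond=None)[0]
    # Clip the estimates so the implied variance stays positive
    alpha = float(min(max(alpha, 0.0), 0.999))
    omega = float(max(omega, 1e-12))
    return omega, alpha

def rolling_arch1_forecast(returns, window=252):
    """One-step-ahead variance forecasts, each refit on the previous `window` returns."""
    returns = np.asarray(returns, dtype=float)
    forecasts = np.full(len(returns), np.nan)
    for t in range(window, len(returns)):
        history = returns[t - window:t]          # only data available before day t
        omega, alpha = fit_arch1_ols(history)    # parameters are re-estimated every day
        forecasts[t] = omega + alpha * history[-1] ** 2
    return forecasts

# Synthetic ARCH(1) returns, purely for illustration
rng = np.random.default_rng(0)
n = 600
r = np.zeros(n)
for t in range(1, n):
    sigma2 = 1e-4 + 0.3 * r[t - 1] ** 2
    r[t] = np.sqrt(sigma2) * rng.standard_normal()

var_forecast = rolling_arch1_forecast(r, window=252)
vol_forecast = np.sqrt(var_forecast)             # volatility is the square root of variance
print(vol_forecast[252:257])
```

Note that every forecast is already out-of-sample: the model fitted for day t has only seen returns up to day t-1.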
This is why we do not split the dataset as we did previously.
Hope this helps.
So does this mean that when applying ARCH or GARCH it is not necessary to split the data into train/test sets to avoid overfitting or to test our model?
Hi Daniel,
It would not be correct to say that you don't have to split the data at all.
Let's take a step back and understand why we split the data.
Suppose you had a moving average crossover strategy and saw that a short moving average of 5 and a long moving average of 10 gave good results.
Now you are not sure whether this works because you have already seen the data and tuned the strategy to it, or because it is genuinely a good trading strategy. Here a dataset that the model has not seen will help.
You try the strategy on the unseen dataset, and if you get comparable performance, it suggests that your strategy is not overfit.
Similarly, if after creating a strategy you want to try it out on unseen data, you can split the data.
If you have not chosen any parameter based on the data, then you might not have to split the dataset. For example, if you decided, without looking at the data, that you would go long on Monday and exit on Friday, you don't have to split the data. But if you looked at the returns and deduced that you have better chances by going long on Monday and selling on Friday, then you will need an "unseen dataset" to check whether this is in fact a good strategy.
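The moving average example can be sketched as follows. This is only an illustration under assumptions: the prices are a synthetic random walk, the candidate window pairs are arbitrary, and `crossover_returns` and `total_return` are hypothetical helper names. The point is that the parameters are picked on the first 70% of the data and then checked on the remaining 30%.

```python
import numpy as np

def crossover_returns(prices, short, long):
    """Daily strategy returns: long when the short SMA is above the long SMA, flat otherwise.

    Requires short < long. The signal at the close of day d earns the return from d to d+1.
    """
    prices = np.asarray(prices, dtype=float)
    daily = np.diff(prices) / prices[:-1]                  # daily[d] = return from day d to d+1

    def sma(w):
        return np.convolve(prices, np.ones(w) / w, mode="valid")

    s = sma(short)[long - short:]                          # align both SMAs to end on the same day
    l = sma(long)
    signal = (s > l).astype(float)                         # known at the close of each day
    return signal[:-1] * daily[long - 1:]                  # no look-ahead: position applies to the next day

def total_return(prices, short, long):
    return float(np.prod(1 + crossover_returns(prices, short, long)) - 1)

# Synthetic prices, purely for illustration
rng = np.random.default_rng(1)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000)))

split = int(len(prices) * 0.7)                             # chronological split, no shuffling
train, test = prices[:split], prices[split:]

candidates = [(5, 10), (10, 20), (20, 50)]
best = max(candidates, key=lambda c: total_return(train, *c))   # parameters chosen on train only

print("chosen on train:", best)
print("train return:", total_return(train, *best))
print("test return :", total_return(test, *best))
```

If the test return is far worse than the train return, that is a hint the chosen windows were fitted to noise in the training period.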
Hope this helps.
I fully understand your point about why it would be necessary to split the data. However, I still don't understand why we used 252 as the sliding window here instead of 70% of the data. I think we could also split the data here and use 70% for training and 30% for testing, right?
Hi Daniel,
Here we use the latest 252 days of data (roughly one trading year) to fit the ARCH model and forecast the volatility of the next day.
You can split the data and check if there is any difference in the performance of the model.
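One way to run that check is sketched below, under the same caveats as any toy example: ARCH(1) only, zero-mean synthetic returns, a least-squares fit instead of maximum likelihood, and squared returns used as a noisy proxy for the true variance. `fit_arch1_ols` is a hypothetical helper, not a library function. Both approaches are scored on the same last 30% of the data.

```python
import numpy as np

def fit_arch1_ols(returns):
    """ARCH(1) parameters (omega, alpha) from a least-squares fit of r_t^2 on r_{t-1}^2."""
    sq = np.asarray(returns, dtype=float) ** 2
    X = np.column_stack([np.ones(len(sq) - 1), sq[:-1]])
    omega, alpha = np.linalg.lstsq(X, sq[1:], rcond=None)[0]
    # Clip so the implied variance stays positive
    return float(max(omega, 1e-12)), float(min(max(alpha, 0.0), 0.999))

# Synthetic ARCH(1) returns, purely for illustration
rng = np.random.default_rng(2)
n = 1000
r = np.zeros(n)
for t in range(1, n):
    r[t] = np.sqrt(1e-4 + 0.3 * r[t - 1] ** 2) * rng.standard_normal()

split = int(n * 0.7)
window = 252

# Approach 1: fit once on the first 70%, keep the parameters fixed over the test period
omega_f, alpha_f = fit_arch1_ols(r[:split])
fixed = omega_f + alpha_f * r[split - 1:-1] ** 2           # forecasts for days split .. n-1

# Approach 2: refit every day on the latest 252 returns
rolling = []
for t in range(split, n):
    omega, alpha = fit_arch1_ols(r[t - window:t])
    rolling.append(omega + alpha * r[t - 1] ** 2)
rolling = np.array(rolling)

realised = r[split:] ** 2                                  # noisy proxy for the true variance
mse_fixed = float(np.mean((fixed - realised) ** 2))
mse_rolling = float(np.mean((rolling - realised) ** 2))
print("MSE, single 70/30 fit :", mse_fixed)
print("MSE, 252-day rolling  :", mse_rolling)
```

Which approach scores better will depend on the data. On real prices the rolling fit can adapt when the volatility regime changes, while the single fit uses more observations per estimate.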