Out-of-sample and in-sample

 

I'm looking at a lot of backtests online, and I've noticed that they look nothing like the code on your GitHub. They split the data and then apply the same method twice, once on each part, similar to training and testing. How would I run this kind of code on Zipline or in live trading?



Usually, you use the in-sample data to choose the hyperparameters and the out-of-sample data to see how those parameters perform.



Should I create two Python files, one for training and testing and one for deployment?

I imagine that after training and testing, I would use the same parameters in the live version. I don't think I would run the method twice again, just once to make a prediction: it would use all the data available, and the last record (row) would then determine whether to buy, sell, or hold.

 

Am I missing anything, or did I get anything wrong?

 

Hello Emma,



Yes, you have a decent understanding of the process, but let's take an example to clarify a few things.

For the sake of illustration, let's say you are creating a moving average strategy. If the previous day's close price is above the simple moving average, you buy. Otherwise, you sell or stay out of the market (no position).
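The rule above can be sketched in plain Python as follows. This is a minimal illustration, not the code from the GitHub repository mentioned in the question: `positions_from_rule` and `daily_strategy_returns` are hypothetical names, prices are assumed to be a plain list of daily closes (oldest first), and "sell" is modelled as going flat rather than short.

```python
def positions_from_rule(closes, lookback=10):
    """Position held on each day t: 1 (long) if the previous day's close was
    above the simple moving average of the `lookback` closes ending that day,
    otherwise 0 (flat). Days without enough history stay flat."""
    positions = [0] * len(closes)
    for t in range(lookback, len(closes)):
        window = closes[t - lookback:t]  # closes up to and including day t-1
        sma = sum(window) / lookback
        positions[t] = 1 if closes[t - 1] > sma else 0
    return positions


def daily_strategy_returns(closes, lookback=10):
    """Return earned on day t: position held on day t times the asset's return from t-1 to t."""
    positions = positions_from_rule(closes, lookback)
    return [positions[t] * (closes[t] / closes[t - 1] - 1) for t in range(1, len(closes))]


# Tiny hand-checkable example with a 2-day average instead of 10.
closes = [10, 11, 12, 11, 13]
print(positions_from_rule(closes, lookback=2))  # → [0, 0, 1, 1, 0]
```

Because the position for day t only looks at closes up to day t-1, the backtest never uses the price it is about to trade on.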



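But what should the moving average lookback be? Say you set it to 10 days; you then test it, that is, backtest with that value on the in-sample data and see how it holds up on the out-of-sample data.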
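One way to make "test it" concrete, in the in-sample/out-of-sample spirit of the question, is to try a few lookbacks on the first part of the history, keep the best one, and then check it once on the untouched remainder. The sketch below assumes a synthetic price series, a 70/30 split, total compounded return as the selection metric, and the candidate list `[5, 10, 20, 50]`; all of these are illustrative choices, not recommendations.

```python
import math


def sma_strategy_return(closes, lookback):
    """Compounded return of being long on day t whenever close[t-1] is above
    the SMA of the `lookback` closes ending at t-1 (flat otherwise)."""
    growth = 1.0
    for t in range(lookback, len(closes)):
        window = closes[t - lookback:t]
        if closes[t - 1] > sum(window) / lookback:
            growth *= closes[t] / closes[t - 1]
    return growth - 1.0


# Synthetic prices purely for illustration: an upward drift plus a cycle.
closes = [100 + 0.1 * t + 5 * math.sin(t / 8) for t in range(400)]

split = int(len(closes) * 0.7)  # first 70% in-sample, last 30% out-of-sample
in_sample = closes[:split]
candidates = [5, 10, 20, 50]

# 1) Choose the lookback using in-sample data only.
best = max(candidates, key=lambda n: sma_strategy_return(in_sample, n))

# 2) Evaluate that single choice out-of-sample. The last `best` in-sample closes
#    are prepended only as warm-up for the first averages; trading starts at `split`.
out_of_sample_with_warmup = closes[split - best:]
oos_return = sma_strategy_return(out_of_sample_with_warmup, best)

print("chosen lookback:", best)
print("in-sample return:", round(sma_strategy_return(in_sample, best), 4))
print("out-of-sample return:", round(oos_return, 4))
```

The out-of-sample number is computed once, after the choice is fixed; going back to tweak the lookback based on it would turn the out-of-sample data into more in-sample data.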

Once you have finalised the parameter, you live trade the strategy with this moving average (or paper trade it first, which helps you build confidence).
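On the question of two Python files: one common arrangement (an assumption here, not something this answer prescribes) is to let the research script write the finalised parameters to a small file and have the live or paper-trading script read them, so the live code never re-runs the optimisation. The file name and the `save_params`/`load_params` helpers below are hypothetical; the sketch writes to the system temp directory only so that it runs anywhere.

```python
import json
import os
import tempfile

# Hypothetical location; in practice both scripts would agree on a fixed path.
PARAMS_PATH = os.path.join(tempfile.gettempdir(), "sma_strategy_params.json")


def save_params(params, path=PARAMS_PATH):
    """Research script: store the parameters finalised on historical data."""
    with open(path, "w") as f:
        json.dump(params, f)


def load_params(path=PARAMS_PATH):
    """Live/paper-trading script: read the frozen parameters (no re-optimisation here)."""
    with open(path) as f:
        return json.load(f)


# The research script would end with something like:
save_params({"lookback": 10})

# The live script would start with something like:
params = load_params()
print(params["lookback"])  # → 10
```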



The code might look different because we use vectorised code for backtesting: it processes the entire price history at once, which saves computation time. In live trading, we use event-driven code, which reacts to each new bar as it arrives.
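To see why the two styles look different but should agree, here is a standard-library sketch. `batch_signals` plays the role of the backtest: it takes the whole history at once and returns every signal (in pandas this would typically be written with `Series.rolling(n).mean()`; plain Python is used here to avoid dependencies, so it is not truly vectorised). `SmaEventStrategy.on_bar` plays the role of live code: it is called once per new bar and keeps only the last `n` closes. Both names are hypothetical.

```python
from collections import deque


def batch_signals(closes, n):
    """Backtest style: one call over the full history, one signal per bar.
    Signal i is 1 if close[i] is above the n-bar SMA ending at bar i, else 0."""
    signals = []
    for i, close in enumerate(closes):
        if i + 1 < n:
            signals.append(0)  # not enough bars yet
        else:
            window = closes[i + 1 - n:i + 1]
            signals.append(1 if close > sum(window) / n else 0)
    return signals


class SmaEventStrategy:
    """Live style: receives one bar at a time and remembers only the last n closes."""

    def __init__(self, n):
        self.n = n
        self.window = deque(maxlen=n)  # older closes drop out automatically

    def on_bar(self, close):
        self.window.append(close)
        if len(self.window) < self.n:
            return 0
        return 1 if close > sum(self.window) / self.n else 0


closes = [100, 101, 99, 102, 104, 103, 105, 101, 100, 106]
strategy = SmaEventStrategy(3)
live_signals = [strategy.on_bar(c) for c in closes]
print(batch_signals(closes, 3) == live_signals)  # → True
```

The batch form is convenient for testing many parameter values quickly, while the event form maps naturally onto a live loop or a Zipline `handle_data` callback.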



You are right, in a way, that you will use the last row, but depending on the code, you may need more data as well. In the moving average example, you would also retrieve the last 10 days of data so that you can compute the average.
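A minimal sketch of that point, assuming daily closes ordered oldest first with the last element being the previous day's close: the decision needs exactly `lookback` rows, and passing the full history or only its last 10 rows gives the same answer. `signal_from_recent` is a hypothetical helper.

```python
def signal_from_recent(recent_closes, lookback=10):
    """Return 1 (be long) if the latest close is above the SMA of the last
    `lookback` closes, else 0 (be flat). Only the tail of the list is used."""
    if len(recent_closes) < lookback:
        raise ValueError(f"need at least {lookback} closes, got {len(recent_closes)}")
    window = recent_closes[-lookback:]
    return 1 if window[-1] > sum(window) / lookback else 0


# A year of made-up daily closes versus just the last 10 of them.
full_history = [100 + (i % 7) - 0.05 * i for i in range(250)]
print(signal_from_recent(full_history) == signal_from_recent(full_history[-10:]))  # → True

try:
    signal_from_recent(full_history[-5:])  # too little data for a 10-day average
except ValueError as err:
    print(err)
```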



Hope this helps.



 

So would you run the code with in-sample and out-of-sample splits every time it is rebalanced, or just on one big DataFrame up to the current day?

Hello Emma,



Usually, once you have optimised the parameters, you only retrieve the data points that are required. For example, if your moving average lookback is 10 and you run the strategy daily, you retrieve only the last 10 days of data, calculate the moving average, check the entry condition, and trade accordingly.
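The daily routine described above (fetch, compute, check, trade) might be sketched like this. `fetch_closes` and `place_order` are hypothetical callables standing in for your data source and broker API; in Zipline the equivalents would typically be `data.history(...)` with `bar_count=10` inside a scheduled function, and an order function such as `order_target_percent`. The sketch also turns the signal into buy, sell, or hold by comparing it with the current position.

```python
def daily_action(current_position, recent_closes, lookback=10):
    """Map the SMA signal and the current holding (1 = long, 0 = flat) to an action."""
    if len(recent_closes) < lookback:
        raise ValueError(f"need at least {lookback} closes, got {len(recent_closes)}")
    window = recent_closes[-lookback:]
    target = 1 if window[-1] > sum(window) / lookback else 0
    if target == 1 and current_position == 0:
        return "buy"
    if target == 0 and current_position == 1:
        return "sell"
    return "hold"


def run_once(fetch_closes, place_order, current_position, lookback=10):
    """One scheduled run: fetch only `lookback` bars, decide, send an order if needed."""
    closes = fetch_closes(lookback)  # e.g. the last 10 daily closes, oldest first
    action = daily_action(current_position, closes, lookback)
    if action != "hold":
        place_order(action)
    return action


# Stand-in for a real data feed, for illustration only.
def rising_feed(n):
    return [float(x) for x in range(1, n + 1)]


orders = []
print(run_once(rising_feed, orders.append, current_position=0))  # → buy
print(run_once(rising_feed, orders.append, current_position=1))  # → hold
print(orders)  # → ['buy']
```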



So no, you would not redo the in-sample/out-of-sample split at every rebalance. Only when you judge that the strategy's performance is declining would you go back and try to optimise the parameters again.
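How to judge that performance is declining is left open here. One simple, admittedly crude, rule is to compare the average daily strategy return over a recent window with a threshold set in advance, for example one derived from the out-of-sample results. The 60-day window and zero threshold below are arbitrary placeholders, and `needs_reoptimisation` is a hypothetical helper.

```python
def needs_reoptimisation(daily_returns, window=60, threshold=0.0):
    """Return True when the mean of the last `window` daily strategy returns
    falls below `threshold`. Returns False while there is too little history."""
    if len(daily_returns) < window:
        return False
    recent = daily_returns[-window:]
    return sum(recent) / window < threshold


good_period = [0.001] * 60
bad_period = [-0.002] * 60
print(needs_reoptimisation(good_period))               # → False
print(needs_reoptimisation(good_period + bad_period))  # → True
```

Drawdown or hit rate could be monitored in the same way.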



I hope this helps.