Trading Alphas: Mining, Optimisation, and System Design - Section 25, Unit 1

Intuitively, most of us would probably create two backtests, one with in-sample data and one with out-of-sample data, separately. However, this is in fact very inefficient. It is better to generate all the data first and split the set afterwards; then we can apply our metric of choice to the full data set.



Can you rephrase this? I'm finding it hard to comprehend. It sounds like the same thing.

Hi Jane,



Here's a rephrased version of the explanation:

Creating separate backtests with in-sample and out-of-sample data may seem intuitive, but it is actually inefficient. A better approach is to generate all the data first and then split it into in-sample and out-of-sample sets. This way, we can apply our chosen metric to the full data set, which allows for a more efficient and comprehensive analysis.
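
To make the difference concrete, here is a minimal sketch of the "generate once, split afterwards" idea. The dates and the random returns below are placeholders for illustration only, not course code:

import numpy as np
import pandas as pd

# Hypothetical daily strategy returns over the full history (illustration only)
idx = pd.date_range("2015-01-01", "2020-12-31", freq="B")
rets = pd.Series(np.random.normal(0, 0.01, len(idx)), index=idx)

# Inefficient: run the whole backtest pipeline twice, once per period.
# Efficient: compute once over the full history, then slice by date.
is_rets = rets.loc[:"2018-12-31"]   # in-sample slice
oos_rets = rets.loc["2019-01-01":]  # out-of-sample slice

# Apply the metric of choice (here an annualised Sharpe ratio) to any slice
sharpe = lambda r: np.sqrt(252) * r.mean() / r.std()
print(sharpe(is_rets), sharpe(oos_rets), sharpe(rets))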



Hope this resolves your doubts.



Thanks,

Rushda Ansari

Do you have a peer-reviewed reference or additional or supplementary reading on this "better approach"?

It would seem that this is very unusual, or at least not something I've seen before. Please don't feel offended by my question. It could be that this is a very new or advanced way to do this, and for that I'm curious and thankful.

Hi Jane,



In this particular case, since we are working with daily trading signals and calculating results for the entire parameter space, splitting the sets after calculation is trivial.

However, the case would be different if we used path-dependent strategies and/or alphas that extend across multiple trading periods. The micro-alpha approach avoids these in order to achieve better out-of-sample robustness.
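
As a rough sketch of what "the entire parameter space" means here (the lookback pairs, dates, and random numbers are placeholders, not the course's actual results): with daily signals, you can hold the returns of every parameter combination in one DataFrame and slice it by date afterwards.

import numpy as np
import pandas as pd

# Hypothetical: one column of daily returns per (i, j) lookback pair
idx = pd.date_range("2015-01-01", "2020-12-31", freq="B")
combos = [(2, 10), (2, 20), (5, 30)]
results = pd.DataFrame(np.random.normal(0, 0.01, (len(idx), len(combos))),
                       index=idx, columns=pd.MultiIndex.from_tuples(combos))

# Splitting after the calculation is a single slice per period
is_results = results.loc[:"2018-12-31"]
oos_results = results.loc["2019-01-01":]

# Apply the metric of choice to every parameter combination in each period
sharpe = lambda df: np.sqrt(252) * df.mean() / df.std()
print(sharpe(is_results))
print(sharpe(oos_results))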



Thanks,

Rushda

By chance, do you have a paper or peer-reviewed article on this? It truly would be amazing if you did. Once again, thanks for going above and beyond in helping me along this journey.

Since this is a simple data split, there wouldn't be any peer-reviewed articles on it. Such articles discuss new findings in the field that have relevance for academic discussion; a simple data split does not constitute such a new finding.

I'm trying to find an article on how you create alphas by doing a grid search, then group these alphas' backtest performances, and then use the least correlated parameters.



Look here at your blog:

https://blog.quantinsti.com/kalman-filter-techniques-statistical-arbitrage-china-futures-market-python/



Or here:

https://blog.quantinsti.com/statistical-arbitrage-pair-trading-brazil-stock-market-project-luiz-guedes/



You see, in the second blog he uses a grid search, but I don't see him apply this way of generating alphas and then backtesting these alphas.




# Assumes ma(i, j) returns the signal series for a moving-average crossover
# and backtest() is the backtest function defined earlier in the course notebook.
from tqdm import tqdm

lkbks = []  # (first, second) lookback pairs
sigs = []   # corresponding signal series

# Loop for first lookback. This could take some time; tqdm shows a progress bar.
for i in tqdm(range(2, 50, 1)):

    # Loop for second lookback (needs to be larger than the first, hence starting at i)
    for j in range(i, 60, 1):

        # Make sure the lookbacks are different
        if i != j:

            # Append lookbacks and signals to the lists
            lkbks.append((i, j))
            sigs.append(ma(i, j))

# Run the backtest for the signals using the backtest function from above
bt = backtest(sigs)





This is very different from the many pair trading systems I see on your website.

 

Hi Jane,



The authors of the blogs/projects and the course are not the same. Different authors can explain the same concept using different techniques, hence the variation.
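
If it helps, the selection step you describe (picking the least correlated parameter sets from the grid-search results) can be sketched roughly as below. This is a greedy illustration of the idea, not the course's exact code; the column names and random returns are hypothetical stand-ins for the grid-search backtest results.

import numpy as np
import pandas as pd

def pick_least_correlated(rets: pd.DataFrame, n: int) -> list:
    """Greedily pick n columns whose return streams are least correlated.

    rets holds one column of daily backtest returns per parameter combination.
    """
    corr = rets.corr().abs()
    # Start from the column with the lowest average correlation to the rest
    chosen = [corr.mean().idxmin()]
    while len(chosen) < n:
        remaining = corr.columns.difference(chosen)
        # Add the candidate whose worst-case correlation to the chosen set is smallest
        next_col = corr.loc[remaining, chosen].max(axis=1).idxmin()
        chosen.append(next_col)
    return chosen

# Hypothetical usage with random data standing in for grid-search backtests
idx = pd.date_range("2018-01-01", periods=500, freq="B")
cols = [f"lkbk_{i}_{j}" for i, j in [(2, 10), (2, 20), (5, 30), (10, 40)]]
rets = pd.DataFrame(np.random.normal(0, 0.01, (len(idx), len(cols))),
                    index=idx, columns=cols)
print(pick_least_correlated(rets, 2))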