Hi all,
I've started reviewing the "Short Selling in Trading" course. The signals are based on "swings", which are computed using all the data. A strategy is then built on these signals without accounting for the lookahead information. It seems to me that there is a lookahead bias in how this strategy is evaluated. I am not sure whether I am missing something here.
Any clarification is appreciated.
Thank you.
Ciao Francesco,
This is a valid concern. Thank you very much. I should have elaborated more on this and I am sorry for the confusion.
We use the function argrelextrema to calculate swings. We use a window of 20 periods (feel free to experiment with shorter windows). argrelextrema does not reset automatically when there is a higher high or lower low. This process has to be done manually as in the function swings. As a result, when testing regime_fc on its own, we use a lag of a similar duration. We could use a shorter window of half to a third of that duration.
On the other hand, moving averages and breakouts are delayed by one day. Signal happens on bar [n]. Trade happens on [n+1], standard stuff.
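As a rough illustration of the one-bar delay (a sketch, not the course code): the helper name ma_cross_position and the parameters fast_window and slow_window are assumptions made for this example. The signal is computed at the close of bar [n] and only acted on at bar [n+1].

```python
import pandas as pd

def ma_cross_position(close: pd.Series, fast_window: int, slow_window: int) -> pd.Series:
    """Position held on each bar, using only the crossover signal of the previous bar."""
    fast = close.rolling(fast_window).mean()
    slow = close.rolling(slow_window).mean()
    # Signal at the close of bar n: +1 if fast is above slow, -1 if below, 0 otherwise
    signal = (fast > slow).astype(int) - (fast < slow).astype(int)
    # Trade on bar n+1: shift the signal forward by one bar
    return signal.shift(1).fillna(0).astype(int)

close = pd.Series([1, 2, 3, 4, 5, 4, 3, 2, 1], dtype=float)
position = ma_cross_position(close, fast_window=2, slow_window=3)
print(position.tolist())
```

The crossover first appears on bar 2 in this toy series, and the position only changes on bar 3.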
When using regime_fc in combination with moving average crossover, 2 conditions have to be met:
- regime reversal, function of swings
- moving average crossover
Swings are usually registered 1 to 2 periods after they occur. That is just how argrelextrema works. Unless extremely short durations are used, swing discovery will systematically precede moving average crossover.
The necessary lag is therefore a function of the slowest moving component, i.e. the moving average.
Moving average will confirm regime reversal. Trade can take place 1 day after moving average crossover signal.
I understand it can be a bit confusing at first glance, and may be interpreted as "peeking bias". I invite you to verify it for yourself to be absolutely sure. One way to do this is to run a for loop over a small data sample where a regime change occurs. Swing discovery happens first, which leads to regime reversal, which is then confirmed by the moving average crossover.
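A minimal sketch of such a loop, assuming a toy price list and order=2 instead of the course data and the 20-period window. The helper name first_detection_bar is hypothetical. The loop feeds argrelextrema one more bar at a time and records the bar on which each swing high is first flagged, so that it can be compared with the swings found on the full sample.

```python
import numpy as np
from scipy.signal import argrelextrema

def first_detection_bar(values, order):
    """Map each flagged swing-high index to the last bar available when it was first flagged."""
    values = np.asarray(values, dtype=float)
    first_seen = {}
    for end in range(2, len(values) + 1):
        window = values[:end]  # only the data known up to bar end - 1
        (peaks,) = argrelextrema(window, np.greater, order=order)
        for p in peaks:
            first_seen.setdefault(int(p), end - 1)
    return first_seen

prices = [1, 2, 3, 2, 4, 3, 2, 1]
first_seen = first_detection_bar(prices, order=2)
full_sample = argrelextrema(np.asarray(prices, dtype=float), np.greater, order=2)[0].tolist()
print(first_seen)   # swing index -> bar on which it was first flagged
print(full_sample)  # swings that survive on the full sample
```

In this toy series the high at bar 2 is flagged one bar later and then invalidated by the higher high at bar 4, which is the manual reset described above.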
Now, if you want to use regime on its own, you may want to play with the lag window. 20 periods lag is roughly a month. People rarely wait for an entire month for confirmation. Somewhere between 7 to 10 days should be fine, but please bear in mind that the number of false positives will rise as you shorten the lag. This will adversely affect your gain expectancy.
I hope this clarifies the matter. Once again, that was a very valid concern. I should have elaborated more in the course. Thank you very much for picking this up!
Thanks Laurent,
Could you please provide Python code for the strategy without any lookahead?
Thanks again,
Laurent,
Still have a question on lookahead bias and some parts of your comment.
“Unless extremely short durations are used, swing discovery will systematically precede moving average crossover. The necessary lag is therefore a function of the slowest moving component, i.e. moving average. Moving average will confirm regime reversal. Trade can take place 1 day after moving average crossover signal”
For the backtesting, a 20-day window is used for the argrelextrema function. Does that mean that, on average, the moving average crossover signal lags swing discovery by 20 days (or, as you mentioned, 7-10 days should be OK)? And does this effectively prevent lookahead bias?
“Swings are usually registered 1 to 2 periods after they occur. That is just how argrelextrema works.”
Are you referring to the argrelextrema behaviour where, if the latest point is the highest or lowest, it is not detected as a local extremum until lower or higher points are added at the end? I.e.
argrelextrema(np.array([1, 2, 3]), np.greater, order=2) will return (array([], dtype=int64),) whereas argrelextrema(np.array([1, 2, 3, 2]), np.greater, order=2) will return (array([2], dtype=int64),).
Thank you,
Alex
I believe that, since argrelextrema uses 20 data points forward and backward, a simple shift of the dataframe should help to avoid any kind of lookahead bias.
high_low['swing_high'] = high_low['swing_high'].shift(argrel_window)
high_low['swing_low'] = high_low['swing_low'].shift(argrel_window)
But as Laurent says, the moving average crossover is a lagging indicator, so it already absorbs some of the future data points. The challenge is to figure out exactly how many days it lags. I feel this is a very good discussion to be having on this forum.
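To make the shift concrete, here is a small sketch on toy data. The window argrel_window = 2 and the column name swing_high_lagged are assumptions chosen to keep the demo small. A swing high found on the full sample at bar t only becomes visible at bar t + argrel_window.

```python
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema

argrel_window = 2  # toy value for the demo, not the 20 periods discussed above
high_low = pd.DataFrame({"high": [1.0, 2.0, 5.0, 3.0, 2.0, 1.0, 2.0, 1.0]})

# Swing highs computed on the full sample (this step uses future bars)
idx = argrelextrema(high_low["high"].values, np.greater, order=argrel_window)[0]
high_low["swing_high"] = np.nan
high_low.loc[idx, "swing_high"] = high_low["high"].values[idx]

# Delay each swing by the window so it is only known once the future bars exist
high_low["swing_high_lagged"] = high_low["swing_high"].shift(argrel_window)
print(high_low)
```

Swings found within the last argrel_window bars are pushed past the end of the frame by the shift, so they drop out.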
Order sets the number of values surrounding the peak. In the first example order = 2 identifies the second peak. Order = 3 requires 3 values both left and right of the peak to be lower.
- Moving averages: since swings are identified one bar after the peak, moving averages are de facto slower. They act as a confirmation filter. Personally, I am not an advocate of moving averages, but I recognize their usefulness in this case
- Time: the longer the lag after a swing has been discovered, the more likely it will not be invalidated. Currently, order = 20 is robust enough for historical swings. It is however lagging too much for real life trading. 20 days is a month, an eternity for most traders. In practice, half that span would be enough to confirm the validity of a swing.
- Hybrid: Time + Distance: one way to reduce noise and shorten the lag is to incorporate distance. The farther price has traveled, the more likely a swing reflects an exhaustive move. Example: if price has traveled 2.5 stdev away from the previous swing, a swing is likely to indicate a reversal. Conversely, blips occurring at 0.5-2 stdev are more likely to be just noise.
This can be incorporated directly in the alternation loop with 1 simple line of code:
# removes noisy swings: distance test
hilo.loc[(hilo[s_hilo]*hilo[s_hilo].shift(1)<0)& # hi/lo succession
         (np.abs(hilo[s_hilo]+hilo[s_hilo].shift(1)).div(hilo['std'].values) < 2.5),s_hilo] = np.nan
Note that hilo['std'] has not been instantiated in the function.
This hybrid method time + distance is probably closer to the reality of trading. The farther price has traveled, the more valid the signals. Instead of waiting 10 days, you may elect to shorten the waiting period to 5 bars or less. Again, please do not take my word for it. Test everything.
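A self-contained sketch of the same distance test, under stated assumptions: the table holds one row per successive swing, highs are stored as positive values and lows as negative values (so that the product test detects a hi/lo succession), and a 'std' column already exists. The function name distance_filter and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def distance_filter(hilo: pd.DataFrame, s_hilo: str = "hilo",
                    std_col: str = "std", threshold: float = 2.5) -> pd.DataFrame:
    """Blank out swings that sit closer than threshold stdevs to the previous opposite swing."""
    out = hilo.copy()
    prev = out[s_hilo].shift(1)
    alternates = out[s_hilo] * prev < 0  # hi/lo succession (opposite signs)
    too_close = (out[s_hilo] + prev).abs().div(out[std_col]) < threshold
    out.loc[alternates & too_close, s_hilo] = np.nan
    return out

# Assumed toy swings: high 100, low 90, high 92, low 80, with a stdev of 4
hilo = pd.DataFrame({"hilo": [100.0, -90.0, 92.0, -80.0], "std": [4.0] * 4})
filtered = distance_filter(hilo)
print(filtered)
```

Here the high at 92 sits only 0.5 stdev above the previous low, so it is removed as noise, while the other swings are kept.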
When I first read this, I thought it described our trading problem: in real time we cannot trade the way we do on historical data, because the latest signal keeps shifting while history is settled. "This happens only at the last swing. For historical swings, there is enough data on both sides to identify meaningful swings without fail. This function is a compromise between historical and real time swings. It will work well for historical swings, but need some adjustments for real time trading."
However, after reading on, I see that Laurent Bernut gives us a better way to confirm the signal with high probability. For just $199, we get a great deal of knowledge from Laurent Bernut.
This means our backtest is the best result we can get. Reality comes at a discount.
I'll apply this to 1-minute bars as a test.
Thanks. Laurent Bernut
I've run into an issue: the stop loss sometimes changes, which causes trouble when I use it to calculate position size. I would appreciate some help here.
I tested on 1-minute bars. Here are some parameters:
(I compute the swings first, and then the MA cross)
signal_lag = 30
st = 90
mt = 120
argrel_window=60
t_dev=180
threshold=2
result:
                       close  floor  cross  signal  sl90120  eqty_risk
time
2020-05-25 08:51:00  9512.25    1.0   -1.0     0.0  9482.25   3.160750
2020-05-25 08:52:00  9512.75    1.0   -1.0     0.0  9482.25   3.108934
2020-05-25 08:53:00  9513.50    1.0   -1.0     0.0  9482.25   3.034320
2020-05-25 08:54:00  9516.25    1.0   -1.0     0.0  9482.25   2.788897
2020-05-25 08:55:00  9517.75    1.0   -1.0     0.0  9482.25   2.671056
2020-05-25 08:56:00  9517.00    1.0   -1.0     0.0  9482.25   2.728705
2020-05-25 08:57:00  9517.75    1.0   -1.0     0.0  9482.25   2.671056
2020-05-25 08:58:00  9516.75    1.0   -1.0     0.0  9482.25   2.748478
2020-05-25 08:59:00  9516.25    1.0   -1.0     0.0  9482.25   2.788897
2020-05-25 09:00:00  9516.00    1.0    1.0     0.0  9520.50 -21.156667

                       close  floor  cross  signal  sl90120   eqty_risk
time
2020-05-25 08:52:00  9512.75    1.0   -1.0     0.0   9517.5  -20.036842
2020-05-25 08:53:00  9513.50    1.0   -1.0     0.0   9517.5  -23.793750
2020-05-25 08:54:00  9516.25    1.0   -1.0     0.0   9517.5  -76.140000
2020-05-25 08:55:00  9517.75    1.0   -1.0     0.0   9517.5  380.700000
2020-05-25 08:56:00  9517.00    1.0   -1.0     0.0   9517.5 -190.350000
2020-05-25 08:57:00  9517.75    1.0   -1.0     0.0   9517.5  380.700000
2020-05-25 08:58:00  9516.75    1.0   -1.0     0.0   9517.5 -126.900000
2020-05-25 08:59:00  9516.25    1.0   -1.0     0.0   9517.5  -76.140000
2020-05-25 09:00:00  9516.00    1.0    1.0     0.0   9517.5  -63.450000
2020-05-25 09:01:00  9520.50    1.0    1.0     0.0   9517.5   31.725000

20200525_143630 -> calc...
--------------------
                       close  floor  cross  signal  sl90120  eqty_risk
time
2020-05-25 08:53:00  9513.50    1.0   -1.0     0.0  9482.25   3.034320
2020-05-25 08:54:00  9516.25    1.0   -1.0     0.0  9482.25   2.788897
2020-05-25 08:55:00  9517.75    1.0   -1.0     0.0  9482.25   2.671056
2020-05-25 08:56:00  9517.00    1.0   -1.0     0.0  9482.25   2.728705
2020-05-25 08:57:00  9517.75    1.0   -1.0     0.0  9482.25   2.671056
2020-05-25 08:58:00  9516.75    1.0   -1.0     0.0  9482.25   2.748478
2020-05-25 08:59:00  9516.25    1.0   -1.0     0.0  9482.25   2.788897
2020-05-25 09:00:00  9516.00    1.0    1.0     1.0  9482.25   2.809556
2020-05-25 09:01:00  9520.50    1.0    1.0     1.0  9482.25   2.479020
2020-05-25 09:02:00  9518.25    1.0    1.0     1.0  9482.25   2.633958
- Adaptive range: When price rebounds off a low and prints a retest high, it may trade sideways before finally crossing that point. Adaptive range narrows the range for the test to either the first or later retest. You may find this feature useful in intraday trading where some orders have market impact.
- Distance test: Retest has no statistical validity per se. Retests happen in narrow ranges. When price has traveled some distance however, retest may indicate trend exhaustion. Distance test is a measure of sensitivity. It is all a trade-off between distance and hit rate. Try with 2 or 3 std.
I use a variation of swings_fp in my own trading. It is statistically accurate enough. For example, it timed the March 2020 low on March 25th. Bear in mind however that this is not an exact science. Sometimes the market pushes through.
I hope you will find the lag problem solved.
Kind regards,
Laurent Bernut
Thanks, Laurent for adding these two functions.
The code and explanation for these two new functions and implementation can be accessed from here.
Hi Ishan Shah:
Thanks for uploading this so quickly.
I've downloaded the file. I'm going to study it in depth.
Hi Laurent:
Thanks very much. It's awesome. Today I broke down the code to solve the real-time trading problem (printing everything out to compare). The reply came just in time.
One thing I found is that argrelextrema cannot use the last value in its calculation. A simple fix:
argrelextrema( mode='wrap')
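For reference, a small sketch of what the mode parameter changes, using the three-point array from earlier in the thread. With the default mode='clip', indices past the end are clipped to the last element, so the last bar is compared with itself and is never flagged. With mode='wrap', indices past the end wrap around to the start of the array, so the last bar is compared with the first bars of the sample. Whether that comparison is meaningful for a price series is worth testing.

```python
import numpy as np
from scipy.signal import argrelextrema

prices = np.array([1, 2, 3])

# Default mode='clip': the last value is compared with itself and is not flagged
clip_peaks = argrelextrema(prices, np.greater, order=2)[0]

# mode='wrap': the last value is compared with the values at the start of the array
wrap_peaks = argrelextrema(prices, np.greater, order=2, mode="wrap")[0]

print(clip_peaks.tolist(), wrap_peaks.tolist())
```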
Thanks a lot.
Woody
Hi Laurent Bernut:
I've worked through the concepts and the code, but I think there is still some distance to reality. Could I get your email, please? I have also changed some of the code and would like to discuss it. Just send me a message and I will reply: aprilsnowyou@gmail.com
Thanks
Woody
Hello Laurent.
Unfortunately, both of your Legless Swing detection functions, swings_argrelan and swing_fp, are using argrelextrema and find_peaks. These functions introduce look-ahead bias, which causes them to produce significantly different results on real-time data compared to historical data. Do you have any thoughts on how to address this issue?
Please review and provide a fix.
Kind Regards, Nikolay
Hi Nikolay,
We have forwarded your query to the author of the course and will keep you updated on this
Hello Nikolay,
Thank You very much for your question. The latest version of the swing detection does not use find_peaks or argrelextrema. It is much faster and more accurate.
It calculates fractals at all levels. It seamlessly works across time frames. It can find the 1-minute bar that triggered the bear market avalanche on the 1-day bar.
I have sent a Jupyter notebook to Quantra with the latest version of the floor ceiling. This will take care of the problem immediately.
Besides, I will revise the code in the course. Thank You very much for pointing the issue BTW. I will also work on an update to the course with position sizing libraries and several changes.
Once again,
Thank you very much, Laurent, for your prompt response! It would be great to see the updates.
Dear Quantra, could we please get access to the updated notebooks mentioned by Laurent?
Kind Regards, Nikolay
Hey Nikolay!
The notebook has been shared with you over email
Hi Rushda,
Thank you!
Regards, Nikolay
Hi Laurent,
It seems like the fractal-based calculations might still have a look-ahead bias. For example, consider the following code:
def fractal(px, lvl):
    max_lvl = np.minimum(2, lvl)
    fractal = px[(px <= px.shift(-1)) & (px < px.shift(+1)) & (px <= px.shift(-max_lvl)) & (px < px.shift(+max_lvl))]
    return fractal
Am I missing something?
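A quick way to check this, as a sketch on a toy series (the values in px are made up, and max_lvl is cast to int here only to keep the shift argument a plain integer): run the function on the data known so far and on the full sample. Because px.shift(-1) and px.shift(-max_lvl) read later bars, a low at bar t can only be flagged once bar t + max_lvl exists.

```python
import numpy as np
import pandas as pd

def fractal(px, lvl):
    max_lvl = int(np.minimum(2, lvl))
    return px[(px <= px.shift(-1)) & (px < px.shift(+1))
              & (px <= px.shift(-max_lvl)) & (px < px.shift(+max_lvl))]

px = pd.Series([5.0, 4.0, 3.0, 4.0, 5.0, 6.0])

full = fractal(px, 2)              # full sample: low at bar 2 is flagged
known_at_3 = fractal(px.iloc[:4], 2)  # only bars 0..3 available: nothing flagged yet
known_at_4 = fractal(px.iloc[:5], 2)  # bars 0..4 available: low at bar 2 appears

print(list(full.index), list(known_at_3.index), list(known_at_4.index))
```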
Regards,
Nikolay