Walk forward optimization with LSTM - Section 14

Hi,



Can you please explain how to store the "pnl" values in a DataFrame whose index matches the correct dates from the index of the "tech" DataFrame passed to the walk_forward function?

 

# Set the seed to reproduce the results
set_seed(13)

# Reset the index
tech.index = range(0, len(tech))

features = tech.pct_change()

# Store the results in pnl 
pnl = walk_forward(tech, 30, 30, 180, features.fillna(0))



# Plot the returns
plt.figure(figsize=[15, 7])
plot_returns_optimised(pnl, tech, c='b', aclass='', plot_title='Strategy Vs Benchmark Returns')

Since we re-index "tech" before running the walk_forward function, we lose the date information on the x-axis when running plot_returns_optimised. It would be great to find a way to re-match the index of "pnl" with the original index of "tech", so that the results can be compared with other strategies (e.g., the MVO walk-forward).

Thanks!

Luca  

Just to clarify my question above: I ask because, when running the function on different datasets, I noticed that, depending on the step and n_window chosen, the length of pnl differs from the length of the input df, so I can no longer tell which dates the returns belong to. This is probably due to the way the loop is built (i.e., while split+n_window <= df.index[-1]:). In fact, I ran both the walk-forward MVO and the walk-forward LSTM, and the two outputs have different lengths. In the first case the length matches that of "tech", while in the second it doesn't, and I can't find a way to reconcile the dates of the LSTM WFO strategy returns.
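One possible workaround, as a minimal sketch: save the original date index before resetting it, then use the integer positions in "pnl" to look the dates up again. The data below is synthetic, the ticker names are hypothetical, and the "pnl" Series is only a stand-in for the walk_forward output; it assumes the output is indexed by integer positions of "tech", which may not hold for every version of the function.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for "tech" (hypothetical tickers, business-day dates)
dates = pd.bdate_range("2020-01-01", periods=10)
tech = pd.DataFrame(
    {"AAA": np.arange(10.0) + 100, "BBB": np.arange(10.0) + 50},
    index=dates,
)

# Keep the dates BEFORE resetting the index
original_dates = tech.index
tech_reset = tech.reset_index(drop=True)

# Stand-in for the walk_forward output (assumption): a Series whose
# integer index refers to row positions of the re-indexed "tech"
pnl = pd.Series(np.linspace(0.01, 0.06, 6), index=range(4, 10))

# Drop repeated positions, if any, then map positions back to dates
pnl = pnl[~pnl.index.duplicated(keep="first")]
pnl_df = pnl.to_frame("pnl")
pnl_df.index = original_dates[pnl.index.to_numpy()]

print(pnl_df)
```

Because the lookup is positional, it works even when "pnl" is shorter than "tech", as long as the integer labels really are row positions of the re-indexed input.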



Thanks for your help!



Luca

Hi Luca,



Thank you for pointing this out. We will update the functions (walk_forward and plot_returns_optimised) so that the P&L and returns will have a datetime index. We will update you once the enhancement is implemented.



Thanks.

Hi Varun,



I spotted another issue in your code for the LSTM walk-forward optimization notebook.



I ran your code locally with the exact same data you provided (us_tech_dec_31_2009_dec_31_2020.csv), because I could not understand why I got so many observations in the variable "pnl".



Given a lookback of 30 days and a first in-sample optimization period of 180 days, I would have expected the length of "pnl" to equal the length of "tech" minus the first two n_window periods (since the first 180 days can't be used by the model because it needs a minimum of 30 days of lookback, and the second 180 days are used for the first in-sample optimization).



Indeed, if you print "pnl", its index starts at position 360. I therefore exported the "pnl" series to Excel to analyse it further, and I realised that the index has duplicated values every time the window moves by n_window (e.g., at rows 540, 720, etc.).



Can you please explain why this is the case? I suspect it might be because of the following lines of code in the walk_forward function:

 

        # Collect the OOS results
        oos = rets.loc[split:split+n_window].mul(asset_weights).sum(axis=1)
        all_rets.append(oos)
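For reference, a small sketch of why label-based slicing can produce duplicated index values. With .loc, a slice includes both endpoints, so consecutive windows share their boundary row. The loop below is a simplified stand-in (not the course's walk_forward), and it assumes that split advances by n_window on each iteration.

```python
import pandas as pd

# Toy returns with integer labels 0..9, as after re-indexing
rets = pd.Series(range(10), index=range(10), dtype=float)
n_window = 3

chunks = []
split = 0
while split + n_window <= rets.index[-1]:
    # .loc label slices include BOTH endpoints: n_window + 1 rows each
    chunks.append(rets.loc[split:split + n_window])
    split += n_window  # assumption: the window moves by n_window

combined = pd.concat(chunks)

# Labels that appear more than once in the concatenated result
duplicated_labels = combined.index[combined.index.duplicated()].tolist()
print(duplicated_labels)  # [3, 6]
print(len(combined))      # 12, although rets has only 10 rows
```

The duplicates fall exactly on the window boundaries, which matches the pattern of repeated rows at 540, 720, etc. described above.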

The code should probably be tweaked to avoid any overlap every time the window shifts by 180 days (otherwise the last day of the in-sample window is also the first day of the out-of-sample window). Something like this:

 

        # Collect the OOS results
        oos = rets.loc[split+1:split+1+n_window].mul(asset_weights).sum(axis=1)
        all_rets.append(oos)

This way, we should avoid any overlap between the window used in the optimization and the subsequent window used for the OOS results.
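One caveat, hedged: with consecutive integer labels, rets.loc[split+1:split+1+n_window] still returns n_window + 1 rows, so if split advances by exactly n_window per iteration (an assumption about the loop), two consecutive OOS windows could still share one row. A possible alternative is positional slicing with .iloc, which is half-open. The sketch below uses toy data and a simplified loop, not the actual walk_forward code, and it only illustrates how the OOS chunks are collected.

```python
import pandas as pd

# Toy returns with integer labels 0..9
rets = pd.Series(range(10), index=range(10), dtype=float)
n_window = 3

chunks = []
split = 0
while split + n_window <= len(rets):
    # .iloc slices exclude the end position: exactly n_window rows each
    chunks.append(rets.iloc[split:split + n_window])
    split += n_window  # assumption: the window moves by n_window

combined = pd.concat(chunks)

print(combined.index.is_unique)  # True
print(len(combined))             # 9
```

With half-open slices, each row belongs to at most one window, so the concatenated result has no repeated index values.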



Again, as I mentioned in my previous comments, it would also be ideal to be able to reconcile the index of "pnl" with the dates (please let me know as soon as this is ready in the course).



Kind regards,



Luca 

Hello Luca,



Thank you for your follow-up. We will get back to you on this.

 

Hi Rekhit, any news about this?

Hello Luca,



The functions 'walk_forward' and 'plot_returns_optimised' have been updated, and the lookahead bias has been taken care of. The 'pnl' dataframe now has dates as its index, so you can plot the P&L against the dates. Please have a look at the notebook in section 14 (link) and let us know if you have any queries on this.