Swing Trading Capstone Project Asset Data is not 1 minute data

Srinivas_Mv_Srinivas_Mv_6K067 · June 19, 2021, 3:20pm

Hi,

In the description for the project, the following is given:

"The minute frequency data for the ten tickers is stored in the capstone_data_2020_2021.bz2 pickle file."

However, when I write out the dataframe to a csv file (after reading the pickle file), it seems to me that the data is actually 15-minute data. I have pasted the first few lines of the csv file for 'BAC' symbol. What am I missing?

PS: BTW, the data in model solution also has the same issue…

Regards,

Srinivas

,Open,High,Low,Close,Volume
2019-01-02 09:30:00,24.07,24.09,24.01,24.06,1078894.0
2019-01-02 09:45:00,24.06,24.64,24.04,24.54,4169202.0
2019-01-02 10:00:00,24.53,24.58,24.41,24.53,2175107.0
2019-01-02 10:15:00,24.52,24.67,24.48,24.64,2953343.0
2019-01-02 10:30:00,24.64,24.83,24.63,24.76,3012152.0
2019-01-02 10:45:00,24.76,24.91,24.73,24.91,2656067.0
2019-01-02 11:00:00,24.9,25.0,24.88,24.93,3306992.0
2019-01-02 11:15:00,24.93,25.06,24.88,24.99,3081897.0
2019-01-02 11:30:00,24.99,25.02,24.9,24.93,2357494.0
2019-01-02 11:45:00,24.93,25.0,24.88,24.98,1855745.0
2019-01-02 12:00:00,24.97,24.98,24.84,24.84,1507738.0
2019-01-02 12:15:00,24.84,24.89,24.82,24.85,1797244.0
2019-01-02 12:30:00,24.85,24.86,24.78,24.83,1147466.0
2019-01-02 12:45:00,24.83,24.94,24.79,24.94,1442801.0
2019-01-02 13:00:00,24.93,24.97,24.9,24.94,946553.0
2019-01-02 13:15:00,24.94,25.03,24.94,25.01,1397477.0
2019-01-02 13:30:00,24.99,25.08,24.96,25.08,1336732.0
2019-01-02 13:45:00,25.06,25.09,25.02,25.06,1777980.0
2019-01-02 14:00:00,25.06,25.19,25.05,25.09,1918104.0
2019-01-02 14:15:00,25.1,25.11,24.63,24.99,1884455.0
2019-01-02 14:30:00,24.98,25.04,24.98,25.02,1373447.0
2019-01-02 14:45:00,25.02,25.08,24.97,25.0,1141071.0
2019-01-02 15:00:00,25.0,25.04,24.93,24.94,1265584.0
2019-01-02 15:15:00,24.95,24.98,24.63,24.9,1695508.0
2019-01-02 15:30:00,24.91,24.94,24.82,24.84,2276863.0
2019-01-02 15:45:00,24.84,24.95,24.82,24.91,2047844.0
2019-01-02 16:00:00,24.92,25.03,24.87,24.96,5664076.0
2019-01-04 09:30:00,25.09,25.23,25.02,25.17,1279558.0
2019-01-04 09:45:00,25.17,25.28,25.1,25.26,5750044.0
2019-01-04 10:00:00,25.26,25.31,25.21,25.25,5054882.0
2019-01-04 10:15:00,25.24,25.27,25.09,25.13,2649199.0
2019-01-04 10:30:00,25.12,25.34,25.05,25.29,8108064.0
2019-01-04 10:45:00,25.28,25.31,25.14,25.15,6874410.0
2019-01-04 11:00:00,25.16,25.23,25.07,25.24,3961354.0
2019-01-04 11:15:00,25.24,25.35,24.56,25.29,3396469.0
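
For anyone who wants to verify the bar size without eyeballing the CSV, here is a minimal sketch. It assumes the data has a datetime index, as in the rows above; the variable names are only illustrative and are not taken from the course notebook.

```python
import io
import pandas as pd

# First few BAC rows copied from the CSV above.
csv_text = """,Open,High,Low,Close,Volume
2019-01-02 09:30:00,24.07,24.09,24.01,24.06,1078894.0
2019-01-02 09:45:00,24.06,24.64,24.04,24.54,4169202.0
2019-01-02 10:00:00,24.53,24.58,24.41,24.53,2175107.0
2019-01-02 10:15:00,24.52,24.67,24.48,24.64,2953343.0
"""

bac = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Gap between consecutive timestamps; the most common gap is the bar size.
gaps = bac.index.to_series().diff().dropna()
bar_size = gaps.mode().iloc[0]
print(bar_size)  # 0 days 00:15:00
```

Using the most common gap rather than the first one keeps overnight and weekend gaps from distorting the answer when the check is run on the full file.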

Gaurav_Singh_5JwXj · June 21, 2021, 6:35pm

Hi Srinivas,

Thanks for pointing this out. We have updated the capstone project zip files. The notebook will be changed very soon.

Thanks!

Srinivas_Mv_Srinivas_Mv_6K067 · June 22, 2021, 4:21am

Hi Gaurav,

Thanks for addressing this. Hope you will let me know once it is updated so that I can download it again.

Regards,

Srinivas

Gaurav_Singh_5JwXj · June 22, 2021, 4:45am

Hi Srinivas,

Thank you for understanding. The data has been uploaded and the NB has also been updated accordingly.

Let me know if you need any help!

Regards,

Gaurav

Srinivas_Mv_Srinivas_Mv_6K067 · July 8, 2021, 6:28am

Hi Gaurav,

Got back to this after a short break and downloaded the updated notebook again. The data now looks fine. However, I was wondering if the code in the solution template notebook is accurate. I am seeing issues in the implementation logic of the 'Data Sanity' section. Is this intentional, something for us to find? :)

Regards,

Srinivas

Gaurav_Singh_5JwXj · July 8, 2021, 7:36am

Hi Srinivas,

Thanks for the feedback, we have corrected a possible bug in that function. Please let us know if you have any more feedback/queries!

Regards,

Gaurav

Srinivas_Mv_Srinivas_Mv_6K067 · July 8, 2021, 8:06am

Hi Gaurav,

Thanks for the quick response. When can I download it?

Regards,

Srinivas

Srinivas_Mv_Srinivas_Mv_6K067 · July 8, 2021, 8:22am

Hi Gaurav,

I downloaded it anyway and got the updates. However, I think there is still a problem in the logic. I feel in the function, the following two lines:

to_delete = list(pd.to_datetime(datapoints_day.index))

return price_data[~(price_data.index.isin(to_delete))]

should be modified to:

to_delete = list(pd.to_datetime(datapoints_day.index).strftime('%Y-%m-%d'))

return price_data[~(pd.to_datetime(price_data.index).strftime('%Y-%m-%d').isin(to_delete))]
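
A small illustration of why the two versions can behave differently. It assumes that `datapoints_day` is indexed by one timestamp per day at midnight; the function body is not shown in this thread, so that is a guess, and the data below is made up.

```python
import pandas as pd

# Hypothetical hourly index: a short first day, then one bar on the next day.
idx = pd.to_datetime([
    "2018-01-02 12:00", "2018-01-02 13:00", "2018-01-02 14:00",
    "2018-01-03 10:00",
])
price_data = pd.DataFrame({"Close": [29.65, 29.70, 29.73, 29.80]}, index=idx)

# Assumed shape of datapoints_day.index: one midnight timestamp per flagged day.
flagged_days = pd.DatetimeIndex(["2018-01-02"])

# Timestamp comparison: no intraday bar equals 2018-01-02 00:00:00.
mask_exact = price_data.index.isin(list(flagged_days))

# Date-string comparison, as proposed above: every bar of that day matches.
to_delete = list(flagged_days.strftime("%Y-%m-%d"))
mask_by_day = price_data.index.strftime("%Y-%m-%d").isin(to_delete)

print(mask_exact.tolist())   # [False, False, False, False]
print(mask_by_day.tolist())  # [True, True, True, False]
```

Under that assumption, the timestamp comparison flags nothing, while the date-string comparison flags all three bars of the short day.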

Please confirm if this is correct and release a new version for the same if you agree.

Regards,

Srinivas

Gaurav_Singh_5JwXj · July 8, 2021, 9:17am

Hi Srinivas,

There is no binding reason to use the strftime function. The current code can handle the index comparison, as can be seen in the example below:

The dummy False values indicate the rows that are being dropped in the sample output.

Hope this helps!

Thanks,

Gaurav

Srinivas_Mv_Srinivas_Mv_6K067 · July 8, 2021, 10:04am

Hi Gaurav,

I think there is some disconnect. I modified the code to something like the following:

for asset, asset_data in resampled_asset_data.items():
    print(asset)
    print(asset_data.head(3))
    asset_data = asset_data[2:]
    print(asset_data.head(3))
    resampled_asset_data[asset] = sanity_check(asset_data, asset)
    print(resampled_asset_data[asset].head(3))

Here, I am dropping the first 2 rows for each ticker. That is, for the first day, only 5 hourly timestamps would be present. So, if the logic in the function works correctly, it should drop the first day, right? But as you can see in the output below, resampled_asset_data still contains the first day, even though it does not have the required number of timestamps, which is 7.

BAC
                      Open   High    Low  Close      Volume
2018-01-02 10:00:00  29.74  29.80  29.61  29.66  10626570.0
2018-01-02 11:00:00  29.66  29.75  29.64  29.73   9047821.0
2018-01-02 12:00:00  29.73  29.77  29.63  29.65   7166519.0
                      Open   High    Low  Close     Volume
2018-01-02 12:00:00  29.73  29.77  29.63  29.65  7166519.0
2018-01-02 13:00:00  29.67  29.74  29.65  29.70  3924344.0
2018-01-02 14:00:00  29.71  29.94  29.69  29.73  3333392.0
Removing 1 data points.
                      Open   High    Low  Close     Volume
2018-01-02 12:00:00  29.73  29.77  29.63  29.65  7166519.0
2018-01-02 13:00:00  29.67  29.74  29.65  29.70  3924344.0
2018-01-02 14:00:00  29.71  29.94  29.69  29.73  3333392.0

Adding the strftime() in the function as per my previous post overcomes this issue. Do you agree?
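
For reference, the test above can be reproduced on synthetic data with a day-level filter written from scratch. This is only a sketch and not the course's `sanity_check`: the function name `drop_incomplete_days`, the `expected_bars` parameter, and the use of `normalize()` in place of `strftime()` are all assumptions made for illustration.

```python
import pandas as pd

def drop_incomplete_days(price_data, expected_bars=7):
    """Keep only the days that have exactly `expected_bars` rows."""
    # normalize() moves every timestamp to midnight, giving one key per day.
    day = price_data.index.normalize()
    # Count the non-NaN Close values per day, broadcast back to every row.
    bars_per_day = price_data.groupby(day)["Close"].transform("count")
    return price_data[bars_per_day == expected_bars]

# Two synthetic days of hourly bars, 10:00 to 16:00 (7 bars each).
hours = pd.date_range("2018-01-02 10:00", periods=7, freq="60min")
idx = hours.append(hours + pd.Timedelta(days=1))
data = pd.DataFrame({"Close": [float(i) for i in range(14)]}, index=idx)

trimmed = data[2:]                       # first day now has only 5 bars
cleaned = drop_incomplete_days(trimmed)  # so the whole first day should go
print(cleaned.index[0])                  # 2018-01-03 10:00:00
```

Grouping on the normalized index compares whole days directly, so no string conversion is needed.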

Regards,

Srinivas

Gaurav_Singh_5JwXj · July 8, 2021, 7:36pm

Hi Srinivas,

Your reasoning is absolutely correct and after checking the code, the same has been updated in the capstone solution. Thanks for your valuable feedback!

Regards,

Gaurav Singh

Srinivas_Mv_Srinivas_Mv_6K067 · July 9, 2021, 4:59am

Thanks Gaurav.

One more issue in the Screener section:

# Apply the filtering criteria for each asset
for asset in tickers:
    # Fetch the current asset data
    asset_data = multi_asset_data[asset][:split]

Shouldn't you be using the data after the sanity check here? The above statement uses the raw 1-minute data.

BTW, once you change this to use the data after the sanity check, the screener filters out everything with the current thresholds. That is, there are no tickers left for the next stage of processing.

Regards,

Srinivas

Srinivas_Mv_Srinivas_Mv_6K067 · July 9, 2021, 2:46pm

Hi Gaurav,

The last step of the Performance analysis is giving errors. Can you please help?

Regards,

Srinivas

----

Start date             2019-03-22
End date               2021-02-11
Total months           33

                       Backtest
Annual return          2.2%
Cumulative returns     6.2%
Annual volatility      11.9%
Sharpe ratio           0.24
Calmar ratio           0.09
Stability              0.10
Max drawdown           -24.4%
Omega ratio            1.05
Sortino ratio          0.35
Skew                   0.07
Kurtosis               3.72
Tail ratio             0.98
Daily value at risk    -1.5%

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-299af769327e> in <module>
      3 """
      4 # Pass the daily return to pyfolio
----> 5 pf.create_simple_tear_sheet(
      6     trade_returns['Portfolio_Returns'].resample('1D').sum())
      7 

~/anaconda3/lib/python3.8/site-packages/pyfolio/plotting.py in call_w_context(*args, **kwargs)
     50         if set_context:
     51             with plotting_context(), axes_style():
---> 52                 return func(*args, **kwargs)
     53         else:
     54             return func(*args, **kwargs)

~/anaconda3/lib/python3.8/site-packages/pyfolio/tears.py in create_simple_tear_sheet(returns, positions, transactions, benchmark_rets, slippage, estimate_intraday, live_start_date, turnover_denom, header_rows)
    378     i += 1
    379 
--> 380     plotting.plot_rolling_returns(returns,
    381                                   factor_returns=benchmark_rets,
    382                                   live_start_date=live_start_date,

~/anaconda3/lib/python3.8/site-packages/pyfolio/plotting.py in plot_rolling_returns(returns, factor_returns, live_start_date, logy, cone_std, legend_loc, volatility_match, cone_function, ax, **kwargs)
    805         oos_cum_returns = pd.Series([])
    806 
--> 807     is_cum_returns.plot(lw=3, color='forestgreen', alpha=0.6,
    808                         label='Backtest', ax=ax, **kwargs)
    809 

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in __call__(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   2732                  yerr=None, xerr=None,
   2733                  label=None, secondary_y=False, **kwds):
-> 2734         return plot_series(self._data, kind=kind, ax=ax, figsize=figsize,
   2735                            use_index=use_index, title=title, grid=grid,
   2736                            legend=legend, style=style, logx=logx, logy=logy,

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   1992         ax = _gca()
   1993         ax = MPLPlot._get_ax_layer(ax)
-> 1994     return _plot(data, kind=kind, ax=ax,
   1995                  figsize=figsize, use_index=use_index, title=title,
   1996                  grid=grid, legend=legend,

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   1802         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   1803 
-> 1804     plot_obj.generate()
   1805     plot_obj.draw()
   1806     return plot_obj.result

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in generate(self)
    256     def generate(self):
    257         self._args_adjust()
--> 258         self._compute_plot_data()
    259         self._setup_subplots()
    260         self._make_plot()

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in _compute_plot_data(self)
    358         # with ``dtype == object``
    359         data = data._convert(datetime=True, timedelta=True)
--> 360         numeric_data = data.select_dtypes(include=[np.number,
    361                                                    "datetime",
    362                                                    "datetimetz",

~/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in select_dtypes(s

Gaurav_Singh_5JwXj · July 12, 2021, 8:25pm

Hi Srinivas,

The resampled data was being used to place the trades, whereas the minute data was being used for filtering which stocks satisfy the screener criteria. Having said that, thanks for pointing out the error in the code as the solution assumed the hourly candles for screening but the minute data was being used. The same has been corrected on the portal.
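
To make the distinction concrete, here is a minimal sketch of turning 1-minute bars into hourly candles with pandas. The data is synthetic, and the names `minute_data`, `hourly_data` and `ohlcv` are illustrative only. The notebook's own resampling may label or close the bins differently, so treat this as an assumption rather than the capstone code.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute bars for two hours of one morning (values are made up).
idx = pd.date_range("2019-01-02 09:30", periods=120, freq="1min")
close = pd.Series(np.linspace(24.0, 25.19, 120), index=idx)
minute_data = pd.DataFrame({
    "Open": close, "High": close + 0.01, "Low": close - 0.01,
    "Close": close, "Volume": 1000.0,
})

# Each OHLCV column needs its own aggregation rule.
ohlcv = {"Open": "first", "High": "max", "Low": "min",
         "Close": "last", "Volume": "sum"}
hourly_data = minute_data.resample("60min").agg(ohlcv).dropna()

print(len(minute_data), len(hourly_data))  # 120 3
```

A screener threshold tuned on one bar size will generally not carry over to the other. That would be consistent with the observation above that the screener leaves no tickers once it is fed the hourly data.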

As for the second error, the pyfolio one, I believe it is due to an incorrect package version or a changed NB code file on your system. You can refer to this blog for instructions to set up the Python environment on your local system.

Hope this helps!

Thanks,

Gaurav

Srinivas_Mv_Srinivas_Mv_6K067 · July 13, 2021, 8:12am

Hi Gaurav,

Thanks for the updated NB. I have downloaded it.

I am getting the pyfolio error on this one too. I see that the python version now recommended is 3.9.5. I upgraded to this version. However, the same error is seen even now. I reviewed the blog you mentioned, but did not find any major issues. Following are some of the package versions that I am using…

print('Versions:')
print(f'Numpy: {np.__version__}')
print(f'Pandas: {pd.__version__}')
print(f'Talib: {ta.__version__}')
print(f'PyFolio: {pf.__version__}')

Versions:
Numpy: 1.20.0
Pandas: 0.23.4
Talib: 0.4.20
PyFolio: 0.9.2

What am I missing? Below is the full error log (new) ....

Regards,
Srinivas

-------------

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-f2f833484768> in <module>
      1 # Pass the daily return to pyfolio
----> 2 pf.create_simple_tear_sheet(
      3     trade_returns['Portfolio_Returns'].resample('1D').sum())

~/anaconda3/lib/python3.8/site-packages/pyfolio/plotting.py in call_w_context(*args, **kwargs)
     50         if set_context:
     51             with plotting_context(), axes_style():
---> 52                 return func(*args, **kwargs)
     53         else:
     54             return func(*args, **kwargs)

~/anaconda3/lib/python3.8/site-packages/pyfolio/tears.py in create_simple_tear_sheet(returns, positions, transactions, benchmark_rets, slippage, estimate_intraday, live_start_date, turnover_denom, header_rows)
    378     i += 1
    379 
--> 380     plotting.plot_rolling_returns(returns,
    381                                   factor_returns=benchmark_rets,
    382                                   live_start_date=live_start_date,

~/anaconda3/lib/python3.8/site-packages/pyfolio/plotting.py in plot_rolling_returns(returns, factor_returns, live_start_date, logy, cone_std, legend_loc, volatility_match, cone_function, ax, **kwargs)
    805         oos_cum_returns = pd.Series()
    806 
--> 807     is_cum_returns.plot(lw=3, color='forestgreen', alpha=0.6,
    808                         label='Backtest', ax=ax, **kwargs)
    809 

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in __call__(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   2732                  yerr=None, xerr=None,
   2733                  label=None, secondary_y=False, **kwds):
-> 2734         return plot_series(self._data, kind=kind, ax=ax, figsize=figsize,
   2735                            use_index=use_index, title=title, grid=grid,
   2736                            legend=legend, style=style, logx=logx, logy=logy,

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   1992         ax = _gca()
   1993         ax = MPLPlot._get_ax_layer(ax)
-> 1994     return _plot(data, kind=kind, ax=ax,
   1995                  figsize=figsize, use_index=use_index, title=title,
   1996                  grid=grid, legend=legend,

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   1802         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   1803 
-> 1804     plot_obj.generate()
   1805     plot_obj.draw()
   1806     return plot_obj.result

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in generate(self)
    256     def generate(self):
    257         self._args_adjust()
--> 258         self._compute_plot_data()
    259         self._setup_subplots()
    260         self._make_plot()

~/anaconda3/lib/python3.8/site-packages/pandas/plotting/_core.py in _compute_plot_data(self)
    358         # with ``dtype == object``
    359         data = data._convert(datetime=True, timedelta=True)
--> 360         numeric_data = data.select_dtypes(include=[np.number,
    361                                                    "dat

Gaurav_Singh_5JwXj · July 13, 2021, 5:58pm

Hello Srinivas,

Your package version seems incorrect. The requirements file has pandas==1.2.4.

You can refer to this section of the blog. The requirement file is shared as a link in the same section (within the slides). Please set the environment as per the blog for the code to run properly.
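
As a quick way to spot such mismatches, here is a minimal sketch that compares installed package versions against pinned ones. Only the pandas pin is quoted in this thread; the helper name `find_mismatches` is hypothetical, and any other pins would have to be copied from the actual requirements file.

```python
from importlib import metadata

def find_mismatches(pins):
    """Return {package: (required, installed)} for every pin that is not met."""
    mismatches = {}
    for name, required in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != required:
            mismatches[name] = (required, installed)
    return mismatches

# Only the pandas pin is known from this thread; extend from the requirements file.
pins = {"pandas": "1.2.4"}
for name, (required, installed) in find_mismatches(pins).items():
    print(f"{name}: required {required}, installed {installed}")
```

This treats every pin as an exact `==` match, which is how the pandas requirement is written above. `importlib.metadata` is available from Python 3.8 onwards.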

Hope this helps!