Johansen Test p and k values

Course Name: Mean Reversion Strategies In Python, Section No: 9, Unit No: 14, Unit type: Notebook



In the course, p and k values of 0 and 1 have been used for the Johansen test when finding cointegration between triplets. I can see from the code that k refers to the number of lagged differences. What does this mean?

The number of lagged differences (k) is the number of past price changes that the model behind the Johansen test includes when it checks for cointegration. For example, with k = 1, the change in price at the current timestamp (t) is modelled using the previous change, that is, the move from t-2 to t-1, in addition to the price levels at t-1. With k = 2, the change before that (the move from t-3 to t-2) is included as well.
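To make the idea of lagged differences concrete, here is a minimal sketch in plain Python. The helper name `lagged_differences` and the price list are made up for illustration; this is not the course's code and it does not run the Johansen test itself. It only shows which past price changes sit next to the current change for a given k.

```python
def lagged_differences(prices, k):
    """Pair each current price change with the k price changes before it."""
    # First differences: diffs[i] = prices[i + 1] - prices[i]
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    rows = []
    # The first k changes have no full set of lags, so start at index k.
    for t in range(k, len(diffs)):
        current = diffs[t]
        lags = [diffs[t - i] for i in range(1, k + 1)]  # most recent lag first
        rows.append((current, lags))
    return rows


prices = [100, 102, 101, 105, 107]  # hypothetical prices for one instrument

rows_k1 = lagged_differences(prices, 1)
rows_k2 = lagged_differences(prices, 2)

print(rows_k1)  # one lagged change per row
print(rows_k2)  # two lagged changes per row, and one fewer usable row
```

If the course code is based on statsmodels (an assumption, since the post does not say which library is used), the equivalent call would be `coint_johansen(endog, det_order, k_ar_diff)`, where p = 0 would correspond to `det_order=0` (a constant term) and k = 1 to `k_ar_diff=1`.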



How do I decide the number of lagged terms to include?

As a general rule of thumb in financial time series, the more lagged terms you include, the higher the chance of overfitting. So, you should keep the number of lagged differences low, such as 1.
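One way to see why a larger k raises the overfitting risk is to count parameters. In the error-correction form that the test is built on, each lagged difference adds one n x n matrix of short-run coefficients, where n is the number of price series. The sketch below is only a rough illustration; the function name `short_run_coefficients` is hypothetical and the count ignores the other terms in the model.

```python
def short_run_coefficients(n_series, k):
    """Count the coefficients added by k lagged differences for n_series series."""
    # Each lagged difference contributes one n_series x n_series matrix.
    return k * n_series * n_series


n_series = 3  # a triplet, as in the question

counts = {k: short_run_coefficients(n_series, k) for k in (1, 2, 5, 10)}

for k, count in counts.items():
    print(f"k = {k:2d} -> {count} short-run coefficients")
```

For a triplet, every extra lag adds nine more coefficients to estimate from the same price history, which is why a small value such as 1 is a sensible default.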



I hope this helps.



Thanks