Hedge Ratio in Pair Trading

In pair trading, the hedge ratio (the beta) is obtained from a regression of one price on the other. Can you explain why the intercept of the regression is assumed to be 0 when calculating the beta? Should the intercept always be assumed to be 0 in every regression?

The assumption of an intercept of zero in pair trading regression holds significance for a couple of reasons.



Firstly, it aids in mitigating overfitting. When estimating both the slope and the intercept in a regression, the model becomes more complex, increasing the risk of overfitting the data. By setting the intercept to zero, you're simplifying the model (it has to estimate one parameter only), which can help prevent it from fitting noise in the data and improve its generalizability.



Secondly, the zero intercept assumption aligns with logical reasoning, especially in financial markets. Let's consider an example involving GLD (gold ETF) and GDX (gold miners ETF). If the price of gold were to plummet to zero, it stands to reason that the value of gold miners would also dwindle to negligible levels, as their business revolves around gold extraction. Thus, assuming a zero intercept reflects the idea that the value of the dependent variable (e.g., GDX) should also be zero when the independent variable (e.g., GLD) is zero.



Therefore, we prefer to set the intercept to zero in Pairs Trading. Nonetheless, it's important to note that the choice of whether to include an intercept in your regression model ultimately depends on the context and characteristics of the assets you're working with. 



I hope this helps!

Should the intercept always be assumed to be 0?

I would say it is preferred and recommended to set the intercept to 0, as it reduces overfitting.



However, in special and rare cases, you can keep the intercept non-zero. That is, you allow a non-zero spread at the point where one of the assets has a price of zero.



This can be useful in cases where the two assets do not have a theoretical relationship, or if there are other factors at play that affect the spread even when one asset has a zero price. For example, creating a spread of gold miners ETF (GDX) and US Oil ETF (USO).



Thanks

Hi, there is no assumption that the intercept should be zero. The intercept can be seen as error, or as hidden information in the system that the slope of the linear regression cannot explain. You should keep the intercept when you run the linear regression.

Agreed. Economically, the intercept should not be assumed to be zero, even for GLD (gold ETF) and GDX (gold miners ETF), or for USO and an oil company ETF. The main problem is that the estimated intercept is very volatile over time and may cause an overfitting issue.

As an example of overfitting: suppose you are doing factor analysis and you use two value factors, such as a P/E ratio factor and a P/B ratio factor, in a multiple linear regression. Their effects overlap in describing the model. You can check their VIF (to judge whether the model is overfitted); the VIF of both value factors will be bigger than 10, which means your model is overfit due to overlapping factors (also check the AIC). You should remove one of them, because they serve the same function in describing the model; just keep one. In pairs trading, the intercept helps you capture hidden information that the rest of the model cannot explain. It is not overfitting. If you remove the intercept from your model, it will cause underfitting, because you ignore some hidden information in the movement of the stock pair's residuals. This will lead to a bigger MSE than if you take the intercept into consideration.
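The in-sample MSE claim above can be checked with a minimal sketch. The prices here are synthetic and the names (`x`, `y`, `mse_no_intercept`, `mse_with_intercept`) are assumptions made for illustration, not data or code from the course. Because the no-intercept model is a special case of the model with an intercept, least squares can never give the larger model a higher in-sample MSE; this says nothing about out-of-sample performance, which is where the overfitting concern applies.

```python
import numpy as np

# Synthetic, hypothetical price series (not real market data).
rng = np.random.default_rng(1)
x = 50 + np.cumsum(rng.normal(0, 1, 300))
y = 10 + 0.8 * x + rng.normal(0, 1, 300)

# Design matrices: slope only, and slope plus a constant column.
X_no_intercept = x[:, None]
X_with_intercept = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares fits for both models.
beta_no, *_ = np.linalg.lstsq(X_no_intercept, y, rcond=None)
beta_with, *_ = np.linalg.lstsq(X_with_intercept, y, rcond=None)

# In-sample mean squared error of each fit.
mse_no_intercept = np.mean((y - X_no_intercept @ beta_no) ** 2)
mse_with_intercept = np.mean((y - X_with_intercept @ beta_with) ** 2)

print("MSE without intercept:", mse_no_intercept)
print("MSE with intercept:   ", mse_with_intercept)
```

To compare the two choices on generalization, one would fit on one window and measure the MSE on a later window instead.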

VIF is not for overfitting. VIF tests for multicollinearity and has nothing to do with overfitting. There is no need to test for multicollinearity in a linear regression with one independent variable. I don't see Quantra giving us a clear answer either.
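To make the VIF point concrete, here is a small sketch using the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing one factor on the others. The helper `vif` and the synthetic factors `pe`, `pb` and `unrelated` are hypothetical names chosen for illustration.

```python
import numpy as np

def vif(target, others):
    """VIF of `target`: 1 / (1 - R^2) from regressing it on `others` plus a constant."""
    X = np.column_stack([np.ones(len(target))] + list(others))
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    r2 = 1.0 - resid.var() / target.var()
    return 1.0 / (1.0 - r2)

# Synthetic factors (illustrative only): pb is nearly a copy of pe.
rng = np.random.default_rng(2)
pe = rng.normal(0, 1, 1000)
pb = pe + rng.normal(0, 0.1, 1000)
unrelated = rng.normal(0, 1, 1000)

vif_overlapping = vif(pe, [pb])          # large: the two factors overlap
vif_unrelated = vif(pe, [unrelated])     # close to 1: no overlap
vif_single = vif(pe, [])                 # 1: no other regressor to be collinear with

print(vif_overlapping, vif_unrelated, vif_single)
```

With a single independent variable, as in the pairs trading regression, there is nothing to be collinear with, so the VIF is 1 by construction.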

Hello Bill and Arthur,



Thank you both for your valuable contributions to the Quantra community.



Regarding the question of whether to include the intercept (constant term) when calculating the hedge ratio in pairs trading, it's important to consider the underlying rationale behind the choice. 



When an intercept is present, the spread is represented as y - mx - c. In contrast, in the absence of an intercept, the spread simplifies to y - mx.



In the latter equation, our focus is solely on estimating the slope (m) of the line, which reduces the risk of overfitting the data and also reduces model complexity. Considering this rationale, our course has opted to set the intercept (c) to zero. 
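As a rough sketch of the two spread definitions, assuming synthetic prices and using the same symbols m and c (the variable names and data are illustrative assumptions, not the course's code):

```python
import numpy as np

# Synthetic prices for illustration only (not real GLD/GDX data).
rng = np.random.default_rng(0)
x = 100 + np.cumsum(rng.normal(0, 1, 500))   # hypothetical price of asset X
y = 5 + 0.4 * x + rng.normal(0, 0.5, 500)    # hypothetical price of asset Y

# Without intercept: minimising sum((y - m*x)^2) gives m = sum(x*y) / sum(x*x).
m_no_intercept = np.sum(x * y) / np.sum(x * x)
spread_no_intercept = y - m_no_intercept * x

# With intercept: ordinary least squares fit of y = m*x + c.
m_with_intercept, c = np.polyfit(x, y, 1)
spread_with_intercept = y - m_with_intercept * x - c

print("hedge ratio without intercept:", m_no_intercept)
print("hedge ratio with intercept:   ", m_with_intercept, "intercept:", c)
print("mean spread without intercept:", spread_no_intercept.mean())
print("mean spread with intercept:   ", spread_with_intercept.mean())
```

With an intercept, the in-sample spread is centred on zero by construction; without one, any constant offset between the two assets stays inside the spread and shows up in its mean.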



Based on this logic, we recommend setting the intercept to zero when calculating the hedge ratio in pairs trading.



Thank you both for your insightful contributions to the community discussion.



Note: It's important to recognize that not all regression scenarios/applications warrant setting the intercept to 0. 



Best,

Ishan

Ishan, is it possible to find a cointegrated pair of assets that produces a positive P&L with the assumption of a 0 intercept when calculating the hedge ratio?

Hello Bill,



It is possible to find a cointegrated pair of assets that produces a positive P&L. However, you should always thoroughly backtest your strategy and analyse its performance. Further, plan on paper trading the strategy to make sure it is performing well, and only then go ahead and trade it live.

 

It's also important to note that even in cointegrated pairs, there can be periods of divergence where the spread widens, potentially leading to temporary losses before the spread eventually converges again. Additionally, trading costs, transaction fees, and market volatility can impact the effectiveness of a cointegration-based trading strategy.



Hope this helps.