An OLS regression of the dependent series on the independent series gives a high R-squared.
But the Durbin-Watson statistic is much lower than R-squared. I read that in such a case the OLS regression is spurious.
I plan to use this regression equation to take a long or short position based on the standard deviation of the residual.
Correctness of the OLS fit is important, but most of my OLS regressions show a Durbin-Watson statistic much lower than R-squared.
I have a plain vanilla strategy based on Engle-Granger that does not consider Durbin-Watson:
- Get hold of 2 time series in the same sector
- Verify both series are I(1).
- Apply OLS on the level series data. You will get a regression equation with coefficients estimated for the independent variables.
- Find residual and create a residual series.
- Apply ADF to determine if the residual is stationary. If it is stationary, the criterion for the series to be cointegrated is met and the coefficient estimates from OLS are valid. This equation can be used to decide the number of units to go long and short when the residual, measured in standard deviations, reaches a certain threshold.
- Go for the next step of backtesting the strategy.
Hi Neel,
When conducting regression analysis, it is important to consider both the R-squared value and the Durbin-Watson statistic to assess the reliability of the regression results.
The R-squared value measures the proportion of variance in the dependent variable that is explained by the independent variables in the regression model. A high R-squared indicates a good fit of the model, suggesting that a large portion of the variability in the dependent variable is accounted for by the independent variables. However, a high R-squared alone does not guarantee the absence of other issues in the regression analysis.
The Durbin-Watson statistic, on the other hand, is a test for the presence of first-order autocorrelation in the residuals of the regression model. Autocorrelation occurs when there is a correlation between the residuals at different time points, indicating that the assumption of independence of errors is violated. The statistic ranges from 0 to 4, and a value near 2 indicates no first-order autocorrelation. A value well below 2 indicates positive autocorrelation, while a value well above 2 suggests negative autocorrelation.
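For reference, the statistic is the sum of squared changes in the residuals divided by the sum of squared residuals, which is approximately 2 * (1 - r1), where r1 is the lag-1 autocorrelation. The sketch below computes it directly with NumPy on simulated AR(1) series; the helper names `durbin_watson_stat` and `simulate_ar1` are made up for this example.

```python
import numpy as np


def durbin_watson_stat(resid):
    """Sum of squared first differences over the sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def simulate_ar1(phi, n, rng):
    """Simulate e[t] = phi * e[t-1] + shock[t] with standard normal shocks."""
    e = np.zeros(n)
    shocks = rng.normal(size=n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + shocks[t]
    return e


rng = np.random.default_rng(1)
for phi in (0.0, 0.5, 0.9):
    e = simulate_ar1(phi, 5000, rng)
    r1 = np.corrcoef(e[:-1], e[1:])[0, 1]  # lag-1 autocorrelation
    print(f"phi={phi:.1f}  DW={durbin_watson_stat(e):.2f}  2*(1-r1)={2 * (1 - r1):.2f}")
```

Note that the series with phi = 0.9 is still stationary even though its Durbin-Watson value is far below 2.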
If the Durbin-Watson statistic is much lower than the R-squared value, this is a common rule-of-thumb warning sign of a spurious regression, because it points to a high degree of positive autocorrelation in the residuals. Positive autocorrelation means that the residuals at one time point are positively correlated with the residuals at nearby time points. This violates the assumption of independent errors, which is a crucial assumption for inference in ordinary least squares (OLS) regression.
In the presence of autocorrelation, the standard errors of the regression coefficients can be underestimated, leading to inflated t-statistics and potentially misleading p-values. This can result in false conclusions about the significance of the independent variables.
Hope this helps!
Thanks,
Akshay
Hi Akshay,
If there is high auto-correlation in residuals, then ADF test should not show the residuals as stationary. But I see that ADF test shows such series of residuals derived from OLS as stationary.
Residuals were calculated by subtracting mathematical value (derived from OLS regression equation) from observed value of dependent variable and
Surprisingly, I don't see any mention of Durbin-Watson in Dr Chan's book or in your literature that is talking about pairs trading.
Can you point me to any such reference?
At the end of the day, I just want to know how to deal with Durbin-Watson so that my strategy is not negatively impacted. That's the suggestion I am looking for.
Do get back with your thoughts.
Appreciate your help.
Thanks,
Neel
Hi Neel,
If the ADF test shows the residuals derived from the OLS regression as stationary, despite high autocorrelation in the residuals, it may indicate that the ADF test is not capturing the specific form of autocorrelation present in your data adequately. Autocorrelation can take various forms, and the ADF test might not be sensitive to the particular type of autocorrelation exhibited by your residuals.
Regarding your question about references specifically mentioning Durbin-Watson in the context of pairs trading, it is true that the Durbin-Watson statistic may not be explicitly discussed in literature focused on pairs trading. The reason for this is that pairs trading primarily involves analyzing the relationship between two assets and identifying deviations from their historical relationship to generate trading signals. However, this does not negate the importance of considering autocorrelation in regression analysis, including pairs trading strategies.
To address the issue of autocorrelation and its impact on your strategy, here are a few suggestions:
- Robust Regression Techniques: Consider techniques that are less sensitive to assumption violations such as heteroscedasticity and autocorrelation, for example weighted least squares regression or OLS with robust standard errors.
- Autocorrelation Modeling: Instead of relying solely on OLS regression, explore alternative models that explicitly account for autocorrelation, such as autoregressive integrated moving average (ARIMA) models or other time series techniques. These models can capture the autocorrelation structure more accurately and provide reliable estimates for your strategy.
- Consider Time Series Techniques: Pairs trading often involves analyzing the historical relationship between two assets over time. Time series analysis techniques, such as cointegration analysis, vector error correction models (VECM), or autoregressive conditional heteroscedasticity (ARCH/GARCH) models, may be more appropriate for capturing the dynamic dependencies and autocorrelation patterns in the data.
Thanks,
Akshay
Hi Akshay,
"it may indicate that the ADF test is not capturing the specific form of autocorrelation present in your data adequately."
Are you suggesting that the ADF test in the Python statsmodels package may not be rock solid when it comes to determining stationarity? I thought that ADF was doing the job perfectly. I am treating a p-value < 0.1 as indicative of stationarity in the residuals.
The imperfections I thought could be attributed to OLS since it is "estimating" a best fit. Can you please share your view on this specific point.
Could you suggest good tests and heteroscedasticity removal methods for Robust Regression Techniques?
It is not clear which series you are suggesting for Autocorrelation Modeling. I am modelling the relation between 2 stock price series for pairs trading.
I can understand that for a momentum strategy series transformation techniques like AR(1) on price series could help.
Thanks,
Neel
Hi Neel,
The ADF test implemented in the Python statsmodels package is generally considered reliable and widely used for testing the stationarity of a time series. It is a robust test to determine whether a time series is stationary or non-stationary.
When I mentioned that the ADF test might not capture the specific form of autocorrelation present in your data, I meant that in some cases, certain autocorrelation patterns might not be adequately captured by the test, leading to unexpected results. However, this is not to imply that the ADF test itself is unreliable or inherently flawed.
In the context of the ADF test on residuals derived from OLS regression, it is crucial to remember that the ADF test is used to check the stationarity of a time series, and the residuals in this context are essentially a time series derived from the OLS model. If the ADF test shows the residuals as stationary, they do not exhibit a unit root and are consistent with a stationary time series process.
However, as I mentioned earlier, it's essential to consider both the ADF test results and the Durbin-Watson statistic to evaluate the reliability of the regression analysis, especially when the residuals exhibit autocorrelation. The Durbin-Watson statistic specifically tests for serial correlation (autocorrelation) in the residuals. If the Durbin-Watson statistic is low (indicating positive autocorrelation), it implies that the residuals are not independent, which can affect the validity of the OLS estimates and subsequent inferences.
To summarise, the ADF test is generally reliable for assessing stationarity in time series data, including residuals from OLS regression. However, the presence of autocorrelation in residuals is a separate issue that should be addressed, as it can affect the reliability of the regression analysis.
Here are some robust tests and methods for dealing with heteroscedasticity in the context of robust regression:
- Breusch-Pagan Test: The Breusch-Pagan test is commonly used to detect heteroscedasticity in a regression model. It tests whether the variance of the residuals is dependent on the independent variables. If the p-value of the test is below a chosen significance level (e.g., 0.05), it indicates the presence of heteroscedasticity.
- White's Test: White's test is another popular test for heteroscedasticity. It examines whether the squared residuals are correlated with the independent variables. Like the Breusch-Pagan test, a low p-value suggests the presence of heteroscedasticity.
- Weighted Least Squares (WLS) Regression: This method addresses heteroscedasticity by giving different weights to observations based on their estimated variance. It downweights the influence of observations with higher variance, thereby giving more emphasis to those with lower variance. This approach effectively minimizes the impact of heteroscedasticity on the regression results.
- Huber-White Standard Errors (Robust Standard Errors): Instead of using the conventional standard errors based on the assumption of homoscedasticity, you can use Huber-White standard errors (also known as robust standard errors). These leave the coefficient estimates unchanged but provide standard errors that remain valid under heteroscedasticity.
- Generalized Least Squares (GLS) Regression: This generalization of WLS allows for the specification of a correlation structure in the residuals in addition to non-constant variance. It can be used when the residual covariance follows a specific pattern, such as AR(1) errors.
- Bootstrapping: It is a resampling technique that can estimate the variability of the regression coefficients in the presence of heteroscedasticity. It involves repeatedly drawing samples from the data with replacement and estimating the regression model on each sample. This provides empirical estimates of the standard errors and confidence intervals, which are robust to heteroscedasticity.
You can consider the following approaches to handle the issue of autocorrelation in your pairs trading strategy:
- Pairs Residuals Autocorrelation: If you observe high autocorrelation in the residuals derived from the OLS regression between the two stock prices, consider investigating the specific nature of the autocorrelation. Plot the residuals' autocorrelation function (ACF) and partial autocorrelation function to identify the lag structure and any potential patterns.
- Residuals Transformation: If you find evidence of autocorrelation in the residuals, you can apply transformations to the residuals to remove or reduce the autocorrelation. For example, you may consider applying autoregressive (AR) or moving average (MA) models on the residuals to capture the autocorrelation structure explicitly. This can help improve the accuracy of your model and trading signals.
- Machine Learning Models: Instead of relying solely on OLS regression, you can explore machine learning models that capture complex relationships between stock prices. Techniques like support vector regression (SVR), random forests, or neural networks may be useful for pairs trading strategies.
Hope this helps and answers your queries!
Thanks,
Akshay