Linear Regression
Linear Regression is one of the most widely known modeling techniques. It establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line. If there is only one independent variable, it is called simple linear regression; if there is more than one independent variable, it is called multiple linear regression.
It is mathematically represented by the following equation:
Y = a + b*X + e
where a is the intercept, b is the slope of the line, and e is the error term.
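To make the equation concrete, here is a minimal Python sketch that fits a simple linear regression on made-up data; the data values and variable names are illustrative assumptions, not part of the original text. NumPy's polyfit with degree 1 returns the slope and intercept of the best-fit line.

```python
import numpy as np

# Made-up data for illustration: X is the independent variable, Y the dependent variable.
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.1, 4.3, 6.2, 8.4, 10.1])

# np.polyfit with degree 1 returns [slope, intercept] of the best-fit line.
b, a = np.polyfit(X, Y, 1)

# Predicted values and residuals (the per-observation error term e).
Y_pred = a + b * X
e = Y - Y_pred

print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```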
How to obtain the best-fit line?
The most common method for fitting a regression line is the method of least squares. It calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are squared before they are added, positive and negative deviations do not cancel each other out.
In other words, the optimal regression line is the one for which the sum of the squared differences (vertical distances) between the 'y' values predicted by the regression line and the actual 'y' values is as small as possible.
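The sketch below shows one way the least-squares idea can be written out directly, using the standard closed-form formulas for the slope and intercept of simple linear regression; the function name and sample data are assumptions made for illustration.

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form least-squares estimates for simple linear regression."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the best-fit line passes through the point of means.
    a = y_mean - b * x_mean
    return a, b

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 4.3, 6.2, 8.4, 10.1])

a, b = least_squares_fit(x, y)
# Sum of squared residuals: the quantity that least squares minimizes.
residuals = y - (a + b * x)
print(f"a = {a:.3f}, b = {b:.3f}, sum of squared residuals = {np.sum(residuals**2):.3f}")
```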