
The OLS Formula

If you choose the option to add a trend line in Excel, or fit one in programs like Python, SAS, or R, the software is essentially plotting OLS-derived coordinates. In this section, I explain in detail what the computer does when it tries to find the line that best fits the data points.
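To see this concretely, here is a minimal Python sketch of what such a trend line amounts to. The data is made up purely for illustration, and `numpy.polyfit` is just one of several functions that perform this least-squares fit:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# A degree-1 polynomial fit is exactly the OLS trend line
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)

# Plot the data and the fitted line, much like Excel's "add trend line" option
plt.scatter(x, y)
plt.plot(x, alpha_hat + beta_hat * x)
plt.show()
```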

 

The basic idea is very simple. The computer decomposes the actual value of an observation ($Y$) into a fitted value ($\hat{Y}$) and a residual ($e$). The best-fitting line is the line that minimizes the overall distance between the actual and the fitted values.
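As a made-up numerical illustration: if the fitted line were $\hat{Y} = 1 + 2X$ and one observation had $x_i = 3$ and $y_i = 8$, the fitted value would be $\hat{y}_i = 1 + 2 \cdot 3 = 7$ and the residual would be $e_i = y_i - \hat{y}_i = 8 - 7 = 1$.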

 

 

Figure 14

 

Since negative and positive residuals cancel each other out, we won't be taking a simple sum. One way to resolve this problem and define the "best fit" at the same time is to minimize the sum of squared vertical distances between the data points and the line; hence the name of the method, ordinary least squares (OLS).
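To make the objective concrete, here is a small Python sketch (the data and the candidate coefficients are made up purely for illustration) that computes the sum of squared residuals a given line produces. OLS simply picks the line with the smallest value of this quantity:

```python
import numpy as np

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sum_of_squared_residuals(alpha, beta):
    """Sum of squared vertical distances between the data and the line alpha + beta*x."""
    residuals = y - (alpha + beta * x)
    return np.sum(residuals ** 2)

# Compare the objective for two candidate lines
print(sum_of_squared_residuals(0.0, 2.0))
print(sum_of_squared_residuals(0.1, 1.97))
```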

 

Since we write $y_i = \hat{\alpha} + \hat{\beta} x_i + e_i$ and we want to find the $\hat{\alpha}$ and $\hat{\beta}$ that minimize $\sum^N_{i=1} e^2_i$, we have:

$$\min_{\hat{\alpha},\, \hat{\beta}} \; \sum^N_{i=1} e^2_i = \sum^N_{i=1} \left( y_i - \hat{\alpha} - \hat{\beta} x_i \right)^2$$

The first order conditions (FOC) for this minimization are:

$$\frac{\partial}{\partial \hat{\alpha}} \sum^N_{i=1} e^2_i = -2 \sum^N_{i=1} \left( y_i - \hat{\alpha} - \hat{\beta} x_i \right) = 0 \quad \Rightarrow \quad \bar{Y} = \hat{\alpha} + \hat{\beta} \bar{X}$$

$$\frac{\partial}{\partial \hat{\beta}} \sum^N_{i=1} e^2_i = -2 \sum^N_{i=1} x_i \left( y_i - \hat{\alpha} - \hat{\beta} x_i \right) = 0$$

 

where $\bar{X} = \frac{\sum^N_{i=1} x_i}{N}$ is the sample average of $X$ (the sample counterpart of $E(X)$), and $\bar{Y}$ is defined analogously.
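If you would rather not do the algebra by hand, a short symbolic sketch with SymPy (using a small made-up dataset) reproduces the same first order conditions and solves them:

```python
import sympy as sp

# Symbols for the two coefficients we are choosing
alpha, beta = sp.symbols('alpha beta')

# Made-up illustrative data
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8)]

# Sum of squared residuals as a symbolic expression in alpha and beta
ssr = sum((y - (alpha + beta * x)) ** 2 for x, y in data)

# First order conditions: set both partial derivatives to zero
foc_alpha = sp.Eq(sp.diff(ssr, alpha), 0)
foc_beta = sp.Eq(sp.diff(ssr, beta), 0)

# Solving the system gives the OLS intercept and slope for this dataset
print(sp.solve([foc_alpha, foc_beta], [alpha, beta]))
```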

 

Solving this system of equations, we obtain:

$$\hat{\beta} = \frac{\sum^N_{i=1} (x_i - \bar{X})(y_i - \bar{Y})}{\sum^N_{i=1} (x_i - \bar{X})^2}, \qquad \hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$$

This formula was introduced in the previous post.

Note: This simplified formula does not apply to multivariate regression.
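For the bivariate case, here is a minimal Python sketch (again with made-up data) that applies the formulas above and checks the result against `numpy.polyfit`:

```python
import numpy as np

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form OLS estimates for the simple (one-regressor) case
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar

print(alpha_hat, beta_hat)
print(np.polyfit(x, y, deg=1))  # returns [slope, intercept]; should match
```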