Derivation of the Normal Equation for Linear Regression

--

Linear regression is the prediction of continuous labels: it finds the line that best fits the data points plotted on the Cartesian coordinate system.

Here is an example of data points that illustrate the relationship between years of experience and salary. All we need is to define the linear formula that best expresses all these points.

Here is the line that fits the data points given above. Different optimization algorithms can be applied to find the linear formula, such as Stochastic Gradient Descent and Batch Gradient Descent. But here we will take another approach to this problem, called the normal equation.

First of all, we need to write the hypothesis function that fits our data set linearly. For the picture given above, we have one independent variable (feature) -> Years of Experience, and one dependent variable (label) -> Salary.
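Writing the weights as omegas, as the article does later, the single-feature hypothesis takes the familiar form, with \omega_0 the intercept and \omega_1 the slope:

    h(x) = \omega_0 + \omega_1 x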

But in most cases, we will have multiple features and the hypothesis will be:
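For n features x_1, \dots, x_n, the standard multi-feature form is:

    h(x) = \omega_0 + \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_n x_n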

In the given example, each element of our dataset has n features. We can have tons of elements in our dataset. For now, we consider just one element. If we express our elements as vectors:
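Using the common convention of a constant feature x_0 = 1, so that the intercept \omega_0 joins the other weights, one element and its weights become column vectors:

    x = [x_0, x_1, \dots, x_n]^T
    \Omega = [\omega_0, \omega_1, \dots, \omega_n]^T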

Let's multiply the transpose of \Omega by x and analyze the result.
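Carrying out this product recovers exactly the hypothesis from above:

    \Omega^T x = \omega_0 x_0 + \omega_1 x_1 + \dots + \omega_n x_n = h(x)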

So far we have considered only one element of our data set. Now let's consider all the elements of the dataset to calculate the cost function. The cost function over all elements is:
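For m elements, with x^{(i)} the i-th feature vector and y^{(i)} its label, the mean-squared-error cost takes the standard form:

    J(\Omega) = \frac{1}{2m} \sum_{i=1}^{m} \left( \Omega^T x^{(i)} - y^{(i)} \right)^2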

To write it in vector form, we will use the following expression:
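Stacking all m labels into one column vector:

    Y = [y^{(1)}, y^{(2)}, \dots, y^{(m)}]^T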

If we express all the features as a matrix X of dimension m x n, where each row holds one element's features, and all the weights as big Omega (as above), we will get the following equation:
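With the i-th row of X equal to (x^{(i)})^T, one matrix-vector product computes every prediction at once:

    X \Omega = [\Omega^T x^{(1)}, \Omega^T x^{(2)}, \dots, \Omega^T x^{(m)}]^T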

If we rewrite our cost function in this notation, we get the following function:
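In matrix form, the cost is the standard expression:

    J(\Omega) = \frac{1}{2m} (X\Omega - Y)^T (X\Omega - Y)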

Note: The square of a vector as a matrix product is not the same as squaring its values element-wise; to obtain the sum of squared errors as a single scalar, we multiply the error vector by its transpose, since v^T v = v_1^2 + v_2^2 + \dots + v_m^2 for any column vector v.

Since the 1/2m coefficient does not change where the derivative equals zero, we may drop it.
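Dropping the coefficient and expanding the product (using the fact that \Omega^T X^T Y is a scalar, so it equals its own transpose Y^T X \Omega):

    J(\Omega) \propto \Omega^T X^T X \Omega - 2\,\Omega^T X^T Y + Y^T Y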

Now, to minimize the cost (error), we take the derivative with respect to \Omega and set it equal to zero. When differentiating, keep in mind that the multiplication of the error vector by its transpose expresses the square, so the derivative is that of a quadratic:
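Applying the standard matrix-calculus identities \frac{\partial}{\partial \Omega} \Omega^T A \Omega = 2A\Omega (for symmetric A = X^T X) and \frac{\partial}{\partial \Omega} \Omega^T b = b:

    \nabla_\Omega J = 2 X^T X \Omega - 2 X^T Y = 0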

Now we solve this for the weight vector \Omega to get the normal equation.
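Dividing by two and rearranging gives a linear system in \Omega:

    X^T X \Omega = X^T Y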

Our final equation is:
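Assuming X^T X is invertible:

    \Omega = (X^T X)^{-1} X^T Y

To see the result in action, here is a minimal NumPy sketch with a made-up years-of-experience/salary toy dataset (the numbers are illustrative, not the article's data). Solving the linear system directly is usually preferred over forming the inverse explicitly:

    import numpy as np

    # Hypothetical toy data: years of experience vs. salary (in thousands).
    years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    salary = np.array([35.0, 42.0, 51.0, 58.0, 66.0])

    # Design matrix with a bias column (x0 = 1) prepended.
    X = np.column_stack([np.ones_like(years), years])
    Y = salary

    # Normal equation: solve (X^T X) Omega = X^T Y.
    Omega = np.linalg.solve(X.T @ X, X.T @ Y)

    print("intercept, slope:", Omega)
    print("prediction for 6 years:", np.array([1.0, 6.0]) @ Omega)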
