Hello again, young learners! We're back to dive even deeper into Simple Linear Regression. Last time, we discussed our main aim and the notations used in this algorithm. Today, we will learn how to find the best fit line using a cost function and a technique called gradient descent. Let’s get started!
Recap: Our Main Aim
Our main aim is to create a best fit line that predicts the output (like height) for a given input (like weight). We want to minimize the error between our predicted values and the actual values. To do this, we need to use something called a cost function.
What is a Cost Function?
A cost function helps us measure how well our model’s predictions match the actual data. Think of it as a way to calculate the total error. For Simple Linear Regression, a common cost function is the Mean Squared Error (MSE).
The Cost Function Formula
The cost function for Simple Linear Regression is given by:
[ J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 ]
Where:
( J(\theta_0, \theta_1) ) is the cost function.
( m ) is the number of data points.
( h_\theta(x^{(i)}) ) is the predicted value for the ( i )-th data point.
( y^{(i)} ) is the actual value for the ( i )-th data point.
( \theta_0 ) and ( \theta_1 ) are the parameters (intercept and slope).
Breaking Down the Formula
Predicted Points ( h_\theta(x^{(i)}) ): These are the points on our best fit line.
True Points ( y^{(i)} ): These are the actual data points.
Error: The difference between the predicted points and the true points.
Squared Error: We square the error to make sure it's always positive.
Mean Squared Error: We sum up all the squared errors and divide by ( 2m ). Dividing by ( m ) gives the average squared error; the extra factor of 2 is just a convention that makes the math tidier when we take derivatives during gradient descent.
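To make this concrete, here is a minimal Python sketch of the cost function above. The function name `compute_cost` and the use of plain Python lists are illustrative choices for this lesson, not part of any library.

```python
def compute_cost(theta0, theta1, xs, ys):
    """Mean squared error cost J(theta0, theta1) with the 1/(2m) convention."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = theta0 + theta1 * x   # h_theta(x) = theta0 + theta1 * x
        total += (prediction - y) ** 2     # squared error for this data point
    return total / (2 * m)                 # average, with the extra factor of 2
```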
Why Minimize the Cost Function?
By minimizing the cost function, we find the values of ( \theta_0 ) and ( \theta_1 ) that make our predictions as close as possible to the actual values. This helps us create the best fit line.
Introducing Gradient Descent
To minimize the cost function, we use a technique called Gradient Descent. This is an optimization method that helps us find the lowest point on the cost function curve, known as the global minimum.
Understanding Gradient Descent
Imagine you're on a hill, and your goal is to reach the bottom (the lowest error). Gradient descent helps you take small steps downhill until you reach the bottom.
Here's how it works:
Start with Initial Values: Begin with initial guesses for ( \theta_0 ) and ( \theta_1 ).
Calculate the Gradient: Determine the direction and steepness of the slope.
Update the Parameters: Adjust ( \theta_0 ) and ( \theta_1 ) to move downhill.
Repeat: Continue until you reach the lowest point.
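Here is a hedged sketch of what this loop could look like in Python. The learning rate `alpha`, the number of iterations, and the gradient expressions (the standard partial derivatives of the MSE cost) are assumptions made for illustration; we will look at the actual update rule more carefully next time.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=100):
    """Repeatedly step theta0 and theta1 downhill on the cost surface."""
    theta0, theta1 = 0.0, 0.0   # step 1: start with initial guesses
    m = len(xs)
    for _ in range(iterations):
        # step 2: calculate the gradient (direction and steepness of the slope)
        grad0 = sum((theta0 + theta1 * x - y) for x, y in zip(xs, ys)) / m
        grad1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
        # step 3: update the parameters to move downhill
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1       # step 4: repeat until we (hopefully) reach the bottom
```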
Example with Data Points
Let's use some simple data points to understand this better:
| Weight (x) | Height (y) |
|------------|------------|
| 1          | 1          |
| 2          | 2          |
| 3          | 3          |
We want to find the best fit line for these points.
Step 1: Initial Values
Let's start with an initial guess:
( \theta_0 = 0 )
( \theta_1 = 1 )
Our equation is: [ h_\theta(x) = \theta_0 + \theta_1 \cdot x ] Since ( \theta_0 = 0 ) and ( \theta_1 = 1 ), it simplifies to ( h_\theta(x) = x ).
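Written as a tiny Python sketch (the function name `h` is just shorthand for this lesson), the hypothesis is:

```python
def h(x, theta0=0.0, theta1=1.0):
    """Hypothesis h_theta(x) = theta0 + theta1 * x; with theta0 = 0 and theta1 = 1 it is just x."""
    return theta0 + theta1 * x
```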
Step 2: Calculate Predicted Points
For ( \theta_1 = 1 ):
( x = 1 ): ( h_\theta(1) = 1 )
( x = 2 ): ( h_\theta(2) = 2 )
( x = 3 ): ( h_\theta(3) = 3 )
These points match our actual data points, so the error is zero.
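For reference, here is the same check written as a small self-contained Python sketch (variable names chosen only for this example):

```python
xs, ys = [1, 2, 3], [1, 2, 3]
theta1 = 1.0
predictions = [theta1 * x for x in xs]                   # [1.0, 2.0, 3.0]
errors = [y - pred for y, pred in zip(ys, predictions)]  # [0.0, 0.0, 0.0]
cost = sum(e ** 2 for e in errors) / (2 * len(xs))       # 0.0
print(predictions, cost)
```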
Step 3: Update Parameters
Now, let's change ( \theta_1 ) to 0.5 and see what happens: [ h_\theta(x) = 0.5 \cdot x ]
( x = 1 ): ( h_\theta(1) = 0.5 )
( x = 2 ): ( h_\theta(2) = 1 )
( x = 3 ): ( h_\theta(3) = 1.5 )
The errors are:
( y = 1 ): Error = ( 1 - 0.5 = 0.5 )
( y = 2 ): Error = ( 2 - 1 = 1 )
( y = 3 ): Error = ( 3 - 1.5 = 1.5 )
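A quick sketch of the same arithmetic in Python (again, names are just for illustration) confirms these errors and gives the cost for ( \theta_1 = 0.5 ):

```python
xs, ys = [1, 2, 3], [1, 2, 3]
theta1 = 0.5
predictions = [theta1 * x for x in xs]                   # [0.5, 1.0, 1.5]
errors = [y - pred for y, pred in zip(ys, predictions)]  # [0.5, 1.0, 1.5]
cost = sum(e ** 2 for e in errors) / (2 * len(xs))       # (0.25 + 1 + 2.25) / 6 ≈ 0.58
print(errors, cost)
```

So moving ( \theta_1 ) from 1 to 0.5 pushed the cost up from 0 to roughly 0.58, which is exactly the kind of move gradient descent tries to avoid.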
Step 4: Repeat
We continue adjusting ( \theta_1 ) to find the value that minimizes the error. When ( \theta_1 = 1 ), our error was zero, which means we found the best fit line.
Visualizing Gradient Descent
To visualize this, imagine plotting the cost function for different values of ( \theta_1 ). The curve will look like a bowl. The lowest point in the bowl is the global minimum, where the error is the smallest.
The Curve
Theta (θ₁): The x-axis represents different values of ( \theta_1 ).
Cost (J(θ₁)): The y-axis represents the cost function values.
By plotting these, we see a curve. Our goal is to find the lowest point on this curve.
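Here is a small sketch of that plot (assuming `matplotlib` is installed; the grid of ( \theta_1 ) values is an illustrative choice) using our three data points:

```python
import matplotlib.pyplot as plt

xs, ys = [1, 2, 3], [1, 2, 3]
theta1_values = [i / 20 for i in range(0, 41)]   # try theta_1 from 0.0 to 2.0
costs = [
    sum((t * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))
    for t in theta1_values
]

plt.plot(theta1_values, costs)
plt.xlabel("theta_1")
plt.ylabel("J(theta_1)")
plt.title("Bowl-shaped cost curve: lowest point at theta_1 = 1")
plt.show()
```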
Wrapping Up
Today, we learned about the cost function and how it helps us measure error in Simple Linear Regression. We also introduced gradient descent, a technique that helps us find the best fit line by minimizing the cost function.
Next time, we'll discuss how to use convergence algorithms to efficiently find the best values for ( \theta_0 ) and ( \theta_1 ). Until then, keep exploring and stay curious!
Happy learning! 🚀