Part 2: Understanding Mean Absolute Deviation and Standard Deviation

In this blog post, we'll dive into the concepts of Mean Absolute Deviation (MAD) and Standard Deviation, two fundamental metrics in statistics and data science. These metrics help us understand how spread out our data points are from the average. Let's get started!

Why Measure Spread?

Imagine you have test scores from a history exam. Here are six data points representing scores out of 100:

75, 65, 72, 68, 70, 60

The average (mean) score is 410 / 6 ≈ 68.33. When you plot these scores on a chart, the average will be represented by a yellow line at about 68.33. Each data point will be fairly close to this average.

Now, consider a different set of test scores, say from a mathematics exam:

55, 85, 40, 90, 95, 45

The average score is again 410 / 6 ≈ 68.33, but when you plot these scores, they are far more spread out than the history scores.

In data science, it's crucial to know how far apart individual data points are from the average or how spread out they are. This helps in understanding the variability in your data.
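A quick way to check this comparison is a few lines of Python using only the standard library. This is a rough sketch: the variable names are illustrative, and the range (maximum minus minimum) is used here only as a crude first look at spread, not as one of the metrics this post focuses on.

```python
from statistics import mean

history_scores = [75, 65, 72, 68, 70, 60]
math_scores = [55, 85, 40, 90, 95, 45]

# Both lists sum to 410, so they share the same mean.
history_mean = mean(history_scores)
math_mean = mean(math_scores)

# Range = max - min, a crude first look at how spread out the scores are.
history_range = max(history_scores) - min(history_scores)
math_range = max(math_scores) - min(math_scores)

print(f"History: mean={history_mean:.2f}, range={history_range}")
print(f"Math:    mean={math_mean:.2f}, range={math_range}")
```

The two means match, yet the math scores cover a much wider range, which is exactly what the mean alone cannot tell you.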

Mean Absolute Deviation (MAD)

A straightforward way to measure spread is by calculating the Mean Absolute Deviation (MAD).

Calculate the Deviation: Subtract the average from each data point.
Take Absolute Values: Convert these deviations to absolute values (ignore negative signs).
Calculate the Mean of Absolute Deviations: Average these absolute deviations.
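The three steps above can be sketched in plain Python. This is a minimal illustration rather than a polished implementation; the variable names are illustrative, and the data is the history test scores listed earlier.

```python
scores = [75, 65, 72, 68, 70, 60]  # history test scores

mean = sum(scores) / len(scores)

# Step 1: deviation of each data point from the mean
deviations = [x - mean for x in scores]

# Step 2: absolute values of the deviations
abs_deviations = [abs(d) for d in deviations]

# Step 3: mean of the absolute deviations
mad = sum(abs_deviations) / len(abs_deviations)

print(f"mean = {mean:.2f}")
print("deviations =", [round(d, 2) for d in deviations])
print(f"MAD = {mad:.2f}")
```

The absolute value in step 2 matters: the raw deviations always sum to zero, so averaging them directly would say nothing about spread.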

Example: History Test Scores

Let's calculate the MAD for the history test scores.

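Deviations (the mean is 410 / 6 ≈ 68.33):

75 - 68.33 = 6.67
65 - 68.33 = -3.33
72 - 68.33 = 3.67
68 - 68.33 = -0.33
70 - 68.33 = 1.67
60 - 68.33 = -8.33

Absolute Deviations:

|6.67| = 6.67
|-3.33| = 3.33
|3.67| = 3.67
|-0.33| = 0.33
|1.67| = 1.67
|-8.33| = 8.33

Mean Absolute Deviation:

MAD = (6.67 + 3.33 + 3.67 + 0.33 + 1.67 + 8.33) / 6 = 24 / 6 = 4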

Example: Mathematics Test Scores

Now for the mathematics test scores.

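Deviations (the mean is 410 / 6 ≈ 68.33):

55 - 68.33 = -13.33
85 - 68.33 = 16.67
40 - 68.33 = -28.33
90 - 68.33 = 21.67
95 - 68.33 = 26.67
45 - 68.33 = -23.33

Absolute Deviations:

|-13.33| = 13.33
|16.67| = 16.67
|-28.33| = 28.33
|21.67| = 21.67
|26.67| = 26.67
|-23.33| = 23.33

Mean Absolute Deviation:

MAD = (13.33 + 16.67 + 28.33 + 21.67 + 26.67 + 23.33) / 6 = 130 / 6 ≈ 21.67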

The higher MAD in the math scores indicates that the data points are more spread out compared to the history scores.

Standard Deviation

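MAD counts every deviation in proportion to its size. Standard Deviation squares each deviation before averaging, so points far from the mean carry more weight. The version described here divides by the number of data points, which is known as the population standard deviation.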

Calculate the Deviation: Subtract the average from each data point.
Square the Deviations: Square each deviation.
Calculate the Mean of Squared Deviations: Average these squared deviations.
Square Root: Take the square root of this mean.
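The four steps above map directly onto a few lines of plain Python. Treat this as a sketch under the same assumption as the steps, namely that the squared deviations are divided by the number of data points; the variable names are illustrative.

```python
import math

scores = [75, 65, 72, 68, 70, 60]  # history test scores
n = len(scores)
mean = sum(scores) / n

# Step 1: deviation of each data point from the mean
deviations = [x - mean for x in scores]

# Step 2: square each deviation
squared_deviations = [d ** 2 for d in deviations]

# Step 3: mean of the squared deviations (the variance)
variance = sum(squared_deviations) / n

# Step 4: square root of the variance
std_dev = math.sqrt(variance)

print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

Squaring removes the negative signs just as the absolute value does for MAD, and the final square root brings the result back to the original units (points on the exam).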

Example: History Test Scores

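Deviations (the mean is 410 / 6 ≈ 68.33):

75 - 68.33 = 6.67
65 - 68.33 = -3.33
72 - 68.33 = 3.67
68 - 68.33 = -0.33
70 - 68.33 = 1.67
60 - 68.33 = -8.33

Squared Deviations (computed from the unrounded deviations):

6.67^2 ≈ 44.44
(-3.33)^2 ≈ 11.11
3.67^2 ≈ 13.44
(-0.33)^2 ≈ 0.11
1.67^2 ≈ 2.78
(-8.33)^2 ≈ 69.44

Mean of Squared Deviations:

Mean = (44.44 + 11.11 + 13.44 + 0.11 + 2.78 + 69.44) / 6 ≈ 141.33 / 6 ≈ 23.56

Standard Deviation:

SD = sqrt(23.56) ≈ 4.85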

Example: Mathematics Test Scores

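Deviations (the mean is 410 / 6 ≈ 68.33):

55 - 68.33 = -13.33
85 - 68.33 = 16.67
40 - 68.33 = -28.33
90 - 68.33 = 21.67
95 - 68.33 = 26.67
45 - 68.33 = -23.33

Squared Deviations (computed from the unrounded deviations):

(-13.33)^2 ≈ 177.78
16.67^2 ≈ 277.78
(-28.33)^2 ≈ 802.78
21.67^2 ≈ 469.44
26.67^2 ≈ 711.11
(-23.33)^2 ≈ 544.44

Mean of Squared Deviations:

Mean = (177.78 + 277.78 + 802.78 + 469.44 + 711.11 + 544.44) / 6 ≈ 2983.33 / 6 ≈ 497.22

Standard Deviation:

SD = sqrt(497.22) ≈ 22.30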

The higher standard deviation in the math scores indicates a greater spread compared to the history scores.
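Because the deviations are squared before averaging, a single distant point tends to move the Standard Deviation more than it moves MAD. The sketch below illustrates this with a made-up variation of the history scores: the second list is hypothetical (the 60 is replaced by 20) and is not part of the original data. The helper names `mad` and `sd` are also just illustrative.

```python
import numpy as np

def mad(x):
    """Mean absolute deviation from the mean."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()))

def sd(x):
    """Population standard deviation (divides by n)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean((x - x.mean()) ** 2))

base = [75, 65, 72, 68, 70, 60]          # history scores from the post
with_outlier = [75, 65, 72, 68, 70, 20]  # hypothetical: 60 replaced by 20

# How many times larger each metric becomes once the outlier is present
mad_growth = mad(with_outlier) / mad(base)
sd_growth = sd(with_outlier) / sd(base)

print(f"MAD grows by a factor of {mad_growth:.2f}")
print(f"SD  grows by a factor of {sd_growth:.2f}")
```

For this particular example the Standard Deviation grows by a larger factor than MAD, which is the practical meaning of "giving more weight to large deviations".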

L1 and L2 Norms

In machine learning, you might encounter terms like L1 and L2 norms. They are closely related to the two metrics above: applied to the vector of deviations from the mean, the L1 norm divided by n gives the MAD, and the L2 norm divided by sqrt(n) gives the Standard Deviation.

L1 Norm: Sum of absolute values.
L2 Norm: Square root of the sum of squared values.

These norms appear as penalty terms in various machine learning algorithms, such as Lasso Regression (L1 penalty) and Ridge Regression (squared L2 penalty).
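To make the link between the norms and the two spread metrics concrete, here is a small sketch using `np.linalg.norm` on the vector of deviations from the mean. It assumes the population form of the Standard Deviation (dividing by n); the variable names are illustrative.

```python
import numpy as np

scores = np.array([75, 65, 72, 68, 70, 60], dtype=float)  # history scores
deviations = scores - scores.mean()
n = scores.size

l1_norm = np.linalg.norm(deviations, ord=1)  # sum of |d_i|
l2_norm = np.linalg.norm(deviations, ord=2)  # sqrt of sum of d_i^2

# Scaling the norms by the number of points recovers the spread metrics.
mad = l1_norm / n
std_dev = l2_norm / np.sqrt(n)

print(f"L1 norm = {l1_norm:.2f}, MAD = {mad:.2f}")
print(f"L2 norm = {l2_norm:.2f}, SD  = {std_dev:.2f}")
```

So the norms and the spread metrics differ only by a scaling factor that depends on the number of data points.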

Python Code Examples

Let's implement these concepts using Python.

Mean Absolute Deviation (MAD)

import numpy as np

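def mean_absolute_deviation(data):
    """Return the mean of the absolute deviations from the mean."""
    data = np.asarray(data, dtype=float)  # also accept plain Python lists
    mean = np.mean(data)
    mad = np.mean(np.abs(data - mean))
    return mad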

history_scores = np.array([75, 65, 72, 68, 70, 60])
math_scores = np.array([55, 85, 40, 90, 95, 45])

mad_history = mean_absolute_deviation(history_scores)
mad_math = mean_absolute_deviation(math_scores)

print(f"MAD (History): {mad_history:.2f}")
print(f"MAD (Math): {mad_math:.2f}")

Standard Deviation

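# Reuses np, history_scores and math_scores from the MAD example above.
def standard_deviation(data):
    """Return the population standard deviation (divides by n)."""
    data = np.asarray(data, dtype=float)  # also accept plain Python lists
    mean = np.mean(data)
    variance = np.mean((data - mean) ** 2)  # mean of squared deviations
    std_dev = np.sqrt(variance)
    return std_dev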

std_dev_history = standard_deviation(history_scores)
std_dev_math = standard_deviation(math_scores)

print(f"Standard Deviation (History): {std_dev_history:.2f}")
print(f"Standard Deviation (Math): {std_dev_math:.2f}")
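As a sanity check, the hand-written function can be compared against NumPy's built-in `np.std`. The sketch below also shows the `ddof` parameter: the default `ddof=0` divides by n, as this post does, while `ddof=1` divides by n - 1, the variant commonly used when the data is a sample from a larger population. The variable names are illustrative.

```python
import numpy as np

history_scores = np.array([75, 65, 72, 68, 70, 60])

# Manual calculation, following the four steps in this post
manual_sd = np.sqrt(np.mean((history_scores - history_scores.mean()) ** 2))

# NumPy's built-in version; the default ddof=0 divides by n
population_sd = np.std(history_scores)

# ddof=1 divides by n - 1 instead (the sample standard deviation)
sample_sd = np.std(history_scores, ddof=1)

print(f"manual:     {manual_sd:.2f}")
print(f"population: {population_sd:.2f}")
print(f"sample:     {sample_sd:.2f}")
```

The manual and population results agree; the sample version is slightly larger because it divides by a smaller number.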

Conclusion

Understanding Mean Absolute Deviation and Standard Deviation is crucial for analyzing data variability. MAD is the average distance of the data points from the mean, while Standard Deviation squares the deviations first and so gives more weight to points far from the mean. Both metrics are essential tools in statistics and data science, helping you judge how well the average represents your data.

Happy analyzing!

Revanth Quick Learn