If we change only one value of a data set, will the mean absolute deviation behave as the same way as...
up vote
1
down vote
favorite
I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.
Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?
EDIT: Taking absolute mean deviation about the mean. Basically the sum of absolute difference of every point in data set with the mean divided by the number of data points.
statistics standard-deviation
add a comment |
up vote
1
down vote
favorite
I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.
Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?
EDIT: Taking absolute mean deviation about the mean. Basically the sum of absolute difference of every point in data set with the mean divided by the number of data points.
statistics standard-deviation
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
Nov 21 at 21:44
What is your definition of MAD?
– BruceET
Nov 23 at 19:22
if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
– Avinash Bhawnani
Nov 23 at 19:28
I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
– BruceET
Nov 23 at 23:23
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.
Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?
EDIT: Taking absolute mean deviation about the mean. Basically the sum of absolute difference of every point in data set with the mean divided by the number of data points.
statistics standard-deviation
I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.
Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?
EDIT: Taking absolute mean deviation about the mean. Basically the sum of absolute difference of every point in data set with the mean divided by the number of data points.
statistics standard-deviation
statistics standard-deviation
edited Nov 22 at 3:13
asked Nov 20 at 21:58
Avinash Bhawnani
34519
34519
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
Nov 21 at 21:44
What is your definition of MAD?
– BruceET
Nov 23 at 19:22
if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
– Avinash Bhawnani
Nov 23 at 19:28
I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
– BruceET
Nov 23 at 23:23
add a comment |
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
Nov 21 at 21:44
What is your definition of MAD?
– BruceET
Nov 23 at 19:22
if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
– Avinash Bhawnani
Nov 23 at 19:28
I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
– BruceET
Nov 23 at 23:23
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
Nov 21 at 21:44
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
Nov 21 at 21:44
What is your definition of MAD?
– BruceET
Nov 23 at 19:22
What is your definition of MAD?
– BruceET
Nov 23 at 19:22
if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
– Avinash Bhawnani
Nov 23 at 19:28
if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
– Avinash Bhawnani
Nov 23 at 19:28
I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
– BruceET
Nov 23 at 23:23
I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
– BruceET
Nov 23 at 23:23
add a comment |
1 Answer
1
active
oldest
votes
up vote
0
down vote
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed
statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x
, data with one value changed to
outlier 200 to get altered data y
.
set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD
Original data x
has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$
Altered data y
has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$
So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).
but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
– Avinash Bhawnani
Nov 25 at 2:17
1
Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
– BruceET
Nov 25 at 2:22
yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
– Avinash Bhawnani
Nov 25 at 10:37
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed
statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x
, data with one value changed to
outlier 200 to get altered data y
.
set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD
Original data x
has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$
Altered data y
has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$
So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).
but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
– Avinash Bhawnani
Nov 25 at 2:17
1
Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
– BruceET
Nov 25 at 2:22
yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
– Avinash Bhawnani
Nov 25 at 10:37
add a comment |
up vote
0
down vote
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed
statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x
, data with one value changed to
outlier 200 to get altered data y
.
set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD
Original data x
has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$
Altered data y
has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$
So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).
but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
– Avinash Bhawnani
Nov 25 at 2:17
1
Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
– BruceET
Nov 25 at 2:22
yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
– Avinash Bhawnani
Nov 25 at 10:37
add a comment |
up vote
0
down vote
up vote
0
down vote
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed
statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x
, data with one value changed to
outlier 200 to get altered data y
.
set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD
Original data x
has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$
Altered data y
has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$
So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).
Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.
Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.
x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691
boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")
So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691
boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")
Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).
One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.
Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed
statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x
, data with one value changed to
outlier 200 to get altered data y
.
set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD
x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD
Original data x
has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$
Altered data y
has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$
So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).
edited Nov 23 at 22:51
answered Nov 21 at 21:30
BruceET
34.9k71440
34.9k71440
but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
– Avinash Bhawnani
Nov 25 at 2:17
1
Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
– BruceET
Nov 25 at 2:22
yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
– Avinash Bhawnani
Nov 25 at 10:37
add a comment |
but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
– Avinash Bhawnani
Nov 25 at 2:17
1
Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
– BruceET
Nov 25 at 2:22
yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
– Avinash Bhawnani
Nov 25 at 10:37
but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
– Avinash Bhawnani
Nov 25 at 2:17
but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
– Avinash Bhawnani
Nov 25 at 2:17
1
1
Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
– BruceET
Nov 25 at 2:22
Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
– BruceET
Nov 25 at 2:22
yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
– Avinash Bhawnani
Nov 25 at 10:37
yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
– Avinash Bhawnani
Nov 25 at 10:37
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3006958%2fif-we-change-only-one-value-of-a-data-set-will-the-mean-absolute-deviation-beha%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
Nov 21 at 21:44
What is your definition of MAD?
– BruceET
Nov 23 at 19:22
if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
– Avinash Bhawnani
Nov 23 at 19:28
I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
– BruceET
Nov 23 at 23:23