If we change only one value of a data set, will the mean absolute deviation behave as the same way as...











up vote
1
down vote

favorite












I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.



Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?



EDIT: Taking absolute mean deviation about the mean. Basically the sum of absolute difference of every point in data set with the mean divided by the number of data points.










share|cite|improve this question
























  • The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
    – BruceET
    Nov 21 at 21:44










  • What is your definition of MAD?
    – BruceET
    Nov 23 at 19:22










  • if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
    – Avinash Bhawnani
    Nov 23 at 19:28












  • I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
    – BruceET
    Nov 23 at 23:23















up vote
1
down vote

favorite












I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.



Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?



EDIT: Taking absolute mean deviation about the mean. Basically the sum of absolute difference of every point in data set with the mean divided by the number of data points.










share|cite|improve this question
























  • The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
    – BruceET
    Nov 21 at 21:44










  • What is your definition of MAD?
    – BruceET
    Nov 23 at 19:22










  • if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
    – Avinash Bhawnani
    Nov 23 at 19:28












  • I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
    – BruceET
    Nov 23 at 23:23













up vote
1
down vote

favorite









up vote
1
down vote

favorite











I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.



Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?



EDIT: Taking absolute mean deviation about the mean. Basically the sum of absolute difference of every point in data set with the mean divided by the number of data points.










share|cite|improve this question















I took the new data as b and the data removed as a and calculated the new mean and used that to find the new mean and deviation in terms of old. But it gets too complicated and there is no way to get the relation looking at the terms.



Basically the question is, if after changing only one value of a data set, if the mean absolute deviation increases, will standard deviation always increase? Or is there any case where it can decrease too?



EDIT: Taking absolute mean deviation about the mean. Basically the sum of absolute difference of every point in data set with the mean divided by the number of data points.







statistics standard-deviation






share|cite|improve this question















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Nov 22 at 3:13

























asked Nov 20 at 21:58









Avinash Bhawnani

34519




34519












  • The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
    – BruceET
    Nov 21 at 21:44










  • What is your definition of MAD?
    – BruceET
    Nov 23 at 19:22










  • if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
    – Avinash Bhawnani
    Nov 23 at 19:28












  • I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
    – BruceET
    Nov 23 at 23:23


















  • The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
    – BruceET
    Nov 21 at 21:44










  • What is your definition of MAD?
    – BruceET
    Nov 23 at 19:22










  • if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
    – Avinash Bhawnani
    Nov 23 at 19:28












  • I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
    – BruceET
    Nov 23 at 23:23
















The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
Nov 21 at 21:44




The terminology 'mean absolute deviation' seems to have several definitions. For an exact answer, or for relevant specific examples, you should give the formula you are using for it. // Generally speaking, mean absolute deviation is defined to be less sensitive to outliers. So if you remove a central value and substitute an extreme outlier for it, you may see that both SD and MAD increase, but SD will likely show the greater increase
– BruceET
Nov 21 at 21:44












What is your definition of MAD?
– BruceET
Nov 23 at 19:22




What is your definition of MAD?
– BruceET
Nov 23 at 19:22












if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
– Avinash Bhawnani
Nov 23 at 19:28






if a is the mean of data set x_i, then sum over $|(x_i - a)|$ / n
– Avinash Bhawnani
Nov 23 at 19:28














I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
– BruceET
Nov 23 at 23:23




I have posted an Addendum to my Answer illustrating your definition of MAD. With this definition, moving any farther from the sample mean will increase both SD and MAD, but often the SD will change the most. If the $j$th value is changed to lie farther from $bar X,$ then both $(X_j -bar X)^2$ and $|X_j -bar X|$ will increase. Because of the squaring, the former may change greatly.
– BruceET
Nov 23 at 23:23










1 Answer
1






active

oldest

votes

















up vote
0
down vote













Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.



Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.



x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691

boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")


enter image description here



So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.



x.sort = sort(x);  x.20 = x.sort[20];  x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691

boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")


enter image description here



Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).



One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.





Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x, data with one value changed to
outlier 200 to get altered data y.



set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD

x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD


Original data x has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$



Altered data y has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$



So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).






share|cite|improve this answer























  • but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
    – Avinash Bhawnani
    Nov 25 at 2:17






  • 1




    Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
    – BruceET
    Nov 25 at 2:22












  • yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
    – Avinash Bhawnani
    Nov 25 at 10:37











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3006958%2fif-we-change-only-one-value-of-a-data-set-will-the-mean-absolute-deviation-beha%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
0
down vote













Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.



Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.



x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691

boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")


enter image description here



So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.



x.sort = sort(x);  x.20 = x.sort[20];  x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691

boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")


enter image description here



Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).



One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.





Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x, data with one value changed to
outlier 200 to get altered data y.



set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD

x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD


Original data x has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$



Altered data y has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$



So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).






share|cite|improve this answer























  • but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
    – Avinash Bhawnani
    Nov 25 at 2:17






  • 1




    Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
    – BruceET
    Nov 25 at 2:22












  • yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
    – Avinash Bhawnani
    Nov 25 at 10:37















up vote
0
down vote













Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.



Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.



x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691

boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")


enter image description here



So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.



x.sort = sort(x);  x.20 = x.sort[20];  x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691

boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")


enter image description here



Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).



One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.





Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x, data with one value changed to
outlier 200 to get altered data y.



set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD

x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD


Original data x has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$



Altered data y has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$



So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).






share|cite|improve this answer























  • but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
    – Avinash Bhawnani
    Nov 25 at 2:17






  • 1




    Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
    – BruceET
    Nov 25 at 2:22












  • yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
    – Avinash Bhawnani
    Nov 25 at 10:37













up vote
0
down vote










up vote
0
down vote









Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.



Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.



x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691

boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")


enter image description here



So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.



x.sort = sort(x);  x.20 = x.sort[20];  x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691

boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")


enter image description here



Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).



One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.





Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x, data with one value changed to
outlier 200 to get altered data y.



set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD

x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD


Original data x has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$



Altered data y has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$



So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).






share|cite|improve this answer














Here is an example using the definition of MAD implemented in R statistical
software: For the sample $X_i, dots, X_n,$
$$text{MAD} = 1.4826,text{Med}(|X_i - H|).$$
where $H$ is the median of the sample, and the constant multiple is intended
to put values on a scale so that MAD and sample standard deviation $S$ are
roughly comparable for large normal samples. So according to this definition
MAD is based on the Median of the absolute differences from the sample median.



Here is a sample of size $n = 20$ from $mathsf{Norm}(mu=100, sigma=15),$ along with its SD, R's version of the MAD, and a boxplot.



x = rnorm(20, 100, 15)
summary(x); sd(x); mad(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 98.67 111.50 138.14
## 19.50935
## 20.83691

boxplot(x, horizontal=T, col="skyblue2", main="Boxplot of Original Sample")


enter image description here



So the two values are roughly the same. Now I sort the data, choose the largest value, and replace it by the outlier 200.



x.sort = sort(x);  x.20 = x.sort[20];  x.20
## 138.1427
x.sort[20] = 200; x.sort[20]
## 200
summary(x.sort); sd(x.sort); mad(x.sort)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.01 84.13 98.49 101.77 111.50 200.00
## 28.79103
## 20.83691

boxplot(x.sort, horizontal=T, col="skyblue2", pch=20,
main="Boxplot of Modified Sample")


enter image description here



Notice that making this substitution has not changed the sample median (98.49 before and after) and noticeably increased the sample mean (from 98.67 to 101.77).
Also, the MAD was not increased (20.83691 before and after), but the sample SD
has increased noticeably (roughly, from 19.5 to 28.8).



One says that the sample median is a robust measure of the center of a sample and that the sample MAD is a robust measure of the dispersion of a sample.





Addendum using your definition of MAD (mean absolute deviation from sample mean). This is not as 'robust' a definition, but it works in
somewhat the same way as the one I used above. No figures this time.
Changes in R code: I have to use my own code to get this MAD, set.seed statement will allow you to get exactly the same sample of size 20
as I used (if you try this on your own in R). Original data is x, data with one value changed to
outlier 200 to get altered data y.



set.seed(1123)
x = rnorm(20, 100, 15)
summary(x); mean(x); sd(x); mean(abs(x-mean(a)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 101.81 112.51 128.71
## 101.8078 # sample mean
## 13.70151 # sample SD
## 11.19836 # sample MAD

x.sort = sort(x); x.20 = x.sort[20]; x.20
## 128.7068
x.sort[20] = 200; x.sort[20]; y=x.sort
## 200 # 128.71 changed to 200
summary(y); a = mean(y); s = sd(y); s
Min. 1st Qu. Median Mean 3rd Qu. Max.
81.74 89.17 101.88 105.37 112.51 200.00
## 25.37187 # new sample SE
MAD = mean(abs(y-a)); MAD
## 15.07735 # new sample MAD


Original data x has sample mean $bar X = 101.81,$ sample
SD $S_x = 13.70,$ $text{MAD}_x = 11.2.$



Altered data y has sample mean $bar Y = 105.37,$ sample
SD $S_y = 25.37,$ $text{MAD}_y = 15.1.$



So the alteration makes a large difference in the sample SD
and relatively little difference in MAD (according to your definition).







share|cite|improve this answer














share|cite|improve this answer



share|cite|improve this answer








edited Nov 23 at 22:51

























answered Nov 21 at 21:30









BruceET

34.9k71440




34.9k71440












  • but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
    – Avinash Bhawnani
    Nov 25 at 2:17






  • 1




    Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
    – BruceET
    Nov 25 at 2:22












  • yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
    – Avinash Bhawnani
    Nov 25 at 10:37


















  • but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
    – Avinash Bhawnani
    Nov 25 at 2:17






  • 1




    Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
    – BruceET
    Nov 25 at 2:22












  • yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
    – Avinash Bhawnani
    Nov 25 at 10:37
















but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
– Avinash Bhawnani
Nov 25 at 2:17




but still both are going in the positive direction only, is it possible for one to go positive and other to negative?
– Avinash Bhawnani
Nov 25 at 2:17




1




1




Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
– BruceET
Nov 25 at 2:22






Generally speaking if you alter a dataset to increase its range, both SD and MAD (either definition) will also increase. However, I guess it may be possible to contrive a situation in which one or both of the MADs would decrease by a little. I could be wrong, but I don't think one could prove a theorem that (the median) MAD always increases.
– BruceET
Nov 25 at 2:22














yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
– Avinash Bhawnani
Nov 25 at 10:37




yeah, actually from hunch i had the same result but i was in no way able to prove it. I then took a random data set of 3 or 4 values(I don't rememember) with a variable x and found an intriguing relation(the point where mad starts increasing is the same as where sd stops decreasing) but was not able to prove generally
– Avinash Bhawnani
Nov 25 at 10:37


















draft saved

draft discarded




















































Thanks for contributing an answer to Mathematics Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3006958%2fif-we-change-only-one-value-of-a-data-set-will-the-mean-absolute-deviation-beha%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Bundesstraße 106

Verónica Boquete

Ida-Boy-Ed-Garten