Are over-dispersion tests in GLMs actually *useful*?

The phenomenon of 'over-dispersion' in a GLM arises whenever we use a model that restricts the variance of the response variable, and the data exhibit greater variance than the model restriction allows. This occurs commonly when modelling count data using a Poisson GLM, and it can be diagnosed by well-known tests. If the tests show statistically significant evidence of over-dispersion, then we usually generalise the model by using a broader family of distributions that frees the variance parameter from the restriction imposed by the original model. In the case of a Poisson GLM it is common to generalise to either a negative-binomial or a quasi-Poisson GLM.

This situation invites an obvious objection. Why start with a Poisson GLM at all? One can start directly with the broader distributional forms, which have a (relatively) free variance parameter, and allow the variance parameter to be fit to the data, ignoring over-dispersion tests completely. In other situations when we are doing data analysis we almost always use distributional forms that allow freedom of at least the first two moments, so why make an exception here?

My Question: Is there any good reason to start with a distribution that fixes the variance (e.g., the Poisson distribution) and then perform an over-dispersion test? How does this procedure compare with skipping this exercise completely and going straight to the more general models (e.g., negative-binomial, quasi-Poisson, etc.)? In other words, why not always use a distribution with a free variance parameter?
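For concreteness, here is a minimal sketch of the kind of over-dispersion check described above. It is not taken from any particular package: the helper names (`poisson_draw`, `pearson_dispersion`, `overdispersion_p_value`) and the simulation settings are assumptions made for illustration. It fits an intercept-only Poisson model (whose MLE is just the sample mean), computes the Pearson dispersion statistic, and uses a large-sample normal approximation to the chi-square tail, which is only a rough guide.

```python
import math
import random
from statistics import NormalDist, fmean


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def pearson_dispersion(counts, fitted, n_params):
    """Return (dispersion estimate, Pearson chi-square, residual df) for a Poisson fit."""
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    df = len(counts) - n_params
    return chi_sq / df, chi_sq, df


def overdispersion_p_value(chi_sq, df):
    """One-sided p-value using a normal approximation to chi-square(df); assumes df is large."""
    z = (chi_sq - df) / math.sqrt(2 * df)
    return 1.0 - NormalDist().cdf(z)


rng = random.Random(1)
n, mu, k = 500, 4.0, 2.0  # hypothetical sample size, mean, and gamma shape

datasets = {
    # Equidispersed counts: variance equals the mean.
    "poisson": [poisson_draw(rng, mu) for _ in range(n)],
    # Gamma-Poisson mixture (negative binomial): variance is mu + mu**2 / k.
    "gamma_poisson": [poisson_draw(rng, rng.gammavariate(k, mu / k)) for _ in range(n)],
}

results = {}
for label, y in datasets.items():
    fitted = [fmean(y)] * n  # intercept-only Poisson MLE is the sample mean
    phi, chi_sq, df = pearson_dispersion(y, fitted, n_params=1)
    results[label] = (phi, overdispersion_p_value(chi_sq, df))
    print(f"{label}: dispersion = {phi:.2f}, approx p-value = {results[label][1]:.4f}")
```

A dispersion estimate near 1 is what the Poisson restriction implies; values well above 1 are what the tests are designed to flag.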

asked 2 hours ago by Ben

My guess is that, if the underlying distribution truly is Poisson, then your GLM result will not exhibit the well-known good properties, such as efficiency: the variance of the estimates will be greater than it would have been had the correct model been used. The estimates are probably not even unbiased or MLEs. But that's just my intuition and I could be wrong. I'd be curious what a good answer is.
– mlofton
2 hours ago

In my experience, testing for over-dispersion is (paradoxically) mainly of use when you know (from a knowledge of the data generation process) that over-dispersion can't be present. In this context, testing for over-dispersion tells you whether the linear model is picking up all the signal in the data. If it isn't, then adding more covariates to the model should be considered. If it is, then more covariates cannot help.
– Gordon Smyth
1 hour ago

@GordonSmyth: I think that's a good answer. If you don't want to turn that into its own answer, I'll fold it into mine.
– Cliff AB
1 hour ago

@CliffAB Feel free to incorporate my comment into your answer as I don't have time to compose a full answer myself.
– Gordon Smyth
42 mins ago

Tags: overdispersion

1 Answer

In principle, I actually agree that 99% of the time, it's better to just use the more flexible model. With that said, here are two and a half arguments for why you might not.

(1) Less flexible means more efficient estimates. Variance parameters tend to be less stable than mean parameters, so assuming a fixed mean-variance relation may give you more stable standard errors.

(2) Model checking. I've worked with physicists who believe, on theoretical grounds, that various measurements can be described by Poisson distributions. If we reject the hypothesis that mean = variance, we have evidence against the Poisson distribution hypothesis. As pointed out in a comment by @GordonSmyth, if you have reason to believe that a given measurement should follow a Poisson distribution and you find evidence of over-dispersion, then you have evidence that you are missing important factors.

(2.5) Proper distribution. While the negative binomial regression comes from a valid statistical distribution, it's my understanding that the Quasi-Poisson does not. That means you can't really simulate count data if you believe $\mathrm{Var}[y] = \alpha E[y]$ for $\alpha \neq 1$. That might be annoying for some use cases. Likewise, you can't use probabilities to test for outliers, etc.
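To illustrate point (2), here is a rough sketch under made-up assumptions: counts that are exactly Poisson given a binary covariate look over-dispersed when that covariate is left out of the model. The covariate `group`, the two true means, and the helper names are all hypothetical. Both fits have closed forms here, because the Poisson MLE for an intercept-only model is the overall mean, and for an intercept plus a single group indicator the fitted values are the group means, so no GLM library is needed.

```python
import math
import random
from statistics import fmean


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def pearson_dispersion(counts, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    return chi_sq / (len(counts) - n_params)


rng = random.Random(7)
n = 600
group = [i % 2 for i in range(n)]  # hypothetical binary covariate
true_mean = [2.0 if g == 0 else 8.0 for g in group]  # assumed group means
y = [poisson_draw(rng, m) for m in true_mean]  # Poisson given the covariate

# Model A: intercept only, so the covariate is missing from the model.
fitted_a = [fmean(y)] * n

# Model B: intercept plus group indicator; fitted values are the group means.
group_mean = {
    g: fmean([count for count, gi in zip(y, group) if gi == g]) for g in (0, 1)
}
fitted_b = [group_mean[g] for g in group]

phi_without = pearson_dispersion(y, fitted_a, n_params=1)
phi_with = pearson_dispersion(y, fitted_b, n_params=2)
print(f"dispersion without covariate: {phi_without:.2f}")
print(f"dispersion with covariate:    {phi_with:.2f}")
```

In this setup the apparent over-dispersion comes entirely from the omitted covariate, which is the sense in which a significant test can point to missing factors rather than to a need for a negative-binomial or quasi-Poisson model.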

answered 2 hours ago by Cliff AB (edited 16 mins ago)

On 2.5: There's of course negative binomial and GLMM with random effects that don't have that limitation.
– Björn
24 mins ago

@Björn: that's why it's only half an argument; it only applies to quasi-likelihood methods. As far as I know, there are no likelihood-based methods for under-dispersion, even though this can be analyzed with a quasi-likelihood model.
– Cliff AB
19 mins ago

