Significance contradiction in linear regression: significant t-test for a coefficient vs non-significant...
I'm fitting a multiple linear regression model between 4 categorical variables (with 4 levels each) and a numerical output. My dataset has 43 observations.
Regression gives me the following $p$-values from the $t$-test for each slope coefficient: $.15, .67, .27, .02$. Thus, the coefficient for the 4th predictor is significant at the $\alpha = .05$ significance level.
On the other hand, the regression gives me a $p$-value from an overall $F$-test of the null hypothesis that all my slope coefficients are equal to zero. For my dataset, this $p$-value is $.11$.
My question: how should I interpret these results? Which $p$-value should I use and why? Is the coefficient for the 4th variable significantly different from $0$ at the $\alpha = .05$ significance level?
I've seen a related question, $F$ and $t$ statistics in a regression, but it describes the opposite situation: high $t$-test $p$-values and a low $F$-test $p$-value. Honestly, I don't quite understand why we would need an $F$-test in addition to a $t$-test to see if linear regression coefficients are significantly different from zero.
regression hypothesis-testing multiple-comparisons multiple-regression t-test
edited Oct 18 at 17:54
ttnphns
asked Mar 15 '12 at 19:56
Leo
If you have 4 categorical variables with 4 levels each, you should have 3*4=12 coefficients for your independent variables (plus the intercept)...
– boscovich
Mar 15 '12 at 20:01
@andrea: I've decided to treat them as numerical variables.
– Leo
Mar 15 '12 at 20:12
0.02 is barely significant (especially if you consider the fact that you have five tests in total) and 0.11 is not very high. A generous interpretation would be that with a little more power the overall F-test would also be significant (and perhaps the first coefficient as well). A more conservative interpretation is that you shouldn't have much confidence in any of these results (including the coefficient with a .02 p-value). Either way, you shouldn't read too much into the difference between .02 and .11.
– Gala
Mar 15 '12 at 20:34
For a discussion of the opposite case, you can also see here: how can a regression be significant yet all predictors be non-significant, in addition to the question linked above.
– gung♦
Sep 13 '12 at 15:17
3 Answers
I would like to suggest that this phenomenon (of a non-significant overall test despite a significant individual variable) can be understood as a kind of aggregate "masking effect" and that although it conceivably could arise from multicollinear explanatory variables, it need not do that at all. It also turns out not to be due to multiple comparison adjustments, either. Thus this answer is adding some qualifications to the answers that have already appeared, which on the contrary suggest that either multicollinearity or multiple comparisons should be looked at as the culprits.
To establish the plausibility of these assertions, let's generate a collection of perfectly orthogonal variables (just as non-collinear as possible) and a dependent variable that explicitly is determined solely by the first of the explanands (plus a good amount of random error independent of everything else). In R this can be done (reproducibly, if you wish to experiment) as
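set.seed(17)
p <- 5 # Number of explanatory variables
# All 2^p combinations of -1/+1: a full factorial design with mutually orthogonal columns
x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1,1))))
# The response depends only on the first column, plus normal noise with sd 2
y <- x[,1] + rnorm(2^p, mean=0, sd=2)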
It's unimportant that the explanatory variables are binary; what matters is their orthogonality, which we can check, to make sure the code is working as expected, by inspecting their correlations. Indeed, the correlation matrix is interesting: the small coefficients suggest y has little to do with any of the variables except the first (which is by design), and the off-diagonal zeros confirm the orthogonality of the explanatory variables:
> cor(cbind(x,y))
Var1 Var2 Var3 Var4 Var5 y
Var1 1.00 0.000 0.000 0.000 0.00 0.486
Var2 0.00 1.000 0.000 0.000 0.00 0.088
Var3 0.00 0.000 1.000 0.000 0.00 0.044
Var4 0.00 0.000 0.000 1.000 0.00 -0.014
Var5 0.00 0.000 0.000 0.000 1.00 -0.167
y 0.49 0.088 0.044 -0.014 -0.17 1.000
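The printed correlations are rounded, so here is a minimal sketch of an exact check, rebuilding x the same way as in the code above. With entries of ±1, the columns are mutually orthogonal exactly when the cross-product matrix is $2^p$ times the identity, and a zero column sum means each column is also orthogonal to the intercept.

```r
p <- 5
x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1, 1))))

# Off-diagonal zeros: the columns are mutually orthogonal.
# Diagonal: the number of rows, 2^p = 32.
crossprod(x)

# Zero column sums: each column is also orthogonal to the intercept column
colSums(x)
```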
Let's run a series of regressions, using only the first variable, then the first two, and so on. For brevity and easy comparison, in each one I show only the line for the first variable and the overall F-test:
> temp <- sapply(1:p, function(i) print(summary(lm(y ~ x[, 1:i]))))
# Estimate Std. Error t value Pr(>|t|)
1 x[, 1:i] 0.898 0.294 3.05 0.0048 **
F-statistic: 9.29 on 1 and 30 DF, p-value: 0.00478
2 x[, 1:i]Var1 0.898 0.298 3.01 0.0053 **
F-statistic: 4.68 on 2 and 29 DF, p-value: 0.0173
3 x[, 1:i]Var1 0.8975 0.3029 2.96 0.0062 **
F-statistic: 3.05 on 3 and 28 DF, p-value: 0.0451
4 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0072 **
F-statistic: 2.21 on 4 and 27 DF, p-value: 0.095
5 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0073 **
F-statistic: 1.96 on 5 and 26 DF, p-value: 0.118
Look at how (a) the significance of the first variable barely changes, (a') the first variable remains significant (p < .05) even when adjusting for multiple comparisons (e.g., applying Bonferroni by multiplying the nominal p-value by the number of explanatory variables: 0.0073 × 5 ≈ 0.037), (b) the coefficient of the first variable barely changes, but (c) the overall p-value grows rapidly (0.0048, 0.017, 0.045, 0.095, 0.118), quickly inflating to a non-significant level.
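To make the same comparison without reading five full summaries, here is a minimal sketch that collects, for each model, the first variable's estimate and t-test p-value, its Bonferroni-adjusted p-value (nominal p times the number of explanatory variables in that model), and the overall F-test p-value. It rebuilds x and y as above; `drop = FALSE` keeps the first slope in row 2 of the coefficient table even when there is only one predictor. Because the ±1 columns are orthogonal and sum to zero, the first estimate should be identical in every row.

```r
set.seed(17)
p <- 5
x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1, 1))))
y <- x[, 1] + rnorm(2^p, mean = 0, sd = 2)

# p-value of the overall F-test, from the F statistic stored by summary()
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
  unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
}

res <- t(sapply(1:p, function(i) {
  fit <- lm(y ~ x[, 1:i, drop = FALSE])
  ct <- coef(summary(fit))       # row 1: intercept, row 2: Var1
  c(k           = i,
    est_var1    = ct[2, "Estimate"],
    p_var1      = ct[2, "Pr(>|t|)"],
    p_var1_bonf = min(1, i * ct[2, "Pr(>|t|)"]),
    p_overall   = overall_p(fit))
}))
round(res, 4)
```

With a single predictor (first row) the overall F-test and the t-test are the same test ($F = t^2$), so the two p-values coincide; they separate only as variables are added.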
I interpret this as demonstrating that including explanatory variables that are largely independent of the dependent variable can "mask" the overall p-value of the regression. When the new variables are orthogonal to existing ones and to the dependent variable, they leave the first coefficient unchanged and change its p-value only slightly (through the small loss of residual degrees of freedom). (The other small changes seen here are because the random error added to y is, by accident, slightly correlated with all the other variables.) One lesson to draw from this is that parsimony is valuable: using as few variables as needed can strengthen the significance of the results.
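One way to see the masking arithmetically, using the standard identity between the overall $F$ statistic and $R^2$ for a model with $k$ explanatory variables and $n$ observations:

$$F = \frac{R^2/k}{(1-R^2)/(n-k-1)}, \qquad R^2 = \frac{kF}{kF + (n-k-1)}.$$

Plugging in the printed results: for $k=1$, $R^2 = 9.29/(9.29+30) \approx 0.236$; for $k=5$, $R^2 = (5)(1.96)/\big((5)(1.96)+26\big) \approx 0.274$. So $R^2$ rises slightly, but spreading it over five numerator degrees of freedom instead of one drives $F$ from 9.29 down to 1.96.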
I am not saying that this is necessarily happening for the dataset in the question, about which little has been disclosed. But knowledge that this masking effect can happen should inform our interpretation of the results as well as our strategies for variable selection and model building.
+1, I agree w/ this analysis. FWIW, this is the explanation I was hinting at (perhaps not well) in my discussion about power in my answer to the other question. I do have one question about your version here: why do you use 32 as the mean of your error term? Is that a typo, or is it important in some way?
– gung♦
Aug 21 '12 at 20:59
@gung Where do you see 32? If you're referring to rnorm(2^p, sd=2), please note that the first argument is the number of terms, not the mean. The mean by default is zero and therefore has not been explicitly specified.
– whuber♦
Aug 21 '12 at 21:23
Oh, sorry. I guess I was confusing rnorm() w/ $\mathcal N(\mu, \sigma)$.
– gung♦
Aug 21 '12 at 21:52
@gung I am grateful for the opportunity to clarify the code and therefore have edited the offending line.
– whuber♦
Aug 21 '12 at 21:58
You frequently have this happen when you have a high degree of collinearity among your explanatory variables. The ANOVA F is a joint test that all the regressors are jointly uninformative. When your Xs contain similar information, the model cannot attribute the explanatory power to one regressor or another, but their combination can explain much of the variation in the response variable.
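That joint test can be seen directly as a comparison of two nested models. A minimal sketch with simulated data (n = 43 and 4 predictors mirror the question; the coefficient 0.4 and the noise level are made up for illustration) shows that the F-statistic line of `summary()` is the same test as `anova()` comparing the intercept-only model with the full model:

```r
set.seed(2)
n <- 43
X <- matrix(rnorm(n * 4), nrow = n, ncol = 4)
y <- 0.4 * X[, 4] + rnorm(n)   # hypothetical: only the 4th predictor has an effect

fit1 <- lm(y ~ X)              # intercept + 4 slopes
fit0 <- lm(y ~ 1)              # intercept only: all 4 slopes set to zero

# Overall F-test as reported at the bottom of summary(fit1)
f <- summary(fit1)$fstatistic
p_summary <- unname(pf(f[1], f[2], f[3], lower.tail = FALSE))

# The same hypothesis as an explicit nested-model comparison
cmp <- anova(fit0, fit1)
cmp
```

Each slope's t-test in `summary(fit1)` is the analogous comparison with only that one slope dropped ($t^2 = F$ on 1 numerator df), which is why the joint and individual tests can disagree.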
Also, the fact that you seem to be treating your categorical variables as if they were continuous may be problematic. You are explicitly imposing restrictions such as: bumping $x_{1}$ from 1 to 2 has the same effect on $y$ as bumping it from 3 to 4. Sometimes that's OK, but often it's not.
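The difference between the two codings can be made concrete. A minimal sketch with hypothetical data (four predictors taking the values 1 to 4, n = 43 as in the question, and a pure-noise response): numeric coding spends one slope per predictor, factor coding spends three dummy coefficients per predictor (12 in total), and since the numeric model is nested in the factor model, `anova()` gives an F-test of the equal-steps restriction.

```r
set.seed(3)
n <- 43
# Each predictor takes the values 1..4; rep() guarantees that every level occurs
d <- as.data.frame(lapply(setNames(1:4, paste0("x", 1:4)),
                          function(i) sample(rep(1:4, length.out = n))))
d$y <- rnorm(n)   # hypothetical response, unrelated to the predictors

# Numeric coding: one slope per predictor (the same effect for every one-unit step)
fit_num <- lm(y ~ x1 + x2 + x3 + x4, data = d)
# Factor coding: 3 dummy coefficients per predictor
fit_fac <- lm(y ~ factor(x1) + factor(x2) + factor(x3) + factor(x4), data = d)

length(coef(fit_num))   # 5  = intercept + 4 slopes
length(coef(fit_fac))   # 13 = intercept + 4 * 3 dummies

# F-test of the restriction imposed by numeric coding
cmp <- anova(fit_num, fit_fac)
cmp
```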
If collinearity is a problem, then you will have high standard errors and perhaps implausibly large coefficients, maybe even with the wrong signs. To check whether this is what is happening, calculate the variance inflation factors (VIFs) after your regression. A reasonable rule of thumb is that collinearity is a problem if the largest VIF is greater than 10. If so, you really have two options here. One is to re-specify the model to reduce the near-linear dependence by dropping some of your variables. The second is to get a larger and/or better (less homogeneous) sample.
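The VIF check can be done without add-on packages. A minimal sketch (the helper name `vif_manual` and the simulated predictors are hypothetical; the car package's `vif()` is a common packaged alternative): $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors, and equivalently $\mathrm{VIF}_j$ is the $j$-th diagonal element of the inverse of the predictors' correlation matrix.

```r
# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the other columns
vif_manual <- function(X) {
  X <- as.matrix(X)
  v <- sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  })
  setNames(v, colnames(X))
}

set.seed(4)
n <- 43
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # hypothetical: nearly a copy of x1
x3 <- rnorm(n)                  # unrelated to the others
X <- cbind(x1, x2, x3)

vif_manual(X)            # x1 and x2 far above 10, x3 close to 1
diag(solve(cor(X)))      # the same values via the inverse correlation matrix
```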
– Dimitriy V. Masterov
Mar 15 '12 at 20:36
(+1) This explanation is a good one, but it is unnecessary to attribute the phenomenon to multicollinearity: the key distinction is between jointly informative and individually informative. Including additional uncorrelated regressors (which avoids any multicollinearity) lowers the former while leaving the latter unchanged.
– whuber♦
Aug 22 '12 at 16:44
I'm not sure that multicollinearity is what's going on here. It certainly could be, but from the information given I can't conclude that, and I don't want to start there. My first guess is that this might be a multiple comparisons issue. That is, if you run enough tests, something will show up, even if there's nothing there.
One of the issues that I harp on is that the problem of multiple comparisons is always discussed in terms of examining many pairwise comparisons, e.g., running t-tests on every unique pairing of levels. (For a humorous treatment of multiple comparisons, look here.) This leaves people with the impression that that is the only place this problem shows up. But this is simply not true: the problem of multiple comparisons shows up everywhere. For instance, if you run a regression with 4 explanatory variables, the same issues exist. In a well-designed experiment, IVs can be orthogonal, but people routinely worry about using Bonferroni corrections on sets of a priori, orthogonal contrasts, and don't think twice about factorial ANOVAs. To my mind this is inconsistent.
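A small simulation can make the point concrete. This is a minimal sketch under assumed conditions (4 independent standard-normal predictors, n = 43 to match the question, and a response unrelated to all of them): it estimates how often at least one of the four slope t-tests comes out "significant" at .05, compared with how often the single overall F-test does.

```r
set.seed(1)
n <- 43      # sample size, matching the question
k <- 4       # number of predictors, matching the question
reps <- 2000 # number of simulated data sets

hits <- replicate(reps, {
  X <- matrix(rnorm(n * k), nrow = n, ncol = k)
  y <- rnorm(n)                  # the null is true: y ignores every column of X
  s <- summary(lm(y ~ X))
  f <- s$fstatistic              # value, numdf, dendf
  c(any_t    = any(coef(s)[-1, "Pr(>|t|)"] < 0.05),
    global_F = unname(pf(f[1], f[2], f[3], lower.tail = FALSE)) < 0.05)
})

# Estimated false-positive rates: "any_t" should be in the neighbourhood of
# 1 - 0.95^4 (about 0.19), "global_F" near the nominal 0.05
r <- rowMeans(hits)
r
```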
The global F test is what's called a 'simultaneous' test. This checks to see if all of your predictors are unrelated to the response variable. The simultaneous test provides some protection against the problem of multiple comparisons without having to go the power-losing Bonferroni route. Unfortunately, my interpretation of what you report is that you have a null finding.
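For comparison, the power-losing Bonferroni route applied to the four t-test p-values reported in the question can be sketched with base R's `p.adjust()`. Holm's step-down method, which is not discussed above, is included as an alternative that also controls the family-wise error rate while never being less powerful than Bonferroni.

```r
p_t <- c(.15, .67, .27, .02)   # t-test p-values from the question

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p.adjust(p_t, method = "bonferroni")   # 0.60 1.00 1.00 0.08

# Holm: sort, multiply by m, m-1, ..., 1, enforce monotonicity, cap at 1
p.adjust(p_t, method = "holm")         # 0.45 0.67 0.54 0.08
```

Under either adjustment the 4th coefficient's p-value becomes 0.08, no longer below .05, which is in the same direction as the non-significant overall F-test (although the F-test and these corrections are different procedures).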
Several things militate against this interpretation. First, with only 43 observations, you almost certainly don't have much power. It's quite possible that there is a real effect, but you just can't resolve it without more data. Second, like both @andrea and @Dimitriy, I worry about the appropriateness of treating 4-level categorical variables as numeric. This may well not be appropriate, and could have any number of effects, including diminishing your ability to detect what is really there. Lastly, I'm not sure that significance testing is quite as important as people believe. A $p$ of $.11$ is kind of low; is there really something going on there? Maybe! Who knows? There's no 'bright line' at .05 that demarcates real effects from mere appearance.
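To put a rough number on "not much power", here is a minimal sketch using the noncentral F distribution. The effect size $f^2 = 0.15$ (Cohen's conventional "medium" value) is an assumption, not something known about the question's data, and the noncentrality $\lambda = f^2 n$ follows Cohen's convention for the overall test of $k$ slopes.

```r
# Power of the overall F-test of k slopes with n observations, for an assumed
# effect size f2 = R^2 / (1 - R^2), using noncentrality lambda = f2 * n
power_overall_F <- function(n, k, f2, alpha = 0.05) {
  df1 <- k
  df2 <- n - k - 1
  crit <- qf(1 - alpha, df1, df2)          # rejection threshold under H0
  pf(crit, df1, df2, ncp = f2 * n, lower.tail = FALSE)
}

# The question's setting: 43 observations, 4 numeric predictors
power_overall_F(n = 43, k = 4, f2 = 0.15)

# How power grows with sample size under the same assumed effect
sapply(c(43, 80, 120), function(n) power_overall_F(n, k = 4, f2 = 0.15))
```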
edited Oct 18 at 16:48
answered Mar 16 '12 at 4:12
gung♦
add a comment |
add a comment |
up vote
23
down vote
I would like to suggest that this phenomenon (of a non-significant overall test despite a significant individual variable) can be understood as a kind of aggregate "masking effect" and that although it conceivably could arise from multicollinear explanatory variables, it need not do that at all. It also turns out not to be due to multiple comparison adjustments, either. Thus this answer is adding some qualifications to the answers that have already appeared, which on the contrary suggest that either multicollinearity or multiple comparisons should be looked at as the culprits.
To establish the plausibility of these assertions, let's generate a collection of perfectly orthogonal variables--just as non-collinear as possible--and a dependent variable that explicitly is determined solely by the first of the explanands (plus a good amount of random error independent of everything else). In R
this can be done (reproducibly, if you wish to experiment) as
set.seed(17)
p <- 5 # Number of explanatory variables
x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1,1))))
y <- x[,1] + rnorm(2^p, mean=0, sd=2)
It's unimportant that the explanatory variables are binary; what matters is their orthogonality, which we can check to make sure the code is working as expected, which can be done by inspecting their correlations. Indeed, the correlation matrix is interesting: the small coefficients suggest y
has little to do with any of the variables except the first (which is by design) and the off-diagonal zeros confirm the orthogonality of the explanatory variables:
> cor(cbind(x,y))
Var1 Var2 Var3 Var4 Var5 y
Var1 1.00 0.000 0.000 0.000 0.00 0.486
Var2 0.00 1.000 0.000 0.000 0.00 0.088
Var3 0.00 0.000 1.000 0.000 0.00 0.044
Var4 0.00 0.000 0.000 1.000 0.00 -0.014
Var5 0.00 0.000 0.000 0.000 1.00 -0.167
y 0.49 0.088 0.044 -0.014 -0.17 1.000
Let's run a series of regressions, using only the first variable, then the first two, and so on. For brevity and easy comparison, in each one I show only the line for the first variable and the overall F-test:
>temp <- sapply(1:p, function(i) print(summary(lm(y ~ x[, 1:i]))))
# Estimate Std. Error t value Pr(>|t|)
1 x[, 1:i] 0.898 0.294 3.05 0.0048 **
F-statistic: 9.29 on 1 and 30 DF, p-value: 0.00478
2 x[, 1:i]Var1 0.898 0.298 3.01 0.0053 **
F-statistic: 4.68 on 2 and 29 DF, p-value: 0.0173
3 x[, 1:i]Var1 0.8975 0.3029 2.96 0.0062 **
F-statistic: 3.05 on 3 and 28 DF, p-value: 0.0451
4 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0072 **
F-statistic: 2.21 on 4 and 27 DF, p-value: 0.095
5 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0073 **
F-statistic: 1.96 on 5 and 26 DF, p-value: 0.118
Look at how (a) the significance of the first variable barely changes, (a') the first variable remains significant (p < .05) even when adjusting for multiple comparisons (e.g., apply Bonferroni by multiplying the nominal p-value by the number of explanatory variables), (b) the coefficient of the first variable barely changes, but (c) the overall significance grows exponentially, quickly inflating to a non-significant level.
I interpret this as demonstrating that including explanatory variables that are largely independent of the dependent variable can "mask" the overall p-value of the regression. When the new variables are orthogonal to existing ones and to the dependent variable, they will not change the individual p-values. (The small changes seen here are because the random error added to y
is, by accident, slightly correlated with all the other variables.) One lesson to draw from this is that parsimony is valuable: using as few variables as needed can strengthen the significance of the results.
I am not saying that this is necessarily happening for the dataset in the question, about which little has been disclosed. But knowledge that this masking effect can happen should inform our interpretation of the results as well as our strategies for variable selection and model building.
edited Aug 21 '12 at 21:57
answered Aug 21 '12 at 20:48
whuber♦
+1, I agree w/ this analysis. FWIW, this is the explanation I was hinting at (perhaps not well) in my discussion about power in my answer to the other question. I do have 1 question about your version here, why do you use 32 as the mean of your error term? Is that a typo, or is it important in some way?
– gung♦
Aug 21 '12 at 20:59
@gung Where do you see 32? If you're referring to rnorm(2^p, sd=2), please note that the first argument is the number of terms, not the mean. The mean by default is zero and therefore has not been explicitly specified.
– whuber♦
Aug 21 '12 at 21:23
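A quick base-R check of the argument order of rnorm, which is the source of the confusion here; p is set to 5 as in the answer's simulation.

```r
# rnorm(n, mean = 0, sd = 1): the first argument is the number of draws, not the mean
names(formals(rnorm))   # "n" "mean" "sd"

p <- 5
e <- rnorm(2^p, sd = 2) # 32 draws from a normal with mean 0 (the default) and sd 2
length(e)               # 32
```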
Oh, sorry. I guess I was confusing rnorm() w/ $\mathcal N(\mu, \sigma)$.
– gung♦
Aug 21 '12 at 21:52
@gung I am grateful for the opportunity to clarify the code and therefore have edited the offending line.
– whuber♦
Aug 21 '12 at 21:58
You frequently have this happen when you have a high degree of collinearity among your explanatory variables. The ANOVA F is a joint test that all the regressors are jointly uninformative. When your Xs contain similar information, the model cannot attribute the explanatory power to one regressor or another, but their combination can explain much of the variation in the response variable.
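A minimal base-R sketch of how one could check for this kind of collinearity with variance inflation factors, $\text{VIF}_j = 1/(1-R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all the others. The data and the helper vif_by_hand are hypothetical; the car package's vif() is a common packaged alternative.

```r
# Sketch: variance inflation factors computed by hand in base R.
# Hypothetical data: x2 is constructed as a near-copy of x1; x3 is independent of both.
set.seed(1)
n  <- 43
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others
vif_by_hand <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(X)
print(round(vifs, 1))
# Common rule of thumb: a VIF above 10 signals troublesome collinearity
names(vifs)[vifs > 10]
```

With this construction x1 and x2 should both be flagged, while x3 should stay close to 1.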
Also, the fact that you seem to be treating your categorical variables as if they were continuous may be problematic. You are explicitly imposing restrictions such as: bumping $x_{1}$ from 1 to 2 has the same effect on $y$ as bumping it from 3 to 4. Sometimes that's OK, but often it's not.
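To make that restriction concrete, a sketch with invented data comparing one 4-level variable entered as a number against the same variable entered with factor(); the level effects (0, 2, 2.5, 0) are chosen so that the equal-steps assumption is false.

```r
# Sketch: one 4-level predictor entered as a number (1 slope) versus as a factor (3 contrasts).
# Hypothetical data: the level effects are deliberately not linear in the level number.
set.seed(2)
n     <- 43
level <- rep(1:4, length.out = n)            # every level appears
y     <- c(0, 2, 2.5, 0)[level] + rnorm(n)

fit_numeric <- lm(y ~ level)          # forces equal effects for 1->2, 2->3 and 3->4
fit_factor  <- lm(y ~ factor(level))  # a separate mean for each level

length(coef(fit_numeric)) - 1  # 1 slope
length(coef(fit_factor)) - 1   # 3 contrasts
# The numeric coding is nested in the factor coding, so a partial F-test compares them
anova(fit_numeric, fit_factor)
```

The nesting holds because the numeric score is itself a linear combination of the intercept and the three level dummies, so the factor fit can never have a larger residual sum of squares.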
answered Mar 15 '12 at 20:24
Dimitriy V. Masterov
If collinearity is a problem, then you will have high standard errors and perhaps implausibly large coefficients, maybe even with the wrong signs. To make sure that this is what is happening, calculate the variance inflation factors (VIFs) after your regression. A reasonable rule of thumb is that collinearity is a problem if the largest VIF is greater than 10. If so, you really have two options here. One is to re-specify the model to reduce the near-linear dependence by dropping some of your variables. The second is to get a larger and/or better (less homogeneous) sample.
– Dimitriy V. Masterov
Mar 15 '12 at 20:36
(+1) This explanation is a good one, but it is unnecessary to attribute the phenomenon to multicollinearity: the key distinction is between jointly informative and individually informative. Including additional uncorrelated regressors (which avoids any multicollinearity) lowers the former while leaving the latter unchanged.
– whuber♦
Aug 22 '12 at 16:44
If you have 4 categorical variables with 4 levels each, you should have 3*4=12 coefficients for your independent variables (plus the intercept)...
– boscovich
Mar 15 '12 at 20:01
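To make the count in this comment concrete, a sketch using base R's model.matrix(); the data frame dat and its columns x1 to x4 are hypothetical stand-ins for the question's four 4-level variables.

```r
# Sketch: count the columns R builds for four 4-level categorical predictors.
# Hypothetical data mimicking the question's layout (n = 43).
set.seed(3)
n   <- 43
dat <- as.data.frame(setNames(
  lapply(1:4, function(j) factor(sample(1:4, n, replace = TRUE), levels = 1:4)),
  paste0("x", 1:4)))

X <- model.matrix(~ x1 + x2 + x3 + x4, data = dat)
ncol(X)   # 13: the intercept plus 3 treatment contrasts for each of the 4 factors
```

Entering the same variables as plain numbers uses only 4 slopes, leaving 43 - 5 = 38 residual degrees of freedom instead of 43 - 13 = 30.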
@andrea: I've decided to treat them as numerical variables.
– Leo
Mar 15 '12 at 20:12
0.02 is barely significant (especially if you consider the fact that you have five tests in total) and 0.11 is not very high. A generous interpretation would be that with a little more power the overall F-test would also be significant (and perhaps the first coefficient as well). A more conservative interpretation is that you shouldn't have much confidence in any of these results (including the coefficient with a .02 p value). Either way, you shouldn't read too much into the difference between .02 and .11.
– Gala
Mar 15 '12 at 20:34
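As a rough illustration of the multiplicity point, a sketch applying base R's p.adjust() to the four slope p-values reported in the question; counting the F-test as a fifth test would only make the adjustment stricter (e.g., 5 × .02 = .10 under Bonferroni).

```r
# The four slope p-values reported in the question
p_slopes <- c(.15, .67, .27, .02)

p.adjust(p_slopes, method = "bonferroni")  # 0.60 1.00 1.00 0.08
p.adjust(p_slopes, method = "holm")        # 0.45 0.67 0.54 0.08
```

Under either adjustment the .02 coefficient no longer clears .05.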
For a discussion of the opposite case, you can also see here: how can a regression be significant yet all predictors be non-significant, in addition to the question linked above.
– gung♦
Sep 13 '12 at 15:17