When an MLE attains the Cramér-Rao bound for an exponential family
I'm trying to do the following exercise:

Let $$\delta_\theta(x)=S(x)\exp\bigl(\theta T(x)-f(\theta)\bigr)$$ be a family of densities for a random variable $X:\Omega\rightarrow\mathbb{R}$, where $\theta$ takes values in an open interval, and $T:\mathbb{R}\rightarrow\mathbb{R}$ is a non-constant function. Suppose the maximum likelihood estimator $\hat{\theta}(x)$ exists and has finite variance for all $\theta$. Prove that $\hat{\theta}(x)$ attains the Cramér-Rao lower bound for all $\theta$ if and only if $f'(\theta)$ has the form $\alpha\theta+\beta$ for constants $\alpha$ and $\beta$.
At present, all I know about the MLE $\hat\theta$ is that, for a given $x\in\mathbb{R}$, it is the solution to $$f'(\hat\theta) = T(x). \tag{1}$$
On its face, the question doesn't make much sense to me, since the book hasn't given me any information about the bias of $\hat{\theta}$. So in principle the Cramér-Rao lower bound $$Var(\hat{\theta})\geq \frac{\left[\frac{d}{d\theta}\mathbb{E}_\theta[\hat{\theta}(x)]\right]^2}{\mathbb{E}_\theta\left[\left(\frac{\partial}{\partial\theta}\log \delta_\theta(x)\right)^2\right]}$$ isn't really a lower bound, since the numerator depends on the estimator $\hat{\theta}$, and the numerator could in principle be zero.
Setting that aside, I think what the question wants me to do is write down the condition under which the Cauchy-Schwarz inequality (used in the proof of the Cramér-Rao lower bound) gives equality. We have
$$\frac{d}{d\theta}\mathbb{E}_\theta[\hat\theta(x)]=\frac{d}{d\theta}\int_\mathbb{R}\hat\theta(x)\,\delta_\theta(x)\,dx=\int_\mathbb{R}\hat\theta(x)\cdot\frac{\partial}{\partial\theta}\delta_\theta(x)\cdot\delta_\theta(x)^{-1}\cdot\delta_\theta(x)\,dx$$
$$=\int_\mathbb{R}\bigl(\hat\theta(x)-\mathbb{E}_\theta[\hat\theta]\bigr)\cdot\frac{\partial}{\partial\theta}\delta_\theta(x)\cdot\delta_\theta(x)^{-1}\cdot\delta_\theta(x)\,dx \qquad\text{(since } \int_\mathbb{R}\Bigl[\tfrac{\partial}{\partial\theta}\delta_\theta(x)\Bigr]\delta_\theta(x)^{-1}\cdot\delta_\theta(x)\,dx=0\text{)}$$
$$=\int_\mathbb{R}\bigl(\hat\theta(x)-\mathbb{E}_\theta[\hat\theta]\bigr)\cdot\bigl[T(x)-f'(\theta)\bigr]\cdot\delta_\theta(x)\,dx\leq\sqrt{\int_\mathbb{R}\bigl(\hat\theta(x)-\mathbb{E}_\theta[\hat\theta]\bigr)^2\,\delta_\theta(x)\,dx\cdot\int_\mathbb{R}\bigl[T(x)-f'(\theta)\bigr]^2\,\delta_\theta(x)\,dx}$$
$$=\sqrt{Var(\hat\theta)\cdot\int_\mathbb{R}\bigl[T(x)-f'(\theta)\bigr]^2\,\delta_\theta(x)\,dx}.$$
So for equality we would need $$\hat\theta(x)-\mathbb{E}_\theta[\hat\theta]= C(\theta)\cdot\bigl[T(x)-f'(\theta)\bigr],$$ where $C(\theta)$ is a constant depending only on $\theta$. At this point I am totally stuck, and I haven't even used equation $(1)$.
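(Added as a quick sanity check, not part of the original question: the equality condition above can be probed numerically. Equality in the Cauchy-Schwarz step holds exactly when $\hat\theta$ is, almost surely, an affine function of the score $T(x)-f'(\theta)$, i.e. when $|corr_\theta(\hat\theta,\,T(X)-f'(\theta))|=1$. The Monte Carlo sketch below, with arbitrarily chosen families and parameter values, shows this holding for the Gaussian mean family, where $f'(\theta)=\theta$ is affine, and failing for a gamma rate family, where it is not.)

```python
# Rough Monte Carlo check of the Cauchy-Schwarz equality condition above
# (my own sketch; family choices and parameter values are arbitrary).
# Equality holds iff theta_hat is (a.s.) an affine function of the score
# T(x) - f'(theta), i.e. iff |corr(theta_hat, score)| = 1.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Gaussian mean family: density S(x) exp(theta*x - theta^2/2), so T(x) = x,
# f(theta) = theta^2/2, f'(theta) = theta (affine).  MLE: theta_hat = x.
theta = 0.7
x = rng.normal(theta, 1.0, n)
theta_hat = x                      # solves f'(theta_hat) = T(x)
score = x - theta                  # T(x) - f'(theta)
print("Gaussian corr:", np.corrcoef(theta_hat, score)[0, 1])  # ~ 1.0: bound attained

# Gamma family with known shape k = 3 and natural parameter theta = -rate:
# T(x) = x, f(theta) = -k*log(-theta), f'(theta) = -k/theta (not affine).
k, rate = 3, 2.0
x = rng.gamma(k, 1.0 / rate, n)
theta_hat = -k / x                 # solves f'(theta_hat) = T(x)
score = x - k / rate               # T(x) - f'(theta) with theta = -rate
print("Gamma corr:", np.corrcoef(theta_hat, score)[0, 1])     # ~ 0.58: bound not attained
```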
Tags: probability, statistics
asked Dec 2 '18 at 13:33 by Tim kinsella, edited Dec 10 '18 at 22:25
1 Answer
To start with I'll assume that $\hat{\theta}$ is unbiased for $\theta$. I'll discuss the biased case at the end.
$\textbf{Claim}$ Let $l = \log(\delta_\theta(x))$. An unbiased estimator $\hat{\theta}$ attains the Cramér-Rao lower bound (under regularity conditions) if and only if
$$\frac{\partial l}{\partial \theta} = I(\theta)(\hat{\theta} - \theta).$$
I came across this statement and its proof in these lecture notes by Jonathan Marchini.
$\textbf{Proof}$ Let $U = \hat{\theta}$ and $V = \frac{\partial l}{\partial \theta}$. In the proof of Cramér-Rao we derive that
$$Cov(U, V)^2 \leq Var(U)\,Var(V).$$
We have equality here (and $\hat{\theta}$ attains the Cramér-Rao lower bound) if and only if $V = c_1 + c_2 U$. We know that $\mathbb{E}(V) = 0$, so $c_1 = -c_2 \mathbb{E}(U) = -c_2 \mathbb{E}(\hat{\theta}) = -c_2 \theta$. So $$V = \frac{\partial l}{\partial \theta} = c_2(\hat{\theta} - \theta).$$ Now we want to calculate $c_2$. If we multiply both sides by $\frac{\partial l}{\partial \theta}$ and then take expectations, we get
$$\mathbb{E}\left[\left(\frac{\partial l}{\partial \theta}\right)^2\right] = c_2\,\mathbb{E}\left[\hat{\theta}\,\frac{\partial l}{\partial \theta}\right] - c_2\theta\,\mathbb{E}\left[\frac{\partial l}{\partial \theta}\right].$$
The LHS equals $I(\theta)$, and the RHS equals
$$c_2 \int \hat{\theta}\, \frac{\partial}{\partial \theta}\bigl(\delta_\theta(x)\bigr)\, dx - c_2\theta \cdot 0 = c_2 \frac{\partial}{\partial \theta}\int\hat{\theta}\,\delta_\theta(x)\, dx = c_2 \frac{\partial}{\partial \theta}\mathbb{E}(\hat{\theta}) = c_2 \frac{\partial}{\partial \theta}\theta = c_2,$$
so the constant is $c_2 = I(\theta)$. Therefore
$$\frac{\partial l}{\partial \theta} = I(\theta)(\hat{\theta} - \theta).\tag*{$\blacksquare$}$$
Since
$$l = \log(S(x)) + \theta T(x) - f(\theta),$$
we have
$$\frac{\partial l}{\partial \theta} = T(x) - f'(\theta).$$
At $\hat{\theta}$, $\frac{\partial l}{\partial \theta} = 0$, so
$$T(x) = f'(\hat{\theta}).$$
We now calculate $\frac{\partial^2 l}{\partial \theta^2}$ in order to find the Fisher information:
$$\frac{\partial^2 l}{\partial \theta^2} = -f''(\theta),$$
so
$$I(\theta) = \mathbb{E}\left[-\frac{\partial^2 l}{\partial \theta^2}\right] = \mathbb{E}\left[f''(\theta)\right] = f''(\theta).$$
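(Side note, not part of the original answer: the identity $I(\theta) = f''(\theta)$ is easy to sanity-check numerically, since $I(\theta)$ also equals $Var_\theta(T(X))$. A minimal sketch with an arbitrarily chosen family and parameter value:)

```python
# Hedged sketch (my own check): for the Poisson family written in natural form,
# p_theta(x) = (1/x!) exp(theta*x - e^theta), we have T(x) = x and f(theta) = e^theta,
# so I(theta) = f''(theta) says Var_theta(T(X)) should equal e^theta.
import numpy as np

theta = 0.3
lam = np.exp(theta)                           # mean of X, equal to f'(theta)
x = np.random.default_rng(1).poisson(lam, 10**6)
print(x.var(), np.exp(theta))                 # both ~ 1.35
```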
Putting everything together, the unbiased MLE $\hat{\theta}$ attains the Cramér-Rao lower bound if and only if
$$f'(\hat{\theta}) - f'(\theta) = T(x) - f'(\theta) = \frac{\partial l}{\partial \theta} = f''(\theta)(\hat{\theta} - \theta).$$
Now we want to solve this differential equation:
$$f'(y) - f'(x) = f''(x)(y - x).$$
Let $g = f'$. Then
$$g(y) - g(x) = g'(x)(y - x).$$
Here $g' = f''$ is either always strictly positive or always strictly negative, so $g$ is strictly monotone and in particular injective. Taking $y > x$ (the case $y < x$ is symmetric) we can then solve this:
$$\int\frac{1}{y - x}\, dx = \int \frac{g'(x)}{g(y) - g(x)}\, dx,$$
so
$$-\int \frac{d}{dx}\log(y - x)\, dx = -\int \frac{d}{dx} \log\bigl(g(y) - g(x)\bigr)\, dx,$$
so
$$\log(y - x) + C = \log\bigl(g(y) - g(x)\bigr),$$
so
$$\alpha(y - x) = g(y) - g(x),$$
and therefore
$$g(x) = \alpha x + \beta = f'(x).$$
(Alternatively, differentiating $g(y) - g(x) = g'(x)(y - x)$ with respect to $x$ for a fixed $y \neq x$ gives $g''(x)(y - x) = 0$, so $g'' \equiv 0$ and $g$ is affine.)
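(Worked example, added here for concreteness and not part of the original answer: the converse direction can be seen directly. If $f'(\theta) = \alpha\theta + \beta$ with $\alpha \neq 0$, then $(1)$ gives $\hat\theta = (T(x) - \beta)/\alpha$, so
$$\frac{\partial l}{\partial \theta} = T(x) - f'(\theta) = \alpha\bigl(\hat\theta - \theta\bigr),$$
which is exactly the attainment condition of the Claim with $I(\theta) = f''(\theta) = \alpha$; moreover $\mathbb{E}_\theta[\hat\theta] = (f'(\theta)-\beta)/\alpha = \theta$, so $\hat\theta$ is automatically unbiased. For instance the $N(\theta,1)$ family has $f(\theta) = \theta^2/2$, $\hat\theta = T(x) = x$, and $Var(\hat\theta) = 1 = 1/I(\theta)$.)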
If the estimator is biased, we would instead replace the equation
$$\frac{\partial l}{\partial \theta} = f''(\theta)(\hat{\theta} - \theta)$$
with
$$\frac{\partial l}{\partial \theta} = f''(\theta)\,\frac{\hat{\theta} - \theta - b(\theta)}{1 + b'(\theta)},$$
where $b(\theta)$ is the bias of $\hat{\theta}$, and I'm assuming that the resulting differential equation has solutions different from $f'(x) = \alpha x + \beta$.
answered Dec 9 '18 at 17:21 by Alex
Hey Alex, thanks for your help. Do you know what the author means at the bottom of the fifth slide when he says "not all MLEs attain the CRLB because not all MLEs are unbiased"? Do you think it's possible that when someone talks about an estimator achieving the CRLB, they automatically assume the estimator is unbiased? – Tim kinsella, Dec 10 '18 at 1:49
Hey Tim, if the MLE is biased then it can't reach the $1/I(\theta)$ lower bound. A biased estimator can reach the $(1 + b'(\theta))^2/I(\theta)$ lower bound as long as $\frac{\partial l}{\partial \theta} = c_1 + c_2 \hat{\theta}$. For Marchini's statement to make sense, I think that he must be referring to the unbiased CRLB. I would guess that if someone wants to know if an estimator achieves the CRLB, they are talking about the unbiased version. – Alex, Dec 10 '18 at 7:45
Yep, your proof nails the unbiased case (+1). The biased case is still a mystery to me. I've consulted with the author, actually, and he says unbiasedness is not required, so I'm stumped on that. Btw, re: your last comment: can't $b'(\theta) < 0$? So if we call $1/I(\theta)$ the CRLB, can't a biased estimator actually go below it? It's all very confusing. – Tim kinsella, Dec 10 '18 at 22:30
The variance of $\hat{\theta}$ can certainly be below $1/I(\theta)$ if $b'(\theta) < 0$ (extreme example: if $\hat{\theta} = 0$ then $b(\theta) = -\theta$ and $Var(\hat{\theta}) = 0$). – Alex, Dec 11 '18 at 22:32
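(Editorial addition, not from the thread: the biased bound $(1+b'(\theta))^2/I(\theta)$ mentioned above can also be checked numerically. In the $N(\theta,1)$ family, the shrunk estimator $\hat\theta = cX$ is affine in the score, has bias $b(\theta) = (c-1)\theta$, and so should attain the biased bound $c^2$ while sitting below the unbiased bound $1/I(\theta) = 1$ when $c < 1$. A minimal sketch with arbitrary values of $\theta$ and $c$:)

```python
# Hedged sketch of the biased-case bound (my own example): in the N(theta, 1) family,
# theta_hat = c*X has bias b(theta) = (c - 1)*theta, so b'(theta) = c - 1 and the
# biased Cramer-Rao bound (1 + b'(theta))^2 / I(theta) equals c^2, which
# Var(c*X) = c^2 attains exactly; for c < 1 this is below the unbiased bound 1.
import numpy as np

theta, c = 0.4, 0.8
x = np.random.default_rng(2).normal(theta, 1.0, 10**6)
theta_hat = c * x
print(theta_hat.var(), c**2)                  # both ~ 0.64
print("unbiased bound 1/I(theta) =", 1.0)     # for comparison
```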