Matrix Differentiation of Kronecker Product
$\begingroup$
I have a question about differentiating an expression that contains multiple Kronecker products.
I would like to differentiate the following objective function with respect to $\mathbf{Q}$:
\begin{equation*}
\lVert\mathbf{y}-\mathbf{A}(\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q})\mathbf{x}\rVert^2_2
\end{equation*}
where $\mathbf{y}\in\mathbb{R}^m$, $\mathbf{A}\in\mathbb{R}^{m\times K^4}$, $\mathbf{Q}\in\mathbb{R}^{K\times K}$ and $\mathbf{x}\in\mathbb{R}^{K^4}$. I am confused about how the chain rule works for matrix differentiation. This is how I proceeded:
Let $f=\lVert\mathbf{y}-\mathbf{A}(\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q})\mathbf{x}\rVert^2_2$
and $\mathbf{B}=\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q}\otimes\mathbf{Q}$. Then $\frac{df}{d\mathbf{Q}}=\frac{df}{d\mathbf{B}}\frac{d\mathbf{B}}{d\mathbf{Q}}$.
When I calculate $\frac{df}{d\mathbf{B}}=-2\mathbf{A}^T(\mathbf{y}-\mathbf{ABx})\mathbf{x}^T$, I get a matrix in $\mathbb{R}^{K^4\times K^4}$, not the matrix in $\mathbb{R}^{K\times K}$ that I was hoping for.
So I must be using the chain rule incorrectly, because the dimensions change (scalar to matrix).
Thank you in advance for your help.
matrix-calculus chain-rule kronecker-product
$\endgroup$
$\begingroup$
Of course $df/dB$ is $K^4\times K^4$: $B$ itself is $K^4\times K^4$. There is no mystery here.
$\endgroup$
– Federico
Dec 13 '18 at 16:50
$\begingroup$
Thanks for your comments! In reply to your first comment: I agree there is no mystery, but I was led to believe that the derivative of a scalar with respect to a matrix has the same dimensions as the matrix, i.e. $\dim(\frac{df}{d\mathbf{Q}})=\dim(\mathbf{Q})$, so it seems the chain rule does not preserve dimensionality in this case. Secondly, thanks for your longer answer; that definitely clears up part of my question :)
$\endgroup$
– shex95
Dec 13 '18 at 17:04
$\begingroup$
Yes, $\frac{df}{dQ}$ has the same dimensions as $Q$; that is correct. The problem is how you interpret $\frac{df}{dB}\frac{dB}{dQ}$. The first factor is a $K^4\times K^4$ matrix and the second a $(K^4\times K^4)\times(K\times K)$ tensor, and you have to contract the correct indices to obtain the chain rule: $$ \frac{df}{dQ_{ij}} = \sum_{k,l=1}^{K^4} \frac{df}{dB_{kl}} \frac{dB_{kl}}{dQ_{ij}} = \sum_{k,l=1}^{K^4} \left(\frac{df}{dB}\right)_{kl} \frac{dB_{kl}}{dQ_{ij}} $$
$\endgroup$
– Federico
Dec 13 '18 at 17:15
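The contraction in the comment above can be checked numerically. The following is only a sketch under stated assumptions: NumPy, a small case $K=2$, $m=5$ with random data, and helper names (`kron_power`, `f`) that are made up for illustration. It uses $\frac{df}{dB}=-2A^T(y-ABx)x^T$ (note the factor $-2$), approximates $\frac{dB}{dQ}$ by central differences, and contracts the two $K^4$ indices with `np.einsum`.

```python
import numpy as np
from functools import reduce


def kron_power(Q, N=4):
    """Return the N-fold Kronecker product Q (x) Q (x) ... (x) Q."""
    return reduce(np.kron, [Q] * N)


def f(Q, A, x, y):
    """Objective ||y - A (Q(x)Q(x)Q(x)Q) x||_2^2."""
    r = y - A @ kron_power(Q) @ x
    return r @ r


# Small random instance (illustrative sizes, not from the question).
rng = np.random.default_rng(0)
K, m = 2, 5
Q = rng.standard_normal((K, K))
A = rng.standard_normal((m, K**4))
x = rng.standard_normal(K**4)
y = rng.standard_normal(m)

# df/dB has the shape of B, i.e. K^4 x K^4.
B = kron_power(Q)
df_dB = -2.0 * np.outer(A.T @ (y - A @ B @ x), x)

# dB/dQ as a (K^4, K^4, K, K) tensor, approximated by central differences,
# together with a direct finite-difference gradient of f for comparison.
h = 1e-5
dB_dQ = np.zeros((K**4, K**4, K, K))
grad_fd = np.zeros((K, K))
for i in range(K):
    for j in range(K):
        dQ = np.zeros((K, K))
        dQ[i, j] = h
        dB_dQ[:, :, i, j] = (kron_power(Q + dQ) - kron_power(Q - dQ)) / (2 * h)
        grad_fd[i, j] = (f(Q + dQ, A, x, y) - f(Q - dQ, A, x, y)) / (2 * h)

# Chain rule: sum over the two K^4 indices k, l; what remains is K x K.
grad = np.einsum("kl,klij->ij", df_dB, dB_dQ)

print(df_dB.shape, dB_dQ.shape, grad.shape)
print(np.max(np.abs(grad - grad_fd)))
```

The point of the sketch is the shapes: nothing is wrong with $\frac{df}{dB}$ being $K^4\times K^4$, because those two indices disappear in the contraction and only the $K\times K$ indices of $Q$ survive.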
$\begingroup$
Thanks Federico, you have cleared things up there. I appreciate your help.
$\endgroup$
– shex95
Dec 13 '18 at 17:17
$\begingroup$
So the contraction between $\frac{df}{dB}$ and $\frac{dB}{dQ}$ is what some people call a double dot product. My advice, though, is to write out the indices so as not to make mistakes.
$\endgroup$
– Federico
Dec 13 '18 at 17:18
asked Dec 13 '18 at 14:20
shex95
1 Answer
$\begingroup$
Short answer: the derivative of $Q\otimes Q\otimes Q\otimes Q$ with respect to $Q$ is a mess, at first sight...
Let's start simple. Let $Q$ be a $K\times K$ matrix with entries $Q_{ij}$ and let $E^{ab}$ be the $K\times K$ matrix whose entries are all $0$ except for the entry $(a,b)$, which is $1$; in other words, $(E^{ab})_{ij} = \delta_a^i\delta_b^j$.
Then I claim that
$$
\frac{\partial(Q\otimes Q)}{\partial Q_{ij}} = E^{ij}\otimes Q+Q\otimes E^{ij} .
$$
I leave it to you to see why, because trying to write out the matrices involved will probably crash the entire Stack Exchange network...
Jokes aside, this is really immediate to see: just write $Q\otimes Q$ in block form, as in the definition of the Kronecker product, and think about which elements are affected by $Q_{ij}$. There is the entire $(i,j)$th block, so you get $E^{ij}\otimes Q$, but there is also the $(i,j)$th entry in each block, which gives you $Q\otimes E^{ij}$.
Now, if $A$ and $B$ are matrices which are functions of $Q$, by the same reasoning you get
$$
\frac{\partial(A\otimes B)}{\partial Q_{ij}} = \frac{\partial A}{\partial Q_{ij}}\otimes B + A\otimes \frac{\partial B}{\partial Q_{ij}} .
$$
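One way to see this product rule, sketched here as a first-order expansion rather than taken from the answer itself: the Kronecker product is bilinear, so perturbing both factors (the symbols $\delta A$ and $\delta B$ are introduced only for this illustration) gives
$$
(A+\delta A)\otimes(B+\delta B) = A\otimes B + \delta A\otimes B + A\otimes \delta B + \delta A\otimes \delta B .
$$
The last term is of second order in the perturbation, so only $\delta A\otimes B + A\otimes\delta B$ contributes to the derivative, exactly as in the ordinary product rule.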
So you can iterate, for instance,
$$
\begin{split}
\frac{\partial (Q\otimes Q\otimes Q)}{\partial Q_{ij}}
&= \frac{\partial(Q\otimes Q)}{\partial Q_{ij}}\otimes Q
+ (Q\otimes Q)\otimes \frac{\partial Q}{\partial Q_{ij}} \\
&= (E^{ij}\otimes Q+Q\otimes E^{ij})\otimes Q + (Q\otimes Q)\otimes E^{ij} \\
&= E^{ij}\otimes Q\otimes Q + Q\otimes E^{ij}\otimes Q + Q\otimes Q\otimes E^{ij}.
\end{split}
$$
Now you can prove by induction that
$$
\frac{\partial \bigl(\bigotimes_{n=1}^N Q\bigr)}{\partial Q_{ij}}
= \sum_{n=1}^N \left(\bigotimes_{h=1}^{n-1} Q\right) \otimes E^{ij} \otimes \left(\bigotimes_{h=n+1}^{N} Q\right).
$$
Written more concisely,
$$
\frac{\partial Q^{\otimes N}}{\partial Q_{ij}}
= \sum_{n=1}^N Q^{\otimes (n-1)}\otimes E^{ij} \otimes Q^{\otimes (N-n)} .
$$
$\endgroup$
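The general formula can be sanity-checked numerically. This is a minimal sketch assuming NumPy and a square $Q$; the helper names `kron_power`, `unit_matrix` and `dkron_power` are hypothetical and the indices are 0-based. It builds the sum over the $N$ positions that $E^{ij}$ can occupy and compares it with a central-difference approximation for $N=4$.

```python
import numpy as np
from functools import reduce


def kron_power(Q, N):
    """Return the N-fold Kronecker product of Q with itself (N >= 1)."""
    return reduce(np.kron, [Q] * N)


def unit_matrix(K, i, j):
    """K x K matrix E^{ij}: zero everywhere except a 1 at entry (i, j)."""
    E = np.zeros((K, K))
    E[i, j] = 1.0
    return E


def dkron_power(Q, N, i, j):
    """Derivative of the N-fold Kronecker power of Q with respect to Q[i, j].

    Implements sum_n Q^{(x)(n-1)} (x) E^{ij} (x) Q^{(x)(N-n)}.
    """
    K = Q.shape[0]
    E = unit_matrix(K, i, j)
    total = np.zeros((K**N, K**N))
    for n in range(N):
        # E^{ij} sits in position n; every other factor is Q.
        factors = [Q] * n + [E] + [Q] * (N - n - 1)
        total += reduce(np.kron, factors)
    return total


# Compare with central differences on a small random example.
rng = np.random.default_rng(1)
K, N, i, j = 2, 4, 0, 1
Q = rng.standard_normal((K, K))

h = 1e-5
dQ = h * unit_matrix(K, i, j)
numeric = (kron_power(Q + dQ, N) - kron_power(Q - dQ, N)) / (2 * h)
analytic = dkron_power(Q, N, i, j)

print(analytic.shape)
print(np.max(np.abs(analytic - numeric)))
```

For the objective in the question, multiplying $\frac{df}{dB}$ entrywise with this matrix and summing over all entries then gives $\frac{\partial f}{\partial Q_{ij}}$, so the full $(K^4\times K^4)\times(K\times K)$ tensor never has to be stored at once.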
answered Dec 13 '18 at 16:47
Federico