Derivative of inner product

Suppose the inner product of a vector $\mathbf{x}$ with itself can be expressed as

$$\langle \mathbf{x}, \mathbf{x}\rangle_G = \mathbf{x}^T G\mathbf{x}$$

where $G$ is some symmetric matrix. If I take the derivative of this inner product with respect to $\mathbf{x}$, I should get a vector as a result, since this is the derivative of a scalar function by a vector (https://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-vector).

Nevertheless, this formula tells me that I should get a row vector rather than an ordinary (column) vector:

$$\frac{\mathrm{d}}{\mathrm{d} \mathbf{x}} (\mathbf{x}^TG\mathbf{x}) = 2\mathbf{x}^T G$$

(http://www.cs.huji.ac.il/~csip/tirgul3_derivatives.pdf)

Why do I get this contradiction?

asked Nov 30 '18 at 9:15

The Bosco

linear-algebra derivatives vectors inner-product-space

4 Answers

For a smooth $f:\mathbb{R}^n\to\mathbb{R}^m$, the differential is a map $df:\mathbb{R}^n\to\mathcal{L}(\mathbb{R}^n,\mathbb{R}^m)$.

Being differentiable at $x$ is equivalent to:
$$
f(x+h)=f(x)+df(x)\cdot h+o(\|h\|)
$$

In your case, $f(x)=\langle x,x \rangle_G$ and $m=1$, hence the differential at $x$, $df(x)$, is in $\mathcal{L}(\mathbb{R}^n,\mathbb{R})$. It's a linear form.

Let's be more explicit (the cross term uses the symmetry of $G$):
\begin{align*}
f(x+h)&= \langle x+h,x+h \rangle_G \\
&= \underbrace{\langle x,x \rangle_G}_{f(x)} + \underbrace{2\langle x,h \rangle_G }_{df(x)\cdot h}+ \underbrace{\langle h,h \rangle_G}_{\in o(\|h\|)}
\end{align*}

Hence your differential is defined by
$$
df(x)\cdot h = 2\langle x,h \rangle_G = (2x^tG)h
$$
where $2x^tG=\left(\partial_{x_1} f,\dots,\partial_{x_n} f\right)$ is your "row" vector.

Note that, because $m=1$, you can also use a vector $\nabla f(x)$ to represent $df(x)$ using the canonical scalar product. This vector is by definition the gradient of $f$:

$$
df(x)\cdot h = \langle \nabla f(x),h \rangle = \langle 2Gx,h \rangle
$$
where $\nabla f(x)=2Gx=\left(\begin{array}{c}\partial_{x_1} f \\ \vdots \\ \partial_{x_n} f\end{array}\right)$. This is your "column" vector.
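
If you want to convince yourself numerically, here is a minimal sketch in plain Python. The matrix `G`, the point `x` and the helper names are illustrative assumptions, not part of the question. It approximates each partial derivative of $f(x)=x^tGx$ with a central difference and compares the result with the entries of $2Gx$, which are the same numbers as the entries of the row vector $2x^tG$.

```python
# Finite-difference check that the partial derivatives of f(x) = x^T G x
# are the entries of 2 G x when G is symmetric.
# G and x are arbitrary illustrative values (assumptions, not from the post).

def matvec(G, x):
    """Return the matrix-vector product G x as a list."""
    return [sum(g_ij * x_j for g_ij, x_j in zip(row, x)) for row in G]

def quad_form(G, x):
    """Return the scalar x^T G x."""
    return sum(x_i * y_i for x_i, y_i in zip(x, matvec(G, x)))

def numerical_gradient(G, x, eps=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((quad_form(G, x_plus) - quad_form(G, x_minus)) / (2 * eps))
    return grad

G = [[2.0, 1.0], [1.0, 3.0]]  # symmetric
x = [1.0, -2.0]

analytic = [2 * v for v in matvec(G, x)]  # entries of 2 G x (and of 2 x^T G)
numeric = numerical_gradient(G, x)

print("analytic:", analytic)
print("numeric: ", numeric)
```

Whether you then store these $n$ numbers as a row or as a column is exactly the layout choice discussed above; the check itself does not depend on it.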

edited Nov 30 '18 at 10:43

answered Nov 30 '18 at 10:09

Picaud Vincent

The difference lies in how the author of the second reference prefers to arrange the components of the gradient. In the first paragraph they state

Let $x\in \mathbb{R}^n$ (a column vector) and let $f : \mathbb{R}^n \to \mathbb{R}$. The derivative of $f$ with respect to $x$ is a row vector:
$$
\frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_1}, \cdots , \frac{\partial f}{\partial x_n} \right)
$$

You can argue this is a better option than the first one (e.g. this answer), but at the end of the day it is just a matter of notation. Pick the one you prefer and stick with it to avoid problems down the line.

answered Nov 30 '18 at 9:29

caverac

More generally, suppose we differentiate any scalar-valued function $f$ of a vector $\mathbf{x}$ with respect to $\mathbf{x}$. By the chain rule, $$df=\sum_i\frac{\partial f}{\partial x_i}dx_i=\boldsymbol{\nabla}f\cdot d\mathbf{x}=\boldsymbol{\nabla}f^T d\mathbf{x}.$$(Technically, I should write $df=(\boldsymbol{\nabla}f^T d\mathbf{x})_{11}$ to take the unique entry of a $1\times 1$ matrix.)

If you want to define the derivative of $f$ with respect to $\mathbf{x}$ as the $d\mathbf{x}$ coefficient in $df$, you use the last expression, obtaining the row vector $\boldsymbol{\nabla}f^T$. Defining it instead as the left-hand argument of the dot product, giving the column vector $\boldsymbol{\nabla}f$, is an alternative convention.
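
As a component-wise sketch for the question's $f(\mathbf{x})=\mathbf{x}^TG\mathbf{x}$ (the index notation $G_{ij}$, $x_i$ is introduced here for illustration only), the partial derivatives work out as follows, using the symmetry $G_{ik}=G_{ki}$ in the second-to-last step:

$$\frac{\partial}{\partial x_k}\sum_{i,j} G_{ij}x_ix_j=\sum_j G_{kj}x_j+\sum_i G_{ik}x_i=2\sum_j G_{kj}x_j=2(G\mathbf{x})_k.$$

Stacking these $n$ numbers as a column gives $\boldsymbol{\nabla}f=2G\mathbf{x}$; laying them out as a row gives $\boldsymbol{\nabla}f^T=2\mathbf{x}^TG$. The entries are identical in both conventions.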

answered Nov 30 '18 at 9:42

J.G.

Why not use the Leibniz rule? We have, where $\langle .,.\rangle$ denotes the standard inner product,
$$D_p(\langle x,x\rangle_G)=2\langle p,x\rangle_G=2p^TGx=2\langle p,Gx\rangle.$$

Note that the derivative of $f\colon\mathbb R^n\to\mathbb R$ is not a vector, but a linear form instead. The gradient $\nabla^{\langle .,.\rangle_G}f$ with respect to the inner product $\langle .,.\rangle_G$ is the unique vector which represents this linear form in the presence of the specified inner product. In our case we have
$$\nabla^{\langle .,.\rangle_G}f(x)=2x,\quad\text{that is}\quad
D_p(\langle x,x\rangle_G)=\langle p,2x\rangle_G$$
whereas
$$\nabla^{\langle .,.\rangle}f(x)=2Gx,\quad\text{and that is}\quad
D_p(\langle x,x\rangle_G)=\langle p,2Gx\rangle$$
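
The two representations can be compared numerically. The following is only a sketch in plain Python with made-up values for `G`, `x` and the direction `p` (all assumptions for illustration, as are the helper names). It estimates the directional derivative $D_p f$ with a central difference and evaluates it through both gradients: $2Gx$ paired with the standard inner product, and $2x$ paired with $\langle .,.\rangle_G$.

```python
# Compare the two gradient representations of the same linear form D_p f,
# where f(x) = <x, x>_G = x^T G x and G is symmetric.
# G, x and p are illustrative values (assumptions, not from the post).

def matvec(G, x):
    """Return the matrix-vector product G x as a list."""
    return [sum(g_ij * x_j for g_ij, x_j in zip(row, x)) for row in G]

def dot(u, v):
    """Standard inner product <u, v>."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def inner_G(G, u, v):
    """Inner product <u, v>_G = u^T G v."""
    return dot(u, matvec(G, v))

def f(G, x):
    """The function being differentiated: <x, x>_G."""
    return inner_G(G, x, x)

G = [[2.0, 1.0], [1.0, 3.0]]  # symmetric positive definite
x = [1.0, -2.0]
p = [0.5, 4.0]                # direction of differentiation

# Central-difference estimate of the directional derivative D_p f(x).
t = 1e-6
x_fwd = [x_i + t * p_i for x_i, p_i in zip(x, p)]
x_bwd = [x_i - t * p_i for x_i, p_i in zip(x, p)]
directional = (f(G, x_fwd) - f(G, x_bwd)) / (2 * t)

grad_standard = [2 * v for v in matvec(G, x)]  # 2 G x, used with <.,.>
grad_G = [2 * v for v in x]                    # 2 x,   used with <.,.>_G

via_standard = dot(p, grad_standard)           # <p, 2 G x>
via_G = inner_G(G, p, grad_G)                  # <p, 2 x>_G

print(directional, via_standard, via_G)
```

The vectors `grad_standard` and `grad_G` differ, yet both pairings return the same number, which is the point of the answer: the linear form is fixed, and the vector representing it depends on the chosen inner product.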

edited Nov 30 '18 at 16:21

answered Nov 30 '18 at 16:14

Michael Hoppe
