Conditions for two optimization problems to yield the same solution












Problem: Consider the optimization problems
$$\min_\beta \|y-X\beta\|^2+\alpha\|\beta\|^2 \tag 1$$
and
$$\min_\beta \|\beta\|^2 \ \text{ subject to } \ \|y-X\beta\|^2 \le c \tag 2$$
where $\|x\|$ is the $2$-norm. Fix $\alpha$, suppose $\beta^*$ is the solution to $(1)$, and let $c=\|y-X\beta^*\|^2$. Is it true that the solution to $(2)$ is also $\beta^*$?




Attempt: I believe this is true. The argument should be very similar to the one in "Why are additional constraint and penalty term equivalent in ridge regression?". However, when I ran some numerical experiments, the two problems returned different solutions. Hence my question here: do the two problems really yield the same solution? Are there exceptions I should be careful of?
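For reference, a minimal version of such an experiment might look like the following (a sketch using NumPy and SciPy's SLSQP solver; the random data, the value of $\alpha$, and the least-squares starting point are illustrative choices only, not part of the problem):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, alpha = 50, 5, 2.0
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Problem (1) has the closed form beta* = (X'X + alpha*I)^{-1} X'y
beta1 = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
c = np.sum((y - X @ beta1) ** 2)          # constraint level for problem (2)

# Problem (2): minimize ||beta||^2 subject to ||y - X beta||^2 <= c,
# started from the (feasible) least-squares solution
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
cons = {"type": "ineq", "fun": lambda b: c - np.sum((y - X @ b) ** 2)}
res = minimize(lambda b: b @ b, x0=beta_ls, constraints=[cons], method="SLSQP")

print(np.max(np.abs(res.x - beta1)))      # small if the two solutions coincide
```

(Solver tolerances and starting points affect the printed gap, so a nonzero value does not by itself settle the question.)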










analysis optimization numerical-methods convex-optimization regression

asked Dec 25 '18 at 23:21 by Longti, edited Dec 26 '18 at 0:53 by Rócherz
2 Answers

























It is true for $\alpha>0$. Since $\beta^*$ is a solution of (1), we have
$$\|y-X\beta^*\|^2 + \alpha\|\beta^*\|^2 \le \|y-X\beta\|^2 + \alpha\|\beta\|^2.$$
Reordering,
$$\|\beta^*\|^2 \le \frac{1}{\alpha}\left(\|y-X\beta\|^2 - \|y-X\beta^*\|^2\right) + \|\beta\|^2.$$
Now, in (2) we only consider $\beta$ such that $\|y-X\beta\|^2 \le \|y-X\beta^*\|^2$, so the first term on the right is nonpositive and we conclude that
$$\|\beta^*\|^2 \le \|\beta\|^2,$$
which implies that $\beta^*$ is a minimizer of (2).

Analogously, for $\alpha<0$ you can check that a solution $\beta^*$ of (1) is also a solution of (2), but maximizing instead of minimizing:
$$\alpha\left(\|\beta^*\|^2 - \|\beta\|^2\right) \le \|y-X\beta\|^2 - \|y-X\beta^*\|^2 \le 0 \quad\Longrightarrow\quad \|\beta^*\|^2 \ge \|\beta\|^2.$$
In any case, note that you cannot guarantee equivalence, since the problem becomes non-convex for $\alpha<0$.
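A quick numerical sanity check of the $\alpha>0$ case (a sketch with NumPy; the random data and the value of $\alpha$ are arbitrary): it samples feasible points of (2) with $c=\|y-X\beta^*\|^2$ and confirms that none of them has a smaller norm than the ridge solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, alpha = 40, 4, 1.5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Solution of (1): (X'X + alpha*I) beta = X'y
beta_star = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
c = np.sum((y - X @ beta_star) ** 2)            # constraint level of (2)

# The least-squares point is always feasible for (2); perturb points on the
# segment between it and beta_star to generate candidate feasible points.
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
for _ in range(1000):
    t = rng.uniform()
    cand = t * beta_star + (1 - t) * beta_ls + 0.01 * rng.normal(size=p)
    if np.sum((y - X @ cand) ** 2) <= c:        # keep only feasible candidates
        assert cand @ cand >= beta_star @ beta_star - 1e-9
print("no feasible point with a smaller norm was found")
```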






answered Dec 26 '18 at 0:18 by AugSB, edited Dec 27 '18 at 8:35













• I'm not convinced of your final statement. For one thing, setting $\alpha<0$ may make this problem non-convex. But I see no reason to believe it would give equivalent solutions if I were "maximizing instead of minimizing". After all, you're not negating the entire objective, just the regularizer. – Michael Grant, Dec 26 '18 at 22:24

• @MichaelGrant I did not say it is equivalent ;) Anyway, I have added a note to make clear that a solution of (1) would be a solution of (2) (maximizing instead of minimizing), but not the other way around. – AugSB, Dec 27 '18 at 8:39

• I'm afraid I still don't see it for the non-convex case at all. Fortunately, I believe the OP is likely concerning himself with the convex case. – Michael Grant, Dec 27 '18 at 13:38

• Please correct me if I'm wrong, because maybe I'm missing something. For any $\alpha<0$, $\beta^*$ being a solution of (1) implies $\|y-X\beta^*\|^2 + \alpha\|\beta^*\|^2 \le \|y-X\beta\|^2 + \alpha\|\beta\|^2$ for all $\beta$. Therefore, $\alpha(\|\beta^*\|^2 - \|\beta\|^2) \le \|y-X\beta\|^2 - \|y-X\beta^*\|^2$ for all $\beta$. Assuming that $\|y-X\beta\|^2 \le \|y-X\beta^*\|^2$, we conclude that $\|\beta^*\|^2 \ge \|\beta\|^2$ for all such $\beta$. So $\beta^*$ solves $\max_\beta \|\beta\|^2$ subject to $\|y-X\beta\|^2 \le \|y-X\beta^*\|^2$. – AugSB, Dec 27 '18 at 18:31

• "Assuming that $\|y-X\beta\|^2 \leq \|y-X\beta^*\|^2$" but that's just it, you can't make that assumption. The point of adding a regularizer like $\|\beta\|^2$ is that you're willing to trade on the optimal value of the error in exchange for other criteria. – Michael Grant, Jan 7 at 16:51






















Here, from (1),

$$
f(\beta) = y'\cdot y-2\beta'\cdot X'y+\beta'\cdot X'\cdot X\cdot\beta+\alpha\, \beta'\cdot\beta,
$$

so the minimum condition gives

$$
-X'\cdot y+X'\cdot X\beta+\alpha\beta = 0
$$

and then

$$
\beta^* = (I\alpha +X'\cdot X)^{-1}X'\cdot y.
$$

From (2), introducing a slack variable $\epsilon$ to turn the inequality constraint into an equality, the Lagrangian is

$$
L(\beta,\lambda,\epsilon)=\beta'\cdot\beta + \lambda\left(y'\cdot y-2\beta'\cdot X'y+\beta'\cdot X'\cdot X\cdot\beta-c+\epsilon^2\right)
$$

and the stationary points satisfy

$$
L_{\beta} = 2\beta-2\lambda X'\cdot y + 2\lambda X'\cdot X\cdot\beta = 0,
$$

hence

$$
(I+\lambda X'\cdot X)\cdot\beta^* = \lambda X'\cdot y
$$

or

$$
\beta^* = \left(I\frac{1}{\lambda}+X'\cdot X\right)^{-1}\cdot X'\cdot y.
$$

But multiplying the stationarity condition by $\beta'^*/2$ gives

$$
\beta'^*\cdot\beta^*-\lambda\beta'^*\cdot X'\cdot y+\lambda\, \beta'^*\cdot X'\cdot X\cdot\beta^* = 0,
$$

and, using the active constraint,

$$
\lambda = \frac{\beta'\cdot\beta}{y'\cdot y-\beta'\cdot X'\cdot y-c+\epsilon^2}.
$$

So the equivalence between (1) and (2) needs

$$
\lambda = \frac{1}{\alpha} = \frac{\beta'^*\cdot\beta^*}{y'\cdot y-\beta'^*\cdot X'\cdot y-c+\epsilon^2}
$$

or

$$
\alpha = \frac{(y-X\cdot \beta^*)'\cdot y-c+\epsilon^2}{\beta'^*\cdot\beta^*},
$$

which is quite unlikely.
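For what it's worth, these formulas can be evaluated numerically (a sketch with NumPy; the random data and $\alpha$ are arbitrary, and the slack is set to $\epsilon=0$ because taking $c=\|y-X\beta^*\|^2$ makes the constraint active):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, alpha = 30, 3, 0.7
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Closed form from (1): beta* = (alpha*I + X'X)^{-1} X'y
beta1 = np.linalg.solve(alpha * np.eye(p) + X.T @ X, X.T @ y)

# Take c as in the question (so epsilon = 0) and evaluate the formula for lambda
c = np.sum((y - X @ beta1) ** 2)
lam = (beta1 @ beta1) / (y @ y - beta1 @ (X.T @ y) - c)

# Stationary point of the Lagrangian: beta = (I/lambda + X'X)^{-1} X'y
beta2 = np.linalg.solve(np.eye(p) / lam + X.T @ X, X.T @ y)

print(alpha * lam)                       # compare with 1, i.e. lambda = 1/alpha
print(np.max(np.abs(beta1 - beta2)))     # gap between the two stationary points
```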






answered Dec 28 '18 at 11:55 by Cesareo













