Why can the prior in MAP be ignored?


information leakage when using empirical Bayesian to generate a predictorBayesian linear regression / categorical variable / Laplace priorHow does binary cross entropy work?Limits of using a normal distribution in Bayesian inferenceHow exactly do Convolution and Pooling act as infinitely strong prior? A mathematically written explanation will be very usefulCalibrate the predicted class probability to make it represent a true probability?Intuitive logic behind Naive Bayes and Bayes Theorem. Why does Naive Bayes multiply/input prior probability twice?Why the outputs of a machine learning model are not sampled at the prediction time?How do I combine two electromagnetic readings to predict the position of a sensor?MAP estimator - Computation Solution

























The posterior is

$$p(\theta \vert x) = \frac{p(x \vert \theta)\, p(\theta)}{p(x)}$$

Many materials say that since $p(x)$ is a constant, it can be ignored; thus $p(\theta \vert x) \propto p(x \vert \theta)\, p(\theta)$.

My question is why $p(x)$ is a constant that can be ignored. Is it because, even though we don't know the distribution of $x$, there is a corresponding true distribution for $x$? So $p(x)$ is a constant (unknown, but already determined) and can therefore be ignored?

probability bayesian

– shashack, asked 21 hours ago, edited 1 min ago










  • Oh, I made a mistake: $p(x)$ is not a prior. The prior is $p(\theta)$. I edited my question! – shashack, 1 min ago
















3 Answers






























You are right: $P(x)$ is the underlying distribution of the data, and it can be assumed constant simply because it is independent of our modeling practice. However, $P(x)$ may change gradually over time; this phenomenon is called "distribution shift". For example, the distribution of heights in a country may change over the years.

Note that "prior" is a reserved word for $P(\theta)$, the distribution of model parameters in Bayesian modeling. In non-Bayesian modeling there is no notion of a "prior". $P(x \vert \theta)$ is called the likelihood, and $P(x) = \sum_{\theta} P(x \vert \theta)\, P(\theta)$ is called the marginal likelihood (the likelihood marginalized over the model parameters).

– P. Esmailian, answered 13 hours ago, edited 6 hours ago
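To make the marginalization concrete, here is a minimal numeric sketch of $P(x) = \sum_{\theta} P(x \vert \theta)\, P(\theta)$. The discrete $\theta$ grid and the coin-flip (Bernoulli) likelihood are illustrative assumptions, not part of the answer:

```python
import numpy as np

# Sketch: a coin with unknown heads-probability theta, discretized
# so the marginal likelihood is a plain sum over the grid.
thetas = np.linspace(0.01, 0.99, 99)        # discrete grid over theta
prior = np.ones_like(thetas) / len(thetas)  # uniform prior P(theta)

# Observed data x: 7 heads out of 10 flips (arbitrary example).
heads, flips = 7, 10
likelihood = thetas**heads * (1 - thetas)**(flips - heads)  # P(x | theta)

# Marginal likelihood: P(x) = sum_theta P(x | theta) P(theta).
p_x = np.sum(likelihood * prior)

# Bayes' rule: dividing by P(x) makes the posterior sum to 1.
posterior = likelihood * prior / p_x
print(p_x)              # a single number, constant w.r.t. theta
print(posterior.sum())  # ~1.0
```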













  • Thank you, I edited the prior part! – shashack, 21 secs ago































The maximum a posteriori definition usually goes like this:

$$\operatorname{argmax}_{\theta}\, p(\theta \vert x) = \operatorname{argmax}_{\theta} \frac{p(x \vert \theta) \cdot p(\theta)}{p(x)}$$

Given that $p(x)$ is independent of $\theta$, it is not needed for finding $\operatorname{argmax}_{\theta}$, so you have

$$\operatorname{argmax}_{\theta}\, p(\theta \vert x) = \operatorname{argmax}_{\theta} \frac{p(x \vert \theta) \cdot p(\theta)}{p(x)} = \operatorname{argmax}_{\theta}\, p(x \vert \theta) \cdot p(\theta)$$

– glhuilli, answered 16 hours ago
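A small sketch of why dividing by $p(x)$ cannot move the maximizer: scaling every value by the same positive constant leaves the argmax unchanged. The $\theta$ grid and coin-flip likelihood below are assumed for illustration only:

```python
import numpy as np

# Same toy setup: unknown heads-probability theta, 7 heads in 10 flips.
thetas = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(thetas) / len(thetas)
heads, flips = 7, 10
likelihood = thetas**heads * (1 - thetas)**(flips - heads)

unnormalized = likelihood * prior  # p(x | theta) * p(theta)
p_x = unnormalized.sum()           # p(x): a positive constant w.r.t. theta
posterior = unnormalized / p_x     # p(theta | x)

# Dividing by the constant p(x) does not change the maximizer:
assert np.argmax(unnormalized) == np.argmax(posterior)
print(thetas[np.argmax(unnormalized)])  # MAP estimate, ~0.7
```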





















Yes. Presumably whatever process 'generates' the data produces data that follows some distribution. $p(x)$, however, is just a number, and whatever it is, it is a constant with respect to $\theta$.

I wouldn't call $p(x)$ a prior, as that implies there is some posterior probability of the data that we compute. We don't; nothing about this process updates a belief about the probability of the data. "Prior" refers to $p(\theta)$.

$p(x)$ can't be ignored if you really want the value of $p(\theta \vert x)$. But usually you compute that from a posterior distribution for which you have a closed-form formula.

– Sean Owen, answered 13 hours ago
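As a sketch of the closed-form route this answer mentions, one standard example (my choice, not stated in the answer) is the conjugate Beta-Bernoulli pair, where the exact posterior is available without ever computing $p(x)$:

```python
from scipy.stats import beta

# Conjugate case: Beta prior + Bernoulli likelihood.
# With prior Beta(a0, b0) and data of `heads` heads and `tails` tails,
# the posterior is exactly Beta(a0 + heads, b0 + tails),
# so p(theta | x) is known in closed form; p(x) never appears.
a0, b0 = 1.0, 1.0  # uniform prior on theta
heads, tails = 7, 3
posterior = beta(a0 + heads, b0 + tails)

print(posterior.pdf(0.7))  # exact posterior density at theta = 0.7
print(posterior.mean())    # (a0 + heads) / (a0 + b0 + heads + tails)
```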












