


Why the normality assumption in linear regression?






My question is very simple: why do we choose the normal distribution as the one the error term follows in the assumptions of linear regression? Why don't we choose another, such as the uniform or the t distribution?







regression mathematical-statistics normal-distribution error linear
















asked 3 hours ago









Master Shi












  • We don't choose the normal assumption. It just happens that when the errors are normal, the model coefficients exactly follow a normal distribution and an exact F-test can be used to test hypotheses about them.
    – AdamO
    3 hours ago








  • Because the math works out easily enough that people could use it before modern computers.
    – Nat
    3 hours ago
































1 Answer





















You can choose another error distribution; doing so essentially just changes the loss function.

This is certainly done.



Laplace (double exponential) errors correspond to least absolute deviations regression, also called $L_1$ regression (which numerous posts on this site discuss). Regressions with t-distributed errors are occasionally used (in some cases because they're more robust to gross errors), though they can have a disadvantage: the likelihood (and therefore the negative of the loss) can have multiple modes.
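As a rough illustration of the robustness point, here is a minimal sketch comparing a least squares line with a least absolute deviations ($L_1$) line on made-up data containing one gross error. The function names and the data are hypothetical, and the brute-force search over pairs of points is a stand-in for a proper $L_1$ solver; it relies on the fact that, for a straight-line fit, some optimal $L_1$ line passes through two of the data points.

```python
from itertools import combinations


def ols_line(x, y):
    """Least squares line (Gaussian errors): returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def l1_loss(x, y, intercept, slope):
    """Sum of absolute residuals for a candidate line."""
    return sum(abs(yi - intercept - slope * xi) for xi, yi in zip(x, y))


def lad_line(x, y):
    """Least absolute deviations line (Laplace errors), by brute force.

    Tries every line through two data points and keeps the one with the
    smallest sum of absolute residuals. Fine for tiny data sets only.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = l1_loss(x, y, intercept, slope)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Made-up data: five points exactly on y = 1 + 2x, plus one gross error.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 40.0]

ols_intercept, ols_slope = ols_line(x, y)
lad_intercept, lad_slope = lad_line(x, y)
print("least squares:", ols_intercept, ols_slope)
print("least absolute deviations:", lad_intercept, lad_slope)
```

On this data the $L_1$ line recovers the line through the five clean points, while the least squares slope is pulled well away from 2 by the single outlier.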



Uniform errors correspond to an $L_\infty$ loss (minimize the maximum absolute deviation); such regression is sometimes called Chebyshev approximation (though beware, since there's another thing with essentially the same name). Again, this is sometimes done. Indeed, for simple regression and smallish data sets with bounded errors of constant spread, the fit is often easy enough to find by hand, directly on a plot, though in practice you can use linear programming methods or other algorithms. $L_\infty$ and $L_1$ regression problems are also duals of each other, which sometimes leads to convenient shortcuts.
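A minimal sketch of an $L_\infty$ (minimax) straight-line fit follows, using made-up data and hypothetical function names. Instead of linear programming, it uses a brute-force search that should only be used on small data sets. The search assumes that some optimal minimax line has the same slope as a line through two of the data points; for each such slope, the best intercept is the midrange of the offsets $y_i - \text{slope}\cdot x_i$.

```python
from itertools import combinations


def chebyshev_line(x, y):
    """Minimax (L-infinity) line by brute force.

    Returns (max_abs_residual, intercept, slope) for the best candidate.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # no finite slope between these two points
        slope = (y[j] - y[i]) / (x[j] - x[i])
        # For a fixed slope, centring the band of offsets minimises
        # the largest absolute residual.
        offsets = [yi - slope * xi for xi, yi in zip(x, y)]
        intercept = (max(offsets) + min(offsets)) / 2
        worst = (max(offsets) - min(offsets)) / 2
        if best is None or worst < best[0]:
            best = (worst, intercept, slope)
    return best


# Made-up data with bounded scatter around a rising trend.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 2.0, 2.0, 4.0]

worst, intercept, slope = chebyshev_line(x, y)
residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
print("intercept:", intercept, "slope:", slope, "max |residual|:", worst)
print("residuals:", residuals)
```

The residuals of the fitted line alternate in sign and share the same magnitude, which is the pattern one looks for when finding such a fit by eye on a plot.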



Many other choices are possible and quite a few have been used in practice.



[Note that if you have additive, independent, constant-spread errors with a density of the form $k\,\exp(-c\,g(\varepsilon))$, maximizing the likelihood will correspond to minimizing $\sum_i g(e_i)$, where $e_i$ is the $i$th residual.]
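
To spell out the step behind that note, here is a short sketch. It assumes $c > 0$ and that $k$ and $c$ do not depend on the regression coefficients; the symbols $f$, $\ell$ and $n$ (the number of observations) are notation introduced here. Writing the error density as $f(\varepsilon) = k\,\exp(-c\,g(\varepsilon))$, independence gives the log-likelihood of the residuals as

$$\ell = \sum_{i=1}^{n} \log f(e_i) = n \log k - c \sum_{i=1}^{n} g(e_i).$$

Only the last term depends on the coefficients, and it enters with a negative sign, so maximizing $\ell$ is the same as minimizing $\sum_i g(e_i)$. For example, $g(e) = e^2$ (normal errors) gives least squares, and $g(e) = |e|$ (Laplace errors) gives $L_1$ regression.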















edited 1 hour ago

answered 3 hours ago

Glen_b





























