


Knowing Feature Importance from Sparse Matrix















I am working with a dataset that has one text column and several numerical columns. I applied TF-IDF to the text column, which produced a sparse matrix. I then converted the numerical features to a sparse matrix with scipy.sparse.csr_matrix and combined it with the sparse text features.

I feed the combined matrix to a gradient boosting model for training and prediction. Is there a way to plot the feature importances for this sparse matrix and recover the names of the important feature columns?

















      machine-learning python nlp feature-selection
















      asked Jan 20 at 11:48









      Debadri Dutta






















          1 Answer


















The fitted TF-IDF vectorizer already holds the mapping between terms and column indices:

    # vocabulary_ maps each term to its column index; invert it to go from index to term
    rev_dictionary = {v: k for k, v in vectorizer.vocabulary_.items()}
    # walk the indices in order so the names line up with the matrix columns
    column_names_from_text_features = [rev_dictionary[i] for i in range(len(rev_dictionary))]

Since you know the column names of your other features, the full list of features you pass to XGBoost (after scipy.sparse.hstack) could be

    # other_columns: the names of your numerical columns, in the order they were stacked
    all_columns = column_names_from_text_features + other_columns

(or the reverse, depending on the order in which you horizontally stacked them).

Once you have fitted the XGBoost model, you can use the plot_importance function to plot feature importance. Your code would look something like this:

    import matplotlib.pyplot as plt
    from xgboost import XGBClassifier, plot_importance

    fig, ax = plt.subplots(figsize=(15, 8))
    # xgb_classifier is your fitted XGBClassifier
    plot_importance(xgb_classifier, max_num_features=15, xlabel='F-score', ylabel='Features', ax=ax)
    plt.show()

The features in the plot will be labeled fxxx, fyyy, etc., where xxx and yyy are the indices of the columns passed to XGBoost.

Using the all_columns list constructed in the first part, you can map those indices back to the feature names.
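The last step, turning labels such as f12 back into column names, could be sketched roughly as follows. This is only an illustration: the helper name_importances and the sample scores and column names are made up, and it assumes the importance scores arrive as a dict keyed by XGBoost's default f<index> labels (for example from something like xgb_classifier.get_booster().get_score(importance_type="weight")).

```python
import re


def name_importances(scores, all_columns):
    """Translate default 'f<index>' keys into readable column names.

    `scores` is assumed to be a dict such as {"f0": 3.0, "f2": 7.0};
    `all_columns` must be in the same order as the stacked matrix columns.
    """
    named = {}
    for key, value in scores.items():
        match = re.fullmatch(r"f(\d+)", key)
        # Keys that are not of the form f<index> are kept unchanged.
        name = all_columns[int(match.group(1))] if match else key
        named[name] = value
    # Most important feature first.
    return sorted(named.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values for illustration only.
scores = {"f0": 3.0, "f2": 7.0, "f3": 1.0}
all_columns = ["good", "bad", "price", "age"]
print(name_importances(scores, all_columns))
# [('price', 7.0), ('good', 3.0), ('age', 1.0)]
```

The sorted list can then be plotted with any bar chart routine, which avoids reading the fxxx labels off the plot by hand.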













                edited 8 hours ago






























                answered 8 hours ago









                srjit








































