Knowing Feature Importance from Sparse Matrix
I was working with a dataset that had a textual column as well as numerical columns. I applied TF-IDF to the textual column to get a sparse matrix, converted the numerical features to a sparse matrix with scipy.sparse.csr_matrix, and combined the two into one set of sparse features.
I then feed the combined matrix to a gradient boosting model and do the rest of the training and prediction. Is there any way I can plot the feature importances for this sparse matrix and recover the names of the important feature columns?
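For reference, a minimal sketch of the setup described above; the DataFrame df, the column names "text" and num_cols, and the target y are placeholder assumptions, not part of the original post:
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy import sparse
from xgboost import XGBClassifier
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(df["text"])      # sparse TF-IDF features from the text column
X_num = sparse.csr_matrix(df[num_cols].values)     # numerical features as a sparse matrix
X = sparse.hstack([X_text, X_num]).tocsr()         # combined sparse feature matrix
model = XGBClassifier()
model.fit(X, y)                                    # train the gradient boosting model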
machine-learning python nlp feature-selection
asked Jan 20 at 11:48 by Debadri Dutta (288)
1 Answer
You already have a mapping for your text features from the TF-IDF vectorizer's vocabulary_, which maps each term to its column index. Invert it and sort by index so the names line up with the columns of the sparse matrix:
# vocabulary_ maps term -> column index; invert it to index -> term
rev_dictionary = {idx: term for term, idx in vectorizer.vocabulary_.items()}
# order the terms by column index so they match the matrix columns
column_names_from_text_features = [rev_dictionary[i] for i in range(len(rev_dictionary))]
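Depending on your scikit-learn version (an assumption on my part, not part of the original answer), the vectorizer can also give you this list directly:
column_names_from_text_features = list(vectorizer.get_feature_names_out())   # scikit-learn >= 1.0
# column_names_from_text_features = vectorizer.get_feature_names()           # older scikit-learn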
Since you know the column names of your other features, the full list of feature names you pass to XGBoost (after scipy.sparse.hstack) could be
all_columns = column_names_from_text_features + other_columns
(or the reverse, depending on the order in which you horizontally stacked the matrices).
Now, once you have trained the XGBoost model, you can use the plot_importance function to visualize feature importance. Your code would look something like this (model is your fitted classifier):
from xgboost import XGBClassifier, plot_importance
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(15, 8))
plot_importance(model, max_num_features=15, xlabel='F-score', ylabel='Features', ax=ax)
plt.show()
The features in the plot will be labeled fxxx, fyyy, etc., where xxx and yyy are the column indices of the features as passed to XGBoost. Using the all_columns list constructed above, you can map those indices back to the real column names.
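As a follow-up sketch (assuming model is the fitted XGBClassifier and all_columns is the list built above), you can also read the importance scores directly and pair them with the column names, which avoids decoding the fNNN labels by hand:
import numpy as np
importances = model.feature_importances_        # one score per column of the stacked matrix
top_idx = np.argsort(importances)[::-1][:15]    # indices of the 15 highest-scoring features
for i in top_idx:
    print(all_columns[i], importances[i])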
answered 8 hours ago (edited 8 hours ago) by srjit (1014), New contributor