Knowing Feature Importance from Sparse Matrix


I am working with a dataset that has one text column and several numerical columns. I applied TF-IDF to the text column, which gave me a sparse matrix. I then converted the numerical features to a sparse matrix with scipy.sparse.csr_matrix and combined it with the sparse text features.

I feed the combined matrix to a gradient boosting model for training and prediction. Is there a way to plot the feature importance for this sparse matrix and find out the column names of the important features?

asked Jan 20 at 11:48

Debadri Dutta

machine-learning python nlp feature-selection


1 Answer

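The fitted TF-IDF vectorizer already holds a map from each term to its column index, in vectorizer.vocabulary_. Invert it to get the column names of the text features in column order: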

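# vocabulary_ maps term -> column index; invert it to index -> term
rev_dictionary = {v: k for k, v in vectorizer.vocabulary_.items()}

# list the terms ordered by column index, so that position i names column i
column_names_from_text_features = [rev_dictionary[i] for i in range(len(rev_dictionary))]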

Since you know the column names of your other features, the entire list of features you pass to XGBoost (after the scipy.sparse.hstack) could be

all_columns = column_names_from_text_features + other_columns

(or the reverse, depending on the order in which you stacked the blocks horizontally). Here other_columns is the list of names of your numerical columns.
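As a rough end-to-end sketch of the steps above: the toy texts, the numeric values and the names in other_columns are made up for illustration, and the text block is assumed to be stacked first. The point is only that the name list must follow the same order as the blocks given to hstack.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data: one text column and two numeric columns.
texts = ["red apple", "green apple", "red car"]
numeric = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
other_columns = ["price", "weight"]  # assumed names of the numeric columns

vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(texts)

# Invert term -> index, then list the terms in column order.
rev_dictionary = {idx: term for term, idx in vectorizer.vocabulary_.items()}
column_names_from_text_features = [
    rev_dictionary[i] for i in range(len(rev_dictionary))
]

# Text block first, numeric block second; the names follow the same order.
X = hstack([X_text, csr_matrix(numeric)]).tocsr()
all_columns = column_names_from_text_features + other_columns

# One name per column of the stacked matrix.
assert len(all_columns) == X.shape[1]
print(all_columns)
```

If you stack the numeric block first instead, swap the two lists in the concatenation as well.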

Once you have trained the XGBoost model, you can use the plot_importance function to plot the feature importance. Your code would look something like this:

import matplotlib.pyplot as plt
from xgboost import XGBClassifier, plot_importance

# model is your fitted XGBClassifier
fig, ax = plt.subplots(figsize=(15, 8))
plot_importance(model, max_num_features=15, xlabel='F-score', ylabel='Features', ax=ax)
plt.show()

The plot labels the features fxxx, fyyy, etc., where xxx and yyy are the column indices of the features in the matrix passed to XGBoost.

Using the all_columns list constructed in the first part, you can map each index in a plot label back to its feature name: the label fNNN corresponds to all_columns[NNN].
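A minimal sketch of that lookup, assuming the importance scores come as a dict keyed by labels such as 'f12'. That is the shape I would expect from model.get_booster().get_score() when no feature names were supplied. The helper name_importances and the sample scores are hypothetical.

```python
def name_importances(scores, all_columns):
    """Replace keys such as 'f12' with all_columns[12], highest score first."""
    named = {}
    for key, score in scores.items():
        index = int(key[1:])  # drop the leading 'f' to get the column index
        named[all_columns[index]] = score
    # Sort by score, descending, so the most important features come first.
    return dict(sorted(named.items(), key=lambda kv: kv[1], reverse=True))


# Stand-in for the scores of a fitted model (made-up values).
scores = {"f0": 12.0, "f3": 30.0, "f4": 7.0}
all_columns = ["apple", "car", "green", "red", "price", "weight"]

named = name_importances(scores, all_columns)
print(named)
```

Features that the trees never split on do not appear in the scores at all, so the result can be shorter than all_columns.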

edited 8 hours ago

answered 8 hours ago

srjit


