What is the best to identify the proper hierarchy of this data?Clustering not producing even clustersComplete...
What senses are available to a corpse subjected to a Speak with Dead spell?
Is there a verb that means to inject with poison?
Could a warlock use the One with Shadows warlock invocation to turn invisible, and then move while staying invisible?
Custom shape shows unwanted extra line
Renting a 2CV in France
Prevent Nautilus / Nemo from creating .Trash-1000 folder in mounted devices
A fantasy book with seven white haired women on the cover
Why is it that Bernie Sanders is always called a "socialist"?
What's this assembly doing?
Not a Long-Winded Riddle
Am I correct in stating that the study of topology is purely theoretical?
How do you funnel food off a cutting board?
Non-Cancer terminal illness that can affect young (age 10-13) girls?
Does it take energy to move something in a circle?
Will rerolling initiative each round stop meta-gaming about initiative?
I have trouble understanding this fallacy: "If A, then B. Therefore if not-B, then not-A."
Can we "borrow" our answers to populate our own websites?
How do you get out of your own psychology to write characters?
What makes papers publishable in top-tier journals?
How much mayhem could I cause as a fish?
Why didn't Tom Riddle take the presence of Fawkes and the Sorting Hat as more of a threat?
In harmony: key or the flow?
Is `Object` a function in javascript?
Does an Eldritch Knight's Weapon Bond protect him from losing his weapon to a Telekinesis spell?
What is the best to identify the proper hierarchy of this data?
Clustering not producing even clustersComplete link clusteringI am trying to classify/cluster users profile but don't know how with my attributesNumerical data and different algorithmsgraph database and its clusteringHow to evaluate clusters base on a label?What kind of classification should I use?How to determine x and y in 2 dimensional K-means clustering?Categorical data with order and blanks, is frequent dataset or k-modes a better option?Measure of variety within list/cluster
$begingroup$
So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:
Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:
Item_code | Brand | Flavor | Caloric_content | ... | sales
006891313 | Coke | Original | 0 | ....
002349823 | Fanta | Orange | 200 | ...
The other table that I have is what i run my clustering algorithm on. It's an NxN matrix, (where N is the number of distinct item_codes in the previous table) An entry [i,j] in the matrix, corresponds to the number of times j was bought after i was purchased on a previous trip. So more clearly, in the matrix below, what the number 1223 means is that, 1223 times after item 003428734
was purchase on an initial trip to the store, item 003428734
was purchased on the next trip.
|003428734 |009849328 | 09840202 |....
003428734 | 1223 | 13 | 0 |
009849328 | 12 | 945 | 34 | ....
.
.
.
I apply a hierchical clustering on that matrix, using Ward 2, squared euclidean distance and standard Z scores. The final output is a dendrogram, with all the item_codes on the branches of the dendrogram.
This is where the tedious process is. The ultimate goal, is to have a hierarchy for our products, and see which attribute (brand, flavor, size...) falls where on the dendrogram. The only way i can think of doing it, is organizing the item codes in the first in the same order that they are in the dendrogram, put a picture of the dendrogram side by side, and eyeball which attribute clusters the most. You'll see in the link below, it looks like brand A and B are clustering together, so I would say that products cluster by brand first. I would then take a closer look at the smaller cluster within the dendrogram and try to identify where the other attributes fit in the hierarchy (the picture is just a snippet of the dendrogram, it is in reality much, much bigger)
This eye balling process is very tedious and annoying. Is there a way to run a similar cluster, which shows a hierarchy within the products, as well as indicate which attribute fit where in that hierarchy? Or is this even the best way to approach this?
I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question
clustering hierarchical-data-format
New contributor
$endgroup$
add a comment |
$begingroup$
So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:
Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:
Item_code | Brand | Flavor | Caloric_content | ... | sales
006891313 | Coke | Original | 0 | ....
002349823 | Fanta | Orange | 200 | ...
The other table that I have is what i run my clustering algorithm on. It's an NxN matrix, (where N is the number of distinct item_codes in the previous table) An entry [i,j] in the matrix, corresponds to the number of times j was bought after i was purchased on a previous trip. So more clearly, in the matrix below, what the number 1223 means is that, 1223 times after item 003428734
was purchase on an initial trip to the store, item 003428734
was purchased on the next trip.
|003428734 |009849328 | 09840202 |....
003428734 | 1223 | 13 | 0 |
009849328 | 12 | 945 | 34 | ....
.
.
.
I apply a hierchical clustering on that matrix, using Ward 2, squared euclidean distance and standard Z scores. The final output is a dendrogram, with all the item_codes on the branches of the dendrogram.
This is where the tedious process is. The ultimate goal, is to have a hierarchy for our products, and see which attribute (brand, flavor, size...) falls where on the dendrogram. The only way i can think of doing it, is organizing the item codes in the first in the same order that they are in the dendrogram, put a picture of the dendrogram side by side, and eyeball which attribute clusters the most. You'll see in the link below, it looks like brand A and B are clustering together, so I would say that products cluster by brand first. I would then take a closer look at the smaller cluster within the dendrogram and try to identify where the other attributes fit in the hierarchy (the picture is just a snippet of the dendrogram, it is in reality much, much bigger)
This eye balling process is very tedious and annoying. Is there a way to run a similar cluster, which shows a hierarchy within the products, as well as indicate which attribute fit where in that hierarchy? Or is this even the best way to approach this?
I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question
clustering hierarchical-data-format
New contributor
$endgroup$
add a comment |
$begingroup$
So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:
Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:
Item_code | Brand | Flavor | Caloric_content | ... | sales
006891313 | Coke | Original | 0 | ....
002349823 | Fanta | Orange | 200 | ...
The other table that I have is what i run my clustering algorithm on. It's an NxN matrix, (where N is the number of distinct item_codes in the previous table) An entry [i,j] in the matrix, corresponds to the number of times j was bought after i was purchased on a previous trip. So more clearly, in the matrix below, what the number 1223 means is that, 1223 times after item 003428734
was purchase on an initial trip to the store, item 003428734
was purchased on the next trip.
|003428734 |009849328 | 09840202 |....
003428734 | 1223 | 13 | 0 |
009849328 | 12 | 945 | 34 | ....
.
.
.
I apply a hierchical clustering on that matrix, using Ward 2, squared euclidean distance and standard Z scores. The final output is a dendrogram, with all the item_codes on the branches of the dendrogram.
This is where the tedious process is. The ultimate goal, is to have a hierarchy for our products, and see which attribute (brand, flavor, size...) falls where on the dendrogram. The only way i can think of doing it, is organizing the item codes in the first in the same order that they are in the dendrogram, put a picture of the dendrogram side by side, and eyeball which attribute clusters the most. You'll see in the link below, it looks like brand A and B are clustering together, so I would say that products cluster by brand first. I would then take a closer look at the smaller cluster within the dendrogram and try to identify where the other attributes fit in the hierarchy (the picture is just a snippet of the dendrogram, it is in reality much, much bigger)
This eye balling process is very tedious and annoying. Is there a way to run a similar cluster, which shows a hierarchy within the products, as well as indicate which attribute fit where in that hierarchy? Or is this even the best way to approach this?
I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question
clustering hierarchical-data-format
New contributor
$endgroup$
So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:
Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:
Item_code | Brand | Flavor | Caloric_content | ... | sales
006891313 | Coke | Original | 0 | ....
002349823 | Fanta | Orange | 200 | ...
The other table that I have is what i run my clustering algorithm on. It's an NxN matrix, (where N is the number of distinct item_codes in the previous table) An entry [i,j] in the matrix, corresponds to the number of times j was bought after i was purchased on a previous trip. So more clearly, in the matrix below, what the number 1223 means is that, 1223 times after item 003428734
was purchase on an initial trip to the store, item 003428734
was purchased on the next trip.
|003428734 |009849328 | 09840202 |....
003428734 | 1223 | 13 | 0 |
009849328 | 12 | 945 | 34 | ....
.
.
.
I apply a hierchical clustering on that matrix, using Ward 2, squared euclidean distance and standard Z scores. The final output is a dendrogram, with all the item_codes on the branches of the dendrogram.
This is where the tedious process is. The ultimate goal, is to have a hierarchy for our products, and see which attribute (brand, flavor, size...) falls where on the dendrogram. The only way i can think of doing it, is organizing the item codes in the first in the same order that they are in the dendrogram, put a picture of the dendrogram side by side, and eyeball which attribute clusters the most. You'll see in the link below, it looks like brand A and B are clustering together, so I would say that products cluster by brand first. I would then take a closer look at the smaller cluster within the dendrogram and try to identify where the other attributes fit in the hierarchy (the picture is just a snippet of the dendrogram, it is in reality much, much bigger)
This eye balling process is very tedious and annoying. Is there a way to run a similar cluster, which shows a hierarchy within the products, as well as indicate which attribute fit where in that hierarchy? Or is this even the best way to approach this?
I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question
clustering hierarchical-data-format
clustering hierarchical-data-format
New contributor
New contributor
edited 3 hours ago
Steven Cunden
New contributor
asked 3 hours ago
Steven CundenSteven Cunden
62
62
New contributor
New contributor
add a comment |
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46231%2fwhat-is-the-best-to-identify-the-proper-hierarchy-of-this-data%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.
Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.
Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.
Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46231%2fwhat-is-the-best-to-identify-the-proper-hierarchy-of-this-data%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown