What is the best to identify the proper hierarchy of this data?Clustering not producing even clustersComplete...

What senses are available to a corpse subjected to a Speak with Dead spell?

Is there a verb that means to inject with poison?

Could a warlock use the One with Shadows warlock invocation to turn invisible, and then move while staying invisible?

Custom shape shows unwanted extra line

Renting a 2CV in France

Prevent Nautilus / Nemo from creating .Trash-1000 folder in mounted devices

A fantasy book with seven white haired women on the cover

Why is it that Bernie Sanders is always called a "socialist"?

What's this assembly doing?

Not a Long-Winded Riddle

Am I correct in stating that the study of topology is purely theoretical?

How do you funnel food off a cutting board?

Non-Cancer terminal illness that can affect young (age 10-13) girls?

Does it take energy to move something in a circle?

Will rerolling initiative each round stop meta-gaming about initiative?

I have trouble understanding this fallacy: "If A, then B. Therefore if not-B, then not-A."

Can we "borrow" our answers to populate our own websites?

How do you get out of your own psychology to write characters?

What makes papers publishable in top-tier journals?

How much mayhem could I cause as a fish?

Why didn't Tom Riddle take the presence of Fawkes and the Sorting Hat as more of a threat?

In harmony: key or the flow?

Is `Object` a function in javascript?

Does an Eldritch Knight's Weapon Bond protect him from losing his weapon to a Telekinesis spell?

What is the best to identify the proper hierarchy of this data?

Clustering not producing even clustersComplete link clusteringI am trying to classify/cluster users profile but don't know how with my attributesNumerical data and different algorithmsgraph database and its clusteringHow to evaluate clusters base on a label?What kind of classification should I use?How to determine x and y in 2 dimensional K-means clustering?Categorical data with order and blanks, is frequent dataset or k-modes a better option?Measure of variety within list/cluster

So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:

Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:

Item_code | Brand | Flavor   | Caloric_content | ... | sales

006891313 | Coke  | Original | 0               | ....

002349823 | Fanta | Orange   | 200             | ...

The other table that I have is what i run my clustering algorithm on. It's an NxN matrix, (where N is the number of distinct item_codes in the previous table) An entry [i,j] in the matrix, corresponds to the number of times j was bought after i was purchased on a previous trip. So more clearly, in the matrix below, what the number 1223 means is that, 1223 times after item 003428734 was purchase on an initial trip to the store, item 003428734 was purchased on the next trip.

          |003428734 |009849328 | 09840202 |....

003428734 |  1223    |    13    |    0     |

009849328 |    12    |   945    |    34    | ....

   .

   .

   .

I apply a hierchical clustering on that matrix, using Ward 2, squared euclidean distance and standard Z scores. The final output is a dendrogram, with all the item_codes on the branches of the dendrogram.

This is where the tedious process is. The ultimate goal, is to have a hierarchy for our products, and see which attribute (brand, flavor, size...) falls where on the dendrogram. The only way i can think of doing it, is organizing the item codes in the first in the same order that they are in the dendrogram, put a picture of the dendrogram side by side, and eyeball which attribute clusters the most. You'll see in the link below, it looks like brand A and B are clustering together, so I would say that products cluster by brand first. I would then take a closer look at the smaller cluster within the dendrogram and try to identify where the other attributes fit in the hierarchy (the picture is just a snippet of the dendrogram, it is in reality much, much bigger)

dendrogram

This eye balling process is very tedious and annoying. Is there a way to run a similar cluster, which shows a hierarchy within the products, as well as indicate which attribute fit where in that hierarchy? Or is this even the best way to approach this?

I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question

edited 3 hours ago

asked 3 hours ago

Steven Cunden

New contributor

add a comment |

So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:

Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:

Item_code | Brand | Flavor   | Caloric_content | ... | sales

006891313 | Coke  | Original | 0               | ....

002349823 | Fanta | Orange   | 200             | ...

          |003428734 |009849328 | 09840202 |....

003428734 |  1223    |    13    |    0     |

009849328 |    12    |   945    |    34    | ....

   .

   .

   .

dendrogram

I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question

edited 3 hours ago

asked 3 hours ago

Steven Cunden

New contributor

add a comment |

So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:

Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:

Item_code | Brand | Flavor   | Caloric_content | ... | sales

006891313 | Coke  | Original | 0               | ....

002349823 | Fanta | Orange   | 200             | ...

          |003428734 |009849328 | 09840202 |....

003428734 |  1223    |    13    |    0     |

009849328 |    12    |   945    |    34    | ....

   .

   .

   .

dendrogram

I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question

edited 3 hours ago

asked 3 hours ago

Steven Cunden

New contributor

So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:

Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:

Item_code | Brand | Flavor   | Caloric_content | ... | sales

006891313 | Coke  | Original | 0               | ....

002349823 | Fanta | Orange   | 200             | ...

          |003428734 |009849328 | 09840202 |....

003428734 |  1223    |    13    |    0     |

009849328 |    12    |   945    |    34    | ....

   .

   .

   .

dendrogram

I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question

clustering hierarchical-data-format

edited 3 hours ago

asked 3 hours ago

Steven Cunden

New contributor

edited 3 hours ago

asked 3 hours ago

Steven Cunden

New contributor

edited 3 hours ago

asked 3 hours ago

Steven Cunden

New contributor

asked 3 hours ago

Steven Cunden

asked 3 hours ago

Steven Cunden

New contributor

Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

add a comment |

0

active

oldest

votes

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46231%2fwhat-is-the-best-to-identify-the-proper-hierarchy-of-this-data%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

0

active

oldest

votes

0

active

oldest

votes

Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ggthjy