What is the best to identify the proper hierarchy of this data?Clustering not producing even clustersComplete...

What senses are available to a corpse subjected to a Speak with Dead spell?

Is there a verb that means to inject with poison?

Could a warlock use the One with Shadows warlock invocation to turn invisible, and then move while staying invisible?

Custom shape shows unwanted extra line

Renting a 2CV in France

Prevent Nautilus / Nemo from creating .Trash-1000 folder in mounted devices

A fantasy book with seven white haired women on the cover

Why is it that Bernie Sanders is always called a "socialist"?

What's this assembly doing?

Not a Long-Winded Riddle

Am I correct in stating that the study of topology is purely theoretical?

How do you funnel food off a cutting board?

Non-Cancer terminal illness that can affect young (age 10-13) girls?

Does it take energy to move something in a circle?

Will rerolling initiative each round stop meta-gaming about initiative?

I have trouble understanding this fallacy: "If A, then B. Therefore if not-B, then not-A."

Can we "borrow" our answers to populate our own websites?

How do you get out of your own psychology to write characters?

What makes papers publishable in top-tier journals?

How much mayhem could I cause as a fish?

Why didn't Tom Riddle take the presence of Fawkes and the Sorting Hat as more of a threat?

In harmony: key or the flow?

Is `Object` a function in javascript?

Does an Eldritch Knight's Weapon Bond protect him from losing his weapon to a Telekinesis spell?



What is the best to identify the proper hierarchy of this data?


Clustering not producing even clustersComplete link clusteringI am trying to classify/cluster users profile but don't know how with my attributesNumerical data and different algorithmsgraph database and its clusteringHow to evaluate clusters base on a label?What kind of classification should I use?How to determine x and y in 2 dimensional K-means clustering?Categorical data with order and blanks, is frequent dataset or k-modes a better option?Measure of variety within list/cluster













1












$begingroup$


So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:



Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:



Item_code | Brand | Flavor   | Caloric_content | ... | sales
006891313 | Coke | Original | 0 | ....
002349823 | Fanta | Orange | 200 | ...


The other table that I have is what i run my clustering algorithm on. It's an NxN matrix, (where N is the number of distinct item_codes in the previous table) An entry [i,j] in the matrix, corresponds to the number of times j was bought after i was purchased on a previous trip. So more clearly, in the matrix below, what the number 1223 means is that, 1223 times after item 003428734 was purchase on an initial trip to the store, item 003428734 was purchased on the next trip.



          |003428734 |009849328 | 09840202 |....
003428734 | 1223 | 13 | 0 |
009849328 | 12 | 945 | 34 | ....
.
.
.


I apply a hierchical clustering on that matrix, using Ward 2, squared euclidean distance and standard Z scores. The final output is a dendrogram, with all the item_codes on the branches of the dendrogram.



This is where the tedious process is. The ultimate goal, is to have a hierarchy for our products, and see which attribute (brand, flavor, size...) falls where on the dendrogram. The only way i can think of doing it, is organizing the item codes in the first in the same order that they are in the dendrogram, put a picture of the dendrogram side by side, and eyeball which attribute clusters the most. You'll see in the link below, it looks like brand A and B are clustering together, so I would say that products cluster by brand first. I would then take a closer look at the smaller cluster within the dendrogram and try to identify where the other attributes fit in the hierarchy (the picture is just a snippet of the dendrogram, it is in reality much, much bigger)



dendrogram



This eye balling process is very tedious and annoying. Is there a way to run a similar cluster, which shows a hierarchy within the products, as well as indicate which attribute fit where in that hierarchy? Or is this even the best way to approach this?



I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question










share|improve this question









New contributor




Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$

















    1












    $begingroup$


    So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:



    Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:



    Item_code | Brand | Flavor   | Caloric_content | ... | sales
    006891313 | Coke | Original | 0 | ....
    002349823 | Fanta | Orange | 200 | ...


    The other table that I have is what i run my clustering algorithm on. It's an NxN matrix, (where N is the number of distinct item_codes in the previous table) An entry [i,j] in the matrix, corresponds to the number of times j was bought after i was purchased on a previous trip. So more clearly, in the matrix below, what the number 1223 means is that, 1223 times after item 003428734 was purchase on an initial trip to the store, item 003428734 was purchased on the next trip.



              |003428734 |009849328 | 09840202 |....
    003428734 | 1223 | 13 | 0 |
    009849328 | 12 | 945 | 34 | ....
    .
    .
    .


    I apply a hierchical clustering on that matrix, using Ward 2, squared euclidean distance and standard Z scores. The final output is a dendrogram, with all the item_codes on the branches of the dendrogram.



    This is where the tedious process is. The ultimate goal, is to have a hierarchy for our products, and see which attribute (brand, flavor, size...) falls where on the dendrogram. The only way i can think of doing it, is organizing the item codes in the first in the same order that they are in the dendrogram, put a picture of the dendrogram side by side, and eyeball which attribute clusters the most. You'll see in the link below, it looks like brand A and B are clustering together, so I would say that products cluster by brand first. I would then take a closer look at the smaller cluster within the dendrogram and try to identify where the other attributes fit in the hierarchy (the picture is just a snippet of the dendrogram, it is in reality much, much bigger)



    dendrogram



    This eye balling process is very tedious and annoying. Is there a way to run a similar cluster, which shows a hierarchy within the products, as well as indicate which attribute fit where in that hierarchy? Or is this even the best way to approach this?



    I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question










    share|improve this question









    New contributor




    Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.







    $endgroup$















      1












      1








      1





      $begingroup$


      So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:



      Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:



      Item_code | Brand | Flavor   | Caloric_content | ... | sales
      006891313 | Coke | Original | 0 | ....
      002349823 | Fanta | Orange | 200 | ...


      The other table that I have is what i run my clustering algorithm on. It's an NxN matrix, (where N is the number of distinct item_codes in the previous table) An entry [i,j] in the matrix, corresponds to the number of times j was bought after i was purchased on a previous trip. So more clearly, in the matrix below, what the number 1223 means is that, 1223 times after item 003428734 was purchase on an initial trip to the store, item 003428734 was purchased on the next trip.



                |003428734 |009849328 | 09840202 |....
      003428734 | 1223 | 13 | 0 |
      009849328 | 12 | 945 | 34 | ....
      .
      .
      .


      I apply a hierchical clustering on that matrix, using Ward 2, squared euclidean distance and standard Z scores. The final output is a dendrogram, with all the item_codes on the branches of the dendrogram.



      This is where the tedious process is. The ultimate goal, is to have a hierarchy for our products, and see which attribute (brand, flavor, size...) falls where on the dendrogram. The only way i can think of doing it, is organizing the item codes in the first in the same order that they are in the dendrogram, put a picture of the dendrogram side by side, and eyeball which attribute clusters the most. You'll see in the link below, it looks like brand A and B are clustering together, so I would say that products cluster by brand first. I would then take a closer look at the smaller cluster within the dendrogram and try to identify where the other attributes fit in the hierarchy (the picture is just a snippet of the dendrogram, it is in reality much, much bigger)



      dendrogram



      This eye balling process is very tedious and annoying. Is there a way to run a similar cluster, which shows a hierarchy within the products, as well as indicate which attribute fit where in that hierarchy? Or is this even the best way to approach this?



      I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question










      share|improve this question









      New contributor




      Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.







      $endgroup$




      So I worked on a hierarchical clustering algorithm to be able to determine which items are most similar, and what attributes are most important. I have two tables:



      Table 1: contains a bunch of item codes, and it's attribute (brand, flavor, sales, and so on). It looks something like:



      Item_code | Brand | Flavor   | Caloric_content | ... | sales
      006891313 | Coke | Original | 0 | ....
      002349823 | Fanta | Orange | 200 | ...


      The other table that I have is what i run my clustering algorithm on. It's an NxN matrix, (where N is the number of distinct item_codes in the previous table) An entry [i,j] in the matrix, corresponds to the number of times j was bought after i was purchased on a previous trip. So more clearly, in the matrix below, what the number 1223 means is that, 1223 times after item 003428734 was purchase on an initial trip to the store, item 003428734 was purchased on the next trip.



                |003428734 |009849328 | 09840202 |....
      003428734 | 1223 | 13 | 0 |
      009849328 | 12 | 945 | 34 | ....
      .
      .
      .


      I apply a hierchical clustering on that matrix, using Ward 2, squared euclidean distance and standard Z scores. The final output is a dendrogram, with all the item_codes on the branches of the dendrogram.



      This is where the tedious process is. The ultimate goal, is to have a hierarchy for our products, and see which attribute (brand, flavor, size...) falls where on the dendrogram. The only way i can think of doing it, is organizing the item codes in the first in the same order that they are in the dendrogram, put a picture of the dendrogram side by side, and eyeball which attribute clusters the most. You'll see in the link below, it looks like brand A and B are clustering together, so I would say that products cluster by brand first. I would then take a closer look at the smaller cluster within the dendrogram and try to identify where the other attributes fit in the hierarchy (the picture is just a snippet of the dendrogram, it is in reality much, much bigger)



      dendrogram



      This eye balling process is very tedious and annoying. Is there a way to run a similar cluster, which shows a hierarchy within the products, as well as indicate which attribute fit where in that hierarchy? Or is this even the best way to approach this?



      I should mention that I'm very new to these clustering algorithm, so apologies if this is a dumb question







      clustering hierarchical-data-format






      share|improve this question









      New contributor




      Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question









      New contributor




      Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question








      edited 3 hours ago







      Steven Cunden













      New contributor




      Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked 3 hours ago









      Steven CundenSteven Cunden

      62




      62




      New contributor




      Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      Steven Cunden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






















          0






          active

          oldest

          votes











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "557"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });






          Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46231%2fwhat-is-the-best-to-identify-the-proper-hierarchy-of-this-data%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          0






          active

          oldest

          votes








          0






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.













          Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.












          Steven Cunden is a new contributor. Be nice, and check out our Code of Conduct.
















          Thanks for contributing an answer to Data Science Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46231%2fwhat-is-the-best-to-identify-the-proper-hierarchy-of-this-data%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          is 'sed' thread safeWhat should someone know about using Python scripts in the shell?Nexenta bash script uses...

          How do i solve the “ No module named 'mlxtend' ” issue on Jupyter?

          Pilgersdorf Inhaltsverzeichnis Geografie | Geschichte | Bevölkerungsentwicklung | Politik | Kultur...