Finding the best “depth” of ICD9 codes with pseudo-hierarchical clusteringClustering with multiple...

Can a player sacrifice a creature after declaring that creature as blocker while taking lethal damage?

Crack the bank account's password!

Does it take energy to move something in a circle?

In harmony: key or the flow?

Renting a 2CV in France

How to play a serial killer in a game with good PCs?

Eww, those bytes are gross

What does an unprocessed RAW file look like?

What's this assembly doing?

Integration of two exponential multiplied by each other

When Are Enum Values Defined?

Translation needed for 130 years old church document

Why are carbons of Inositol chiral centers?

How to not let the Identify spell spoil everything?

Does an Eldritch Knight's Weapon Bond protect him from losing his weapon to a Telekinesis spell?

Critique vs nitpicking

Does Skippy chunky peanut butter contain trans fat?

Equivalent of "illegal" for violating civil law

How do you get out of your own psychology to write characters?

Midterm in Mathematics Courses

Is there a file that always exists and a 'normal' user can't lstat it?

What species should be used for storage of human minds?

Does diversity provide anything that meritocracy does not?

Categorical Unification of Jordan Holder Theorems



Finding the best “depth” of ICD9 codes with pseudo-hierarchical clustering


Clustering with multiple distance measuresComparing Kmedoid, Kmean and hierarchical clustering results?Clustering of devices in locations?What approach other than Tf-Idf could I use for text-clustering using K-Means?Classifying variable types on a list of variablesWhat is the best way to visualize the relationship two categorical variablesDatasets for Weighted graph clustering with detailed ground truthUsing an ontology to recognize named entities in free textWhat best/correct algorithm/procedure to cluster a dataset with a lot 0's?What is the best to identify the proper hierarchy of this data?













0












$begingroup$


Here is a common problem in health care modeling. Did I just invent a new algorithm or has someone already thought of this?



The goal is to find the most homogeneous partition of patients by medical costs using ICD9 codes. There are 13,000 individual codes in the data set, so using the full code results in many only having a few observations.



ICD9 codes are in a nested hierarchical structure. For instance, all infectious diseases are 001-139, one particular disease is Cholera (ICD9 001), and this can have several other suffixes which further specify the illness. This is a "drill down" so to speak. 001.0 and 001.1 are different types of the same disease.



The goal is to find the best level of detail for the data set. For example, say that these codes are binary for simplicity, and are only 3 digits long. Then the only possible codes are 000, 010, 011, 001, 111, 110, 101, 001.



One way of searching through all possible substrings of these would be to use a decision tree. The x matrix would be all substring combinations, and the response would be the average costs for that particular code.



X would have a column for substring 0, 01, 010, 011, etc. Each row of X would have 2^3*3 = 24 columns. Each row would represent a single code.



enter image description here



The top 3 rows of X would look like this. The response Y would be the average dollar amount of healthcare costs for that ICD code.



enter image description here



Once the data is in this format, a decision tree (or other model) could be used in order to determine which substring prefixes should stay in the model. Because different ICD codes have different number of patients, this would be used as a weight vector. The model could be tuned to allow deeper trees (more digits).



Is there a name for this already?
Thanks










share|improve this question







New contributor




Sam Castillo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$

















    0












    $begingroup$


    Here is a common problem in health care modeling. Did I just invent a new algorithm or has someone already thought of this?



    The goal is to find the most homogeneous partition of patients by medical costs using ICD9 codes. There are 13,000 individual codes in the data set, so using the full code results in many only having a few observations.



    ICD9 codes are in a nested hierarchical structure. For instance, all infectious diseases are 001-139, one particular disease is Cholera (ICD9 001), and this can have several other suffixes which further specify the illness. This is a "drill down" so to speak. 001.0 and 001.1 are different types of the same disease.



    The goal is to find the best level of detail for the data set. For example, say that these codes are binary for simplicity, and are only 3 digits long. Then the only possible codes are 000, 010, 011, 001, 111, 110, 101, 001.



    One way of searching through all possible substrings of these would be to use a decision tree. The x matrix would be all substring combinations, and the response would be the average costs for that particular code.



    X would have a column for substring 0, 01, 010, 011, etc. Each row of X would have 2^3*3 = 24 columns. Each row would represent a single code.



    enter image description here



    The top 3 rows of X would look like this. The response Y would be the average dollar amount of healthcare costs for that ICD code.



    enter image description here



    Once the data is in this format, a decision tree (or other model) could be used in order to determine which substring prefixes should stay in the model. Because different ICD codes have different number of patients, this would be used as a weight vector. The model could be tuned to allow deeper trees (more digits).



    Is there a name for this already?
    Thanks










    share|improve this question







    New contributor




    Sam Castillo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.







    $endgroup$















      0












      0








      0





      $begingroup$


      Here is a common problem in health care modeling. Did I just invent a new algorithm or has someone already thought of this?



      The goal is to find the most homogeneous partition of patients by medical costs using ICD9 codes. There are 13,000 individual codes in the data set, so using the full code results in many only having a few observations.



      ICD9 codes are in a nested hierarchical structure. For instance, all infectious diseases are 001-139, one particular disease is Cholera (ICD9 001), and this can have several other suffixes which further specify the illness. This is a "drill down" so to speak. 001.0 and 001.1 are different types of the same disease.



      The goal is to find the best level of detail for the data set. For example, say that these codes are binary for simplicity, and are only 3 digits long. Then the only possible codes are 000, 010, 011, 001, 111, 110, 101, 001.



      One way of searching through all possible substrings of these would be to use a decision tree. The x matrix would be all substring combinations, and the response would be the average costs for that particular code.



      X would have a column for substring 0, 01, 010, 011, etc. Each row of X would have 2^3*3 = 24 columns. Each row would represent a single code.



      enter image description here



      The top 3 rows of X would look like this. The response Y would be the average dollar amount of healthcare costs for that ICD code.



      enter image description here



      Once the data is in this format, a decision tree (or other model) could be used in order to determine which substring prefixes should stay in the model. Because different ICD codes have different number of patients, this would be used as a weight vector. The model could be tuned to allow deeper trees (more digits).



      Is there a name for this already?
      Thanks










      share|improve this question







      New contributor




      Sam Castillo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.







      $endgroup$




      Here is a common problem in health care modeling. Did I just invent a new algorithm or has someone already thought of this?



      The goal is to find the most homogeneous partition of patients by medical costs using ICD9 codes. There are 13,000 individual codes in the data set, so using the full code results in many only having a few observations.



      ICD9 codes are in a nested hierarchical structure. For instance, all infectious diseases are 001-139, one particular disease is Cholera (ICD9 001), and this can have several other suffixes which further specify the illness. This is a "drill down" so to speak. 001.0 and 001.1 are different types of the same disease.



      The goal is to find the best level of detail for the data set. For example, say that these codes are binary for simplicity, and are only 3 digits long. Then the only possible codes are 000, 010, 011, 001, 111, 110, 101, 001.



      One way of searching through all possible substrings of these would be to use a decision tree. The x matrix would be all substring combinations, and the response would be the average costs for that particular code.



      X would have a column for substring 0, 01, 010, 011, etc. Each row of X would have 2^3*3 = 24 columns. Each row would represent a single code.



      enter image description here



      The top 3 rows of X would look like this. The response Y would be the average dollar amount of healthcare costs for that ICD code.



      enter image description here



      Once the data is in this format, a decision tree (or other model) could be used in order to determine which substring prefixes should stay in the model. Because different ICD codes have different number of patients, this would be used as a weight vector. The model could be tuned to allow deeper trees (more digits).



      Is there a name for this already?
      Thanks







      clustering embeddings hierarchical-data-format






      share|improve this question







      New contributor




      Sam Castillo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question







      New contributor




      Sam Castillo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question






      New contributor




      Sam Castillo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked 13 mins ago









      Sam CastilloSam Castillo

      1




      1




      New contributor




      Sam Castillo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      Sam Castillo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      Sam Castillo is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






















          0






          active

          oldest

          votes











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "557"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });






          Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46239%2ffinding-the-best-depth-of-icd9-codes-with-pseudo-hierarchical-clustering%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          0






          active

          oldest

          votes








          0






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.













          Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.












          Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.
















          Thanks for contributing an answer to Data Science Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46239%2ffinding-the-best-depth-of-icd9-codes-with-pseudo-hierarchical-clustering%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          is 'sed' thread safeWhat should someone know about using Python scripts in the shell?Nexenta bash script uses...

          How do i solve the “ No module named 'mlxtend' ” issue on Jupyter?

          Pilgersdorf Inhaltsverzeichnis Geografie | Geschichte | Bevölkerungsentwicklung | Politik | Kultur...