Finding the best “depth” of ICD9 codes with pseudo-hierarchical clusteringClustering with multiple...
Can a player sacrifice a creature after declaring that creature as blocker while taking lethal damage?
Crack the bank account's password!
Does it take energy to move something in a circle?
In harmony: key or the flow?
Renting a 2CV in France
How to play a serial killer in a game with good PCs?
Eww, those bytes are gross
What does an unprocessed RAW file look like?
What's this assembly doing?
Integration of two exponential multiplied by each other
When Are Enum Values Defined?
Translation needed for 130 years old church document
Why are carbons of Inositol chiral centers?
How to not let the Identify spell spoil everything?
Does an Eldritch Knight's Weapon Bond protect him from losing his weapon to a Telekinesis spell?
Critique vs nitpicking
Does Skippy chunky peanut butter contain trans fat?
Equivalent of "illegal" for violating civil law
How do you get out of your own psychology to write characters?
Midterm in Mathematics Courses
Is there a file that always exists and a 'normal' user can't lstat it?
What species should be used for storage of human minds?
Does diversity provide anything that meritocracy does not?
Categorical Unification of Jordan Holder Theorems
Finding the best “depth” of ICD9 codes with pseudo-hierarchical clustering
Clustering with multiple distance measuresComparing Kmedoid, Kmean and hierarchical clustering results?Clustering of devices in locations?What approach other than Tf-Idf could I use for text-clustering using K-Means?Classifying variable types on a list of variablesWhat is the best way to visualize the relationship two categorical variablesDatasets for Weighted graph clustering with detailed ground truthUsing an ontology to recognize named entities in free textWhat best/correct algorithm/procedure to cluster a dataset with a lot 0's?What is the best to identify the proper hierarchy of this data?
$begingroup$
Here is a common problem in health care modeling. Did I just invent a new algorithm or has someone already thought of this?
The goal is to find the most homogeneous partition of patients by medical costs using ICD9 codes. There are 13,000 individual codes in the data set, so using the full code results in many only having a few observations.
ICD9 codes are in a nested hierarchical structure. For instance, all infectious diseases are 001-139, one particular disease is Cholera (ICD9 001), and this can have several other suffixes which further specify the illness. This is a "drill down" so to speak. 001.0 and 001.1 are different types of the same disease.
The goal is to find the best level of detail for the data set. For example, say that these codes are binary for simplicity, and are only 3 digits long. Then the only possible codes are 000, 010, 011, 001, 111, 110, 101, 001.
One way of searching through all possible substrings of these would be to use a decision tree. The x matrix would be all substring combinations, and the response would be the average costs for that particular code.
X would have a column for substring 0, 01, 010, 011, etc. Each row of X would have 2^3*3 = 24 columns. Each row would represent a single code.
The top 3 rows of X would look like this. The response Y would be the average dollar amount of healthcare costs for that ICD code.
Once the data is in this format, a decision tree (or other model) could be used in order to determine which substring prefixes should stay in the model. Because different ICD codes have different number of patients, this would be used as a weight vector. The model could be tuned to allow deeper trees (more digits).
Is there a name for this already?
Thanks
clustering embeddings hierarchical-data-format
New contributor
$endgroup$
add a comment |
$begingroup$
Here is a common problem in health care modeling. Did I just invent a new algorithm or has someone already thought of this?
The goal is to find the most homogeneous partition of patients by medical costs using ICD9 codes. There are 13,000 individual codes in the data set, so using the full code results in many only having a few observations.
ICD9 codes are in a nested hierarchical structure. For instance, all infectious diseases are 001-139, one particular disease is Cholera (ICD9 001), and this can have several other suffixes which further specify the illness. This is a "drill down" so to speak. 001.0 and 001.1 are different types of the same disease.
The goal is to find the best level of detail for the data set. For example, say that these codes are binary for simplicity, and are only 3 digits long. Then the only possible codes are 000, 010, 011, 001, 111, 110, 101, 001.
One way of searching through all possible substrings of these would be to use a decision tree. The x matrix would be all substring combinations, and the response would be the average costs for that particular code.
X would have a column for substring 0, 01, 010, 011, etc. Each row of X would have 2^3*3 = 24 columns. Each row would represent a single code.
The top 3 rows of X would look like this. The response Y would be the average dollar amount of healthcare costs for that ICD code.
Once the data is in this format, a decision tree (or other model) could be used in order to determine which substring prefixes should stay in the model. Because different ICD codes have different number of patients, this would be used as a weight vector. The model could be tuned to allow deeper trees (more digits).
Is there a name for this already?
Thanks
clustering embeddings hierarchical-data-format
New contributor
$endgroup$
add a comment |
$begingroup$
Here is a common problem in health care modeling. Did I just invent a new algorithm or has someone already thought of this?
The goal is to find the most homogeneous partition of patients by medical costs using ICD9 codes. There are 13,000 individual codes in the data set, so using the full code results in many only having a few observations.
ICD9 codes are in a nested hierarchical structure. For instance, all infectious diseases are 001-139, one particular disease is Cholera (ICD9 001), and this can have several other suffixes which further specify the illness. This is a "drill down" so to speak. 001.0 and 001.1 are different types of the same disease.
The goal is to find the best level of detail for the data set. For example, say that these codes are binary for simplicity, and are only 3 digits long. Then the only possible codes are 000, 010, 011, 001, 111, 110, 101, 001.
One way of searching through all possible substrings of these would be to use a decision tree. The x matrix would be all substring combinations, and the response would be the average costs for that particular code.
X would have a column for substring 0, 01, 010, 011, etc. Each row of X would have 2^3*3 = 24 columns. Each row would represent a single code.
The top 3 rows of X would look like this. The response Y would be the average dollar amount of healthcare costs for that ICD code.
Once the data is in this format, a decision tree (or other model) could be used in order to determine which substring prefixes should stay in the model. Because different ICD codes have different number of patients, this would be used as a weight vector. The model could be tuned to allow deeper trees (more digits).
Is there a name for this already?
Thanks
clustering embeddings hierarchical-data-format
New contributor
$endgroup$
Here is a common problem in health care modeling. Did I just invent a new algorithm or has someone already thought of this?
The goal is to find the most homogeneous partition of patients by medical costs using ICD9 codes. There are 13,000 individual codes in the data set, so using the full code results in many only having a few observations.
ICD9 codes are in a nested hierarchical structure. For instance, all infectious diseases are 001-139, one particular disease is Cholera (ICD9 001), and this can have several other suffixes which further specify the illness. This is a "drill down" so to speak. 001.0 and 001.1 are different types of the same disease.
The goal is to find the best level of detail for the data set. For example, say that these codes are binary for simplicity, and are only 3 digits long. Then the only possible codes are 000, 010, 011, 001, 111, 110, 101, 001.
One way of searching through all possible substrings of these would be to use a decision tree. The x matrix would be all substring combinations, and the response would be the average costs for that particular code.
X would have a column for substring 0, 01, 010, 011, etc. Each row of X would have 2^3*3 = 24 columns. Each row would represent a single code.
The top 3 rows of X would look like this. The response Y would be the average dollar amount of healthcare costs for that ICD code.
Once the data is in this format, a decision tree (or other model) could be used in order to determine which substring prefixes should stay in the model. Because different ICD codes have different number of patients, this would be used as a weight vector. The model could be tuned to allow deeper trees (more digits).
Is there a name for this already?
Thanks
clustering embeddings hierarchical-data-format
clustering embeddings hierarchical-data-format
New contributor
New contributor
New contributor
asked 13 mins ago
Sam CastilloSam Castillo
1
1
New contributor
New contributor
add a comment |
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46239%2ffinding-the-best-depth-of-icd9-codes-with-pseudo-hierarchical-clustering%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.
Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.
Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.
Sam Castillo is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46239%2ffinding-the-best-depth-of-icd9-codes-with-pseudo-hierarchical-clustering%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown