How to subset dataframe based on a “not equal to” criteria applied to a large number of columns?How to...

"You are your self first supporter", a more proper way to say it

Is it legal for company to use my work email to pretend I still work there?

Why was the small council so happy for Tyrion to become the Master of Coin?

Method of fabrication patents, Is it okay to import from abroad?

Smoothness of finite-dimensional functional calculus

Have astronauts in space suits ever taken selfies? If so, how?

Is it tax fraud for an individual to declare non-taxable revenue as taxable income? (US tax laws)

Font hinting is lost in Chrome-like browsers (for some languages )

A newer friend of my brother's gave him a load of baseball cards that are supposedly extremely valuable. Is this a scam?

Why Is Death Allowed In the Matrix?

Why don't electron-positron collisions release infinite energy?

Can an x86 CPU running in real mode be considered to be basically an 8086 CPU?

Equivalence principle before Einstein

What defenses are there against being summoned by the Gate spell?

To string or not to string

In Japanese, what’s the difference between “Tonari ni” (となりに) and “Tsugi” (つぎ)? When would you use one over the other?

I'm planning on buying a laser printer but concerned about the life cycle of toner in the machine

Is it possible to do 50 km distance without any previous training?

When a company launches a new product do they "come out" with a new product or do they "come up" with a new product?

Assigning pointers to atomic type to pointers to non atomic type

Writing rule stating superpower from different root cause is bad writing

Can a Warlock become Neutral Good?

Can I ask the recruiters in my resume to put the reason why I am rejected?

Hiring someone is unethical to Kantians because you're treating them as a means?

How to subset dataframe based on a “not equal to” criteria applied to a large number of columns?

How to sort a dataframe by multiple column(s)Extract a subset of a dataframe based on a condition involving a fieldHow to change the order of DataFrame columns?How to apply a function to two columns of Pandas dataframeHow to drop rows of Pandas DataFrame whose value in certain columns is NaNSelect rows from a DataFrame based on values in a column in pandasHow to convert index of a pandas dataframe into a column?How to count the NaN values in a column in pandas DataFramesubset a dataframe based on sum of a columnSubset dataframe based on number of observations in each column

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I'm new to R and currently trying to subset my data according to my predefined exclusion criteria for analysis. I'm presently trying to remove all cases that have dementia, as coded by the ICD-10. Problem is that there are multiple variables containing information on each individual's disease status (~70 variables), although as they are coded in the same way, the same condition can be applied to all of them.

Some simulated data:

#Create dataframe containing simulated data

df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),

                    disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),

                    disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),

                    disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))



#data is structured as below:



     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

3  1003           G560            G20             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

6  1006           F011           A049           A481

7  1007           F023             NA             NA

8  1008           C761             NA             NA

9  1009           H653           G300             NA

10 1010           A049           G308             NA

11 1011           J679           A045           D352

Here, I'm trying to remove any case that has a 'dementia code' across any of the "disease_code" variables.

#Remove cases with dementia from dataframe (e.g. F023, G20)

Newdata_df <- subset(df, (2:4 != "F023"|"G20"|"F009"|"F002"|"F001"|"F000"|"F00"|    

                    "G309"| "G308"|"G301"|"G300"|"G30"| "F01"|"F018"|"F013"|

                    "F012"| "F011"| "F010"|"F01"))

The error that I recieve is:

Error in 2:4 != "F023" | "G20" : 

  operations are possible only for numeric, logical or complex types

Ideally, the subsetted dataframe would look like this:

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

I know that there is an error in my code although I'm not sure how exactly to fix it. I've tried a few other ways (using dplyr) although haven't had any luck so far.

Any help is greatly appreciated!

edited Mar 29 at 12:55

Sotos

31.2k51741

asked Mar 29 at 12:40

M_Oxford

593

1

You should reshape your data to long format. That will make your life (and analysis) much easier.

– docendo discimus
Mar 29 at 12:51

add a comment |

Some simulated data:

#Create dataframe containing simulated data

df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),

                    disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),

                    disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),

                    disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))



#data is structured as below:



     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

3  1003           G560            G20             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

6  1006           F011           A049           A481

7  1007           F023             NA             NA

8  1008           C761             NA             NA

9  1009           H653           G300             NA

10 1010           A049           G308             NA

11 1011           J679           A045           D352

Here, I'm trying to remove any case that has a 'dementia code' across any of the "disease_code" variables.

#Remove cases with dementia from dataframe (e.g. F023, G20)

Newdata_df <- subset(df, (2:4 != "F023"|"G20"|"F009"|"F002"|"F001"|"F000"|"F00"|    

                    "G309"| "G308"|"G301"|"G300"|"G30"| "F01"|"F018"|"F013"|

                    "F012"| "F011"| "F010"|"F01"))

The error that I recieve is:

Error in 2:4 != "F023" | "G20" : 

  operations are possible only for numeric, logical or complex types

Ideally, the subsetted dataframe would look like this:

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

I know that there is an error in my code although I'm not sure how exactly to fix it. I've tried a few other ways (using dplyr) although haven't had any luck so far.

Any help is greatly appreciated!

edited Mar 29 at 12:55

Sotos

31.2k51741

asked Mar 29 at 12:40

M_Oxford

593

1

You should reshape your data to long format. That will make your life (and analysis) much easier.

– docendo discimus
Mar 29 at 12:51

add a comment |

Some simulated data:

#Create dataframe containing simulated data

df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),

                    disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),

                    disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),

                    disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))



#data is structured as below:



     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

3  1003           G560            G20             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

6  1006           F011           A049           A481

7  1007           F023             NA             NA

8  1008           C761             NA             NA

9  1009           H653           G300             NA

10 1010           A049           G308             NA

11 1011           J679           A045           D352

Here, I'm trying to remove any case that has a 'dementia code' across any of the "disease_code" variables.

#Remove cases with dementia from dataframe (e.g. F023, G20)

Newdata_df <- subset(df, (2:4 != "F023"|"G20"|"F009"|"F002"|"F001"|"F000"|"F00"|    

                    "G309"| "G308"|"G301"|"G300"|"G30"| "F01"|"F018"|"F013"|

                    "F012"| "F011"| "F010"|"F01"))

The error that I recieve is:

Error in 2:4 != "F023" | "G20" : 

  operations are possible only for numeric, logical or complex types

Ideally, the subsetted dataframe would look like this:

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

I know that there is an error in my code although I'm not sure how exactly to fix it. I've tried a few other ways (using dplyr) although haven't had any luck so far.

Any help is greatly appreciated!

edited Mar 29 at 12:55

Sotos

31.2k51741

asked Mar 29 at 12:40

M_Oxford

593

Some simulated data:

#Create dataframe containing simulated data

df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),

                    disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),

                    disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),

                    disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))



#data is structured as below:



     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

3  1003           G560            G20             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

6  1006           F011           A049           A481

7  1007           F023             NA             NA

8  1008           C761             NA             NA

9  1009           H653           G300             NA

10 1010           A049           G308             NA

11 1011           J679           A045           D352

Here, I'm trying to remove any case that has a 'dementia code' across any of the "disease_code" variables.

#Remove cases with dementia from dataframe (e.g. F023, G20)

Newdata_df <- subset(df, (2:4 != "F023"|"G20"|"F009"|"F002"|"F001"|"F000"|"F00"|    

                    "G309"| "G308"|"G301"|"G300"|"G30"| "F01"|"F018"|"F013"|

                    "F012"| "F011"| "F010"|"F01"))

The error that I recieve is:

Error in 2:4 != "F023" | "G20" : 

  operations are possible only for numeric, logical or complex types

Ideally, the subsetted dataframe would look like this:

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

I know that there is an error in my code although I'm not sure how exactly to fix it. I've tried a few other ways (using dplyr) although haven't had any luck so far.

Any help is greatly appreciated!

r dataframe filter subset

edited Mar 29 at 12:55

Sotos

31.2k51741

asked Mar 29 at 12:40

M_Oxford

593

edited Mar 29 at 12:55

Sotos

31.2k51741

asked Mar 29 at 12:40

M_Oxford

593

edited Mar 29 at 12:55

Sotos

31.2k51741

edited Mar 29 at 12:55

Sotos

31.2k51741

edited Mar 29 at 12:55

Sotos

31.2k51741

asked Mar 29 at 12:40

M_Oxford

593

asked Mar 29 at 12:40

M_Oxford

593

asked Mar 29 at 12:40

M_Oxford

593

1

You should reshape your data to long format. That will make your life (and analysis) much easier.

– docendo discimus
Mar 29 at 12:51

add a comment |

1

You should reshape your data to long format. That will make your life (and analysis) much easier.

– docendo discimus
Mar 29 at 12:51

You should reshape your data to long format. That will make your life (and analysis) much easier.

– docendo discimus
Mar 29 at 12:51

add a comment |

6 Answers
6

active

oldest

votes

One dplyr possibility could be:

df %>%

 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))



    ID disease_code_1 disease_code_2 disease_code_3

1 1001           I802           A071           H250

2 1002           H356             NA             NA

3 1004           D235             NA           I802

4 1005           B178             NA             NA

5 1008           C761             NA             NA

6 1011           J679           A045           D352

In this case, it checks whether any of the columns 2:4 contains any of the given codes.

Or:

df %>%

 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))

In this case, it checks whether any of the columns with names disease_code contains any of the given codes.

edited Mar 29 at 13:29

answered Mar 29 at 12:52

tmfmnk

3,6661516

1

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
Mar 29 at 14:22

add a comment |

We can create a vector with the codes to be removed and use rowSums to remove, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

                "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

answered Mar 29 at 12:52

Sotos

31.2k51741

add a comment |

As mentioned in comments by @docendo discimus we can convert the dataframe to long format using gather, group_by ID and select only those IDs which do not have dementia_code in them and then spread them back to wide format.

library(tidyverse)



df %>%

   gather(key, value, -ID) %>%

   group_by(ID) %>%

   filter(!any(value %in% dementia_code)) %>%

   spread(key, value)



#   ID disease_code_1 disease_code_2 disease_code_3

#  <dbl> <chr>          <chr>          <chr>         

#1  1001 I802           A071           H250          

#2  1002 H356           NA             NA            

#3  1004 D235           NA             I802          

#4  1005 B178           NA             NA            

#5  1008 C761           NA             NA            

#6  1011 J679           A045           D352

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 

"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited Mar 29 at 13:18

answered Mar 29 at 13:09

Ronak Shah

45.3k104266

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
Mar 29 at 13:25

1

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
Mar 29 at 13:30

3

We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
Mar 29 at 13:39

add a comment |

How about this:

> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

+               "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

> 

> dementia <- apply(sapply(df[, -1], function(x) {x %in% dementia}), 1, any)

> 

> df[!dementia,]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

>

Edit:

An even more elegant solution, thanks to @ Ronan Shah:

> df[apply(df[-1], 1, function(x) {!any(x %in% dementia)}),]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

Hope it helps.

edited Mar 29 at 13:37

answered Mar 29 at 12:52

Santiago Capobianco

491310

@ Ronan Shah Nice! Its a more elegant solution. You should post it.

– Santiago Capobianco
Mar 29 at 13:19

1

Yes! Sorry, I will change it right away.

– Santiago Capobianco
Mar 29 at 13:36

add a comment |

We can use melt/dcast from data.table

library(data.table)

dcast(melt(setDT(df), id.var = 'ID')[,

     if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)

#    ID disease_code_1 disease_code_2 disease_code_3

#1: 1001           I802           A071           H250

#2: 1002           H356             NA             NA

#3: 1004           D235             NA           I802

#4: 1005           B178             NA             NA

#5: 1008           C761             NA             NA

#6: 1011           J679           A045           D352

Or this can be done more compactly in base R with no reshaping

df[!Reduce(`|`, lapply(df[-1], `%in%` , dementia_codes)),]

 #   ID disease_code_1 disease_code_2 disease_code_3

#1  1001           I802           A071           H250

#2  1002           H356             NA             NA

#4  1004           D235             NA           I802

#5  1005           B178             NA             NA

#8  1008           C761             NA             NA

#11 1011           J679           A045           D352

data

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", 

  "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", 

   "F012", "F011", "F010", "F01")

edited Mar 29 at 14:41

answered Mar 29 at 14:24

akrun

419k13207284

add a comment |

A for loop version with base R, in case you prefer that.

df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),

                disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),

                disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),

                disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)



dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



new_df <- df[0,]



for(i in 1:nrow(df)){

  currRow <- df[i,]

  if(any(dementia_codes %in% as.character(currRow)) == FALSE){

    new_df <- rbind(new_df, currRow)

  }

}



new_df

#      ID disease_code_1 disease_code_2 disease_code_3

# 1  1001           I802           A071           H250

# 2  1002           H356             NA             NA

# 4  1004           D235             NA           I802

# 5  1005           B178             NA             NA

# 8  1008           C761             NA             NA

# 11 1011           J679           A045           D352

edited Mar 29 at 13:23

answered Mar 29 at 13:10

Dunois

858

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55417645%2fhow-to-subset-dataframe-based-on-a-not-equal-to-criteria-applied-to-a-large-nu%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

One dplyr possibility could be:

df %>%

 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))



    ID disease_code_1 disease_code_2 disease_code_3

1 1001           I802           A071           H250

2 1002           H356             NA             NA

3 1004           D235             NA           I802

4 1005           B178             NA             NA

5 1008           C761             NA             NA

6 1011           J679           A045           D352

In this case, it checks whether any of the columns 2:4 contains any of the given codes.

Or:

df %>%

 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))

In this case, it checks whether any of the columns with names disease_code contains any of the given codes.

edited Mar 29 at 13:29

answered Mar 29 at 12:52

tmfmnk

3,6661516

1

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
Mar 29 at 14:22

add a comment |

One dplyr possibility could be:

df %>%

 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))



    ID disease_code_1 disease_code_2 disease_code_3

1 1001           I802           A071           H250

2 1002           H356             NA             NA

3 1004           D235             NA           I802

4 1005           B178             NA             NA

5 1008           C761             NA             NA

6 1011           J679           A045           D352

In this case, it checks whether any of the columns 2:4 contains any of the given codes.

Or:

df %>%

 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))

In this case, it checks whether any of the columns with names disease_code contains any of the given codes.

edited Mar 29 at 13:29

answered Mar 29 at 12:52

tmfmnk

3,6661516

1

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
Mar 29 at 14:22

add a comment |

One dplyr possibility could be:

df %>%

 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))



    ID disease_code_1 disease_code_2 disease_code_3

1 1001           I802           A071           H250

2 1002           H356             NA             NA

3 1004           D235             NA           I802

4 1005           B178             NA             NA

5 1008           C761             NA             NA

6 1011           J679           A045           D352

In this case, it checks whether any of the columns 2:4 contains any of the given codes.

Or:

df %>%

 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))

In this case, it checks whether any of the columns with names disease_code contains any of the given codes.

edited Mar 29 at 13:29

answered Mar 29 at 12:52

tmfmnk

3,6661516

One dplyr possibility could be:

df %>%

 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))



    ID disease_code_1 disease_code_2 disease_code_3

1 1001           I802           A071           H250

2 1002           H356             NA             NA

3 1004           D235             NA           I802

4 1005           B178             NA             NA

5 1008           C761             NA             NA

6 1011           J679           A045           D352

In this case, it checks whether any of the columns 2:4 contains any of the given codes.

Or:

df %>%

 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",    

            "G309", "G308","G301","G300","G30", "F01","F018","F013",

            "F012", "F011", "F010","F01")))

In this case, it checks whether any of the columns with names disease_code contains any of the given codes.

edited Mar 29 at 13:29

answered Mar 29 at 12:52

tmfmnk

3,6661516

edited Mar 29 at 13:29

answered Mar 29 at 12:52

tmfmnk

3,6661516

answered Mar 29 at 12:52

tmfmnk

3,6661516

answered Mar 29 at 12:52

tmfmnk

3,6661516

1

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
Mar 29 at 14:22

add a comment |

1

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
Mar 29 at 14:22

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
Mar 29 at 14:22

add a comment |

We can create a vector with the codes to be removed and use rowSums to remove, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

                "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

answered Mar 29 at 12:52

Sotos

31.2k51741

add a comment |

We can create a vector with the codes to be removed and use rowSums to remove, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

                "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

answered Mar 29 at 12:52

Sotos

31.2k51741

add a comment |

We can create a vector with the codes to be removed and use rowSums to remove, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

                "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

answered Mar 29 at 12:52

Sotos

31.2k51741

We can create a vector with the codes to be removed and use rowSums to remove, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

                "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

answered Mar 29 at 12:52

Sotos

31.2k51741

answered Mar 29 at 12:52

Sotos

31.2k51741

answered Mar 29 at 12:52

Sotos

31.2k51741

answered Mar 29 at 12:52

Sotos

31.2k51741

add a comment |

library(tidyverse)



df %>%

   gather(key, value, -ID) %>%

   group_by(ID) %>%

   filter(!any(value %in% dementia_code)) %>%

   spread(key, value)



#   ID disease_code_1 disease_code_2 disease_code_3

#  <dbl> <chr>          <chr>          <chr>         

#1  1001 I802           A071           H250          

#2  1002 H356           NA             NA            

#3  1004 D235           NA             I802          

#4  1005 B178           NA             NA            

#5  1008 C761           NA             NA            

#6  1011 J679           A045           D352

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 

"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited Mar 29 at 13:18

answered Mar 29 at 13:09

Ronak Shah

45.3k104266

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
Mar 29 at 13:25

1

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
Mar 29 at 13:30

3

We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
Mar 29 at 13:39

add a comment |

library(tidyverse)



df %>%

   gather(key, value, -ID) %>%

   group_by(ID) %>%

   filter(!any(value %in% dementia_code)) %>%

   spread(key, value)



#   ID disease_code_1 disease_code_2 disease_code_3

#  <dbl> <chr>          <chr>          <chr>         

#1  1001 I802           A071           H250          

#2  1002 H356           NA             NA            

#3  1004 D235           NA             I802          

#4  1005 B178           NA             NA            

#5  1008 C761           NA             NA            

#6  1011 J679           A045           D352

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 

"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited Mar 29 at 13:18

answered Mar 29 at 13:09

Ronak Shah

45.3k104266

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
Mar 29 at 13:25

1

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
Mar 29 at 13:30

3

We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
Mar 29 at 13:39

add a comment |

library(tidyverse)



df %>%

   gather(key, value, -ID) %>%

   group_by(ID) %>%

   filter(!any(value %in% dementia_code)) %>%

   spread(key, value)



#   ID disease_code_1 disease_code_2 disease_code_3

#  <dbl> <chr>          <chr>          <chr>         

#1  1001 I802           A071           H250          

#2  1002 H356           NA             NA            

#3  1004 D235           NA             I802          

#4  1005 B178           NA             NA            

#5  1008 C761           NA             NA            

#6  1011 J679           A045           D352

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 

"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited Mar 29 at 13:18

answered Mar 29 at 13:09

Ronak Shah

45.3k104266

library(tidyverse)



df %>%

   gather(key, value, -ID) %>%

   group_by(ID) %>%

   filter(!any(value %in% dementia_code)) %>%

   spread(key, value)



#   ID disease_code_1 disease_code_2 disease_code_3

#  <dbl> <chr>          <chr>          <chr>         

#1  1001 I802           A071           H250          

#2  1002 H356           NA             NA            

#3  1004 D235           NA             I802          

#4  1005 B178           NA             NA            

#5  1008 C761           NA             NA            

#6  1011 J679           A045           D352

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 

"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited Mar 29 at 13:18

answered Mar 29 at 13:09

Ronak Shah

45.3k104266

edited Mar 29 at 13:18

answered Mar 29 at 13:09

Ronak Shah

45.3k104266

answered Mar 29 at 13:09

Ronak Shah

45.3k104266

answered Mar 29 at 13:09

Ronak Shah

45.3k104266

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
Mar 29 at 13:25

1

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
Mar 29 at 13:30

3

We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
Mar 29 at 13:39

add a comment |

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
Mar 29 at 13:25

1

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
Mar 29 at 13:30

3

We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
Mar 29 at 13:39

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
Mar 29 at 13:25

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
Mar 29 at 13:30

We could also do it using an anti_join such as

Newdata_df <-    df %>%    anti_join(df %>%                gather(DiseaseCodeNumber, CodeValue, -ID) %>%                filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00",                                           "G309", "G308","G301","G300","G30","F01","F018","F013",                                       "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
Mar 29 at 13:39

We could also do it using an anti_join such as

Newdata_df <-    df %>%    anti_join(df %>%                gather(DiseaseCodeNumber, CodeValue, -ID) %>%                filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00",                                           "G309", "G308","G301","G300","G30","F01","F018","F013",                                       "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
Mar 29 at 13:39

add a comment |

How about this:

> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

+               "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

> 

> dementia <- apply(sapply(df[, -1], function(x) {x %in% dementia}), 1, any)

> 

> df[!dementia,]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

>

Edit:

An even more elegant solution, thanks to @ Ronan Shah:

> df[apply(df[-1], 1, function(x) {!any(x %in% dementia)}),]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

Hope it helps.

edited Mar 29 at 13:37

answered Mar 29 at 12:52

Santiago Capobianco

491310

@ Ronan Shah Nice! Its a more elegant solution. You should post it.

– Santiago Capobianco
Mar 29 at 13:19

1

Yes! Sorry, I will change it right away.

– Santiago Capobianco
Mar 29 at 13:36

add a comment |

How about this:

> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

+               "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

> 

> dementia <- apply(sapply(df[, -1], function(x) {x %in% dementia}), 1, any)

> 

> df[!dementia,]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

>

Edit:

An even more elegant solution, thanks to @ Ronan Shah:

> df[apply(df[-1], 1, function(x) {!any(x %in% dementia)}),]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

Hope it helps.

edited Mar 29 at 13:37

answered Mar 29 at 12:52

Santiago Capobianco

491310

@ Ronan Shah Nice! Its a more elegant solution. You should post it.

– Santiago Capobianco
Mar 29 at 13:19

1

Yes! Sorry, I will change it right away.

– Santiago Capobianco
Mar 29 at 13:36

add a comment |

How about this:

> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

+               "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

> 

> dementia <- apply(sapply(df[, -1], function(x) {x %in% dementia}), 1, any)

> 

> df[!dementia,]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

>

Edit:

An even more elegant solution, thanks to @ Ronan Shah:

> df[apply(df[-1], 1, function(x) {!any(x %in% dementia)}),]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

Hope it helps.

edited Mar 29 at 13:37

answered Mar 29 at 12:52

Santiago Capobianco

491310

How about this:

> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",

+               "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

> 

> dementia <- apply(sapply(df[, -1], function(x) {x %in% dementia}), 1, any)

> 

> df[!dementia,]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

>

Edit:

An even more elegant solution, thanks to @ Ronan Shah:

> df[apply(df[-1], 1, function(x) {!any(x %in% dementia)}),]

     ID disease_code_1 disease_code_2 disease_code_3

1  1001           I802           A071           H250

2  1002           H356             NA             NA

4  1004           D235             NA           I802

5  1005           B178             NA             NA

8  1008           C761             NA             NA

11 1011           J679           A045           D352

Hope it helps.

edited Mar 29 at 13:37

answered Mar 29 at 12:52

Santiago Capobianco

491310

edited Mar 29 at 13:37

answered Mar 29 at 12:52

Santiago Capobianco

491310

answered Mar 29 at 12:52

Santiago Capobianco

491310

answered Mar 29 at 12:52

Santiago Capobianco

491310

@ Ronan Shah Nice! Its a more elegant solution. You should post it.

– Santiago Capobianco
Mar 29 at 13:19

1

Yes! Sorry, I will change it right away.

– Santiago Capobianco
Mar 29 at 13:36

add a comment |

@ Ronan Shah Nice! Its a more elegant solution. You should post it.

– Santiago Capobianco
Mar 29 at 13:19

1

Yes! Sorry, I will change it right away.

– Santiago Capobianco
Mar 29 at 13:36

@ Ronan Shah Nice! Its a more elegant solution. You should post it.

– Santiago Capobianco
Mar 29 at 13:19

Yes! Sorry, I will change it right away.

– Santiago Capobianco
Mar 29 at 13:36

add a comment |

We can use melt/dcast from data.table

library(data.table)

dcast(melt(setDT(df), id.var = 'ID')[,

     if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)

#    ID disease_code_1 disease_code_2 disease_code_3

#1: 1001           I802           A071           H250

#2: 1002           H356             NA             NA

#3: 1004           D235             NA           I802

#4: 1005           B178             NA             NA

#5: 1008           C761             NA             NA

#6: 1011           J679           A045           D352

Or this can be done more compactly in base R with no reshaping

df[!Reduce(`|`, lapply(df[-1], `%in%` , dementia_codes)),]

 #   ID disease_code_1 disease_code_2 disease_code_3

#1  1001           I802           A071           H250

#2  1002           H356             NA             NA

#4  1004           D235             NA           I802

#5  1005           B178             NA             NA

#8  1008           C761             NA             NA

#11 1011           J679           A045           D352

data

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", 

  "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", 

   "F012", "F011", "F010", "F01")

edited Mar 29 at 14:41

answered Mar 29 at 14:24

akrun

419k13207284

add a comment |

We can use melt/dcast from data.table

library(data.table)

dcast(melt(setDT(df), id.var = 'ID')[,

     if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)

#    ID disease_code_1 disease_code_2 disease_code_3

#1: 1001           I802           A071           H250

#2: 1002           H356             NA             NA

#3: 1004           D235             NA           I802

#4: 1005           B178             NA             NA

#5: 1008           C761             NA             NA

#6: 1011           J679           A045           D352

Or this can be done more compactly in base R with no reshaping

df[!Reduce(`|`, lapply(df[-1], `%in%` , dementia_codes)),]

 #   ID disease_code_1 disease_code_2 disease_code_3

#1  1001           I802           A071           H250

#2  1002           H356             NA             NA

#4  1004           D235             NA           I802

#5  1005           B178             NA             NA

#8  1008           C761             NA             NA

#11 1011           J679           A045           D352

data

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", 

  "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", 

   "F012", "F011", "F010", "F01")

edited Mar 29 at 14:41

answered Mar 29 at 14:24

akrun

419k13207284

add a comment |

We can use melt/dcast from data.table

library(data.table)

dcast(melt(setDT(df), id.var = 'ID')[,

     if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)

#    ID disease_code_1 disease_code_2 disease_code_3

#1: 1001           I802           A071           H250

#2: 1002           H356             NA             NA

#3: 1004           D235             NA           I802

#4: 1005           B178             NA             NA

#5: 1008           C761             NA             NA

#6: 1011           J679           A045           D352

Or this can be done more compactly in base R with no reshaping

df[!Reduce(`|`, lapply(df[-1], `%in%` , dementia_codes)),]

 #   ID disease_code_1 disease_code_2 disease_code_3

#1  1001           I802           A071           H250

#2  1002           H356             NA             NA

#4  1004           D235             NA           I802

#5  1005           B178             NA             NA

#8  1008           C761             NA             NA

#11 1011           J679           A045           D352

data

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", 

  "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", 

   "F012", "F011", "F010", "F01")

edited Mar 29 at 14:41

answered Mar 29 at 14:24

akrun

419k13207284

We can use melt/dcast from data.table

library(data.table)

dcast(melt(setDT(df), id.var = 'ID')[,

     if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)

#    ID disease_code_1 disease_code_2 disease_code_3

#1: 1001           I802           A071           H250

#2: 1002           H356             NA             NA

#3: 1004           D235             NA           I802

#4: 1005           B178             NA             NA

#5: 1008           C761             NA             NA

#6: 1011           J679           A045           D352

Or this can be done more compactly in base R with no reshaping

df[!Reduce(`|`, lapply(df[-1], `%in%` , dementia_codes)),]

 #   ID disease_code_1 disease_code_2 disease_code_3

#1  1001           I802           A071           H250

#2  1002           H356             NA             NA

#4  1004           D235             NA           I802

#5  1005           B178             NA             NA

#8  1008           C761             NA             NA

#11 1011           J679           A045           D352

data

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", 

  "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", 

   "F012", "F011", "F010", "F01")

edited Mar 29 at 14:41

answered Mar 29 at 14:24

akrun

419k13207284

edited Mar 29 at 14:41

answered Mar 29 at 14:24

akrun

419k13207284

answered Mar 29 at 14:24

akrun

419k13207284

answered Mar 29 at 14:24

akrun

419k13207284

add a comment |

A for loop version with base R, in case you prefer that.

df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),

                disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),

                disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),

                disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)



dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



new_df <- df[0,]



for(i in 1:nrow(df)){

  currRow <- df[i,]

  if(any(dementia_codes %in% as.character(currRow)) == FALSE){

    new_df <- rbind(new_df, currRow)

  }

}



new_df

#      ID disease_code_1 disease_code_2 disease_code_3

# 1  1001           I802           A071           H250

# 2  1002           H356             NA             NA

# 4  1004           D235             NA           I802

# 5  1005           B178             NA             NA

# 8  1008           C761             NA             NA

# 11 1011           J679           A045           D352

edited Mar 29 at 13:23

answered Mar 29 at 13:10

Dunois

858

add a comment |

A for loop version with base R, in case you prefer that.

df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),

                disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),

                disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),

                disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)



dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



new_df <- df[0,]



for(i in 1:nrow(df)){

  currRow <- df[i,]

  if(any(dementia_codes %in% as.character(currRow)) == FALSE){

    new_df <- rbind(new_df, currRow)

  }

}



new_df

#      ID disease_code_1 disease_code_2 disease_code_3

# 1  1001           I802           A071           H250

# 2  1002           H356             NA             NA

# 4  1004           D235             NA           I802

# 5  1005           B178             NA             NA

# 8  1008           C761             NA             NA

# 11 1011           J679           A045           D352

edited Mar 29 at 13:23

answered Mar 29 at 13:10

Dunois

858

add a comment |

A for loop version with base R, in case you prefer that.

df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),

                disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),

                disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),

                disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)



dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



new_df <- df[0,]



for(i in 1:nrow(df)){

  currRow <- df[i,]

  if(any(dementia_codes %in% as.character(currRow)) == FALSE){

    new_df <- rbind(new_df, currRow)

  }

}



new_df

#      ID disease_code_1 disease_code_2 disease_code_3

# 1  1001           I802           A071           H250

# 2  1002           H356             NA             NA

# 4  1004           D235             NA           I802

# 5  1005           B178             NA             NA

# 8  1008           C761             NA             NA

# 11 1011           J679           A045           D352

edited Mar 29 at 13:23

answered Mar 29 at 13:10

Dunois

858

A for loop version with base R, in case you prefer that.

df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),

                disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),

                disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),

                disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)



dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")



new_df <- df[0,]



for(i in 1:nrow(df)){

  currRow <- df[i,]

  if(any(dementia_codes %in% as.character(currRow)) == FALSE){

    new_df <- rbind(new_df, currRow)

  }

}



new_df

#      ID disease_code_1 disease_code_2 disease_code_3

# 1  1001           I802           A071           H250

# 2  1002           H356             NA             NA

# 4  1004           D235             NA           I802

# 5  1005           B178             NA             NA

# 8  1008           C761             NA             NA

# 11 1011           J679           A045           D352

edited Mar 29 at 13:23

answered Mar 29 at 13:10

Dunois

858

edited Mar 29 at 13:23

answered Mar 29 at 13:10

Dunois

858

answered Mar 29 at 13:10

Dunois

858

answered Mar 29 at 13:10

Dunois

858

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ggthjy