How to subset dataframe based on a “not equal to” criteria applied to a large number of columns?
I'm new to R and am trying to subset my data according to predefined exclusion criteria for analysis. Specifically, I want to remove all cases that have dementia, as coded in ICD-10. The problem is that each individual's disease status is spread across multiple variables (~70). They are all coded in the same way, though, so the same condition can be applied to every one of them.
Some simulated data:
#Create dataframe containing simulated data
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),
disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))
#data is structured as below:
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1003 G560 G20 NA
4 1004 D235 NA I802
5 1005 B178 NA NA
6 1006 F011 A049 A481
7 1007 F023 NA NA
8 1008 C761 NA NA
9 1009 H653 G300 NA
10 1010 A049 G308 NA
11 1011 J679 A045 D352
Here, I'm trying to remove any case that has a 'dementia code' across any of the "disease_code" variables.
#Remove cases with dementia from dataframe (e.g. F023, G20)
Newdata_df <- subset(df, (2:4 != "F023"|"G20"|"F009"|"F002"|"F001"|"F000"|"F00"|
"G309"| "G308"|"G301"|"G300"|"G30"| "F01"|"F018"|"F013"|
"F012"| "F011"| "F010"|"F01"))
The error that I receive is:
Error in 2:4 != "F023" | "G20" :
operations are possible only for numeric, logical or complex types
Ideally, the subsetted dataframe would look like this:
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
I know that there is an error in my code, but I'm not sure exactly how to fix it. I've also tried a few other approaches (using dplyr), without any luck so far.
Any help is greatly appreciated!
r dataframe filter subset
asked yesterday by M_Oxford (new contributor), edited yesterday by Sotos
You should reshape your data to long format. That will make your life (and analysis) much easier. – docendo discimus, yesterday
6 Answers
One dplyr possibility could be:
library(dplyr)

df %>%
  filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
                                           "G309", "G308","G301","G300","G30", "F01","F018","F013",
                                           "F012", "F011", "F010","F01")))
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352
In this case, it checks columns 2:4 and keeps only the rows in which none of them contains any of the given codes.
Or:
df %>%
filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
"G309", "G308","G301","G300","G30", "F01","F018","F013",
"F012", "F011", "F010","F01")))
In this case, it checks the columns whose names contain disease_code and keeps only the rows in which none of them contains any of the given codes.
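As an illustrative sketch (not part of the answer above): in newer dplyr releases the scoped verbs such as filter_at() are superseded, and the same filter can be written with if_any(). This assumes dplyr >= 1.0.4; the names dementia_codes and kept are placeholders chosen for this example.

```r
library(dplyr)  # assumption: dplyr >= 1.0.4, where if_any() is available

# Simulated data from the question (missing codes are the string "NA" there)
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c("I802", "H356", "G560", "D235", "B178", "F011", "F023", "C761", "H653", "A049", "J679"),
  disease_code_2 = c("A071", "NA", "G20", "NA", "NA", "A049", "NA", "NA", "G300", "G308", "A045"),
  disease_code_3 = c("H250", "NA", "NA", "I802", "NA", "A481", "NA", "NA", "NA", "NA", "D352"),
  stringsAsFactors = FALSE
)

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                    "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                    "F013", "F012", "F011", "F010")

# if_any() is TRUE for a row when at least one selected column matches,
# so negating it keeps the rows without any dementia code.
kept <- df %>%
  filter(!if_any(starts_with("disease_code"), ~ .x %in% dementia_codes))

kept$ID
```

Selecting with starts_with() rather than by position should carry over to the full data with ~70 code columns, provided they share the disease_code prefix.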
– tmfmnk, answered yesterday (edited yesterday)

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful! – M_Oxford, yesterday
We can create a vector with the codes to be removed and use rowSums to drop the matching rows, i.e.
codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
"G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]
which gives,
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
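A sketch of how the same rowSums idea might scale to the ~70 code columns mentioned in the question: instead of df[-1], the columns are picked by name. It assumes every code column starts with disease_code_; code_cols, hits and clean_df are names made up for this example.

```r
# Simulated data from the question (missing codes are the string "NA" there)
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c("I802", "H356", "G560", "D235", "B178", "F011", "F023", "C761", "H653", "A049", "J679"),
  disease_code_2 = c("A071", "NA", "G20", "NA", "NA", "A049", "NA", "NA", "G300", "G308", "A045"),
  disease_code_3 = c("H250", "NA", "NA", "I802", "NA", "A481", "NA", "NA", "NA", "NA", "D352"),
  stringsAsFactors = FALSE
)

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                     "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                     "F013", "F012", "F011", "F010")

# Assumption: all code columns share the "disease_code_" prefix
code_cols <- grep("^disease_code_", names(df), value = TRUE)

# One logical column per code column: TRUE where the cell is a dementia code
hits <- sapply(df[code_cols], `%in%`, codes_to_remove)

# Keep the rows with zero matches across all code columns
clean_df <- df[rowSums(hits) == 0, ]
clean_df$ID
```

Selecting by name means other non-code columns (age, sex and so on) can sit anywhere in the data frame without being scanned.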
– Sotos, answered yesterday
As mentioned in the comments by @docendo discimus, we can convert the dataframe to long format using gather, group_by ID, select only those IDs which do not have any dementia_code in them, and then spread them back to wide format.
library(tidyverse)
df %>%
gather(key, value, -ID) %>%
group_by(ID) %>%
filter(!any(value %in% dementia_code)) %>%
spread(key, value)
# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr>
#1 1001 I802 A071 H250
#2 1002 H356 NA NA
#3 1004 D235 NA I802
#4 1005 B178 NA NA
#5 1008 C761 NA NA
#6 1011 J679 A045 D352
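For readers on a newer tidyr, the same long-then-wide round trip can be sketched with pivot_longer() and pivot_wider(), which supersede gather() and spread(). This is an illustration under the assumption of tidyr >= 1.0.0, not part of the original answer; long and clean are names chosen for the example.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr >= 1.0.0 for pivot_longer()/pivot_wider()

# Simulated data from the question (missing codes are the string "NA" there)
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c("I802", "H356", "G560", "D235", "B178", "F011", "F023", "C761", "H653", "A049", "J679"),
  disease_code_2 = c("A071", "NA", "G20", "NA", "NA", "A049", "NA", "NA", "G300", "G308", "A045"),
  disease_code_3 = c("H250", "NA", "NA", "I802", "NA", "A481", "NA", "NA", "NA", "NA", "D352"),
  stringsAsFactors = FALSE
)

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                   "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                   "F013", "F012", "F011", "F010")

# Long format: one row per (ID, code column) pair
long <- df %>%
  pivot_longer(-ID, names_to = "key", values_to = "value")

# Drop every ID that has at least one dementia code, then go back to wide
clean <- long %>%
  group_by(ID) %>%
  filter(!any(value %in% dementia_code)) %>%
  ungroup() %>%
  pivot_wider(names_from = key, values_from = value)

clean$ID
```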
data
dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309",
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
– Ronak Shah, answered yesterday (edited yesterday)

Why load all of tidyverse? Isn't this just tidyr and dplyr? – Dunois, yesterday

@Dunois yes, it is. I have a habit of loading it all up by default :P – Ronak Shah, yesterday

We could also do it using an anti_join, such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID") – Kerry Jackson, yesterday
How about this:
> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
+ "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
>
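> # Use a separate name for the row flags so the vector of codes in `dementia` is not overwritten
> has_dementia <- apply(sapply(df[, -1], function(x) x %in% dementia), 1, any)
>
> df[!has_dementia,]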
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
>
Edit:
An even more elegant solution, thanks to @Ronak Shah:
> df[apply(df[-1], 1, function(x) !any(x %in% dementia)),]
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
Hope it helps.
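A small base R sketch of why the approaches here use %in% instead of chaining != and | as in the question. The vector codes is hypothetical and holds a real missing value (NA), unlike the simulated data, where missing codes are the string "NA".

```r
codes <- c("F023", NA, "A071")   # hypothetical column with a true NA
dementia <- c("F023", "G20")

# `|` only works on logical/numeric/complex operands, so combining
# character strings with it fails with the error shown in the question.
or_error <- tryCatch("F023" | "G20", error = function(e) e)
conditionMessage(or_error)

# `!=` propagates NA, which is awkward when the result is used to subset rows
neq <- codes != "F023"
neq      # FALSE    NA  TRUE

# `%in%` never returns NA: a missing value simply does not match
not_in <- !(codes %in% dementia)
not_in   # FALSE  TRUE  TRUE
```

So a single %in% against the whole vector of codes replaces the long chain of comparisons and also treats missing codes as "no dementia code present".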
@Ronak Shah Nice! It's a more elegant solution. You should post it. – Santiago Capobianco, yesterday

Yes! Sorry, I will change it right away. – Santiago Capobianco, yesterday
We can use melt/dcast from data.table:
library(data.table)
dcast(melt(setDT(df), id.var = 'ID')[,
if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)
# ID disease_code_1 disease_code_2 disease_code_3
#1: 1001 I802 A071 H250
#2: 1002 H356 NA NA
#3: 1004 D235 NA I802
#4: 1005 B178 NA NA
#5: 1008 C761 NA NA
#6: 1011 J679 A045 D352
Or this can be done more compactly in base R with no reshaping:
df[!Reduce(`|`, lapply(df[-1], `%in%` , dementia_codes)),]
# ID disease_code_1 disease_code_2 disease_code_3
#1 1001 I802 A071 H250
#2 1002 H356 NA NA
#4 1004 D235 NA I802
#5 1005 B178 NA NA
#8 1008 C761 NA NA
#11 1011 J679 A045 D352
data
dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000",
"F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013",
"F012", "F011", "F010", "F01")
A for loop version with base R, in case you prefer that.
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),
disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)
dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
new_df <- df[0,]
for(i in seq_len(nrow(df))) {
  currRow <- df[i,]
  # keep the row only if none of its values is a dementia code
  if(!any(dementia_codes %in% as.character(currRow))) {
    new_df <- rbind(new_df, currRow)
  }
}
new_df
# ID disease_code_1 disease_code_2 disease_code_3
# 1 1001 I802 A071 H250
# 2 1002 H356 NA NA
# 4 1004 D235 NA I802
# 5 1005 B178 NA NA
# 8 1008 C761 NA NA
# 11 1011 J679 A045 D352
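A possible variation on the loop, sketched here as an illustration: calling rbind() inside a loop builds a new data frame on every iteration, which can get slow on large data. Filling a preallocated logical vector and subsetting once avoids that. The names code_cols, keep and row_codes are chosen for this example.

```r
# Simulated data from the question (missing codes are the string "NA" there)
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c("I802", "H356", "G560", "D235", "B178", "F011", "F023", "C761", "H653", "A049", "J679"),
  disease_code_2 = c("A071", "NA", "G20", "NA", "NA", "A049", "NA", "NA", "G300", "G308", "A045"),
  disease_code_3 = c("H250", "NA", "NA", "I802", "NA", "A481", "NA", "NA", "NA", "NA", "D352"),
  stringsAsFactors = FALSE
)

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                    "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                    "F013", "F012", "F011", "F010")

code_cols <- setdiff(names(df), "ID")  # every column except the identifier

keep <- logical(nrow(df))              # preallocated, all FALSE
for (i in seq_len(nrow(df))) {
  # Flatten the i-th row's codes into a plain character vector
  row_codes <- unlist(df[i, code_cols], use.names = FALSE)
  keep[i] <- !any(row_codes %in% dementia_codes)
}

new_df <- df[keep, ]                   # subset once, after the loop
new_df$ID
```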
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
M_Oxford is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55417645%2fhow-to-subset-dataframe-based-on-a-not-equal-to-criteria-applied-to-a-large-nu%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
One dplyr
possibility could be:
df %>%
filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
"G309", "G308","G301","G300","G30", "F01","F018","F013",
"F012", "F011", "F010","F01")))
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352
In this case, it checks whether any of the columns 2:4 contains any of the given codes.
Or:
df %>%
filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
"G309", "G308","G301","G300","G30", "F01","F018","F013",
"F012", "F011", "F010","F01")))
In this case, it checks whether any of the columns with names disease_code
contains any of the given codes.
1
Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!
– M_Oxford
yesterday
add a comment |
One dplyr
possibility could be:
df %>%
filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
"G309", "G308","G301","G300","G30", "F01","F018","F013",
"F012", "F011", "F010","F01")))
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352
In this case, it checks whether any of the columns 2:4 contains any of the given codes.
Or:
df %>%
filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
"G309", "G308","G301","G300","G30", "F01","F018","F013",
"F012", "F011", "F010","F01")))
In this case, it checks whether any of the columns with names disease_code
contains any of the given codes.
1
Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!
– M_Oxford
yesterday
add a comment |
One dplyr
possibility could be:
df %>%
filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
"G309", "G308","G301","G300","G30", "F01","F018","F013",
"F012", "F011", "F010","F01")))
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352
In this case, it checks whether any of the columns 2:4 contains any of the given codes.
Or:
df %>%
filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
"G309", "G308","G301","G300","G30", "F01","F018","F013",
"F012", "F011", "F010","F01")))
In this case, it checks whether any of the columns with names disease_code
contains any of the given codes.
One dplyr
possibility could be:
df %>%
filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
"G309", "G308","G301","G300","G30", "F01","F018","F013",
"F012", "F011", "F010","F01")))
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352
In this case, it checks whether any of the columns 2:4 contains any of the given codes.
Or:
df %>%
filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
"G309", "G308","G301","G300","G30", "F01","F018","F013",
"F012", "F011", "F010","F01")))
In this case, it checks whether any of the columns with names disease_code
contains any of the given codes.
edited yesterday
answered yesterday
tmfmnktmfmnk
3,5841516
3,5841516
1
Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!
– M_Oxford
yesterday
add a comment |
1
Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!
– M_Oxford
yesterday
1
1
Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!
– M_Oxford
yesterday
Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!
– M_Oxford
yesterday
add a comment |
We can create a vector with the codes to be removed and use rowSums
to remove, i.e.
codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
"G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]
which gives,
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
add a comment |
We can create a vector with the codes to be removed and use rowSums
to remove, i.e.
codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
"G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]
which gives,
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
add a comment |
We can create a vector with the codes to be removed and use rowSums
to remove, i.e.
codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
"G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]
which gives,
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
We can create a vector with the codes to be removed and use rowSums
to remove, i.e.
codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
"G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]
which gives,
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
answered yesterday
SotosSotos
31.1k51741
31.1k51741
add a comment |
add a comment |
As mentioned in comments by @docendo discimus we can convert the dataframe to long format using gather
, group_by
ID
and select only those ID
s which do not have dementia_code
in them and then spread
them back to wide format.
library(tidyverse)
df %>%
gather(key, value, -ID) %>%
group_by(ID) %>%
filter(!any(value %in% dementia_code)) %>%
spread(key, value)
# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr>
#1 1001 I802 A071 H250
#2 1002 H356 NA NA
#3 1004 D235 NA I802
#4 1005 B178 NA NA
#5 1008 C761 NA NA
#6 1011 J679 A045 D352
data
dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309",
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
Why load all oftidyverse
? Isn't this justtidyr
anddplyr
?
– Dunois
yesterday
1
@Dunois yes, it is. I have a habit of loading it all up by default :P
– Ronak Shah
yesterday
3
We could also do it using ananti_join
such asNewdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")
– Kerry Jackson
yesterday
add a comment |
As mentioned in comments by @docendo discimus we can convert the dataframe to long format using gather
, group_by
ID
and select only those ID
s which do not have dementia_code
in them and then spread
them back to wide format.
library(tidyverse)
df %>%
gather(key, value, -ID) %>%
group_by(ID) %>%
filter(!any(value %in% dementia_code)) %>%
spread(key, value)
# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr>
#1 1001 I802 A071 H250
#2 1002 H356 NA NA
#3 1004 D235 NA I802
#4 1005 B178 NA NA
#5 1008 C761 NA NA
#6 1011 J679 A045 D352
data
dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309",
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
Why load all oftidyverse
? Isn't this justtidyr
anddplyr
?
– Dunois
yesterday
1
@Dunois yes, it is. I have a habit of loading it all up by default :P
– Ronak Shah
yesterday
3
We could also do it using ananti_join
such asNewdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")
– Kerry Jackson
yesterday
add a comment |
As mentioned in comments by @docendo discimus we can convert the dataframe to long format using gather
, group_by
ID
and select only those ID
s which do not have dementia_code
in them and then spread
them back to wide format.
library(tidyverse)
df %>%
gather(key, value, -ID) %>%
group_by(ID) %>%
filter(!any(value %in% dementia_code)) %>%
spread(key, value)
# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr>
#1 1001 I802 A071 H250
#2 1002 H356 NA NA
#3 1004 D235 NA I802
#4 1005 B178 NA NA
#5 1008 C761 NA NA
#6 1011 J679 A045 D352
data
dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309",
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
As mentioned in comments by @docendo discimus we can convert the dataframe to long format using gather
, group_by
ID
and select only those ID
s which do not have dementia_code
in them and then spread
them back to wide format.
library(tidyverse)
df %>%
gather(key, value, -ID) %>%
group_by(ID) %>%
filter(!any(value %in% dementia_code)) %>%
spread(key, value)
# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr>
#1 1001 I802 A071 H250
#2 1002 H356 NA NA
#3 1004 D235 NA I802
#4 1005 B178 NA NA
#5 1008 C761 NA NA
#6 1011 J679 A045 D352
data
dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309",
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
edited yesterday
answered yesterday
Ronak ShahRonak Shah
43.9k104266
43.9k104266
Why load all oftidyverse
? Isn't this justtidyr
anddplyr
?
– Dunois
yesterday
1
@Dunois yes, it is. I have a habit of loading it all up by default :P
– Ronak Shah
yesterday
3
We could also do it using ananti_join
such asNewdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")
– Kerry Jackson
yesterday
add a comment |
Why load all oftidyverse
? Isn't this justtidyr
anddplyr
?
– Dunois
yesterday
1
@Dunois yes, it is. I have a habit of loading it all up by default :P
– Ronak Shah
yesterday
3
We could also do it using ananti_join
such asNewdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")
– Kerry Jackson
yesterday
Why load all of
tidyverse
? Isn't this just tidyr
and dplyr
?– Dunois
yesterday
Why load all of
tidyverse
? Isn't this just tidyr
and dplyr
?– Dunois
yesterday
1
1
@Dunois yes, it is. I have a habit of loading it all up by default :P
– Ronak Shah
yesterday
@Dunois yes, it is. I have a habit of loading it all up by default :P
– Ronak Shah
yesterday
3
3
We could also do it using an
anti_join
such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")
– Kerry Jackson
yesterday
We could also do it using an
anti_join
such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")
– Kerry Jackson
yesterday
add a comment |
How about this:
> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
+               "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
>
> # Flag rows in which any disease-code column holds a dementia code
> has_dementia <- apply(sapply(df[, -1], function(x) x %in% dementia), 1, any)
>
> df[!has_dementia, ]
     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
8  1008           C761             NA             NA
11 1011           J679           A045           D352
>
Edit:
An even more elegant solution, thanks to @Ronak Shah:
> df[apply(df[-1], 1, function(x) !any(x %in% dementia)), ]
     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
8  1008           C761             NA             NA
11 1011           J679           A045           D352
Hope it helps.
@Ronak Shah Nice! It's a more elegant solution. You should post it.
– Santiago Capobianco
yesterday
1
Yes! Sorry, I will change it right away.
– Santiago Capobianco
yesterday
edited yesterday
answered yesterday
Santiago Capobianco
We can use melt/dcast from data.table
library(data.table)
# as.data.table() copies, so df itself stays a data.frame for the base R line below
dcast(melt(as.data.table(df), id.vars = 'ID')[,
    if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable, value.var = 'value')
#     ID disease_code_1 disease_code_2 disease_code_3
#1: 1001           I802           A071           H250
#2: 1002           H356             NA             NA
#3: 1004           D235             NA           I802
#4: 1005           B178             NA             NA
#5: 1008           C761             NA             NA
#6: 1011           J679           A045           D352
Or this can be done more compactly in base R with no reshaping
df[!Reduce(`|`, lapply(df[-1], `%in%`, dementia_codes)), ]
#      ID disease_code_1 disease_code_2 disease_code_3
#1  1001           I802           A071           H250
#2  1002           H356             NA             NA
#4  1004           D235             NA           I802
#5  1005           B178             NA             NA
#8  1008           C761             NA             NA
#11 1011           J679           A045           D352
data
dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000",
"F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013",
"F012", "F011", "F010", "F01")
edited yesterday
answered yesterday
akrun
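For readers unfamiliar with Reduce, the base R one-liner in the answer above can be unpacked into its intermediate steps. This is a sketch for illustration only; hits_by_column and any_hit are made-up names, and the sample data is copied from the for-loop answer on this page.

```r
# Sample data as posted in the for-loop answer in this thread.
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
  disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
  disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
  stringsAsFactors = FALSE
)
dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                    "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                    "F013", "F012", "F011", "F010")

# Step 1: one logical vector per code column, TRUE where the cell is a dementia code.
hits_by_column <- lapply(df[-1], `%in%`, dementia_codes)

# Step 2: fold the vectors together with element-wise OR, giving one flag per row.
any_hit <- Reduce(`|`, hits_by_column)

# Step 3: keep the rows where no column matched.
kept <- df[!any_hit, ]
kept$ID
```

Since lapply works column by column and `|` is vectorised, nothing here iterates over rows, which is likely to matter once the data frame has many rows and many code columns.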
A for loop version with base R, in case you prefer that.
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),
                 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)
dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
# Start from an empty data frame with the same columns as df
new_df <- df[0,]
for(i in seq_len(nrow(df))) {
  currRow <- df[i,]
  # Keep the row only if none of its values is a dementia code
  if(!any(dementia_codes %in% as.character(currRow))) {
    new_df <- rbind(new_df, currRow)
  }
}
new_df
#      ID disease_code_1 disease_code_2 disease_code_3
# 1  1001           I802           A071           H250
# 2  1002           H356             NA             NA
# 4  1004           D235             NA           I802
# 5  1005           B178             NA             NA
# 8  1008           C761             NA             NA
# 11 1011           J679           A045           D352
edited yesterday
answered yesterday
Dunois
M_Oxford is a new contributor. Be nice, and check out our Code of Conduct.
1
You should reshape your data to long format. That will make your life (and analysis) much easier.
– docendo discimus
yesterday
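As a rough illustration of the long-format suggestion in the comment above, the sketch below stacks the code columns in base R, so no extra packages are assumed. The names code_cols, long and flagged_ids are hypothetical, and the sample data is copied from the for-loop answer on this page.

```r
# Sample data as posted in the for-loop answer in this thread.
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
  disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
  disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
  stringsAsFactors = FALSE
)
dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                    "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                    "F013", "F012", "F011", "F010")

# Long format: one row per (ID, code slot) pair instead of one column per slot.
code_cols <- setdiff(names(df), "ID")
long <- data.frame(
  ID   = rep(df$ID, times = length(code_cols)),      # IDs repeated once per code column
  slot = rep(code_cols, each = nrow(df)),            # which column the code came from
  code = unlist(df[code_cols], use.names = FALSE),   # columns stacked one after another
  stringsAsFactors = FALSE
)

# With a single code column, the "not equal to any of these" test is one %in% call.
flagged_ids <- unique(long$ID[long$code %in% dementia_codes])
result <- df[!df$ID %in% flagged_ids, ]
result$ID
```

In long format the filter no longer depends on how many disease_code columns the wide table has, which is the main appeal when there are dozens of them.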