How to subset dataframe based on a “not equal to” criteria applied to a large number of columns?











8















I'm new to R and am trying to subset my data according to predefined exclusion criteria for analysis. Specifically, I want to remove all cases that have dementia, as coded by ICD-10. The problem is that each individual's disease status is spread across multiple variables (~70). Since they are all coded in the same way, the same condition can be applied to all of them.



Some simulated data:



#Create dataframe containing simulated data
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))

#data is structured as below:

     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
3  1003           G560            G20             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
6  1006           F011           A049           A481
7  1007           F023             NA             NA
8  1008           C761             NA             NA
9  1009           H653           G300             NA
10 1010           A049           G308             NA
11 1011           J679           A045           D352




Here, I'm trying to remove any case that has a 'dementia code' across any of the "disease_code" variables.



#Remove cases with dementia from dataframe (e.g. F023, G20)
Newdata_df <- subset(df, (2:4 != "F023"|"G20"|"F009"|"F002"|"F001"|"F000"|"F00"|
"G309"| "G308"|"G301"|"G300"|"G30"| "F01"|"F018"|"F013"|
"F012"| "F011"| "F010"|"F01"))


The error that I receive is:



Error in 2:4 != "F023" | "G20" : 
operations are possible only for numeric, logical or complex types


Ideally, the subsetted dataframe would look like this:



     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
8  1008           C761             NA             NA
11 1011           J679           A045           D352


I know that there is an error in my code, but I'm not sure exactly how to fix it. I've tried a few other approaches (using dplyr) but haven't had any luck so far.



Any help is greatly appreciated!

























  • 1





    You should reshape your data to long format. That will make your life (and analysis) much easier.

    – docendo discimus
    yesterday















Tags: r, dataframe, filter, subset






edited yesterday by Sotos

asked yesterday by M_Oxford




6 Answers


















3














One dplyr possibility could be:

library(dplyr)

df %>%
  filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
                                           "G309", "G308","G301","G300","G30", "F01","F018","F013",
                                           "F012", "F011", "F010","F01")))

    ID disease_code_1 disease_code_2 disease_code_3
1 1001           I802           A071           H250
2 1002           H356             NA             NA
3 1004           D235             NA           I802
4 1005           B178             NA             NA
5 1008           C761             NA             NA
6 1011           J679           A045           D352

Here, a row is kept only if none of the columns 2:4 contains any of the given codes.

Or:

df %>%
  filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00",
                                                                "G309", "G308","G301","G300","G30", "F01","F018","F013",
                                                                "F012", "F011", "F010","F01")))

Here, the same check is applied to every column whose name contains disease_code, so the columns do not have to be selected by position.
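For readers without dplyr, the same "keep the row only if every column passes" logic can be sketched in base R. This is only an illustration: it assumes the diagnosis columns share the disease_code prefix, and the shortened code list and the names code_cols, keep_per_column and keep are made up for this example.

```r
# Small stand-in for the question's data (character columns, not factors)
df <- data.frame(
  ID = c(1001, 1003, 1004, 1007),
  disease_code_1 = c("I802", "G560", "D235", "F023"),
  disease_code_2 = c("A071", "G20", "NA", "NA"),
  stringsAsFactors = FALSE
)
codes <- c("F023", "G20")  # shortened exclusion list, for illustration only

# Pick the diagnosis columns by name instead of by position
code_cols <- grep("^disease_code", names(df), value = TRUE)

# One logical vector per column: TRUE where the value is NOT an excluded code
keep_per_column <- lapply(df[code_cols], function(col) !(col %in% codes))

# A row is kept only if it passes the test in every column (logical AND)
keep <- Reduce(`&`, keep_per_column)

df[keep, ]
```

Selecting by name pattern should scale to the ~70 columns mentioned in the question without listing them by hand.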


























  • 1





    Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

    – M_Oxford
    yesterday


















4














We can create a vector with the codes to be removed and use rowSums to drop every row in which at least one of them occurs, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
                     "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
8  1008           C761             NA             NA
11 1011           J679           A045           D352
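To see why this works, it may help to look at the intermediate objects. The sketch below uses a cut-down data frame and a shortened code list; hits and n_hits are names introduced only for this illustration.

```r
df <- data.frame(
  ID = c(1001, 1003, 1006),
  disease_code_1 = c("I802", "G560", "F011"),
  disease_code_2 = c("A071", "G20", "A049"),
  stringsAsFactors = FALSE
)
codes_to_remove <- c("F011", "G20")  # shortened list, for illustration only

# Logical matrix: one column per diagnosis column, TRUE where a code matches
hits <- sapply(df[-1], `%in%`, codes_to_remove)
hits

# Number of excluded codes found in each row (TRUE counts as 1)
n_hits <- rowSums(hits)

# Rows with zero matches are the ones to keep
df[n_hits == 0, ]
```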




































    3














As mentioned in the comments by @docendo discimus, we can convert the data frame to long format using gather, group_by ID, keep only those IDs that have no dementia_code among their values, and then spread the result back to wide format.

library(tidyverse)

df %>%
  gather(key, value, -ID) %>%
  group_by(ID) %>%
  filter(!any(value %in% dementia_code)) %>%
  spread(key, value)

#     ID disease_code_1 disease_code_2 disease_code_3
#  <dbl> <chr>          <chr>          <chr>
#1  1001 I802           A071           H250
#2  1002 H356           NA             NA
#3  1004 D235           NA             I802
#4  1005 B178           NA             NA
#5  1008 C761           NA             NA
#6  1011 J679           A045           D352
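The long-format idea does not depend on tidyr. As a rough base-R sketch (long and bad_ids are names made up here, and the data and code list are shortened): stack all code columns into one column, collect the IDs that carry at least one excluded code, and drop those IDs from the original wide table.

```r
df <- data.frame(
  ID = c(1001, 1003, 1009),
  disease_code_1 = c("I802", "G560", "H653"),
  disease_code_2 = c("A071", "G20", "G300"),
  stringsAsFactors = FALSE
)
dementia_code <- c("G20", "G300")  # shortened list, for illustration only

# Long format: one row per (ID, code) pair
long <- data.frame(
  ID = rep(df$ID, times = ncol(df) - 1),
  value = unlist(df[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# IDs that carry at least one excluded code
bad_ids <- unique(long$ID[long$value %in% dementia_code])

# Keep everyone else, still in wide format
df[!df$ID %in% bad_ids, ]
```

Filtering on IDs and returning to the untouched wide table avoids the reshape back to wide format.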


    data



    dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 
    "G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")






























    • Why load all of tidyverse? Isn't this just tidyr and dplyr?

      – Dunois
      yesterday






    • 1





      @Dunois yes, it is. I have a habit of loading it all up by default :P

      – Ronak Shah
      yesterday







    • 3





      We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

      – Kerry Jackson
      yesterday


















    3














How about this:

> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
+               "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
>
> # Flag rows with a dementia code (separate name, so the code vector is not overwritten)
> has_dementia <- apply(sapply(df[, -1], function(x) x %in% dementia), 1, any)
>
> df[!has_dementia,]
     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
8  1008           C761             NA             NA
11 1011           J679           A045           D352
>

Edit:

An even more elegant solution, thanks to @Ronak Shah:

> df[apply(df[-1], 1, function(x) !any(x %in% dementia)),]
     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
8  1008           C761             NA             NA
11 1011           J679           A045           D352

Hope it helps.































    • @Ronak Shah Nice! It's a more elegant solution. You should post it.

      – Santiago Capobianco
      yesterday







    • 1





      Yes! Sorry, I will change it right away.

      – Santiago Capobianco
      yesterday


















    3














    We can use melt/dcast from data.table



    library(data.table)
    dcast(melt(setDT(df), id.var = 'ID')[,
    if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)
    # ID disease_code_1 disease_code_2 disease_code_3
    #1: 1001 I802 A071 H250
    #2: 1002 H356 NA NA
    #3: 1004 D235 NA I802
    #4: 1005 B178 NA NA
    #5: 1008 C761 NA NA
    #6: 1011 J679 A045 D352



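    Or this can be done more compactly in base R with no reshaping. Run it on the original data.frame: setDT() above converts df to a data.table by reference, and df[-1] on a data.table drops the first row rather than the first column.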



    df[!Reduce(`|`, lapply(df[-1], `%in%` , dementia_codes)),]
    # ID disease_code_1 disease_code_2 disease_code_3
    #1 1001 I802 A071 H250
    #2 1002 H356 NA NA
    #4 1004 D235 NA I802
    #5 1005 B178 NA NA
    #8 1008 C761 NA NA
    #11 1011 J679 A045 D352
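    One reason `%in%` is a better fit here than `!=`: the simulated data store missing values as the string 'NA', but real data will usually contain genuine NA values, and comparison operators propagate those. A small sketch (the vectors x, col_a and col_b and the shortened code list are made up for illustration):

```r
codes <- c("F023", "G20")  # shortened list, for illustration only
x <- c("I802", NA, "G20")

# Comparison operators propagate missing values: the second element is NA
x != "G20"

# %in% always returns TRUE or FALSE, so a missing value simply counts as "no match"
x %in% codes

# Combining several columns with Reduce(`|`, ...) therefore never yields NA
col_a <- c("I802", NA, "F023")
col_b <- c(NA, NA, "A071")
has_code <- Reduce(`|`, lapply(list(col_a, col_b), `%in%`, codes))
has_code
```

    With `!=`, a row containing a genuine NA would produce an NA in the row filter instead of a clear TRUE or FALSE.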


    data



    dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", 
    "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013",
    "F012", "F011", "F010", "F01")





































      2














      A for loop version with base R, in case you prefer that.



      df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),
      disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
      disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
      disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)

      dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

      new_df <- df[0,]

      for(i in 1:nrow(df)) {
        # Check one row at a time; keep it only if none of its values is a dementia code
        currRow <- df[i,]
        if(any(dementia_codes %in% as.character(currRow)) == FALSE) {
          new_df <- rbind(new_df, currRow)
        }
      }



      new_df
      # ID disease_code_1 disease_code_2 disease_code_3
      # 1 1001 I802 A071 H250
      # 2 1002 H356 NA NA
      # 4 1004 D235 NA I802
      # 5 1005 B178 NA NA
      # 8 1008 C761 NA NA
      # 11 1011 J679 A045 D352
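      Growing a data frame with rbind() inside a loop copies it on every iteration, which can become slow with many rows. A possible variant, shown here only as a sketch on cut-down data (keep and row_codes are names introduced for this example), records one flag per row and subsets once at the end:

```r
df <- data.frame(
  ID = c(1001, 1003, 1004),
  disease_code_1 = c("I802", "G560", "D235"),
  disease_code_2 = c("A071", "G20", "NA"),
  stringsAsFactors = FALSE
)
dementia_codes <- c("F023", "G20")  # shortened list, for illustration only

# Preallocate one flag per row instead of growing a data frame with rbind()
keep <- logical(nrow(df))

for (i in seq_len(nrow(df))) {
  # Codes of row i as a plain character vector (ID column excluded)
  row_codes <- unlist(df[i, -1], use.names = FALSE)
  keep[i] <- !any(row_codes %in% dementia_codes)
}

# Subset once, after the loop
new_df <- df[keep, ]
new_df
```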





The dplyr (filter_at) answer: answered yesterday by tmfmnk, edited yesterday.







The rowSums answer: answered yesterday by Sotos.





















                3














As @docendo discimus mentioned in the comments, we can convert the data frame to long format with gather, group by ID, keep only the IDs that have no value in dementia_code, and then spread the result back to wide format.

    library(tidyverse)

    df %>%
      gather(key, value, -ID) %>%
      group_by(ID) %>%
      filter(!any(value %in% dementia_code)) %>%
      spread(key, value)

    #      ID disease_code_1 disease_code_2 disease_code_3
    #   <dbl> <chr>          <chr>          <chr>
    # 1  1001 I802           A071           H250
    # 2  1002 H356           NA             NA
    # 3  1004 D235           NA             I802
    # 4  1005 B178           NA             NA
    # 5  1008 C761           NA             NA
    # 6  1011 J679           A045           D352

Data:

    dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309",
                       "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012",
                       "F011", "F010", "F01")





Ronak Shah (answered yesterday, edited yesterday)

• Why load all of tidyverse? Isn't this just tidyr and dplyr?
  – Dunois, yesterday

• @Dunois yes, it is. I have a habit of loading it all up by default :P
  – Ronak Shah, yesterday

• We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")
  – Kerry Jackson, yesterday
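`gather()` and `spread()` still work, but tidyr marks them as superseded in favour of `pivot_longer()` and `pivot_wider()`. Below is a sketch of the same long-then-wide pipeline with the newer verbs, loading only dplyr and tidyr as the first comment suggests. It assumes tidyr 1.0.0 or later, and the sample data is copied from the for-loop answer later in this thread rather than from the question itself.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the for-loop answer in this thread;
# missing codes are stored there as the string "NA".
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c("I802", "H356", "G560", "D235", "B178", "F011",
                     "F023", "C761", "H653", "A049", "J679"),
  disease_code_2 = c("A071", "NA", "G20", "NA", "NA", "A049",
                     "NA", "NA", "G300", "G308", "A045"),
  disease_code_3 = c("H250", "NA", "NA", "I802", "NA", "A481",
                     "NA", "NA", "NA", "NA", "D352"),
  stringsAsFactors = FALSE
)

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                   "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                   "F013", "F012", "F011", "F010", "F01")

result <- df %>%
  # long format: one row per (ID, code column) pair
  pivot_longer(-ID, names_to = "key", values_to = "value") %>%
  group_by(ID) %>%
  # keep an ID only if none of its codes is on the list
  filter(!any(value %in% dementia_code)) %>%
  ungroup() %>%
  # back to wide format: one row per ID
  pivot_wider(names_from = key, values_from = value)
```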

















How about this:

    > dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
    + "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
    >
    > # TRUE for every row that contains at least one dementia code;
    > # stored under a new name so the vector of codes is not overwritten
    > has_dementia <- apply(sapply(df[, -1], function(x) x %in% dementia), 1, any)
    >
    > df[!has_dementia, ]
         ID disease_code_1 disease_code_2 disease_code_3
    1  1001           I802           A071           H250
    2  1002           H356             NA             NA
    4  1004           D235             NA           I802
    5  1005           B178             NA             NA
    8  1008           C761             NA             NA
    11 1011           J679           A045           D352

Edit:

An even more elegant solution, thanks to @Ronak Shah:

    > df[apply(df[-1], 1, function(x) !any(x %in% dementia)), ]
         ID disease_code_1 disease_code_2 disease_code_3
    1  1001           I802           A071           H250
    2  1002           H356             NA             NA
    4  1004           D235             NA           I802
    5  1005           B178             NA             NA
    8  1008           C761             NA             NA
    11 1011           J679           A045           D352

Hope it helps.






Santiago Capobianco (answered yesterday, edited yesterday)

• @Ronak Shah Nice! It's a more elegant solution. You should post it.
  – Santiago Capobianco, yesterday

• Yes! Sorry, I will change it right away.
  – Santiago Capobianco, yesterday

















We can use melt/dcast from data.table:

    library(data.table)
    # note: setDT() converts df to a data.table by reference
    dcast(melt(setDT(df), id.vars = 'ID')[,
      if (!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)
    #      ID disease_code_1 disease_code_2 disease_code_3
    # 1: 1001           I802           A071           H250
    # 2: 1002           H356             NA             NA
    # 3: 1004           D235             NA           I802
    # 4: 1005           B178             NA             NA
    # 5: 1008           C761             NA             NA
    # 6: 1011           J679           A045           D352

Or this can be done more compactly in base R, with no reshaping:

    # assumes df is a plain data.frame (setDT(df) above has not been run on it)
    df[!Reduce(`|`, lapply(df[-1], `%in%`, dementia_codes)), ]
    #      ID disease_code_1 disease_code_2 disease_code_3
    # 1  1001           I802           A071           H250
    # 2  1002           H356             NA             NA
    # 4  1004           D235             NA           I802
    # 5  1005           B178             NA             NA
    # 8  1008           C761             NA             NA
    # 11 1011           J679           A045           D352

Data:

    dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000",
                        "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013",
                        "F012", "F011", "F010", "F01")





akrun (answered yesterday, edited yesterday)
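The base R line in the answer above relies on `Reduce()` to combine the per-column `%in%` results into one flag per row. Here is a toy sketch of that folding step alone. The vectors `col1` to `col3` are made-up stand-ins for the three disease-code columns, not values from the thread.

```r
# Hypothetical per-column results of `%in%`: one element per row,
# TRUE where that column holds a listed code.
col1 <- c(FALSE, TRUE,  FALSE, FALSE)
col2 <- c(FALSE, FALSE, TRUE,  FALSE)
col3 <- c(FALSE, FALSE, TRUE,  FALSE)

# Reduce() folds the list pairwise with `|`: (col1 | col2) | col3.
any_hit <- Reduce(`|`, list(col1, col2, col3))

# The same result written out by hand.
by_hand <- col1 | col2 | col3

# Negating gives the row filter: TRUE marks rows to keep.
keep <- !any_hit
```

Since `|` is vectorised, the fold works column by column and never loops over individual rows, whatever the number of columns in the list.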

A for loop version with base R, in case you prefer that.

    df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                     disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                     disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                     disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
                     stringsAsFactors = FALSE)

    dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

    # start from an empty data frame with the same columns
    new_df <- df[0, ]

    for (i in 1:nrow(df)) {
      currRow <- df[i, ]
      # keep the row only if none of its values is a dementia code
      if (!any(dementia_codes %in% as.character(currRow))) {
        new_df <- rbind(new_df, currRow)
      }
    }

    new_df
    #      ID disease_code_1 disease_code_2 disease_code_3
    # 1  1001           I802           A071           H250
    # 2  1002           H356             NA             NA
    # 4  1004           D235             NA           I802
    # 5  1005           B178             NA             NA
    # 8  1008           C761             NA             NA
    # 11 1011           J679           A045           D352





Dunois (answered yesterday, edited yesterday)
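Growing `new_df` with `rbind()` copies the accumulated rows on every iteration, which can get slow on large data. A possible variation, not part of the original answer, keeps the explicit loop but records one logical flag per row and subsets once at the end. The names `keep` and `row_codes` are illustrative.

```r
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c("I802", "H356", "G560", "D235", "B178", "F011",
                     "F023", "C761", "H653", "A049", "J679"),
  disease_code_2 = c("A071", "NA", "G20", "NA", "NA", "A049",
                     "NA", "NA", "G300", "G308", "A045"),
  disease_code_3 = c("H250", "NA", "NA", "I802", "NA", "A481",
                     "NA", "NA", "NA", "NA", "D352"),
  stringsAsFactors = FALSE
)

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                    "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                    "F013", "F012", "F011", "F010", "F01")

# One flag per row, allocated once up front.
keep <- logical(nrow(df))

for (i in seq_len(nrow(df))) {
  # unlist() flattens the one-row slice of code columns
  # into a plain character vector.
  row_codes <- unlist(df[i, -1])
  keep[i] <- !any(row_codes %in% dementia_codes)
}

# Subset once, after the loop.
new_df <- df[keep, ]
```

`seq_len(nrow(df))` is used instead of `1:nrow(df)` so that the loop body is skipped entirely when the data frame has no rows.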
