Tuesday, July 17, 2018

r - Remove rows with all or some NAs (missing values) in data.frame



I'd like to remove the lines in this data frame that:



a) contain NAs across all columns. Below is my example data frame.



             gene hsap mmul mmus rnor cfam
1 ENSG00000208234 0 NA NA NA NA
2 ENSG00000199674 0 2 2 2 2
3 ENSG00000221622 0 NA NA NA NA
4 ENSG00000207604 0 NA NA 1 2
5 ENSG00000207431 0 NA NA NA NA
6 ENSG00000221312 0 1 2 3 2


Basically, I'd like to get a data frame such as the following.



             gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
6 ENSG00000221312 0 1 2 3 2


b) contain NAs in only some columns, so I can also get this result:



             gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2


Answer



Also check complete.cases:



> final[complete.cases(final), ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
6 ENSG00000221312 0 1 2 3 2



na.omit is nicer for just removing every row that contains an NA. complete.cases also allows partial selection, by passing it only certain columns of the data frame:



> final[complete.cases(final[ , 5:6]),]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
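
As a minimal sketch of the na.omit route mentioned above: the data frame is rebuilt by hand from the question so the snippet runs on its own, and the names no_na and same_rows are only illustrative.

```r
# Rebuild the example data frame from the question.
# The name `final` follows the answer's usage; the values are copied from the question.
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2),
  stringsAsFactors = FALSE
)

# na.omit() drops every row that has at least one NA in any column.
no_na <- na.omit(final)

# It keeps the same rows as indexing with complete.cases() on the whole data frame.
same_rows <- identical(rownames(no_na),
                       rownames(final[complete.cases(final), ]))
```

Note that na.omit also attaches an "na.action" attribute recording which rows were dropped, so the two results match row for row but are not identical objects.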


Your solution can't work. If you insist on using is.na, then you have to do something like:




> final[rowSums(is.na(final[ , 5:6])) == 0, ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2


but using complete.cases is much clearer, and faster.
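
The two approaches can be checked against each other with a small self-contained sketch. It assumes the question's data is read into `final`, and it selects columns by name rather than by position (rnor and cfam are columns 5 and 6). The last step is one possible reading of "NAs across all columns", taken here to mean that every species column after hsap is missing; that interpretation is an assumption, not something the question states.

```r
# Read the question's table into a data frame named `final`.
final <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
gene hsap mmul mmus rnor cfam
ENSG00000208234 0 NA NA NA NA
ENSG00000199674 0 2 2 2 2
ENSG00000221622 0 NA NA NA NA
ENSG00000207604 0 NA NA 1 2
ENSG00000207431 0 NA NA NA NA
ENSG00000221312 0 1 2 3 2
")

# Columns that must be non-missing (same as positions 5:6 in the answer).
cols <- c("rnor", "cfam")

# Keep rows with no NA in the chosen columns, two equivalent ways.
by_complete <- final[complete.cases(final[, cols]), ]
by_rowsums  <- final[rowSums(is.na(final[, cols])) == 0, ]

# Assumed reading of "all NAs": drop a row only when every species column is NA.
species <- c("mmul", "mmus", "rnor", "cfam")
not_all_na <- final[rowSums(is.na(final[, species])) < length(species), ]
```

On this data all three results keep rows 2, 4 and 6, and identical(by_complete, by_rowsums) is TRUE.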

