How to remove rows that have NULL values in R
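Below is some sample data and one of the operations. In the bigger picture, I am reading in a bunch of Excel files organized by year, then taking only selected columns (14 of the 1000 columns) and putting them into new data frames (df1, df2, for example). From there, I combine these new data frames into one final data frame. My question is how to remove the rows in the final data frame that are filled with NULL values. I could filter them out, but I would prefer to simply delete them in R and be done with it.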
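# Sample data: the "NULL" strings force totalhousehold and marriedhousehold to character
# (assumes R >= 4.0, where data.frame() keeps strings as character instead of factors)
testyear <- c(2010, 2010, 2010, 2010, 2011, 2011, 2011, 2010)
teststate <- c("CA", "Co", "NV", "NE", "CA", "CO", "NV", "NE")
totalhousehold <- c(251, 252, 253, "NULL", 301, 302, 303, "NULL")
marriedhousehold <- c(85, 86, 87, "NULL", 158, 159, 245, "NULL")
test1 <- data.frame(testyear, teststate, totalhousehold, marriedhousehold)
testyear <- c(2012, 2012, 2012, 2012)
teststate <- c("WA", "OR", "WY", "UT")
totalhousehold <- c(654, 650, 646, 641)
marriedhousehold <- c(400, 399, 398, 395)
test2 <- data.frame(testyear, teststate, totalhousehold, marriedhousehold)
# Stack the yearly data frames into one final data frame
test3 <- rbind(test1, test2)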
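For the larger workflow described in the question (one file per year, a handful of selected columns, one combined data frame), the "NULL" strings can often be declared missing at read time, so no cleanup is needed afterwards. The sketch below is an illustration under stated assumptions, not the asker's code: it uses base R's read.csv(text = ...) on two small hypothetical CSV strings so it runs without any files, and assumes R >= 4.0 so strings stay character. With real Excel files, readxl::read_excel() has an analogous na argument (na = "NULL"). The names yearly_text, keep_cols, read_year and the extra column othercol are all made up for the example.

```r
# Hypothetical stand-ins for two yearly files; "NULL" marks a missing cell
yearly_text <- list(
  "2010" = "testyear,teststate,totalhousehold,marriedhousehold,othercol
2010,CA,251,85,x
2010,NE,NULL,NULL,y",
  "2012" = "testyear,teststate,totalhousehold,marriedhousehold,othercol
2012,WA,654,400,z
2012,OR,650,399,w"
)

# Hypothetical subset of columns to keep (the question keeps 14 of ~1000)
keep_cols <- c("testyear", "teststate", "totalhousehold", "marriedhousehold")

read_year <- function(txt) {
  # na.strings = "NULL" turns every "NULL" cell into NA while parsing,
  # so the household columns are read as integer rather than character
  df <- read.csv(text = txt, na.strings = "NULL")
  df[, keep_cols]
}

# Read each year, then stack the results into one data frame
combined <- do.call(rbind, lapply(yearly_text, read_year))

# Drop every row that has an NA in any kept column, then renumber the rows
final <- combined[complete.cases(combined), ]
rownames(final) <- NULL
str(final)
```

Because the missing cells are real NA values from the start, complete.cases() (or na.omit()) is enough to drop them, and no type conversion step is needed.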
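Because these are character columns, we can filter with if_all over only the character columns to return the rows that have no "NULL" elements, and then use type.convert to change the column types. (Older code often uses across() inside filter(); dplyr 1.0.8 deprecated that in favor of if_all() and if_any().)
library(dplyr)
# Keep rows where every character column differs from "NULL",
# then let type.convert() turn numeric-looking text into numbers
test4 <- test3 %>%
  filter(if_all(where(is.character), ~ . != "NULL")) %>%
  type.convert(as.is = TRUE)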
-output
> test4
testyear teststate totalhousehold marriedhousehold
1 2010 CA 251 85
2 2010 Co 252 86
3 2010 NV 253 87
4 2011 CA 301 158
5 2011 CO 302 159
6 2011 NV 303 245
7 2012 WA 654 400
8 2012 OR 650 399
9 2012 WY 646 398
10 2012 UT 641 395
> str(test4)
'data.frame': 10 obs. of 4 variables:
$ testyear : int 2010 2010 2010 2011 2011 2011 2012 2012 2012 2012
$ teststate : chr "CA" "Co" "NV" "CA" ...
$ totalhousehold : int 251 252 253 301 302 303 654 650 646 641
$ marriedhousehold: int 85 86 87 158 159 245 400 399 398 395
Or, in base R, use subset with a logical expression built from rowSums:
type.convert(subset(test3, !rowSums(test3 == "NULL")), as.is = TRUE)
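To see why the subset() call works, it helps to look at the intermediate values. Comparing a data frame with a single string returns a logical matrix, rowSums() counts the TRUE cells in each row, and ! turns a count of zero into TRUE and any positive count into FALSE. A minimal sketch on a small hypothetical data frame d (not the question's data):

```r
# Small hypothetical data frame with "NULL" strings, like test3
d <- data.frame(
  state = c("CA", "NE", "CO"),
  total = c("251", "NULL", "302"),
  married = c("85", "NULL", "159")
)

# Logical matrix: TRUE wherever a cell equals "NULL"
is_null <- d == "NULL"

# Number of "NULL" cells in each row: 0 2 0
null_count <- rowSums(is_null)

# !0 is TRUE and !2 is FALSE, so only rows with no "NULL" cells are kept
keep <- !null_count

# type.convert() then turns the remaining numeric text columns into integers
result <- type.convert(subset(d, keep), as.is = TRUE)
str(result)
```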
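Why use dplyr when this can be done simply?
# Replace the "NULL" placeholders with NA, then drop every row containing an NA
test3[test3 == "NULL"] <- NA
test3 <- na.omit(test3)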
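One caveat with the na.omit() route: replacing "NULL" with NA does not change the column types, so the household columns are still character after the rows are dropped, and na.omit() also removes any row that already contained a genuine NA. A short sketch of finishing the job with type.convert(), using a small hypothetical data frame d rather than the question's data:

```r
# Small hypothetical data frame; "NULL" forces total to character
d <- data.frame(
  state = c("CA", "NE", "CO"),
  total = c("251", "NULL", "302")
)

# Mark the "NULL" placeholders as missing, then drop incomplete rows
d[d == "NULL"] <- NA
d <- na.omit(d)
class(d$total)  # still "character"

# Re-guess the column types now that only numeric text remains
d <- type.convert(d, as.is = TRUE)
class(d$total)  # now "integer"
```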