Define multiple values as missing in a data frame
How can I define multiple values as missing in a data frame in R?
Consider a data frame in which two values, "888" and "999", represent missing data:
df <- data.frame(age=c(50,30,27,888),insomnia=c("yes","no","no",999))
df[df==888] <- NA
df[df==999] <- NA
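This solution takes one line of code per value that represents missing data. Is there a simpler solution for situations where many values represent missing data?
This should work: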
> rm(list = ls())
> df1 <- df2 <-
+ data.frame(age=c(50,30,27,888),insomnia=c("yes","no","no",999))
> df1[df1==888] <- NA
> df1[df1==999] <- NA
>
> df2[sapply(df2, "%in%", table = c(888, 999))] <- NA
> all.equal(df1, df2)
[1] TRUE
With the code above, you can assign the missing-value codes to an object and then pass that object as the table argument.
Here are three solutions:
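As a minimal sketch of that idea, the codes can be stored once in a vector and reused. The name `na_codes` is only an illustrative choice, not something from the answer above.

```r
# Hypothetical name: na_codes holds every value that should count as missing
na_codes <- c(888, 999)

df <- data.frame(age = c(50, 30, 27, 888),
                 insomnia = c("yes", "no", "no", 999))

# sapply() builds a logical matrix marking cells whose value is in na_codes;
# indexing the data frame with that matrix sets those cells to NA
df[sapply(df, "%in%", table = na_codes)] <- NA
```

Adding another missing-value code then only requires extending `na_codes`.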
# 1. Data set
df <- data.frame(
  age = c(50, 30, 27, 888),
  insomnia = c("yes", "no", "no", 999))
# 2. Solution based on "one line of code per missing data value"
df[df == 888] <- NA
df[df == 999] <- NA
is.na(df)
# 3. Solution based on "applying function to each column of data set"
df[sapply(df, function(x) as.character(x) %in% c("888", "999") )] <- NA
is.na(df)
# 4. Solution based on "dplyr"
# 4.1. Load package
library(dplyr)
# 4.2. Define function for missing values
is_na <- function(x) {
  return(as.character(x) %in% c("888", "999"))
}
# 4.3. Apply function to each column
df %>% lapply(is_na)
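Note that the last step returns a list of logical vectors and leaves the data frame itself unchanged. A minimal base R sketch of how the same kind of helper could be used to actually set the flagged values to NA follows. The names `na_codes` and `is_na_code` are illustrative assumptions, not part of the answer above.

```r
df <- data.frame(age = c(50, 30, 27, 888),
                 insomnia = c("yes", "no", "no", 999))

# Hypothetical names: codes compared as strings so they match in any column type
na_codes <- c("888", "999")
is_na_code <- function(x) as.character(x) %in% na_codes

# Replace flagged values column by column; df[] keeps the data frame structure
df[] <- lapply(df, function(x) replace(x, is_na_code(x), NA))
```

With dplyr 1.0 or later, something like `df %>% mutate(across(everything(), ~ replace(.x, is_na_code(.x), NA)))` should give the same result.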