Create a value in the final column of a data frame based on multiple columns
I have a data frame that looks like this (but with many more variables/columns):
set.seed(5)
id<-seq(5)*floor(runif(5,min=1000, max=10000))
vals1<-c("Y","N","N","N","N")
vals2<-c("N","N","N","N","N")
vals3<-c("N","N","N","Y","N")
df<-data.frame(id,vals1,vals2,vals3)
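I would like to create a final column in the data frame that holds a final flag with the following logic: if any of the values for an id is 'Y', the final flag is 'Y'; otherwise it is 'N'. So for this data frame, the first and fourth ids (2801, 14236) would have 'Y' in the final column and the rest would have 'N'. I have tried a few approaches, such as apply and if...else, to no avail.
Initialize by assigning "N" to every row. In the next step, assign "Y" to the rows that contain a "Y" (checked using apply).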
df$final = "N"
df$final[apply(df, 1, function(a) "Y" %in% a)] = "Y"
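Note that the apply call above scans every column, including id and the new final column. That is harmless here, but with many more columns it may be safer to check only the flag columns. A minimal sketch, assuming (this is not stated in the question) that all flag columns have names starting with "vals":

```r
# Rebuild the example data from the question
set.seed(5)
id <- seq(5) * floor(runif(5, min = 1000, max = 10000))
df <- data.frame(id,
                 vals1 = c("Y", "N", "N", "N", "N"),
                 vals2 = c("N", "N", "N", "N", "N"),
                 vals3 = c("N", "N", "N", "Y", "N"))

# Assumption: every flag column's name starts with "vals"
flag_cols <- grep("^vals", names(df), value = TRUE)

# TRUE for each row that has at least one "Y" among the flag columns
has_y <- apply(df[, flag_cols], 1, function(a) "Y" %in% a)

# Same two-step assignment as above: default to "N", then overwrite
df$final <- "N"
df$final[has_y] <- "Y"
```

Selecting columns by name rather than by position should keep this working if columns are added or reordered.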
A solution that keeps the letter encoding is below.
set.seed(5)
id <- seq(5) * floor(runif(5, min=1000, max=10000))
vals1 <- c("Y","N","N","N","N")
vals2 <- c("N","N","N","N","N")
vals3 <- c("N","N","N","Y","N")
df <- data.frame(id, vals1, vals2, vals3)
# If you really want to keep the letter encoding, my solution works as below.
# Note that Final is logical (TRUE/FALSE) here, not "Y"/"N".
df$Final <- apply(df[, 2:4], MARGIN = 1, FUN = function(x) any(x == "Y"))
However, I think you should use booleans (TRUE/FALSE) for this. They work well in combination with apply and any.
set.seed(5)
id <- seq(5) * floor(runif(5, min=1000, max=10000))
vals1 <- c("Y","N","N","N","N")
vals2 <- c("N","N","N","N","N")
vals3 <- c("N","N","N","Y","N")
df <- data.frame(id, vals1, vals2, vals3)
# Convert your labels into booleans:
df[,2:4] <- df[,2:4] == 'Y'
# Then summarise across rows
df$Final <- apply(df[,2:4], MARGIN = 1, FUN = function(x) {any(x)})
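Once the flags are logical, the row-wise summary can also be written without apply. The following is only a sketch of an alternative, not the approach used in this answer: it swaps in base R's rowSums for apply/any, and uses ifelse to map back to letters in case the "Y"/"N" output is still needed. The id values and the FinalLetter column name are placeholders for illustration.

```r
# Placeholder ids (not the question's values); flag columns as in the question
df <- data.frame(id = 1:5,
                 vals1 = c("Y", "N", "N", "N", "N"),
                 vals2 = c("N", "N", "N", "N", "N"),
                 vals3 = c("N", "N", "N", "Y", "N"))

# Comparing a data frame with a scalar gives a logical matrix
flags <- df[, 2:4] == "Y"

# A row has at least one TRUE exactly when its row sum is positive
df$Final <- unname(rowSums(flags) > 0)

# Optional: convert the logical flag back to the letter encoding
df$FinalLetter <- ifelse(df$Final, "Y", "N")
```

Because rowSums operates on the whole matrix at once, it tends to scale better than a per-row function call when there are many rows.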
Somewhat similar to @d.b's answer:
df$final <- apply(df, 1, function(x) c("N","Y")[any(x == "Y")+1])
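In case the indexing in that one-liner is not obvious: any() returns a logical, adding 1 coerces it to 1 (FALSE) or 2 (TRUE), and that number picks an element out of c("N","Y"). A small illustration, where lookup and the example rows are made-up names and values used only for this demo:

```r
lookup <- c("N", "Y")

# FALSE + 1 is 1, TRUE + 1 is 2
no_flag  <- lookup[FALSE + 1]
yes_flag <- lookup[TRUE + 1]

# A single row as apply() would pass it: a character vector
row_with_y    <- c(id = "2801", vals1 = "Y", vals2 = "N", vals3 = "N")
row_without_y <- c(id = "5000", vals1 = "N", vals2 = "N", vals3 = "N")

flag_a <- lookup[any(row_with_y == "Y") + 1]
flag_b <- lookup[any(row_without_y == "Y") + 1]
```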