case_when with partial string match and contains()
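I'm working with a dataset that has many columns named status1, status2, and so on. Each of these columns records whether someone is exempt, completed, registered, etc.
Unfortunately, the exempt entries are inconsistent; here is a sample: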
library(dplyr)
problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))
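I'm trying to use case_when() to create a new column that holds each person's final status. If a row ever says completed, then the person is completed. If it ever says exempt without saying completed, then the person is exempt.
The important part is that I want my code to use contains("status"), or some equivalent that targets only the status columns without my having to type them all out, and I want it to need only a partial string match for exempt.
As for using contains() inside case_when, I saw an example, but I couldn't apply it to my case.
Here is what I have tried so far, but as you might guess, it did not work: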
library(purrr)
library(dplyr)
library(stringr)
solution <- problem %>%
  mutate(final= case_when(pmap_chr(select(., contains("status")), ~
    any(c(...) == str_detect(., "Exempt") ~ "Exclude",
    TRUE ~ "Complete"
  ))))
Here is what I want the final product to look like:
solution <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                   status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                   status2 = c("exempt", "Completed", "Completed", "Pending"),
                   status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"),
                   final = c("Exclude", "Completed", "Completed", "Exclude"))
Thanks!
I think you have it backwards. Put case_when inside pmap_chr, not the other way around:
library(dplyr)
library(purrr)
library(stringr)
problem %>%
  # pmap_chr() walks the status columns row by row; c(...) holds one row's values
  mutate(final = pmap_chr(select(., contains("status")),
                          ~ case_when(any(str_detect(c(...), "(?i)Exempt")) ~ "Exclude",
                                      TRUE ~ "Completed")))
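For each pmap iteration (that is, each row of the problem dataset), we use case_when to check whether the string Exempt appears in any of the status values. The (?i) in the str_detect pattern makes the match case insensitive. It is the same as writing str_detect(c(...), regex("Exempt", ignore_case = TRUE)).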
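To make the case-insensitivity point concrete, here is a small sketch comparing the two spellings on the status strings from the question. The variable names (statuses, inline_flag, explicit_flag, case_sensitive) are made up for illustration.

```r
library(stringr)

# Status strings taken from the example data (illustrative vector name)
statuses <- c("7EXEMPT", "exempt", "EXEMPTED", "ExempT - 14", "Completed", "Pending")

# Inline flag: (?i) switches the pattern to case-insensitive matching
inline_flag <- str_detect(statuses, "(?i)Exempt")

# Explicit modifier: regex() with ignore_case = TRUE
explicit_flag <- str_detect(statuses, regex("Exempt", ignore_case = TRUE))

# Plain pattern: only the exact-case spelling "Exempt" would match
case_sensitive <- str_detect(statuses, "Exempt")

# Both case-insensitive forms should agree element by element
identical(inline_flag, explicit_flag)
```

None of the sample values is spelled exactly "Exempt", so the plain pattern matches nothing here, which is one reason the attempt in the question could not have flagged any row.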
Output:
# A tibble: 4 x 5
  person status1   status2   status3     final
  <chr>  <chr>     <chr>     <chr>       <chr>
1 Corey  7EXEMPT   exempt    EXEMPTED    Exclude
2 Sibley Completed Completed Completed   Completed
3 Justin Completed Completed Completed   Completed
4 Ruth   Pending   Pending   ExempT - 14 Exclude
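As a possible alternative, not part of the answer above, the same column can probably be built without iterating over rows. This sketch assumes a recent dplyr (1.0.4 or later), where if_any() is available; the name result is made up for illustration.

```r
library(dplyr)
library(stringr)

problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

# if_any() applies the predicate to every selected column and returns TRUE
# for a row when at least one status column matches (assumes dplyr >= 1.0.4)
result <- problem %>%
  mutate(final = case_when(
    if_any(contains("status"),
           ~ str_detect(.x, regex("exempt", ignore_case = TRUE))) ~ "Exclude",
    TRUE ~ "Completed"
  ))

result
```

Like the pmap_chr version, this labels every non-exempt row "Completed", including a row whose statuses are all "Pending"; an extra case_when branch would be needed if those rows should be treated differently.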