Why does stringr::str_match on a column return a matrix?
I am loading my data with the tidyverse, so I have a tibble. You can reproduce it like this:
df_1 <- tibble(id = c(1, 2, 3), subject_id = c("ABCD-FOO1-G001-YX-732E5", "ABCD-FOO2-A011-ZA-892N2", "ABCD-FOO3-1001-CD-742W5"))
Now I want to modify subject_id so that it keeps only the first two character groups, i.e.:
"ABCD-FOO1-G001-YX-732E5" -> "ABCD-FOO1"
When I run the following code:
df_1 %>% mutate(subject_id = stringr::str_match(subject_id, "[^-]*-[^-]*"))
every element of the subject_id column appears to be a tibble itself:
> class(df_1[1, "subject_id"])
[1] "tbl_df" "tbl" "data.frame"
How can I make sure that subject_id is a character vector rather than a tibble?
This answer covers how to avoid the problem, not why it happens.
As we learn from ?str_match:
For str_match, a character matrix. First column is the complete match, followed by one column for each capture group. [...]
So we need to extract the first column of the matrix:
df_1 %>% mutate(subject_id = stringr::str_match(subject_id, "[^-]*-[^-]*") %>% .[,1])
# # A tibble: 3 x 2
# id subject_id
# <dbl> <chr>
# 1 1 ABCD-FOO1
# 2 2 ABCD-FOO2
# 3 3 ABCD-FOO3
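To see the matrix for yourself, here is a minimal sketch that calls str_match outside of mutate. The vector name ids and the variables m and v are only illustrative; the pattern is the one from the question.

```r
library(stringr)

# Two of the subject IDs from the question.
ids <- c("ABCD-FOO1-G001-YX-732E5", "ABCD-FOO2-A011-ZA-892N2")

# str_match() returns one row per input string and one column for the
# complete match plus one per capture group. This pattern has no groups,
# so the result is still a matrix, just with a single column.
m <- str_match(ids, "[^-]*-[^-]*")
is.matrix(m)  # TRUE
dim(m)        # 2 1

# Indexing the first column drops the matrix structure and leaves a
# plain character vector, which is what mutate() should receive.
v <- m[, 1]
v             # "ABCD-FOO1" "ABCD-FOO2"
```

When the matrix itself is assigned inside mutate, the tibble stores it as a matrix column, which is why the column does not behave like an ordinary character vector.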
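Also keep in mind that in your class() example you are subsetting a tibble. A tibble always stays a tibble, even when the subset is a single cell. For comparison, see class(df_2[1,"id"]). For more information, see this chapter from R for Data Science.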
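The subsetting behaviour can be checked with a small sketch. The tibble df below is a hypothetical stand-in, not the data from the question. It shows that single-bracket subsetting keeps the tibble class, while [[ or $ pulls out the underlying column, which is the more reliable way to inspect a column's type.

```r
library(tibble)

# Hypothetical example data, used only to illustrate subsetting.
df <- tibble(id = c(1, 2, 3), subject_id = c("a", "b", "c"))

# Single brackets with row and column indices return a 1 x 1 tibble.
cell <- df[1, "subject_id"]
inherits(cell, "tbl_df")  # TRUE

# Double brackets (or $) extract the column itself.
col <- df[["subject_id"]]
class(col)                # "character"
class(df$subject_id[1])   # "character"
```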
We can use str_extract, which returns a character vector rather than a matrix:
library(stringr)
library(dplyr)
df_1 %>%
  # Backslashes must be doubled inside an R string literal
  mutate(subject_id = str_extract(subject_id, "^\\w+-\\w+"))
# A tibble: 3 x 2
# id subject_id
# <dbl> <chr>
#1 1 ABCD-FOO1
#2 2 ABCD-FOO2
#3 3 ABCD-FOO3
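As a quick check of the return type, the sketch below runs str_extract on a plain vector. The names ids and out are illustrative. The pattern is written with doubled backslashes because R rejects "\w" as an unrecognized escape in a string literal.

```r
library(stringr)

# Two of the subject IDs from the question.
ids <- c("ABCD-FOO1-G001-YX-732E5", "ABCD-FOO2-A011-ZA-892N2")

# str_extract() returns the first match for each input as a plain
# character vector, so no column indexing is needed afterwards.
out <- str_extract(ids, "^\\w+-\\w+")
is.matrix(out)  # FALSE
out             # "ABCD-FOO1" "ABCD-FOO2"
```

Note that \w matches letters, digits and underscores only, so this pattern is slightly stricter than "[^-]*-[^-]*", which accepts any character other than a hyphen.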