How to split a data frame column that is a dictionary into two columns
I have a data frame that contains a column of dictionaries:
ID | dictionary_column |
---|---|
1 | {1:5, 10:15, 3:9} |
2 | {3:4, 5:3} |
... | ... |
I'm trying to make it look like this:
ID | col_a | col_b |
---|---|---|
1 | 1 | 5 |
1 | 10 | 15 |
1 | 3 | 9 |
2 | 3 | 4 |
2 | 5 | 3 |
... | ... | ... |
Any suggestions? I've tried various approaches with stringr, but I always either lose the ID column or end up with messy, slow loops. Thanks.
A tidyverse approach to get your desired result could look like this:
library(dplyr)
library(tidyr)
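data.frame(
  ID = c(1L, 2L),
  dictionary_column = c("{1:5, 10:15, 3:9}", "{3:4, 5:3}")
) %>%
  # drop the surrounding braces (they must be escaped as \\{ and \\} in an R string)
  mutate(dictionary_column = gsub("(\\{|\\})", "", dictionary_column)) %>%
  # one row per "key:value" pair; ID is repeated for every pair
  separate_rows(dictionary_column, sep = ", ") %>%
  # separate()'s default sep splits on non-alphanumerics, i.e. the colon here
  separate(dictionary_column, into = c("col_a", "col_b"))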
#> # A tibble: 5 × 3
#> ID col_a col_b
#> <int> <chr> <chr>
#> 1 1 1 5
#> 2 1 10 15
#> 3 1 3 9
#> 4 2 3 4
#> 5 2 5 3
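In tidyr 1.3.0 and later, separate() and separate_rows() are superseded by separate_wider_delim() and separate_longer_delim(). A minimal sketch of the same pipeline with the newer functions, assuming tidyr >= 1.3.0 is installed; it also converts col_a and col_b to integers, whereas the answer above leaves them as character:

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df1 <- data.frame(
  ID = c(1L, 2L),
  dictionary_column = c("{1:5, 10:15, 3:9}", "{3:4, 5:3}")
)

result <- df1 %>%
  # strip the surrounding braces
  mutate(dictionary_column = gsub("[{}]", "", dictionary_column)) %>%
  # one row per "key:value" pair, repeating ID on each row
  separate_longer_delim(dictionary_column, delim = ", ") %>%
  # split each pair at the colon into two columns
  separate_wider_delim(dictionary_column, delim = ":",
                       names = c("col_a", "col_b")) %>%
  # the pieces are character; convert them to integers
  mutate(across(c(col_a, col_b), as.integer))

print(result)
```

Both delimiters are matched as fixed strings by default, so no regex escaping is needed in those two steps.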
Not very elegant, but it works:
library(tidyr)
library(dplyr)
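# `dat` is the input data frame with columns ID and dictionary_column
dat %>%
  # remove braces and commas, leaving space-separated "key:value" pairs
  mutate(dictionary_column = gsub("\\{|\\}|,", "", dictionary_column)) %>%
  # spread the pairs over three columns; fill = "right" pads shorter rows with NA
  separate(dictionary_column, into = c("a", "b", "c"), sep = " ", fill = "right") %>%
  # back to long format, dropping the NA slots of shorter rows
  pivot_longer(-ID, values_drop_na = TRUE) %>%
  select(-name) %>%
  separate(value, into = c("col_a", "col_b"))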
# A tibble: 5 × 3
ID col_a col_b
<int> <chr> <chr>
1 1 1 5
2 1 10 15
3 1 3 9
4 2 3 4
5 2 5 3
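The hard-coded into = c("a", "b", "c") only works because no row has more than three pairs; with separate()'s default extra = "warn", a row with four pairs would lose its extra pairs with a warning. A sketch of the same wide-then-long idea, assuming the number of pairs per row varies, that sizes the wide step from the data (pair_cols and n_pairs are illustrative names):

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  ID = c(1L, 2L),
  dictionary_column = c("{1:5, 10:15, 3:9}", "{3:4, 5:3}")
)

# strip braces and commas, leaving space-separated "key:value" pairs
cleaned <- df1 %>%
  mutate(dictionary_column = gsub("[{},]", "", dictionary_column))

# size the wide step from the longest row instead of hard-coding three columns
n_pairs <- max(lengths(strsplit(cleaned$dictionary_column, " ")))
pair_cols <- paste0("pair_", seq_len(n_pairs))

result <- cleaned %>%
  # fill = "right" pads shorter rows with NA instead of warning
  separate(dictionary_column, into = pair_cols, sep = " ", fill = "right") %>%
  pivot_longer(all_of(pair_cols), values_drop_na = TRUE) %>%
  select(-name) %>%
  separate(value, into = c("col_a", "col_b"), sep = ":")

print(result)
```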
An option with str_extract_all: extract the numbers before and after the ":" into list columns, then unnest the lists.
library(stringr)
library(dplyr)
library(tidyr)
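df1 %>%
  # digits followed by ":" are the keys, digits preceded by ":" are the values
  mutate(col_a = str_extract_all(dictionary_column, "\\d+(?=:)"),
         col_b = str_extract_all(dictionary_column, "(?<=:)\\d+"),
         .keep = "unused") %>%
  # expand both list columns in parallel, repeating ID for each pair
  unnest(c(col_a, col_b))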
-output
# A tibble: 5 × 3
ID col_a col_b
<int> <chr> <chr>
1 1 1 5
2 1 10 15
3 1 3 9
4 2 3 4
5 2 5 3
Data
df1 <- structure(list(ID = 1:2, dictionary_column = c("{1:5, 10:15, 3:9}",
"{3:4, 5:3}")), class = "data.frame", row.names = c(NA, -2L))
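Calling str_extract_all() twice matches keys and values in two separate passes. A variant sketch that captures both sides of each pair in one pass with stringr's str_match_all(), which returns one matrix per string (full match in column 1, then one column per capture group), and converts the results to integers; the pairs column name is illustrative:

```r
library(dplyr)
library(stringr)
library(tidyr)

df1 <- data.frame(
  ID = 1:2,
  dictionary_column = c("{1:5, 10:15, 3:9}", "{3:4, 5:3}")
)

result <- df1 %>%
  mutate(
    # list column: one matrix per row, columns = (full match, key, value)
    pairs = str_match_all(dictionary_column, "(\\d+):(\\d+)"),
    col_a = lapply(pairs, function(m) as.integer(m[, 2])),
    col_b = lapply(pairs, function(m) as.integer(m[, 3]))
  ) %>%
  select(ID, col_a, col_b) %>%
  # keys and values come from the same matches, so they always line up
  unnest(c(col_a, col_b))

print(result)
```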