How to separate a string column into multiple columns?
# A tibble: 268 x 1
`Which of these social media platforms do you have an account in right now?`
<chr>
1 Facebook, Instagram, Twitter, Snapchat, Reddit, Signal
2 Reddit
3 Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora
4 Facebook, Instagram, Twitter, Snapchat
5 Facebook, Instagram, TikTok, Snapchat
6 Facebook, Instagram, Twitter, Linkedin, Snapchat
7 Facebook, Instagram, TikTok, Linkedin, Snapchat, Reddit
8 Facebook, Instagram, Snapchat
9 Linkedin, Reddit
10 Facebook, Instagram, Twitter, TikTok
# ... with 258 more rows
I would like to split this string column into multiple columns so that each social media platform ends up in its own column.
tidyr::separate should do this for you (although it may warn when rows contain different numbers of elements):
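library(tidyverse)

# Toy data: rows with 1, 2 and 3 comma-separated values
dd <- tibble(x = c("a", "a, b", "a, b, c"))
# Number of output columns; must be at least the largest number of pieces in any row
maxcols <- 3
# Split x on the default separator (any run of non-alphanumeric characters)
dd %>% separate(x, into = paste0("y", 1:maxcols))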
y1 y2 y3
<chr> <chr> <chr>
1 a NA NA
2 a b NA
3 a b c
Warning message:
Expected 3 pieces. Missing pieces filled with NA in 2 rows [1, 2].
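A variant that avoids hard-coding maxcols and silences the warning, sketched under the assumption that the values are always separated by a comma followed by a space: count the pieces per row first, then pass sep = ", " and fill = "right" so shorter rows are padded with NA on the right. (In tidyr 1.3.0 and later, separate() is superseded by separate_wider_delim(), but it still works.)

```r
library(tibble)
library(tidyr)

dd <- tibble(x = c("a", "a, b", "a, b, c"))

# Largest number of pieces in any row (assumes ", " is the only separator)
maxcols <- max(lengths(strsplit(dd$x, ", ", fixed = TRUE)))

# fill = "right" pads shorter rows with NA on the right instead of warning
wide <- separate(dd, x, into = paste0("y", seq_len(maxcols)), sep = ", ", fill = "right")
wide
```

Splitting on ", " rather than the default separator also keeps values that contain spaces or punctuation in one piece.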
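I think @JasonPunyon's answer is more useful than mine, although mine does solve the problem as you stated it ("split this string column into multiple columns").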
You can use unnest_tokens from the tidytext package combined with spread from tidyr to get the effect you want:
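library(tidyverse)
library(tidytext)

df %>%
  # Give each respondent an id and a marker value for the platforms they list
  mutate(Id = row_number(), HasAccount = "Yes") %>%
  # One row per (Id, platform); the default "words" tokenizer splits on commas and
  # spaces (so a multi-word name would be split), and to_lower = FALSE keeps the capitalisation
  unnest_tokens(Network, `Which of these social media platforms do you have an account in right now?`, to_lower = FALSE) %>%
  # One row per respondent, one column per platform; missing combinations become "No"
  # (spread() is superseded by pivot_wider() but still works)
  spread(Network, HasAccount, fill = "No")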
(I generated my own version of the data, so this output looks different from yours.)
# A tibble: 268 x 8
Id Facebook Instagram Reddit Signal Snapchat TikTok Twitter
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 No No No No No No Yes
2 2 Yes Yes No No Yes No Yes
3 3 No Yes No Yes No Yes No
4 4 No Yes No No Yes No No
5 5 No Yes No Yes Yes Yes Yes
6 6 No Yes No No No No No
7 7 No No Yes Yes No Yes Yes
8 8 No No Yes No No No Yes
9 9 No No Yes No Yes Yes No
10 10 No Yes Yes Yes Yes No Yes
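The same reshaping can be sketched without tidytext by swapping in tidyr's separate_rows() and pivot_wider(); this is an alternative, not the answer's method. It assumes tidyr >= 1.1.0 (for a scalar values_fill), uses a few rows copied from the question, and renames the long survey column to a hypothetical `platforms` purely for brevity. Splitting on ", " keeps multi-word platform names intact, and logical TRUE/FALSE replaces the "Yes"/"No" strings.

```r
library(tibble)
library(dplyr)
library(tidyr)

# A few rows from the question; `platforms` is a hypothetical short name
# standing in for the long survey-question column
df <- tibble(platforms = c(
  "Facebook, Instagram, Twitter, Snapchat, Reddit, Signal",
  "Reddit",
  "Linkedin, Reddit"
))

wide <- df %>%
  # Id identifies each respondent; HasAccount is the value to spread
  mutate(Id = row_number(), HasAccount = TRUE) %>%
  # One row per (Id, platform)
  separate_rows(platforms, sep = ", ") %>%
  # One column per platform; platforms a respondent did not list become FALSE
  pivot_wider(names_from = platforms, values_from = HasAccount, values_fill = FALSE)

wide
```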