Split list column into multiple integer columns on R dataframe

Question

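I have an R data frame with two columns: a transaction ID and a list of the associated products.

I need a dataset with the same number of rows (one per transaction) and one column for every possible product, with values from 0 to n depending on how many times the transaction contains that product.

Is there a quick way to do this?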

Reproducible example

Input

tibble(ID = c('01', '02'),
       Products = list(c('Apple', 'Apple', 'Orange'), c('Pear')))

Output

tibble(ID = c('01', '02'),
       Apple = c(2, 0),
       Orange = c(1, 0),
       Pear = c(0, 1))

# A tibble: 2 x 4
  ID    Apple Orange  Pear
  <chr> <dbl>  <dbl> <dbl>
1 01        2      1     0
2 02        0      0     1

Answer 1

You can do this with unnest_longer from tidyr. Try this:

library(dplyr)
library(tidyr)

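tibble(ID = c('01', '02'),
       Products = list(c('Apple', 'Apple', 'Orange'), c('Pear'))) %>%
  # one row per (ID, product) occurrence
  unnest_longer(Products) %>%
  # count how often each product appears in each transaction
  count(ID, Products) %>%
  # one column per product; products absent from a transaction get 0
  spread(Products, n, fill = 0)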
#> # A tibble: 2 x 4
#> # Groups:   ID [2]
#>   ID    Apple Orange  Pear
#>   <chr> <dbl>  <dbl> <dbl>
#> 1 01        2      1     0
#> 2 02        0      0     1

Created on 2020-03-10 by the reprex package (v0.3.0)
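
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The same unnest, count, widen pipeline can be sketched with it as follows. This is a variant, not part of the original answer: the names `transactions` and `wide_counts` are made up for illustration, and `values_fill` is written as a named list with an integer zero on the assumption that this form is accepted across tidyr 1.x releases.

```r
library(dplyr)
library(tidyr)

# Hypothetical name for the input data from the question
transactions <- tibble(
  ID = c("01", "02"),
  Products = list(c("Apple", "Apple", "Orange"), c("Pear"))
)

wide_counts <- transactions %>%
  # one row per (ID, product) occurrence
  unnest_longer(Products) %>%
  # n = number of times each product occurs in each transaction (integer)
  count(ID, Products) %>%
  # one column per product; missing ID/product combinations are filled with 0
  pivot_wider(
    names_from  = Products,
    values_from = n,
    values_fill = list(n = 0L)
  )

wide_counts
```

Because count() returns integer counts and the fill value is 0L, the product columns here come out as integers rather than doubles. If doubles are required, as in the expected output in the question, the columns can be converted afterwards.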

