R: Count frequency of values in nested list with sub-elements

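I have a nested list containing country names. I want to count how often each country occurs, where every sub-list that mentions a country adds +1, no matter how many times the country is mentioned within that sub-list.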

For example, if I have this list:

[[1]]
[1] "Austria" "Austria" "Austria"

[[2]]
[1] "Austria" "Sweden"

[[3]]
[1] "Austria" "Austria" "Sweden"  "Sweden" "Sweden" "Sweden"

[[4]]
[1] "Austria" "Austria" "Austria"

[[5]]
[1] "Austria" "Japan" 

...then I want the result to look like this:

country        freq
====================
Austria         5
Sweden          2
Japan           1

I have tried various approaches with lapply, unlist, table, etc., but none of them gives what I need. Any help is much appreciated!

One approach is to put the data into a data frame and count, for each country, the number of distinct sub-lists it appears in.

library(dplyr)

tibble::enframe(lst) %>%
  tidyr::unnest(value) %>%
  group_by(value) %>%
  summarise(freq = n_distinct(name))


#   value    freq
#   <chr>   <int>
# 1 Austria     5
# 2 Japan       1
# 3 Sweden      2

Data

lst <- list(c('Austria', 'Austria', 'Austria'), c("Austria", "Sweden"), 
     c("Austria", "Austria", "Sweden",  "Sweden", "Sweden", "Sweden"), 
     c("Austria", "Austria", "Austria"), c("Austria", "Japan" ))
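If you prefer to avoid dplyr, the same idea (one row per element, then count the distinct sub-list ids per country) can be sketched in base R. This is a minimal sketch using the `lst` above; the column names `name` and `value` are chosen here only to mirror what `enframe()` plus `unnest()` produce.

```r
# Example data: the nested list from the question
lst <- list(c("Austria", "Austria", "Austria"), c("Austria", "Sweden"),
            c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
            c("Austria", "Austria", "Austria"), c("Austria", "Japan"))

# Long format: 'name' is the sub-list index, 'value' is the country
# (column names chosen to mirror tibble::enframe() + tidyr::unnest())
long <- data.frame(
  name  = rep(seq_along(lst), lengths(lst)),
  value = unlist(lst)
)

# For each country, count the distinct sub-lists it occurs in
# (base R counterpart of group_by(value) + summarise(n_distinct(name)))
freq <- tapply(long$name, long$value, function(ids) length(unique(ids)))
freq
```

As in the dplyr version, the countries come out in alphabetical order.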

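One way with lapply(), unlist() and table(): apply unique() to each sub-list first, so that every sub-list contributes at most one count per country: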

count <- table(unlist(lapply(lst, unique)))
count
# Austria   Japan  Sweden 
#       5       1       2 


as.data.frame(count)
#      Var1 Freq
# 1 Austria    5
# 2   Japan    1
# 3  Sweden    2

Reproducible data (please provide it yourself next time):

lst <- list(
  c('Austria', 'Austria', 'Austria'), 
  c("Austria", "Sweden"), 
  c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"), 
  c("Austria", "Austria", "Austria"), 
  c("Austria", "Japan")
)
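The question asks for a `country`/`freq` table sorted by decreasing frequency. A minimal sketch of one way to get that shape from the same `table()` result, using the `responseName` argument of `as.data.frame()` for tables; renaming the first column to `country` and the variable name `counts` are just choices made here to match the desired output.

```r
lst <- list(
  c("Austria", "Austria", "Austria"),
  c("Austria", "Sweden"),
  c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
  c("Austria", "Austria", "Austria"),
  c("Austria", "Japan")
)

# One count per country per sub-list: drop repeats inside each sub-list first
counts <- table(unlist(lapply(lst, unique)))

# Convert to a data frame with a 'freq' column instead of the default 'Freq'
res <- as.data.frame(counts, responseName = "freq")
names(res)[1] <- "country"

# Order rows by decreasing frequency and reset the row names
res <- res[order(res$freq, decreasing = TRUE), ]
rownames(res) <- NULL
res
```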

Here is another base R option:

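colSums(
  do.call(
    rbind,
    lapply(
      lst,
      # For each sub-list: one TRUE/FALSE per country (all countries used as levels),
      # TRUE if the country occurs at least once in that sub-list
      function(x) table(factor(x, levels = unique(unlist(lst)))) > 0
    )
  )
)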

which gives

Austria  Sweden   Japan
      5       2       1
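The answer above builds a presence (TRUE/FALSE) matrix and sums it. The same idea can be sketched a bit more directly with `%in%` and `vapply()`; this is an alternative formulation rather than the answer's exact code, and the names `countries` and `present` are only illustrative.

```r
lst <- list(
  c("Austria", "Austria", "Austria"),
  c("Austria", "Sweden"),
  c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
  c("Austria", "Austria", "Austria"),
  c("Austria", "Japan")
)

countries <- unique(unlist(lst))

# present[i, j] is TRUE when countries[i] occurs at least once in lst[[j]]
# (one row per country, one column per sub-list)
present <- vapply(lst, function(x) countries %in% x, logical(length(countries)))
rownames(present) <- countries

# Each sub-list contributes at most 1 per country
freq <- rowSums(present)
freq
```

Because `%in%` only asks whether a country is present, repeated mentions inside a sub-list cannot inflate the count.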

Another option is to stack() the list into a two-column data.frame, take the unique rows, and apply table():

table(unique(stack(setNames(lst, seq_along(lst))))$values)

# Austria   Japan  Sweden 
#       5       1       2 
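To see why `unique()` comes before `table()`, it can help to look at the intermediate data frame. `stack()` expects a named list, which is why the answer uses `setNames(lst, seq_along(lst))`; the resulting `ind` column records which sub-list each row came from, so `unique()` only removes repeats within the same sub-list. A step-by-step sketch (the names `long` and `dedup` are just illustrative):

```r
lst <- list(
  c("Austria", "Austria", "Austria"),
  c("Austria", "Sweden"),
  c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
  c("Austria", "Austria", "Austria"),
  c("Austria", "Japan")
)

# One row per element: 'values' is the country, 'ind' the sub-list id
long <- stack(setNames(lst, seq_along(lst)))

# Drop rows that repeat the same country within the same sub-list
dedup <- unique(long)

# Each remaining row is one (sub-list, country) pair, so table() counts sub-lists
freq <- table(dedup$values)
freq
```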