R: Count frequency of values in nested list with sub-elements
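I have a nested list containing country names.
I want to count how often each country occurs, where every sub-list that mentions a country adds +1 (regardless of how many times the country appears within that sub-list).
For example, if I have this list: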
[[1]]
[1] "Austria" "Austria" "Austria"
[[2]]
[1] "Austria" "Sweden"
[[3]]
[1] "Austria" "Austria" "Sweden" "Sweden" "Sweden" "Sweden"
[[4]]
[1] "Austria" "Austria" "Austria"
[[5]]
[1] "Austria" "Japan"
...then I would like the result to look like this:
country freq
====================
Austria 5
Sweden 2
Japan 1
I tried lapply, unlist, table and various other approaches, but nothing gave me what I need. Thank you very much for your help!
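One way is to put the data into data frame form and count the number of distinct list elements in which each country occurs.
library(dplyr)
tibble::enframe(lst) %>%          # one row per sub-list: name = index, value = vector
  tidyr::unnest(value) %>%        # one row per (sub-list, country) mention
  group_by(value) %>%
  summarise(freq = n_distinct(name))  # count distinct sub-lists per country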
# value freq
# <chr> <int>
#1 Austria 5
#2 Japan 1
#3 Sweden 2
Data
lst <- list(c('Austria', 'Austria', 'Austria'), c("Austria", "Sweden"),
c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
c("Austria", "Austria", "Austria"), c("Austria", "Japan" ))
One way with lapply(), unlist() and table():
count <- table(unlist(lapply(lst, unique)))
count
# Austria Japan Sweden
# 5 1 2
as.data.frame(count)
# Var1 Freq
# 1 Austria 5
# 2 Japan 1
# 3 Sweden 2
Reproducible data (please provide it yourself next time):
lst <- list(
c('Austria', 'Austria', 'Austria'),
c("Austria", "Sweden"),
c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
c("Austria", "Austria", "Austria"),
c("Austria", "Japan")
)
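To match the layout asked for in the question (columns country and freq, most frequent first), the table() result above can be reshaped. This is only a sketch: the names res, country and freq are chosen here for illustration and are not part of the original answer.

```r
# Same sample data as in the answers above
lst <- list(
  c("Austria", "Austria", "Austria"),
  c("Austria", "Sweden"),
  c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
  c("Austria", "Austria", "Austria"),
  c("Austria", "Japan")
)

# Deduplicate inside each sub-list, then count across sub-lists
count <- table(unlist(lapply(lst, unique)))

# Convert to a data frame and rename the columns (illustrative names)
res <- as.data.frame(count, stringsAsFactors = FALSE)
names(res) <- c("country", "freq")

# Sort by descending frequency and reset the row names
res <- res[order(-res$freq), ]
rownames(res) <- NULL
res
#   country freq
# 1 Austria    5
# 2  Sweden    2
# 3   Japan    1
```

Sorting on the negated frequency keeps everything in base R; with tied frequencies, order() leaves the tied rows in their original alphabetical order.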
Here is another base R option
colSums(
  do.call(
    rbind,
    lapply(
      lst,
      function(x) table(factor(x, levels = unique(unlist(lst)))) > 0
    )
  )
)
which gives
Austria Sweden Japan
5 2 1
Another option is to stack() the list into a two-column data.frame, take the unique rows, and then apply table():
table(unique(stack(setNames(lst, seq_along(lst))))$values)
# Austria Japan Sweden
# 5 1 2