Create frequency data frame and transfer columns from old data frame
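I am using the `map` function to create frequency tables from a list of data frames. I would like to carry over the name column from the original data frame. For example, when I type `df_freq$C`, I would like to see three columns: `value`, `n`, and `name`. For the `name` column, I want all values to equal "C".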
# load packages and define variables
rm(list = ls())
library(purrr)
library(dplyr)

## load data
df_raw <- data.frame(name = c("C", "A", "B", "A", "B", "C"),
                     start = c(2, 1, 3, 4, 5, 2),
                     end = c(7, 6, 7, 8, 10, 9))

df <- df_raw %>%
  split(.$name) %>% # split data by name
  imap(function(x, x_name) {
    data.frame(value = Map(seq.int, x$start, x$end) %>% unlist,
               name = x_name) })

## create frequency plot with name column
df_freq <- df %>%
  map(., ~count(.x, value))
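This can be done more directly in the `tidyverse`. Apply `rowwise`, then use `transmute` to return, for each row, the `name` and a `list` holding the sequence from 'start' to 'end'. Then `unnest` the `list` column and run `count`.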
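library(dplyr)
library(tidyr)
df_raw %>%
  rowwise() %>%                                     # evaluate start:end one row at a time
  transmute(name, value = list(start:end)) %>%      # list column holding each row's sequence
  unnest(c(value)) %>%                              # one row per value
  count(name, value)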
-output
# A tibble: 24 x 3
#   name  value     n
#   <chr> <int> <int>
# 1 A         1     1
# 2 A         2     1
# 3 A         3     1
# 4 A         4     2
# 5 A         5     2
# 6 A         6     2
# 7 A         7     1
# 8 A         8     1
# 9 B         3     1
#10 B         4     1
# … with 14 more rows
Or, instead of `rowwise`, `map2` can be used:
library(purrr)
df_raw %>%
  transmute(name, value = map2(start, end, `:`)) %>%
  unnest(c(value)) %>%
  count(name, value)
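The pipelines above return one long tibble, while the question asks for a list that can be indexed as `df_freq$C`. A minimal sketch of one way to get there, assuming base `split()` on the `name` column is acceptable; the intermediate name `freq_long` is made up for illustration and is not part of the original answer:

```r
library(dplyr)
library(purrr)
library(tidyr)

df_raw <- data.frame(name = c("C", "A", "B", "A", "B", "C"),
                     start = c(2, 1, 3, 4, 5, 2),
                     end = c(7, 6, 7, 8, 10, 9))

# long frequency table, same pipeline as the map2 version above
freq_long <- df_raw %>%
  transmute(name, value = map2(start, end, `:`)) %>%
  unnest(c(value)) %>%
  count(name, value)

# split() returns a named list with one frequency table per name
df_freq <- split(freq_long, freq_long$name)

df_freq$C   # columns name, value, n; every name is "C"
```

Each list element keeps the `name` column, so no extra step is needed to label the rows.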
In the OP's code, `count` also needs the `name` column:
df %>%
  map(~ count(.x, name, value))
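If the column order `value`, `n`, `name` from the question matters, another possibility is to count first and attach the list element's name afterwards. The following is only a sketch under that assumption: it builds `df` without a `name` column and uses `imap()` so the list name is available as the second argument; it is a variation, not the answer's own code.

```r
library(dplyr)
library(purrr)

df_raw <- data.frame(name = c("C", "A", "B", "A", "B", "C"),
                     start = c(2, 1, 3, 4, 5, 2),
                     end = c(7, 6, 7, 8, 10, 9))

# one data frame of expanded values per name
df <- df_raw %>%
  split(.$name) %>%
  map(function(x) {
    data.frame(value = unlist(Map(seq.int, x$start, x$end)))
  })

# count the values, then add the list element's name as a column
df_freq <- df %>%
  imap(function(x, x_name) {
    x %>%
      count(value) %>%
      mutate(name = x_name)
  })

df_freq$C   # columns value, n, name
```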
Here is a `data.table` option:
library(data.table)  # setDT() converts the raw data frame df_raw to a data.table by reference
setDT(df_raw)[, .(value = unlist(Map(seq, start, end)), n = 1), .(name)][, .(n = sum(n)), by = .(name, value)]
which gives
    name value n
 1:    C     2 2
 2:    C     3 2
 3:    C     4 2
 4:    C     5 2
 5:    C     6 2
 6:    C     7 2
 7:    C     8 1
 8:    C     9 1
 9:    A     1 1
10:    A     2 1
11:    A     3 1
12:    A     4 2
13:    A     5 2
14:    A     6 2
15:    A     7 1
16:    A     8 1
17:    B     3 1
18:    B     4 1
19:    B     5 2
20:    B     6 2
21:    B     7 2
22:    B     8 1
23:    B     9 1
24:    B    10 1
    name value n
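The helper column `n = 1` followed by `sum(n)` can probably be replaced by `data.table`'s built-in row counter `.N`. A sketch of that variation, assuming the same `df_raw` as in the question; the names `dt`, `expanded`, and `freq` are made up for illustration, and `as.data.table()` is used so that `df_raw` itself is left unmodified:

```r
library(data.table)

df_raw <- data.frame(name = c("C", "A", "B", "A", "B", "C"),
                     start = c(2, 1, 3, 4, 5, 2),
                     end = c(7, 6, 7, 8, 10, 9))

dt <- as.data.table(df_raw)   # copy; df_raw stays a plain data.frame

# expand each start/end pair into its sequence, grouped by name
expanded <- dt[, .(value = unlist(Map(seq, start, end))), by = name]

# .N is the number of rows in each (name, value) group
freq <- expanded[, .(n = .N), by = .(name, value)]

freq[name == "C"]
```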