Frequency count while adding the lowest time in R
group/Time/Boardstation/
1 0511 1
1 0513 1
1 0515 1
1 0520 2
1 0525 3
1 0526 3
1 0540 5
2 0511 1
2 0513 1
2 0515 1
2 0520 2
2 0525 3
2 0526 3
2 0540 5
The above is my current dataset.
group/Boardstation/Frequency
1 1 3
1 2 1
1 3 2
1 4 0
1 5 1
2 1 3
2 2 1
2 3 2
2 4 0
2 5 1
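I have already counted the frequency and added the stations that are not in the dataset.
Now I would like to add the lowest time to this second dataset alongside the frequency count, as shown below: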
group/Boardstation/Frequency/Time
1 1 3 0511
1 2 1 0520
1 3 2 0525
1 4 0 null
1 5 1 0540
2 1 3 0511
2 2 1 0520
2 3 2 0525
2 4 0 null
2 5 1 0540
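So far I have managed to get the Boardstation and Frequency columns, including the zero counts for stations with no data, but I am having difficulty adding the time as a new column.
Any help would be appreciated.
Thanks!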
One option is to join the two data frames, order the result by time, and then remove the rows that are duplicated in group and Boardstation:
Data:
time_df <- data.frame(t(matrix(c(1,0511,1,
1,0513,1,1,0515,1,1,0520,2,1,0525,3,1,0526,3,1,0540,5,2,0511,1,
2,0513,1,2,0515,1,2,0520,2,2,0525,3,2,0526,3,2,0540,5), nrow = 3)))
colnames(time_df) <- c("group","Time","Boardstation")
freq_df <- data.frame(t(matrix(c(1,1,3,
1,2,1,1,3,2,1,4,0,1,5,1,2,1,3,2,2,1,2,3,2,2,4,0,2,5,1), nrow = 3)))
# alternative
# freq_df <- as.data.frame(with(time_df, table(group, factor(Boardstation,levels = 1:5))))
colnames(freq_df) <- c("group","Boardstation","Frequency")
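Solution:
# left join so that stations without any boardings are kept (their Time becomes NA)
join_df <- merge(freq_df, time_df, by = c("group", "Boardstation"), all.x = TRUE)
# sort so that the earliest Time comes first within each group/Boardstation pair
join_df <- join_df[with(join_df, order(group, Boardstation, Time)),]
# keep only the first (earliest) row for each group/Boardstation pair
final_df <- join_df[!duplicated(join_df[,1:2]),]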
group Boardstation Frequency Time
1 1 1 3 511
4 1 2 1 520
5 1 3 2 525
7 1 4 0 NA
8 1 5 1 540
9 2 1 3 511
12 2 2 1 520
13 2 3 2 525
15 2 4 0 NA
16 2 5 1 540
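The same result can also be reached without the sort-and-deduplicate step. The following is only a sketch of one alternative, assuming the data from the question; the names `all_combos`, `min_time`, `freq` and `result` are made up for illustration. It uses `aggregate()` to compute the count and the earliest time per pair, then left joins both onto the full grid of combinations.

```r
# Example data mirroring the question (Time stored as plain numbers).
time_df <- data.frame(
  group        = rep(1:2, each = 7),
  Time         = rep(c(511, 513, 515, 520, 525, 526, 540), times = 2),
  Boardstation = rep(c(1L, 1L, 1L, 2L, 3L, 3L, 5L), times = 2)
)

# Every group/station combination, including stations with no boardings.
all_combos <- expand.grid(group = 1:2, Boardstation = 1:5)

# Earliest time and number of rows for each observed group/station pair.
min_time <- aggregate(Time ~ group + Boardstation, data = time_df, FUN = min)
freq     <- aggregate(Time ~ group + Boardstation, data = time_df, FUN = length)
names(freq)[names(freq) == "Time"] <- "Frequency"

# Left joins keep the unobserved combinations; their values come back as NA.
result <- merge(all_combos, freq, by = c("group", "Boardstation"), all.x = TRUE)
result <- merge(result, min_time, by = c("group", "Boardstation"), all.x = TRUE)

# No matching rows means a count of zero; Time stays NA for those stations.
result$Frequency[is.na(result$Frequency)] <- 0L
result <- result[order(result$group, result$Boardstation), ]
```

Computing the minimum directly makes the intent explicit, at the cost of two joins instead of one.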
Here is a dplyr solution:
library(dplyr)
df = read.table(text = "
group Time Boardstation
1 0511 1
1 0513 1
1 0515 1
1 0520 2
1 0525 3
1 0526 3
1 0540 5
2 0511 1
2 0513 1
2 0515 1
2 0520 2
2 0525 3
2 0526 3
2 0540 5
", header=T)
# get all possible combinations of group and Boardstation as a dataframe
expand.grid(group = seq(min(df$group), max(df$group)),
            Boardstation = seq(min(df$Boardstation), max(df$Boardstation))) %>%
  left_join(df, by = c("group", "Boardstation")) %>%  # join your original dataset
  mutate(counter = !is.na(Time)) %>%                  # flag when there's a Time value
  group_by(group, Boardstation) %>%                   # for each combination of group and Boardstation
  summarise(Freq = sum(counter),                      # count it only if there's a Time value
            Time = min(Time)) %>%                     # get the minimum Time value
  ungroup()                                           # forget the grouping
# # A tibble: 10 x 4
# group Boardstation Freq Time
# <int> <int> <int> <dbl>
# 1 1 1 3 511
# 2 1 2 1 520
# 3 1 3 2 525
# 4 1 4 0 NA
# 5 1 5 1 540
# 6 2 1 3 511
# 7 2 2 1 520
# 8 2 3 2 525
# 9 2 4 0 NA
# 10 2 5 1 540
This works if Time is a numeric or character variable. If it is a factor variable, you must convert it to character first.
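To illustrate the note on variable types, here is a small sketch with made-up values. It assumes Time holds zero-padded strings such as "0511", as in the question. The second part, which uses `colClasses` in `read.table()` to read Time as character so the leading zero is kept, is an optional extra and not part of the answer above.

```r
# Hypothetical Time column stored as a factor with zero-padded labels.
time_fct <- factor(c("0520", "0511", "0513"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the values shown in the labels.
codes <- as.numeric(time_fct)

# Convert the labels to character first; fixed-width, zero-padded strings
# sort in the same order as the numbers they represent.
time_chr <- as.character(time_fct)
earliest <- min(time_chr)

# If a number is needed afterwards, convert the character value.
earliest_num <- as.numeric(earliest)

# Reading Time as character from the start keeps the leading zero.
df_chr <- read.table(text = "
group Time Boardstation
1 0511 1
1 0513 1
1 0520 2
", header = TRUE, colClasses = c("integer", "character", "integer"))
```

With Time kept as character, `min(Time)` in either solution would return a value like "0511" rather than 511.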