How to group rows in a data frame while counting occurrences in one column and summing values in another?
I am trying to modify my data frame:
start end duration_time
1 1 2 2.438
2 2 1 3.901
3 1 2 18.037
4 2 3 85.861
5 3 4 83.922
and create something like this:
start end duration_time weight
1 1 2 20.475 2
2 2 1 3.901 1
4 2 3 85.861 1
5 3 4 83.922 1
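So duplicated start/end combinations should be collapsed into a single row, with weight counting how many times the combination occurred and duration_time holding the sum of the durations.
I already have part of it working; I just can't get the weight to work: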
library('plyr')
df <- read.table(header = TRUE, text = "start end duration_time
1 1 2 2.438
2 2 1 3.901
3 1 2 18.037
4 2 3 85.861
5 3 4 83.922")
ddply(df, c("start","end"), summarise, weight=? ,duration_time=sum(duration_time))
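One way to fill in the missing piece of the plyr call is probably `length(duration_time)`, which counts the rows in each start/end group. This is only a minimal sketch, assuming plyr is installed; `res` is a name introduced here for illustration. Note that `weight` is listed before `duration_time`, because plyr's `summarise` evaluates its arguments in order, and once `duration_time` has been replaced by its sum its length would be 1.

```r
library(plyr)

df <- read.table(header = TRUE, text = "start end duration_time
1 1 2 2.438
2 2 1 3.901
3 1 2 18.037
4 2 3 85.861
5 3 4 83.922")

# weight must come first: it counts the original rows of each group,
# before duration_time is overwritten by the group sum.
res <- ddply(df, c("start", "end"), summarise,
             weight = length(duration_time),
             duration_time = sum(duration_time))
res
```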
The simplest solution uses data.table:
library(data.table)
setDT(df)[, .(duration_time=sum(duration_time), wt = .N) , by =c("start", "end")]
start end duration_time wt
1: 1 2 20.475 2
2: 2 1 3.901 1
3: 2 3 85.861 1
4: 3 4 83.922 1
An attempt using dplyr and tidyr:
library(dplyr)
library(tidyr)
df1 <- df %>% unite(by_var, start,end)
df2 <- cbind(df1 %>% count(by_var), df1 %>% group_by(by_var)%>%
summarise( duration_time=sum(duration_time))%>%
separate(by_var, c("start","end")))[c(3,4,5,2)]
> df2
start end duration_time n
1 1 2 20.475 2
2 2 1 3.901 1
3 2 3 85.861 1
4 3 4 83.922 1
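The same result can likely be reached with dplyr alone, without uniting and separating the key columns. The following is a sketch rather than the answerer's method: it groups directly on `start` and `end` and uses `n()` for the count. It assumes dplyr 1.0.0 or later (for the `.groups` argument), and `res` is a name introduced here for illustration.

```r
library(dplyr)

df <- read.table(header = TRUE, text = "start end duration_time
1 1 2 2.438
2 2 1 3.901
3 1 2 18.037
4 2 3 85.861
5 3 4 83.922")

# Group on both key columns, then compute the sum and the row count per group.
res <- df %>%
  group_by(start, end) %>%
  summarise(duration_time = sum(duration_time),
            weight = n(),
            .groups = "drop")
res
```

Unlike the unite/separate version, this keeps `start` and `end` as numeric columns instead of turning them into character strings.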
A base R option is aggregate:
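# Group by every other column of df (start, end); FUN returns both the sum and the count.
# do.call(data.frame, ...) flattens the resulting matrix column into ordinary columns.
do.call(data.frame, aggregate(duration_time ~ ., df,
        FUN = function(x) c(duration_time = sum(x), weight = length(x))))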
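To see why the `do.call(data.frame, ...)` wrapper is there, it may help to look at what `aggregate` returns when `FUN` yields more than one value. The sketch below writes the grouping columns out explicitly and uses the illustrative names `agg`, `flat` and `total`, none of which come from the answer.

```r
df <- read.table(header = TRUE, text = "start end duration_time
1 1 2 2.438
2 2 1 3.901
3 1 2 18.037
4 2 3 85.861
5 3 4 83.922")

# FUN returns two values per group, so aggregate stores them
# as a single matrix column named duration_time.
agg <- aggregate(duration_time ~ start + end, data = df,
                 FUN = function(x) c(total = sum(x), weight = length(x)))
ncol(agg)                     # start, end, and one matrix column
is.matrix(agg$duration_time)

# Passing the columns to data.frame() splits the matrix into
# separate columns: duration_time.total and duration_time.weight.
flat <- do.call(data.frame, agg)
names(flat)
```

Note that `aggregate` orders the groups with the first grouping variable varying fastest, so the row order differs from the data.table and dplyr outputs.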