How to create the frequency of a column and then perform an aggregation on that data in R
Objective:
I have a dataset, df, and I want to first count how many times each date occurs and then multiply that count by a given number.
Sent Duration Length
1/7/2020 8:11:00 PM 34 216
1/22/2020 7:51:05 AM 432 111
1/7/2020 1:35:08 AM 57 90
1/22/2020 3:43:26 AM 22 212
1/22/2020 4:00:00 AM 55 500
Desired output:
Date Count Aggregation(80)
1/7/2020 2 160
1/22/2020 3 240
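I want to count how many times each date in the 'datetime' column occurs and then multiply that count by 80. The date 1/7/2020 appears twice and the date 1/22/2020 appears three times, so the results are 2 × 80 = 160 and 3 × 80 = 240.
Here is the dput() output of df: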
structure(list(Sent = structure(c(5L, 3L, 4L, 1L, 2L), .Label = c("1/22/2020 3:43:26 AM",
"1/22/2020 4:00:00 AM", "1/22/2020 7:51:05 PM", "1/7/2020 1:35:08 AM",
"1/7/2020 8:11:00 PM"), class = "factor"), Duration = c(34L,
432L, 57L, 22L, 55L), length = c(216L, 111L, 90L, 212L, 500L)), class = "data.frame", row.names = c(NA,
-5L))
This is what I tried:
df1 <- aggregate(df$Sent, by = list(Category = df$Sent),
                 FUN = length)
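However, I need the output to show both how often each date occurs and the aggregation (the count multiplied by 80).
Any suggestions are welcome.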
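A minimal base-R sketch that keeps the aggregate() approach from the attempt: grouping by the full Sent timestamp gives one group per row, so each timestamp is first reduced to its calendar date and the rows sharing a date are then counted. The data frame is rebuilt from the dput() output with Sent stored as character (as.Date() also accepts the original factor); the column names Date and Count are choices made for this example.

```r
# Reproducible data from the dput() output above (Sent kept as character here)
df <- data.frame(
  Sent = c("1/7/2020 8:11:00 PM", "1/22/2020 7:51:05 PM", "1/7/2020 1:35:08 AM",
           "1/22/2020 3:43:26 AM", "1/22/2020 4:00:00 AM"),
  Duration = c(34L, 432L, 57L, 22L, 55L),
  length = c(216L, 111L, 90L, 212L, 500L)
)

# Keep only the calendar date; strptime-based parsing ignores the trailing time text
df$Date <- as.Date(df$Sent, format = "%m/%d/%Y")

# Count rows per date (the counted column does not matter, only its length)
res <- aggregate(df$Sent, by = list(Date = df$Date), FUN = length)
names(res)[2] <- "Count"

# Multiply each count by the constant from the question
res$`Aggregation(80)` <- res$Count * 80
res
```

The only change from the original attempt is the grouping key: a date rather than a full datetime.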
Here is the data.table way of doing it:
Code
library(data.table)
# convert df to a data.table in place
setDT(df)
# parse the timestamps as POSIXct; %p (AM/PM) only takes effect together with %I
df[, Sent := as.POSIXct(Sent, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")]
# count the rows per calendar date and multiply by 80
df[, .(Count = .N, Aggregation = .N * 80), by = .(Date = as.Date(Sent))]
Output
# Date Count Aggregation
# 1: 2020-01-07 2 160
# 2: 2020-01-22 3 240
We can convert Sent to POSIXct, extract the date, count the rows for each date and multiply by 80. Using dplyr, we can do:
library(dplyr)
df %>%
group_by(Date = as.Date(lubridate::mdy_hms(Sent))) %>%
summarise(Count = n(), `Aggregation(80)` = Count * 80)
# Date Count `Aggregation(80)`
# <date> <int> <dbl>
#1 2020-01-07 2 160
#2 2020-01-22 3 240
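A variant of the dplyr answer, sketched under the assumption of a recent dplyr (1.0 or later) where count() accepts a computed grouping column and a name argument. count() is shorthand for group_by() followed by summarise(n()). Here only the date part is parsed with base as.Date(), so lubridate is not needed.

```r
library(dplyr)

# Only the Sent column matters for this example
df <- data.frame(
  Sent = c("1/7/2020 8:11:00 PM", "1/22/2020 7:51:05 PM", "1/7/2020 1:35:08 AM",
           "1/22/2020 3:43:26 AM", "1/22/2020 4:00:00 AM")
)

res <- df %>%
  # group by the calendar date and store the row count in a column called Count
  count(Date = as.Date(Sent, format = "%m/%d/%Y"), name = "Count") %>%
  # multiply each count by the constant from the question
  mutate(`Aggregation(80)` = Count * 80)
res
```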
Using table:
as.data.frame(cbind(Count=(r <- table(as.Date(df$Sent, format="%m/%d/%Y %H:%M:%S"))),
Agg=r*80))
# Count Agg
# 2020-01-07 2 160
# 2020-01-22 3 240
Or:
`rownames<-`(as.data.frame(cbind(Count=(r <- table(as.Date(df$Sent, format="%m/%d/%Y %H:%M:%S"))),
Agg=r*80, Date=names(r)))[c(3, 1:2)], NULL)
# Date Count Agg
# 1 2020-01-07 2 160
# 2 2020-01-22 3 240
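In the second table() variant, cbind() with Date = names(r) creates a character matrix, so Count and Agg become character columns. A minimal sketch that keeps them numeric converts the table with as.data.frame() and renames its default Freq column through responseName. The dimension name Date and the column name Agg are choices made for this example.

```r
df <- data.frame(
  Sent = c("1/7/2020 8:11:00 PM", "1/22/2020 7:51:05 PM", "1/7/2020 1:35:08 AM",
           "1/22/2020 3:43:26 AM", "1/22/2020 4:00:00 AM")
)

# A named argument to table() labels the dimension "Date"
tab <- table(Date = as.Date(df$Sent, format = "%m/%d/%Y"))

# One row per date; responseName replaces the default "Freq" column name
res <- as.data.frame(tab, responseName = "Count", stringsAsFactors = FALSE)

# Count stays integer, so the multiplication is numeric
res$Agg <- res$Count * 80
res
```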