Calculate mean days in R using dplyr, filter, group_by and summarise?
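I want to create a table that shows the mean number of days for each submitted_via value (see consumer_compliants.csv), using a date_diff computed by subtracting date_received from date_sent. The data should be filtered to keep only date_diff values greater than 0. All of this must be done with dplyr, %>%, filter, group_by, summarise_at and knitr::kable().
I have tried this in R: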
date_received <- as.Date(mydata$date_received, "%m/%d/%Y")
date_sent <- as.Date(mydata$date_sent_to_company, "%m/%d/%Y")
date_diff <- (date_sent) - (date_received)
mydata %>%
  filter(date_diff > 0) %>%
  group_by(date_received, date_sent_to_company) %>%
  summarise(
    a = mean(date_diff))
Output:
Email 11.973214 days
Fax 7.057072 days
Phone 6.290040 days
Postal mail 9.627809 days
Referral 6.761684 days
Web 10.695773 days
Any suggestions?
This may be closer to what you are asking for:
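library(dplyr)
mydata %>%
  # Convert every column whose name starts with "date_" to Date
  mutate_at(vars(starts_with("date_")), as.Date, format = "%m/%d/%Y") %>%
  # Day difference between the two dates (a difftime in days)
  mutate(date_diff = date_received - date_sent) %>%
  # Keep only positive differences
  filter(date_diff > 0) %>%
  # Mean difference for each submission channel
  group_by(submitted_via) %>%
  summarise(a = mean(date_diff))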
Output
# A tibble: 3 x 2
submitted_via a
<fct> <drtn>
1 phone 22 days
2 Referral 27 days
3 web 4 days
Data
mydata <- read.table(
  text =
"date_received date_sent submitted_via
9/30/2015 9/3/2015 Referral
9/3/2015 8/30/2015 web
9/25/2015 9/3/2015 phone
9/18/2015 9/18/2015 Referral", header = TRUE
)
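The question also asks for summarise_at() and knitr::kable(), which the answer above does not use. The following is a minimal sketch that meets those constraints. It assumes the small sample data above, with its date_sent column rather than the full file's date_sent_to_company, and it uses the received-minus-sent direction that this sample needs to produce positive values. It also converts the difftime to a plain number with as.numeric(..., units = "days") so that kable() prints a numeric column. The names result and mean_days are illustrative only.

```r
library(dplyr)
library(knitr)

# Sample data from above (assumed column names, not the full CSV)
mydata <- read.table(
  text =
"date_received date_sent submitted_via
9/30/2015 9/3/2015 Referral
9/3/2015 8/30/2015 web
9/25/2015 9/3/2015 phone
9/18/2015 9/18/2015 Referral", header = TRUE, stringsAsFactors = FALSE
)

result <- mydata %>%
  # Parse both date columns from month/day/year strings
  mutate(date_received = as.Date(date_received, format = "%m/%d/%Y"),
         date_sent = as.Date(date_sent, format = "%m/%d/%Y")) %>%
  # Day difference as a plain number (difftime -> numeric days)
  mutate(date_diff = as.numeric(date_received - date_sent, units = "days")) %>%
  # Keep only strictly positive differences
  filter(date_diff > 0) %>%
  group_by(submitted_via) %>%
  # summarise_at() applies mean() to the selected column
  summarise_at(vars(date_diff), mean)

# Render the summary as a table; mean_days is just a display label
kable(result, col.names = c("submitted_via", "mean_days"))
```

For the full file, the mutate() calls would use the question's date_sent_to_company column, and the subtraction would probably run in the direction the question states (date_sent_to_company - date_received).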
In base R, we can do it as follows:
# Select the date columns
cols <- c("date_received", "date_sent_to_company")
# Convert the columns to Date class
consumer_complaints[cols] <- lapply(consumer_complaints[cols], as.Date, format = "%m/%d/%Y")
# Subtract date_received from date_sent_to_company,
# keep rows where date_diff is greater than 0, and take the mean for each submitted_via
aggregate(date_diff ~ submitted_via,
          subset(transform(consumer_complaints,
                           date_diff = date_sent_to_company - date_received),
                 date_diff > 0),
          mean)
# submitted_via date_diff
#1 Email 11.97
#2 Fax 7.06
#3 Phone 6.29
#4 Postal mail 9.63
#5 Referral 6.76
#6 Web 10.70
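The same base R aggregate() approach can be checked on the small sample data from the dplyr answer without the full CSV. This sketch assumes that sample's column names (date_sent instead of date_sent_to_company) and the received-minus-sent direction the sample needs. The name result is illustrative only.

```r
# Sample data (assumed, copied from the dplyr answer)
mydata <- read.table(
  text =
"date_received date_sent submitted_via
9/30/2015 9/3/2015 Referral
9/3/2015 8/30/2015 web
9/25/2015 9/3/2015 phone
9/18/2015 9/18/2015 Referral", header = TRUE, stringsAsFactors = FALSE
)

# Convert both date columns to Date class
cols <- c("date_received", "date_sent")
mydata[cols] <- lapply(mydata[cols], as.Date, format = "%m/%d/%Y")

# Day difference as a plain number
mydata$date_diff <- as.numeric(mydata$date_received - mydata$date_sent, units = "days")

# Keep positive differences, then average per submission channel
result <- aggregate(date_diff ~ submitted_via,
                    data = subset(mydata, date_diff > 0),
                    FUN = mean)
result
```

The Referral row dated 9/18/2015 has a difference of 0, so the filter drops it, and Referral's mean comes only from the 27-day row.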