Keep duplicate entries separate when using group_by() from dplyr
library(dplyr) # loads dplyr (provides %>%, tibble(), group_by() and count())
mydataWithWeeksAndWeights <- tibble(ended = c("14/11/2016",
"14/11/2016",
"14/11/2016",
"02/01/2017",
"02/01/2017",
"15/11/2017",
"15/11/2017",
"16/11/2017",
"16/11/2017"),
week = c(46, 46, 46, 1, 1, 46, 46, 46, 46),
satisfactionLevel = c("Very dissatisfied",
"Very satisfied",
"Satisfied",
"Dissatisfied",
"Very dissatisfied",
"Very satisfied",
"Very dissatisfied",
"Very Satisfied",
"Very satisfied"),
weight = c(0, 1, 0.75, 0.25, 0, 1, 0, 1, 1))
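When I run the following:
pivotTable <- mydataWithWeeksAndWeights %>% group_by(week, weight) %>% count(satisfactionLevel)
it lumps together the satisfaction levels of all week-46 entries. The problem is that week 46 in the first three rows refers to 2016, while the remaining week-46 rows refer to 2017. I want to keep these entries separate instead of having them merged.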
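I can't be sure my code does exactly what you want, because you didn't give the expected output, but I think what you need is to add a year column and include it in group_by(), so that week 46 of 2016 and week 46 of 2017 end up in different groups.
Edit: if you need to derive the year automatically from the ended dates you already have, here is the version incorporating @docendodiscimus's suggestion from the comments: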
library(dplyr)
mydataWithWeeksAndWeights <- tibble(ended = c("14/11/2016",
"14/11/2016",
"14/11/2016",
"02/01/2017",
"02/01/2017",
"15/11/2017",
"15/11/2017",
"16/11/2017",
"16/11/2017"),
week = c(46, 46, 46, 1, 1, 46, 46, 46, 46),
satisfactionLevel = c("Very dissatisfied",
"Very satisfied",
"Satisfied",
"Dissatisfied",
"Very dissatisfied",
"Very satisfied",
"Very dissatisfied",
"Very Satisfied",
"Very satisfied"),
weight = c(0, 1, 0.75, 0.25, 0, 1, 0, 1, 1))
mydataWithWeeksAndWeights$year <- format(as.Date(mydataWithWeeksAndWeights$ended,
"%d/%m/%Y"), "%Y")
pivotTable <- mydataWithWeeksAndWeights %>%
group_by(week, year, weight) %>%
count(satisfactionLevel)
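One caveat with taking the year from format(..., "%Y"): if the week column is meant to be an ISO 8601 week (an assumption on my part, though it does match every date in this data), dates around New Year can belong to a week of the neighbouring year. For example, 1 January 2017 falls in ISO week 52 of 2016. A minimal sketch, assuming ISO weeks, that derives both week and year from ended with the %V (ISO week) and %G (ISO week-based year) format codes, so the two can never disagree. The stored week column is deliberately recomputed here, and the name survey is just a placeholder:

```r
library(dplyr)

# Question data without the hand-entered week column (it is recomputed below)
survey <- tibble(
  ended = c("14/11/2016", "14/11/2016", "14/11/2016",
            "02/01/2017", "02/01/2017",
            "15/11/2017", "15/11/2017",
            "16/11/2017", "16/11/2017"),
  satisfactionLevel = c("Very dissatisfied", "Very satisfied", "Satisfied",
                        "Dissatisfied", "Very dissatisfied",
                        "Very satisfied", "Very dissatisfied",
                        "Very Satisfied", "Very satisfied"),
  weight = c(0, 1, 0.75, 0.25, 0, 1, 0, 1, 1)
)

pivotTable <- survey %>%
  mutate(
    ended = as.Date(ended, format = "%d/%m/%Y"),
    # %V = ISO 8601 week number, %G = the year that ISO week belongs to
    week = as.integer(format(ended, "%V")),
    year = as.integer(format(ended, "%G"))
  ) %>%
  # same grouping as group_by(week, year, weight) %>% count(satisfactionLevel)
  count(year, week, weight, satisfactionLevel)

print(pivotTable)
```

If the weeks are not ISO weeks (for example US-style weeks starting on Sunday), %G and %V would be the wrong codes and the plain %Y approach is the one to keep.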
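I would do it like this: convert "ended" to a Date, add a year column, and then sum the weights with base R's aggregate():
# just to shorten the data frame name
df <- mydataWithWeeksAndWeights
# convert "ended" to Date and add a column with the year
df$ended <- as.Date(df$ended, format = "%d/%m/%Y")
df$year <- format(df$ended, "%Y")
# sum the weights per year, satisfaction level and week
aggregate(df$weight, by = list(year = df$year, satisfactionLevel = df$satisfactionLevel, week = df$week), FUN = sum)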
Hope this helps!
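For comparison, the aggregate() call sums weight per group instead of counting rows. dplyr's count() can do the same when given wt = weight, so the whole thing can stay in one pipeline. This is a minimal sketch that rebuilds the question's data so it runs on its own. The object names base_sums and dplyr_sums are only for illustration, and the formula interface of aggregate() is used here instead of the by = list(...) form:

```r
library(dplyr)

df <- tibble(
  ended = c("14/11/2016", "14/11/2016", "14/11/2016",
            "02/01/2017", "02/01/2017",
            "15/11/2017", "15/11/2017",
            "16/11/2017", "16/11/2017"),
  week = c(46, 46, 46, 1, 1, 46, 46, 46, 46),
  satisfactionLevel = c("Very dissatisfied", "Very satisfied", "Satisfied",
                        "Dissatisfied", "Very dissatisfied",
                        "Very satisfied", "Very dissatisfied",
                        "Very Satisfied", "Very satisfied"),
  weight = c(0, 1, 0.75, 0.25, 0, 1, 0, 1, 1)
) %>%
  mutate(year = format(as.Date(ended, format = "%d/%m/%Y"), "%Y"))

# Base R: total weight per year / satisfaction level / week
base_sums <- aggregate(weight ~ year + satisfactionLevel + week, data = df, FUN = sum)

# dplyr: count() with wt = weight sums the weights instead of counting rows;
# the totals end up in a column named n
dplyr_sums <- df %>% count(year, satisfactionLevel, week, wt = weight)

print(dplyr_sums)
```

Both produce one row per existing year/satisfactionLevel/week combination. Dropping wt = weight turns the dplyr version back into a plain row count, like the count() approach in the question.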