Keep duplicate entries where I use group_by() from dplyr

Question

library(dplyr)  # load dplyr (it re-exports tibble())

# data_frame() is deprecated; tibble() is its replacement
mydataWithWeeksAndWeights <- tibble(
  ended = c("14/11/2016", "14/11/2016", "14/11/2016",
            "02/01/2017", "02/01/2017",
            "15/11/2017", "15/11/2017",
            "16/11/2017", "16/11/2017"),
  week = c(46, 46, 46, 1, 1, 46, 46, 46, 46),
  satisfactionLevel = c("Very dissatisfied", "Very satisfied", "Satisfied",
                        "Dissatisfied", "Very dissatisfied",
                        "Very satisfied", "Very dissatisfied",
                        "Very Satisfied", "Very satisfied"),
  weight = c(0, 1, 0.75, 0.25, 0, 1, 0, 1, 1))

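When I call the following:

pivotTable <- mydataWithWeeksAndWeights %>% group_by(week, weight) %>% count(satisfactionLevel)

it counts the satisfaction levels for all week 46 entries together. The problem is that week 46 in the first three rows refers to 2016, while the remaining week 46 rows refer to 2017. I want to keep these duplicate entries rather than have them merged.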

Answer 1

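I can't be sure my code does what you want, since you didn't give the expected output, but I think what you need is to add a year column and include it in group_by() so that you can tell week 46 of 2016 apart from week 46 of 2017.

Edit: If you need to derive the year automatically from the end dates you have, here is the approach from @docendodiscimus's comment: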

library(dplyr)

# data_frame() is deprecated; tibble() is its replacement
mydataWithWeeksAndWeights <- tibble(
  ended = c("14/11/2016", "14/11/2016", "14/11/2016",
            "02/01/2017", "02/01/2017",
            "15/11/2017", "15/11/2017",
            "16/11/2017", "16/11/2017"),
  week = c(46, 46, 46, 1, 1, 46, 46, 46, 46),
  satisfactionLevel = c("Very dissatisfied", "Very satisfied", "Satisfied",
                        "Dissatisfied", "Very dissatisfied",
                        "Very satisfied", "Very dissatisfied",
                        "Very Satisfied", "Very satisfied"),
  weight = c(0, 1, 0.75, 0.25, 0, 1, 0, 1, 1))

mydataWithWeeksAndWeights$year <- format(as.Date(mydataWithWeeksAndWeights$ended,
                                                 "%d/%m/%Y"), "%Y")

pivotTable <- mydataWithWeeksAndWeights %>%
  group_by(week, year, weight) %>%
  count(satisfactionLevel)
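
For illustration, the same idea can be written as a single pipeline. This is only a sketch: `responses` is a made-up, shortened stand-in for the question's data, and passing all the keys straight to count() is one possible variation, not necessarily what the asker needs. Note that count() on ungrouped input returns an ungrouped result, unlike the group_by() version above.

```r
library(dplyr)

# Hypothetical, shortened stand-in for the question's data (same columns).
responses <- tibble(
  ended = c("14/11/2016", "14/11/2016", "15/11/2017", "16/11/2017"),
  week = c(46, 46, 46, 46),
  satisfactionLevel = c("Very satisfied", "Satisfied",
                        "Very satisfied", "Very satisfied"),
  weight = c(1, 0.75, 1, 1)
)

counts <- responses %>%
  # Parse the day/month/year strings and extract the calendar year.
  mutate(year = format(as.Date(ended, format = "%d/%m/%Y"), "%Y")) %>%
  # count() groups by the listed columns and stores the row count in `n`.
  count(year, week, weight, satisfactionLevel)

print(counts)
```

Because year is one of the keys, the week 46 rows from 2016 and 2017 end up in separate rows of the result instead of being counted together.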

Answer 2

I would do it like this: convert "ended" to Date format and use the aggregate function:

# just to shorten the data frame name
df <- mydataWithWeeksAndWeights

# convert "ended" to Date and add a column with the year
df$ended <- as.Date(df$ended, format = "%d/%m/%Y")
df$year <- format(df$ended, "%Y")

# actual aggregating: sum the weights per year, satisfaction level and week
aggregate(df$weight,
          by = list(year = df$year,
                    satisfactionLevel = df$satisfactionLevel,
                    week = df$week),
          FUN = sum)
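
If you would rather stay within dplyr, a rough equivalent of this aggregation might look like the sketch below. The `responses` data and the `total_weight` column name are invented for the example. Like aggregate() with FUN = sum, it sums the weights per group rather than counting rows.

```r
library(dplyr)

# Hypothetical, shortened stand-in for the question's data (same columns).
responses <- tibble(
  ended = as.Date(c("14/11/2016", "15/11/2017", "16/11/2017"),
                  format = "%d/%m/%Y"),
  week = c(46, 46, 46),
  satisfactionLevel = c("Very satisfied", "Very satisfied", "Very satisfied"),
  weight = c(1, 1, 1)
)

weight_totals <- responses %>%
  # Extract the calendar year from the parsed dates.
  mutate(year = format(ended, "%Y")) %>%
  group_by(year, week, satisfactionLevel) %>%
  # One row per group, holding the summed weight.
  summarise(total_weight = sum(weight)) %>%
  ungroup()

print(weight_totals)
```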

Hope that helps!

r

data-manipulation

dplyr