Combine rows/observations by column value in R

Question

I have a data frame with three columns:

 ID        Class     Score
 abc123    Science   1
 jkl456    Math      0
 zpe789    English   1
 abc123    Science   0
 jkl456    Math      0
 yth293    Art       1

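I want to combine the observations by ID and add columns that sum each ID's scores (total score), show how many attempts were made (including the ones answered incorrectly), and compute the percent correct, like this: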

 ID        Class     Total Score     Number Attempted      Percent
 abc123    Science   1               2                      50
 jkl456    Math      0               2                       0
 zpe789    English   1               1                     100
 yth293    Art       1               1                     100

Is there an R package or function that can collapse the rows across ID and the corresponding Class and produce these results? Thanks.

Answer 1

df <- read.table(textConnection("ID        Class     Score
 abc123    Science   1
 jkl456    Math      0
 zpe789    English   1
 abc123    Science   0
 jkl456    Math      0
 yth293    Art       1"), header = TRUE)

Then do:

library(dplyr)
df %>%
  group_by(ID) %>%
  # one row per ID: sum of scores, number of rows, and percent correct
  summarise(Total_Score = sum(Score),
            Number_Attempted = n(),
            Percent = (Total_Score / Number_Attempted) * 100)
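
The desired output in the question also has a Class column, which grouping by ID alone drops. A minimal sketch that keeps it is to group on both columns. This assumes each ID maps to a single Class, as in the sample data, and the `.groups = "drop"` argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Same sample data as in the question
df <- read.table(text = "ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1", header = TRUE)

# Grouping on both ID and Class keeps Class in the result.
# Assumption: each ID belongs to exactly one Class; otherwise an ID
# would get one row per Class it appears in.
result <- df %>%
  group_by(ID, Class) %>%
  summarise(Total_Score = sum(Score),
            Number_Attempted = n(),
            Percent = 100 * Total_Score / Number_Attempted,
            .groups = "drop")   # return an ungrouped result (dplyr >= 1.0.0)
result
```

If an ID could appear under several classes, this would report each ID/Class pair separately, which may or may not be what is wanted.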

Answer 2

Try:

library(dplyr)
df %>%
  group_by(ID) %>%
  summarize(TotalScore = sum(Score), 
            NumberAttempted=n(), 
            Percent = TotalScore/NumberAttempted*100)
#Source: local data frame [4 x 4]
#
#      ID TotalScore NumberAttempted Percent
#1 abc123          1               2      50
#2 jkl456          0               2       0
#3 yth293          1               1     100
#4 zpe789          1               1     100

To show how useful the dplyr package is, here is a comparable solution that uses no add-on packages:

newdf <- data.frame(TotalScore = with(df, tapply(Score, ID, FUN=sum)))
newdf$NumberAttempted <- with(df, tapply(Score, ID, FUN=length))
newdf$Percent <- 100*newdf$TotalScore/newdf$NumberAttempted
newdf
#       TotalScore NumberAttempted Percent
#abc123          1               2      50
#jkl456          0               2       0
#yth293          1               1     100
#zpe789          1               1     100
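
The tapply() version stores the IDs as row names and has no Class column. As a rough base R alternative that keeps both as ordinary columns, one could use aggregate() with a formula and merge the two summaries. This is only a sketch under the assumption that each ID has a single Class; the object names `total`, `attempts`, and `newdf2` are made up for illustration:

```r
# Same sample data as in the question
df <- read.table(text = "ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1", header = TRUE)

# Summarise Score within each ID/Class pair
total    <- aggregate(Score ~ ID + Class, data = df, FUN = sum)
attempts <- aggregate(Score ~ ID + Class, data = df, FUN = length)

# Give the summary columns distinct names before joining them
names(total)[3]    <- "TotalScore"
names(attempts)[3] <- "NumberAttempted"

# Join on the grouping columns rather than relying on row order
newdf2 <- merge(total, attempts, by = c("ID", "Class"))
newdf2$Percent <- 100 * newdf2$TotalScore / newdf2$NumberAttempted
newdf2
```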

As a closing note, variable names that contain spaces make further analysis more difficult.
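
To illustrate the point about spaces, here is a small sketch with a hypothetical data frame called `scores`. Column names with spaces have to be wrapped in backticks each time they are used, whereas syntactic names (with a dot or underscore) do not:

```r
# Hypothetical data frame whose column names contain spaces.
# check.names = FALSE stops data.frame() from rewriting the names.
scores <- data.frame("Total Score" = c(1, 0),
                     "Number Attempted" = c(2, 2),
                     check.names = FALSE)

# Every reference to such a column needs backticks
scores$Percent <- 100 * scores$`Total Score` / scores$`Number Attempted`

# make.names() shows the syntactic names R would otherwise have used
syntactic <- make.names(c("Total Score", "Number Attempted"))
syntactic
```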


packages

r