Combine rows/observations by column value in R
I have a data frame with three columns:
ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1
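I would like to combine the observations by ID and add columns that sum each student's scores (total score), show how many times they attempted (even when they got it wrong), and compute the percent correct, like this: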
ID      Class    Total Score  Number Attempted  Percent
abc123  Science  1            2                 50
jkl456  Math     0            2                 0
zpe789  English  1            1                 100
yth293  Art      1            1                 100
Is there an R package or function that can collapse the rows across ID and the corresponding Class and produce these results? Thanks.
df <- read.table(textConnection("ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1"), header = TRUE)
Then do:
library(dplyr)
df %>%
  group_by(ID) %>%
  summarise(Total_Score = sum(Score),                          # sum of correct answers per ID
            Number_Attempted = n(),                            # number of rows (attempts) per ID
            Percent = (Total_Score / Number_Attempted) * 100)  # percent correct
Try:
library(dplyr)
df %>%
  group_by(ID) %>%
  summarize(TotalScore = sum(Score),
            NumberAttempted = n(),
            Percent = TotalScore / NumberAttempted * 100)
#Source: local data frame [4 x 4]
#
# ID TotalScore NumberAttempted Percent
#1 abc123 1 2 50
#2 jkl456 0 2 0
#3 yth293 1 1 100
#4 zpe789 1 1 100
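The output above drops the Class column that the question asked to keep. A minimal sketch of one way to retain it, assuming each ID maps to a single Class as in the sample data, is to group by both columns. The name `by_id_class` is only illustrative, and the `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1", header = TRUE)

# Grouping by ID and Class keeps Class in the summarised result
by_id_class <- df %>%
  group_by(ID, Class) %>%
  summarise(TotalScore = sum(Score),        # correct answers per group
            NumberAttempted = n(),          # attempts per group
            Percent = 100 * TotalScore / NumberAttempted,
            .groups = "drop")               # return an ungrouped tibble

by_id_class
```

If an ID could appear under more than one Class, this would produce one row per ID/Class pair rather than one row per ID.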
To put the convenience of the dplyr package in perspective, here is an equivalent solution that uses only base R, with no extra packages.
newdf <- data.frame(TotalScore = with(df, tapply(Score, ID, FUN=sum)))
newdf$NumberAttempted <- with(df, tapply(Score, ID, FUN=length))
newdf$Percent <- 100*newdf$TotalScore/newdf$NumberAttempted
newdf
# TotalScore NumberAttempted Percent
#abc123 1 2 50
#jkl456 0 2 0
#yth293 1 1 100
#zpe789 1 1 100
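The `tapply` result above keeps the IDs only as row names and has no Class column. As a hedged alternative, still in base R, `aggregate` with a formula can group on both ID and Class and return them as ordinary columns. The name `res` is illustrative; the sketch relies on both `aggregate` calls using the same formula and data, so their rows come back in the same order.

```r
# Sample data from the question
df <- read.table(text = "ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1", header = TRUE)

# Total score per ID/Class combination
res <- aggregate(Score ~ ID + Class, data = df, FUN = sum)
names(res)[names(res) == "Score"] <- "TotalScore"

# Number of attempts per ID/Class combination (same grouping, same row order)
res$NumberAttempted <- aggregate(Score ~ ID + Class, data = df, FUN = length)$Score

# Percent correct
res$Percent <- 100 * res$TotalScore / res$NumberAttempted

res
```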
As a closing note, variable names that contain spaces will make further analysis more difficult.
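To illustrate the point about spaces, here is a small sketch with made-up values. A column called `Total Score` has to be wrapped in backticks every time it is referenced, whereas a syntactic name does not. The data frames `spaced` and `clean` are hypothetical and exist only for this illustration.

```r
# check.names = FALSE is needed to keep the space in the column name at all
spaced <- data.frame(`Total Score` = c(1, 0), check.names = FALSE)

# Every reference to the column needs backticks
spaced$`Total Score`

# make.names() converts the names to syntactic ones ("Total Score" becomes "Total.Score")
clean <- spaced
names(clean) <- make.names(names(clean))
clean$Total.Score
```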