Splitting data based on the ranges in R
I want to split the subjects into 4 different ranges (levels), where each level covers a certain score range. Here is the data:
Std Name Subject Percentage
2 Vinay eng 50
2 Vinay math 60
2 Vinay hindi 70
2 Rohan eng 70
2 vas math 50
2 dheer eng 35
2 dheer math 90
2 dheer hindi 80
2 Bhas eng 90
2 Bhas math 35
2 Bhas hindi 50
The four buckets are: <=35, 35-50, 50-75, >75 (each upper bound is inclusive, so a score of 50 falls in 35-50).
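A minimal sketch of how these buckets map onto R's `cut()`. It assumes scores never exceed 100 and that each upper bound is inclusive. The label strings are illustrative:

```r
# Bucket edges from the question; the top edge of 100 is an assumption.
breaks <- c(0, 35, 50, 75, 100)
labels <- c("<=35", "35-50", "50-75", ">75")
# right = TRUE (the default) makes each bucket (lower, upper], and
# include.lowest = TRUE closes the first bucket at 0, i.e. [0, 35].
scores <- c(35, 50, 60, 90)
buckets <- cut(scores, breaks = breaks, labels = labels,
               right = TRUE, include.lowest = TRUE)
print(data.frame(score = scores, bucket = buckets))
```

Setting `right = FALSE` would instead make each lower bound inclusive, putting 35 in 35-50 and 50 in 50-75.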
Expected output:
Std Subject 0-35 35-50 50-75 >75
2 Eng 25% 25% 25% 25%
2 Mat 25% 25% 25% 25%
2 Hin 0% 33.3% 33.3% 33.3%
P.S. Each range value is the percentage of students whose score falls in that range.
Thanks in advance.
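Each percentage is a bucket count divided by a total. This is a worked sketch. It assumes the denominator is the number of students who actually took the subject. Under that assumption, the Hindi row (three students, scoring 70, 80 and 50) works out as follows:

```r
# Hindi scores from the sample data; only three students took Hindi.
hindi <- c(70, 80, 50)
bucket <- cut(hindi, breaks = c(0, 35, 50, 75, 100),
              labels = c("0-35", "35-50", "50-75", ">75"),
              include.lowest = TRUE)
# table() on a factor keeps empty levels, so "0-35" is reported as 0.
pct <- 100 * table(bucket) / length(hindi)
round(pct, 1)
```

Each of the three upper buckets holds 1 of the 3 students, about 33.3%.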
This should work, though the result may need some more formatting:
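# strip.white = TRUE drops the spaces that follow each comma
df <- read.table(header = TRUE, sep = ",", strip.white = TRUE, text = "Std, Name, Subject, Percentage
2, Vinay,eng, 50
2, Vinay,math, 60
2, Vinay,hindi, 70
2, Rohan,eng, 70
2, vas,math, 50
2, dheer,eng, 35
2, dheer,math, 90
2, dheer,hindi, 80
2, Bhas,eng, 90
2, Bhas,math, 35
2, Bhas,hindi, 50")
# Right-closed buckets: [0,35], (35,50], (50,75], (75,100]
breaks <- c(0, 35, 50, 75, 100)
# Cross-tabulate subject against score bucket
tab <- table(Subject = df$Subject,
             Range = cut(df$Percentage, breaks = breaks, include.lowest = TRUE))
# Divide each row by its total and express it as a percentage
round(100 * prop.table(tab, margin = 1), 1)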
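The answer above returns a table of numbers. To match the expected layout (one row per subject, with values as strings like `25%`), here is a base R sketch. It assumes a single `Std` value (2, as in the sample) and re-creates only the columns it needs. The names `pct_str` and `out` are illustrative:

```r
# Sample data (Std and Name omitted; a single Std = 2 is assumed).
df <- data.frame(
  Subject = c("eng", "math", "hindi", "eng", "math", "eng",
              "math", "hindi", "eng", "math", "hindi"),
  Percentage = c(50, 60, 70, 70, 50, 35, 90, 80, 90, 35, 50)
)
breaks <- c(0, 35, 50, 75, 100)
labels <- c("0-35", "35-50", "50-75", ">75")
bucket <- cut(df$Percentage, breaks = breaks, labels = labels,
              include.lowest = TRUE)
# Row-wise shares in percent: each subject's row sums to 100.
pct <- 100 * prop.table(table(Subject = df$Subject, Range = bucket), margin = 1)
# paste0() drops dimensions, so rebuild the matrix in column-major order.
pct_str <- matrix(paste0(round(pct, 1), "%"), nrow = nrow(pct),
                  dimnames = dimnames(pct))
# check.names = FALSE keeps column names such as "0-35" and ">75" intact.
out <- data.frame(Std = 2L, Subject = rownames(pct_str), pct_str,
                  check.names = FALSE, row.names = NULL)
print(out)
```

Subjects come out in alphabetical order (eng, hindi, math) because `table()` sorts the character levels.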
A possible data.table solution:
library(data.table)
dat <- data.table(Std = rep(2L, 11),
                  Name = c("Vinay", "Vinay", "Vinay", "Rohan", "vas", "dheer", "dheer", "dheer", "Bhas", "Bhas", "Bhas"),
                  Subject = c("eng", "math", "hindi", "eng", "math", "eng", "math", "hindi", "eng", "math", "hindi"),
                  Percentage = c(50L, 60L, 70L, 70L, 50L, 35L, 90L, 80L, 90L, 35L, 50L))
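# Assign each score to a right-closed bucket: [0,35], (35,50], (50,75], (75,100]
dat[, PCTs := cut(Percentage,
                  breaks = c(0, 35, 50, 75, 100),
                  include.lowest = TRUE)]
# Within each Std/Subject group, the share (in %) of rows in each bucket;
# the strings must match the interval labels that cut() generates
res <- dat[, list(
  "0-35"  = sum(PCTs == "[0,35]")   / .N * 100,
  "35-50" = sum(PCTs == "(35,50]")  / .N * 100,
  "50-75" = sum(PCTs == "(50,75]")  / .N * 100,
  ">75"   = sum(PCTs == "(75,100]") / .N * 100
), by = c("Std", "Subject")]
print(res, digits = 2)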
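The `j` expression above depends on the exact interval labels that `cut()` generates (`"[0,35]"`, `"(35,50]"`, ...), so changing `breaks` means editing four strings. As an alternative, here is a base R sketch with the same bucket edges and explicit `labels`. Here `xtabs()` builds all Std × Subject × Range counts at once, and the shares are normalised within each Std/Subject pair. This is not part of the answer above, only one possible variant:

```r
dat <- data.frame(
  Std = 2L,
  Subject = c("eng", "math", "hindi", "eng", "math", "eng",
              "math", "hindi", "eng", "math", "hindi"),
  Percentage = c(50, 60, 70, 70, 50, 35, 90, 80, 90, 35, 50)
)
dat$Range <- cut(dat$Percentage, breaks = c(0, 35, 50, 75, 100),
                 labels = c("0-35", "35-50", "50-75", ">75"),
                 include.lowest = TRUE)
# Three-way table of counts: Std x Subject x Range.
counts <- xtabs(~ Std + Subject + Range, data = dat)
# Shares within each Std/Subject pair. A subject with no students in some
# Std would give 0/0 = NaN in that cell.
pct <- 100 * prop.table(counts, margin = c(1, 2))
# Flatten to one row per Std/Subject and one column per Range.
ftable(round(pct, 1), row.vars = c("Std", "Subject"))
```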