Splitting data based on ranges in R

Question

I would like to know how to split the scores for each subject into 4 different ranges/levels. Each level covers a fixed range. Here is the data.

Std   Name    Subject  Percentage
2     Vinay   eng      50
2     Vinay   math     60
2     Vinay   hindi    70
2     Rohan   eng      70
2     vas     mat      50
2     dheer   eng      35
2     dheer   math     90
2     dheer   hindi    80
2     Bhas    eng      90
2     Bhas    math     35
2     Bhas    hindi    50

The ranges for the four buckets are as follows: <=35, 35-50, 50-75, >75

Expected output:

Std Subject 0-35  35-50  50-75  >75
2    Eng     25%  25%    25%   25%
2    Mat     25%  25%    25%   25%
2    Hin     0%   25%    25%   25%

P.S. The value under each range is the percentage of students who scored within that range.

Thanks in advance.

Answer 1

This should work, though the output may need some more formatting:

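# Recreate the data from the question (the "mat" entry is entered as "math" here)
df <- read.table(header = TRUE, sep = ",", strip.white = TRUE,
                 text = "Std,Name,Subject,Percentage
2,Vinay,eng,50
2,Vinay,math,60
2,Vinay,hindi,70
2,Rohan,eng,70
2,vas,math,50
2,dheer,eng,35
2,dheer,math,90
2,dheer,hindi,80
2,Bhas,eng,90
2,Bhas,math,35
2,Bhas,hindi,50")

# Bucket edges: [0,35], (35,50], (50,75], (75,100]
breaks <- c(0, 35, 50, 75, 100)
# Count scores per subject and bucket; include.lowest = TRUE keeps a score of 0 in the first bucket
counts <- table(Subject = df$Subject,
                Range = cut(df$Percentage, breaks = breaks, include.lowest = TRUE))
# Row-wise proportions: the share of each subject's scores that falls in each bucket
format(prop.table(counts, margin = 1), digits = 3)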
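
As a possible follow-up to the "more formatting" remark, here is a minimal sketch that labels the buckets the way the question does and prints percentages instead of fractions. The names `scores`, `bucket_labels`, `shares` and `pct` are made up for this illustration, and "mat" is assumed to be a typo for "math".

```r
# Scores from the question, with "mat" treated as "math" (an assumption)
scores <- data.frame(
  Subject    = c("eng", "math", "hindi", "eng", "math", "eng",
                 "math", "hindi", "eng", "math", "hindi"),
  Percentage = c(50, 60, 70, 70, 50, 35, 90, 80, 90, 35, 50)
)

# Bucket labels as written in the question
bucket_labels <- c("0-35", "35-50", "50-75", ">75")
buckets <- cut(scores$Percentage, breaks = c(0, 35, 50, 75, 100),
               labels = bucket_labels, include.lowest = TRUE)

# Share of each subject's scores per bucket, in percent
counts <- table(Subject = scores$Subject, Range = buckets)
shares <- prop.table(counts, margin = 1) * 100

# Turn the numbers into strings such as "25%" while keeping the row/column names
pct <- matrix(sprintf("%.0f%%", as.vector(shares)),
              nrow = nrow(shares), dimnames = dimnames(shares))
print(pct, quote = FALSE)
```

With this data, eng and math come out at 25% in every bucket. hindi has only three scores, so it shows 0% in the first bucket and 33% in each of the other three.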

Answer 2

A possible data.table solution:

library(data.table)

# Data copied from the question as-is, so the "mat" entry forms its own Subject group
dat <- data.table(Std = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
                  Name = c("Vinay", "Vinay", "Vinay", "Rohan", "vas", "dheer", "dheer", "dheer", "Bhas", "Bhas", "Bhas"),
                  Subject = c("eng", "math", "hindi", "eng", "mat", "eng", "math", "hindi", "eng", "math", "hindi"),
                  Percentage = c(50L, 60L, 70L, 70L, 50L, 35L, 90L, 80L, 90L, 35L, 50L))

# Assign each score to a bucket: [0,35], (35,50], (50,75], (75,100]
dat[, PCTs := cut(Percentage,
                  breaks = c(0, 35, 50, 75, 100),
                  include.lowest = TRUE)]

# For each Std/Subject group, the percentage of rows (.N) that fall in each bucket
res <- dat[, list(
               "0-35" = sum(PCTs == "[0,35]") / .N * 100,
               "35-50" = sum(PCTs == "(35,50]") / .N * 100,
               "50-75" = sum(PCTs == "(50,75]") / .N * 100,
               ">75" = sum(PCTs == "(75,100]") / .N * 100
             ),
             by = c("Std", "Subject")]

print(res, digits = 2)
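
If the bucket edges might change later, a variant that does not hard-code the interval labels could look like the sketch below. It counts rows per bucket in long form and then reshapes with `dcast`. The column names `Range` and `pct` and the objects `long` and `wide` are made up for this illustration, and "mat" is again assumed to be a typo for "math".

```r
library(data.table)

# Data from the question, with "mat" treated as "math" (an assumption)
dat <- data.table(
  Std        = 2L,
  Subject    = c("eng", "math", "hindi", "eng", "math", "eng",
                 "math", "hindi", "eng", "math", "hindi"),
  Percentage = c(50, 60, 70, 70, 50, 35, 90, 80, 90, 35, 50)
)

# Label the buckets once, when they are created
dat[, Range := cut(Percentage, breaks = c(0, 35, 50, 75, 100),
                   labels = c("0-35", "35-50", "50-75", ">75"),
                   include.lowest = TRUE)]

# Count rows per Std/Subject/bucket, then convert counts to percentages per Std/Subject
long <- dat[, .N, by = .(Std, Subject, Range)]
long[, pct := 100 * N / sum(N), by = .(Std, Subject)]

# One column per bucket; buckets with no scores for a subject are filled with 0
wide <- dcast(long, Std + Subject ~ Range, value.var = "pct", fill = 0)
print(wide, digits = 2)
```

The trade-off is that a bucket nobody falls into, across all subjects, gets no column at all here, whereas the explicit `list(...)` version above always produces all four columns.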

r

sqldf