How to simply select a specific number of rows per subject from a matrix in R

Here is my data:

num<- c(6,8,7,5,9,6,7)
x<- matrix(c(rep(1:7,num),rnorm(sum(num))), nrow=sum(num), ncol=2)
colnames(x)<-c("subject","value")

x
subject value
[1,] 1 0.35182560
[2,] 1 0.35933614
[3,] 1 -0.89029320
[4,] 1 -0.79991981
[5,] 1 1.10773640
[6,] 1 -1.73900484
[7,] 2 1.06632139
[8,] 2 0.71727759
[9,] 2 0.51002247
[10,] 2 1.36132224
[11,] 2 -0.85432175
[12,] 2 -0.49878742
[13,] 2 1.43705322
[14,] 2 0.34052593
[15,] 3 -0.43245360
[16,] 3 1.01687525
[17,] 3 0.48998138
[18,] 3 -1.06197379
[19,] 3 -0.19777785
[20,] 3 1.24940714
[21,] 3 0.47521229
[22,] 4 -0.99888249
[23,] 4 -0.12678874
[24,] 4 -1.14620801
[25,] 4 -1.29165060
[26,] 4 1.56110270
[27,] 5 0.82543156
[28,] 5 -0.61718617
[29,] 5 0.22357131
[30,] 5 0.59639380
[31,] 5 2.72122980
[32,] 5 0.58674354
[33,] 5 0.23674196
[34,] 5 0.78656422
[35,] 5 0.10426860
[36,] 6 0.93059568
[37,] 6 0.16065327
[38,] 6 -2.23496916
[39,] 6 -1.75680495
[40,] 6 0.49717967
[41,] 6 1.13033910
[42,] 7 0.71402667
[43,] 7 -0.06120018
[44,] 7 -0.67636605
[45,] 7 0.46402913
[46,] 7 -0.99090058
[47,] 7 1.58853435
[48,] 7 -1.15982415

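My task is to select a specific number of rows for each subject and put them into a new matrix.
The number of rows to keep for each subject is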

b<- ceiling(num*0.5)

which gives

b
[1] 3 4 4 3 5 3 4

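That is, I need to extract
the first 3 rows of subject 1,
the first 4 rows of subject 2,
the first 4 rows of subject 3,
...
the first 4 rows of subject 7,
and combine them into a new matrix.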

Here is my own code:

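b <- ceiling(num * 0.5)   # number of rows to keep per subject
# lapply (not sapply) so the result is always a list of matrices for rbind;
# drop = FALSE keeps a one-row subset as a matrix
newx <- do.call(rbind, lapply(1:7, function(i) head(x[x[, 1] == i, , drop = FALSE], b[i])))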

This works, but it is slow. Is there a simpler way to solve this?

newx
subject value
[1,] 1 0.35182560
[2,] 1 0.35933614
[3,] 1 -0.89029320
[4,] 2 1.06632139
[5,] 2 0.71727759
[6,] 2 0.51002247
[7,] 2 1.36132224
[8,] 3 -0.43245360
[9,] 3 1.01687525
[10,] 3 0.48998138
[11,] 3 -1.06197379
[12,] 4 -0.99888249
[13,] 4 -0.12678874
[14,] 4 -1.14620801
[15,] 5 0.82543156
[16,] 5 -0.61718617
[17,] 5 0.22357131
[18,] 5 0.59639380
[19,] 5 2.72122980
[20,] 6 0.93059568
[21,] 6 0.16065327
[22,] 6 -2.23496916
[23,] 7 0.71402667
[24,] 7 -0.06120018
[25,] 7 -0.67636605
[26,] 7 0.46402913
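A further base R option (not from the original answers) avoids the per-subject loop entirely: compute each row's position within its subject with `ave()`, then keep the rows whose position does not exceed that subject's quota. This is a minimal sketch on a small hypothetical matrix; it assumes subject ids are the integers 1..length(b), so that `b[subject]` looks up the quota, but it does not require the rows to be sorted by subject.

```r
# Hypothetical toy data: subject ids in column 1, values in column 2
x <- cbind(subject = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
           value   = c(10, 11, 12, 20, 21, 30, 31, 32, 33))
num <- as.vector(table(x[, "subject"]))   # rows per subject: 3 2 4
b <- ceiling(num * 0.5)                    # rows to keep:     2 1 2

# Position of each row within its subject (1, 2, 3, ... restarting per subject)
pos <- ave(x[, "subject"], x[, "subject"], FUN = seq_along)

# Keep a row if its position is within its subject's quota.
# Assumes subject ids are 1..length(b), so b[subject] is that subject's quota.
newx <- x[pos <= b[x[, "subject"]], , drop = FALSE]
newx
```

The result keeps rows in their original order, which here is also grouped by subject.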

If you want to keep 'half' of the rows for each subject, here is one way using the dplyr package:

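library(dplyr)
num <- c(6, 8, 7, 5, 9, 6, 7)
df <- as.data.frame(matrix(c(rep(1:7, num), rnorm(sum(num))), nrow = sum(num), ncol = 2))
colnames(df) <- c("subject", "value")   # otherwise the columns are V1, V2 and group_by(subject) fails
# ceiling() keeps 4 of 7 rows, matching b; 1:(n()/2) would keep only 3
df %>% group_by(subject) %>% slice(seq_len(ceiling(n() / 2)))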
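Note that keeping rows with `slice(1:(n()/2))` rounds *down* for odd group sizes, because the `:` operator stops at the last integer not exceeding `n/2`, whereas the question's `b` rounds *up*. The sketch below (plain base R, using the question's `num`) compares the two row counts per subject without needing dplyr.

```r
num <- c(6, 8, 7, 5, 9, 6, 7)

# Rows kept by 1:(n/2): `:` stops at the last integer <= n/2, i.e. floor(n/2)
kept_floor <- sapply(num, function(n) length(1:(n / 2)))

# Rows the question asks for
kept_ceiling <- ceiling(num * 0.5)

rbind(num, kept_floor, kept_ceiling)
```

The two differ exactly for the odd group sizes (7, 5, 9, 7), so `1:(n()/2)` returns 22 rows instead of the 26 in `newx`.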

In base R (assuming x is ordered by the first column):

x[rep(match(unique(x[,1]),x[,1]),b)+sequence(b)-1,]
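To see how the `match`/`rep`/`sequence` one-liner builds its row indices, here is a step-by-step sketch on a small hypothetical matrix that is sorted by subject (the one-liner relies on that ordering, since it assumes each subject's rows are contiguous).

```r
# Hypothetical toy data, sorted by subject
x <- cbind(subject = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
           value   = c(10, 11, 12, 20, 21, 30, 31, 32, 33))
b <- c(2, 1, 2)   # rows to keep per subject

starts <- match(unique(x[, 1]), x[, 1])   # first row of each subject: 1 4 6
first  <- rep(starts, b)                   # each start repeated b times: 1 1 4 6 6
offset <- sequence(b) - 1                  # offsets within each run: 0 1 0 0 1
idx    <- first + offset                   # row indices to keep: 1 2 4 6 7
x[idx, ]
```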

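We split the sequence of row numbers of 'x' by the 'subject' column to create a list, use Map to take the head of each list element with n set to the corresponding element of 'b', unlist the result, and use it to subset the rows of 'x'.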

x[unlist(Map(head, split(seq_len(nrow(x)), x[,1]), b)),]
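Unlike the `match`/`sequence` version, the `split`/`Map` version does not need `x` to be sorted: `split()` collects each subject's row numbers wherever they occur. It does assume that `b` is ordered like the sorted subject ids, because `split()` returns its groups in sorted factor-level order and `Map()` pairs them with `b` by position. A minimal sketch on a deliberately unsorted, hypothetical matrix:

```r
# Hypothetical toy data, deliberately NOT sorted by subject
x <- cbind(subject = c(2, 1, 3, 1, 2, 3, 1, 3),
           value   = c(20, 10, 30, 11, 21, 31, 12, 32))
b <- c(2, 1, 2)   # quotas for subjects 1, 2, 3 (matched by position)

# split() groups row numbers by subject, ordered by sorted subject id:
# subject 1 -> 2 4 7, subject 2 -> 1 5, subject 3 -> 3 6 8
groups <- split(seq_len(nrow(x)), x[, 1])

# Map() calls head(groups[[i]], b[i]) for each i; unlist() flattens to 2 4 1 3 6
idx <- unlist(Map(head, groups, b))
x[idx, ]
```

The selected rows come out grouped by subject rather than in their original order.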

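Another option is to use data.table. We convert 'x' to a 'data.table', create a second data.table from 'b' with a matching 'subject' column, set 'subject' as the key column of both, join the two on that key, and take the head of .SD for each joined group (by = .EACHI):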

library(data.table)
d1 <- as.data.table(x)
d2 <- data.table(subject=seq_along(b), b)
setkey(d1, subject)
setkey(d2, subject)
d1[d2, head(.SD,b) , by = .EACHI]