Alter a specific row within each group ID by the max value in that group

Question

I have a dataset in R with 1.5 million rows and 23 columns. It looks like this:

ID  Week col1 col2 col3 ...
A   1    2    3    1
A   2    3    4    1
...
A   69   15   2    11
B   1    5    1    2
B   2    6    10   3
...
B   69   2    1    1
Z   1    1    12   2
Z   2    4    5    3
...
Z   69   1    20   2

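I want to alter each ID, but only in the row where "Week" is 69, by one third of the maximum value within that ID group.

For example: take the maximum of col1 where ID = A, divide it by 3, and write the result back into the original dataset.

My current logic, which does not seem to work: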

index<-unique(data$ID)
dat<-filter(data, id== index[1])

b<-sapply(dat[,3:23],max)
b<-b/3
dat[69,4:23]<-dat[69,4:23]+b
data.alt<-dat
for (i in 2:19477)
{
dat<-filter(data, id== index[i])

b<-sapply(dat[,4:23],max)
b<-b/3
dat[69,4:23]<-dat[69,4:23]+b
data.alt<-rbind(data.alt,dat)
}

Answer 1

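We can use a data.table approach. From the original dataset, create a vector of the column names that contain "col" ('nm1'), then paste 'i.' in front of them to create a second vector ('nm2', used to refer to the joined values when assigning). Next, summarise the dataset by taking the max of those columns grouped by 'ID', with .SDcols specified as 'nm1', and add a 'Week' column set to '69'. Finally, join the two datasets on 'ID' and 'Week' and assign (:=) the 'nm2' values to the 'nm1' columns.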

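library(data.table)
# columns to update: every column whose name contains "col"
nm1 <- grep("col", names(df1), value = TRUE)
# the same columns as they appear on the joined table (prefix "i.")
nm2 <- paste0("i.", nm1)
# per-ID maximum of each column, tagged with Week 69 so it only matches that row
df2 <- setDT(df1)[, lapply(.SD, max), by = ID, .SDcols = nm1][, Week := factor(69)][]
# update join: overwrite the Week 69 rows of df1 with the group maxima
df1[df2, (nm1) := mget(nm2), on = .(ID, Week)]
df1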

Update

If we want to replace the 'nm1' columns with their max value divided by 3 in the rows where 'Week' is 69:

# convert to numeric first, since max/3 is generally not an integer
setDT(df1)[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]
df2 <- df1[, lapply(.SD, function(x) max(x)/3), by = ID, .SDcols = nm1][, Week := factor(69)][]
df1[df2, (nm1) := mget(nm2), on = .(ID, Week)]

Update 2

If we need to add to the original values instead, change the last line of code to:

df1[df2,  (nm1) := Map(`+`, mget(nm1), mget(nm2)), on = .(ID, Week)]
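
For comparison, here is a minimal base R sketch of the same "add one third of the group max to the Week 69 row" idea, without data.table. It is not part of the approach above; it swaps the join for ave(), and the names df, value_cols and is_target are made up for this illustration. It also assumes 'Week' is stored as a number rather than a factor.

```r
# Small sample shaped like the question's data (Week 69 is the last row per ID).
df <- data.frame(
  ID   = rep(c("A", "B", "Z"), each = 3),
  Week = rep(c(1, 2, 69), times = 3),
  col1 = c(2, 3, 15, 5, 6, 2, 1, 4, 1),
  col2 = c(3, 4, 2, 1, 10, 1, 12, 5, 20),
  col3 = c(1, 1, 11, 2, 3, 1, 2, 3, 2)
)

# Hypothetical helper names: the columns to change and the rows to touch.
value_cols <- grep("^col", names(df), value = TRUE)
is_target  <- df$Week == 69

for (cn in value_cols) {
  # ave() returns the per-ID maximum, repeated on every row of that ID
  group_max <- ave(df[[cn]], df$ID, FUN = max)
  # add one third of the group maximum, but only on the Week 69 rows
  df[[cn]][is_target] <- df[[cn]][is_target] + group_max[is_target] / 3
}

df
```

For ID A, col1 has a maximum of 15 and a Week 69 value of 15, so the updated value is 15 + 15/3 = 20. Dropping the `df[[cn]][is_target] +` term on the right-hand side would give the "replace" variant from the first update instead.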

Data

df1 <- structure(list(ID = c("A", "A", "A", "B", "B", "B", "Z", "Z", 
"Z"), Week = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", 
"2", "69"), class = "factor"), col1 = c(2L, 3L, 15L, 5L, 6L, 
2L, 1L, 4L, 1L), col2 = c(3L, 4L, 2L, 1L, 10L, 1L, 12L, 5L, 20L
), col3 = c(1L, 1L, 11L, 2L, 3L, 1L, 2L, 3L, 2L)), .Names = c("ID", 
"Week", "col1", "col2", "col3"), row.names = c(NA, -9L), 
class = "data.frame")
