Recoding a numerical variable based on a specific criterion in R

Question

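I want to recode a numerical variable based on a set of cut scores. If a cut score does not appear in the variable, I want to recode the closest smaller value as that cut score. Here is a snapshot of the dataset: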

ids <- c(1,2,3,4,5,6,7,8,9,10)
scores <- c(512,531,541,555,562,565,570,572,573,588)
data <- data.frame(ids, scores)
> data
   ids scores
1    1    512
2    2    531
3    3    541
4    4    555
5    5    562
6    6    565
7    7    570
8    8    572
9    9    573
10  10    588

cuts <- c(531, 560, 575)

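The first cut score (531) is in the dataset, so it stays as 531. However, 560 and 575 are not present. For the second cut, I want to recode the closest smaller value (555) as 560 in a new column, and for the third cut, I want to recode 573 as 575.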

This is what I would like to get:

   ids scores rescored
1    1    512      512
2    2    531      531
3    3    541      541
4    4    555      560
5    5    562      562
6    6    565      565
7    7    570      570
8    8    572      572
9    9    573      575
10  10    588      588

Any ideas? Thanks.

Answer 1

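One option is to use findInterval to find, for each cut, the index of the closest score at or below it, then take the pmax of the 'scores' at those indices and the 'cuts', and update the elements of the 'rescored' column at those indices. Note that findInterval requires 'scores' to be sorted in increasing order, as it is here.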

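# For each cut, the index of the largest score <= cut (scores must be sorted)
i1 <- with(data, findInterval(cuts, scores))
# Start from a copy of the original scores
data$rescored <- data$scores
# At those positions, keep the larger of the score and the cut
data$rescored[i1] <- with(data, pmax(scores[i1], cuts))
data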
#   ids scores rescored
#1    1    512      512
#2    2    531      531
#3    3    541      541
#4    4    555      560
#5    5    562      562
#6    6    565      565
#7    7    570      570
#8    8    572      572
#9    9    573      575
#10  10    588      588
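
The approach above relies on 'scores' being sorted. If that cannot be guaranteed, the same rule can be written out explicitly. The following is only a minimal sketch, not part of the original answer: the helper name recode_to_cuts is hypothetical, and it assumes that tied scores should all be recoded and that a cut below every score should be ignored.

```r
# Hypothetical helper (an assumption, not from the original answer):
# recode the closest score at or below each cut to the cut value,
# without requiring the scores to be sorted.
recode_to_cuts <- function(scores, cuts) {
  rescored <- scores
  for (cut_value in cuts) {
    eligible <- which(scores <= cut_value)    # positions at or below this cut
    if (length(eligible) == 0) next           # no such score: leave as is
    target <- max(scores[eligible])           # closest score at or below the cut
    rescored[scores == target] <- cut_value   # assumption: recode all ties
  }
  rescored
}

# Same values as in the question, deliberately shuffled
scores <- c(588, 512, 573, 531, 555, 541, 562, 565, 570, 572)
cuts <- c(531, 560, 575)
recode_to_cuts(scores, cuts)
# [1] 588 512 575 531 560 541 562 565 570 572
```

If two cuts share the same closest score, the later cut in 'cuts' overwrites the earlier one in this sketch; whether that is the desired behaviour depends on the use case.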
