Update column values by looking up values from another table

Question

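So I want to be able to find the ScoreLU value based on the df table. For example, the value 1.3730682 in DSCRpd should return the ScoreLU value 60, because it is larger than 1.35 but less than the next value, 1.65.

On the other hand, the Leverage column needs to be looked up in descending order, i.e. the first value, 2.01, should return 60, because it is less than 2.5 but greater than the next value, 2.0.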

df
      DSCRpd Leverage         TCB
1  1.3730682 2.010122 -1590099.11
2  1.0449597 2.680051   493370.85
3  1.0311141 4.790531    21594.63
4  1.3923007 3.279903  -499326.76
5  1.6443938 3.853003   988780.79
6  0.6265976 1.814359  1003736.73
7  2.1025253 4.412528  1245305.83
8  1.2872873 2.074424  -688305.83
9  0.5088294 2.504510  1406986.68
10 1.7794307 3.724905  1132513.33


ScoreLU
      Score DSCRpd Leverage     TCB
 1:       0   0.65      5.0       0
 2:      10   0.80      4.5  100000
 3:      20   0.95      4.0  250000
 4:      30   1.10      3.5  500000
 5:      40   1.20      3.0  850000
 6:      50   1.26      2.5 1250000
 7:      60   1.35      2.0 1700000
 8:      70   1.65      1.5 2300000
 9:      80   2.00      1.0 2900000
10:      90   2.30      0.5 3600000

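So yes, it is just like the VLOOKUP function in Excel, with the ability to handle both ascending and descending sort order. Please help.

I have a function that gets the value correctly, but how do I apply it to each column so that the result goes into the right column, i.e. for DSCRpd the score should be written to a column named DSCRpdScore?

This function looks at the data frame 'df' using column number CN and returns the appropriate value based on x.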

myFUN = function(df, x, CN){
  if (dtScoreLU[1,CN] <= median(dtScoreLU[,CN])){
    # Lookup column ascends: take the largest threshold <= x
    myMax = max(dtScoreLU[(dtScoreLU[,CN] <= x),CN])
    return(dtScoreLU %>% select(Score) %>%
      filter(dtScoreLU[,CN] == myMax))
  } else {
    # Lookup column descends: take the smallest threshold >= x
    myMin = min(dtScoreLU[as.vector(dtScoreLU[,CN] >= x),CN])
    return(dtScoreLU %>% select(Score) %>%
      filter(dtScoreLU[,CN] == myMin))
  }
}

Answer 1

From what I understand, this looks like a good candidate for data.table's rolling join feature.

So I want to be able to find the ScoreLU value based on the df table. For example the value of 1.3730682 in DSCRpd should return the ScoreLU value of 60 because it is larger than 1.35 but less than the next value of 1.65.

library(data.table)
ScoreLU[, .(DSCRpd, Score)][df, ,on = 'DSCRpd', roll = TRUE]

       DSCRpd Score Leverage         TCB
 1: 1.3730682    60 2.010122 -1590099.11
 2: 1.0449597    20 2.680051   493370.85
 3: 1.0311141    20 4.790531    21594.63
 4: 1.3923007    60 3.279903  -499326.76
 5: 1.6443938    60 3.853003   988780.79
 6: 0.6265976    NA 1.814359  1003736.73
 7: 2.1025253    80 4.412528  1245305.83
 8: 1.2872873    50 2.074424  -688305.83
 9: 0.5088294    NA 2.504510  1406986.68
10: 1.7794307    70 3.724905  1132513.33

On the other hand for the Leverage column it needs to be in Desc order i.e. the first value of 2.01 should return the value of 60 as it is less than 2.5 but greater than the next value of 2.0.

ScoreLU[, .(Leverage, Score)][df, , on = 'Leverage', roll = TRUE]

    Leverage Score    DSCRpd         TCB
 1: 2.010122    60 1.3730682 -1590099.11
 2: 2.680051    50 1.0449597   493370.85
 3: 4.790531    10 1.0311141    21594.63
 4: 3.279903    40 1.3923007  -499326.76
 5: 3.853003    30 1.6443938   988780.79
 6: 1.814359    70 0.6265976  1003736.73
 7: 4.412528    20 2.1025253  1245305.83
 8: 2.074424    60 1.2872873  -688305.83
 9: 2.504510    50 0.5088294  1406986.68
10: 3.724905    30 1.7794307  1132513.33
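A note on direction, since the Leverage thresholds are listed in descending order: roll = TRUE matches each value to the nearest threshold at or below it, which gives the 60 for 2.01 that the question asks for. If the other bucket edge is wanted instead (the smallest threshold at or above the value, which is what the else branch of myFUN computes), roll = -Inf should do that. A minimal sketch comparing the two, using only the Leverage data from this post; the names score_below and score_above are mine:

```r
library(data.table)

# Lookup table and values copied from the Data section (Leverage only)
ScoreLU <- data.table(
  Score    = seq(0L, 90L, by = 10L),
  Leverage = c(5, 4.5, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5)
)
df <- data.table(
  Leverage = c(2.010122, 2.680051, 4.790531, 3.279903, 3.853003,
               1.814359, 4.412528, 2.074424, 2.50451, 3.724905)
)

# roll = TRUE: nearest threshold at or below each value (as in the answer)
score_below <- ScoreLU[df, on = "Leverage", roll = TRUE]$Score

# roll = -Inf: nearest threshold at or above each value
score_above <- ScoreLU[df, on = "Leverage", roll = -Inf]$Score

print(data.table(Leverage = df$Leverage, score_below, score_above))
```

For 2.010122 the first call gives 60 (threshold 2.0) and the second gives 50 (threshold 2.5), so pick whichever matches the scoring rule you actually intend.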

You can combine them if you wish:

ScoreLU[, .(Leverage, Score)][
  ScoreLU[, .(DSCRpd, Score)][
    df, ,on = 'DSCRpd', roll = TRUE
    ], , on = 'Leverage', roll = TRUE]

    Leverage Score    DSCRpd i.Score         TCB
 1: 2.010122    60 1.3730682      60 -1590099.11
 2: 2.680051    50 1.0449597      20   493370.85
 3: 4.790531    10 1.0311141      20    21594.63
 4: 3.279903    40 1.3923007      60  -499326.76
 5: 3.853003    30 1.6443938      60   988780.79
 6: 1.814359    70 0.6265976      NA  1003736.73
 7: 4.412528    20 2.1025253      80  1245305.83
 8: 2.074424    60 1.2872873      50  -688305.83
 9: 2.504510    50 0.5088294      NA  1406986.68
10: 3.724905    30 1.7794307      70  1132513.33
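To get what the question literally asks for (each score written into its own column such as DSCRpdScore), one option is to run the rolling join once per lookup column and assign the result back to df by reference. This is only a sketch under a few assumptions: the "<column>Score" naming is taken from the question, the loop variable and the lookup helper are my own names, and TCB is left out because its rule is not described in the post.

```r
library(data.table)

# Data copied from the Data section (TCB omitted from the lookup table)
ScoreLU <- data.table(
  Score    = seq(0L, 90L, by = 10L),
  DSCRpd   = c(0.65, 0.8, 0.95, 1.1, 1.2, 1.26, 1.35, 1.65, 2, 2.3),
  Leverage = c(5, 4.5, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5)
)
df <- data.table(
  DSCRpd   = c(1.3730682, 1.0449597, 1.0311141, 1.3923007, 1.6443938,
               0.6265976, 2.1025253, 1.2872873, 0.5088294, 1.7794307),
  Leverage = c(2.010122, 2.680051, 4.790531, 3.279903, 3.853003,
               1.814359, 4.412528, 2.074424, 2.50451, 3.724905),
  TCB      = c(-1590099.11, 493370.85, 21594.63, -499326.76, 988780.79,
               1003736.73, 1245305.83, -688305.83, 1406986.68, 1132513.33)
)

for (col in c("DSCRpd", "Leverage")) {
  # Two-column lookup: the threshold column and its Score
  lookup <- ScoreLU[, c(col, "Score"), with = FALSE]
  # The join returns one row per row of df, in df's order
  scores <- lookup[df, on = col, roll = TRUE]$Score
  # Add e.g. DSCRpdScore to df by reference
  set(df, j = paste0(col, "Score"), value = scores)
}

print(df)
```

Because the join result keeps the row order of df, the Score vector lines up with df row by row, and df keeps its original columns with the two score columns appended.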

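For values that fall beyond either end of the lookup range (the NA rows above), you can set the rollends argument as needed. If you have the time, I would also give ?data.table a read through. It helps when getting started, because the syntax can be a bit opaque at times.

I am new to data.table myself, so anyone with more expertise is welcome to chime in.

Data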
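As an illustration of rollends: with roll = TRUE, values below the smallest threshold come back as NA by default (rows 6 and 9 for DSCRpd). Passing rollends = c(TRUE, TRUE) should roll the first threshold backward as well, so those rows get the lowest score instead. A small sketch with the DSCRpd data from this post; whether 0 is the right score for such values is an assumption about the scoring rule, not something stated in the question:

```r
library(data.table)

# DSCRpd lookup and values copied from the Data section
ScoreLU <- data.table(
  Score  = seq(0L, 90L, by = 10L),
  DSCRpd = c(0.65, 0.8, 0.95, 1.1, 1.2, 1.26, 1.35, 1.65, 2, 2.3)
)
df <- data.table(
  DSCRpd = c(1.3730682, 1.0449597, 1.0311141, 1.3923007, 1.6443938,
             0.6265976, 2.1025253, 1.2872873, 0.5088294, 1.7794307)
)

# Default ends: values below 0.65 have nothing to roll forward, so NA
score_default <- ScoreLU[df, on = "DSCRpd", roll = TRUE]$Score

# Roll both ends: values below 0.65 take the score of the first threshold
score_ends <- ScoreLU[df, on = "DSCRpd", roll = TRUE,
                      rollends = c(TRUE, TRUE)]$Score

print(data.table(DSCRpd = df$DSCRpd, score_default, score_ends))
```

Only the rows that were NA change; every value inside the threshold range gets the same score either way.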

ScoreLU <- structure(list(Score = c(0L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L),
                          DSCRpd = c(0.65, 0.8, 0.95, 1.1, 1.2, 1.26, 1.35, 1.65, 2, 2.3),
                          Leverage = c(5, 4.5, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5),
                          TCB = c(0L, 100000L, 250000L, 500000L, 850000L, 1250000L, 1700000L, 2300000L, 2900000L, 3600000L)),
                     .Names = c("Score", "DSCRpd", "Leverage", "TCB"), row.names = c(NA, -10L), class = c("data.table", "data.frame"))

df <- structure(list(DSCRpd = c(1.3730682, 1.0449597, 1.0311141, 1.3923007, 1.6443938, 0.6265976, 2.1025253, 1.2872873, 0.5088294, 1.7794307),
                     Leverage = c(2.010122, 2.680051, 4.790531, 3.279903, 3.853003, 1.814359, 4.412528, 2.074424, 2.50451, 3.724905),
                     TCB = c(-1590099.11, 493370.85, 21594.63, -499326.76, 988780.79, 1003736.73, 1245305.83, -688305.83, 1406986.68, 1132513.33)),
                .Names = c("DSCRpd", "Leverage", "TCB"),
                row.names = c(NA, -10L), class = c("data.table", "data.frame" ))
