R data.table - Is there a more efficient way of melting multiple ways and merging results?

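I'm trying to melt a data.table in two different ways, and then I have to merge the results. This is awkward because the two melts use different measure.vars, and the approach won't scale to a wider table or to more complex column names.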

I start with this data.table:

    id p1 p2     p1_pos     p2_pos
 1:  1  A  F 0.70644404 0.75523969
 2:  2  B  G 0.96798381 0.26280453
 3:  3  C  H 0.35558517 0.45418777
 4:  4  D  I 0.14662296 0.01969177
 5:  5  E  J 0.45155647 0.41373110
 6:  6  A  F 0.81074292 0.19421395
 7:  7  B  G 0.49014540 0.02094569
 8:  8  C  H 0.01445689 0.20199638
 9:  9  D  I 0.80327645 0.73982715
10: 10  E  J 0.17625955 0.88250913

Next I melt twice and then merge:

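library(data.table)

# runif() without set.seed(): the values will differ from the tables shown here
dat = data.table(id = as.character(1:10),
                 p1 = rep(c("A", "B", "C", "D", "E"), 2),
                 p2 = rep(c("F", "G", "H", "I", "J"), 2),
                 p1_pos = runif(10),
                 p2_pos = runif(10))

# first melt: the name columns p1, p2
first_melt = melt(dat, id.vars = "id",
                  measure.vars = c("p1", "p2"),
                  variable.name = "loc",
                  value.name = "name",
                  value.factor = FALSE,
                  variable.factor = FALSE)

# second melt: the position columns p1_pos, p2_pos
second_melt = melt(dat,
                   id.vars = "id",
                   measure.vars = c("p1_pos", "p2_pos"),
                   variable.name = "loc",
                   value.name = "pos",
                   value.factor = FALSE,
                   variable.factor = FALSE)

# "p1_pos" -> "p1" so loc matches first_melt (assumes two-character prefixes)
second_melt[, loc := substr(loc, 1, 2)]
result = merge(first_melt, second_melt, by = c("id", "loc"))
result[order(id)]  # id is character, so it sorts "1", "10", "2", ...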

The awkwardness comes from the different measure.vars and the merge they then require.

This produces the expected result:

    id loc name        pos
 1:  1  p1    A 0.70644404
 2:  1  p2    F 0.75523969
 3: 10  p1    E 0.17625955
 4: 10  p2    J 0.88250913
 5:  2  p1    B 0.96798381
 6:  2  p2    G 0.26280453
 7:  3  p1    C 0.35558517
 8:  3  p2    H 0.45418777
 9:  4  p1    D 0.14662296
10:  4  p2    I 0.01969177
11:  5  p1    E 0.45155647
12:  5  p2    J 0.41373110
13:  6  p1    A 0.81074292
14:  6  p2    F 0.19421395
15:  7  p1    B 0.49014540
16:  7  p2    G 0.02094569
17:  8  p1    C 0.01445689
18:  8  p2    H 0.20199638
19:  9  p1    D 0.80327645
20:  9  p2    I 0.73982715

My question is whether there is a more efficient way to do this (i.e., with a single melt call), or whether this is as close as I can get.

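Here is a simple way to merge two data frames on one key column. by.x = 1 and by.y = 1 select the first column of each data frame by position:

B = merge(df, otherdataname, by.x = 1, by.y = 1)  # df, otherdataname: your two data frames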
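To show what the positional by.x = 1, by.y = 1 form does, here is a small sketch with two made-up data frames (df and other are hypothetical names, not from the question). Because the key is picked by position, the two key columns can have different names; the result keeps the name from the first data frame, and the default inner join keeps only the matching keys.

```r
# Hypothetical toy data frames: the key is the first column of each,
# but it is called "key" in one and "id" in the other.
df    <- data.frame(key = c(1, 2, 3), a = c("x", "y", "z"))
other <- data.frame(id = c(2, 3, 4), b = c(TRUE, FALSE, TRUE))

# Join on the first column of each (by position); inner join by default,
# so only keys 2 and 3 appear in the result.
B <- merge(df, other, by.x = 1, by.y = 1)
print(B)
```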

How about this?

data.table::melt(
  dat, id.vars = "id", measure.vars = patterns("p[0-9]+$", "p[0-9]+_pos"), 
  variable.name = "loc", value.name = c("name", "pos")
)[, loc := paste0("p", loc)][]
#        id    loc   name        pos
#     <int> <char> <char>      <num>
#  1:     1     p1      A 0.70644404
#  2:     2     p1      B 0.96798381
#  3:     3     p1      C 0.35558517
#  4:     4     p1      D 0.14662296
#  5:     5     p1      E 0.45155647
#  6:     6     p1      A 0.81074292
#  7:     7     p1      B 0.49014540
#  8:     8     p1      C 0.01445689
#  9:     9     p1      D 0.80327645
# 10:    10     p1      E 0.17625955
# 11:     1     p2      F 0.75523969
# 12:     2     p2      G 0.26280453
# 13:     3     p2      H 0.45418777
# 14:     4     p2      I 0.01969177
# 15:     5     p2      J 0.41373110
# 16:     6     p2      F 0.19421395
# 17:     7     p2      G 0.02094569
# 18:     8     p2      H 0.20199638
# 19:     9     p2      I 0.73982715
# 20:    10     p2      J 0.88250913
#        id    loc   name        pos
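As a check that the single-melt version addresses the scaling concern in the question, here is a sketch on a hypothetical wider table with twelve column pairs (p1..p12 and p1_pos..p12_pos, generated in that order; the table and its values are made up). The regexes use [0-9]+ so that two-digit names match. One caveat: paste0("p", loc) rebuilds the label from the match index, so the labels are only right when the columns are named p1, p2, ... and appear in that order.

```r
library(data.table)

# Hypothetical wide table: ids 1-3 and column pairs p1..p12 / p1_pos..p12_pos,
# created in numeric order (p1, p1_pos, p2, p2_pos, ..., p12, p12_pos).
n_p  <- 12
ids  <- 1:3
cols <- list(id = ids)
for (k in seq_len(n_p)) {
  cols[[paste0("p", k)]]         <- paste0(LETTERS[k], ids)  # e.g. "A1", "A2", "A3"
  cols[[paste0("p", k, "_pos")]] <- k + ids / 10             # e.g. 1.1, 1.2, 1.3
}
wide <- as.data.table(cols)

# One melt: the k-th match of the first pattern is paired with the
# k-th match of the second pattern.
long <- melt(
  wide, id.vars = "id",
  measure.vars = patterns("^p[0-9]+$", "^p[0-9]+_pos$"),
  variable.name = "loc", value.name = c("name", "pos")
)[, loc := paste0("p", loc)][]

print(long)
```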

Data

library(data.table)
dat <- setDT(structure(list(id = 1:10, p1 = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E"), p2 = c("F", "G", "H", "I", "J", "F", "G", "H", "I", "J"), p1_pos = c(0.70644404, 0.96798381, 0.35558517, 0.14662296, 0.45155647, 0.81074292, 0.4901454, 0.01445689, 0.80327645, 0.17625955), p2_pos = c(0.75523969, 0.26280453, 0.45418777, 0.01969177, 0.4137311, 0.19421395, 0.02094569, 0.20199638, 0.73982715, 0.88250913)), row.names = c(NA, -10L), class = c("data.table", "data.frame")))

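Here is a data.table approach that differs slightly from r2evans's answer. Because the loc labels are taken from the actual column names rather than rebuilt from a match index, it works however the p columns are numbered, and it can easily be adapted to different (or non-numeric) names by changing the regular expressions.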

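# get the different p columns (p1, p2, ...) in column order;
# [0-9]+ so that two-digit names such as p10 also match
p.v <- grep("^p[0-9]+$", names(dat), value = TRUE)
# now melt; the pattern names become the value columns "name" and "pos"
dat.melt <- melt(dat, 
                 id.vars = "id", 
                 measure.vars = patterns(name = "^p[0-9]+$", pos = "^p[0-9]+_pos$"),
                 variable.name = "loc")
# replace the loc levels ("1", "2", ...) with the real column names, by reference
setattr(dat.melt$loc, "levels", p.v)
dat.melt[]
# (the values below come from a different random draw than the question's data)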
#    id loc name        pos
# 1:  1  p1    A 0.59099882
# 2:  2  p1    B 0.79727305
# 3:  3  p1    C 0.04180905
# 4:  4  p1    D 0.53533198
# 5:  5  p1    E 0.75851590
# 6:  6  p1    A 0.47344565
# 7:  7  p1    B 0.47035125
# 8:  8  p1    C 0.88675906
# 9:  9  p1    D 0.18159266
#10: 10  p1    E 0.97808083
#11:  1  p2    F 0.84751133
#12:  2  p2    G 0.13917469
#13:  3  p2    H 0.57787425
#14:  4  p2    I 0.20052178
#15:  5  p2    J 0.49654451
#16:  6  p2    F 0.62705394
#17:  7  p2    G 0.04015590
#18:  8  p2    H 0.29792342
#19:  9  p2    I 0.72457705
#20: 10  p2    J 0.05694427
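To illustrate the point about numbering, here is a sketch on a hypothetical table whose p columns have gaps (p1, p3, p10; the table is made up). Relabelling the factor levels with the real column names gives p1, p3, p10, whereas paste0("p", loc) would give p1, p2, p3. This sketch passes value.name explicitly instead of naming the patterns.

```r
library(data.table)

# Hypothetical table with gaps in the numbering: p1, p3, p10.
dat2 <- data.table(
  id     = 1:2,
  p1     = c("A", "B"), p3     = c("C", "D"), p10     = c("E", "F"),
  p1_pos = c(0.1, 0.2), p3_pos = c(0.3, 0.4), p10_pos = c(0.5, 0.6)
)

# Real column names in column order: "p1" "p3" "p10"
p.v <- grep("^p[0-9]+$", names(dat2), value = TRUE)

dat2.melt <- melt(dat2,
                  id.vars = "id",
                  measure.vars = patterns("^p[0-9]+$", "^p[0-9]+_pos$"),
                  variable.name = "loc",
                  value.name = c("name", "pos"))

# loc is a factor with levels "1", "2", "3"; swap in the real names by reference.
setattr(dat2.melt$loc, "levels", p.v)
print(dat2.melt)
```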