Replace each column within a dataframe with matches from a lookup table
I have a data.frame named table_1, structured as follows:
p_id rd1 rd2 rd3
<fctr><fctr><fctr><fctr>
1 1 5 4 6
2 2 3 1 1
3 3 6 6 5
4 4 1 5 2
5 5 4 1 4
I also have another data.frame with p_id and p_scr, set up as follows:
p_id p_scr
<fctr><fctr>
1 1 11
2 2 22
3 3 33
4 4 44
5 5 55
6 6 66
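My goal: for each column in table_1, I want to replace all entries in rd1, rd2, and rd3 with the corresponding p_scr values from the lookup table, so that the result looks like this: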
p_id rd1 rd2 rd3
<fctr><fctr><fctr><fctr>
1 1 55 44 66
2 2 33 11 11
3 3 66 66 55
4 4 11 55 22
5 5 44 11 44
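I suspect this would use mapply or lapply together with match, but I haven't found a good example of this. I'm also familiar with mutate, which I suspect could be used here as well. Open to any suggestions. Note: this is a simplified version of my actual data.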
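Note: I have corrected this code to match your data structure, in which every column is a factor. Use the rd values in t to index your reference lookup table by setting the row names of the ref table to the p_ids.
I use different values for the p_ids to highlight that the indexing is by p_id row name rather than by position.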
# t is your df; ref is your lookup table
t <- data.frame(p_id=factor(c(10,20,30,40,50)),
rd1=factor(c(5,3,6,1,4)*10),
rd2=factor(c(4,1,6,5,1)*10),
rd3=factor(c(6,1,5,2,4)*10))
ref <- data.frame(p_id=factor(c(10,20,30,40,50,60)),
p_scr=factor(c(11,22,33,44,55,66)))
t
# p_id rd1 rd2 rd3
# 1 10 50 40 60
# 2 20 30 10 10
# 3 30 60 60 50
# 4 40 10 50 20
# 5 50 40 10 40
ref
# p_id p_scr
# 1 10 11
# 2 20 22
# 3 30 33
# 4 40 44
# 5 50 55
# 6 60 66
# assuming p_id is unique, set rownames of ref lookup table to p_id to allow for indexing by p_id
rownames(ref) <- ref$p_id
rownames(ref) # character values, not numeric
# [1] "10" "20" "30" "40" "50" "60"
# ref lookup table now looks like this
ref
# p_id p_scr
# 10 10 11
# 20 20 22
# 30 30 33
# 40 40 44
# 50 50 55
# 60 60 66
# single case, ref rownames are character vectors, we want to index with corresponding character vector from t
as.character(t$rd1)
# [1] "50" "30" "60" "10" "40"
ref[as.character(t$rd1),]$p_scr # use character values of rd1 to index, matching the character values of rownames
# [1] 55 33 66 11 44
# Levels: 11 22 33 44 55 66
# apply to each rd column, returns the character values of p_scr factor
apply(t[,2:ncol(t)], 2, function(x) ref[as.character(x),]$p_scr)
# converts to numeric the character values of p_scr factor
apply(t[,2:ncol(t)], 2, function(x) as.numeric(as.character(ref[as.character(x),]$p_scr)))
# NOTE: the previous answer I gave does not work, why?
ref[t$rd1,]$p_scr # gives incorrect order
# [1] 44 22 55 11 33
# Levels: 11 22 33 44 55 66
# NOTE structure of t
str(t)
# 'data.frame': 5 obs. of 4 variables:
# $ p_id: Factor w/ 5 levels "10","20","30",..: 1 2 3 4 5
# $ rd1 : Factor w/ 5 levels "10","30","40",..: 4 2 5 1 3
# $ rd2 : Factor w/ 4 levels "10","40","50",..: 2 1 4 3 1
# $ rd3 : Factor w/ 5 levels "10","20","40",..: 5 1 4 2 3
# Do you see the character vs integer values of the factor t$rd1?
t$rd1
# [1] 50 30 60 10 40
# Levels: 10 30 40 50 60
# The levels of t$rd1 are "10", "30", "40", "50", "60", so the values 50 30 60 10 40 have integer codes 4 2 5 1 3 (their positions in the levels)
# With ref[t$rd1, ] you index ref by the integer codes of t$rd1, i.e. by position: ref[c(4, 2, 5, 1, 3), ], so p_scr comes out as c(44, 22, 55, 11, 33)
# With ref[as.character(t$rd1), ] you index ref by the character labels of t$rd1, i.e. by row name: ref[c("50", "30", "60", "10", "40"), ], so p_scr comes out as c(55, 33, 66, 11, 44)
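Since the question mentions lapply and match, here is a minimal sketch of the same lookup written that way, which keeps the result as a data.frame instead of the matrix that apply returns. The names scores, lookup, and rd_cols are hypothetical and not from the code above; the data simply mirror the example.

```r
# Hypothetical names (scores, lookup, rd_cols); data mirror the example above.
scores <- data.frame(p_id = factor(c(10, 20, 30, 40, 50)),
                     rd1  = factor(c(5, 3, 6, 1, 4) * 10),
                     rd2  = factor(c(4, 1, 6, 5, 1) * 10),
                     rd3  = factor(c(6, 1, 5, 2, 4) * 10))
lookup <- data.frame(p_id  = factor(c(10, 20, 30, 40, 50, 60)),
                     p_scr = factor(c(11, 22, 33, 44, 55, 66)))

rd_cols <- c("rd1", "rd2", "rd3")

# For each rd column, match() compares character labels (not integer codes)
# and returns the row of lookup whose p_id equals each entry.
# Entries with no matching p_id would come back as NA.
scores[rd_cols] <- lapply(scores[rd_cols], function(x) {
  lookup$p_scr[match(as.character(x), as.character(lookup$p_id))]
})

scores
```

Comparing as.character() values on both sides is what makes this safe for factors: it avoids the positional indexing problem described above without needing to set row names on the lookup table.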
Note: if your data are factors, be careful when indexing, and always check the structure and the integer values. Observe:
n <- 1:5 # numeric
n
f <- factor(n, levels=5:1) # factor
f
levels(f)
# consequence when used to index
letters[n]
# [1] "a" "b" "c" "d" "e"
letters[f]
# [1] "e" "d" "c" "b" "a"
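The same distinction matters when converting a factor to numbers, as in the as.numeric(as.character(...)) call used earlier. A small sketch, using a hypothetical factor rd built from the rd1 values:

```r
# Hypothetical factor with the same labels as t$rd1 in the example.
rd <- factor(c("50", "30", "60", "10", "40"))

codes      <- as.integer(rd)                # integer codes: positions within levels(rd)
labels_num <- as.numeric(as.character(rd))  # the labels themselves, converted to numbers
via_levels <- as.numeric(levels(rd))[rd]    # same values, converting each level only once

codes
labels_num
via_levels
```

Calling as.numeric() directly on a factor gives the integer codes rather than the labels, which is the same trap as indexing with the factor itself.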