Replace a vector of characters using a separate data frame of different length

Question

I have a character vector (~35,000 rows), col1, that I want to recode/rename based on a separate data frame, df1. Both hold character values.

col1
C
B
M A
B
R R
C
R R
M A
B

df1:

V1   V2
B    blanket
C    toy
M A  blarg
R R  targe

The result would be:

col1
toy
blanket
blarg
blanket
targe
toy
targe
blarg
blanket

What I want to do is say "if V1 = col1, replace with V1 = V2". I tried writing that literally:

out<-if(col1==df$V1){replace(df$V1 == df$V2)}

which throws:

Warning message:
In if (testdat == schooldf$V1) { :
  the condition has length > 1 and only the first element will be used

I also tried gsub:

out<-gsub(df$V1, df$V2, col1)

which throws:

1: In gsub(schooldf$V1, schooldf$V2, testdat) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In gsub(schooldf$V1, schooldf$V2, testdat) :
  argument 'replacement' has length > 1 and only the first element will be used

Clearly the problem is similar in both attempts, but I can't figure out what I'm doing wrong.

Answer 1

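The warning you get from your replace code comes from the fact that you are using if(), which is meant for flow control, not for creating variables. It expects a logical value of length 1 (TRUE or FALSE), not a whole vector of comparisons. The syntax of your replace call is also incorrect; see ?replace or the last part of this answer.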
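To make the if() point concrete, here is a minimal sketch with made-up short vectors (x and keys are illustrative names, not objects from the question). It shows that == returns one logical value per element, which is why if() complains, and how a vectorised function such as ifelse() handles the same situation.

```r
# Illustrative vectors; `x` and `keys` are hypothetical names.
x    <- c("C", "B", "M A")
keys <- c("C", "X", "M A")

# `==` is vectorised: it compares element by element and returns
# one logical value per element, not a single TRUE/FALSE.
cmp <- x == keys
cmp
#[1]  TRUE FALSE  TRUE

# if() wants exactly one logical value, so a vector has to be reduced
# first (any()/all()), or a vectorised function such as ifelse() used.
any_equal <- any(cmp)
recoded   <- ifelse(x == "C", "toy", x)
recoded
#[1] "toy" "B"   "M A"
```

As far as I know, from R 4.2.0 onwards a condition of length greater than one in if() is an error rather than a warning.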

One idea is to do this with match instead of replace, since replace only handles one condition at a time:

col2 <- df1$V2[match(col1, df1$V1)]
col2
#[1] "toy"     "blanket" "blarg"   "blanket" "targe"   "toy"     "targe"   "blarg"   "blanket"
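For anyone who wants to run this, here is a self-contained sketch that rebuilds the question's data and splits the lookup into its two steps. The names idx, lookup and via_names are mine, and the named-vector variant at the end is an alternative technique I am adding, not part of the approach above.

```r
# Data copied from the question.
col1 <- c("C", "B", "M A", "B", "R R", "C", "R R", "M A", "B")
df1  <- data.frame(
  V1 = c("B", "C", "M A", "R R"),
  V2 = c("blanket", "toy", "blarg", "targe"),
  stringsAsFactors = FALSE
)

# Step 1: for each element of col1, find the position of its
# first match in df1$V1 (NA if there is none).
idx <- match(col1, df1$V1)
idx
#[1] 2 1 3 1 4 2 4 3 1

# Step 2: use those positions to pull the replacement values from V2.
col2 <- df1$V2[idx]

# Alternative (an added variant): a named vector used as a lookup table.
lookup    <- setNames(df1$V2, df1$V1)
via_names <- unname(lookup[col1])
```

Both versions are fully vectorised, so they should cope with 35,000 elements without any explicit loop.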

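The result is a character vector, because that is what you said col1 is in your question. If col1 is a column in a data.frame, you can still use the same approach.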
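A minimal sketch of the data.frame case, assuming the values live in a column called col of a data frame called df (both names are assumptions, as is the new column name recoded):

```r
# Lookup table from the question.
df1 <- data.frame(
  V1 = c("B", "C", "M A", "R R"),
  V2 = c("blanket", "toy", "blarg", "targe"),
  stringsAsFactors = FALSE
)

# Hypothetical data frame holding the values to recode in `col`.
df <- data.frame(col = c("C", "B", "M A", "R R"), stringsAsFactors = FALSE)

# Same match() lookup, with the result stored as a new column.
df$recoded <- df1$V2[match(df$col, df1$V1)]
df$recoded
#[1] "toy"     "blanket" "blarg"   "targe"
```

Assigning to df$col instead of df$recoded would overwrite the original column in place.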

If you have some potential non-matches, you can use replace to make sure the original col1 values are kept:

replace(col2, is.na(col2), col1[which(is.na(col2))])
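Here is a small runnable sketch of that fallback. The value "Z" is something I made up to stand in for an entry with no row in df1, and out and out2 are illustrative names.

```r
# Lookup table from the question.
df1 <- data.frame(
  V1 = c("B", "C", "M A", "R R"),
  V2 = c("blanket", "toy", "blarg", "targe"),
  stringsAsFactors = FALSE
)

# "Z" is a made-up value with no entry in df1$V1.
col1 <- c("C", "B", "Z", "M A")

# Unmatched elements come back as NA from the lookup.
col2 <- df1$V2[match(col1, df1$V1)]
col2
#[1] "toy"     "blanket" NA        "blarg"

# replace(x, list, values): put the original values back where col2 is NA.
out <- replace(col2, is.na(col2), col1[is.na(col2)])
out
#[1] "toy"     "blanket" "Z"       "blarg"

# An equivalent spelling with ifelse().
out2 <- ifelse(is.na(col2), col1, col2)
```

The which() in the original line is optional, since a logical vector can be used for indexing directly.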

Answer 2

You could also use merge, assuming your col is in df:

merge(df1, df, by.x = "V1", by.y = "col", all.y = TRUE)
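One thing to watch with merge is that it does not return the rows in their original order. A minimal sketch of one way around that, assuming a data frame df with a column col; the helper column id is my own addition. Note that the key column in df1 is V1, with a capital V.

```r
# Lookup table from the question.
df1 <- data.frame(
  V1 = c("B", "C", "M A", "R R"),
  V2 = c("blanket", "toy", "blarg", "targe"),
  stringsAsFactors = FALSE
)

# Hypothetical data frame with the values to recode in `col`.
df <- data.frame(col = c("C", "B", "M A", "B", "R R"), stringsAsFactors = FALSE)

# Record the original row order before merging (helper column, my addition).
df$id <- seq_len(nrow(df))

# all.y = TRUE keeps rows of df that have no match in df1 (V2 becomes NA).
merged <- merge(df1, df, by.x = "V1", by.y = "col", all.y = TRUE)

# Restore the original order using the helper column.
merged <- merged[order(merged$id), ]
merged$V2
#[1] "toy"     "blanket" "blarg"   "blanket" "targe"
```

If the original order matters and you only need the recoded vector, the match approach avoids this extra bookkeeping.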
