gsub() not recognizing and replacing certain accented characters

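I have a df with various names, many of which contain accented/non-English characters. I used a separate gsub for each character I wanted to replace, and that worked for many of them; for a few characters, however, it did not replace them at all.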

Non-working example: gsub("č","c",df,fixed=TRUE)

These are the characters that are not being replaced: ł ř ń š ž Ľ ţ ę č ć

I want to replace them with their English "look-alikes": l r n s z L t e c c

Besides the gsub attempts, I also tried chartr("łřńšžĽţęčć","lrnszLtecc",df$Name). Like the gsub attempts, this failed too.
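For reference, the one-gsub-per-character approach can be written as a loop over paired vectors. Note that gsub() coerces a data frame with as.character(), so gsub(..., df) returns a character vector rather than a modified data frame; the sketch below applies it to df$Name instead. The helper name replace_pairs is hypothetical, and the characters are written as \u escapes so the script does not depend on the encoding the source file is saved in. This only helps when the data contain the same code points as the patterns.

```r
# Characters from the question (ł ř ń š ž Ľ ţ ę č ć) as \u escapes
from <- c("\u0142", "\u0159", "\u0144", "\u0161", "\u017E",
          "\u013D", "\u0163", "\u0119", "\u010D", "\u0107")
to   <- c("l", "r", "n", "s", "z", "L", "t", "e", "c", "c")

# Hypothetical helper: one fixed-string gsub() per (from, to) pair
replace_pairs <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

df <- data.frame(Name = c("Stipe Mio\u010Di\u0107",     # Stipe Miočić
                          "Micha\u0142 Oleksiejczuk",   # Michał Oleksiejczuk
                          "\u013Dudovit Klein"),        # Ľudovit Klein
                 stringsAsFactors = FALSE)

# Apply to the Name column, not to the whole data frame
df$Name <- replace_pairs(df$Name, from, to)
df$Name
```

Because every pair is a one-to-one character mapping, chartr(paste(from, collapse = ""), paste(to, collapse = ""), df$Name) would do the same job in one call.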

df<-data.frame(Name=c("Stipe Miočić","Duško Todorović","Michał Oleksiejczuk","Jiři Prochazka","Bartosz Fabiński","Damir Hadžović","Ľudovit Klein","Diana Belbiţă","Joanna Jędrzejczyk" ))

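Above is a df containing several of the names that give me trouble. The catch is that when you run this and view the resulting df, the problem characters are gone and English versions of them are displayed instead. It does not do this in my main df, however, which holds data I scraped directly.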
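One possible explanation for the difference (an assumption; the question does not show the scraped bytes): text scraped from web pages is sometimes stored in decomposed Unicode form (NFD), where "č" is a plain "c" followed by a combining caron (U+030C). It looks identical on screen but does not contain the precomposed "č" (U+010D) that gsub() and chartr() are searching for. The sketch below uses a hand-built decomposed string to show the mismatch, and how stringi::stri_trans_nfc() removes it.

```r
library(stringi)

# Both strings print as "Miočić", but they are built from different code points
typed   <- "Mio\u010Di\u0107"    # precomposed: č = U+010D, ć = U+0107
scraped <- "Mioc\u030Cic\u0301"  # assumed NFD form: base letter + combining mark

typed == scraped                 # FALSE: == does no Unicode normalization
length(utf8ToInt(typed))         # 6 code points
length(utf8ToInt(scraped))       # 8 code points (two combining marks)

# The precomposed pattern never occurs in the decomposed string: nothing changes
gsub("\u010D", "c", scraped, fixed = TRUE)

# Check the form, then normalize to NFC so the data match the patterns again
stri_trans_isnfc(scraped)        # FALSE
scraped_nfc <- stri_trans_nfc(scraped)
scraped_nfc == typed             # TRUE
gsub("\u010D", "c", scraped_nfc, fixed = TRUE)
```

Running utf8ToInt() on one of the problem names from the scraped df would show whether combining marks (code points in the U+0300 to U+036F range) are present.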

Any insight into this problem, and how to fix it, would be greatly appreciated.

You can use stringi::stri_replace_all_fixed:

library(stringi)
stri_replace_all_fixed(df$Name,
                       c("ł","ř","ń","š","ž","Ľ","ţ","ę","č","ć"),
                       c("l","r","n","s","z","L","t","e","c","c"),
                       vectorize_all = FALSE)
[1] "Stipe Miocic"        "Dusko Todorovic"     "Michal Oleksiejczuk" "Jiri Prochazka"      "Bartosz Fabinski"   
[6] "Damir Hadzovic"      "Ludovit Klein"       "Diana Belbită"       "Joanna Jedrzejczyk" 
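The vectorize_all = FALSE argument is what makes this work: with the default TRUE, stri_replace_all_fixed() pairs the i-th pattern with the i-th string (recycling as needed), whereas FALSE applies every pattern to every string in turn. A small sketch of the difference (the variable names are illustrative only):

```r
library(stringi)

x    <- c("\u010Di\u0107", "\u010Di\u0107")  # "čić" twice
pat  <- c("\u010D", "\u0107")                # č, ć
repl <- c("c", "c")

# Default (vectorize_all = TRUE): string 1 gets only č -> c, string 2 only ć -> c
stri_replace_all_fixed(x, pat, repl)

# vectorize_all = FALSE: both replacements are applied to every string
stri_replace_all_fixed(x, pat, repl, vectorize_all = FALSE)
```

The output above also leaves "ă" in "Diana Belbită" untouched, simply because ă is not in the pattern list.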

Using stringi::stri_trans_general:

library(stringi)
df<-data.frame(Name=c("Stipe Miočić","Duško Todorović","Michał Oleksiejczuk","Jiři Prochazka","Bartosz Fabiński","Damir Hadžović","Ľudovit Klein","Diana Belbiţă","Joanna Jędrzejczyk" ))
stri_trans_general(df$Name, "Latin-ASCII")

Result:

[1] "Stipe Miocic"        "Dusko Todorovic"     "Michal Oleksiejczuk"
[4] "Jiri Prochazka"      "Bartosz Fabinski"    "Damir Hadzovic"     
[7] "Ludovit Klein"       "Diana Belbita"       "Joanna Jedrzejczyk" 
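To keep the transliterated names in the data frame, assign the result back to the column. The stri_trans_nfc() call is a defensive extra step, assuming the scraped text may be in decomposed form; it is not part of the answer above and leaves text that is already NFC unchanged.

```r
library(stringi)

df <- data.frame(Name = c("Diana Belbi\u0163\u0103",    # Diana Belbiţă
                          "Joanna J\u0119drzejczyk",    # Joanna Jędrzejczyk
                          "Stipe Mioc\u030Cic\u0301"),  # Miočić, assumed decomposed
                 stringsAsFactors = FALSE)

# Normalize to NFC, then transliterate Latin letters to ASCII look-alikes
df$Name <- stri_trans_general(stri_trans_nfc(df$Name), "Latin-ASCII")
df$Name
```

Unlike an explicit replacement list, "Latin-ASCII" also covers letters nobody thought to list, such as ă.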
