Function to abbreviate scientific names
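Could you help me?
I am trying to modify an R function written by a colleague. The function takes a character vector of scientific names (Latin binomials), like this one: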
Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea
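Each scientific name should then be abbreviated into a short code made of the first three letters of the genus (first term) and the first three letters of the epithet (second term). For example, Cerradomys scotti should become Cersco.
This is the original function: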
AbbreviatedNames <- function(vector) {
  abbreviations <- character(length = length(vector))
  splitnames <- strsplit(vector, " ")
  for (i in 1:length(vector)) {
    vector[i] <- if(splitnames[[i]][2] == "^sp") {
      paste(substr(splitnames[[i]][1],1,3),
            splitnames[[i]][2], sep = "")
    }
    else {
      paste(substr(splitnames[[i]][1],1,3),
            substr(splitnames[[i]][2],1,3), sep = "")
    }
  }
  vector
}
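With a simple list like this one, the function works perfectly. However, when the list has missing or extra elements, it fails: the loop stops at the first row that does not match the pattern. Take this more complex list as an example: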
Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea
Morfosp1
Vismia cf brasiliensis
Note that Morfosp1 has only one term, and Vismia cf brasiliensis has an extra word (cf) in the middle.
I tried adapting the function, for example like this:
AbbreviatedNames <- function(vector) {
  abbreviations <- character(length = length(vector))
  splitnames <- strsplit(vector, " ")
  for (i in 1:length(vector)) {
    vector[i] <- if(splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2])) {
      paste(substr(splitnames[[i]][1],1,3),
            splitnames[[i]][2], sep = "")
    }
    else {
      paste(substr(splitnames[[i]][1],1,3),
            substr(splitnames[[i]][2],1,3), sep = "")
    }
  }
  vector
}
However, it does not work. I get this error message:
Error in if (splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2])) { :
  missing value where TRUE/FALSE needed
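How can I make this function:
- also handle names with only one word?
  Expected result: Morfosp1 -> Morfosp1 (unchanged)
- also handle names with an extra word in the middle?
  Expected result: Vismia cf brasiliensis -> Visbra (middle term ignored)
Thank you very much!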
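The error quoted in the question can be reproduced in isolation. The sketch below uses illustrative variable names that are not part of the original code. It shows that a one-word name has no second term, so indexing position 2 gives NA; that comparing NA with == gives NA rather than FALSE, which if() cannot evaluate; and that == is an exact string comparison, so "^sp" is never treated as a regular expression. One possible guard, assuming the intent is to detect the placeholder sp, tests for NA first with the short-circuiting && operator.

```r
# A one-word name has no second term, so indexing [2] returns NA
second <- strsplit("Morfosp1", " ", fixed = TRUE)[[1]][2]

# Comparing NA with == yields NA, which if() rejects with
# "missing value where TRUE/FALSE needed"
cmp <- second == "^sp"

# Combining with & does not help: NA & TRUE is still NA
combined <- cmp & is.na(second)

# == is an exact comparison, not a regular-expression match
exact <- "sp" == "^sp"          # FALSE
regex <- grepl("^sp$", "sp")    # TRUE

# Test for NA first; && short-circuits, so the comparison is never reached
safe <- !is.na(second) && second == "sp"   # FALSE, and usable inside if()
```

Checking the number of terms with length() before indexing avoids the NA altogether.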
Something like this is pretty concise:
test <- c("Cerradomys scotti", "Oligoryzomys sp", "Latingstuff", "Latin staff more")

# function to truncate a given name
trunc_str <- function(latin_name) {
  # split it on a space
  name_split <- unlist(strsplit(latin_name, " ", fixed = TRUE))
  # if one name, just return it
  if (length(name_split) == 1) return(name_split)
  # truncate to first 3 letters
  name_trunc <- substr(name_split, 1, 3)
  # paste the first and last term together (skipping any middle ones)
  paste0(head(name_trunc, 1), tail(name_trunc, 1))
}

# iterate over all
vapply(test, trunc_str, "")
#  Cerradomys scotti    Oligoryzomys sp        Latingstuff   Latin staff more
#           "Cersco"            "Olisp"      "Latingstuff"           "Latmor"
If you don't want a named vector as output, you can pass USE.NAMES = FALSE to vapply(). Or feel free to use a loop here instead.
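As a usage sketch, here is the same trunc_str() applied to a few names from the question, including the two problematic ones, with USE.NAMES = FALSE so that a plain character vector comes back. The variable names species and codes are only illustrative.

```r
# Same helper as above, repeated so the example runs on its own
trunc_str <- function(latin_name) {
  name_split <- unlist(strsplit(latin_name, " ", fixed = TRUE))
  if (length(name_split) == 1) return(name_split)
  name_trunc <- substr(name_split, 1, 3)
  paste0(head(name_trunc, 1), tail(name_trunc, 1))
}

species <- c("Cerradomys scotti", "Oligoryzomys sp",
             "Morfosp1", "Vismia cf brasiliensis")

# character(1) states the expected return type of each call explicitly
codes <- vapply(species, trunc_str, character(1), USE.NAMES = FALSE)
# codes is c("Cersco", "Olisp", "Morfosp1", "Visbra"), without names
```

Passing character(1) instead of "" as the template is equivalent; it just makes the expected type easier to read.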
AbbreviatedNames <- function(vector) {
  splitnames <- strsplit(vector, " ")
  for (i in seq_along(vector)) {
    # One name: keep it unchanged
    if (length(splitnames[[i]]) == 1) {
      vector[i] <- splitnames[[i]][1]
    }
    # Two names: first three letters of each term
    # ("sp" is shorter than three letters, so substr() returns it whole)
    else if (length(splitnames[[i]]) == 2) {
      vector[i] <- paste(substr(splitnames[[i]][1], 1, 3),
                         substr(splitnames[[i]][2], 1, 3), sep = "")
    }
    # Three names
    else if (length(splitnames[[i]]) == 3) {
      # Assuming that the unwanted word is always in the middle
      vector[i] <- paste(substr(splitnames[[i]][1], 1, 3),
                         substr(splitnames[[i]][3], 1, 3), sep = "")
    }
  }
  return(vector)
}
I tested it on the list you provided and it seems to work. Let me know if you need more general code.
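If more general code is wanted, one option is a sketch along the following lines. The function name abbreviate_names and its n argument are hypothetical, and this is not the code that was later shared. It assumes that the first and last whitespace-separated terms are always the ones to keep, however many terms sit between them, and that NA, empty, or single-term entries should be returned unchanged.

```r
# Hypothetical generalisation: keep the first n letters of the first and
# last terms, ignoring any number of terms in between
abbreviate_names <- function(x, n = 3) {
  # Trim the ends and split on runs of whitespace (handles double spaces)
  terms <- strsplit(trimws(x), "[[:space:]]+")
  vapply(seq_along(x), function(i) {
    parts <- terms[[i]]
    # NA, empty, or single-term entries are returned as they came in
    if (length(parts) < 2 || anyNA(parts)) return(x[i])
    paste0(substr(parts[1], 1, n),
           substr(parts[length(parts)], 1, n))
  }, character(1))
}

out <- abbreviate_names(c("Cerradomys scotti", "Oligoryzomys sp", "Morfosp1",
                          "Vismia  cf brasiliensis", NA))
# out is c("Cersco", "Olisp", "Morfosp1", "Visbra", NA)
```

Iterating over seq_along(x) rather than over x itself keeps the result unnamed and lets the function return the original element untouched when there is nothing to abbreviate.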
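Thank you very much, Ricardo and Adam, for your help! I have made the code available on GitHub for other people who work with interaction networks and need to abbreviate scientific names for use in graphs.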