Function to abbreviate scientific names

Could you help me?

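I am trying to modify an R function written by a colleague. The function takes a character vector of scientific names (Latin binomials), like this one: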

Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea

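The scientific names should then be abbreviated into a short code built from the first three letters of the genus (first term) and the first three letters of the epithet (second term). For example, Cerradomys scotti should become Cersco.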

This is the original function:

AbbreviatedNames <- function(vector) {

    abbreviations <- character(length = length(vector))
    
    splitnames <- strsplit(vector, " ")
    
    for (i in 1:length(vector)) {
        vector[i] <- if(splitnames[[i]][2] == "^sp") {
            paste(substr(splitnames[[i]][1],1,3),
                  splitnames[[i]][2], sep = "")
        }
        
        else {
            paste(substr(splitnames[[i]][1],1,3),
                  substr(splitnames[[i]][2],1,3), sep = "")
        }
        
    }
    
    vector
    
    }

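With a simple list like this one, the function works perfectly. However, when some names have missing or extra elements, it fails: the loop stops at the first row that does not match the pattern. Take this more complex list as an example: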

Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea
Morfosp1
Vismia cf brasiliensis

Note that Morfosp1 has only one term, and Vismia cf brasiliensis has an extra word (cf) in the middle.

I tried adapting the function, for example like this:

AbbreviatedNames <- function(vector) {

    abbreviations <- character(length = length(vector))
    
    splitnames <- strsplit(vector, " ")
    
    for (i in 1:length(vector)) {
        vector[i] <- if(splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2])) {
            paste(substr(splitnames[[i]][1],1,3),
                  splitnames[[i]][2], sep = "")
        }
        
        else {
            paste(substr(splitnames[[i]][1],1,3),
                  substr(splitnames[[i]][2],1,3), sep = "")
        }
        
    }
    
    vector
    
    }

However, it does not work. I get this error message:

Error in if (splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2])) { : 
  valor ausente onde TRUE/FALSE necessário
  (in English: missing value where TRUE/FALSE needed)

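How can I make this function:

  1. also handle names with only one term?

Expected result: Morfosp1 -> Morfosp1 (left unchanged)

  2. also handle names with an additional word in the middle?

Expected result: Vismia cf brasiliensis -> Visbra (ignoring the middle term)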

Thank you very much!

Something like this is quite concise:

test <- c("Cerradomys scotti", "Oligoryzomys sp", "Latingstuff", "Latin staff more")

# function to truncate a given name
trunc_str <- function(latin_name) {
  
  # split it on a space
  name_split <- unlist(strsplit(latin_name, " ", fixed = TRUE))
  
  # if one name, just return it
  if (length(name_split) == 1) return(name_split)
  
  # truncate to first 3 letters
  name_trunc <- substr(name_split, 1, 3)

  # paste the first and last term together (skipping any middle ones)
  paste0(head(name_trunc, 1), tail(name_trunc, 1))
    
}

# iterate over all
vapply(test, trunc_str, "")
# Cerradomys scotti   Oligoryzomys sp       Latingstuff  Latin staff more 
# "Cersco"           "Olisp"     "Latingstuff"          "Latmor"

If you don't want a named vector as output, you can pass USE.NAMES = FALSE to vapply(). Or feel free to use a loop here instead.
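
As a quick check of that option, here is the same helper applied to a few names from the longer list in the question, with USE.NAMES = FALSE. The vector name `species` is only a placeholder chosen for this sketch.

```r
# Same helper as above, repeated so the snippet runs on its own
trunc_str <- function(latin_name) {
  name_split <- unlist(strsplit(latin_name, " ", fixed = TRUE))
  if (length(name_split) == 1) return(name_split)
  name_trunc <- substr(name_split, 1, 3)
  paste0(head(name_trunc, 1), tail(name_trunc, 1))
}

# `species` is a hypothetical name for the input vector
species <- c("Cerradomys scotti", "Oligoryzomys sp", "Morfosp1",
             "Vismia cf brasiliensis")

# USE.NAMES = FALSE drops the names, leaving a plain character vector
codes <- vapply(species, trunc_str, "", USE.NAMES = FALSE)
codes
# codes is c("Cersco", "Olisp", "Morfosp1", "Visbra")
```

The one-word name comes back untouched and the middle "cf" is skipped, which matches the two expected results listed in the question.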

AbbreviatedNames <- function(vector) {
  
  abbreviations <- character(length = length(vector))
  
  splitnames <- strsplit(vector, " ")
  
  for (i in 1:length(vector)){
    
    # One name: keep it unchanged (the second term is NA here, so pasting
    # it would turn Morfosp1 into "MorNA")
    if(length(splitnames[[i]])==1){
      vector[i] <- splitnames[[i]][1]
    }
    
    # Two names
    else if(length(splitnames[[i]])==2){
      vector[i] <- if(splitnames[[i]][2] == "sp") {
        paste(substr(splitnames[[i]][1],1,3),
              splitnames[[i]][2], sep = "")
      }
      else {
        paste(substr(splitnames[[i]][1],1,3),
              substr(splitnames[[i]][2],1,3), sep = "")
      }
    }
    
    # Three names
    else if(length(splitnames[[i]])==3){
      vector[i] <- paste(substr(splitnames[[i]][1],1,3),
              substr(splitnames[[i]][3],1,3), sep = "")
      # Assuming that the unwanted word is always in the middle 
    }
    
  }
  
  return(vector)
}

I tested it on the list you provided and it seems to work. Let me know if you need more general code.
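
In case a more general version is useful, the sketch below takes the first and last term whatever the number of words in between, so it does not rely on there being exactly one unwanted word in the middle. It also sidesteps the error from the question: for a one-word name, `splitnames[[i]][2]` is `NA`, so the comparison with `"^sp"` is `NA` as well and `if` cannot decide. The function name `abbreviate_names` and its `n` parameter are made up for this example.

```r
# Hypothetical generalisation: first `n` letters of the first and last term
abbreviate_names <- function(x, n = 3) {
  parts <- strsplit(x, " ", fixed = TRUE)
  vapply(parts, function(p) {
    # Zero or one term: nothing to combine, return the name as it is
    if (length(p) < 2) return(paste(p, collapse = ""))
    # Otherwise combine the first and last term, skipping anything in between
    paste0(substr(p[1], 1, n), substr(p[length(p)], 1, n))
  }, character(1))
}

abbreviate_names(c("Morfosp1", "Vismia cf brasiliensis", "Ficus sp"))
# returns c("Morfosp1", "Visbra", "Ficsp")

# Why the attempt in the question failed: the comparison itself is NA
one_word <- strsplit("Morfosp1", " ")[[1]]
is.na(one_word[2] == "^sp")
# TRUE, and if (NA) raises "missing value where TRUE/FALSE needed"
```

Checking the number of terms before looking at the second one, as both answers do, is what keeps `if` away from the `NA`.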

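Many thanks to Ricardo and Adam for your help! I have made the code available on GitHub for other people who work with interaction networks and need to abbreviate scientific names for use in graphs.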