Conditional trailing space deletion in R

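I am trying to create a variable called "combo". I want the county name in all lowercase, keeping the space between words where there is one, and with no space between the county name and the state abbreviation.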

This is what I have so far:

county <- c("Abbeville County", "Aleutians West Census Area",
           "Cerro Gordo County", "Lonoke County")
state <- c("West Virginia", "Wisconsin", "Wyoming", "Alabama")

library(dplyr)  # provides mutate()

trialdat <- data.frame(county, state)
trialdat$state <- sapply(trialdat$state, tolower)
# deal with trailing spaces
trim.trailing <- function(x) sub("\\s+$", "", x)
trialdat$state2 <- as.factor(trim.trailing(as.factor(trialdat$state)))
# stateFromLower() is defined further down in this post
trialdat$StateAbbrev <- stateFromLower(trialdat$state2)
trialdat$county2 <- as.factor(trim.trailing(as.factor(trialdat$county)))
# make combo variable
trialdat = mutate(trialdat, combo=paste(tolower(gsub("County", "",county2)),
            StateAbbrev, sep=""))

The desired output is a column like this:

                       combo
1                  abbevilleWV
2 aleutians west census areaWI
3                cerro gordoWY
4                     lonokeAL

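Something strange is happening. For the county whose name does not end in "County", I get what I want. But for the other counties, a space remains after the county name. I can't simply gsub out all the spaces, because I need the ones inside county names. Any ideas? Thanks!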
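
For anyone reproducing this, here is a minimal sketch of what the gsub call does to a single value. It uses base R only; the input string is taken from the data above, and the variable names are just for illustration.

```r
x <- "Abbeville County"

# Only the word "County" is removed; the space in front of it stays.
removed <- gsub("County", "", x)

# Brackets make any trailing whitespace visible when printed.
print(sprintf("[%s]", removed))

# Compare lengths: the result is one character longer than "Abbeville".
print(nchar(removed) - nchar("Abbeville"))
```

Wrapping the value in brackets is a quick way to check for stray whitespace before reaching for a regex.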

Note: the stateFromLower function is below, slightly adapted from Chris' code. I'm including it because the problem might stem from this part; I'm not sure.

stateFromLower <- function(x) {
  # read 52 state codes into local variable [includes DC
  # (Washington D.C.) and PR (Puerto Rico)]
  st.codes <- data.frame(state1 = as.factor(c("AK", "AL", "AR", 
    "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", 
    "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", 
    "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", 
    "NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", 
    "PR", "RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT", 
    "WA", "WI", "WV", "WY")), full = as.factor(c("alaska", 
    "alabama", "arkansas", "arizona", "california", "colorado", 
    "connecticut", "district of columbia", "delaware", "florida", 
    "georgia", "hawaii", "iowa", "idaho", "illinois", "indiana", 
    "kansas", "kentucky", "louisiana", "massachusetts", "maryland", 
    "maine", "michigan", "minnesota", "missouri", "mississippi", 
    "montana", "north carolina", "north dakota", "nebraska", 
    "new hampshire", "new jersey", "new mexico", "nevada", 
    "new york", "ohio", "oklahoma", "oregon", "pennsylvania", 
    "puerto rico", "rhode island", "south carolina", "south dakota", 
    "tennessee", "texas", "utah", "virginia", "vermont", 
    "washington", "wisconsin", "west virginia", "wyoming")))

  # create an nx1 data.frame of lower-case state names from source column
  st.x <- data.frame(full = x)
  # match source names with names from 'st.codes' local
  # variable and use to look up the state code
  refac.x <- st.codes$state1[match(st.x$full, st.codes$full)]
  # return the state codes in the same order in which the names
  # appeared in the original source
  return(refac.x)
}
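
As an aside, the lookup table above could probably be shortened with the built-in `state.name` and `state.abb` vectors that ship with base R. This is only a sketch: those vectors cover the 50 states, so DC and Puerto Rico are appended by hand, and `state_abbrev_from_lower` is a hypothetical name, not part of the original code.

```r
# state.name and state.abb are built-in vectors in the same order
# (50 states only), so DC and PR are added manually.
full_names <- c(tolower(state.name), "district of columbia", "puerto rico")
codes <- c(state.abb, "DC", "PR")

# Hypothetical helper: lower-cases and trims the input, then looks up
# the code; unrecognised names come back as NA.
state_abbrev_from_lower <- function(x) {
  codes[match(trimws(tolower(as.character(x))), full_names)]
}

print(state_abbrev_from_lower(c("west virginia", "wisconsin", "wyoming", "alabama ")))
```

Unlike the version above, this returns a character vector rather than a factor, which tends to be easier to paste together with other strings.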

Thanks for your patience with the formatting issues; this is my first question!

Fixed! In the mutate command, I had to add a space before County, so that the space is removed together with the word.

trialdat = mutate(trialdat, combo=paste(tolower(gsub(" County", "", county2)), StateAbbrev, sep=""))
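
A slightly more defensive variant, as a sketch only: remove a trailing "County" together with any whitespace around it in one pattern, then trim whatever is left. It assumes R 3.2.0 or later for `trimws()`. The vectors are re-created here so the snippet runs on its own, `state_abbrev` is a stand-in for the output of stateFromLower, and the trailing space on the last county is added deliberately to exercise the trimming.

```r
county <- c("Abbeville County", "Aleutians West Census Area",
            "Cerro Gordo County", "Lonoke County ")  # last one has a stray space
state_abbrev <- c("WV", "WI", "WY", "AL")  # stand-in for stateFromLower() output

# Drop a trailing "County" plus the whitespace on either side of it,
# then trim any remaining leading/trailing whitespace.
county_clean <- trimws(sub("\\s*County\\s*$", "", county))

# paste0() is shorthand for paste(..., sep = "").
combo <- paste0(tolower(county_clean), state_abbrev)
print(combo)
```

Anchoring the pattern with `$` means only a "County" at the end of the name is touched, and spaces inside names such as "cerro gordo" are left alone.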