How to replace a pattern with another pattern in a column, except for specific values?

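I have a data frame (x) with a "Symbol" column, and I want to strip everything from the hyphen onward (replace every "-*" with ""), but I don't want to change certain values, for example: 1-Mar, 1-Sep, 1-Dec, ...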

x<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1-GTF2A1L","1-Dec","NME1-NME2","12-Mar","TNFSF12-TNFSF13","8-Mar","TMEM189-UBE2V1","10-Sep"))

I tried this code: x$Symbol<-gsub ("-*", "", x$Symbol) but it also changes values such as 1-Mar, 1-Sep and 1-Dec.

I need the following data frame:

x<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1","1-Dec","NME1","12-Mar","TNFSF12","8-Mar","TMEM189","10-Sep"))

You can use

x$Symbol <- sub("-(?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$).*", "", x$Symbol, perl=TRUE)

Details

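  • - - a hyphen
  • (?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an abbreviated month name at the end of the string (note: if there may be more text after the month name, use (?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)\b) to match the month names as whole words, or (?!Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec) to match the names as unbounded substrings)
  • .* - any 0+ characters other than line break characters, as many as possible.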
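
To see how the three lookahead variants from the note differ, here is a small sketch. The strings in `s` are made up for illustration and do not come from the question's data. Note that `\b` has to be written as `\\b` inside an R string literal.

```r
# Month alternation copied from the pattern above
months <- "Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec"

anchored  <- paste0("-(?!(?:", months, ")$).*")    # month must end the string
whole     <- paste0("-(?!(?:", months, ")\\b).*")  # month must be a whole word
unbounded <- paste0("-(?!", months, ").*")         # month may start a longer word

# Hypothetical inputs, chosen to separate the three cases
s <- c("3-Mar", "3-Mar 2018", "A-Marine", "STON1-GTF2A1L")

res_anchored  <- sub(anchored,  "", s, perl = TRUE)
res_whole     <- sub(whole,     "", s, perl = TRUE)
res_unbounded <- sub(unbounded, "", s, perl = TRUE)

print(res_anchored)   # "3-Mar" "3"          "A"        "STON1"
print(res_whole)      # "3-Mar" "3-Mar 2018" "A"        "STON1"
print(res_unbounded)  # "3-Mar" "3-Mar 2018" "A-Marine" "STON1"
```

Which variant fits depends on whether the date-like values can carry trailing text; for the data in the question, where the month always ends the string, the `$` version is the strictest choice.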

R demo:

df<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1-GTF2A1L","1-Dec","NME1-NME2","12-Mar","TNFSF12-TNFSF13","8-Mar","TMEM189-UBE2V1","10-Sep"))
df$Symbol <- sub("-(?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$).*", "", df$Symbol, perl=TRUE)
df

Output:

  ID  Symbol
1  a   3-Mar
2  b   STON1
3  c   1-Dec
4  d    NME1
5  e  12-Mar
6  f TNFSF12
7  g   8-Mar
8  h TMEM189
9  i  10-Sep
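
If typing out the month alternation feels error-prone, one possible variation is to build it from base R's `month.abb` constant, which holds the English abbreviations "Jan" through "Dec". The helper name `strip_after_hyphen` below is hypothetical and only wraps the same `sub()` call for reuse.

```r
# Hypothetical helper: drop everything from the first hyphen onward,
# unless the hyphen is followed by a month abbreviation that ends the string.
strip_after_hyphen <- function(x) {
  # month.abb is a base R constant: "Jan", "Feb", ..., "Dec"
  months  <- paste(month.abb, collapse = "|")
  pattern <- paste0("-(?!(?:", months, ")$).*")
  sub(pattern, "", as.character(x), perl = TRUE)
}

symbols <- c("3-Mar", "STON1-GTF2A1L", "1-Dec", "NME1-NME2", "12-Mar",
             "TNFSF12-TNFSF13", "8-Mar", "TMEM189-UBE2V1", "10-Sep")
cleaned <- strip_after_hyphen(symbols)
print(cleaned)
```

The `as.character()` call inside the helper is a precaution so that it also accepts a factor column.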

You could also paste "-18" onto Symbol, check whether the result parses as a Date, and apply sub only to the values that are not dates.

df$Symbol <- with(df, ifelse(is.na(as.Date(paste0(Symbol, "-18"), "%d-%b-%y")), 
                   sub ("-.*", "", Symbol), Symbol))

df
#  ID  Symbol
#1  a   3-Mar
#2  b   STON1
#3  c   1-Dec
#4  d    NME1
#5  e  12-Mar
#6  f TNFSF12
#7  g   8-Mar
#8  h TMEM189
#9  i  10-Sep
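
To make the intermediate steps of this approach visible, here is a short sketch on a subset of the values. One assumption is labeled in the code: `%b` matches month abbreviations in the current locale, so the sketch temporarily switches `LC_TIME` to "C" to get English names and restores the previous setting at the end.

```r
# Assumption: English month abbreviations are needed for %b to match
# "Mar", "Dec", etc.; the previous LC_TIME setting is restored afterwards.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

symbols <- c("3-Mar", "STON1-GTF2A1L", "1-Dec", "NME1-NME2")

# Append a dummy two-digit year so that "3-Mar" becomes "3-Mar-18"
parsed <- as.Date(paste0(symbols, "-18"), format = "%d-%b-%y")

# NA means the value did not parse as day-month-year, i.e. it is not date-like
not_a_date <- is.na(parsed)
cleaned <- ifelse(not_a_date, sub("-.*", "", symbols), symbols)

print(data.frame(symbols, parsed, not_a_date, cleaned))

Sys.setlocale("LC_TIME", old_locale)
```

Compared with the regex, this delegates the "is it a date?" decision to `as.Date()`, at the cost of depending on the locale.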

First, run

df$Symbol <- as.character(df$Symbol)

to convert Symbol to character.
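
The conversion matters mainly when Symbol is stored as a factor, which is what `data.frame()` produced for string columns by default before R 4.0.0. A minimal guard is sketched below; it passes `stringsAsFactors = TRUE` explicitly only to reproduce that situation.

```r
# Reproduce the pre-4.0.0 default, where strings become factors
df <- data.frame(Symbol = c("3-Mar", "STON1-GTF2A1L"), stringsAsFactors = TRUE)

was_factor <- is.factor(df$Symbol)

# Convert only when needed; as.character() returns the labels, not the codes
if (was_factor) df$Symbol <- as.character(df$Symbol)

print(class(df$Symbol))
```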