How can I replace a pattern in a column with another pattern, except for specific values?
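I have a data frame (x) with a "Symbol" column. I want to strip everything from the hyphen onward (replace all "-*" with ""), but I don't want to change certain values, such as 1-Mar, 1-Sep, 1-Dec, ...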
x<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1-GTF2A1L","1-Dec","NME1-NME2","12-Mar","TNFSF12-TNFSF13","8-Mar","TMEM189-UBE2V1","10-Sep"))
I tried this code: x$Symbol <- gsub("-*", "", x$Symbol)
But it also changes the date-like values (1-Mar, 1-Sep, 1-Dec).
I need the following data frame:
x<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1","1-Dec","NME1","12-Mar","TNFSF12","8-Mar","TMEM189","10-Sep"))
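For context, here is a minimal sketch of why the attempt in the question does not do what was intended. In a regular expression, "-*" means "zero or more hyphens", not "a hyphen followed by anything", so gsub only deletes the hyphens. The closest plain regex, "-.*", truncates every value, including the date-like ones. The variable names below are illustrative only.

```r
# Sample values taken from the question's data frame
s <- c("3-Mar", "STON1-GTF2A1L")

# "-*" is a regex for "zero or more hyphens": only the hyphens are removed
hyphens_removed <- gsub("-*", "", s)

# "-.*" is "a hyphen plus everything after it": this also truncates "3-Mar"
truncated <- sub("-.*", "", s)

print(hyphens_removed)  # "3Mar" "STON1GTF2A1L"
print(truncated)        # "3" "STON1"
```

Neither result keeps "3-Mar" intact while shortening "STON1-GTF2A1L", which is why an exception for month names is needed.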
You can use
x$Symbol <- sub("-(?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$).*", "", x$Symbol, perl=TRUE)
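Details:
- "-" matches a hyphen.
- "(?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$)" is a negative lookahead that fails the match if, immediately to the right of the current position, there is an abbreviated month name at the end of the string. Note: if more text may follow the month name, use (?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)\b) to match the month name as a whole word, or (?!Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec) to match the names only as unbounded substrings.
- ".*" matches any 0+ characters other than line break characters, as many as possible.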
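The three lookahead variants described above differ only when text follows the month name. The sketch below compares them on a few made-up values (the inputs "3-Mar 2018" and "GENE1-Marker" are hypothetical and not part of the question's data). Note that inside an R string the word boundary is written as "\\b".

```r
# Hypothetical inputs chosen to show where the variants diverge
s <- c("3-Mar", "3-Mar 2018", "GENE1-Marker", "STON1-GTF2A1L")
months <- "Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec"

# Month name must end the string for the hyphen to be protected
anchored <- sub(paste0("-(?!(?:", months, ")$).*"), "", s, perl = TRUE)

# Month name must be a whole word; trailing text is allowed
whole_word <- sub(paste0("-(?!(?:", months, ")\\b).*"), "", s, perl = TRUE)

# Any text starting with a month name protects the hyphen
unbounded <- sub(paste0("-(?!", months, ").*"), "", s, perl = TRUE)

print(anchored)    # "3-Mar" "3"          "GENE1"        "STON1"
print(whole_word)  # "3-Mar" "3-Mar 2018" "GENE1"        "STON1"
print(unbounded)   # "3-Mar" "3-Mar 2018" "GENE1-Marker" "STON1"
```

For the data in the question all three give the same result, so the end-of-string version is the strictest safe choice.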
df<-data.frame("ID"=c("a","b","c","d","e","f","g","h","i"),"Symbol"=c("3-Mar","STON1-GTF2A1L","1-Dec","NME1-NME2","12-Mar","TNFSF12-TNFSF13","8-Mar","TMEM189-UBE2V1","10-Sep"))
df$Symbol <- sub("-(?!(?:Jan|Feb|Mar|Apr|May|Ju[nl]|Aug|Sep|Oct|Nov|Dec)$).*", "", df$Symbol, perl=TRUE)
df
Output:
ID Symbol
1 a 3-Mar
2 b STON1
3 c 1-Dec
4 d NME1
5 e 12-Mar
6 f TNFSF12
7 g 8-Mar
8 h TMEM189
9 i 10-Sep
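Alternatively, you can paste a year suffix ("-18") onto Symbol, check whether the result parses as a Date, and apply sub only to the values that are not dates.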
df$Symbol <- with(df, ifelse(is.na(as.Date(paste0(Symbol, "-18"), "%d-%b-%y")),
sub ("-.*", "", Symbol), Symbol))
df
# ID Symbol
#1 a 3-Mar
#2 b STON1
#3 c 1-Dec
#4 d NME1
#5 e 12-Mar
#6 f TNFSF12
#7 g 8-Mar
#8 h TMEM189
#9 i 10-Sep
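Run this first (it is needed when Symbol is stored as a factor, which was the data.frame default before R 4.0.0):
df$Symbol <- as.character(df$Symbol)
It converts Symbol to character.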
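One caveat with the as.Date approach is that "%b" is matched against month abbreviations in the current locale, so it may not recognise "Mar" or "Dec" on a non-English system. As a rough alternative sketch, not taken from the answers above, the date-like values can be detected with R's built-in month.abb constant, which always holds the English abbreviations. The helper names parts and is_date are illustrative, and the check assumes date-like values have the exact form day-Month.

```r
df <- data.frame(
  ID = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  Symbol = c("3-Mar", "STON1-GTF2A1L", "1-Dec", "NME1-NME2", "12-Mar",
             "TNFSF12-TNFSF13", "8-Mar", "TMEM189-UBE2V1", "10-Sep"),
  stringsAsFactors = FALSE  # keep Symbol as character on any R version
)

# Split each value on the literal hyphen
parts <- strsplit(df$Symbol, "-", fixed = TRUE)

# Date-like: exactly two pieces, a 1-2 digit day and an English month abbreviation
is_date <- vapply(
  parts,
  function(p) length(p) == 2 && grepl("^[0-9]{1,2}$", p[1]) && p[2] %in% month.abb,
  logical(1)
)

# Keep date-like values, strip the hyphen and everything after it otherwise
df$Symbol <- ifelse(is_date, df$Symbol, sub("-.*", "", df$Symbol))
print(df)
```

This keeps the logic of the second answer (test for a date, then sub the rest) while avoiding any dependence on locale settings.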