Removing duplicates from a column in R
I have a column containing a list of IDs of varying lengths, some of which carry version numbers.
rownames(x)
"ENSP00000424360.1-D4"
"ENSP00000424360.2-D4"
"ENSP00000424360.3-D4"
"ENSP00000437781-D59"
"XP_010974537.1"
"XP_010974538.1"
"XP_010974538.2"
I would like to change these to:
"ENSP00000424360"
"ENSP00000424360.1"
"ENSP00000424360.2"
"ENSP00000437781"
"XP_010974537"
"XP_010974538"
"XP_010974538.1"
I can convert the ENSxx and the XPxx IDs separately using
make.unique(substr(rownames(x),1,15))
or
make.unique(substr(rownames(x),1,12))
How can I change the code to get the desired result for both kinds of ID at once?
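We remove everything from the first dot onward with one sub call, remove everything from the hyphen onward with a second, and then apply make.unique to number the repeated IDs: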
make.unique(sub("-.*$", "", sub("\\..*", "", rownames(x))))
#[1] "ENSP00000424360" "ENSP00000424360.1" "ENSP00000424360.2"
#[4] "ENSP00000437781" "XP_010974537" "XP_010974538" "XP_010974538.1"
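As a possible variation, the two sub calls could be folded into one by matching either delimiter with a character class. This is only a sketch under the assumption that every ID should be cut at its first "." or "-", which holds for the sample IDs in the question; the names ids, base_ids and new_ids are made up for illustration.

```r
# Sample IDs copied from the question
ids <- c("ENSP00000424360.1-D4", "ENSP00000424360.2-D4", "ENSP00000424360.3-D4",
         "ENSP00000437781-D59", "XP_010974537.1", "XP_010974538.1", "XP_010974538.2")

# Drop everything from the first "." or "-" onward.
# Inside [...] the dot is literal, and a trailing "-" is literal too.
base_ids <- sub("[.-].*$", "", ids)

# make.unique leaves the first occurrence alone and appends .1, .2, ... to repeats
new_ids <- make.unique(base_ids)
new_ids
```

Unlike the substr approach, this does not depend on the ID length, so ENSxx and XPxx IDs can be handled in the same call. The result could then be assigned back with rownames(x) <- new_ids.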
Data
x <- structure(list(v1 = 1:7), .Names = "v1", row.names = c("ENSP00000424360.1-D4",
"ENSP00000424360.2-D4", "ENSP00000424360.3-D4", "ENSP00000437781-D59",
"XP_010974537.1", "XP_010974538.1", "XP_010974538.2"), class = "data.frame")