Reorder the fields of a string that is identically structured in every row of a specific column
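a) Every row in column 4 contains a string with the same structure.
b) I am trying to reorder the parts of this string.
c) strsplit splits each row into 8 groups.
d) I want to reorder these groups in every row of the column, using the same reordering for all rows.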
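Then I tried the script below, but I can't figure out what to put inside [[ ? ]]. I tried row numbers and the column name, but I still can't change the order within the column. What should I put in place of [[ ? ]] so that the string is reordered in every row of the column? Any suggestions?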
Sample data (df):
dput(df[1:4, ])
structure(list(V1 = c("chr1", "chr1", "chr1", "chr1"), V2 = 3003641:3003644,
V3 = 3003650:3003653, V4 = c("Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=ZNF449.Me9548.1.YY2017.HT-SE2;seq=CCCCCCCCCC;score=10.4571;pval=8.34e-05;Averageconservationscore=NA",
"Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=ZNF449.Me9548.1.YY2017.HT-SE2;seq=CCCCCCCCCC;score=10.4571;pval=8.34e-05;Averageconservationscore=NA",
"Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=ZNF449.Me9548.1.YY2017.HT-SE2;seq=CCCCCCCCCC;score=10.4571;pval=8.34e-05;Averageconservationscore=NA",
"Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=ZNF449.Me9548.1.YY2017.HT-SE2;seq=ACCCCCCCCC;score=10.8429;pval=6.74e-05;Averageconservationscore=NA"
), V5 = c(0L, 0L, 0L, 0L)), row.names = c(NA, 4L), class = "data.frame")
The script I tried:
df$V4 <- strsplit(df$V4, '[,]')
df$V4 <- df$V4[[ ? ]][c(1,2,4,3,5,6,7,8)]
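Some background on the [[ ? ]] placeholder: strsplit() returns a list with one character vector per row, and [[i]] extracts only row i, so no single index can reorder every row at once. A minimal sketch, assuming the fields are separated by ";" (the '[,]' pattern above splits on commas, which do not occur in V4), applies the same index vector to each list element with lapply() and joins the pieces back together with vapply(). The names v4, parts, new_order and reordered are illustrative only.

```r
# Two V4 values from the question: the "before" example and row 4 of the dput() data
v4 <- c(
  "Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=YY1.Ca9487.2.YY2017.HT-SE2;seq=AGCCATCTTGTCTCACGAGTCCA;score=5.57576;pval=8.28e-05;Averageconservationscore=NA",
  "Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=ZNF449.Me9548.1.YY2017.HT-SE2;seq=ACCCCCCCCC;score=10.8429;pval=6.74e-05;Averageconservationscore=NA"
)

# strsplit() gives a list: element i holds the fields of row i
parts <- strsplit(v4, ";", fixed = TRUE)
print(lengths(parts))  # 8 fields per row

# parts[[1]] would reach only the first row; lapply() reorders every row the same way
new_order <- c(1, 2, 4, 3, 5, 6, 7, 8)
reordered <- lapply(parts, `[`, new_order)

# Paste each row's fields back into one ";"-separated string
v4_new <- vapply(reordered, paste, character(1), collapse = ";")
print(v4_new)
```

On a data frame the same two steps would be applied to df$V4 and the result assigned back to df$V4.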
Column 4 before reordering (id after strand):
Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=YY1.Ca9487.2.YY2017.HT-SE2;seq=AGCCATCTTGTCTCACGAGTCCA;score=5.57576;pval=8.28e-05;Averageconservationscore=NA
Column 4 after reordering (id before strand):
Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;id=YY1.Ca9487.2.YY2017.HT-SE2;strand=-;seq=AGCCATCTTGTCTCACGAGTCCA;score=5.57576;pval=8.28e-05;Averageconservationscore=NA
Split the strings on ";" with strsplit, then use sapply to reorder the pieces of each row and paste them back into a single string:
# Split each row on ";", swap field 3 (strand) and field 4 (id), then paste back together
df$V4 <- sapply(strsplit(df$V4, ';', fixed = TRUE), function(x)
  paste0(x[c(1,2,4,3,5,6,7,8)], collapse = ';'))
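The index vector c(1,2,4,3,5,6,7,8) relies on every row keeping its fields at the same positions. If that might not hold, one option (a variant not taken from the answer above) is to look each field up by its key. The helper reorder_fields() and the key_order vector are hypothetical names for this sketch, which assumes every field has the form key=value and each key occurs exactly once per row.

```r
# Hypothetical helper: reorder the "key=value" fields of one string by key name
reorder_fields <- function(s, key_order, sep = ";") {
  fields <- strsplit(s, sep, fixed = TRUE)[[1]]
  # The key is everything before the first "="
  keys <- sub("=.*$", "", fields)
  # Position of each wanted key among the current fields
  idx <- match(key_order, keys)
  if (anyNA(idx)) {
    stop("missing key(s): ", paste(key_order[is.na(idx)], collapse = ", "))
  }
  paste(fields[idx], collapse = sep)
}

# Desired field order: id in front of strand
key_order <- c("Class", "Family", "id", "strand", "seq", "score", "pval",
               "Averageconservationscore")

# The "before" and "after" examples from the question, as two rows
df <- data.frame(
  V4 = c(
    "Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=YY1.Ca9487.2.YY2017.HT-SE2;seq=AGCCATCTTGTCTCACGAGTCCA;score=5.57576;pval=8.28e-05;Averageconservationscore=NA",
    "Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;id=YY1.Ca9487.2.YY2017.HT-SE2;strand=-;seq=AGCCATCTTGTCTCACGAGTCCA;score=5.57576;pval=8.28e-05;Averageconservationscore=NA"
  ),
  stringsAsFactors = FALSE
)

# Apply the helper to every row; both rows end up in the same order
df$V4 <- vapply(df$V4, reorder_fields, character(1),
                key_order = key_order, USE.NAMES = FALSE)
print(df$V4)
```

Unlike a positional swap, the row that already has id before strand is left unchanged. Fields whose keys are missing from key_order are dropped, so the vector should list all eight keys.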
Alternatively, we can use a regular expression in gsub without splitting:
# Groups: 1 = "Class=...;Family=...;", 2 = strand, 3 = id, 4 = the rest; swap 2 and 3
df$V4 <- gsub("^([^;]+;[^;]+;)([^;]+);([^;]+)(.*)", "\\1\\3;\\2\\4", df$V4)
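To check that the regular-expression answer and the strsplit answer agree, a small sketch can run both on the V4 values from the dput() output in the question. Inside an R string literal the backreferences must be written as \\1, \\2 and so on; a lone \1 is parsed as the octal escape for the control character \001, so the captured groups would not be inserted. The variable names row1, row4, by_split and by_regex are illustrative.

```r
# V4 values from the dput() output: rows 1 to 3 are identical, row 4 differs
row1 <- "Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=ZNF449.Me9548.1.YY2017.HT-SE2;seq=CCCCCCCCCC;score=10.4571;pval=8.34e-05;Averageconservationscore=NA"
row4 <- "Class=C2H2.zinc.finger.factors;Family=More.than.3.adjacent.zinc.finger.factors;strand=-;id=ZNF449.Me9548.1.YY2017.HT-SE2;seq=ACCCCCCCCC;score=10.8429;pval=6.74e-05;Averageconservationscore=NA"
df <- data.frame(V1 = "chr1", V2 = 3003641:3003644, V3 = 3003650:3003653,
                 V4 = c(row1, row1, row1, row4), V5 = 0L,
                 stringsAsFactors = FALSE)

# strsplit approach: swap fields 3 and 4 by position
by_split <- sapply(strsplit(df$V4, ";", fixed = TRUE), function(x)
  paste0(x[c(1, 2, 4, 3, 5, 6, 7, 8)], collapse = ";"))

# gsub approach: backreferences written with doubled backslashes
by_regex <- gsub("^([^;]+;[^;]+;)([^;]+);([^;]+)(.*)", "\\1\\3;\\2\\4", df$V4)

# Both approaches should give identical strings for every row
stopifnot(identical(by_split, by_regex))
print(by_regex[4])
```

The regex, like the index vector, assumes strand and id are always the third and fourth fields.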