Remove rows containing matching numeric strings

Question

I have a data frame with 3 columns:

df

A             B               C
round1    test1        testing1
round1    test1        testing2
round1    test1        testing3
round1    test1        testing4
round1    test1        testing5
round2    test2        testing1
round2    test2        testing2
round2    test2        testing3
round2    test2        testing4
round2    test2        testing5
.
.
.
.
.
round100  test30       testing30
round100  test30       testing31

How can I remove the rows in which the numeric parts of the strings in columns B and C match?

Answer 1

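Simply extract the numeric part of each column and compare. Note that backslashes must be doubled inside R string literals.

# Capture the first run of digits after the non-digit prefix
NumB = sub("\\D+(\\d+).*", "\\1", DAT$B)
NumC = sub("\\D+(\\d+).*", "\\1", DAT$C)
DAT = DAT[NumB != NumC,]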

Data

DAT = read.table(text="A       B     C
round1    test1        testing1
round1    test1        testing2
round1    test1        testing3
round1    test1        testing4
round1    test1        testing5
round2    test2        testing1
round2    test2        testing2
round2    test2        testing3
round2    test2        testing4
round2    test2        testing5",
header=TRUE, stringsAsFactors = FALSE)
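
To make the intermediate values visible, here is a minimal self-contained sketch of the same steps on a three-row subset of the data. The name `toy` is hypothetical and not part of the original answer.

```r
# Hypothetical three-row subset of the question's data
toy <- data.frame(
  A = c("round1", "round1", "round2"),
  B = c("test1", "test1", "test2"),
  C = c("testing1", "testing2", "testing2"),
  stringsAsFactors = FALSE
)

# Capture the first run of digits that follows the non-digit prefix
NumB <- sub("\\D+(\\d+).*", "\\1", toy$B)
NumC <- sub("\\D+(\\d+).*", "\\1", toy$C)

# Keep only the rows whose extracted numbers differ
result <- toy[NumB != NumC, ]
print(result)
```

The comparison is done on character strings, so "01" and "1" would count as different. If that matters for your data, wrapping both sides in `as.integer()` is one option.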

Answer 2

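Replace every non-digit ("\\D") with the empty string and compare what is left:

subset(DF, gsub("\\D", "", B) != gsub("\\D", "", C))

giving the following (the input DF is shown in reproducible form in the Note below):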

          A      B         C
2    round1  test1  testing2
3    round1  test1  testing3
4    round1  test1  testing4
5    round1  test1  testing5
6    round2  test2  testing1
8    round2  test2  testing3
9    round2  test2  testing4
10   round2  test2  testing5
12 round100 test30 testing31

Note

The input in reproducible form is:

Lines <- "
A             B               C
round1    test1        testing1
round1    test1        testing2
round1    test1        testing3
round1    test1        testing4
round1    test1        testing5
round2    test2        testing1
round2    test2        testing2
round2    test2        testing3
round2    test2        testing4
round2    test2        testing5
round100  test30       testing30
round100  test30       testing31"
DF <- read.table(text = Lines, header = TRUE)
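
The two answers agree on the data shown, but they are not equivalent in general: `sub` with a capture group keeps only the first run of digits, while `gsub("\\D", "", x)` concatenates every digit in the string. A small sketch of the difference, where the second input value is a made-up edge case that does not occur in the question's data:

```r
# "v2test30" is a hypothetical value with two separate digit runs
x <- c("test30", "v2test30")

# Answer 1 style: first run of digits after a non-digit prefix
first_run <- sub("\\D+(\\d+).*", "\\1", x)

# Answer 2 style: all digits, concatenated
all_digits <- gsub("\\D", "", x)

print(first_run)
print(all_digits)
```

For strings with a single trailing number, as in the question, either approach should give the same rows.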
