Make R ignore the order in which values appear in a column (created by pasting multiple columns)
Given three columns of a variable x that can take the values A, B, C, D:
df1<-
rbind(c("A","B","C"),c("A","D","C"),c("B","A","C"),c("A","C","B"), c("B","C","A"), c("D","A","B"), c("A","B","D"), c("A","D","C"), c("A",NA,NA),c("D","A",NA),c("A","D",NA))
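How can I create a column that indicates the combination found in the three preceding columns, so that the permutations (ABC, ACB, BAC) are treated as the same combination ABC, and (AD, DA) are treated as the same combination AD?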
Pasting the three columns with apply(df1, 1, function(x) paste(x[!is.na(x)], collapse = ", ")) -> df1$x4 and then using df1 %>% group_by(x4) %>% summarize(c = n()) counts AD and DA as different rather than the same.
Edit: updated the title. The result I want is
a <- cbind(c("ABC", 4), c("ACD", 2), c("ABD", 2), c("A", 1), c("AD", 2))
Someone has already solved my problem. Thanks.
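You can apply the function paste to each row after sorting that row's vector. Because sort() drops NA values by default, rows with missing entries need no special handling.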
df1 <-
cbind(df1, apply(df1, 1, function(x) paste(sort(x), collapse = "")))
df1
# [,1] [,2] [,3] [,4]
# [1,] "A" "B" "C" "ABC"
# [2,] "A" "D" "C" "ACD"
# [3,] "B" "A" "C" "ABC"
# [4,] "A" "C" "B" "ABC"
# [5,] "B" "C" "A" "ABC"
# [6,] "D" "A" "B" "ABD"
# [7,] "A" "B" "D" "ABD"
# [8,] "A" "D" "C" "ACD"
# [9,] "A" NA NA "A"
#[10,] "D" "A" NA "AD"
#[11,] "A" "D" NA "AD"
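To see why sorting first makes the order irrelevant, here is a minimal sketch that wraps the same sort-then-paste step in a small helper. The name combo_key is hypothetical and is not part of the original answer; it is only used here for illustration.

```r
# combo_key is a hypothetical helper name, introduced only for illustration.
# It sorts the values of one row and pastes them into a single key.
combo_key <- function(x) paste(sort(x), collapse = "")

# Permutations of the same values yield the same key.
k1 <- combo_key(c("B", "A", "C"))
k2 <- combo_key(c("A", "C", "B"))
identical(k1, k2)

# sort() removes NA by default (na.last = NA), so missing entries vanish.
k3 <- combo_key(c("D", "A", NA))
k4 <- combo_key(c("A", NA, NA))
```

Since the key no longer depends on the order within a row, any later grouping or counting sees DA and AD as one value.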
You can now simply table the new column, with no need to load external packages or build more complicated pipes.
table(df1[, 4])
#   A ABC ABD ACD  AD
#   1   4   2   2   2
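The question assigns to df1$x4, which only works on a data frame, while rbind() here builds a character matrix. The following is a sketch of the same idea on a data frame, assuming the matrix is converted with as.data.frame(); the names df, x4, combo and counts are illustrative choices, not something the original posts prescribe.

```r
# Rebuild the example data as a character matrix, as in the question.
df1 <- rbind(c("A","B","C"), c("A","D","C"), c("B","A","C"), c("A","C","B"),
             c("B","C","A"), c("D","A","B"), c("A","B","D"), c("A","D","C"),
             c("A",NA,NA), c("D","A",NA), c("A","D",NA))

# Assumption: work on a data frame so that a named column can be added.
df <- as.data.frame(df1, stringsAsFactors = FALSE)

# Sort each row (dropping NA), then paste the values into one key.
df$x4 <- apply(df[, 1:3], 1, function(x) paste(sort(x), collapse = ""))

# Count each combination and return the counts as a data frame.
counts <- as.data.frame(table(combo = df$x4), stringsAsFactors = FALSE)
counts
```

The counts data frame has one row per combination, with the key in combo and the number of rows in Freq, which is close to the layout the question asks for.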