Multilevel ordering with dplyr
I have the following data frame:
tdf <- structure(list(GO = c("Cytokine-cytokine receptor interaction",
"Cytokine-cytokine receptor interaction|Endocytosis", "I-kappaB kinase/NF-kappaB signaling",
"NF-kappa B signaling pathway", "NF-kappaB import into nucleus",
"T cell chemotaxis"), PosCount = c(17, 18, 4, 5, 1, 2), shortgo = structure(c(1L,
1L, 2L, 2L, 2L, 3L), .Label = c("z", "X", "y"), class = "factor")), .Names = c("GO",
"PosCount", "shortgo"), row.names = c(NA, 6L), class = "data.frame")
It looks like this:
                                                  GO PosCount shortgo
1             Cytokine-cytokine receptor interaction       17       z
2 Cytokine-cytokine receptor interaction|Endocytosis       18       z
3                I-kappaB kinase/NF-kappaB signaling        4       X
4                       NF-kappa B signaling pathway        5       X
5                      NF-kappaB import into nucleus        1       X
6                                  T cell chemotaxis        2       y
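What I want to do is sort first by shortgo alphabetically (case-insensitively), and then, within each shortgo group, by PosCount in descending order. That should produce this: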
                                                GO PosCount shortgo
                      NF-kappa B signaling pathway        5       X
               I-kappaB kinase/NF-kappaB signaling        4       X
                     NF-kappaB import into nucleus        1       X
                                 T cell chemotaxis        2       y
Cytokine-cytokine receptor interaction|Endocytosis       18       z
            Cytokine-cytokine receptor interaction       17       z
But why doesn't this work:
library(dplyr)
tdf[order(tdf$shortgo),]
tdf <- tdf %>% group_by(shortgo) %>% arrange(desc(PosCount))
What is the right way to do it?
You just need to combine them into a single call, although you first need to convert shortgo to the character class (see the explanation below):
tdf %>%
  arrange(as.character(shortgo), desc(PosCount))
#                                                    GO PosCount shortgo
# 1                       NF-kappa B signaling pathway        5       X
# 2                I-kappaB kinase/NF-kappaB signaling        4       X
# 3                      NF-kappaB import into nucleus        1       X
# 4                                  T cell chemotaxis        2       y
# 5 Cytokine-cytokine receptor interaction|Endocytosis       18       z
# 6             Cytokine-cytokine receptor interaction       17       z
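Since the question asks for a case-insensitive sort, a slightly more defensive variant is to lower-case the key inside arrange(). The sketch below uses a cut-down stand-in for tdf (the one-letter GO labels are placeholders, not the real pathway names) and assumes dplyr is installed. Both keys are listed explicitly rather than relying on group_by(), because arrange() ignores grouping unless .by_group = TRUE is passed.

```r
library(dplyr)

# Cut-down stand-in for the question's data; GO labels are placeholders.
tdf <- data.frame(
  GO = c("a", "b", "c", "d", "e", "f"),
  PosCount = c(17, 18, 4, 5, 1, 2),
  shortgo = factor(c("z", "z", "X", "X", "X", "y"), levels = c("z", "X", "y"))
)

# Convert the factor to its labels and lower-case them, so the sort key
# depends neither on the level order nor on letter case.
sorted <- tdf %>%
  arrange(tolower(as.character(shortgo)), desc(PosCount))

sorted
```

The shortgo column itself is left untouched; only the temporary sort key is lower-cased.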
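The reason you need to convert to character is that shortgo is a factor, which is essentially an integer vector with a levels attribute. order therefore sorts by those integer codes, and in your case the codes do not correspond to the alphabetical order of the levels: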
tdf$shortgo
## [1] z z X X X y
## Levels: z X y
as.numeric(tdf$shortgo)
## [1] 1 1 2 2 2 3
So you can see that z is coded as 1, X as 2 and y as 3, whereas alphabetically they should be 3, 1 and 2 respectively. Hence sort returns the "wrong" result:
sort(tdf$shortgo)
## [1] z z X X X y
## Levels: z X y
Compare:
test <- factor(sort(as.character(tdf$shortgo)))
sort(test)
## [1] X X X y z z
## Levels: X y z
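If the column should stay a factor, another option is to reorder its levels so that the integer codes match the alphabetical order. A minimal sketch, rebuilding only the shortgo column from the question (shortgo_fixed and lv are names introduced here for illustration):

```r
# Same factor as in the question: levels are stored in the order z, X, y.
shortgo <- factor(c("z", "z", "X", "X", "X", "y"), levels = c("z", "X", "y"))

# Reorder the levels case-insensitively; the labels themselves are unchanged.
lv <- levels(shortgo)
shortgo_fixed <- factor(shortgo, levels = lv[order(tolower(lv))])

levels(shortgo_fixed)      # level order is now X, y, z
as.integer(shortgo_fixed)  # integer codes now follow that order
order(shortgo_fixed)       # row order that sorts the column alphabetically
```

With the levels in this order, order() and sort() on the factor agree with the alphabetical result without any conversion to character.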
You can also use order from base R:
with(tdf, tdf[order(tolower(shortgo), -PosCount),])
#                                                   GO PosCount shortgo
#4                       NF-kappa B signaling pathway        5       X
#3                I-kappaB kinase/NF-kappaB signaling        4       X
#5                      NF-kappaB import into nucleus        1       X
#6                                  T cell chemotaxis        2       y
#2 Cytokine-cytokine receptor interaction|Endocytosis       18       z
#1             Cytokine-cytokine receptor interaction       17       z
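Negating with -PosCount works here because PosCount is numeric. For keys that cannot be negated, order() also accepts a per-key decreasing vector when method = "radix" is used. A sketch with a cut-down stand-in for tdf (the GO labels are placeholders, and idx is a name introduced for illustration):

```r
# Cut-down stand-in for the question's data; GO labels are placeholders.
tdf <- data.frame(
  GO = c("a", "b", "c", "d", "e", "f"),
  PosCount = c(17, 18, 4, 5, 1, 2),
  shortgo = factor(c("z", "z", "X", "X", "X", "y"), levels = c("z", "X", "y"))
)

# First key ascending (lower-cased labels), second key descending.
# A vector-valued `decreasing` requires method = "radix".
idx <- order(tolower(as.character(tdf$shortgo)), tdf$PosCount,
             decreasing = c(FALSE, TRUE), method = "radix")

tdf[idx, ]
```

The radix method compares strings in the C locale, which makes no difference here because the key has already been lower-cased.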