How to arrange nested data (i.e., data with parent-child links) in R?
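I have a multi-level dataset:
- Category (e.g., "Countries")
- Country (e.g., "USA")
- City (e.g., "New York")
- County (e.g., "Manhattan")
- Place (e.g., "Times Square")
Every row (except the level 1 entry) is linked to a parent one level up.
For example: Times Square -> Manhattan -> New York -> USA -> Countries
My question: how can I sort this dataset: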
df2 <- structure(list(ID = c(3,6,9,11,12,19,41,50,77,83,105),
Parent = c(12,12,77,105,19,NA,3,41,19,77,19),
Level = c(3,3,3,3,2,1,4,5,2,3,2),
Name = c("New York","Boston","Oxford","Vancouver","USA","Countries",
"Manhattan","Times Square","UK","London","Canada")),
class = "data.frame",
row.names = c(NA, -11L))
into this:
df2 <- structure(list(ID = c(19,12,3,41,50,6,77,83,9,105,11),
Parent = c(NA,19,12,3,41,12,19,77,77,19,105),
Level = c(1,2,3,4,5,3,2,3,3,2,3),
Name = c("Countries","USA","New York","Manhattan","Times Square",
"Boston","UK","London","Oxford","Canada","Vancouver")),
class = "data.frame",
row.names = c(NA, -11L))
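In the second df2, the list is ordered by level first, but every linked child level sits directly below its parent.
I tried several dplyr::arrange() variants (e.g., arrange(Level, Parent)), but none of them accounts for the nesting. I think the solution might be a combination of group_by() and arrange( , .by_group = TRUE), as done here (R, dplyr - combination of group_by() and arrange() does not produce expected result?). Unfortunately I could not work it out myself.
Can anyone help? A tidyverse/dplyr solution would be preferred :-)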
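Here is a solution using igraph::dfs
library(igraph)
library(dplyr)  # needed for left_join()
# Build a directed Parent -> ID graph; na.omit() drops the root row, whose Parent is NA
g <- with(na.omit(df2), graph.data.frame(cbind(Parent, ID), directed = TRUE))
# Visit the vertices depth-first from the root (ID 19), then join the other columns back on
data.frame(ID = as.integer(names(dfs(g, root = "19")$order))) |>
left_join(df2)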
##> + Joining, by = "ID"
##> ID Parent Level Name
##> 1 19 NA 1 Countries
##> 2 12 19 2 USA
##> 3 3 12 3 New York
##> 4 41 3 4 Manhattan
##> 5 50 41 5 Times Square
##> 6 6 12 3 Boston
##> 7 77 19 2 UK
##> 8 9 77 3 Oxford
##> 9 83 77 3 London
##> 10 105 19 2 Canada
##> 11 11 105 3 Vancouver
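Since a tidyverse/dplyr solution was preferred, here is a minimal sketch that avoids igraph. It assumes the data form a single tree with no cycles and that every non-NA Parent also appears in ID. The helper path_key() is a hypothetical name, not part of dplyr: it walks from a row up to the root and returns the chain of IDs as one zero-padded string, so a plain arrange() on that string places every child directly below its parent. Siblings come out in ascending ID order, which is an assumption because the question does not specify their order.

```r
library(dplyr)

df2 <- data.frame(
  ID     = c(3, 6, 9, 11, 12, 19, 41, 50, 77, 83, 105),
  Parent = c(12, 12, 77, 105, 19, NA, 3, 41, 19, 77, 19),
  Level  = c(3, 3, 3, 3, 2, 1, 4, 5, 2, 3, 2),
  Name   = c("New York", "Boston", "Oxford", "Vancouver", "USA", "Countries",
             "Manhattan", "Times Square", "UK", "London", "Canada")
)

# Hypothetical helper: the chain of IDs from the root down to `id`.
# Each ID is zero-padded to a fixed width so that sorting the strings
# gives the same order as sorting the numbers.
path_key <- function(id, ids, parents) {
  chain <- c()
  while (!is.na(id)) {
    chain <- c(id, chain)              # prepend, so the root ends up first
    id <- parents[match(id, ids)]      # step up to the parent (NA at the root)
  }
  paste(sprintf("%010d", as.integer(chain)), collapse = "/")
}

sorted <- df2 |>
  mutate(key = vapply(ID, path_key, character(1), ids = ID, parents = Parent)) |>
  arrange(key) |>
  select(-key)

sorted
```

A parent's key is always a prefix of its children's keys, which is why the parent sorts immediately before its whole subtree. Walking up the tree once per row is fine for small tables like this one; for very deep or very large hierarchies the graph-based approach above is likely the better fit.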