Extract *specific* elements of a list column to new columns
In R, I know how to extract the elements of a (named) list column into separate columns, provided they all have the same length:
library(tidyverse)
tib1 <- tibble(x = 1:3, y = list(list(a = 1, b = 2, c = 3),
list(a = 3, b = 4, c = 5),
list(a = 5, b = 6, c = 7)))
tib1
# A tibble: 3 x 2
x y
<int> <list>
1 1 <list [3]>
2 2 <list [3]>
3 3 <list [3]>
bind_cols(tib1[1], bind_rows(tib1$y))
# A tibble: 3 x 4
x a b c
<int> <dbl> <dbl> <dbl>
1 1 1.00 2.00 3.00
2 2 3.00 4.00 5.00
3 3 5.00 6.00 7.00
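For comparison, the same row binding can be sketched in base R alone. This assumes the list column is available as a plain list (called y here, mirroring tib1$y) and that every element has length 1:

```r
# Plain list mirroring tib1$y: each row is a named list of length-1 values
y <- list(list(a = 1, b = 2, c = 3),
          list(a = 3, b = 4, c = 5),
          list(a = 5, b = 6, c = 7))

# Turn each row into a one-row data frame, then stack the rows
wide <- do.call(rbind, lapply(y, as.data.frame))

# Put the id column in front (cbind dispatches to the data.frame method)
wide <- cbind(x = 1:3, wide)
wide
```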
The problem is that this breaks as soon as one of the list elements has a different length (here a):
tib2 <- tibble(x = 1:3, y = list(list(a = 1:2, b = 2, c = 3),
list(a = 3:4, b = 4, c = 5),
list(a = 5:6, b = 6, c = 7)))
bind_cols(tib2[1], bind_rows(tib2$y))
Error in bind_rows_(x, .id) : Argument 2 must be length 2, not 1
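The error occurs because a has length 2 while b and c have length 1. The following base R sketch tabulates the element lengths per row and picks out the names that have length 1 in every row. Like the sketch above, it assumes a plain list y mirroring tib2$y:

```r
y <- list(list(a = 1:2, b = 2, c = 3),
          list(a = 3:4, b = 4, c = 5),
          list(a = 5:6, b = 6, c = 7))

# Matrix of lengths: one row per element name, one column per row of tib2
lens <- sapply(y, lengths)
lens

# Names whose element is length 1 in every row
scalar_names <- rownames(lens)[apply(lens == 1, 1, all)]
scalar_names
```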
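Is there an elegant way to tell R to leave a out of the extraction, to include only b and c, or to include only the elements l that all have the same length? Ideally in a "pipe-ish", "tidyverse-ish" way?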
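If the names to keep are known in advance (here b and c), one base R approach is to extract each name with vapply(). vapply() also raises an error if any row unexpectedly holds something other than a single number. This is a sketch assuming a plain list y mirroring tib2$y:

```r
y <- list(list(a = 1:2, b = 2, c = 3),
          list(a = 3:4, b = 4, c = 5),
          list(a = 5:6, b = 6, c = 7))
x <- 1:3
keep <- c("b", "c")

# For each kept name, pull that element out of every row as a numeric vector
extracted <- lapply(keep, function(nm) vapply(y, function(el) el[[nm]], numeric(1)))
names(extracted) <- keep

result <- data.frame(x = x, extracted)
result$y <- y  # keep the original list column so a stays accessible
str(result)
```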
The expected result should keep a in some form, or simply keep the whole y field, so that I can still access it later:
tibble(x = 1:3, y = list(list(a = 1:2, b = 2, c = 3),
list(a = 3:4, b = 4, c = 5),
list(a = 5:6, b = 6, c = 7)),
b = c(2, 4, 6),
c = c(3, 5, 7))
# A tibble: 3 x 4
x y b c
<int> <list> <dbl> <dbl>
1 1 <list [3]> 2.00 3.00
2 2 <list [3]> 4.00 5.00
3 3 <list [3]> 6.00 7.00
Or, preferably, as a new list column:
tibble(x = 1:3,
a = list(1:2, 3:4, 5:6),
b = c(2, 4, 6),
c = c(3, 5, 7))
# A tibble: 3 x 4
x a b c
<int> <list> <dbl> <dbl>
1 1 <int [2]> 2.00 3.00
2 2 <int [2]> 4.00 5.00
3 3 <int [2]> 6.00 7.00
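One tidyverse option is to map over the list column 'y' and convert each element to a tibble, then unnest to expand the rows, and finally group by the other columns and summarise column 'a' back into a list: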
library(tidyverse)
tib2 %>%
  mutate(y = map(y, as_tibble)) %>%             # each element becomes a 2-row tibble; b and c are recycled
  unnest(y) %>%                                 # one row per value of a
  group_by(x, b, c) %>%
  summarise(a = list(a), .groups = "drop") %>%  # collapse a back into a list column
  select(x, a, b, c)
# A tibble: 3 x 4
# x a b c
# <int> <list> <dbl> <dbl>
#1 1 <int [2]> 2.00 3.00
#2 2 <int [2]> 4.00 5.00
#3 3 <int [2]> 6.00 7.00
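The unnest-then-summarise round trip can also be sketched in base R, again assuming a plain list y. Each row is expanded into a small data frame, where data.frame() recycles the length-1 values the way as_tibble() does. The rows are then stacked, and a is split back into one list entry per x. Note that this sketch hard-codes the names a, b and c:

```r
y <- list(list(a = 1:2, b = 2, c = 3),
          list(a = 3:4, b = 4, c = 5),
          list(a = 5:6, b = 6, c = 7))
x <- 1:3

# Expand: one data frame per row, b and c recycled to the length of a
long <- do.call(rbind, Map(function(xi, el) {
  data.frame(x = xi, a = el$a, b = el$b, c = el$c)
}, x, y))

# Collapse: keep one row per x and gather a back into a list column
out <- long[!duplicated(long$x), c("x", "b", "c")]
out$a <- unname(split(long$a, long$x)[as.character(out$x)])
rownames(out) <- NULL
str(out)
```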
There is also a base R solution:
dd <- data.frame(x = tib2$x, t(do.call(cbind, tib2$y)))
This gives:
x a b c
1 1 1, 2 2 3
2 2 3, 4 4 5
3 3 5, 6 6 7
Checking the structure, we see that all three columns a, b and c are lists.
str(dd)
'data.frame': 3 obs. of 4 variables:
$ x: int 1 2 3
$ a:List of 3
..$ : int 1 2
..$ : int 3 4
..$ : int 5 6
$ b:List of 3
..$ : num 2
..$ : num 4
..$ : num 6
$ c:List of 3
..$ : num 3
..$ : num 5
..$ : num 7
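To see at a glance which columns of dd came out as lists, you can apply is.list() to every column. This is a short sketch that rebuilds dd from a plain list y mirroring tib2$y, using the same data.frame() call as above:

```r
y <- list(list(a = 1:2, b = 2, c = 3),
          list(a = 3:4, b = 4, c = 5),
          list(a = 5:6, b = 6, c = 7))
dd <- data.frame(x = 1:3, t(do.call(cbind, y)))

# TRUE for every column that is stored as a list
sapply(dd, is.list)
```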
If you want to unlist b and c, simply do:
dd[-c(1, 2)] <- lapply(dd[-c(1, 2)], unlist)
which gives the structure:
str(dd)
'data.frame': 3 obs. of 4 variables:
$ x: int 1 2 3
$ a:List of 3
..$ : int 1 2
..$ : int 3 4
..$ : int 5 6
$ b: num 2 4 6
$ c: num 3 5 7
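Rather than addressing the columns by position (-c(1, 2)), one option is to choose the columns to unlist by their content: every list column whose elements all have length 1. The sketch below assumes dd is built as above and rebuilds it so the block runs on its own:

```r
y <- list(list(a = 1:2, b = 2, c = 3),
          list(a = 3:4, b = 4, c = 5),
          list(a = 5:6, b = 6, c = 7))
dd <- data.frame(x = 1:3, t(do.call(cbind, y)))

# List columns in which every element has length 1 can be flattened safely
flatten <- vapply(dd, function(col) is.list(col) && all(lengths(col) == 1), logical(1))
dd[flatten] <- lapply(dd[flatten], unlist)
str(dd)
```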
Here is another possible approach:
transpose(tib2$y) %>%
lapply(., function(x) if(all(lengths(x) == 1)) unlist(x, use.names = FALSE) else x) %>%
bind_cols(., tib2[1])
# # A tibble: 3 x 4
# a b c x
# <list> <dbl> <dbl> <int>
# 1 <int [2]> 2. 3. 1
# 2 <int [2]> 4. 5. 2
# 3 <int [2]> 6. 7. 3
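purrr's transpose() turns the list of rows inside out into a list of columns. Without purrr, a rough base R equivalent is sketched below. It assumes a plain list y and that every row has the same names as the first:

```r
y <- list(list(a = 1:2, b = 2, c = 3),
          list(a = 3:4, b = 4, c = 5),
          list(a = 5:6, b = 6, c = 7))
nms <- names(y[[1]])

# Transpose: for each name, collect that element from every row
cols <- lapply(setNames(nms, nms), function(nm) lapply(y, `[[`, nm))

# Simplify columns whose entries are all length 1; keep the others as lists
cols <- lapply(cols, function(col) {
  if (all(lengths(col) == 1)) unlist(col, use.names = FALSE) else col
})
str(cols)
```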
坚持 "tidyverse",我想方法是:
transpose(tib2$y) %>%
map_if(~ all(lengths(.) == 1), unlist) %>%
bind_cols(., tib2[1])
# # A tibble: 3 x 4
# a b c x
# <list> <dbl> <dbl> <int>
# 1 <int [2]> 2. 3. 1
# 2 <int [2]> 4. 5. 2
# 3 <int [2]> 6. 7. 3
Another tidyverse option:
library(tidyverse)
tib2 %>%
mutate(a = map(y, ~ .x[lengths(.x) > 1])) %>%
bind_cols(., map_dfr(.$y, ~ .x[lengths(.x) == 1])) %>%
select(-y)
gives:
# A tibble: 3 x 4
x a b c
<int> <list> <dbl> <dbl>
1 1 <list [1]> 2.00 3.00
2 2 <list [1]> 4.00 5.00
3 3 <list [1]> 6.00 7.00
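The same row-wise split by length can be sketched in base R with Filter(), assuming a plain list y. As in the purrr version, the long part stays wrapped in a one-element named list, which is why it prints as <list [1]> above. Extracting it with [[ unwraps it. The sketch hard-codes the name a for that step:

```r
y <- list(list(a = 1:2, b = 2, c = 3),
          list(a = 3:4, b = 4, c = 5),
          list(a = 5:6, b = 6, c = 7))

# Per row, keep only the length-1 elements and stack them as columns
scalars <- lapply(y, function(el) Filter(function(v) length(v) == 1, el))
scalar_df <- do.call(rbind, lapply(scalars, as.data.frame))

# Per row, keep the longer elements (still wrapped in a named list)
longs <- lapply(y, function(el) Filter(function(v) length(v) > 1, el))

# Unwrap with [[ so the list column holds the vectors themselves
scalar_df$a <- lapply(longs, `[[`, "a")
str(scalar_df)
```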
Another tidyverse solution:
# positions of the length-1 and the longer elements, taken from the first row
short_idx <- which(lengths(tib2$y[[1]]) == 1)
long_idx <- setdiff(seq_along(tib2$y[[1]]), short_idx)
tib3 <- tib2 %>%
  mutate(long = map(y, ~ .[long_idx])) %>%
  mutate(short = map(y, ~ .[short_idx]))
bind_cols(tib2, tib3["long"], bind_rows(tib3$short))
# A tibble: 3 x 5
# x y long b c
# <int> <list> <list> <dbl> <dbl>
# 1 1 <list [3]> <list [1]> 2 3
# 2 2 <list [3]> <list [1]> 4 5
# 3 3 <list [3]> <list [1]> 6 7
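This approach takes the positions of the short and long elements from the first row only. The base R sketch below, which assumes a plain list y, checks that every row has the same names and lengths before those positions are used:

```r
y <- list(list(a = 1:2, b = 2, c = 3),
          list(a = 3:4, b = 4, c = 5),
          list(a = 5:6, b = 6, c = 7))
ref <- lengths(y[[1]])

# TRUE only if every row has the same element names and lengths as row 1
consistent <- all(vapply(y, function(el) identical(lengths(el), ref), logical(1)))
if (!consistent) stop("rows differ in structure; positions from row 1 are not safe")

short_idx <- which(ref == 1)
long_idx <- which(ref != 1)
short_idx
```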