In nested data (a tibble with a nested named list), how can I mutate a new column at a certain depth level based on the existence of nested values?
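When my data contains a nested list, I want to mutate a new column at a particular level, conditional on values that exist at a deeper nesting level. Specifically, the nested object is a named list, and I want to check the names of that list to determine whether a piece of information is present in the data.
Edit: I replaced the data in the example
Although @Ronak's answer did answer my question for the data I originally gave, I realized I had made a mistake: the toy data example did not correctly reflect my data structure.
Below is data that correctly reflects my situation.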
library(tibble)
library(tidyr)  # provides unnest_wider() and re-exports %>%
df_correct <-
structure(
list(
var_name = c("age", "classes"),
title = c("What is your age?",
"what classes have you taken?"),
class_descriptions = list(
NULL,
list(
History = "History of Art",
Chemistry = "Organic Chemistry",
other = "Other Classes"
)
)
),
row.names = c(NA,-2L),
class = c("tbl_df",
"tbl", "data.frame")
)
## # A tibble: 2 x 3
## var_name title class_descriptions
## <chr> <chr> <list>
## 1 age What is your age? <NULL>
## 2 classes what classes have you taken? <named list [3]> <--- this list is what I need to check against
df_correct %>%
unnest_wider(class_descriptions)
## # A tibble: 2 x 5
## based on whether
## "History" exists
## ↓
## var_name title History Chemistry other
## <chr> <chr> <chr> <chr> <chr>
## 1 age What is your age? NA NA NA
## 2 classes what classes have you taken? History of Art Organic Chemistry Other Classes
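So given df_correct, and not necessarily using unnest_wider (which is only there to show the structure of the nested data), how can I mutate a new column in df_correct based on whether "History" appears in class_descriptions?
Desired output (updated)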
# A tibble: 2 x 4
var_name title class_descriptions has_taken_history
<chr> <chr> <list> <lgl>
1 age What is your age? <NULL> NA
2 classes what classes have you taken? <named list [3]> TRUE
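More about the desired solution (rather than the output)
My hope in asking this question is to find a way to add another column to df_correct that indicates whether a string exists among the names of the named list class_descriptions. In other words, I am looking for a solution that takes only 2 inputs:
- what to search for (in this example, the string "History")
- where to search (in this example, the names of the named list class_descriptions nested in df_correct).
If the string is found, fill the new column in df_correct with TRUE; otherwise, fill it with FALSE.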
Edit 2
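library(tidyverse)
# For each element of the list column, check whether "History" is among its names
df_correct %>%
  mutate(has_taken_history = map_lgl(class_descriptions,
                                     ~'History' %in% names(.x)))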
# var_name title class_descriptions has_taken_history
# <chr> <chr> <list> <lgl>
#1 age What is your age? <NULL> FALSE
#2 classes what classes have you taken? <named list [3]> TRUE
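Note that the code above returns FALSE for the row whose class_descriptions is NULL, while the desired output in the question shows NA there. If NA is what you want for missing lists, one possible sketch is the following. The helper name has_nested_name is hypothetical (it is not part of dplyr or purrr), and df_correct is rebuilt here with tibble() only so the snippet runs on its own.

```r
library(dplyr)
library(purrr)

# Same structure as df_correct in the question, rebuilt so this runs standalone
df_correct <- tibble::tibble(
  var_name = c("age", "classes"),
  title = c("What is your age?", "what classes have you taken?"),
  class_descriptions = list(
    NULL,
    list(
      History = "History of Art",
      Chemistry = "Organic Chemistry",
      other = "Other Classes"
    )
  )
)

# Hypothetical helper taking the two inputs from the question:
#   x    - the list column to search in
#   name - the string to look for among each element's names
# Returns NA where the element is NULL, otherwise TRUE/FALSE.
has_nested_name <- function(x, name) {
  map_lgl(x, function(el) {
    if (is.null(el)) NA else name %in% names(el)
  })
}

result <- df_correct %>%
  mutate(has_taken_history = has_nested_name(class_descriptions, "History"))
```

Here result$has_taken_history is NA for the age row and TRUE for the classes row. If you prefer FALSE for missing lists, as in the description of the desired solution, the plain map_lgl() call above is enough.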
Edit 1
For the edited data, you can do:
library(tidyverse)
df_correct %>%
mutate(class_descriptions1 = class_descriptions) %>%
unnest_wider(class_descriptions) %>%
mutate(across(History:other, ~ifelse(is.na(.), NA, TRUE))) %>%
dplyr::select(var_name, title, class_descriptions = class_descriptions1, has_taken_history = History)
# var_name title class_descriptions has_taken_history
# <chr> <chr> <list> <lgl>
#1 age What is your age? <NULL> NA
#2 classes what classes have you taken? <named list [3]> TRUE
You can keep only the subjects you need in the output.
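If you only need flags for a few subjects, a rough alternative is to skip the unnesting and check the names directly for each subject. This is only a sketch: the subjects vector and the has_taken_ column prefix are assumptions chosen for illustration, and df_correct is rebuilt so the snippet is self-contained.

```r
library(purrr)

# Same structure as df_correct in the question, rebuilt so this runs standalone
df_correct <- tibble::tibble(
  var_name = c("age", "classes"),
  title = c("What is your age?", "what classes have you taken?"),
  class_descriptions = list(
    NULL,
    list(
      History = "History of Art",
      Chemistry = "Organic Chemistry",
      other = "Other Classes"
    )
  )
)

# Assumed list of subjects to flag; adjust to the names you care about
subjects <- c("History", "Chemistry")

flagged <- df_correct
for (s in subjects) {
  # One logical column per subject: is `s` among the names of each list element?
  flagged[[paste0("has_taken_", tolower(s))]] <-
    map_lgl(df_correct$class_descriptions, function(el) s %in% names(el))
}
```

This version gives FALSE (not NA) for rows where class_descriptions is NULL, because names(NULL) is NULL and nothing matches it.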
Original answer
You can use map_lgl to return a logical vector:
df %>%
unnest_wider(info) %>%
mutate(has_taken_history = map_lgl(classes_taken, ~"History" %in% .x),
has_taken_chemistry = map_lgl(classes_taken, ~"Chemistry" %in% .x))
# student_name location year_born classes_taken has_taken_history has_taken_chemistry
# <chr> <chr> <dbl> <list> <lgl> <lgl>
#1 John San Francisco 2000 <chr [4]> TRUE FALSE
#2 Sarah Miami 2002 <chr [4]> TRUE TRUE
A more general solution for all subjects is to unnest the subjects and get the data in wide format.
df %>%
unnest_wider(info) %>%
unnest(classes_taken) %>%
mutate(value = TRUE) %>%
pivot_wider(names_from = classes_taken, values_from = value, values_fill = FALSE)
# student_name location year_born Astronomy Cosmology History Robotics Chemistry Biology Zoology
# <chr> <chr> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#1 John San Francisco 2000 TRUE TRUE TRUE TRUE FALSE FALSE FALSE
#2 Sarah Miami 2002 FALSE FALSE TRUE FALSE TRUE TRUE TRUE