Save a data frame with list-columns as csv file
I have the following data frame, which looks like this (three of its columns are lists).
A tibble: 14 x 4
clinic_name drop_in_hours appointment_hours services
<chr> <list> <list> <list>
1 Birth Control and Sexual Health Centre <list [1]> <list [1]> <list [1]>
2 Black Creek Community Health Centre (Sheridan Mall Site) <list [1]> <list [1]> <list [1]>
3 Black Creek Community Health Centre (Yorkgate mall Site) <list [1]> <list [1]> <list [1]>
4 Crossways Clinic <list [1]> <list [1]> <list [1]>
5 Hassle Free Clinic <list [1]> <list [1]> <list [1]>
6 Immigrant Women's Health Center <list [1]> <list [1]> <list [1]>
7 Rexdale Community Health Center <list [1]> <list [1]> <list [1]>
8 Rexdale Youth Resource Center <list [1]> <list [1]> <list [1]>
9 Scarborough Sexual Health Clinic <list [1]> <list [1]> <list [1]>
10 Special Treatment Clinic <list [1]> <list [1]> <list [1]>
11 Taibu Community Health Center <list [1]> <list [1]> <list [1]>
12 The Gate <list [1]> <list [1]> <list [1]>
13 The Jane Street Clinic <list [1]> <list [1]> <list [1]>
14 The Talk Shop <list [1]> <list [1]> <list [1]>
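I want to output this as a csv file. It has come to my attention that the columns of a data frame written to csv should not be lists in R. So I googled a bit and found "save data.frames with list-column", so I tried it out: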
library(tidyverse)
df %>%
mutate(drop_in_hours = map_chr(drop_in_hours, ~ capture.output(dput(.))),
appointment_hours = map_chr(appointment_hours, ~ capture.output(dput(.))),
services = map_chr(services, ~ capture.output(dput(.))) ) %>%
write_csv("health.csv")
But I get an error. Am I missing something?
Error in mutate_impl(.data, dots) :
Evaluation error: Result 4 is not a length 1 atomic vector.
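One likely cause of this error (an assumption, since the clinic data itself is not shown): capture.output(dput(.)) returns one string per printed line, and a long element deparses onto several lines, so map_chr() no longer receives a length-1 result. A minimal base-R sketch with made-up hours data:

```r
# Made-up element, long enough that dput() wraps it over several lines
long_item <- list(paste("Monday:", 1:30, "pm"))

# capture.output() returns one string per printed line
lines <- capture.output(dput(long_item))
length(lines) > 1   # TRUE: this is what map_chr() rejects

# Collapsing the lines gives the single string that map_chr() expects
one_string <- paste(lines, collapse = " ")

# The collapsed string still parses back to the original object
restored <- eval(parse(text = one_string))
identical(restored, long_item)
```

If that is indeed the cause, wrapping the call as paste(capture.output(dput(.)), collapse = " ") inside each map_chr() should avoid the error.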
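Is there a specific reason you want to save the columns as lists? Alternatively, you can use unnest and save the result to csv. Example below:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
df_list <- tibble(abc = letters[1:3], lst = list(1:3, 1:3, 1:3))
# name the column to unnest; calling unnest() with no arguments is deprecated
df_list %>% unnest(lst) %>% write.csv("list.csv")
Also, when you read the file back in, you can nest it to get the list-column back:
# drop the row-name column that write.csv() added
df <- read.csv("list.csv")[ ,2:3]
# current tidyr syntax; the nested column is named data
df %>% nest(data = lst)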
exploratory::list_to_text() will convert a list column to a character column. The default is sep = ", "; if you are writing to a .csv, I recommend changing it to something else.
devtools::install_github("exploratory-io/exploratory_func")
list_to_text <- function(column, sep = ", "){
  loadNamespace("stringr")
  ret <- sapply(column, function(x) {
    ret <- stringr::str_c(x, collapse = sep)
    if(identical(ret, character(0))){
      # if it's character(0)
      NA
    } else {
      ret
    }
  })
  as.character(ret)
}
https://github.com/exploratory-io/exploratory_func/blob/master/LICENSE.md
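To illustrate the point about the separator, here is a minimal base-R sketch. collapse_list_column() is a hypothetical stand-in written for this example, not the package's own code, and the services values are made up. With sep = "; ", a value that itself contains a comma stays distinguishable from the separator:

```r
# Hypothetical base-R stand-in for list_to_text(); not the package's code
collapse_list_column <- function(column, sep = "; ") {
  vapply(column, function(x) {
    # empty elements become NA instead of an empty string
    if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
  }, character(1))
}

# Made-up list column; the first value contains a comma of its own
services <- list(
  c("testing, anonymous", "counselling"),
  character(0),
  "birth control"
)

out <- collapse_list_column(services)
out
```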
Create a tibble containing list columns:
library(tibble)
clinic_name <- c('bobo center', 'yoyo plaza', 'lolo market')
drop_in_hours <- list(c("Monday: 2 pm - 5 pm", "Tuesday: 4 pm - 7 pm"))
appointment_hours <- list(c("Monday: 1 pm - 2 pm", "Tuesday: 2 pm - 3 pm"))
services <- list(c("skin graft", "chicken heart replacement"))
# tibble() replaces the deprecated data_frame(); the length-1 lists are recycled to 3 rows
tibb <- tibble(clinic_name, drop_in_hours, appointment_hours, services)
print(tibb)
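Write a general-purpose function that converts any list column to character type:
set_lists_to_chars <- function(x) {
  if (is.list(x)) {
    # collapse each row's element into its own string
    # (x[1] alone would reuse the first row's values for every row)
    y <- vapply(x, function(el) paste(unlist(el), collapse = ', '), character(1))
  } else {
    y <- x
  }
  return(y)
}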
Apply the function to the tibble with list columns:
new_frame <- data.frame(lapply(tibb, set_lists_to_chars), stringsAsFactors = FALSE)
new_frame
Write the newly formatted data frame to a csv file:
write.csv(new_frame, file='Desktop/clinics.csv')
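The result is a csv file in which the list columns are expanded into regular strings.
Here is an all-encompassing function. Just pass in your tibble and a file name: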
tibble_with_lists_to_csv <- function(tibble_object, file_path_name) {
  set_lists_to_chars <- function(x) {
    if (is.list(x)) {
      # collapse each row's element into its own string
      y <- vapply(x, function(el) paste(unlist(el), collapse = ', '), character(1))
    } else {
      y <- x
    }
    return(y)
  }
  new_frame <- data.frame(lapply(tibble_object, set_lists_to_chars), stringsAsFactors = FALSE)
  write.csv(new_frame, file = file_path_name)
}
Usage:
tibble_with_lists_to_csv(tibb, '~/Desktop/tibb.csv')
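I had a similar data frame with list columns that I wanted to save as csv. I figured out this method, along with how to turn the column back into a list.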
library(tidyverse)
# create a df with a list column
df <- tibble(x=rep(1:5,each=2), y=LETTERS[1:10]) %>%
group_by(x) %>%
summarise(z=list(y))
# this throws an error
write_csv(df, "test.csv")
# convert the list column to a string
df2 <- df %>%
group_by(x) %>% # where x==unique(x)
mutate(z=paste(z))
# this works
write_csv(df2, "test.csv")
# read the csv
df3 <- read_csv("test.csv")
# reconstruct original df by parsing the strings
#
df4 <- df3 %>%
group_by(x) %>%
mutate(z=list(eval(parse(text=z))))
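One caveat with the approach above: eval(parse(text = z)) runs whatever R code the csv happens to contain. A different technique, sketched here as an alternative and not as the method used in the answer, is to store each element as a JSON array with the jsonlite package. The data and the names z_json and z_back are made up for illustration:

```r
library(jsonlite)

# Made-up list column, shaped like z in the example above
z <- list(c("A", "B"), c("C", "D"), "E")

# Serialise each element to one JSON array string per row
z_json <- vapply(z, function(v) as.character(toJSON(v)), character(1))

# Parse the strings back into vectors; no R code is evaluated here
z_back <- lapply(z_json, fromJSON)

identical(z_back, z)
```

The z_json strings can be written with write_csv() like any other character column.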
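Here is another option that may be a bit simpler.
Depending on the data, comma-separated values can get complicated, so I use a vertical bar | to separate the values in the list columns: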
library(tidyverse)
starwars %>%
rowwise() %>%
mutate_if(is.list, ~paste(unlist(.), collapse = '|')) %>%
write.csv('df_starwars.csv', row.names = FALSE)
starwars is one of the dplyr example data frames.
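Reading such a file back and splitting on the bar restores the list column. A minimal base-R sketch of the round trip, using a small made-up data frame and a temporary file in place of starwars, and assuming no value contains | itself and no element is empty:

```r
# Small made-up data frame with one list column
df <- data.frame(name = c("Luke", "Leia"), stringsAsFactors = FALSE)
df$films <- list(c("A New Hope", "Return of the Jedi"), "A New Hope")

# Flatten the list column with | as the separator
flat <- df
flat$films <- vapply(df$films, paste, character(1), collapse = "|")

# Write to a temporary file and read it back
path <- tempfile(fileext = ".csv")
write.csv(flat, path, row.names = FALSE)
back <- read.csv(path, stringsAsFactors = FALSE)

# fixed = TRUE treats | literally instead of as regex alternation
back$films <- strsplit(back$films, "|", fixed = TRUE)

identical(back$films, df$films)
```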