Nest function over multiple dataframes
I have several data frames of company-advisor relationships, one for each year of interest.
For example, the data frame for 2015 looks like this. Let's call it advisors2015 (I also have advisors2014, advisors2013, advisors2012, and so on):
> advisors2015
[,1] [,2] [,3] [,4]
colnam "Mark" "Company.name" "Company.ID" "Advisor.Name"
row1 "1" "VOLKSWAGEN AG" "DE2070000543" "PRICEWATERHOUSECOOPERS"
row2 " " "VOLKSWAGEN AG" "DE2070000543" "PWC DEUTSCHE REVISION"
row3 " " "VOLKSWAGEN AG" "DE2070000543" "C&L TREUARBEIT REVISION"
row4 "2" "ROYAL DUTCH SHELL PLC" "GB04366849" "LLOYDS TSB REGISTRARS"
row5 " " "ROYAL DUTCH SHELL PLC" "GB04366849" "PRICEWATERHOUSECOOPERS"
row6 " " "ROYAL DUTCH SHELL PLC" "GB04366849" "KPMG ACCOUNTANTS NV"
row7 " " "ROYAL DUTCH SHELL PLC" "GB04366849" "ERNST & YOUNG"
row8 "3" "BP PLC" "GB00102498" "CAPITA ASSET SERVICES"
And this is for 2014:
> advisors2014
[,1] [,2] [,3] [,4]
colnam "Mark" "Company.name" "Company.ID" "Advisor.Name"
row1 "1" "VOLKSWAGEN AG" "DE2070000543" "PRICEWATERHOUSECOOPERS"
row2 " " "VOLKSWAGEN AG" "DE2070000543" "PWC DEUTSCHE REVISION"
row3 " " "VOLKSWAGEN AG" "DE2070000543" "C&L TREUARBEIT REVISION"
row4 "2" "ROYAL DUTCH SHELL PLC" "GB04366849" "LLOYDS TSB REGISTRARS"
row5 " " "ROYAL DUTCH SHELL PLC" "GB04366849" "PRICEWATERHOUSECOOPERS"
row6 "3" "BP PLC" "GB00102498" "CAPITA ASSET SERVICES"
row7 "4" "COCACOLA" "GB111222333" " "
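As you can see, each company may have one or more advisors. Of course, they can also change over time: this year (meaning in this data frame) Volkswagen has 3 advisors, but next year it might have only one, or replace some of them with others.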
To keep track of all these changes, I would like a single data frame that stores the list of advisors for each company/year observation.
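I know the nest function can do this, but as I understand it, nest builds lists from columns within the same data frame, whereas I have several data frames (say 10) like the ones above.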
Can anyone help me with this? Thanks a lot.
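As background to the question: nest() does indeed work on the columns of a single data frame, so the multiple-data-frame problem reduces to first stacking the yearly frames into one frame that carries a year column. A minimal sketch on a small hypothetical frame that is already stacked, assuming tidyr >= 1.0 (where the columns to pack are named as `data = ...`):

```r
library(tidyr)

# Hypothetical toy data modelled on the question: two years already
# stacked in ONE data frame, with a `year` column telling them apart.
long <- data.frame(
  year         = c(2014, 2014, 2015, 2015, 2015),
  Company.name = c("BP PLC", "VOLKSWAGEN AG",
                   "BP PLC", "VOLKSWAGEN AG", "VOLKSWAGEN AG"),
  Advisor.Name = c("CAPITA ASSET SERVICES", "PRICEWATERHOUSECOOPERS",
                   "CAPITA ASSET SERVICES", "PRICEWATERHOUSECOOPERS",
                   "PWC DEUTSCHE REVISION"),
  stringsAsFactors = FALSE
)

# Pack every column except year and Company.name into a list column
# called `data`: one row per year/company, each cell a small data frame.
nested <- nest(long, data = -c(year, Company.name))
print(nested)
```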
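If you are looking for a single data frame with columns year and Company.name plus a list column in which each element is a data frame holding that year's rows for that Company.name, then: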
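library(dplyr)
library(purrr)
library(tidyr)
# gather every object named "advisors" followed by 4 digits (advisors2015, ...)
ls(pattern = "^advisors\\d{4}$", envir = .GlobalEnv) %>%
mget(envir = .GlobalEnv) %>% # named list of the yearly tables
map_dfr(as.data.frame.matrix, .id = "year") %>% # stack them; list names go into `year`
mutate(year = sub("advisors", "", year) %>% as.numeric) %>% # "advisors2015" -> 2015
nest(data = -c(year, Company.name)) # one row per year/company, other columns in `data`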
giving:
# A tibble: 6 x 3
year Company.name data
<dbl> <fct> <list>
1 2015. VOLKSWAGEN AG <data.frame [3 x 3]>
2 2015. ROYAL DUTCH SHELL PLC <data.frame [4 x 3]>
3 2015. BP PLC <data.frame [1 x 3]>
4 2016. VOLKSWAGEN AG <data.frame [3 x 3]>
5 2016. ROYAL DUTCH SHELL PLC <data.frame [4 x 3]>
6 2016. BP PLC <data.frame [1 x 3]>
Alternatively, if you only want a long-format data frame, omit the nest line.
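If what you actually want is the advisor names themselves rather than nested data frames, one option is to keep the long format and collapse the names per company/year with summarise(list(...)). A minimal sketch, assuming dplyr >= 1.0 (for the `.groups` argument) and using two small hypothetical stand-ins for the yearly data frames; it also stacks them with bind_rows() on an explicitly named list instead of ls()/mget():

```r
library(dplyr)

# Hypothetical stand-ins for the yearly data frames from the question.
advisors2014 <- data.frame(
  Company.name = c("VOLKSWAGEN AG", "VOLKSWAGEN AG", "BP PLC"),
  Advisor.Name = c("PRICEWATERHOUSECOOPERS", "PWC DEUTSCHE REVISION",
                   "CAPITA ASSET SERVICES"),
  stringsAsFactors = FALSE
)
advisors2015 <- data.frame(
  Company.name = c("VOLKSWAGEN AG", "BP PLC"),
  Advisor.Name = c("PRICEWATERHOUSECOOPERS", "CAPITA ASSET SERVICES"),
  stringsAsFactors = FALSE
)

# Long format: one row per year/company/advisor; the list names end up
# in `year` and are then turned into numbers.
long <- bind_rows(list(advisors2014 = advisors2014,
                       advisors2015 = advisors2015), .id = "year") %>%
  mutate(year = as.numeric(sub("advisors", "", year)))

# List column holding just the character vector of advisor names
# for each year/company combination.
advisor_lists <- long %>%
  group_by(year, Company.name) %>%
  summarise(advisors = list(Advisor.Name), .groups = "drop")
print(advisor_lists)
```

Keeping a plain character vector per cell makes year-to-year comparisons (for example with setdiff()) straightforward, at the cost of dropping the other columns such as Company.ID.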
Note
We assume the input is:
advisors2015 <-
structure(list(Mark = c(1L, NA, NA, 2L, NA, NA, NA, 3L),
Company.name = structure(c(3L,
3L, 3L, 2L, 2L, 2L, 2L, 1L), .Label = c("BP PLC", "ROYAL DUTCH SHELL PLC",
"VOLKSWAGEN AG"), class = "factor"), Company.ID = structure(c(1L,
1L, 1L, 3L, 3L, 3L, 3L, 2L), .Label = c("DE2070000543", "GB00102498",
"GB04366849"), class = "factor"), Advisor.Name = structure(c(6L,
8L, 1L, 5L, 7L, 4L, 3L, 2L), .Label = c("C&L TREUARBEIT REVISION",
"CAPITA ASSET SERVICES", "ERNST & YOUNG", "KPMG ACCOUNTANTS NV",
"LLOYDS TSB REGISTRARS", "PRICEWATERHOUSECOOPERS", "PRICEWATERHOUSECOOPERS LLP",
"PWC DEUTSCHE REVISION"), class = "factor")),
class = "data.frame", row.names = c(NA, -8L))
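# convert to a character table (similar to the objects shown in the question)
# and make an identical copy named advisors2016 so there are two "years"
advisors2015 <- advisors2016 <- as.table(as.matrix(advisors2015))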
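Because the inputs above are character tables built with as.table(as.matrix(...)), the pipeline converts each one with as.data.frame.matrix() before stacking. A small sketch of that round trip on a hypothetical two-row frame, showing why as.data.frame.matrix() (rather than plain as.data.frame()) is the conversion that gives the original columns back:

```r
# Hypothetical two-row frame in the same shape as the question's data.
df <- data.frame(
  Company.name = c("BP PLC", "VOLKSWAGEN AG"),
  Advisor.Name = c("CAPITA ASSET SERVICES", "PRICEWATERHOUSECOOPERS"),
  stringsAsFactors = FALSE
)

# as.matrix() gives a character matrix; as.table() adds the "table" class
# and fills the missing row names with A, B, ...
tab <- as.table(as.matrix(df))
print(class(tab))
print(rownames(tab))

# as.data.frame.matrix() treats the table as a plain matrix and returns one
# column per original column, unlike as.data.frame(), which would dispatch to
# the table method and give a long Var1/Var2/Freq layout.
back <- as.data.frame.matrix(tab, stringsAsFactors = FALSE)
print(names(back))
```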