List of dataframes to one DF; Extract date column from each DF in list, pass all values to single DF w/ columns 1985-2017
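I have a list of 169 data frames (assetcount_dfs), each corresponding to a square on a geographic grid, and each square contains a bundle of assets. I want to fill in a separate data frame that counts the number of assets starting at each date from 1985 to 2017.
Here is the structure of this list of data frames: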
Square1_DF (3 rows/assets) | x | y | dates char[1989, N/A, 1991]
...
Square169_DF (1 row/asset) | x | y | dates char[2002]
I want to convert this into a data frame that counts these dates, in "dateDF":
           | 1989 | 1990 | ... | 2015 | 2016 | 2017
Square 1   |    0 |    1 | ... |    3 |    2 |    0
...
Square 169 |    0 |    0 | ... |    0 |    1 |    3
Here is a toy sample of my data. In each data frame in assetcount_dfs, the 'val' column holds the dates I want to use to populate dateDF:
sdf1 <- data.frame(a = c("1","4","5","1"), x = c("sdf","asf","asdf","sdf"), val = c("2014","2012","#N/A", "2001"))
sdf2 <- data.frame(a = c("1","4"), x = c("sdf","asdf"), val = c("#N/A","2011"))
sdf3 <- data.frame(a = c("1","4","5","1","1"), x = c("sdf","asf","asdf","sdf","sdf"), val = c("2010","2015","2000","2002", "2003"))
assetcount_dfs <- list(sdf1 = sdf1,sdf2 = sdf2,sdf3 = sdf3)
date_range <- 1985:2017
dateDF <- data.frame(matrix(ncol = length(date_range),nrow = 3)) # actual length is 169 rows, only using 3 for this example
colnames(dateDF) <- paste0('X',1985:2017) # name columns 'X'DATE
rownames(dateDF) <- names(assetcount_dfs)
dateDF[] <- 0 # filled with zeroes
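Current attempt
In the 'val' column of each data frame, I want to check whether any date values fall in the 1985-2017 range and, if so, add them to the matching X--- date column of dateDF.
I tried using 'purrr' (like lapply) to operate on each DF, but I'm struggling to see where to go from here.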
invisible(map(listx, function(df) {
  for (i in df$val){
    if (as.integer(i) %in% 1985:2017){
      datesDF_colName <- paste0('X',i)
      dateDF[substitute(df), datesDF_colName] <- dateDF[[datesDF_colName]] + 1
      # Attempt to set dateDF value at [grid-square DF's name / row, Column based on Year ]
    }
  }}))
# Output:
# Error in `[<-.data.frame`(`*tmp*`, substitute(df), datesDF_colName, value =
# c(1, :
# anyNA() applied to non-(list or vector) of type 'language'
# Called from: `[<-.data.frame`(`*tmp*`, substitute(df), datesDF_colName,
# value = c(1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# Note my sample code for 'listx' for some reason generates DFs with factors, although I am currently dealing with character arrays.
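I would use the tidyverse for this. Rather than trying to edit dateDF in a loop, count how often each year appears together with each data frame ID, then reshape the data into the format you're looking for.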
library(tidyverse)

assets2 <- assetcount_dfs %>%
  # combine all the small data frames into a single big df
  bind_rows(.id = 'rowdf') %>%
  # toss out the N/A values so they don't get counted
  filter(val != "#N/A")

simpleDateDF <- assets2 %>%
  # count each year and what data frame it's from
  count(rowdf, val) %>%
  # spread the years out into columns, using 0 as the default
  spread(val, n, fill = 0)
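Note that simpleDateDF only gets columns for years that actually occur in the data. If you need every year from 1985 to 2017 as a column, with the X prefix used in dateDF, one possible base R sketch is below. The helper name count_years is hypothetical (it is not from the original post), and the sketch assumes that "#N/A" and any value outside 1985-2017 should simply be dropped rather than counted.

```r
# Toy data from the question
sdf1 <- data.frame(a = c("1","4","5","1"), x = c("sdf","asf","asdf","sdf"),
                   val = c("2014","2012","#N/A","2001"))
sdf2 <- data.frame(a = c("1","4"), x = c("sdf","asdf"),
                   val = c("#N/A","2011"))
sdf3 <- data.frame(a = c("1","4","5","1","1"), x = c("sdf","asf","asdf","sdf","sdf"),
                   val = c("2010","2015","2000","2002","2003"))
assetcount_dfs <- list(sdf1 = sdf1, sdf2 = sdf2, sdf3 = sdf3)

date_range <- 1985:2017

# Hypothetical helper: count how often each year in date_range occurs in df$val.
# Values that are not one of the levels (e.g. "#N/A") become NA in the factor
# and are left out by table().
count_years <- function(df) {
  years <- factor(as.character(df$val), levels = as.character(date_range))
  as.integer(table(years))
}

# One column of counts per data frame, then transpose so each square is a row
counts <- vapply(assetcount_dfs, count_years, integer(length(date_range)))
dateDF <- as.data.frame(t(counts))
colnames(dateDF) <- paste0("X", date_range)
```

Because the factor levels are fixed to date_range up front, years with no assets still get a zero-filled column, and the row names come from names(assetcount_dfs), so the same code should scale to the full list of 169 data frames.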