Extract columns from big data table to small data tables and save in a list
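I receive a data table from an external server (time series of different products, indexed by date). At most, it can have the following columns (the date is always the first column; each of the other columns may or may not be present, so there might be only two additional columns, or some other subset):
library(data.table)
set.seed(123)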
dt.data <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
'DEB Cal-2019' = rnorm(365, 2, 1), 'DEB Cal-2021' = rnorm(365, 2, 1),
'DEB Cal-2022' = rnorm(365, 2, 1), 'DEB Cal-2023' = rnorm(365, 2, 1),
'ATB Cal-2019' = rnorm(365, 2, 1), 'ATB Cal-2021' = rnorm(365, 2, 1),
'ATB Cal-2022' = rnorm(365, 2, 1), 'ATB Cal-2023' = rnorm(365, 2, 1),
'TTF Cal-2019' = rnorm(365, 2, 1), 'TTF Cal-2021' = rnorm(365, 2, 1),
'TTF Cal-2022' = rnorm(365, 2, 1), 'TTF Cal-2023' = rnorm(365, 2, 1),
'NCG Cal-2019' = rnorm(365, 2, 1), 'NCG Cal-2021' = rnorm(365, 2, 1),
'NCG Cal-2022' = rnorm(365, 2, 1), 'NCG Cal-2023' = rnorm(365, 2, 1),
'AUTVTP Cal-2019' = rnorm(365, 2, 1), 'AUTVTP Cal-2021' = rnorm(365, 2, 1),
'AUTVTP Cal-2022' = rnorm(365, 2, 1), 'AUTVTP Cal-2023' = rnorm(365, 2, 1),
'ATW Cal-2019' = rnorm(365, 2, 1), 'ATW Cal-2021' = rnorm(365, 2, 1),
'ATW Cal-2022' = rnorm(365, 2, 1), 'ATW Cal-2023' = rnorm(365, 2, 1),
'BRN Cal-2019' = rnorm(365, 2, 1), 'BRN Cal-2021' = rnorm(365, 2, 1),
'BRN Cal-2022' = rnorm(365, 2, 1), 'BRN Cal-2023' = rnorm(365, 2, 1),
'FEUA MDEC1' = rnorm(365, 2, 1),
check.names = FALSE)
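Now I want to extract each column that is present, together with the date column, into its own data table. Ideally, all the extracted data tables would then be added to a list. I know I should somehow do this with a for loop, but I can't work it out.
Once I have a separate data table for each product, I have to perform the following operations on each of them (the example data table here is for AUTVTP Cal-2022):
library(dplyr)  # provides %>%, mutate(), relocate(), mutate_if()
DT <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),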
'AUTVTP Cal-2022' = rnorm(365, 2, 1), check.names = FALSE)
DT <- DT %>%
mutate(month = format(date, '%b'),
date = format(date, '%d')) %>%
tidyr::pivot_wider(names_from = date, values_from = 'AUTVTP Cal-2022') %>%
relocate(`01`, .after = month)
## Calculate monthly, quarterly, and yearly mean values: ##
DT <- setDT(DT)[, monthAvg := rowMeans(.SD, na.rm = TRUE), .SDcols = -1]
DT <- DT[, quartAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/3)]
DT <- DT[, yearAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/12)]
## Round all values of the data table to 2 digits: ##
DT <- DT %>% mutate_if(is.numeric, round, 2)
How can I do this?
Reshape to long format, then split.
split(
melt(dt.data, id.vars = "date"),
by = "variable", keep.by = FALSE)
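You can then use lapply to loop over the list and do whatever your tidyverse code does.
However, you usually shouldn't split a data.table. It is inefficient and usually unnecessary.
Edit:
I suggest you forget about splitting. Wrap your code in a function like this: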
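As a rough illustration of why splitting is often unnecessary, here is a minimal sketch that computes a per-product monthly mean directly on the long table with `by`. The table `dt.small`, its column names, and the `product`/`month` labels are made up for the example and are not part of the question's data.

```r
library(data.table)

# Hypothetical miniature of dt.data: a date column plus two product columns.
dt.small <- data.table(date = as.Date(c("2020-01-01", "2020-01-02", "2020-02-01")),
                       "A Cal-2021" = c(1, 3, 5),
                       "B Cal-2021" = c(2, 4, 6),
                       check.names = FALSE)

# Reshape to long format: one row per (date, product) pair.
long <- melt(dt.small, id.vars = "date", variable.name = "product")

# Grouped aggregation replaces the split + lapply step:
# one mean per product and calendar month.
monthly <- long[, .(monthAvg = mean(value, na.rm = TRUE)),
                by = .(product, month = format(date, "%Y-%m"))]
monthly
```

Whether this fits depends on what you need downstream; if each product really has to become its own wide table, the function-based approach is the closer match.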
foo <- function(DT, colname) {
DT <- DT[, c("date", colname), with = FALSE]
DT <- DT %>%
mutate(month = format(date, '%b'),
date = format(date, '%d')) %>%
tidyr::pivot_wider(names_from = date, values_from = dplyr::all_of(colname)) %>%  # all_of(): colname is a character string, not a column called "colname"
relocate(`01`, .after = month)
## Calculate monthly, quarterly, and yearly mean values: ##
DT <- setDT(DT)[, monthAvg := rowMeans(.SD, na.rm = TRUE), .SDcols = -1]
DT <- DT[, quartAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/3)]
DT <- DT[, yearAvg := mean(monthAvg), ceiling(seq_len(nrow(DT))/12)]
## Round all values of the data table to 2 digits: ##
DT %>% mutate_if(is.numeric, round, 2)
}
Then, whenever you need the table for a particular column in your Shiny app, you can simply call this function:
foo(dt.data, 'DEB Cal-2019')
If you insist on pre-computing the list:
lapply(names(dt.data)[names(dt.data) != "date"],
foo, DT = dt.data)
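The list returned by lapply here is unnamed, so elements can only be reached by position. A minimal sketch of one way to name them after their source columns follows; `extract_col` is a hypothetical stand-in for foo that only selects columns, and `dt.small` is made-up data in the same shape as dt.data.

```r
library(data.table)

# Hypothetical stand-in for foo(): keeps the date column plus one product column.
extract_col <- function(DT, colname) {
  DT[, c("date", colname), with = FALSE]
}

# Made-up table in the same shape as dt.data.
dt.small <- data.table(date = as.Date("2020-01-01") + 0:2,
                       "DEB Cal-2021" = c(1, 2, 3),
                       "FEUA MDEC1" = c(4, 5, 6),
                       check.names = FALSE)

# Every column except the date is treated as a product.
cols <- setdiff(names(dt.small), "date")

# setNames() labels each list element with the column it came from.
tables <- setNames(lapply(cols, extract_col, DT = dt.small), cols)

# Elements can now be looked up by product name.
tables[["FEUA MDEC1"]]
```

With names in place, a lookup such as `tables[[input$product]]` becomes possible in a Shiny app, assuming an input with that id exists.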
Use split.default and cbind the first column onto each list element to create a list of data frames.
lapply(split.default(dt.data[, -1], names(dt.data[, -1])), cbind, dt.data[, 1])