Transpose a data frame and add column names as variables in R
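I have a time series data frame that is wide, with individual stock data as the column names.
I want to convert this data frame to long format without losing the ability to see which ticker each value belongs to.
Here is the data.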
df = structure(list(Date = structure(c(1607922000, 1608008400), class = c("POSIXct",
"POSIXt"), tzone = ""), AAPL.Close = c(0.32982465, 0.34001608
), MSFT.Close = c(0.26307234, 0.27235893), GS.Close = c(0.30742572,
0.29825025), QQQ.Close = c(0.25350002, 0.24456267)), row.names = 1:2, class = "data.frame")
Date AAPL.Close MSFT.Close GS.Close QQQ.Close
1 2020-12-14 0.3298246 0.2630723 0.3074257 0.2535000
2 2020-12-15 0.3400161 0.2723589 0.2982502 0.2445627
I would like the new data frame to look like this.
Date Data Ticker
2020-12-14 .3298 AAPL
2020-12-15 .3400 AAPL
2020-12-14 .260 MSFT
2020-12-15 .27 MSFT
.
.
Thanks for your help.
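We can use pivot_longer
library(tidyr)
library(dplyr)
df %>%
  mutate(Date = as.Date(Date)) %>%  # keep only the date, drop the time of day
  # split names such as "AAPL.Close" at the dot: the first part becomes Ticker,
  # the second part (".value") names the value column, here Close
  pivot_longer(cols = -Date, names_to = c("Ticker", ".value"),
               names_sep = "\\.") %>%
  rename(Data = Close)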
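Because every column here ends in the same suffix, the `.value` step and the `rename()` are not strictly needed. The following is a minimal sketch of that variation, not part of the original answer: it assumes all value columns are named `<ticker>.Close`, and the names `df_wide` and `df_long` are made up for illustration. The dates are written as plain `Date` values so the result does not depend on the local time zone.

```r
library(tidyr)

# Same values as in the question, with Date already stored as a plain Date
# (assumption made for this sketch to avoid time zone effects).
df_wide <- data.frame(
  Date       = as.Date(c("2020-12-14", "2020-12-15")),
  AAPL.Close = c(0.32982465, 0.34001608),
  MSFT.Close = c(0.26307234, 0.27235893),
  GS.Close   = c(0.30742572, 0.29825025),
  QQQ.Close  = c(0.25350002, 0.24456267)
)

df_long <- pivot_longer(
  df_wide,
  cols          = -Date,
  names_to      = "Ticker",
  names_pattern = "(.*)\\.Close$",  # keep only the part before ".Close"
  values_to     = "Data"
)

df_long
```

Note that `pivot_longer` walks through the input row by row, so the result is ordered by date first and ticker second; sort by `Ticker` afterwards if the ordering from the question is needed.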
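I have been experimenting with data.table a lot over the past few days, so here is an additional solution alongside akrun's :-)
library(data.table)
library(stringr)
setDT(df)
# the former column names go to Ticker, the values go to Data
df = melt(df, id.vars = c("Date"), variable.name = "Ticker", value.name = "Data")
# fixed() treats ".Close" as a literal string rather than a regular expression
df[, Ticker := str_replace(Ticker, fixed(".Close"), "")]
Output:
> df
Date Ticker Data
1: 2020-12-14 05:00:00 AAPL 0.3298246
2: 2020-12-15 05:00:00 AAPL 0.3400161
3: 2020-12-14 05:00:00 MSFT 0.2630723
4: 2020-12-15 05:00:00 MSFT 0.2723589
5: 2020-12-14 05:00:00 GS 0.3074257
6: 2020-12-15 05:00:00 GS 0.2982502
7: 2020-12-14 05:00:00 QQQ 0.2535000
8: 2020-12-15 05:00:00 QQQ 0.2445627
Update: if you only want the date rather than the date-time, convert Date before melting (run this in place of the melt() call above):
df = melt(df[, Date := as.Date(Date)], id.vars = c("Date"), variable.name = "Ticker", value.name = "Data")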
library(data.table)
library(magrittr)
setDT(df)
melt(data = df, id.vars = "Date") %>%
  # keep only the part before the last dot, e.g. "AAPL.Close" becomes "AAPL"
  .[, variable := gsub(pattern = "(.+)\\.(.+)", replacement = "\\1", x = variable)] %>%
.[]
Date variable value
1: 2020-12-14 08:00:00 AAPL 0.3298246
2: 2020-12-15 08:00:00 AAPL 0.3400161
3: 2020-12-14 08:00:00 MSFT 0.2630723
4: 2020-12-15 08:00:00 MSFT 0.2723589
5: 2020-12-14 08:00:00 GS 0.3074257
6: 2020-12-15 08:00:00 GS 0.2982502
7: 2020-12-14 08:00:00 QQQ 0.2535000
8: 2020-12-15 08:00:00 QQQ 0.2445627
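For comparison, the same reshaping can be sketched in base R without any packages. This is an illustrative variation, not one of the original answers: it assumes every non-Date column is named `<ticker>.Close`, and `df_wide`, `value_cols`, and `df_long` are names made up for this sketch. The column order follows the layout requested in the question (Date, Data, Ticker).

```r
# Same values as in the question, with Date stored as a plain Date
# (assumption made for this sketch to avoid time zone effects).
df_wide <- data.frame(
  Date       = as.Date(c("2020-12-14", "2020-12-15")),
  AAPL.Close = c(0.32982465, 0.34001608),
  MSFT.Close = c(0.26307234, 0.27235893),
  GS.Close   = c(0.30742572, 0.29825025),
  QQQ.Close  = c(0.25350002, 0.24456267)
)

# Every column except Date holds values for one ticker.
value_cols <- setdiff(names(df_wide), "Date")

df_long <- data.frame(
  # repeat the full Date column once per ticker
  Date   = rep(df_wide$Date, times = length(value_cols)),
  # concatenate the value columns one after another
  Data   = unlist(df_wide[value_cols], use.names = FALSE),
  # strip the literal ".Close" suffix and repeat each ticker once per row
  Ticker = rep(sub("\\.Close$", "", value_cols), each = nrow(df_wide))
)

df_long
```

Here the rows come out grouped by ticker, because the value columns are stacked one after another.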