How to take the median of column values for non-distinct timestamps only
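I am trying to clean some tick data. My data is in long format. When I convert it to wide format, it fails with "Error: Duplicate identifiers for rows". The Time column holds timestamps spanning several days. The SYM column holds the ticker symbols of many stocks. Here is my sample data: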
dput(jojo)
structure(list(Time = structure(c(1459481850, 1459481850, 1459482302,
1459482305, 1459482305, 1459482307, 1459482307, 1459482309, 1459482312,
1459482312, 1459482314, 1459482314, 1459482316, 1459482316, 1459482317,
1459482317, 1459482318, 1459482319, 1459482319, 1459482320), class = c("POSIXct",
"POSIXt"), tzone = "Asia/Calcutta"), PRICE = c(1371.25, 1371.25,
1373.95, 1373, 1373, 1373.95, 1373.95, 1373.9, 1374, 1374, 1374.15,
1374.15, 1374, 1374, 1373.85, 1373.85, 1372.55, 1374.05, 1374.05,
1374.15), SIZE = c(39, 58, 5, 4, 7, 20, 5, 10, 21, 179, 10, 100,
98, 78, 14, 11, 30, 10, 11, 39), SYM = c("A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B")), .Names = c("Time", "PRICE", "SIZE", "SYM"), row.names = c(NA,
20L), class = "data.frame")
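I need to first find the identical timestamps within each ticker, take the median of PRICE and SIZE over those rows, and replace each group of identical-timestamp rows with a single row holding the median PRICE and SIZE. But my code summarises the whole column per ticker instead of only the rows that share a timestamp. Here is my attempt: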
#Cleaning duplicate time stamps
tt<- jojo %>%group_by(SYM )%>% summarise(Time = ifelse(n() >= 2, median, mean))
#Making wide form
tt<-spread(tt, SYM, PRICE)
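I get this error:
Error in eval(substitute(expr), envir, enclos) : Not a vector
Please suggest corrections. It would be great if the cleaning could be done without using the highfrequency package.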
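You need to choose whether you want to use the dplyr paradigm or the xts paradigm. They don't play well together, mainly because dplyr expects data.frames and xts objects are matrices. dplyr also masks the stats::lag generic, which prevents method dispatch (for example, running lag(.xts(1,1)) at the top level will not do what you expect).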
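As a small illustration of the two points above, the following sketch (assuming the xts package is installed; the object names are made up for the example) checks that an xts object is treated as a matrix rather than a data.frame, and shows that naming the namespace explicitly reaches the stats::lag generic regardless of what is attached.

```r
library(xts)

# .xts() builds an xts object from data and a numeric index
# (seconds since the epoch); the values here are purely illustrative
x <- .xts(matrix(1:3, ncol = 1), 1:3)

# An xts object carries a dim attribute, so base R sees a matrix,
# not a data.frame
x_is_matrix <- is.matrix(x)
x_is_df <- is.data.frame(x)

# Calling stats::lag by its full name dispatches to the xts method
# even if dplyr is attached and its own lag() masks the generic
lagged <- stats::lag(x)
```

If you do mix the two packages in one session, writing stats::lag(...) in full is one way to avoid the masking problem.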
To solve this using the xts paradigm:
library(xts)

# create a function to convert to xts and take medians of the two columns
unDuplicate <- function(x) {
  # create xts object indexed by Time
  X <- xts(x[, c("PRICE", "SIZE")], x[, "Time"])
  # set column names so they will be unique in wide format
  colnames(X) <- paste(colnames(X), x[1, "SYM"], sep = ".")
  # function to take median of each column
  colMedian <- function(obj, ...) {
    apply(obj, 2, median, ...)
  }
  # aggregate by seconds
  period.apply(X, endpoints(X, "seconds"), colMedian)
}
# now you can call the function on each symbol, then merge the results
do.call(merge, lapply(split(jojo, jojo$SYM), unDuplicate))
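If you would rather stay in the dplyr paradigm, a minimal sketch along the lines of the attempt in the question could look like the following. It is not part of the xts solution above. It assumes dplyr and tidyr are installed and, to stay self-contained, uses only six rows copied from the sample data (rows 1-3 and 11-13 of jojo) under the made-up name ticks. The key change is grouping by both SYM and Time, so the medians are taken only over rows that share a timestamp within a symbol.

```r
library(dplyr)
library(tidyr)

# Six rows copied from the sample data (rows 1-3 and 11-13 of jojo)
ticks <- data.frame(
  Time  = as.POSIXct(c(1459481850, 1459481850, 1459482302,
                       1459482314, 1459482314, 1459482316),
                     origin = "1970-01-01", tz = "Asia/Calcutta"),
  PRICE = c(1371.25, 1371.25, 1373.95, 1374.15, 1374.15, 1374),
  SIZE  = c(39, 58, 5, 10, 100, 98),
  SYM   = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Collapse to one row per symbol and timestamp, holding the medians
dedup <- ticks %>%
  group_by(SYM, Time) %>%
  summarise(PRICE = median(PRICE), SIZE = median(SIZE)) %>%
  ungroup()

# Wide form of PRICE: one column per symbol. SIZE is dropped first so
# that each Time maps to a single row
wide <- dedup %>%
  select(Time, SYM, PRICE) %>%
  spread(SYM, PRICE)
```

Once every (SYM, Time) pair is unique, spread() no longer has duplicate identifiers to complain about. On the full data, replace ticks with jojo.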