Computing rank in respective histories

Question

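I want to compute the rank of each fruit's current price relative to that fruit's own previous daily prices.

The prices for the last few days are in past.csv: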

"Product","3/06/2018","3/05/2018","3/04/2018","3/03/2018"
"Apple",1.3,1.2,1.2,1.3
"Orange",1.3,1.4,1.6,1.7
"Kiwi",0.8,0.9,1.0,1.2
"Banana",0.6,0.8,0.9,1.0

The current prices are in current.csv:

"Day","Product","Price"
"3/07/2018","Apple",1.4
"3/07/2018","Orange",1.5
"3/07/2018","Kiwi",1.1
"3/07/2018","Banana",0.7

After reading both CSV files, I copy the current price into the past data frame:

past.df <- read.csv(file="past.csv", header=TRUE, check.names=FALSE)
current.df <- read.csv(file="current.csv", header=TRUE, check.names=FALSE)
past.df$"Price" <- current.df$"Price"[match(past.df$"Product", current.df$"Product")]

compute the ECDF over the relevant subset of columns:

before.last.col <- ncol(past.df) - 1
past.df$"Rank" <- ecdf(past.df[,2:before.last.col])(current$"Price")

and then copy the result back:

current.df$"Rank" <- past.df$"Rank"[match(current.df$"Product", past.df$"Product")]

I expected the Rank column to be c(1.0, 0.5, 0.75, 0.25).

What am I missing?
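For reference, the expected values follow from the definition of the empirical CDF: the fraction of past prices that are less than or equal to the current price. A minimal check for Orange is sketched below; the variable names orange.past, orange.now, orange.rank and orange.share are made up for illustration.

```r
# Past Orange prices from past.csv and the current price from current.csv
orange.past <- c(1.3, 1.4, 1.6, 1.7)
orange.now <- 1.5

# ecdf() builds a step function from a single numeric vector
orange.rank <- ecdf(orange.past)(orange.now)

# The same quantity computed directly: share of past prices <= current price
orange.share <- mean(orange.past <= orange.now)

print(orange.rank)   # 0.5, since two of the four past prices are <= 1.5
print(orange.share)  # 0.5
```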

Answer 1

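The problem is in how you define past.df$"Rank". Essentially, you are treating ecdf as a multivariate empirical CDF, whereas it is univariate only: it takes a single numeric vector, not a block of columns. So we need to apply ecdf row by row, once per product. For example,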

past.df$"Rank" <- sapply(1:nrow(past.df), function(x) 
  ecdf(unlist(past.df[x, 2:before.last.col]))(current.df$"Price"[x]))

After copying the result back into current.df as before, this gives

current.df$"Rank"
# [1] 1.00 0.50 0.75 0.25
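As a variation on the same row-wise idea, here is a self-contained sketch that inlines the sample data through the text argument of read.csv so it can be run without the two files. It assumes the history columns are every column of past.df except "Product" (the name hist.cols is introduced here for illustration), and it evaluates each row's ECDF at the already matched past.df$"Price" value rather than at current.df$"Price" by position, so it should not depend on both files listing the products in the same order.

```r
past.df <- read.csv(text = '"Product","3/06/2018","3/05/2018","3/04/2018","3/03/2018"
"Apple",1.3,1.2,1.2,1.3
"Orange",1.3,1.4,1.6,1.7
"Kiwi",0.8,0.9,1.0,1.2
"Banana",0.6,0.8,0.9,1.0', header = TRUE, check.names = FALSE)

current.df <- read.csv(text = '"Day","Product","Price"
"3/07/2018","Apple",1.4
"3/07/2018","Orange",1.5
"3/07/2018","Kiwi",1.1
"3/07/2018","Banana",0.7', header = TRUE, check.names = FALSE)

# Names of the columns holding historical prices (assumption: all but Product)
hist.cols <- setdiff(names(past.df), "Product")

# Align the current prices to the rows of past.df by product name
past.df$"Price" <- current.df$"Price"[match(past.df$"Product", current.df$"Product")]

# One univariate ECDF per row, evaluated at that row's own current price
past.df$"Rank" <- sapply(seq_len(nrow(past.df)), function(i)
  ecdf(unlist(past.df[i, hist.cols]))(past.df$"Price"[i]))

# Copy the ranks back to current.df, again matching by product name
current.df$"Rank" <- past.df$"Rank"[match(current.df$"Product", past.df$"Product")]
print(current.df$"Rank")
```

On the sample data this should reproduce the ranks shown above.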

r

distribution

rank