Computing rank in respective histories
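I would like to compute the rank of some fruits' current prices relative to their own previous daily prices.
The prices for the last few days are in past.csv: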
"Product","3/06/2018","3/05/2018","3/04/2018","3/03/2018"
"Apple",1.3,1.2,1.2,1.3
"Orange",1.3,1.4,1.6,1.7
"Kiwi",0.8,0.9,1.0,1.2
"Banana",0.6,0.8,0.9,1.0
The current prices are in current.csv:
"Day","Product","Price"
"3/07/2018","Apple",1.4
"3/07/2018","Orange",1.5
"3/07/2018","Kiwi",1.1
"3/07/2018","Banana",0.7
After reading in both CSV files, I copy the current prices over:
past.df <- read.csv(file="past.csv", header=TRUE, check.names=FALSE)
current.df <- read.csv(file="current.csv", header=TRUE, check.names=FALSE)
past.df$"Price" <- current.df$"Price"[match(past.df$"Product", current.df$"Product")]
compute the ECDF of the relevant subset of columns:
before.last.col <- ncol(past.df) - 1
past.df$"Rank" <- ecdf(past.df[,2:before.last.col])(current$"Price")
and then copy the result back:
current.df$"Rank" <- past.df$"Rank"[match(current.df$"Product", past.df$"Product")]
I expect the column c(1.0, 0.5, 0.75, 0.25).
What am I missing?
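The problem is in the way you define past.df$"Rank". Basically, you are treating ecdf as a multivariate empirical CDF, whereas it is only univariate: it builds a single distribution from one numeric vector. Each product needs its own ECDF, so we have to apply ecdf row by row. For example,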
past.df$"Rank" <- sapply(1:nrow(past.df), function(x)
ecdf(unlist(past.df[x, 2:before.last.col]))(current.df$"Price"[x]))
After copying the ranks back into current.df as before, this results in
current.df$"Rank"
# [1] 1.00 0.50 0.75 0.25
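For anyone who wants to try this without the CSV files, here is a minimal self-contained sketch of the same row-wise idea. It assumes the two tables are typed in as data frames rather than read from past.csv and current.csv; the names history.cols, history, today, rank.ecdf and rank.mean are illustrative and do not appear in the original post. It also looks up each product's current price with match instead of relying on both tables being in the same row order, and it checks the result against a plain "fraction of past prices at or below today's price", which is what a univariate ECDF evaluates to at a point.

```r
# Inline copies of the two tables from the question (assumption: typed in
# by hand here instead of being read from past.csv / current.csv).
past.df <- data.frame(
  Product = c("Apple", "Orange", "Kiwi", "Banana"),
  "3/06/2018" = c(1.3, 1.3, 0.8, 0.6),
  "3/05/2018" = c(1.2, 1.4, 0.9, 0.8),
  "3/04/2018" = c(1.2, 1.6, 1.0, 0.9),
  "3/03/2018" = c(1.3, 1.7, 1.2, 1.0),
  check.names = FALSE
)
current.df <- data.frame(
  Day = "3/07/2018",
  Product = c("Apple", "Orange", "Kiwi", "Banana"),
  Price = c(1.4, 1.5, 1.1, 0.7)
)

# Historical prices as a numeric matrix, one row per product.
history.cols <- setdiff(names(past.df), "Product")
history <- as.matrix(past.df[, history.cols])

# Current price of each product, aligned to the row order of past.df.
today <- current.df$Price[match(past.df$Product, current.df$Product)]

# ecdf() is univariate, so build one ECDF per row and evaluate it at
# that product's current price.
rank.ecdf <- sapply(seq_len(nrow(history)),
                    function(i) ecdf(history[i, ])(today[i]))

# Equivalent formulation: share of each row that is <= today's price.
# (today is recycled down the columns, so row i is compared to today[i].)
rank.mean <- rowMeans(history <= today)

# Copy the ranks back, matching on Product.
current.df$Rank <- rank.ecdf[match(current.df$Product, past.df$Product)]
print(current.df$Rank)
# [1] 1.00 0.50 0.75 0.25
```

The rowMeans variant avoids constructing an ECDF object per row, which may be convenient when there are many products; both give the same numbers on this data.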