R - how to index rank and accordingly display a data frame?

Question

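I have a data frame listing some individuals' names and their monetary transactions in US dollars. The table lists the data by district, along with the valid transactions made by cash or credit card, for example: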

X    Dist    transact.cash    transact.card
a    1       USD              USD
b    1       USD              USD

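where X is an individual and his/her transactions over a fixed period of time, and Dist is the district where he/she lives. There are more than 4,000 observations in total, with 80-100 rows per Dist. So far, sorting, slicing, and everything else have been simple operations, with dat.cash and dat.card being the tables subset by transaction mode; but I am running into a problem extracting information about the ranking of the data set. For this I wrote a function in which I specify a rank, and the function should display the rows starting from that rank: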

rankdat <- function(transact, numb) {
  # Truncated
  valid.nums = c('highest', 'lowest', 1:nrow(dat.cash)) # for cash subset
  if (transact == 'cash' && numb == 'highest') { # This is easy
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T), ] # For sorting only cash data set
  } else if (transact == 'cash' && numb == 1:nrow(dat.cash)) {
    sort <- dat.cash[order(dat.cash[, 3], decreasing = T) == numb, ] # Not getting results here
  }
}

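The last line returns NULL instead of the ranked transaction and all of its rows. Replacing == with %in% still gives NULL, and using rank() doesn't change anything. For the highest and lowest values this is no big deal, since it only involves a simple sort. If I call rankdat('cash', 10), the function should return the values starting from the 10th highest transaction and going down, regardless of Dist, something like: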

 X    Dist    transact.cash
 b    1       10th highest
 h    2       11th highest
 p    1       12th highest
 and  so      on

Answer 1

Suppose you have the following data.frame:

df=data.frame(X=c(rep('A',2),rep('B',3),rep('A',3),rep('B',2)),
               Dist=c(rep(1,5),rep(0,5)),
               transact.cash=c(rep('USD',5),rep('€',5)),
               transact.card=c(rep('USD',5),rep('€',5)))

We get:

   X Dist transact.cash transact.card
1  A    1           USD           USD
2  A    1           USD           USD
3  B    1           USD           USD
4  B    1           USD           USD
5  B    1           USD           USD
6  A    0             €             €
7  A    0             €             €
8  A    0             €             €
9  B    0             €             €
10 B    0             €             €

If you want to sort a data frame by several columns, such as transact.cash and transact.card, see the Stack Overflow question "How to sort a dataframe by column(s)". In your example you only specified dat.cash, so:

sort = df[order(df$transact.cash, decreasing=T),] # Order your dataFrame with transact.cash column
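For the multi-column case mentioned above, a minimal sketch might look like the following. The data frame toy and its numeric amounts are made up for illustration (the example data above stores currency labels rather than amounts), so treat the names as assumptions:

```r
# Hypothetical data: numeric amounts invented for illustration only
toy <- data.frame(
  X = c("a", "b", "c", "d"),
  Dist = c(1, 1, 2, 2),
  transact.cash = c(50, 120, 120, 10),
  transact.card = c(30, 5, 80, 60),
  stringsAsFactors = FALSE
)

# Sort by cash (highest first); ties in cash are broken by card (highest first)
by.both <- toy[order(toy$transact.cash, toy$transact.card, decreasing = TRUE), ]
by.both$X
```

Here order() receives both columns as sort keys, so rows b and c (equal cash amounts) are arranged by their card amounts.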

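To extract rows that satisfy a particular condition, use which() together with == for numeric, double, or logical matching, or %in% for string matching (== also works on strings; %in% is convenient when matching against several values). For example: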

XA = df[which(df$X %in% "A"),] # Select row by user
XDist = df[which(df$Dist == 1),] # Select row by District
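As a small illustration of why which() can be worth the extra typing, the sketch below uses a made-up data frame toy (not the asker's data) that contains a missing district:

```r
# Hypothetical data with one missing district
toy <- data.frame(
  X = c("a", "b", "c", "d"),
  Dist = c(1, 1, 2, NA),
  stringsAsFactors = FALSE
)

# == compares element by element; the NA in Dist yields NA, which which() drops
in.dist1 <- toy[which(toy$Dist == 1), ]

# Without which(), the NA comparison produces an extra all-NA row
with.na.row <- toy[toy$Dist == 1, ]

# %in% tests set membership, handy when matching several values at once
a.or.c <- toy[toy$X %in% c("a", "c"), ]
```

In this sketch in.dist1 holds only the rows for a and b, while with.na.row has a third row filled with NA.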

Finally, if you want to select the first five rows after sorting:

sort[1:5,] # Select first five rows
sort[1:numb,] # Select first numb rows

With that, you can write a simple function to easily extract data from the data frame.
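One possible shape for such a function, covering the "start at the 10th highest and go down" case from the question, is sketched here. Note that order() returns row positions rather than ranks, so the sketch sorts first and then slices by position. The helper name rows_from_rank, its arguments, and the toy data are all hypothetical:

```r
# Hypothetical helper: return rows from rank `from` downward, highest first
rows_from_rank <- function(dat, col, from = 1) {
  sorted <- dat[order(dat[[col]], decreasing = TRUE), ]
  if (from > nrow(sorted)) {
    return(sorted[0, ])  # rank beyond the data: empty data frame
  }
  sorted[from:nrow(sorted), ]
}

# Made-up numeric amounts for illustration only
toy <- data.frame(
  X = c("a", "b", "c", "d", "e"),
  Dist = c(1, 1, 2, 2, 1),
  transact.cash = c(50, 120, 75, 10, 200),
  stringsAsFactors = FALSE
)

res <- rows_from_rank(toy, "transact.cash", from = 3)
res$X
```

With the real data, a call such as rows_from_rank(dat.cash, "transact.cash", from = 10) would presumably give the rows from the 10th highest transaction onward, assuming that column holds numeric amounts.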

Hope this helps.

Answer 2

This function can do that:

rankdat <- function(df,rank.by,num=10,method="top",decreasing=T){
  # ------------------------------------------------------
  # RANKDAT
  # ------------------------------------------------------
  # ARGUMENT 
  # ========
  # df        Input dataFrame [d.f]
  # num       Selected row [num]
  # rank.by   Name of column(s) used to rank dataFrame
  # method    Method used to extract rows
  #             top - to select top rank (e.g. 10 first rows)
  #             specific - to select specific row
  # ------------------------------------------------------
  eval(parse(text=paste("sort=df[with(df,order(",rank.by,", decreasing=",decreasing,")),]",sep=""))) # order dataFrame by the rank.by column(s); decreasing must be passed to order(), not with()
  if(method %in% "top"){
    return(sort[1:num,])
  }else if(method %in% "specific"){
    return(sort[num,])
  }else{
    stop("Please select method used to extract data !!!")
  }
}
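
The function above builds its ordering expression as a string and evaluates it with eval(parse()). A possible alternative that avoids string evaluation is sketched below; rankdat2 is a hypothetical name, rank.by is assumed to be a character vector of column names, and the toy data is invented for illustration:

```r
# Hypothetical variant of rankdat; rank.by is a character vector of column names
rankdat2 <- function(df, rank.by, num = 10, method = "top", decreasing = TRUE) {
  keys <- unname(as.list(df[rank.by]))            # sort keys as an unnamed list
  ord <- do.call(order, c(keys, list(decreasing = decreasing)))
  sorted <- df[ord, ]
  if (method == "top") {
    sorted[seq_len(min(num, nrow(sorted))), ]     # first num rows (or fewer)
  } else if (method == "specific") {
    sorted[num, ]                                 # only the row at position num
  } else {
    stop("method must be 'top' or 'specific'")
  }
}

# Made-up numeric amounts for illustration only
toy <- data.frame(
  X = c("a", "b", "c", "d", "e"),
  Dist = c(1, 1, 2, 2, 1),
  transact.cash = c(50, 120, 75, 10, 200),
  stringsAsFactors = FALSE
)

top2 <- rankdat2(toy, "transact.cash", num = 2)
third <- rankdat2(toy, "transact.cash", num = 3, method = "specific")
lowest <- rankdat2(toy, "transact.cash", num = 1, decreasing = FALSE)
```

Passing the columns to order() through do.call() keeps column names out of evaluated code, which may be preferable when the names come from user input.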
