How to build a countif equivalent as an R function?

Question

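I have seen similar questions asked, but I am having trouble troubleshooting this function because it isn't working. I am trying to create a countif function in R. I have some earthquake magnitude data, and I have created bins (a sequence from 2 to 8 in increments of 0.1). I want to see how many earthquakes are greater than or equal to each bin value.

Here is my data, which is earthquake magnitudes. In my function I refer to it as qdta$mag because it is a variable from a larger data frame. I just made this snippet for everyone to test with.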

qdta = sample(seq(0,8,.05),500, replace = T)

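Here are my "bins". The purpose of my function is to count how many earthquakes are greater than or equal to each bin value (2, 2.1, 2.2, 2.3, 2.4, etc.). I then created a value column to store the counts.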

L = as.data.frame(seq(2,8,.1))
L$value = 0

Here is my function. It runs, in the sense that I get no error when I create it, but it does not run correctly: the counts are not being stored.

#creating the number of loops
loop1 = dim(L)[1]
loop2 = dim(qdta)[1]

#creating my function

#1. I want the function to 
#A. Look at the z cell of qdta$mag (start with first number)
#then check if its bigger than the first cell in first column of x
# if it is, then add a +1 to the value, if not, leave as is. 

#Do this loop however many times I say in loop2 (the size of the qdta), 
#then move to the next i (the next bin value in the L dataframe)

countf = function(x){

  for(i in loop1){
    for(z in loop2){
    
    x[i,2] = ifelse(qdta$mag[z] >= x[i,1],
                    x[i,2] + 1,
                    x[i,2])
    }
  }
}

countf(L)

Answer 1

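See my changes, especially around clarifying which variable is which.

The biggest item is that you were treating the loop sizes (loop1, loop2) as if they were lists, but each is just a single number (for example, 500), so the loop effectively runs once, with i = 500. Change it to 1:loop1. That is the main reason you were not getting output.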
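To make the point about the loop bounds concrete, here is a minimal sketch. The variable n and the two visited_* vectors are made up for illustration; n stands in for loop1 or loop2.

```r
n <- 5  # stands in for loop1 or loop2

# Iterating over the bare number visits a single value: n itself.
visited_bare <- c()
for (i in n) visited_bare <- c(visited_bare, i)

# Iterating over seq_len(n) (equivalent to 1:n when n >= 1) visits every index.
visited_seq <- c()
for (i in seq_len(n)) visited_seq <- c(visited_seq, i)

print(visited_bare)  # 5
print(visited_seq)   # 1 2 3 4 5
```

seq_len(n) is used here instead of 1:n only because it also behaves sensibly when n is 0; for the data in this question either form works.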

Data = data.frame(mag=sample(seq(0,8,.05),500, replace = T))

L = data.frame(magbin = seq(2,8,.1),
               value = 0)
                                   
loop1 = dim(L)[1]
loop2 = dim(Data)[1]

  for(i in 1:loop1){
    for(z in 1:loop2){
      L$value[i] <- ifelse(Data$mag[z] >= L$magbin[i], (L$value[i] + 1), L$value[i])
    }
  }

   magbin value
1     2.0   370
2     2.1   365
3     2.2   356
4     2.3   347
5     2.4   344
6     2.5   332
...
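If you still want this wrapped in a function, as in the question, note that R functions work on a copy of their arguments, so the function has to return the modified data frame and the caller has to assign the result. The following is only a sketch: count_ge is a hypothetical name, and the inner loop is replaced by sum() over a logical vector, which is the usual way to express a countif in R.

```r
# Hypothetical helper: for each bin edge, count how many magnitudes are >= it.
count_ge <- function(bins, mags) {
  for (i in seq_len(nrow(bins))) {
    # sum() over a logical vector counts the TRUEs, which is the "countif".
    bins$value[i] <- sum(mags >= bins$magbin[i])
  }
  bins  # return the modified copy; the caller's data frame is untouched
}

set.seed(1)  # arbitrary seed, only so the example is repeatable
Data <- data.frame(mag = sample(seq(0, 8, .05), 500, replace = TRUE))
L <- data.frame(magbin = seq(2, 8, .1), value = 0)

L <- count_ge(L, Data$mag)  # reassign, otherwise the counts are discarded
print(head(L))
```

Calling count_ge(L, Data$mag) without assigning the result would leave L$value at 0, which matches the symptom described in the question.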

Answer 2

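Thinking about this further, I wanted to drop the nested loops.

I suspected there was an elegant apply or purrr approach.

Thanks to this answer, we can use base::outer to apply the function >= to every combination of x and y.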

L$value2 <- colSums(outer(Data$mag, L$magbin, FUN = ">="))

   magbin value value2
1     2.0   374    374
2     2.1   365    365
3     2.2   358    358
4     2.3   356    356
5     2.4   351    351
...
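To see what outer is doing, here is a tiny worked example with made-up numbers (mags and bins are illustrative, not the data above). outer builds a logical matrix with one row per magnitude and one column per bin, and colSums counts the TRUE values in each column.

```r
mags <- c(1.5, 2.0, 2.5, 3.0)  # made-up magnitudes
bins <- c(2, 3)                # made-up bin edges

# Rows correspond to magnitudes, columns to bins; cell [r, c] is mags[r] >= bins[c].
m <- outer(mags, bins, FUN = ">=")
print(m)

# Column sums count the TRUEs per bin.
counts <- colSums(m)
print(counts)  # 3 1
```

Three of the four magnitudes are at least 2 and one is at least 3, so the counts are 3 and 1.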

Changing the data from 500 records to 5000 records and running microbenchmark:

Unit: milliseconds
         expr         min          lq        mean      median          uq        max neval
  loop_method 3209.485572 3283.103802 3716.386542 3380.223922 3757.661975 5758.78067   100
 outer_method    1.994086    2.082549    2.372194    2.179313    2.280101    5.73027   100
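The benchmark code itself is not shown, so the following is a rough sketch of how such a comparison could be set up. The names loop_method and outer_method are taken from the output above, but their bodies are an assumption reconstructed from the two answers, and base system.time is used in place of microbenchmark so that no extra package is needed.

```r
set.seed(42)  # arbitrary seed for repeatability
mags <- sample(seq(0, 8, .05), 5000, replace = TRUE)
bins <- seq(2, 8, .1)

# Assumed shape of the nested-loop version, working on plain vectors.
loop_method <- function(mags, bins) {
  value <- numeric(length(bins))
  for (i in seq_along(bins)) {
    for (z in seq_along(mags)) {
      if (mags[z] >= bins[i]) value[i] <- value[i] + 1
    }
  }
  value
}

# Assumed shape of the vectorized version.
outer_method <- function(mags, bins) {
  colSums(outer(mags, bins, FUN = ">="))
}

# Both approaches should give the same counts.
stopifnot(all(loop_method(mags, bins) == outer_method(mags, bins)))

# Rough single-run timings; exact numbers depend on the machine.
print(system.time(loop_method(mags, bins)))
print(system.time(outer_method(mags, bins)))
```

A single system.time run is much noisier than the 100 evaluations microbenchmark reports, so treat the output as an order-of-magnitude check only.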
