How to run a function for each row in a dataframe in R?

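I have a data frame that contains the daily high and low values for different time periods (A, B, C, D). I want to find the most frequent value within those ranges of values. Here is a reproducible example of my data frame: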

library(modeest)
library(tidyverse)

st <- as.Date("2021-05-19")
en <- as.Date("2021-07-16")

day_month = seq(st, en, by = "1 days")

low_A = seq(1.25, 15.75, by = 0.25)
high_A = seq(2.25, 16.75, by = 0.25)
low_B = seq(0.25, 14.75, by = 0.25)
high_B = seq(0.50, 15, by = 0.25)
low_C = seq(1.25, 15.75, by = 0.25)
high_C = seq(2.25, 16.75, by = 0.25)
low_D = seq(0.75, 15.25, by = 0.25)
high_D = seq(2.25, 16.75, by = 0.25)

df <- data.frame(day_month, high_A, low_A, high_B, low_B, high_C, low_C, high_D, low_D)

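Each day I have a range of values, and the price touches every value between that day's high and low. For example, the last day in my data frame is 2021-07-16; its highest value is 16.75 and its lowest value is 14.75.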

tail(df)
    day_month high_A low_A high_B low_B high_C low_C high_D low_D
54 2021-07-11  15.50 14.50  13.75 13.50  15.50 14.50  15.50 14.00
55 2021-07-12  15.75 14.75  14.00 13.75  15.75 14.75  15.75 14.25
56 2021-07-13  16.00 15.00  14.25 14.00  16.00 15.00  16.00 14.50
57 2021-07-14  16.25 15.25  14.50 14.25  16.25 15.25  16.25 14.75
58 2021-07-15  16.50 15.50  14.75 14.50  16.50 15.50  16.50 15.00
59 2021-07-16  16.75 15.75  15.00 14.75  16.75 15.75  16.75 15.25

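So the price covered every value in that range. Using 0.25-point steps, the price ran through 16.75, 16.50, 16.25, 16.00, ..., 15.00, 14.75. Each period has its own range as well: on the last day of my data frame, period A covers 16.75, 16.50, 16.25, 16.00, 15.75, period B covers 15.00, 14.75, and so on. What I want is to use these ranges to find the mode (the most frequent value) for that day. So I created this function: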

poc_2 <- function(df){
# create the sequence of each period
  x_A <- with(df, seq(low_A, high_A, by = 0.25))
  x_B <- with(df, seq(low_B, high_B, by = 0.25))
  x_C <- with(df, seq(low_C, high_C, by = 0.25))
  x_D <- with(df, seq(low_D, high_D, by = 0.25))

# the ranges have different lengths, so pad them all to the same length with NA values
  n <- max(length(x_A), length(x_B), length(x_C), length(x_D))
  
  length(x_A) <- n
  length(x_B) <- n
  length(x_C) <- n
  length(x_D) <- n
  
  pf <- cbind(x_A, x_B,x_C, x_D)
  
  xfd <- data.frame(pf)

# I change the format of my data frame so I can calculate the mode of all values

  long <- xfd %>% gather(x, value, x_A:x_D)

# drop the NA padding added above

  long <- na.omit(long)

# mfv() can return several tied values; keep the last (highest) one

  return(last(mfv(long$value)))
}

This code works and returns the expected value for a single row:

poc_2(df[59,])
[1] 16.75

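This gives the highest mode of the ranges for the last day. I want to calculate this for every row of my data frame. I tried several options I found: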

df %>% rowwise() %>% mutate(poc = poc_2())

# gives an error:
Error: Problem with `mutate()` input `poc`.
x argument "df" is missing, with no default
i Input `poc` is `poc_2()`.
i The error occurred in row 1.

I also tried:

apply(df, 1, poc_2)

Error in eval(substitute(expr), data, enclos = parent.frame()) : 
  invalid 'envir' argument of type 'character'

My question: is there a way to apply this function to every row of my data frame, so that I can store the mode of each day in a new variable?

If you want to use the poc_2 function, you can do it with any of the following options:

library(tidyverse)

#1. sapply split
sapply(split(df, seq(nrow(df))),poc_2) 

#2. by
by(df, seq(nrow(df)),poc_2) 

#3. tidyverse
df %>% group_split(row_number()) %>% map_dbl(poc_2)

#4. rowwise (in dplyr >= 1.1.0, cur_data() is deprecated; pick(everything()) can replace it)
df %>% rowwise() %>% mutate(poc = poc_2(cur_data()))
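The two attempts in the question fail for different reasons. In `mutate(poc = poc_2())` the function is called without any argument, so R reports that `df` is missing; option #4 fixes that by passing the current row. `apply()` instead converts the data frame with `as.matrix()` before iterating, and because `day_month` is a Date column the result is a character matrix, so each row reaches `poc_2()` as a named character vector, which `with()` cannot evaluate in. The base R sketch below reproduces that coercion and then sidesteps it by working on numeric matrices of the low and high columns only. `range_mode()` is a hypothetical helper written for this illustration, not part of any package; like `poc_2()`, it returns the highest of the tied most frequent values, and it needs neither modeest nor the tidyverse.

```r
# Rebuild the question's example data using base R only
day_month <- seq(as.Date("2021-05-19"), as.Date("2021-07-16"), by = "1 day")
df <- data.frame(
  day_month,
  high_A = seq(2.25, 16.75, by = 0.25), low_A = seq(1.25, 15.75, by = 0.25),
  high_B = seq(0.50, 15.00, by = 0.25), low_B = seq(0.25, 14.75, by = 0.25),
  high_C = seq(2.25, 16.75, by = 0.25), low_C = seq(1.25, 15.75, by = 0.25),
  high_D = seq(2.25, 16.75, by = 0.25), low_D = seq(0.75, 15.25, by = 0.25)
)

# apply() calls as.matrix() first; the Date column turns every cell into text
typeof(as.matrix(df))   # "character"
apply_err <- tryCatch(apply(df, 1, function(row) with(row, low_A)),
                      error = conditionMessage)

# Hypothetical helper: highest most-frequent value over the 0.25 grids of all periods
range_mode <- function(lows, highs, step = 0.25) {
  grid <- unlist(Map(seq, lows, highs, by = step))  # expand every low..high range
  counts <- table(grid)                             # frequency of each grid value
  max(as.numeric(names(counts)[counts == max(counts)]))
}

# Work on numeric matrices so no character coercion happens
lows  <- as.matrix(df[c("low_A", "low_B", "low_C", "low_D")])
highs <- as.matrix(df[c("high_A", "high_B", "high_C", "high_D")])
df$poc <- vapply(seq_len(nrow(df)),
                 function(i) range_mode(lows[i, ], highs[i, ]),
                 numeric(1))
tail(df$poc, 1)   # 16.75, matching poc_2(df[59, ])
```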

However, why not take the Mode function from here and apply it to each day instead?

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df %>%
  pivot_longer(cols = -day_month) %>%
  group_by(day_month) %>%
  summarise(frequent_value = Mode(value)) 

#   day_month  frequent_value
#   <date>              <dbl>
# 1 2021-05-19           2.25
# 2 2021-05-20           2.5 
# 3 2021-05-21           2.75
# 4 2021-05-22           3   
# 5 2021-05-23           3.25
# 6 2021-05-24           3.5 
# 7 2021-05-25           3.75
# 8 2021-05-26           4   
# 9 2021-05-27           4.25
#10 2021-05-28           4.5 
# … with 49 more rows

Or, using rowwise:

df %>%
  rowwise() %>%
  mutate(frequent_value = Mode(c_across(-day_month))) %>%
  ungroup()
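One caveat: the `pivot_longer()`/`Mode()` and `c_across()` versions take the mode of the eight high and low endpoints, whereas `poc_2()` counts every 0.25 step inside each period's range, so the two can disagree. With the question's data they happen to coincide, because high_A, high_C and high_D are always equal and that shared high is also the largest value covered by all three ranges. Ties are broken differently too: `Mode()` keeps the first tied value it meets, while `last(mfv(...))` keeps the largest. The sketch below shows the difference on a made-up day; the three periods and their values are hypothetical and not taken from the question.

```r
# Mode() as defined in the answer above: first value with the highest count
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical single day with three periods (not from the question's data)
lows  <- c(A = 1.00, B = 1.50, C = 1.25)
highs <- c(A = 2.00, B = 1.75, C = 2.00)

# Endpoint approach: mode of the six high/low values only
endpoint_mode <- Mode(c(rbind(highs, lows)))   # 2

# Range approach (what poc_2() does): expand each range in 0.25 steps, then count
grid   <- unlist(Map(seq, lows, highs, by = 0.25))
counts <- table(grid)
range_modes <- as.numeric(names(counts)[counts == max(counts)])   # 1.5 1.75
max(range_modes)                                                  # 1.75
```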