How to run a function for each row in a dataframe in R?
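I have a data frame that contains the daily high and low values for different time periods (A, B, C, D). I want to get the most frequent value within those ranges. Here is a reproducible example of my data frame: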
library(modeest)
library(tidyverse)
st <- as.Date("2021-05-19")
en <- as.Date("2021-07-16")
day_month = seq(st, en, by = "1 days")
low_A = seq(1.25, 15.75, by = 0.25)
high_A = seq(2.25, 16.75, by = 0.25)
low_B = seq(0.25, 14.75, by = 0.25)
high_B = seq(0.50, 15, by = 0.25)
low_C = seq(1.25, 15.75, by = 0.25)
high_C = seq(2.25, 16.75, by = 0.25)
low_D = seq(0.75, 15.25, by = 0.25)
high_D = seq(2.25, 16.75, by = 0.25)
df <- data.frame(day_month, high_A, low_A, high_B, low_B, high_C, low_C, high_D, low_D)
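Since I have a range of values for each day, the price touched every value between that day's high and low. So, for a given day, say the last day of my data frame, 2021-07-16, the highest value is 16.75 and the lowest value that day is 14.75.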
tail(df)
day_month high_A low_A high_B low_B high_C low_C high_D low_D
54 2021-07-11 15.50 14.50 13.75 13.50 15.50 14.50 15.50 14.00
55 2021-07-12 15.75 14.75 14.00 13.75 15.75 14.75 15.75 14.25
56 2021-07-13 16.00 15.00 14.25 14.00 16.00 15.00 16.00 14.50
57 2021-07-14 16.25 15.25 14.50 14.25 16.25 15.25 16.25 14.75
58 2021-07-15 16.50 15.50 14.75 14.50 16.50 15.50 16.50 15.00
59 2021-07-16 16.75 15.75 15.00 14.75 16.75 15.75 16.75 15.25
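So the price moved within that range of values. With a 0.25-point interval, the price ran through 16.75, 16.50, 16.25, 16.00 ... 15.00, 14.75. Each period has its own range of values. For example, on 2021-07-11 (the first row shown above) the range for period A is 15.50, 15.25, 15.00, 14.75, 14.50, and for period B it is 13.75, 13.50, and so on.
What I want is to use these ranges to find the mode (the most frequent value) for each day. So I created this function: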
poc_2 <- function(df){
# create the sequence of each period
x_A <- with(df, seq(low_A, high_A, by = 0.25))
x_B <- with(df, seq(low_B, high_B, by = 0.25))
x_C <- with(df, seq(low_C, high_C, by = 0.25))
x_D <- with(df, seq(low_D, high_D, by = 0.25))
# the ranges have different lengths, so pad them all with NA to the same length
n <- max(length(x_A), length(x_B), length(x_C), length(x_D))
length(x_A) <- n
length(x_B) <- n
length(x_C) <- n
length(x_D) <- n
pf <- cbind(x_A, x_B,x_C, x_D)
xfd <- data.frame(pf)
# I change the format of my data frame so I can calculate the mode of all values
long <- xfd %>% gather(x, value, x_A:x_D)
# delete the NA values that are given by the change of length
long <- na.omit(long)
# mfv() can return several tied modes; keep the last one
return(last(mfv(long$value)))
}
This code works and returns the expected value for a single row:
poc_2(df[59,])
[1] 16.75
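As a sanity check on that value, the four ranges for 2021-07-16 can be expanded by hand and tallied with base R. This is only a sketch: the endpoints are copied from the last row of tail(df), and the names counts, tied_modes and highest_mode are illustrative, not part of the original code.

```r
# Ranges for 2021-07-16, endpoints copied from the last row of tail(df)
x_A <- seq(15.75, 16.75, by = 0.25)
x_B <- seq(14.75, 15.00, by = 0.25)
x_C <- seq(15.75, 16.75, by = 0.25)
x_D <- seq(15.25, 16.75, by = 0.25)

# Count how often each price level appears across the four periods
counts <- table(c(x_A, x_B, x_C, x_D))

# Several levels can share the top count; keep the highest of them
tied_modes <- as.numeric(names(counts)[counts == max(counts)])
highest_mode <- max(tied_modes)

print(counts)
print(highest_mode)
```

The levels from 15.75 to 16.75 each appear in three periods (A, C and D), so they tie for the mode, and the highest of them is the value poc_2 returns for this row.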
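This gives the highest mode of the ranges for the last day. I want to compute this for every row of my data frame. I tried several options that I found.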
df %>% rowwise() %>% mutate(poc = poc_2())
# gives an error:
Error: Problem with `mutate()` input `poc`.
x argument "df" is missing, with no default
i Input `poc` is `poc_2()`.
i The error occurred in row 1.
I also tried:
apply(df, 1, poc_2)
Error in eval(substitute(expr), data, enclos = parent.frame()) :
invalid 'envir' argument of type 'character'
My question is: is there a way to apply this function to all rows of my data frame, so that I can store the daily mode in a new variable?
If you want to use the poc_2 function, you can do it with any of the following options:
library(tidyverse)
#1. sapply split
sapply(split(df, seq(nrow(df))),poc_2)
#2. by
by(df, seq(nrow(df)),poc_2)
#3. tidyverse
df %>% group_split(row_number()) %>%map_dbl(poc_2)
#4. rowwise (cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) is its replacement)
df %>% rowwise() %>% mutate(poc = poc_2(cur_data()))
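A likely reason these options work while apply(df, 1, poc_2) does not: apply() first converts the data frame to a matrix, and because day_month is a Date column the matrix is character, so each row reaches poc_2 as a character vector that with() cannot use. split() and the other options hand over a one-row data frame instead. A minimal base R sketch, using a hypothetical two-row df_small in place of the real data:

```r
# Hypothetical miniature of df: one Date column plus two numeric columns
df_small <- data.frame(
  day_month = as.Date(c("2021-07-15", "2021-07-16")),
  high_A = c(16.50, 16.75),
  low_A = c(15.50, 15.75)
)

# apply() coerces the data frame with as.matrix(); the Date column makes it character
m <- as.matrix(df_small)
matrix_type <- typeof(m)

# Each row handed to the function is therefore a character vector
row_classes <- apply(df_small, 1, function(r) class(r))

# split() keeps every piece as a one-row data frame, which with() accepts
pieces <- split(df_small, seq(nrow(df_small)))
piece_is_df <- vapply(pieces, is.data.frame, logical(1))

print(matrix_type)
print(row_classes)
print(piece_is_df)
```

This matches the "invalid 'envir' argument of type 'character'" message in the question: with() was given a character vector where it expected a data frame or list.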
However, why not use the following Mode function and apply it to each day?
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
df %>%
pivot_longer(cols = -day_month) %>%
group_by(day_month) %>%
summarise(frequent_value = Mode(value))
# day_month frequent_value
# <date> <dbl>
# 1 2021-05-19 2.25
# 2 2021-05-20 2.5
# 3 2021-05-21 2.75
# 4 2021-05-22 3
# 5 2021-05-23 3.25
# 6 2021-05-24 3.5
# 7 2021-05-25 3.75
# 8 2021-05-26 4
# 9 2021-05-27 4.25
#10 2021-05-28 4.5
# … with 49 more rows
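It may help to see what Mode does on a single day. Note that in this approach it is applied to the eight high/low values of a row as they are, not to the expanded 0.25-step ranges that poc_2 builds, so the two can disagree on other data. The sketch below repeats the Mode definition so that it runs on its own; last_row holds the values of the 2021-07-16 row in column order, and the tie example is made up for illustration.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# The eight high/low values of the last row (2021-07-16), in column order
last_row <- c(16.75, 15.75, 15.00, 14.75, 16.75, 15.75, 16.75, 15.25)
mode_last <- Mode(last_row)

# On a tie, which.max() picks the first maximum, so the value seen first wins
mode_tie <- Mode(c(1, 2, 2, 1))

print(mode_last)
print(mode_tie)
```

For the last row, 16.75 appears three times (high_A, high_C, high_D), which is why it is returned. When counts tie, the result depends on the order of the values rather than on their size.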
Or use rowwise:
df %>%
rowwise() %>%
mutate(frequent_value = Mode(c_across(-day_month))) %>%
ungroup()
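If the mode should be taken over the expanded ranges (as poc_2 does) rather than over the eight endpoints, a base R version without modeest or tidyr could look like the sketch below. range_mode and its step argument are hypothetical names introduced here, and the tie rule (take the highest of the tied values) is an assumption based on the "highest mode" wording in the question.

```r
# Rebuild the example data frame from the question
day_month <- seq(as.Date("2021-05-19"), as.Date("2021-07-16"), by = "1 days")
df <- data.frame(
  day_month,
  high_A = seq(2.25, 16.75, by = 0.25), low_A = seq(1.25, 15.75, by = 0.25),
  high_B = seq(0.50, 15.00, by = 0.25), low_B = seq(0.25, 14.75, by = 0.25),
  high_C = seq(2.25, 16.75, by = 0.25), low_C = seq(1.25, 15.75, by = 0.25),
  high_D = seq(2.25, 16.75, by = 0.25), low_D = seq(0.75, 15.25, by = 0.25)
)

# Hypothetical helper: highest most-frequent level across the four ranges of one row
range_mode <- function(row, step = 0.25) {
  periods <- c("A", "B", "C", "D")
  # Expand each period's low..high range into its individual price levels
  values <- unlist(lapply(periods, function(p) {
    seq(row[[paste0("low_", p)]], row[[paste0("high_", p)]], by = step)
  }))
  counts <- table(values)
  # Assumption: when several levels tie, keep the highest one
  max(as.numeric(names(counts)[counts == max(counts)]))
}

# Apply the helper to every row, passing a one-row data frame each time
df$poc <- vapply(seq_len(nrow(df)), function(i) range_mode(df[i, ]), numeric(1))

print(tail(df[, c("day_month", "poc")]))
```

Indexing rows with df[i, ] keeps each piece a data frame, which sidesteps the character coercion that apply() performs.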