R grouped for loop: seq_along length - 1?
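I'm trying to create an indicator for the number of "tries" it takes for an item to be accepted. I think a for loop is the way to go, but I don't have much experience with loops in R and the logic is a bit complicated. Any help/advice/feedback would be appreciated!
In the toy example, "accepted" is "C". The switch that moves "try" forward is a submission (A) that is either reset (B) or accepted (C).
Within a group, if the sequence of events is A > B or A > C, then "try" should move forward by 1. Otherwise, the "try" count should stay the same. Obviously, the "real" example is much more complex than this toy one.
For now, I'm just trying to get the try count right without worrying about grouping.
I'm not sure how to limit seq_along so that it stops at, essentially, [group_by %>% length(group) - 1]. Is there a better option?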
df = data.frame(group = c(1,1,1,1,1,2,2,2,2),
event = c("A","B","A","A","C","A","B","A","C"))
df$try <- 0
for (i in seq_along(df$event)){
if (df$event[[i]] == "A" &
df$event[[i+1]] %in% c("B", "C"))
{
df$try[[i]] <- df$try + 1
} else {
df$try[[i]] <- df$try
}
}
# this essentially shows the correct answer (win = try + 1, loss = try),
# but has "df$event[[i + 1]] : subscript out of bounds",
# and I need to save the outcome so I can access later
df$try <- 0
for (i in seq_along(df$event)){
if (df$event[[i]] == "A" &
df$event[[i+1]] %in% c("B", "C"))
{
print("Win")
} else {
print("Loss")
}
}
My expected (eventual) answer for the toy example is: try = c(1,1,1,2,2,1,1,2,2); groups 1 and 2 each take 2 "tries" to be accepted.
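You can fix the "subscript out of bounds" error by adding an if.
if (i + 1 > nrow(df)) {
  print('do nothing')
} else if (
  # followed by your original code
)
I'm assuming that for the last row the value will just be 0, so one more if should do the trick.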
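library(tidyverse)
df <- data.frame(group = c(1,1,1,1,1,2,2,2,2),
                 event = c("A","B","A","A","C","A","B","A","C"))
# flag is 1 where an "A" is followed by "B" or "C", 0 otherwise
flag <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  if (i + 1 > nrow(df)) {
    # last row: there is no next event, so the flag stays 0
    print('This is the last row')
    flag[i] <- 0L
  } else if (df$event[[i]] == 'A' &&
             df$event[[i + 1]] %in% c('B', 'C')) {
    flag[i] <- 1L
  } else {
    flag[i] <- 0L
  }
}
# group_by() restarts the running count in each group,
# which is what the expected output in the question needs
df2 <- cbind(df, flag) %>%
  group_by(group) %>%
  mutate(cumulative_sum = cumsum(flag)) %>%
  ungroup()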
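On the original question of stopping at length - 1: a minimal sketch, assuming each group has at least one row and no missing events. Looping over seq_len(length(event) - 1) means event[i + 1] always exists, so no guard is needed. count_tries() is a hypothetical helper written for this example, not a function from any package.

```r
# Toy data from the question
df <- data.frame(group = c(1,1,1,1,1,2,2,2,2),
                 event = c("A","B","A","A","C","A","B","A","C"))

# Hypothetical helper: running count of A > B or A > C pairs in one group.
# The loop stops at the second-to-last element, so event[i + 1] is always valid.
count_tries <- function(event) {
  flag <- integer(length(event))  # the last element keeps its 0
  for (i in seq_len(length(event) - 1)) {
    if (event[i] == "A" && event[i + 1] %in% c("B", "C")) {
      flag[i] <- 1L
    }
  }
  cumsum(flag)
}

# Apply the helper to each group, then put the pieces back in row order
df$try <- unsplit(lapply(split(df$event, df$group), count_tries), df$group)
```

On the toy data this should reproduce the try = c(1,1,1,2,2,1,1,2,2) from the question. Splitting first also keeps the last row of one group from being paired with the first row of the next.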
This seems to work for now:
add a "break" once i + 1 reaches the length.
df$try <- 0
for (i in seq_along(df$event)) {
  if (i + 1 == length(df$event)) {  # stops before the final pair is checked
    break
  } else if (df$event[[i]] == "A" &&
             df$event[[i + 1]] %in% c("B", "C")) {
    print("Win")
  } else {
    print("Loss")
  }
}
# updated toy df to show N tries differs:
df = data.frame(group = c(1,1,1,1,1,1,1,2,2,2,2),
event = c("A","B","A","A","B","A","C","A","B","A","C"))
df$try <- 0
for (i in seq_along(df$event)) {
  if (i == length(df$event)) {  # use i, otherwise it doesn't catch the last switch
    break
  } else if (df$event[[i]] == "A" &&
             df$event[[i + 1]] %in% c("B", "C")) {
    df$try[[i]] <- 1  # flag the row that starts a try
  }
}
df %>%
  group_by(group) %>%
  mutate(N_tries = max(cumsum(try)))
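One thing to watch with the ungrouped loop: row i is compared with row i + 1 even when the two rows belong to different groups. The toy data hides this because every group ends in "C". Here is a rough sketch of the problem and one possible guard, using made-up data (df_edge and the naive column are hypothetical, only for illustration).

```r
# Hypothetical data: group 1 ends with "A" and group 2 starts with "B"
df_edge <- data.frame(group = c(1, 1, 2, 2),
                      event = c("A", "A", "B", "C"))

df_edge$naive <- 0  # ignores group boundaries
df_edge$try   <- 0  # only counts pairs that sit inside one group

for (i in seq_len(nrow(df_edge) - 1)) {
  is_switch <- df_edge$event[i] == "A" &&
    df_edge$event[i + 1] %in% c("B", "C")
  same_group <- df_edge$group[i] == df_edge$group[i + 1]

  if (is_switch) df_edge$naive[i] <- 1
  if (is_switch && same_group) df_edge$try[i] <- 1
}

# Per-group totals for each version
tapply(df_edge$naive, df_edge$group, sum)
tapply(df_edge$try, df_edge$group, sum)
```

Without the same_group check, row 2 (the last "A" of group 1) is paired with the "B" that opens group 2 and gets counted as a try for group 1.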
You can use lead from dplyr to get the next value. Try this:
library(dplyr)
df %>%
group_by(group) %>%
mutate(result = cumsum(event == 'A' & lead(event) %in% c('B', 'C'))) %>%
ungroup
# group event try result
# <dbl> <chr> <dbl> <int>
#1 1 A 1 1
#2 1 B 1 1
#3 1 A 1 1
#4 1 A 2 2
#5 1 C 2 2
#6 2 A 1 1
#7 2 B 1 1
#8 2 A 2 2
#9 2 C 2 2
The try column is kept in the output for comparison.
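If you would rather not depend on dplyr, the same idea can probably be written in base R. This is only a sketch: ave() stands in for group_by(), and c(x[-1], NA) plays the role of lead(). The names next_event and is_switch are mine, not from the answer above.

```r
df <- data.frame(group = c(1,1,1,1,1,2,2,2,2),
                 event = c("A","B","A","A","C","A","B","A","C"))

# Event one row later within the same group; NA on each group's last row
next_event <- ave(df$event, df$group, FUN = function(x) c(x[-1], NA))

# TRUE where a submission (A) is followed by a reset (B) or an accept (C)
is_switch <- df$event == "A" & next_event %in% c("B", "C")

# Running count of switches, restarted in each group
df$result <- ave(as.integer(is_switch), df$group, FUN = cumsum)
```

NA %in% c("B", "C") is FALSE, so the last row of each group never counts as a switch, which matches the lead() behaviour used above.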