Change column value based on final condition - but grouped by previous weeks' IDs

Question

I'm trying to figure out how to write some simple code.

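I have a dataset of observations of individuals (small invertebrates) in my experiment over time. It includes the week, each individual's ID #, and the observation of interest (parasite counts). I also have a cumulative total of parasite counts over time, grouped by individual ID, which is the value I actually want for each week.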

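I want to drop the individuals that never produced a parasite-positive sample by the end of the experiment, because they were not successfully infected. My plan is to create a binary indicator column based on each individual ID's final cumulative total (an individual can give a positive sample one week and not the next, so testing whether the cumulative total is 0 is safer). Then I would simply subset the data on that binary column, removing the individuals that were never positive.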
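A minimal sketch of that plan, assuming dplyr and the simplified toy data below (the names df_flagged and df_infected are illustrative). It takes each ID's cumulative total at its latest week with dplyr's last(), copies that one value onto all of the ID's rows as the indicator, and then subsets on it:

```r
library(dplyr)

# Rebuild the simplified data from the question
time <- rep(1:4, each = 4)
ids <- rep(101:104, times = 4)
observations <- rep(c(25, 25, 0, 0), times = 4)
df <- data.frame(time, ids, observations)

df_flagged <- df %>%
  group_by(ids) %>%
  arrange(time, .by_group = TRUE) %>%   # cumsum() needs each ID's rows in week order
  mutate(
    cumtot = cumsum(observations),
    # final cumulative total for this ID, recycled onto every one of its rows
    infected = as.numeric(last(cumtot, order_by = time) > 0)
  ) %>%
  ungroup()

# Subset on the indicator: drops every row of IDs that never shed
df_infected <- df_flagged %>% filter(infected == 1)
```

Note that arrange(.by_group = TRUE) reorders the rows by ID and then week; that does not change which rows are kept.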

A very simplified version of my data frame looks like this:

library(dplyr)

time = c(rep(1,4),rep(2,4),rep(3,4),rep(4,4))
ids = rep(c(101:104),4)
observations = c(rep(c(25,25,0,0),4))
df = data.frame(cbind(time,ids,observations))

df2 = df %>%
  group_by(ids) %>%
  mutate(cumtot = cumsum(observations))
df2

    time   ids observations cumtot
   <dbl> <dbl>        <dbl>  <dbl>
 1     1   101           25     25
 2     1   102           25     25
 3     1   103            0      0
 4     1   104            0      0
 5     2   101           25     50
 6     2   102           25     50
 7     2   103            0      0
 8     2   104            0      0
 9     3   101           25     75
10     3   102           25     75
11     3   103            0      0
12     3   104            0      0
13     4   101           25    100
14     4   102           25    100
15     4   103            0      0
16     4   104            0      0

(I will eventually summarize these data by week and treatment group into means/SEMs.)
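For that later summary step, a minimal sketch assuming dplyr >= 1.0 (for .groups) and the toy data above. The toy data has no treatment column, so the sketch groups by week only; a treatment column (hypothetical here) would go into group_by() alongside time. It drops never-positive IDs by their final cumulative total and computes the SEM as sd / sqrt(n):

```r
library(dplyr)

# Rebuild the simplified data from the question
time <- rep(1:4, each = 4)
ids <- rep(101:104, times = 4)
observations <- rep(c(25, 25, 0, 0), times = 4)
df <- data.frame(time, ids, observations)

weekly_summary <- df %>%
  group_by(ids) %>%
  mutate(cumtot = cumsum(observations)) %>%
  filter(last(cumtot, order_by = time) > 0) %>%  # drop IDs whose final total is 0
  group_by(time) %>%                            # with real data, e.g. group_by(treatment, time)
  summarise(
    n = n(),
    mean_cumtot = mean(cumtot),
    sem_cumtot = sd(cumtot) / sqrt(n),          # SEM = SD / sqrt(sample size)
    .groups = "drop"
  )
```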

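What I've tried so far creates a binary "infected" column, but it only flags individuals whose cumulative total is 0 in week 14. What I want is for the code to apply this binary result to that individual ID in every week (so that I drop the individual from the summarized data for every week). Not sure how to do that...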

# Make a column that indicates if a snail has not shed by experiment end
df_dropped = df2 %>%
  group_by(ids) %>%
  mutate(infected = ifelse(time==max(time)&cumtot==0, 0,1))
df_dropped

    time   ids observations cumtot infected
   <dbl> <dbl>        <dbl>  <dbl>    <dbl>
 1     1   101           25     25        1
 2     1   102           25     25        1
 3     1   103            0      0        1
 4     1   104            0      0        1
 5     2   101           25     50        1
 6     2   102           25     50        1
 7     2   103            0      0        1
 8     2   104            0      0        1
 9     3   101           25     75        1
10     3   102           25     75        1
11     3   103            0      0        1
12     3   104            0      0        1
13     4   101           25    100        1
14     4   102           25    100        1
15     4   103            0      0        0
16     4   104            0      0        0

I'd like the output to be:

    time   ids observations cumtot infected
   <dbl> <dbl>        <dbl>  <dbl>    <dbl>
 1     1   101           25     25        1
 2     1   102           25     25        1
 3     1   103            0      0        0
 4     1   104            0      0        0
 5     2   101           25     50        1
 6     2   102           25     50        1
 7     2   103            0      0        0
 8     2   104            0      0        0
 9     3   101           25     75        1
10     3   102           25     75        1
11     3   103            0      0        0
12     3   104            0      0        0
13     4   101           25    100        1
14     4   102           25    100        1
15     4   103            0      0        0
16     4   104            0      0        0
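One way to get that output with only a small change to the attempt above, sketched under the assumption that each ID has exactly one row per week: the original condition time==max(time)&cumtot==0 is evaluated row by row, so only the last week's rows can match. Indexing cumtot at the maximum week instead gives a single value per ID, and mutate() recycles that value to all of the ID's rows:

```r
library(dplyr)

# Rebuild df2 from the question (rows are already in week order)
time <- rep(1:4, each = 4)
ids <- rep(101:104, times = 4)
observations <- rep(c(25, 25, 0, 0), times = 4)
df2 <- data.frame(time, ids, observations) %>%
  group_by(ids) %>%
  mutate(cumtot = cumsum(observations))

df_dropped <- df2 %>%
  group_by(ids) %>%
  # cumtot[time == max(time)] is one value per ID (assumes one row per ID per week),
  # so ifelse() returns a single 0/1 that mutate() recycles to every row of that ID
  mutate(infected = ifelse(cumtot[time == max(time)] == 0, 0, 1))
```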

Thanks.

Answer 1

You can simply use any():

library(tidyverse)

df_dropped <- df2 %>%
  group_by(ids) %>%
  mutate(infected = as.numeric(any(observations > 0)))

df_dropped
#> # A tibble: 16 x 5
#> # Groups:   ids [4]
#>     time   ids observations cumtot infected
#>    <dbl> <dbl>        <dbl>  <dbl>    <dbl>
#>  1     1   101           25     25        1
#>  2     1   102           25     25        1
#>  3     1   103            0      0        0
#>  4     1   104            0      0        0
#>  5     2   101           25     50        1
#>  6     2   102           25     50        1
#>  7     2   103            0      0        0
#>  8     2   104            0      0        0
#>  9     3   101           25     75        1
#> 10     3   102           25     75        1
#> 11     3   103            0      0        0
#> 12     3   104            0      0        0
#> 13     4   101           25    100        1
#> 14     4   102           25    100        1
#> 15     4   103            0      0        0
#> 16     4   104            0      0        0

Created on 2022-02-28 by the reprex package (v2.0.1)
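If the indicator column is only needed for subsetting, the same any() test can be passed straight to filter(). A minimal sketch assuming dplyr and the toy data from the question (df_kept is an illustrative name). Inside group_by(ids), any() yields one TRUE/FALSE per individual, so all of that individual's weeks are kept or dropped together. With non-negative counts this matches the "final cumulative total > 0" rule from the question:

```r
library(dplyr)

# Rebuild the simplified data from the question
time <- rep(1:4, each = 4)
ids <- rep(101:104, times = 4)
observations <- rep(c(25, 25, 0, 0), times = 4)
df <- data.frame(time, ids, observations)

# One TRUE/FALSE per ID, applied to every row of that ID
df_kept <- df %>%
  group_by(ids) %>%
  filter(any(observations > 0)) %>%
  ungroup()
```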

r

tidyverse