Calculate a percentage from the start of an intervention in R

Question

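I have a data frame containing the vertical-jump power of a group of athletes over an 8-week period. One athlete completed only seven weeks. I want to compute a new variable called "percent.change" that gives the percent difference between each week of the study period and week 1. I have been trying to do this with dplyr, but I'm stuck. Does anyone have a straightforward solution?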
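For reference, the quantity described here can be written as below, using $P_{a,w}$ as illustrative notation for athlete $a$'s power in week $w$. Several answers report the index $100 \cdot P_{a,w} / P_{a,1}$ instead (100 means no change); subtracting 100 from that index gives this percent change. The worked line uses athlete E's first two weeks from the sample data.

```latex
\text{percent.change}_{a,w} = 100 \times \frac{P_{a,w} - P_{a,1}}{P_{a,1}}
\qquad\text{e.g.}\qquad
100 \times \frac{25.54569 - 25.20015}{25.20015} \approx 1.37
```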

The data frame is called weeklyPower. A sample of weeklyPower looks like this:

athlt week     power
E      1       25.20015
E      2       25.54569
E      3       24.52463
E      4       24.88044
E      5       25.11421
E      6       25.86154
E      7       26.08613
E      8       25.90775
K      1       29.74277
K      2       28.80131
K      3       28.96818
K      4       29.62439
K      5       29.98119
K      6       29.11570
K      7       29.96380
T      1       25.02413
T      2       23.75867
T      3       25.25716
T      4       24.73285
T      5       27.02891
T      6       25.60140
T      7       25.64665
T      8       24.38937

Many thanks for any ideas.

Matt
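A minimal sketch that rebuilds the sample above as an R data frame, so the answers can be tried directly. The values are copied from the table; the column types (character `athlt`, integer `week`, numeric `power`) are an assumption about the original data.

```r
# Rebuild the sample data from the question (values copied from the table).
weeklyPower <- data.frame(
  athlt = rep(c("E", "K", "T"), times = c(8, 7, 8)),  # athlete K has only 7 weeks
  week  = c(1:8, 1:7, 1:8),
  power = c(25.20015, 25.54569, 24.52463, 24.88044, 25.11421, 25.86154, 26.08613, 25.90775,
            29.74277, 28.80131, 28.96818, 29.62439, 29.98119, 29.11570, 29.96380,
            25.02413, 23.75867, 25.25716, 24.73285, 27.02891, 25.60140, 25.64665, 24.38937)
)

str(weeklyPower)
```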

Answer 1

> Data$percent_change <-  unlist(
+     tapply(Data$power, Data$athlt, function(x) c(NA, 100*x[-1]/x[1]) )
+ )
> Data
   athlt week    power percent_change
1      E    1 25.20015             NA
2      E    2 25.54569      101.37118
3      E    3 24.52463       97.31938
4      E    4 24.88044       98.73132
5      E    5 25.11421       99.65897
6      E    6 25.86154      102.62455
7      E    7 26.08613      103.51577
8      E    8 25.90775      102.80792
9      K    1 29.74277             NA
10     K    2 28.80131       96.83466
11     K    3 28.96818       97.39570
12     K    4 29.62439       99.60199
13     K    5 29.98119      100.80161
14     K    6 29.11570       97.89169
15     K    7 29.96380      100.74314
16     T    1 25.02413             NA
17     T    2 23.75867       94.94304
18     T    3 25.25716      100.93122
19     T    4 24.73285       98.83600
20     T    5 27.02891      108.01139
21     T    6 25.60140      102.30685
22     T    7 25.64665      102.48768
23     T    8 24.38937       97.46341
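Note that `tapply()` returns one result per athlete in the sorted order of the `athlt` values and `unlist()` simply concatenates them, so this assignment lines up correctly only when the rows are already sorted by athlete and then by week. It also reports `100 * power / baseline` with week 1 left as `NA`, rather than a percent difference. A base R sketch that instead looks up each athlete's week-1 value with `match()`, so row order does not matter, and returns the percent change (0 in week 1) might look like this. The helper name `pct_from_week1` is hypothetical, and it assumes each athlete has exactly one row with `week == 1`.

```r
# Sample data from the question.
weeklyPower <- data.frame(
  athlt = rep(c("E", "K", "T"), times = c(8, 7, 8)),
  week  = c(1:8, 1:7, 1:8),
  power = c(25.20015, 25.54569, 24.52463, 24.88044, 25.11421, 25.86154, 26.08613, 25.90775,
            29.74277, 28.80131, 28.96818, 29.62439, 29.98119, 29.11570, 29.96380,
            25.02413, 23.75867, 25.25716, 24.73285, 27.02891, 25.60140, 25.64665, 24.38937)
)

# Hypothetical helper: percent change of each row's power relative to that
# athlete's week-1 power. Assumes exactly one week-1 row per athlete.
pct_from_week1 <- function(d) {
  is_base  <- d$week == 1
  # For every row, find its athlete among the week-1 rows and take that power.
  baseline <- d$power[is_base][match(d$athlt, d$athlt[is_base])]
  100 * (d$power / baseline - 1)
}

weeklyPower$percent.change <- pct_from_week1(weeklyPower)
head(weeklyPower)
```

Because the baseline is found by lookup rather than by position, shuffling the rows before calling `pct_from_week1()` gives the same value for every row.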

Answer 2

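With dplyr you could do:

library(dplyr)

df %>% 
  group_by(athlt) %>% 
  arrange(week) %>%                         # within each athlete, rows end up in week order
  mutate(cp = power / first(power) * 100)   # 100 = same power as in week 1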

Which gives:

#Source: local data frame [23 x 4]
#Groups: athlt

#   athlt week    power        cp
#1      E    1 25.20015 100.00000
#2      E    2 25.54569 101.37118
#3      E    3 24.52463  97.31938
#4      E    4 24.88044  98.73132
#5      E    5 25.11421  99.65897
#6      E    6 25.86154 102.62455
#7      E    7 26.08613 103.51577
#8      E    8 25.90775 102.80792
#9      K    1 29.74277 100.00000
#10     K    2 28.80131  96.83466

Or, another option:

df %>% 
  group_by(athlt) %>% 
  mutate(cp = power / power[which.min(week)] * 100)
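In current dplyr versions `arrange()` ignores the grouping unless `.by_group = TRUE` is passed; the first snippet still works because sorting every row by `week` leaves each athlete's rows in week order. A sketch that sorts within athletes explicitly and reports the percent change itself rather than the index, assuming dplyr is installed and that each athlete's earliest week is the baseline:

```r
library(dplyr)

# Sample data from the question.
weeklyPower <- data.frame(
  athlt = rep(c("E", "K", "T"), times = c(8, 7, 8)),
  week  = c(1:8, 1:7, 1:8),
  power = c(25.20015, 25.54569, 24.52463, 24.88044, 25.11421, 25.86154, 26.08613, 25.90775,
            29.74277, 28.80131, 28.96818, 29.62439, 29.98119, 29.11570, 29.96380,
            25.02413, 23.75867, 25.25716, 24.73285, 27.02891, 25.60140, 25.64665, 24.38937)
)

result <- weeklyPower %>%
  group_by(athlt) %>%
  arrange(week, .by_group = TRUE) %>%                   # sort by athlete, then by week
  mutate(percent.change = (power / first(power) - 1) * 100) %>%  # 0 in the first week
  ungroup()

print(result, n = 23)
```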

Answer 3

Here is a similar option with data.table: we convert the data.frame to a data.table (setDT(df)), and for each "athlt" group, ordered by "week", we create the column 'cp' by fast assignment by reference (:=).

library(data.table)
setDT(df)[order(week), cp := power/power[1L]*100 ,by=athlt]
#   athlt week    power        cp
#1:     E    1 25.20015 100.00000
#2:     E    2 25.54569 101.37118
#3:     E    3 24.52463  97.31938
#4:     E    4 24.88044  98.73132
#5:     E    5 25.11421  99.65897
#6:     E    6 25.86154 102.62455

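We could also use setorder, which is usually memory-efficient because it reorders the dataset by reference. In this case, however (as @Arun mentioned in the comments), the approach above is also very efficient, because order only computes the row indices instead of reordering the whole dataset.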

 setorder(setDT(df),athlt,week)[, cp:= power/power[1L] *100, athlt][]

Or, if "week" is numeric, you can use which.min instead of order:

 setDT(df)[, cp := power/power[which.min(week)]*100, by=athlt]

Answer 4

Sorry for the late answer, but I found this useful, especially when working with a reference period. Hope this helps.

library(dplyr)
library(tidyr)  # for replace_na()

df %>%
  group_by(athlt) %>%
  arrange(athlt, week) %>%  # sort so each athlete's weeks form an ordered series
  mutate(wk.growth  = round(power / lag(power, 1) * 100, 1),  # index relative to the previous week
         ref.growth = round(power / first(power) * 100, 1),   # index relative to the reference (first) week
         n.week     = length(athlt)) %>%                      # number of weeks recorded for this athlete
  replace_na(list(wk.growth = 100)) %>%                       # first week has no previous week: set its index to 100
  mutate(d.growth     = wk.growth - first(wk.growth),         # percent change versus the previous week
         d.ref.growth = ref.growth - first(ref.growth))       # percent change versus the reference week
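In this answer `wk.growth` compares each week with the previous one and `ref.growth` compares it with the first week, both as indices where 100 means no change, so `d.growth` and `d.ref.growth` end up as the corresponding percent changes (rounded to one decimal place). For comparison, a base R sketch of the unrounded week-over-week percent change, using `ave()` to apply a lag within each athlete. Here the first week has no previous week, so it is left as `NA` rather than set to 100.

```r
# Sample data from the question.
weeklyPower <- data.frame(
  athlt = rep(c("E", "K", "T"), times = c(8, 7, 8)),
  week  = c(1:8, 1:7, 1:8),
  power = c(25.20015, 25.54569, 24.52463, 24.88044, 25.11421, 25.86154, 26.08613, 25.90775,
            29.74277, 28.80131, 28.96818, 29.62439, 29.98119, 29.11570, 29.96380,
            25.02413, 23.75867, 25.25716, 24.73285, 27.02891, 25.60140, 25.64665, 24.38937)
)

# Sort by athlete, then week, so "previous row" means "previous week".
weeklyPower <- weeklyPower[order(weeklyPower$athlt, weeklyPower$week), ]

# Within each athlete, divide by the previous week's power (a one-step lag).
weeklyPower$wk.change <- ave(weeklyPower$power, weeklyPower$athlt,
                             FUN = function(x) 100 * (x / c(NA, head(x, -1)) - 1))

weeklyPower
```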
