Calculate a percentage from the start of an intervention in R
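I have a data frame containing the vertical jump power of a group of athletes over an 8-week period. One athlete completed only seven weeks. I want to compute a new variable called "percent.change" that gives, for each week of the study, the percent difference from the first week. I have been trying to solve this with dplyr, but I am stuck. Does anyone have a straightforward solution?
The data frame is called weeklyPower. A sample of weeklyPower is below: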
athlt week power
E 1 25.20015
E 2 25.54569
E 3 24.52463
E 4 24.88044
E 5 25.11421
E 6 25.86154
E 7 26.08613
E 8 25.90775
K 1 29.74277
K 2 28.80131
K 3 28.96818
K 4 29.62439
K 5 29.98119
K 6 29.11570
K 7 29.96380
T 1 25.02413
T 2 23.75867
T 3 25.25716
T 4 24.73285
T 5 27.02891
T 6 25.60140
T 7 25.64665
T 8 24.38937
Thank you very much for any ideas.
Matt
> Data$percent_change <- unlist(
+ tapply(Data$power, Data$athlt, function(x) c(NA, 100*x[-1]/x[1]) )
+ )
> Data
athlt week power percent_change
1 E 1 25.20015 NA
2 E 2 25.54569 101.37118
3 E 3 24.52463 97.31938
4 E 4 24.88044 98.73132
5 E 5 25.11421 99.65897
6 E 6 25.86154 102.62455
7 E 7 26.08613 103.51577
8 E 8 25.90775 102.80792
9 K 1 29.74277 NA
10 K 2 28.80131 96.83466
11 K 3 28.96818 97.39570
12 K 4 29.62439 99.60199
13 K 5 29.98119 100.80161
14 K 6 29.11570 97.89169
15 K 7 29.96380 100.74314
16 T 1 25.02413 NA
17 T 2 23.75867 94.94304
18 T 3 25.25716 100.93122
19 T 4 24.73285 98.83600
20 T 5 27.02891 108.01139
21 T 6 25.60140 102.30685
22 T 7 25.64665 102.48768
23 T 8 24.38937 97.46341
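The tapply() call above relies on the rows already being sorted by athlete and by week within each athlete, and it returns each week as a percentage of week 1 rather than as a difference from it. As a minimal base R sketch of the percent difference the question asks for, assuming the baseline is the row with the smallest week for each athlete (the names `baseline` and `percent.change` are illustrative, and the data frame is rebuilt from the sample in the question):

```r
# Sample data reconstructed from the question (athlete K has only 7 weeks).
weeklyPower <- data.frame(
  athlt = rep(c("E", "K", "T"), times = c(8, 7, 8)),
  week  = c(1:8, 1:7, 1:8),
  power = c(25.20015, 25.54569, 24.52463, 24.88044, 25.11421, 25.86154, 26.08613, 25.90775,
            29.74277, 28.80131, 28.96818, 29.62439, 29.98119, 29.11570, 29.96380,
            25.02413, 23.75867, 25.25716, 24.73285, 27.02891, 25.60140, 25.64665, 24.38937)
)

# For every row, find the row index of that athlete's earliest week.
# The lookup is by value of `week`, so it does not depend on row order.
baseline <- with(
  weeklyPower,
  ave(seq_along(power), athlt, FUN = function(i) i[which.min(week[i])])
)

# Percent difference from the baseline week: 0 for week 1, positive for gains.
weeklyPower$percent.change <-
  (weeklyPower$power / weeklyPower$power[baseline] - 1) * 100
```

With this definition week 1 is 0 for every athlete, and each other value is the corresponding figure in the output above minus 100 (for example, about 1.37 for athlete E in week 2).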
Using dplyr, you could do:
library(dplyr)

df %>%
  group_by(athlt) %>%
  arrange(athlt, week) %>%  # sort by athlete, then week, so first() is the earliest week
  mutate(cp = power / first(power) * 100)
This gives:
#Source: local data frame [23 x 4]
#Groups: athlt
# athlt week power cp
#1 E 1 25.20015 100.00000
#2 E 2 25.54569 101.37118
#3 E 3 24.52463 97.31938
#4 E 4 24.88044 98.73132
#5 E 5 25.11421 99.65897
#6 E 6 25.86154 102.62455
#7 E 7 26.08613 103.51577
#8 E 8 25.90775 102.80792
#9 K 1 29.74277 100.00000
#10 K 2 28.80131 96.83466
Or another option:
df %>%
  group_by(athlt) %>%
  mutate(cp = power / power[which.min(week)] * 100)
Here is a similar option with data.table. We convert the data.frame to a data.table (setDT(df)), order by "week", and, grouped by "athlt", create the column 'cp' by fast assignment by reference (:=).
library(data.table)
setDT(df)[order(week), cp := power/power[1L]*100 ,by=athlt]
# athlt week power cp
#1: E 1 25.20015 100.00000
#2: E 2 25.54569 101.37118
#3: E 3 24.52463 97.31938
#4: E 4 24.88044 98.73132
#5: E 5 25.11421 99.65897
#6: E 6 25.86154 102.62455
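We can also use setorder, which is generally memory efficient because it reorders the dataset by reference. In this case, however (as @Arun mentioned in the comments), the approach above is also very efficient, because order only computes the row indices instead of reordering the whole dataset.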
setorder(setDT(df),athlt,week)[, cp:= power/power[1L] *100, athlt][]
Or, if "week" is numeric, you can use which.min instead of order:
setDT(df)[, cp := power/power[which.min(week)]*100, by=athlt]
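The difference between ordering in `i` and calling setorder can be seen on a few deliberately shuffled rows. This is only a sketch: the toy table `dt` and the flag `row_order_kept` are illustrative names, and the power values are taken from the sample data in the question.

```r
library(data.table)

# Deliberately unsorted toy rows (values from the question's sample data)
dt <- data.table(
  athlt = c("E", "K", "E", "K", "E"),
  week  = c(3, 2, 1, 1, 2),
  power = c(24.52463, 28.80131, 25.20015, 29.74277, 25.54569)
)

# order() in i only computes a row index; within each athlete power[1L] is
# then the earliest week, and := writes cp back to the original rows.
dt[order(week), cp := power / power[1L] * 100, by = athlt]
row_order_kept <- identical(dt$week, c(3, 2, 1, 1, 2))

# setorder() physically reorders dt by reference (no copy is made).
setorder(dt, athlt, week)
```

After the first call the rows of `dt` are still in their shuffled order; only the later setorder call changes the row order.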
Sorry for the late answer, but I found this useful, especially when working with a reference period. Hope this helps.
library(dplyr)
library(tidyr)  # replace_na() comes from tidyr

df %>%
  group_by(athlt) %>%
  arrange(athlt, week) %>%  # sort so each athlete's weeks are in chronological order
  mutate(wk.growth = round(power / lag(power, 1) * 100, 1),  # power as a percentage of the previous week
         ref.growth = round(power / first(power) * 100, 1),  # power as a percentage of the reference (first) week
         n.week = length(athlt)) %>%                          # number of weeks recorded for the athlete
  replace_na(list(wk.growth = 100)) %>%  # the first week has no previous week; set it to 100
  mutate(d.growth = wk.growth - first(wk.growth),        # week-over-week change, in percentage points
         d.ref.growth = ref.growth - first(ref.growth))  # change from the reference week, in percentage points
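To make the two growth columns concrete, here is a small sketch on the first three weeks of athlete E from the question. The objects `toy` and `out` are illustrative names, and only the `wk.growth` and `ref.growth` steps of the pipeline above are reproduced.

```r
library(dplyr)

# First three weeks of athlete E, taken from the question's sample data
toy <- data.frame(
  athlt = "E",
  week  = 1:3,
  power = c(25.20015, 25.54569, 24.52463)
)

out <- toy %>%
  group_by(athlt) %>%
  arrange(week, .by_group = TRUE) %>%
  mutate(
    wk.growth  = round(power / lag(power) * 100, 1),   # relative to the previous week
    ref.growth = round(power / first(power) * 100, 1)  # relative to the first week
  ) %>%
  ungroup()

# wk.growth:  NA, 101.4, 96.0  (week 3 is compared with week 2)
# ref.growth: 100.0, 101.4, 97.3  (week 3 is compared with week 1)
```

The two columns agree for week 2 and diverge from week 3 onward, because `wk.growth` changes its denominator every week while `ref.growth` always divides by week 1.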