For loop for filling specific cells in a data frame (large dataset)
version  R version 4.0.5 (2021-03-31)
os       Windows 10 x64
system   x86_64, mingw32
ui       RStudio
language (EN)
collate  English_United Kingdom.1252
ctype    English_United Kingdom.1252
tz       Europe/London
date     2021-08-08
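Hi all,
I am trying to compute some variables in a data frame imported from Excel, but I am missing something that I can't seem to find. I guess this is a very specific case, because searching for "for loop" in YouTube tutorials, Stack Overflow posts and Google has not helped so far. Posting here is therefore my last resort for finding a solution from more experienced programmers.
I have a dataset with 3192 rows and multiple columns: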
> summary(twente_1)
Player.Number Playing.Position Date Week Training.type Total.Distance
Min. : 1.00 Length:3192 Length:3192 Min. : 1.0 Length:3192 Min. : 0
1st Qu.: 3.75 Class :character Class :character 1st Qu.:10.0 Class :character 1st Qu.: 0
Median : 7.00 Mode :character Mode :character Median :19.5 Mode :character Median : 3669
Mean : 9.25 Mean :19.5 Mean : 3757
3rd Qu.:15.50 3rd Qu.:29.0 3rd Qu.: 5500
Max. :19.00 Max. :38.0 Max. :19226
NA's :3180 NA's :2736
HSR SD High.Intensity.Actions..acc.dec. Player.Load SUM.Weekly.Total.Distance
Min. : 0 Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0
1st Qu.: 0 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.:21304
Median : 22 Median : 0.00 Median :12.00 Median : 89.0 Median :27969
Mean : 123 Mean : 23.21 Mean :15.18 Mean : 191.2 Mean :26298
3rd Qu.: 168 3rd Qu.: 20.00 3rd Qu.:24.00 3rd Qu.: 240.0 3rd Qu.:32727
Max. :1590 Max. :475.00 Max. :90.00 Max. :1777.0 Max. :50194
NA's :2736
SUM.HSR SUM.SD SUM.ACC.DEC SUM.Player.Load Daily.Mean St.Deviation
Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0 Mode:logical Mode:logical
1st Qu.: 552.0 1st Qu.: 57.25 1st Qu.: 67.0 1st Qu.: 876 NA's:3192 NA's:3192
Median : 843.5 Median :142.00 Median :104.0 Median :1318
Mean : 861.0 Mean :162.50 Mean :106.3 Mean :1339
3rd Qu.:1164.2 3rd Qu.:235.00 3rd Qu.:147.0 3rd Qu.:1799
Max. :3504.0 Max. :711.00 Max. :259.0 Max. :3373
NA's :2736 NA's :2736 NA's :2736 NA's :2736
Monotony.Total.Distance Monotony.HSR Monotony.SD Monotony.High.Intensity.Actions Monotony.Player.Load
Mode:logical Mode:logical Mode:logical Mode:logical Mode:logical
NA's:3192 NA's:3192 NA's:3192 NA's:3192 NA's:3192
Strain.Total.Distance Strain.HSR Strain.SD Strain.High.Intensity.Actions Strain.Player.Load
Mode:logical Mode:logical Mode:logical Mode:logical Mode:logical
NA's:3192 NA's:3192 NA's:3192 NA's:3192 NA's:3192
> head(twente_1)
Player.Number Playing.Position Date Week Training.type Total.Distance HSR SD
1 1 ED 11/08/2018 1 'OFF' 0 0 0
2 NA 12/08/2018 NA 'OFF' 0 0 0
3 NA 13/08/2018 NA 'TT' 4599 72 0
4 NA 14/08/2018 NA 'TT' 6328 213 104
5 NA 15/08/2018 NA 'TT' 5522 264 22
6 NA 16/08/2018 NA 'TT' 2873 14 0
High.Intensity.Actions..acc.dec. Player.Load SUM.Weekly.Total.Distance SUM.HSR SUM.SD SUM.ACC.DEC
1 0 0 31953 1205 298 113
2 0 0 NA NA NA NA
3 16 141 NA NA NA NA
4 25 362 NA NA NA NA
5 15 283 NA NA NA NA
6 16 66 NA NA NA NA
SUM.Player.Load Daily.Mean St.Deviation Monotony.Total.Distance Monotony.HSR Monotony.SD
1 1843 NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA NA NA NA NA NA
4 NA NA NA NA NA NA
5 NA NA NA NA NA NA
6 NA NA NA NA NA NA
Monotony.High.Intensity.Actions Monotony.Player.Load Strain.Total.Distance Strain.HSR Strain.SD
1 NA NA NA NA NA
2 NA NA NA NA NA
3 NA NA NA NA NA
4 NA NA NA NA NA
5 NA NA NA NA NA
6 NA NA NA NA NA
Strain.High.Intensity.Actions Strain.Player.Load player_load_sd
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 NA NA NA
5 NA NA NA
6 NA NA NA
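I want to compute some new variables and store them in specific cells. For example, I want the standard deviation for each week (3192 rows in total, i.e. 456 weeks of 7 days).
I "came up with" code that does this manually: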
twente_1$player_load_sd[1] = sd(twente_1$Player.Load[1:7])
twente_1$player_load_sd[2] = sd(twente_1$Player.Load[8:14])
twente_1$player_load_sd[3] = sd(twente_1$Player.Load[15:21])
twente_1$player_load_sd[4] = sd(twente_1$Player.Load[22:28])
twente_1$player_load_sd[5] = sd(twente_1$Player.Load[29:35])
twente_1$player_load_sd[6] = sd(twente_1$Player.Load[36:42])
twente_1$player_load_sd[7] = sd(twente_1$Player.Load[43:49])
twente_1$player_load_sd[8] = sd(twente_1$Player.Load[50:56])
twente_1$player_load_sd[9] = sd(twente_1$Player.Load[57:63])
twente_1$player_load_sd[10] = sd(twente_1$Player.Load[64:70])
I am sure I can do this with a "for loop", but I can't get it to work. I tried the code below, but it gives me NAs:
x <- 1
y <- 7
for (i in 1:456) {
  twente_1$player_load_sd[i] = sd(twente_1$Player.Load[x:y])
  x <- x+7
  y <- y+7
}
Thank you in advance for your time and help.
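Instead of a for loop, I would 1) create a week variable, 2) group the dataset by week, and 3) compute the weekly SD on the grouped dataset. Here is what that looks like:
Here is an example dataset with 10 weeks of data. (Install the tidyverse library first if you haven't already.)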
library(tidyverse)
df <- tibble(
  day = 1:70,
  x = runif(70, 0, 100)
)
First, let's create a week variable by splitting the rows into groups of 7.
df <-
  df %>%
  mutate(
    week = rep(1:(nrow(df)/7), each = 7)
  )
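One caveat: `rep(1:(nrow(df)/7), each = 7)` relies on the row count being an exact multiple of 7. If it is not, the generated vector is shorter than the data and `mutate()` should complain about the size mismatch. A minimal base R sketch of an alternative that tolerates a partial final week, using integer division on the row position (`n_rows` and `week_index` are illustrative names, not part of the original data):

```r
# Illustrative only: 17 rows, i.e. two full weeks plus a partial third week.
n_rows <- 17

# Integer-divide the zero-based row position by 7, then shift to start at 1:
# rows 1-7 -> week 1, rows 8-14 -> week 2, rows 15-17 -> week 3.
week_index <- (seq_len(n_rows) - 1) %/% 7 + 1

# Count how many rows landed in each week.
table(week_index)
```

Inside a dplyr pipeline the same idea could probably be written as `mutate(week = (row_number() - 1) %/% 7 + 1)`.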
Next, group the dataset by week and compute the standard deviation of x. Don't forget to ungroup at the end!
df <-
  df %>%
  group_by(week) %>%
  mutate(week_sd = sd(x)) %>%
  ungroup()
We can look at the first 14 days (i.e. two weeks) to check that the weekly SD is stored in every row.
head(df, 14)
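Beyond eyeballing the output, one way to convince yourself that the grouped result matches the manual approach from the question is to compare a single week against a hand-sliced `sd()`. This is only a sketch: it rebuilds the example data from scratch, and the `set.seed()` call is an addition so the random numbers are reproducible.

```r
library(dplyr)

set.seed(1)  # added only so the example is reproducible
df <- tibble(day = 1:70, x = runif(70, 0, 100))

df <- df %>%
  mutate(week = rep(1:(nrow(df) / 7), each = 7)) %>%  # 10 weeks of 7 rows
  group_by(week) %>%
  mutate(week_sd = sd(x)) %>%                         # SD repeated on each row
  ungroup()

# The grouped SD stored on row 8 (week 2) should equal the manual slice 8:14.
all.equal(df$week_sd[8], sd(df$x[8:14]))
```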
If you would rather have a new dataset with one row per week, you can group and summarise instead:
df_week <-
  df %>%
  group_by(week) %>%
  summarize(week_sd = sd(x)) %>%
  ungroup()
df_week
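For completeness, if you do want to keep a for loop, it can be written so that the row range is derived from the loop index and the results go into their own vector with one entry per week. This is a sketch under assumptions: `twente_1` below is a made-up stand-in containing only a `Player.Load` column, and `n_weeks`, `weekly_sd` and `rows` are illustrative names.

```r
# Made-up stand-in for the real data: 3 weeks of 7 daily values.
set.seed(42)
twente_1 <- data.frame(Player.Load = round(runif(21, 0, 500)))

n_weeks <- nrow(twente_1) %/% 7   # number of complete 7-day weeks
weekly_sd <- numeric(n_weeks)     # pre-allocate one slot per week

for (i in seq_len(n_weeks)) {
  rows <- ((i - 1) * 7 + 1):(i * 7)   # rows 1-7, 8-14, 15-21, ...
  weekly_sd[i] <- sd(twente_1$Player.Load[rows])
}

# Store each weekly value on all 7 rows of its week.
# This assumes the row count is an exact multiple of 7.
twente_1$player_load_sd <- rep(weekly_sd, each = 7)
```

Keeping `weekly_sd` as a separate vector avoids mixing a per-week result into the first rows of a per-day table, which is what the grouped approach above also achieves.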