For loop for filling specific cells in a data frame (large dataset)

version  R version 4.0.5 (2021-03-31)
os       Windows 10 x64
system   x86_64, mingw32
ui       RStudio
language (EN)
collate  English_United Kingdom.1252
ctype    English_United Kingdom.1252
tz       Europe/London
date     2021-08-08

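Hi all,

I'm trying to compute some variables in a data frame imported from Excel, but I'm missing something that I can't seem to find. I suppose it is a very specific case, because searching for "for loop" in YouTube tutorials, Stack Overflow posts, and Google in general has not helped so far. So posting here is my last resort for finding a solution from more experienced programmers.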

I have a dataset with 3192 rows and many columns:

> summary(twente_1)
 Player.Number   Playing.Position       Date                Week      Training.type      Total.Distance 
 Min.   : 1.00   Length:3192        Length:3192        Min.   : 1.0   Length:3192        Min.   :    0  
 1st Qu.: 3.75   Class :character   Class :character   1st Qu.:10.0   Class :character   1st Qu.:    0  
 Median : 7.00   Mode  :character   Mode  :character   Median :19.5   Mode  :character   Median : 3669  
 Mean   : 9.25                                         Mean   :19.5                      Mean   : 3757  
 3rd Qu.:15.50                                         3rd Qu.:29.0                      3rd Qu.: 5500  
 Max.   :19.00                                         Max.   :38.0                      Max.   :19226  
 NA's   :3180                                          NA's   :2736                                     
      HSR             SD         High.Intensity.Actions..acc.dec.  Player.Load     SUM.Weekly.Total.Distance
 Min.   :   0   Min.   :  0.00   Min.   : 0.00                    Min.   :   0.0   Min.   :    0            
 1st Qu.:   0   1st Qu.:  0.00   1st Qu.: 0.00                    1st Qu.:   0.0   1st Qu.:21304            
 Median :  22   Median :  0.00   Median :12.00                    Median :  89.0   Median :27969            
 Mean   : 123   Mean   : 23.21   Mean   :15.18                    Mean   : 191.2   Mean   :26298            
 3rd Qu.: 168   3rd Qu.: 20.00   3rd Qu.:24.00                    3rd Qu.: 240.0   3rd Qu.:32727            
 Max.   :1590   Max.   :475.00   Max.   :90.00                    Max.   :1777.0   Max.   :50194            
                                                                                   NA's   :2736             
    SUM.HSR           SUM.SD        SUM.ACC.DEC    SUM.Player.Load Daily.Mean     St.Deviation  
 Min.   :   0.0   Min.   :  0.00   Min.   :  0.0   Min.   :   0    Mode:logical   Mode:logical  
 1st Qu.: 552.0   1st Qu.: 57.25   1st Qu.: 67.0   1st Qu.: 876    NA's:3192      NA's:3192     
 Median : 843.5   Median :142.00   Median :104.0   Median :1318                                 
 Mean   : 861.0   Mean   :162.50   Mean   :106.3   Mean   :1339                                 
 3rd Qu.:1164.2   3rd Qu.:235.00   3rd Qu.:147.0   3rd Qu.:1799                                 
 Max.   :3504.0   Max.   :711.00   Max.   :259.0   Max.   :3373                                 
 NA's   :2736     NA's   :2736     NA's   :2736    NA's   :2736                                 
 Monotony.Total.Distance Monotony.HSR   Monotony.SD    Monotony.High.Intensity.Actions Monotony.Player.Load
 Mode:logical            Mode:logical   Mode:logical   Mode:logical                    Mode:logical        
 NA's:3192               NA's:3192      NA's:3192      NA's:3192                       NA's:3192           
                                                                                                           
                                                                                                           
                                                                                                           
                                                                                                           
                                                                                                           
 Strain.Total.Distance Strain.HSR     Strain.SD      Strain.High.Intensity.Actions Strain.Player.Load
 Mode:logical          Mode:logical   Mode:logical   Mode:logical                  Mode:logical      
 NA's:3192             NA's:3192      NA's:3192      NA's:3192                     NA's:3192
> head(twente_1)
  Player.Number Playing.Position       Date Week Training.type Total.Distance HSR  SD
1             1               ED 11/08/2018    1         'OFF'              0   0   0
2            NA                  12/08/2018   NA         'OFF'              0   0   0
3            NA                  13/08/2018   NA          'TT'           4599  72   0
4            NA                  14/08/2018   NA          'TT'           6328 213 104
5            NA                  15/08/2018   NA          'TT'           5522 264  22
6            NA                  16/08/2018   NA          'TT'           2873  14   0
  High.Intensity.Actions..acc.dec. Player.Load SUM.Weekly.Total.Distance SUM.HSR SUM.SD SUM.ACC.DEC
1                                0           0                     31953    1205    298         113
2                                0           0                        NA      NA     NA          NA
3                               16         141                        NA      NA     NA          NA
4                               25         362                        NA      NA     NA          NA
5                               15         283                        NA      NA     NA          NA
6                               16          66                        NA      NA     NA          NA
  SUM.Player.Load Daily.Mean St.Deviation Monotony.Total.Distance Monotony.HSR Monotony.SD
1            1843         NA           NA                      NA           NA          NA
2              NA         NA           NA                      NA           NA          NA
3              NA         NA           NA                      NA           NA          NA
4              NA         NA           NA                      NA           NA          NA
5              NA         NA           NA                      NA           NA          NA
6              NA         NA           NA                      NA           NA          NA
  Monotony.High.Intensity.Actions Monotony.Player.Load Strain.Total.Distance Strain.HSR Strain.SD
1                              NA                   NA                    NA         NA        NA
2                              NA                   NA                    NA         NA        NA
3                              NA                   NA                    NA         NA        NA
4                              NA                   NA                    NA         NA        NA
5                              NA                   NA                    NA         NA        NA
6                              NA                   NA                    NA         NA        NA
  Strain.High.Intensity.Actions Strain.Player.Load player_load_sd
1                            NA                 NA             NA
2                            NA                 NA             NA
3                            NA                 NA             NA
4                            NA                 NA             NA
5                            NA                 NA             NA
6                            NA                 NA             NA

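I want to compute some new variables and store them in specific cells. For example, I want the standard deviation for each week (3192 rows in total, i.e. 456 weeks of 7 days).

This is the code I "came up with" to do it manually: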

twente_1$player_load_sd[1] = sd(twente_1$Player.Load[1:7])
twente_1$player_load_sd[2] = sd(twente_1$Player.Load[8:14])
twente_1$player_load_sd[3] = sd(twente_1$Player.Load[15:21])
twente_1$player_load_sd[4] = sd(twente_1$Player.Load[22:28])
twente_1$player_load_sd[5] = sd(twente_1$Player.Load[29:35])
twente_1$player_load_sd[6] = sd(twente_1$Player.Load[36:42])
twente_1$player_load_sd[7] = sd(twente_1$Player.Load[43:49])
twente_1$player_load_sd[8] = sd(twente_1$Player.Load[50:56])
twente_1$player_load_sd[9] = sd(twente_1$Player.Load[57:63])
twente_1$player_load_sd[10] = sd(twente_1$Player.Load[64:70])

I'm sure I can do this with a for loop, but I can't get it to work. I tried the code below, but it gives me NAs:

x <- 1
y <- 7
for (i in 1:456) {
        twente_1$player_load_sd[i] = sd(twente_1$Player.Load[x:y])
        x <- x+7
        y <- y+7
}

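Thanks in advance for your time and help.

Instead of a for loop, I would 1) create a week variable, 2) group the dataset by week, and 3) compute the SD for each week using the grouped dataset. Here is what that looks like:

Here is a sample dataset with 10 weeks of data. (Install the tidyverse package first if you haven't already.)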

library(tidyverse)
df <- tibble(
  day = 1:70,
  x = runif(70, 0, 100)
)

First, let's create a week variable by splitting the rows into groups of 7.

df <- 
  df %>% 
  mutate(
    week = rep(1:(nrow(df)/7), each = 7)
  )
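The rep() call above assumes the number of rows is an exact multiple of 7. As a minimal sketch for the case where it is not, the week index can be derived from the row position with integer division. The row count of 73 below is a made-up value chosen only to show a partial final week.

```r
# Hypothetical row count that is NOT a multiple of 7 (10 full weeks + 3 days)
n <- 73

# Rows 1-7 -> week 1, rows 8-14 -> week 2, and so on;
# the last, incomplete week simply gets fewer rows
week <- (seq_len(n) - 1) %/% 7 + 1

# Number of rows assigned to each week
table(week)

# Inside a dplyr mutate() the same idea can be written as:
#   mutate(week = (row_number() - 1) %/% 7 + 1)
```

With this version a short final week is kept rather than causing a length error, so it is worth deciding whether an SD based on fewer than 7 days is meaningful for your data.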

Next, group the dataset by week and compute the standard deviation of x. Don't forget to ungroup at the end!

df <- 
  df %>% 
  group_by(week) %>% 
  mutate(week_sd = sd(x)) %>% 
  ungroup()

We can look at the first 14 days (i.e. two weeks) to check that the weekly SD is stored in every row.

head(df, 14)
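Besides eyeballing the output, the grouped result can be checked against the manual sd(...[1:7]) calculation from the question. This is only a sketch: it rebuilds the sample data so that it runs on its own, and the seed is an assumption added purely to make the check reproducible.

```r
library(dplyr)

set.seed(42)  # assumed seed, only for reproducibility of this check

# Rebuild the 10-week sample data and add the per-week SD to every row
df <- tibble(day = 1:70, x = runif(70, 0, 100)) %>%
  mutate(week = rep(1:10, each = 7)) %>%
  group_by(week) %>%
  mutate(week_sd = sd(x)) %>%
  ungroup()

# Week 1 computed by hand, in the style of the question
manual_week1 <- sd(df$x[1:7])

# The grouped value should match the manual one
all.equal(df$week_sd[1], manual_week1)

# All seven rows of week 1 should carry the same SD
length(unique(df$week_sd[1:7])) == 1
```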

If you want a new dataset with one row per week, you can group and summarise instead:

df_week <- 
  df %>% 
  group_by(week) %>% 
  summarize(week_sd = sd(x)) %>% 
  ungroup()

df_week
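For comparison, here is a sketch of the loop from the question written in base R only, next to a loop-free tapply() version. The toy data frame is a made-up stand-in for twente_1 with just a Player.Load column, and the seed is an assumption. The loop computes the row range from i instead of updating x and y by hand, and stores the results in a separate vector with one element per week.

```r
# Toy stand-in for twente_1: 10 weeks of made-up daily Player.Load values
set.seed(1)
toy <- data.frame(Player.Load = runif(70, 0, 500))

n_weeks <- nrow(toy) %/% 7      # number of complete weeks
weekly_sd <- numeric(n_weeks)   # one slot per week

for (i in seq_len(n_weeks)) {
  rows <- ((i - 1) * 7 + 1):(i * 7)  # rows 1:7, 8:14, 15:21, ...
  weekly_sd[i] <- sd(toy$Player.Load[rows])
}

# Same result without a loop: group the rows by a week index
week <- rep(seq_len(n_weeks), each = 7)
weekly_sd_tapply <- tapply(toy$Player.Load, week, sd)

# Both approaches should agree
all.equal(weekly_sd, as.numeric(weekly_sd_tapply))
```

Keeping the weekly values in their own vector avoids writing 456 results into a column that has 3192 rows, which leaves every row after the 456th as NA. That may be part of the NAs described in the question, although this cannot be confirmed from the output shown.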