How to divide a dataset into partitions and store different attributes of a specific function?

Question

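In this example I only want to use one column of a data.frame:
The column I select should be split into partitions of 70 rows each.
For example: 1..70 / 71..140 / 141..210, up to N = 65,000.
Output: for each subset, different attributes returned by a specific function should be stored.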

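In this particular case I want to store $MSE and $ME from the verify function of the verification package. To restate the process:

Go through my column 70 rows at a time;
apply the verify function;
and store some of its attributes in a new data.frame, like this: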

ID             MSE          ME
1 (1to70)      0.3          0.6
2 (71to140)    0.2          0.5
3 (141to210)   0.25         0.76
...            ...          ...

I have tried the following, but I can't work out how to store my attributes for each partition as described above.

library(dplyr)         # %>%, mutate, group_by
library(verification)  # verify
set.seed(1) # reproducible data
df <- as.data.frame(runif(65000,0,1))
probabilities.to.check.against <- runif(70,0,1)

store.as.df <- df[1] %>% 
    mutate(ID = floor((row_number()-1)/70)) %>%   # I'm trying to select partitions every 70 rows
    group_by(ID) %>% 
    verify(probabilities.to.check.against, PARTIOTIONS_OF_DF, frcst.type = "cont", obs.type = "cont")

Answer 1

Try this:

# split your data
runs <- 70
sdf <- split(df, 0:(nrow(df)-1) %/% runs)

# Validate split
library(purrr)
library(verification)  # provides verify()
head(map_dbl(sdf, ~nrow(.x)))
#  0  1  2  3  4  5 
# 70 70 70 70 70 70

# Answer: run verify() on each partition and keep only MSE and ME
ans <- map_df(
  sdf,
  ~ as.data.frame(
    verify(probabilities.to.check.against, .x[, 1],
           frcst.type = "cont", obs.type = "cont")[c("MSE", "ME")]
  ),
  .id = "id"
)

# Output
#      id       MSE            ME
# 1     0 0.1326722  5.145940e-03
# 2     1 0.1662103 -3.211852e-02
# 3     2 0.1522823  1.594105e-02
# 4     3 0.1485422 -1.069273e-01
# 5     4 0.1714966  1.595200e-03
# 6     5 0.2195108  1.866164e-03
# 7     6 0.1942890 -1.029523e-02
# 8     7 0.1730359  4.800538e-04
# 9     8 0.1432483  1.843559e-02
# 10    9 0.1554882 -6.644684e-03
# 11   10 0.1895140 -3.035421e-02
# # etc
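
For reference, here is a minimal base R sketch of the same split-then-score idea without the verification package. It assumes that, for continuous forecasts and observations, ME is the mean of (forecast minus observation) and MSE is the mean of the squared differences, so check the package documentation before relying on it. The names `values`, `reference` and `score_chunk` are hypothetical, and trimming the reference vector for a short final partition is my own assumption (65000 is not a multiple of 70, so the last partition of the real data has only 40 rows).

```r
set.seed(1)
values    <- runif(200)  # stand-in for the single data.frame column
reference <- runif(70)   # stand-in for probabilities.to.check.against
runs      <- 70

# Integer division of the zero-based row index gives the partition number
groups <- (seq_along(values) - 1) %/% runs
chunks <- split(values, groups)

# Hypothetical helper: score one partition against the reference vector
score_chunk <- function(pred, obs) {
  obs <- obs[seq_along(pred)]  # assumption: trim obs when the chunk is short
  err <- pred - obs
  c(MSE = mean(err^2), ME = mean(err))
}

# One row per partition, one column per stored attribute
scores <- t(vapply(chunks, score_chunk, c(MSE = 0, ME = 0), obs = reference))
result <- data.frame(id = names(chunks), scores, row.names = NULL)
print(result)
```

With 200 values this gives three partitions (70, 70 and 60 rows), so `result` has three rows and the columns `id`, `MSE` and `ME`.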

If you want the ids in the format you specified, change id:

ans$id <- c(paste0(head(as.numeric(ans$id),-1)*runs+1, "-", tail(as.numeric(ans$id),-1)*runs), tail(as.numeric(ans$id),1)*runs+1)
# [1] "1-70"        "71-140"      "141-210"     "211-280"     "281-350"
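
As an alternative sketch, the range labels can also be built directly from the row count, which gives the final, shorter partition a real end point as well. `partition_labels` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: "start-end" labels for partitions of `runs` rows
partition_labels <- function(n, runs) {
  starts <- seq(1, n, by = runs)
  ends   <- pmin(starts + runs - 1, n)  # cap the last partition at n
  sprintf("%d-%d", as.integer(starts), as.integer(ends))
}

labels <- partition_labels(65000, 70)
head(labels, 3)
# [1] "1-70"    "71-140"  "141-210"
tail(labels, 1)
# [1] "64961-65000"
```

Assuming `ans` has one row per partition in order, `ans$id <- partition_labels(nrow(df), runs)` would then replace the numeric ids.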
