How to write a loop that slices a dataset by the unique values in one of the columns and iterates through a script?

Question

I have a data frame, df:

> head(df)
# A tibble: 6 x 4
           x     y     z form     
       <dbl> <int> <dbl> <list>   
1 6633000042    11 0.25  <chr [2]>
2 6633000043    11 0.978 <chr [2]>
3 6633000044    11 0.998 <chr [1]>
4 6633000057    11 0.499 <chr [2]>
5 6633000058    11 0.499 <chr [2]>
6 6633000059    11 0.329 <chr [2]>

The fourth column, form, groups the data.

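What I need to do is split this dataset into a bunch of separate datasets based on the unique values in form, run each one through a script that produces c, and then append all of those datasets back together into df, with the columns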

> names(df)
[1] "x"    "y"    "z"    "form" "c"

I've tried different variations of

uniq <- unique(unlist(df$form))
for (i in 1:length(uniq)){
~script~
}

But I can't seem to get it to work... I feel like I'm missing something obvious. Any suggestions?

Answer 1

Try:

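uniq <- unique(unlist(df$form))
# seq_along(uniq) visits every index; length(uniq) alone would visit only the last one
for (i in seq_along(uniq)) {
  # form is a list column, so test each row for membership of uniq[i]
  keep <- vapply(df$form, function(f) uniq[i] %in% f, logical(1))
  data <- df[keep, ]  # note the comma: subset rows, keep all columns
  data$c <- NA        # placeholder: replace NA with the output of your script
  if (i == 1) final_df <- data else final_df <- rbind(final_df, data)
}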
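
For reference, here is a minimal base R sketch of the same slice, run, and append pattern that collects the pieces in a list and binds them once at the end. The toy df, the doubling of z used as the "script", and the choice to overwrite form with the single value being processed are all assumptions for illustration, not something taken from the question.

```r
# Hypothetical toy data with a list column, loosely shaped like the question's df
df <- data.frame(x = c(1, 2, 3), z = c(0.25, 0.978, 0.998))
df$form <- list(c("A", "B"), c("A", "B"), "A")

uniq <- unique(unlist(df$form))
pieces <- vector("list", length(uniq))  # one slot per unique value

for (i in seq_along(uniq)) {
  # rows whose form entry contains the current value
  keep <- vapply(df$form, function(f) uniq[i] %in% f, logical(1))
  piece <- df[keep, ]
  piece$form <- uniq[i]   # record which value this slice belongs to
  piece$c <- piece$z * 2  # stand-in for the real script
  pieces[[i]] <- piece
}

# append all slices back together
final_df <- do.call(rbind, pieces)
```

Storing the slices in a list avoids growing final_df inside the loop, and a row that belongs to several forms appears once per form in the result.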

Answer 2

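Here is something that does what you want, or at least comes close enough to solve the task of working with nested columns in a tibble (if I understood you correctly):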

library(tidyverse)

# dummy data
ndf <- tibble(x = c(1, 2, 3), 
              y = c(1, 2, 3), 
              z = list(c("a","b","c"),
                       c("a","b"),
                       "a"))

(dfl <- ndf %>%
    # "unpack" in list nested columns
    tidyr::unnest(cols = z) %>%
    # split into list of dfs of unpacked column z
    dplyr::group_split(z))

[[1]]
# A tibble: 3 x 3
      x     y z    
  <dbl> <dbl> <chr>
1     1     1 a    
2     2     2 a    
3     3     3 a    

[[2]]
# A tibble: 2 x 3
      x     y z    
  <dbl> <dbl> <chr>
1     1     1 b    
2     2     2 b    

[[3]]
# A tibble: 1 x 3
      x     y z    
  <dbl> <dbl> <chr>
1     1     1 c    

# apply custom formula on the list of dfs, in this case summation of x and y
(dfl2 <- purrr::map_df(dfl, ~.x %>%
                                dplyr::mutate(a = x + y)) %>%
    # choose which columns to nest back
    tidyr::nest(data = c(z, a)))

# A tibble: 3 x 3
      x     y data                
  <dbl> <dbl> <list>              
1     1     1 <tibble[,2] [3 x 2]>
2     2     2 <tibble[,2] [2 x 2]>
3     3     3 <tibble[,2] [1 x 2]>

The parentheses around the last two commands just make sure the result is printed to the console; you can remove them.
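
Applied to a data frame shaped like the one in the question, the same steps might look like the sketch below. The three-row df and the my_script() function (which centers z within each group) are hypothetical stand-ins, since the real script is not shown.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for the question's df, with form as a list column
df <- tibble(
  x = c(6633000042, 6633000043, 6633000044),
  y = c(11L, 11L, 11L),
  z = c(0.25, 0.978, 0.998),
  form = list(c("A", "B"), c("A", "B"), "A")
)

# Hypothetical script: adds a column c computed from the slice it receives
my_script <- function(d) mutate(d, c = z - mean(z))

final_df <- df %>%
  unnest(cols = form) %>%   # one row per (row, form value) pair
  group_split(form) %>%     # list of data frames, one per unique form value
  map(my_script) %>%        # run the script on each slice
  bind_rows()               # append the slices back together
```

The result has the columns x, y, z, form and c, with form now holding a single value per row.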

Following your comment:

# approach using a custom function
myfunction <- function(data){
    res <- data %>%
               dplyr::mutate(a = x + y)
    return(res)
}

# make function return NA in case of failure
myfunction <- purrr::possibly(myfunction, NA)

# apply custom formula on the list of dfs, in this case summation of x and y
purrr::map(dfl, ~myfunction(.x))

[[1]]
# A tibble: 3 x 4
      x     y z         a
  <dbl> <dbl> <chr> <dbl>
1     1     1 a         2
2     2     2 a         4
3     3     3 a         6

[[2]]
# A tibble: 2 x 4
      x     y z         a
  <dbl> <dbl> <chr> <dbl>
1     1     1 b         2
2     2     2 b         4

[[3]]
# A tibble: 1 x 4
      x     y z         a
  <dbl> <dbl> <chr> <dbl>
1     1     1 c         2
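
To see what possibly() does when one element fails, here is a small sketch with a deliberately bad input. The safe_log name and the input list are made up for illustration.

```r
library(purrr)

# Wrap a function so that an error yields NA instead of stopping the whole map
safe_log <- possibly(function(v) log(v), otherwise = NA)

# The second element is a string, so log() errors and NA is returned for it
res <- map(list(10, "a", 1), safe_log)
```

The other elements are processed normally, so a single failing group does not abort the run.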

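With map() you get a list in return, which means the function outputs do not have to share the same df structure. You can even return models as output and process them later with map2(), which takes two list inputs (for example the data and the models), processes them in pairs, and gives a list output.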
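
A minimal sketch of that map() and map2() combination, assuming a linear model as the per-group output. The toy data frames and the fit column are hypothetical.

```r
library(purrr)

# Hypothetical list of data frames, e.g. the result of a group_split()
dfl <- list(
  data.frame(x = 1:5, y = 2 * (1:5) + 1),
  data.frame(x = 1:5, y = -1 * (1:5) + 10)
)

# One model per data frame; map() returns a list of lm objects
models <- map(dfl, ~ lm(y ~ x, data = .x))

# Walk over data and models in pairs and attach the fitted values
fitted_dfl <- map2(dfl, models, function(d, m) {
  d$fit <- as.numeric(predict(m, newdata = d))
  d
})

# Append everything back into a single data frame
final_df <- do.call(rbind, fitted_dfl)
```

Keeping the models in their own list means they can be inspected or reused later without refitting.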
