Mutate a nested data list-column in a purrr block and set a single static value per group

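Using diamonds as an example, I'd like to group by cut, add a row number within each group, and then shuffle the rows. Next I want to apply a transformation to price (described here as just price + 1; the code below actually uses price + rownum), find the transformed price that corresponds to row 1, and use that single value for the whole column.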

What I tried:

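library(dplyr)
library(tidyr)    # nest()
library(purrr)    # map()
library(ggplot2)  # diamonds dataset

mydiamonds <- diamonds %>%
  group_by(cut) %>%
  mutate(rownum = row_number()) %>%  # row number within each cut
  nest() %>%
  mutate(data = map(data, ~ .x %>% sample_n(nrow(.x)))) %>%  # shuffle the rows of each nested tibble
  mutate(data = map(data, ~ .x %>% mutate(InitialPrice = price + rownum)))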

This gets me close:

mydiamonds$data[[1]] %>% head
# A tibble: 6 x 11
  carat color clarity depth table price     x     y     z rownum InitialPrice
  <dbl> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>  <int>        <int>
1  0.4  E     VS1      62.4  54     951  4.73  4.75  2.96  13792        14743
2  0.71 H     VS2      60.9  55    2450  5.76  5.74  3.5   20808        23258
3  1.01 F     VVS2     61    57    8688  6.52  6.46  3.96   6567        15255
4  0.62 G     VS2      61.6  55    2321  5.51  5.53  3.4   20438        22759
5  0.77 F     VS1      60.9  58    3655  5.91  5.95  3.61   1717         5372
6  1.37 G     VVS2     62.3  55.5 12207  7.05  7.14  4.43   8013        20220

From here, I want to find the value of InitialPrice where rownum == 1 and then overwrite InitialPrice with that single value, separately for each data frame in mydiamonds$data.

I tried adding another mutate to the last line, like this:

mutate(data = map(data, ~ .x %>% mutate(InitialPrice = price + rownum) %>% mutate(InitialPrice = . %>% filter(rownum ==1) %>% pull(InitialPrice))))

But I get an error:

Error: Problem with mutate() input data.
x Problem with mutate() input InitialPrice.
x Input InitialPrice must be a vector, not a fseq/function object.
ℹ Input InitialPrice is . %>% filter(rownum == 1) %>% pull(InitialPrice).
ℹ Input data is map(...).

How can I do this?

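We can wrap the . in curly braces. A pipeline that starts with a bare . is treated by magrittr as a function definition (a functional sequence), which is why mutate() received an fseq/function object instead of a vector; {.} forces the dot to be evaluated as the current data frame: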

library(dplyr)
library(ggplot2)
library(purrr)
mydiamonds %>% 
   mutate(data = map(data, ~ .x %>% 
       mutate(InitialPrice = price + rownum ) %>%
       mutate(InitialPrice = {.} %>% 
                 filter(rownum ==1) %>% 
                 pull(InitialPrice))))
# A tibble: 5 x 2
# Groups:   cut [5]
#  cut       data                  
#  <ord>     <list>                
#1 Ideal     <tibble [21,551 × 11]>
#2 Premium   <tibble [13,791 × 11]>
#3 Good      <tibble [4,906 × 11]> 
#4 Very Good <tibble [12,082 × 11]>
#5 Fair      <tibble [1,610 × 11]> 
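
To see the difference between the bare dot and the braced dot in isolation, here is a minimal sketch on a tiny made-up tibble (toy and f are names invented for this illustration, not part of the question's data):

```r
library(dplyr)

# Hypothetical three-row stand-in for one nested data frame
toy <- tibble(rownum = c(2L, 1L, 3L), InitialPrice = c(20L, 10L, 30L))

# A pipeline that starts with a bare `.` is not evaluated;
# it defines a function (a magrittr functional sequence).
f <- . %>% filter(rownum == 1) %>% pull(InitialPrice)
is.function(f)  # TRUE
f(toy)          # 10

# With {.} the dot is evaluated as the data being piped in,
# so mutate() receives a length-1 vector and recycles it.
fixed <- toy %>%
  mutate(InitialPrice = {.} %>% filter(rownum == 1) %>% pull(InitialPrice))
fixed$InitialPrice  # 10 10 10
```

The first half reproduces the kind of object that triggered the error; the second half is the same fix as above, just without nest() and map() around it.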

You can subset InitialPrice directly with the condition rownum == 1, without another pipeline:

library(tidyverse)

result <- mydiamonds %>%
              mutate(data = map(data, ~.x %>% 
                            mutate(InitialPrice = InitialPrice[rownum == 1])))

result$data[[1]]

# A tibble: 21,551 x 11
#   carat color clarity depth table price     x     y     z rownum InitialPrice
#   <dbl> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>  <int>        <int>
# 1  0.7  I     VVS1     61.8  56    2492  5.72  5.74  3.54  20897          327
# 2  0.51 G     VS1      61.8  60    1757  5.08  5.12  3.15  18405          327
# 3  0.32 G     VVS1     61.4  57     814  4.39  4.41  2.7   11820          327
# 4  0.33 H     VVS1     62.5  56     901  4.44  4.42  2.77  13130          327
# 5  0.72 G     SI2      62.1  54    2079  5.77  5.82  3.6   19769          327
# 6  1.31 G     VVS2     59.2  59   11459  7.12  7.18  4.23   7807          327
# 7  0.32 F     VVS2     61.6  55     945  4.41  4.42  2.72  13714          327
# 8  0.39 G     VVS1     62.1  54.7  1008  4.64  4.72  2.91  14462          327
# 9  0.7  E     VVS2     62.3  53.7  3990  5.67  5.72  3.55   2138          327
#10  0.71 D     SI2      62.7  55    2551  5.67  5.71  3.57  21042          327
# … with 21,541 more rows
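
If the nested structure is not needed for anything else, the same logical subsetting should also work on a plain grouped data frame, since mutate() evaluates per group. This is only a sketch on made-up data (toy and flat are hypothetical names), and it assumes dplyr 1.0.0 or later for slice_sample():

```r
library(dplyr)

set.seed(1)  # only so the shuffle is reproducible

# Hypothetical miniature of diamonds: two groups, integer prices
toy <- tibble(
  cut   = c("a", "a", "a", "b", "b"),
  price = c(100L, 200L, 300L, 50L, 60L)
)

flat <- toy %>%
  group_by(cut) %>%
  mutate(rownum = row_number()) %>%   # row number within each group
  slice_sample(prop = 1) %>%          # shuffle rows within each group
  mutate(
    InitialPrice = price + rownum,              # the transformation
    InitialPrice = InitialPrice[rownum == 1]    # keep only the rownum == 1 value
  ) %>%
  ungroup()

flat
```

Here every row of group a gets 101 (100 + 1) and every row of group b gets 51 (50 + 1), regardless of the shuffled order, because rownum is assigned before shuffling.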