Mutate data list column in purrr block and get a static min by grouping numeric variable
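Using diamonds as an example, I'd like to group by cut, add a row number within each group, and then shuffle the rows. Then I'd like to apply a transformation to price, in this case simply adding a number to it (the code below uses price + rownum), find the transformed price that corresponds to row 1, and make that the value of the entire column.
Tried: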
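library(dplyr)    # group_by(), mutate(), sample_n()
library(tidyr)    # nest()
library(purrr)    # map()
library(ggplot2)  # diamonds data set
mydiamonds <- diamonds %>%
  group_by(cut) %>%
  mutate(rownum = row_number()) %>%
  nest() %>%
  mutate(data = map(data, ~ .x %>% sample_n(nrow(.x)))) %>%
  mutate(data = map(data, ~ .x %>% mutate(InitialPrice = price + rownum)))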
That gets me close:
mydiamonds$data[[1]] %>% head
# A tibble: 6 x 11
carat color clarity depth table price x y z rownum InitialPrice
<dbl> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int> <int>
1 0.4 E VS1 62.4 54 951 4.73 4.75 2.96 13792 14743
2 0.71 H VS2 60.9 55 2450 5.76 5.74 3.5 20808 23258
3 1.01 F VVS2 61 57 8688 6.52 6.46 3.96 6567 15255
4 0.62 G VS2 61.6 55 2321 5.51 5.53 3.4 20438 22759
5 0.77 F VS1 60.9 58 3655 5.91 5.95 3.61 1717 5372
6 1.37 G VVS2 62.3 55.5 12207 7.05 7.14 4.43 8013 20220
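From here, what I'd like to do is find the value of InitialPrice that corresponds to rownum == 1, and then overwrite InitialPrice with that single value in each data frame in mydiamonds$data.
I tried mutating again in the last line, like this: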
mutate(data = map(data, ~ .x %>% mutate(InitialPrice = price + rownum) %>% mutate(InitialPrice = . %>% filter(rownum ==1) %>% pull(InitialPrice))))
But that gives an error:
Error: Problem with `mutate()` input `data`.
x Problem with `mutate()` input `InitialPrice`.
x Input `InitialPrice` must be a vector, not a `fseq/function` object.
ℹ Input `InitialPrice` is `. %>% filter(rownum == 1) %>% pull(InitialPrice)`.
ℹ Input `data` is `map(...)`.
How can I do this?
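We can wrap the . in curly braces. A pipeline that starts with a bare . is treated by magrittr as a functional sequence (a function), which is why mutate() complains about an fseq/function object; {.} is evaluated as the data instead.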
library(dplyr)
library(ggplot2)
library(purrr)
mydiamonds %>%
  mutate(data = map(data, ~ .x %>%
    mutate(InitialPrice = price + rownum) %>%
    mutate(InitialPrice = {.} %>%
      filter(rownum == 1) %>%
      pull(InitialPrice))))
# A tibble: 5 x 2
# Groups: cut [5]
# cut data
# <ord> <list>
#1 Ideal <tibble [21,551 × 11]>
#2 Premium <tibble [13,791 × 11]>
#3 Good <tibble [4,906 × 11]>
#4 Very Good <tibble [12,082 × 11]>
#5 Fair <tibble [1,610 × 11]>
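The difference between a bare . and {.} can be checked on a small example. This is only a sketch: the tibble df and the column first_price are made-up names for illustration and are not part of the question's data.

```r
library(dplyr)  # also re-exports the magrittr pipe %>%

# Hypothetical toy data: rownum == 1 is deliberately not in the first position
df <- tibble(rownum = c(3L, 1L, 2L), price = c(100L, 200L, 300L))

# A pipeline whose left-hand side is a bare `.` is not run;
# it is stored as a functional sequence, i.e. a function of one argument
f <- . %>% filter(rownum == 1) %>% pull(price)
is.function(f)  # TRUE
f(df)           # 200

# Wrapping the dot in braces makes it an ordinary value (the piped-in data),
# so the inner pipeline runs immediately and returns a length-1 vector,
# which mutate() recycles to every row
out <- df %>%
  mutate(first_price = {.} %>% filter(rownum == 1) %>% pull(price))
out$first_price  # 200 200 200
```

The first form is what produced the fseq/function error in the question: mutate() received the function itself rather than its result.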
You can do it like this:
library(tidyverse)
result <- mydiamonds %>%
  mutate(data = map(data, ~ .x %>%
    mutate(InitialPrice = InitialPrice[rownum == 1])))
result$data[[1]]
# A tibble: 21,551 x 11
# carat color clarity depth table price x y z rownum InitialPrice
# <dbl> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int> <int>
# 1 0.7 I VVS1 61.8 56 2492 5.72 5.74 3.54 20897 327
# 2 0.51 G VS1 61.8 60 1757 5.08 5.12 3.15 18405 327
# 3 0.32 G VVS1 61.4 57 814 4.39 4.41 2.7 11820 327
# 4 0.33 H VVS1 62.5 56 901 4.44 4.42 2.77 13130 327
# 5 0.72 G SI2 62.1 54 2079 5.77 5.82 3.6 19769 327
# 6 1.31 G VVS2 59.2 59 11459 7.12 7.18 4.23 7807 327
# 7 0.32 F VVS2 61.6 55 945 4.41 4.42 2.72 13714 327
# 8 0.39 G VVS1 62.1 54.7 1008 4.64 4.72 2.91 14462 327
# 9 0.7 E VVS2 62.3 53.7 3990 5.67 5.72 3.55 2138 327
#10 0.71 D SI2 62.7 55 2551 5.67 5.71 3.57 21042 327
# … with 21,541 more rows
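The logical-indexing approach can be verified on data small enough to check by hand. The sketch below assumes a made-up tibble (toy, with columns grp and price) in place of diamonds; everything else follows the same steps as the question.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)  # only so the shuffle is reproducible

# Hypothetical stand-in for diamonds: two groups with known prices
toy <- tibble(
  grp   = c("a", "a", "a", "b", "b"),
  price = c(10L, 20L, 30L, 40L, 50L)
) %>%
  group_by(grp) %>%
  mutate(rownum = row_number()) %>%
  nest() %>%
  # shuffle the rows inside each nested data frame
  mutate(data = map(data, ~ .x %>% sample_n(nrow(.x)))) %>%
  mutate(data = map(data, ~ .x %>%
    mutate(InitialPrice = price + rownum) %>%
    # pick the value at rownum == 1, wherever the shuffle put it;
    # the length-1 result is recycled over the whole column
    mutate(InitialPrice = InitialPrice[rownum == 1])))

toy$data[[which(toy$grp == "a")]]$InitialPrice  # 11 11 11
toy$data[[which(toy$grp == "b")]]$InitialPrice  # 41 41
```

Because the subset is taken by the condition rownum == 1 rather than by position, the result does not depend on the order produced by the shuffle.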