Using pmap to iterate over rows of a tibble
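I have a very simple tibble and I would like to iterate over its rows to apply a function using pmap. I think I may have misunderstood some points about pmap, but mostly I have trouble selecting the arguments. So I would like to know whether I should use rowwise together with pmap in this case; I have not seen an example of that so far.
The other problem is selecting the variables to iterate over, using either a list or the select function: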
library(dplyr)  # tibble(), mutate(), rowwise(), c_across(), n_distinct()
library(purrr)  # pmap_dbl()

# Here is my tibble
# Imagine I would like to apply the `n_distinct` function to every row with pmap
df <- tibble(id = c("01", "02", "03", "04", "05", "06"),
             A = c("Jan", "Mar", "Jan", "Jan", "Jan", "Mar"),
             B = c("Feb", "Mar", "Jan", "Jan", "Mar", "Mar"),
             C = c("Feb", "Mar", "Feb", "Jan", "Feb", "Feb")
)
# It is perfectly achievable with `rowwise` and `mutate` and results in my desired output
df %>%
  rowwise() %>%
  mutate(overal = n_distinct(c_across(A:C)))
#> # A tibble: 6 x 5
#> # Rowwise:
#>   id    A     B     C     overal
#>   <chr> <chr> <chr> <chr>  <int>
#> 1 01    Jan   Feb   Feb        2
#> 2 02    Mar   Mar   Mar        1
#> 3 03    Jan   Jan   Feb        2
#> 4 04    Jan   Jan   Jan        1
#> 5 05    Jan   Mar   Feb        3
#> 6 06    Mar   Mar   Feb        2
# But with `pmap` I do not get the same result: every row comes back as 1.
df %>%
  select(-id) %>%
  mutate(overal = pmap_dbl(list(A, B, C), n_distinct))
#> # A tibble: 6 x 4
#>   A     B     C     overal
#>   <chr> <chr> <chr>  <dbl>
#> 1 Jan   Feb   Feb        1
#> 2 Mar   Mar   Mar        1
#> 3 Jan   Jan   Feb        1
#> 4 Jan   Jan   Jan        1
#> 5 Jan   Mar   Feb        1
#> 6 Mar   Mar   Feb        1
I just need a short explanation of how pmap is applied for row-wise iteration on tibbles, so I would appreciate any help. Thank you in advance.
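I was able to track down the problem, but I can't say whether it is a bug or a feature. The key point is that pmap passes the values of each row to n_distinct() as three separate arguments, and n_distinct() treats them as a data frame with 3 columns. When n_distinct() is applied to a data frame, it counts the number of distinct rows, hence the 1 in every row: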
n_distinct(tibble(a = c(1, 2, 2),
                  b = 3))
#> [1] 2
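To see the same effect without pmap, the call that pmap makes for the first row of the question's tibble can be written out by hand. This is only a small sketch using that row's values ("Jan", "Feb", "Feb"); the names `separate_args` and `single_vector` are illustrative and do not appear in the original code.

```r
library(dplyr)

# Roughly what pmap calls for the first row: three length-1 arguments.
# n_distinct() treats them as three columns of a one-row data frame,
# so it counts distinct rows.
separate_args <- n_distinct("Jan", "Feb", "Feb")

# Combining the values into one vector first makes n_distinct()
# count distinct values instead.
single_vector <- n_distinct(c("Jan", "Feb", "Feb"))

separate_args  # 1
single_vector  # 2
```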
The trick is to combine the inputs into a single vector first and then pass that vector to n_distinct:
df %>%
  select(-id) %>%
  mutate(overal = pmap_dbl(list(A, B, C), ~ n_distinct(c(...))))
#> # A tibble: 6 x 4
#>   A     B     C     overal
#>   <chr> <chr> <chr>  <dbl>
#> 1 Jan   Feb   Feb        2
#> 2 Mar   Mar   Mar        1
#> 3 Jan   Jan   Feb        2
#> 4 Jan   Jan   Jan        1
#> 5 Jan   Mar   Feb        3
#> 6 Mar   Mar   Feb        2
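On the question of choosing the columns with a list or with select: a data frame is itself a list of columns, so the result of select() can be handed to pmap directly instead of spelling out list(A, B, C). The following is a sketch under that assumption, rebuilding the `df` from the question; `counts` and `result` are illustrative names, and pmap_int is used here in place of pmap_dbl because n_distinct() returns an integer.

```r
library(dplyr)
library(purrr)

df <- tibble(
  id = c("01", "02", "03", "04", "05", "06"),
  A  = c("Jan", "Mar", "Jan", "Jan", "Jan", "Mar"),
  B  = c("Feb", "Mar", "Jan", "Jan", "Mar", "Mar"),
  C  = c("Feb", "Mar", "Feb", "Jan", "Feb", "Feb")
)

# select() picks the columns to iterate over; pmap then walks them in
# parallel, one row at a time. c(...) gathers the row's values into a
# single vector before n_distinct() counts them.
counts <- pmap_int(select(df, A:C), ~ n_distinct(c(...)))

# Attach the counts to the original tibble, keeping the id column.
result <- mutate(df, overal = counts)
result
```

Doing the selection inside the pmap call means the id column does not have to be dropped from the tibble beforehand.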