How to keep only the minimum value in a row across multiple columns and set all other row values to 0 in R

Question

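I have a data frame containing the distances between different locations in a city. When grouping by a condition, some locations end up being "shared" by other locations, which causes duplicates in the calculation. To correct for the duplicates, I am trying to compute the minimum distance in each row and set all other values in that row to 0, so that only the row-wise minimum enters my calculation.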

Sample data:

> df <- data.frame(name = letters[1:3],
+                  col1 = rnorm(3,10,1),
+                  col2 = rnorm(3,10,1),
+                  col3 = rnorm(3,10,1))
> df
  name      col1      col2     col3
1    a  9.994703 10.882758 9.005535
2    b 11.505343  9.613655 9.589866
3    c 11.713150  9.240391 9.788279
> df$min <- apply(df[,2:4],1,min)
> df
  name      col1      col2     col3      min
1    a  9.994703 10.882758 9.005535 9.005535
2    b 11.505343  9.613655 9.589866 9.589866
3    c 11.713150  9.240391 9.788279 9.240391
>

Now I need to set the values in each row that are not the minimum to 0. Expected output:

> df
  name col1     col2     col3      min
1    a    0 0.000000 9.005535 9.005535
2    b    0 0.000000 9.589866 9.589866
3    c    0 9.240391 0.000000 9.240391

Can anyone tell me how to do this?

Answer 1

One dplyr and purrr solution could be:

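library(dplyr)
library(purrr)

# pmap() walks over the rows and returns a list, so min_col is a list column;
# purrr::pmap_dbl() would give a plain numeric column instead.
df %>%
 mutate(min_col = pmap(across(starts_with("col")), min),
        # keep a value only where it equals the row minimum, otherwise 0
        across(starts_with("col"), ~ (. == min_col) * .))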

  name col1 col2      col3  min_col
1    a    0    0  9.659657 9.659657
2    b    0    0 10.288515 10.28851
3    c    0    0  9.303990  9.30399
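Since `pmap()` returns a list, `min_col` above is a list column (which is why it prints with fewer digits). If a plain numeric column is preferred, a minimal sketch using base `pmin()` inside `mutate()` might look like the following. It assumes the distance columns are known by name (`col1`, `col2`, `col3`) and reuses the seeded sample data from Answer 2; `res` is just an illustrative name.

```r
library(dplyr)

# Seeded sample data (same as in the "Data" section of Answer 2)
set.seed(123)
df <- data.frame(name = letters[1:3],
                 col1 = rnorm(3, 10, 1),
                 col2 = rnorm(3, 10, 1),
                 col3 = rnorm(3, 10, 1))

res <- df %>%
  mutate(
    # pmin() is vectorised: element i is the minimum of row i
    min_col = pmin(col1, col2, col3),
    # keep a value only where it equals the row minimum, otherwise 0
    across(c(col1, col2, col3), ~ ifelse(.x == min_col, .x, 0))
  )
res
```

Here `min_col` is an ordinary double vector, so it can be summed or compared directly later on.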

Answer 2

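I don't think you really need the min column. You can compare the values with the row minimum inside the same apply call and turn the non-minimum values to 0.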

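# x == min(x) is a logical mask; unary + turns it into 0/1,
# so every non-minimum value is multiplied by 0
df[, 2:4] <- t(apply(df[,2:4],1,function(x) x * +(x == min(x))))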
df

#  name     col1 col2     col3
#1    a 9.439524    0 0.000000
#2    b 0.000000    0 8.734939
#3    c 0.000000    0 9.313147
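For larger data frames, the same comparison can be done without a row-wise `apply` call. The following is only a sketch of one possible vectorised variant, not part of the original answer; the helper names `cols`, `m`, and `row_min` are made up for illustration.

```r
# Seeded sample data (same as in the "Data" section)
set.seed(123)
df <- data.frame(name = letters[1:3],
                 col1 = rnorm(3, 10, 1),
                 col2 = rnorm(3, 10, 1),
                 col3 = rnorm(3, 10, 1))

cols <- c("col1", "col2", "col3")   # columns to process (assumed names)
m <- as.matrix(df[cols])

# pmin() over the columns gives one minimum per row
row_min <- do.call(pmin, df[cols])

# row_min is recycled down each column of m, so m[i, j] is compared with row_min[i]
m[m != row_min] <- 0

df[cols] <- m
df
```

This relies on R storing matrices column by column, which is what lines up each entry with the minimum of its own row.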

Data

set.seed(123)
df <- data.frame(name = letters[1:3],
                 col1 = rnorm(3,10,1),
                 col2 = rnorm(3,10,1),
                 col3 = rnorm(3,10,1))
df
#  name      col1     col2      col3
#1    a  9.439524 10.07051 10.460916
#2    b  9.769823 10.12929  8.734939
#3    c 11.558708 11.71506  9.313147
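In the seeded data above every row has a unique minimum. With real distances, two columns could tie, and an equality test such as `x == min(x)` would then keep both values. If exactly one value per row should survive, one possible sketch uses base R's `max.col()` on the negated matrix to pick the first minimum in each row. The small data frame below is invented purely to show ties.

```r
# Hypothetical data with tied row minima (not from the question)
df <- data.frame(name = letters[1:3],
                 col1 = c(5, 2, 7),
                 col2 = c(5, 3, 1),
                 col3 = c(9, 2, 1))

cols <- c("col1", "col2", "col3")
m <- as.matrix(df[cols])

# The column of the maximum of -m is the column of the minimum of m;
# ties.method = "first" picks the leftmost column on ties
first_min <- max.col(-m, ties.method = "first")

# Matrix indexing with (row, column) pairs: copy only the chosen cells
keep <- cbind(seq_len(nrow(m)), first_min)
out <- matrix(0, nrow(m), ncol(m), dimnames = dimnames(m))
out[keep] <- m[keep]

df[cols] <- out
df
# row a keeps 5 in col1, row b keeps 2 in col1, row c keeps 1 in col2
```

Note that `max.col()` treats values within a small relative tolerance as tied, so this is best suited to cases where near-equal distances can be treated as equal.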
