In R: row-wise return max value and corresponding column name

Question

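I am trying to get the max value row-wise across several columns and create 2 new columns holding the max value and the corresponding column name. Then, using that column name, I need to select the value of another column that shares a substring of the column name.

Here is an example of what I am trying to solve: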

measure_day1 <-  c(1,2,5)
measure_day2 <- c(5,7,1)
measure_day3 <- c(2,3,9)
temp_day1 <- c(25, 27, 29)
temp_day2 <- c(31, 33, 35)
temp_day3 <- c(14, 16, 19)

df <- data.frame(measure_day1, measure_day2, measure_day3, temp_day1, temp_day2, temp_day3)

  measure_day1 measure_day2 measure_day3 temp_day1 temp_day2 temp_day3
1            1            5            2        25        31        14
2            2            7            3        27        33        16
3            5            1            9        29        35        19

This is the desired result:

  measure_day1 measure_day2 measure_day3 temp_day1 temp_day2 temp_day3 measure_max day_measure_max temp_day_measure_max
1            1            5            2        25        31        14           5    measure_day2                   31
2            2            7            3        27        33        16           7    measure_day2                   33
3            5            1            9        29        35        19           9    measure_day3                   19

I found this similar question but could not adapt it to my task:

For each row return the column name of the largest value

Any help is greatly appreciated.

Answer 1

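Try this tidyverse approach. It is more practical to reshape the data to long format, having first created an id for each row, and then use filter() to extract the desired values. With pivot_wider() you get measure and temp as separate columns, and then you can apply the filter for the max value. Finally, use left_join() to merge the result back into your original data by the row id you created. Here is the code: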

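library(dplyr)
library(tidyr)
#Code
#Reshape to long with a row id, keep the max measure per row, then join back
newdf <- df %>% mutate(id=1:n()) %>%
  left_join(df %>% mutate(id=1:n()) %>%
              pivot_longer(-id) %>%
              #Split e.g. "measure_day2" into Var="measure" and Day="day2"
              separate(name,c('Var','Day'),sep='_') %>%
              pivot_wider(names_from=Var,values_from=value) %>%
              group_by(id) %>%
              filter(measure==max(measure)) %>%
              mutate(Day=paste0('measure_',Day)) %>% select(-measure) %>%
              #Note: measure_max holds the column name here, not the numeric max
              rename(measure_max=Day,temp_day_measure_max=temp),
            by='id') %>%
  select(-id)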

Output:

  measure_day1 measure_day2 measure_day3 temp_day1 temp_day2 temp_day3  measure_max temp_day_measure_max
1            1            5            2        25        31        14 measure_day2                   31
2            2            7            3        27        33        16 measure_day2                   33
3            5            1            9        29        35        19 measure_day3                   19
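
The output above keeps the name of the max column but drops the numeric max, and filter(measure == max(measure)) would return more than one row for an id if two days tie. As a possible variant (a sketch only, not the answer's original code), the same reshape can keep all three columns the question asks for. The names `long`, `best` and `result` are made up for illustration, and it assumes dplyr >= 1.0.0 for slice_max() and that every column name has the form `<variable>_<day>`:

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  measure_day1 = c(1, 2, 5), measure_day2 = c(5, 7, 1), measure_day3 = c(2, 3, 9),
  temp_day1 = c(25, 27, 29), temp_day2 = c(31, 33, 35), temp_day3 = c(14, 16, 19)
)

# One row per (id, day), with `measure` and `temp` as separate columns.
# ".value" tells pivot_longer() to use the first name piece as the output column.
long <- df %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id, names_to = c(".value", "day"), names_sep = "_")

# Keep a single row per id: the one with the largest measure.
# with_ties = FALSE avoids duplicated rows when two days share the max.
best <- long %>%
  group_by(id) %>%
  slice_max(measure, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  mutate(day = paste0("measure_", day)) %>%
  select(id,
         measure_max = measure,
         day_measure_max = day,
         temp_day_measure_max = temp)

# Join the three new columns back onto the original data by row id
result <- df %>%
  mutate(id = row_number()) %>%
  left_join(best, by = "id") %>%
  select(-id)

result
```

On the example data this should give measure_max, day_measure_max and temp_day_measure_max matching the desired result in the question.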

Answer 2

A base R approach:

#Select measure columns
measure_cols <- grep('measure', names(df), value = TRUE)
#Select temp_day column
temp_cols <- grep('temp_day', names(df))
#Get the index of max value in measure column
inds <- max.col(df[measure_cols])
#Get max value in measure columns
df$measure_max <- do.call(pmax, df[measure_cols])
#Get the name of max value in measure columns
df$day_measure_max <- measure_cols[inds]
#Get the corresponding temp column. 
df$temp_day_measure_max <- df[temp_cols][cbind(1:nrow(df), inds)]
df

#  measure_day1 measure_day2 measure_day3 temp_day1 temp_day2 temp_day3
#1            1            5            2        25        31        14
#2            2            7            3        27        33        16
#3            5            1            9        29        35        19

#  measure_max day_measure_max temp_day_measure_max
#1           5    measure_day2                   31
#2           7    measure_day2                   33
#3           9    measure_day3                   19
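
Two details may be worth guarding against if your real data is less tidy than the example. max.col() breaks ties at random by default, and the code above relies on the temp columns being in the same order as the measure columns. A hedged sketch that sets ties.method = "first" and derives each temp column name from the measure column name (assuming the `measure_` / `temp_` prefixes always pair up; `rows` is just a helper name):

```r
# Example data from the question
df <- data.frame(
  measure_day1 = c(1, 2, 5), measure_day2 = c(5, 7, 1), measure_day3 = c(2, 3, 9),
  temp_day1 = c(25, 27, 29), temp_day2 = c(31, 33, 35), temp_day3 = c(14, 16, 19)
)

# Measure columns, and the temp column that shares each one's suffix
measure_cols <- grep("^measure_", names(df), value = TRUE)
temp_cols <- sub("^measure_", "temp_", measure_cols)
stopifnot(all(temp_cols %in% names(df)))  # fail early if a pair is missing

# Column index of the row-wise max; "first" makes ties deterministic
inds <- max.col(as.matrix(df[measure_cols]), ties.method = "first")
rows <- seq_len(nrow(df))

# Matrix indexing with (row, column) pairs picks one cell per row
df$measure_max <- as.matrix(df[measure_cols])[cbind(rows, inds)]
df$day_measure_max <- measure_cols[inds]
df$temp_day_measure_max <- as.matrix(df[temp_cols])[cbind(rows, inds)]

df
```

With ties.method = "first", a row such as c(3, 3, 1) resolves to the first column rather than a random one of the two tied columns.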
