For each row, find the cell that matches a specific string and return the last character of the column name

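Here is some sample data. Each row is a different participant. Each participant completes five trials. On each trial, they pick one fruit from a set of 10 fruits (without replacement).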

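ID trial_1    trial_2    trial_3 trial_4 trial_5
01 apple      orange     banana  peach   grapes
02 grapes     watermelon mango   peach   apricot
03 pear       grapes     mango   orange  banana
04 watermelon apple      peach   grapes  pear
05 banana     peach      apple   grapes  mango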

I want to create 10 new columns, one for each fruit, containing the trial number on which that fruit was picked (or "NA" if it was never picked):

ID trial_1    trial_2    trial_3 trial_4 trial_5 apple apricot banana grapes mango orange peach pear strawberries watermelon
01 apple      orange     banana  peach   grapes  1     NA      3      5      NA    2      4     NA   NA           NA
02 grapes     watermelon mango   peach   apricot NA    5       NA     1      3     NA     4     NA   NA           2
03 pear       grapes     mango   orange  banana  NA    NA      5      2      3     4      NA    1    NA           NA
04 watermelon apple      peach   grapes  pear    2     NA      NA     4      NA    NA     3     5    NA           1
05 banana     peach      apple   grapes  mango   3     NA      1      4      5     NA     2     NA   NA           NA

I could do it like this for each fruit column, but it seems clunky:

mutate(apple = ifelse(trial_1 == "apple", 1,
               ifelse(trial_2 == "apple", 2,
               ifelse(trial_3 == "apple", 3,
               ifelse(trial_4 == "apple", 4,
               ifelse(trial_5 == "apple", 5, "NA"))))))

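I assume there is a simpler, more concise solution, perhaps using rowwise() to match the fruit name and then return just the last character of the column name (i.e. the number). But I can't work it out. Can you help?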

Consider creating a vector of the fruits in the order we want (base R):

nm1 <- c("apple", "apricot", "banana", "grapes", "mango", "orange", 
         "peach", "pear", "strawberries", "watermelon")

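Then loop over the rows of the data, use match to get the position of each fruit within the row, and assign the result to new columns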

df1[nm1] <- t(apply(df1[-1], 1, function(x) match(nm1, x)))

-output

df1
  ID    trial_1    trial_2 trial_3 trial_4 trial_5 apple apricot banana grapes mango orange peach pear strawberries watermelon
1  1      apple     orange  banana   peach  grapes     1      NA      3      5    NA      2     4   NA           NA         NA
2  2     grapes watermelon   mango   peach apricot    NA       5     NA      1     3     NA     4   NA           NA          2
3  3       pear     grapes   mango  orange  banana    NA      NA      5      2     3      4    NA    1           NA         NA
4  4 watermelon      apple   peach  grapes    pear     2      NA     NA      4    NA     NA     3    5           NA          1
5  5     banana      peach   apple  grapes   mango     3      NA      1      4     5     NA     2   NA           NA         NA
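
To see why the match() call works, it may help to look at a single row in isolation. The sketch below is only an illustration: row1, pos and trial_no are hypothetical names, and row1 is the first participant's picks typed out by hand. It assumes the trial columns are stored in order (trial_1 to trial_5), in which case the position returned by match() is the same as the number at the end of the column name.

```r
# Fruits in the desired output order (same vector as nm1 above)
nm1 <- c("apple", "apricot", "banana", "grapes", "mango", "orange",
         "peach", "pear", "strawberries", "watermelon")

# Hypothetical single row: the first participant's five picks
row1 <- c(trial_1 = "apple", trial_2 = "orange", trial_3 = "banana",
          trial_4 = "peach", trial_5 = "grapes")

# For each fruit, the position of its first occurrence in the row (NA if absent)
pos <- match(nm1, row1)
names(pos) <- nm1
pos

# The same numbers recovered from the column names instead of the positions
trial_no <- as.integer(sub(".*_", "", names(row1)[pos]))
trial_no
```

If the trial columns were not in order, taking the number from the column name (the second variant) would presumably be the safer choice.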

Or another base R option is

xtabs(ind ~ ID + values, transform(stack(df1[-1]), 
        ind = as.integer(sub(".*_", "", ind)), ID = df1$ID))
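
Note that xtabs() sums the trial numbers, so a fruit that was never picked shows up as 0 rather than NA, and the result is a table rather than a data frame. A minimal sketch of one way to convert it follows, assuming each fruit occurs at most once per participant (true here, since fruits are picked without replacement) so that 0 can only mean "not picked". The names long, tab and wide are hypothetical.

```r
# Same data as df1 in this answer, rebuilt so the block runs on its own
df1 <- data.frame(
  ID = 1:5,
  trial_1 = c("apple", "grapes", "pear", "watermelon", "banana"),
  trial_2 = c("orange", "watermelon", "grapes", "apple", "peach"),
  trial_3 = c("banana", "mango", "mango", "peach", "apple"),
  trial_4 = c("peach", "peach", "orange", "grapes", "grapes"),
  trial_5 = c("grapes", "apricot", "banana", "pear", "mango"),
  stringsAsFactors = FALSE
)

# Long format: one row per (participant, trial), trial number taken from the column name
long <- transform(stack(df1[-1]),
                  ind = as.integer(sub(".*_", "", ind)), ID = df1$ID)

# Cross-tabulate, then turn the table into a data frame and the zeros into NA
tab <- xtabs(ind ~ ID + values, long)
wide <- as.data.frame.matrix(tab)
wide[wide == 0] <- NA
wide
```

The columns of wide come out in alphabetical order, and only fruits that appear in the data get a column.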

data

df1 <- structure(list(ID = 1:5, trial_1 = c("apple", "grapes", "pear", 
"watermelon", "banana"), trial_2 = c("orange", "watermelon", 
"grapes", "apple", "peach"), trial_3 = c("banana", "mango", "mango", 
"peach", "apple"), trial_4 = c("peach", "peach", "orange", "grapes", 
"grapes"), trial_5 = c("grapes", "apricot", "banana", "pear", 
"mango")), class = "data.frame", row.names = c(NA, -5L))
library(tidyverse)
df %>%
  pivot_longer(-ID) %>%
  mutate(name = parse_number(name)) %>%
  pivot_wider(names_from = value, values_from = name)

This gives the columns on the right-hand side. To attach them to the original data:

left_join(df, 
    # the code above
)

Result

Joining, by = "ID"
# A tibble: 5 x 15
  ID    trial_1    trial_2    trial_3 trial_4 trial_5 apple orange banana peach grapes watermelon mango apricot  pear
  <chr> <chr>      <chr>      <chr>   <chr>   <chr>   <dbl>  <dbl>  <dbl> <dbl>  <dbl>      <dbl> <dbl>   <dbl> <dbl>
1 01    apple      orange     banana  peach   grapes      1      2      3     4      5         NA    NA      NA    NA
2 02    grapes     watermelon mango   peach   apricot    NA     NA     NA     4      1          2     3       5    NA
3 03    pear       grapes     mango   orange  banana     NA      4      5    NA      2         NA     3      NA     1
4 04    watermelon apple      peach   grapes  pear        2     NA     NA     3      4          1    NA      NA     5
5 05    banana     peach      apple   grapes  mango       3     NA      1     2      4         NA     5      NA    NA
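
For completeness, the two steps can be written out as one runnable script. This is a sketch rather than anything definitive: fruit_cols and result are hypothetical names, and passing by = "ID" explicitly is an addition that suppresses the "Joining" message shown above.

```r
library(dplyr)
library(tidyr)
library(readr)

# Same source data as in this answer
df <- tibble::tribble(
   ~ID,     ~trial_1,     ~trial_2, ~trial_3, ~trial_4,  ~trial_5,
  "01",      "apple",     "orange", "banana",  "peach",  "grapes",
  "02",     "grapes", "watermelon",  "mango",  "peach", "apricot",
  "03",       "pear",     "grapes",  "mango", "orange",  "banana",
  "04", "watermelon",      "apple",  "peach", "grapes",    "pear",
  "05",     "banana",      "peach",  "apple", "grapes",   "mango"
)

# One column per fruit, holding the trial number on which it was picked
fruit_cols <- df %>%
  pivot_longer(-ID) %>%                      # columns: ID, name (trial_n), value (fruit)
  mutate(name = parse_number(name)) %>%      # "trial_3" becomes 3
  pivot_wider(names_from = value, values_from = name)

# Attach the fruit columns to the original trial columns
result <- left_join(df, fruit_cols, by = "ID")
result
```

As in the result above, only fruits that were actually picked get a column, in order of first appearance.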

Source data:

tibble::tribble(
   ~ID,     ~trial_1,     ~trial_2, ~trial_3, ~trial_4,  ~trial_5,
  "01",      "apple",     "orange", "banana",  "peach",  "grapes",
  "02",     "grapes", "watermelon",  "mango",  "peach", "apricot",
  "03",       "pear",     "grapes",  "mango", "orange",  "banana",
  "04", "watermelon",      "apple",  "peach", "grapes",    "pear",
  "05",     "banana",      "peach",  "apple", "grapes",   "mango"
  ) -> df

Another tidyverse solution to this problem:

library(dplyr)
library(purrr)

nm <- unique(unlist(df1[-1]))

df1 %>%
  bind_cols(nm %>%
              map_dfc(function(a) pmap_dbl(df1[, -1], ~ match(a, c(...)))) %>%
              set_names(nm))


  ID    trial_1    trial_2 trial_3 trial_4 trial_5 apple grapes pear watermelon banana orange
1  1      apple     orange  banana   peach  grapes     1      5   NA         NA      3      2
2  2     grapes watermelon   mango   peach apricot    NA      1   NA          2     NA     NA
3  3       pear     grapes   mango  orange  banana    NA      2    1         NA      5      4
4  4 watermelon      apple   peach  grapes    pear     2      4    5          1     NA     NA
5  5     banana      peach   apple  grapes   mango     3      4   NA         NA      1     NA
  peach mango apricot
1     4    NA      NA
2     4     3       5
3    NA     3      NA
4     3    NA      NA
5     2     5      NA