R dataframe column creation with mutate and webchem::get_cid() function based on the first column string

Question

I've run into a problem. Suppose we have a one-column data frame, dfCHEM:

CHEM_NAME
Aspirin
Captopril
(...)

I want to use webchem::get_cid() to create a second column based on the strings in the first column:

CHEM_NAME    CID
Aspirin      2244
Captopril    44093
(...)

I tried this code, but it doesn't work:

dfCHEM %>%
    mutate(CID=get_cid(CHEM_NAME)[[1]])

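I'm sure this has to do with calling get_cid() inside mutate(): the function doesn't pick up the CHEM_NAME value of the corresponding row, but I don't know how to fix this efficiently.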
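To see what goes wrong, here is a minimal offline sketch in base R. `fake_get_cid()` is a hypothetical stand-in for `webchem::get_cid()` that makes no network calls; it assumes, as the `[[1]]` in the question suggests, that the function returns a list with one element per name. Inside `mutate()`, `CHEM_NAME` refers to the whole column, so the function is called once with every name, and `[[1]]` keeps only the first result, which is then recycled to every row.

```r
# Hypothetical offline stand-in for webchem::get_cid(): takes a character
# vector and returns a list with one element per query (assumption).
fake_get_cid <- function(query) {
  lookup <- c(Aspirin = 2244L, Captopril = 44093L)
  lapply(query, function(q) unname(lookup[q]))
}

dfCHEM <- data.frame(CHEM_NAME = c("Aspirin", "Captopril"),
                     stringsAsFactors = FALSE)

# What mutate(CID = get_cid(CHEM_NAME)[[1]]) effectively does:
# one call with the whole column ...
res <- fake_get_cid(dfCHEM$CHEM_NAME)
length(res)    # 2: one element per name
# ... then [[1]] keeps only the first name's result ...
res[[1]]       # 2244: Aspirin's CID only
# ... and that single value is recycled to every row.
dfCHEM$CID <- res[[1]]
dfCHEM$CID     # 2244 2244
```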

Answer 1

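You can add rowwise() to your code to force the operation to be evaluated row by row, so get_cid() receives one CHEM_NAME at a time.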

library(dplyr)
library(webchem)

dfCHEM %>%
  rowwise() %>%
  mutate(CID = get_cid(CHEM_NAME)[[1]]) %>%
  ungroup()

# # A tibble: 2 x 2
#   CHEM_NAME   CID
#       <chr> <int>
# 1   Aspirin  2244
# 2 Captopril 44093

Or use lapply() and unlist():

dfCHEM %>%
  mutate(CID = unlist(lapply(CHEM_NAME, get_cid)))

#   CHEM_NAME   CID
# 1   Aspirin  2244
# 2 Captopril 44093
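One caveat with unlist(): the result only lines up with the rows if every name yields exactly one CID. If a name can return several CIDs, unlist() produces more values than rows and mutate() fails on the length mismatch. A type-stable variant uses vapply() to keep the first CID per name. The sketch again uses a hypothetical offline stub, `fake_get_cid()`; the name "Multi" with two CIDs and the unknown name mapping to NA are made-up cases for illustration, not real webchem output.

```r
# Hypothetical offline stand-in for webchem::get_cid(): returns a list
# with one integer vector per name (possibly several CIDs, or NA).
fake_get_cid <- function(query) {
  lookup <- list(Aspirin = 2244L, Captopril = 44093L, Multi = c(1L, 2L))
  lapply(query, function(q) {
    if (q %in% names(lookup)) lookup[[q]] else NA_integer_
  })
}

dfCHEM <- data.frame(CHEM_NAME = c("Aspirin", "Captopril", "Multi", "Unknown"),
                     stringsAsFactors = FALSE)

# The lapply()/unlist() approach gives 5 values for 4 rows here,
# because "Multi" contributes two CIDs.
length(unlist(lapply(dfCHEM$CHEM_NAME, fake_get_cid)))   # 5

# vapply() guarantees exactly one integer per row: call the lookup once
# per name and keep only the first CID (NA stays NA).
dfCHEM$CID <- vapply(dfCHEM$CHEM_NAME,
                     function(nm) fake_get_cid(nm)[[1]][1],
                     integer(1), USE.NAMES = FALSE)
dfCHEM$CID     # 2244 44093 1 NA
```

In a dplyr pipeline, the same vapply() expression can go directly inside mutate(CID = ...).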

Data

dfCHEM <- read.table(text = "CHEM_NAME
Aspirin
Captopril",
                     header = TRUE, stringsAsFactors = FALSE)
