How to extract all numbers before and after a specific keyword?

Question

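I'm new to R and have spent the past 2 months on this site trying to learn more. I want to pull the rows of a data set that contain a specific keyword, and from those rows extract the 5 words before and after the keyword. Then I want to know which numbers appear near the keyword in the same sentence.

To explain the "why": I have a list of tickets, and I want to extract all of the ticket titles. From that list I then want to know which tickets are requesting additional storage. If they are, I want to know how much storage they are asking for, and then I will create actions based on the amount requested (but that is for later).

Here is the code I have so far (it's a bit messy; I'm still working on a better/cleaner way, and I'm quite new to R).

Keyword I'm searching for: storage

Data frames are referred to as: DF, DF2, DF3, etc.

Column from DF: Title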

# Check for keyword
grep("storage", DF$Title, ignore.case = TRUE)

# Pull words before and after the keyword. This is case sensitive for some reason, so I have to do it twice and merge the data frames. It also creates a list instead of a data frame, so I have to change that into a data frame... Messy, I know.
DF2 <- stringr::str_extract_all(DF$Title, "([^\\s]+\\s){0,5}Storage(\\s[^\\s]+){0,5}")

# Turn list into data frame
DF3 <- do.call(rbind.data.frame, DF2)

# Pull words before and after but in lower case, same as step two
DF4 <- stringr::str_extract_all(DF$Title, "([^\\s]+\\s){0,5}storage(\\s[^\\s]+){0,5}")

# Turn list into data frame
DF5 <- do.call(rbind.data.frame, DF4)

# Change column names (I have to do this to merge them via rbind); setnames() is presumably data.table::setnames
DF6 <- setnames(DF3, c("Keyword"))
DF7 <- setnames(DF5, c("Keyword"))

# Merge both data frames together
DF6 <- rbind(DF6, DF7)
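
The two passes (one for "Storage", one for "storage") and the list-to-data-frame conversion can probably be avoided. The following is only a sketch: the sample titles are made up, `DF_storage` and `context` are hypothetical names, and it assumes that one match per title is enough, so it uses `str_extract()` where the code above uses `str_extract_all()`. Wrapping the pattern in `stringr::regex(..., ignore_case = TRUE)` makes the match case insensitive.

```r
library(stringr)

# Hypothetical sample data standing in for the real ticket titles.
DF <- data.frame(
  Title = c(
    "Please add 500 GB of Storage to server A",
    "Password reset for user",
    "Need more storage on the shared drive: 2 TB"
  ),
  stringsAsFactors = FALSE
)

# Up to five words on either side of the keyword, matched case-insensitively.
pattern <- regex("(\\S+\\s+){0,5}storage(\\s+\\S+){0,5}", ignore_case = TRUE)

# str_extract() returns one match (or NA) per title, so the result is a
# plain character vector rather than a list.
context <- str_extract(DF$Title, pattern)

# Keep only the titles that mention the keyword.
DF_storage <- data.frame(
  Title = DF$Title[!is.na(context)],
  Keyword = context[!is.na(context)],
  stringsAsFactors = FALSE
)
```

Because the window is limited to five words on each side, a number further away from the keyword (such as the "2 TB" at the end of the third sample title) can fall partly outside the extracted context, so it may be safer to search the full title when looking for sizes.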

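I want to check how much storage is being requested, so I tried to look for a number that is associated with GB, TB, etc. I have tried a lot of code, but most of it only pulls the number after or the number before the keyword, not all of the numbers in the sentence.

An example of what I tried that didn't work: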

DFTest <- as.integer(str_match(DF6, "(?i)\\bGB:?\\s*(\\d+)")[,2])

Answer 1

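The following approach extracts all numbers immediately before or immediately after a specific keyword (I use AND in this example). It relies on a lookahead and a lookbehind, so only digits directly adjacent to the keyword are matched. You can change the keyword in the regex pattern.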

library(tidyverse)

df <- data.frame(obs = 1:5, COL_D = c("2019AND", "AND1999", "101AND", "AND12", "20AND1999999"))

# \\d+(?=AND) matches digits right before the keyword; (?<=AND)\\d+ matches digits right after it
df2 <- df %>% 
  mutate(Extracted_Num = str_extract_all(COL_D, regex("\\d+(?=AND)|(?<=AND)\\d+")))

# obs        COL_D Extracted_Num
# 1   1      2019AND          2019
# 2   2      AND1999          1999
# 3   3       101AND           101
# 4   4        AND12            12
# 5   5 20AND1999999   20, 1999999
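
For the storage tickets in the question, the keyword and the number are usually not adjacent, so a variation may fit better: first keep the titles that mention "storage", then pull every number that is followed by a unit. This is a rough sketch, not a tested solution for the real data. The sample titles are invented, the names `has_storage`, `size_pattern`, and `result` are hypothetical, and it assumes sizes are written as a number followed by GB or TB (for example "500 GB" or "2.5TB"), which the question does not confirm.

```r
library(stringr)

# Hypothetical ticket titles.
titles <- c(
  "Please add 500 GB of Storage to server A",
  "Storage request: increase from 1 TB to 2.5TB",
  "Password reset for user"
)

# Step 1: keep only titles that mention the keyword (case-insensitive).
has_storage <- str_detect(titles, regex("storage", ignore_case = TRUE))
storage_titles <- titles[has_storage]

# Step 2: capture every number followed by a unit, anywhere in the title.
# Group 1 is the amount, group 2 is the unit.
size_pattern <- regex("(\\d+(?:\\.\\d+)?)\\s*(GB|TB)\\b", ignore_case = TRUE)
sizes <- str_match_all(storage_titles, size_pattern)

# Step 3: stack the per-title match matrices into one data frame,
# one row per number found.
result <- do.call(rbind, lapply(seq_along(sizes), function(i) {
  m <- sizes[[i]]
  data.frame(
    Title = rep(storage_titles[i], nrow(m)),
    Amount = as.numeric(m[, 2]),
    Unit = toupper(m[, 3]),
    stringsAsFactors = FALSE
  )
}))
```

`str_match_all()` returns one character matrix per title, so a title with several sizes produces several rows in `result`, and a title with no size produces none. Other units (MB, PB, and so on) could be added to the alternation in `size_pattern`.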

numbers

r

extract

dataframe

stringr