Removing Special Characters in a Text File in R

Question

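I am working with a text file in R and using the readLines function and regular expressions to extract words from it. The file marks special meaning with characters around words: # signs before and after a word show that it is bolded, and @ signs before and after a word show that it should be italicized. These markers are messing up my regular expressions.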

Here is my R code so far. It removes all empty lines and then combines the text file into a single string (a character vector of length one):

    book <- readLines("/Users/Desktop/SAMPLE .txt", encoding = "UTF-8")
    # remove all empty (or whitespace-only) lines
    empty_lines <- grepl("^\\s*$", book)
    book <- book[!empty_lines]
    # combine the book into one string; a space separator keeps the last
    # word of one line from being glued to the first word of the next
    xBook <- paste(book, collapse = " ")
    # squeeze repeated whitespace into single spaces and trim the ends
    updated <- trimws(gsub("\\s+", " ", xBook))
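To check these cleanup steps without the file, here is a minimal sketch that applies them to an in-memory character vector. The function name collapse_lines and the sample lines are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: the same cleanup steps as above, applied to a
# character vector instead of a file so the behaviour is easy to check.
collapse_lines <- function(lines) {
  # drop lines that are empty or contain only whitespace
  lines <- lines[!grepl("^\\s*$", lines)]
  # join with a space so words on adjacent lines stay separate
  joined <- paste(lines, collapse = " ")
  # squeeze runs of whitespace into one space and trim both ends
  trimws(gsub("\\s+", " ", joined))
}

sample_lines <- c("It is a truth universally",
                  "",
                  "   ",
                  "acknowledged, that a #single# man")
collapse_lines(sample_lines)
# [1] "It is a truth universally acknowledged, that a #single# man"
```

With collapse = '' instead of " ", the first and last sample lines would be joined as "universallyacknowledged,".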

When I print the updated variable, I can see that the whole file is stored in it, but the special characters are still there:

    > updated
    [1] "It is a truth universally acknowledged, that a #single# man in possession of a good fortune, must be in want of a wife. However little known the feelings or views of such a @man@ may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, @that@ he is considered the rightful property of some one or other of #their# daughters."

How can I remove every leading or trailing # or @ from the words in my updated variable?

The output I want is just plain text, without the markers that indicate which words should be bolded or italicized:

    > updated
    [1] "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife. However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters."

Answer 1

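    updated <- gsub("[@#]([a-zA-Z]+)[@#]", "\\1", updated)

The pattern matches an opening # or @, captures the run of letters after it, and then matches the closing # or @; the replacement keeps only the captured word. In an R string the backreference must be written as "\\1", because "\1" is read as an octal escape for a control character.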
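A minimal sketch of the answer's pattern run on an excerpt of the text from the question, alongside a stricter variant. The variant is not part of the original answer: it assumes markers always come in matching pairs, uses perl = TRUE with a backreference so the closing marker must equal the opening one, and accepts any non-space, non-marker characters between them.

```r
# Excerpt of the collapsed text from the question
updated <- paste(
  "that a #single# man in possession of a good fortune,",
  "such a @man@ may be,",
  "surrounding families, @that@ he is",
  "of #their# daughters."
)

# Answer's pattern: an opening # or @, a captured run of ASCII letters,
# then a closing # or @; "\\1" puts back only the captured word
plain <- gsub("[@#]([a-zA-Z]+)[@#]", "\\1", updated)
print(plain)

# Stricter variant (assumption: markers always come in matching pairs).
# ([@#]) captures the opening marker, \\1 requires the same marker to close,
# and [^@#\\s]+ allows apostrophes, hyphens and digits inside the word.
plain_strict <- gsub("([@#])([^@#\\s]+)\\1", "\\2", updated, perl = TRUE)
stopifnot(identical(plain, plain_strict))  # same result on this excerpt

# Where the two differ: apostrophes and hyphens are not in [a-zA-Z]
tricky <- "#don't# and @well-known@"
tricky_answer <- gsub("[@#]([a-zA-Z]+)[@#]", "\\1", tricky)
tricky_strict <- gsub("([@#])([^@#\\s]+)\\1", "\\2", tricky, perl = TRUE)
print(tricky_answer)  # "#don't# and @well-known@" (unchanged)
print(tricky_strict)  # "don't and well-known"
```

The answer's [a-zA-Z]+ is enough for the sample text; the stricter pattern only matters if marked words can contain punctuation, digits, or non-ASCII letters.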
