R: Count the number of terms that include special characters (e.g. [url]) in a dataset

Question

I have a dataset in which I have converted hyperlinks to [url] (see the example posts at the bottom). I simply want to count the frequency of "[url]" using R.

I tried the following, without success:

data = read.csv(X: ....... ,tweets.csv)
word_split= strsplit(USER_POST, " ")
sum(stringr::str_count(USER_POST, "[url]"))

I also tried this:

sum(stringr::str_count(USER_POST, "\b[url]\b"))

The result is 0. But when I checked in Excel, it appears about 7 times. Can anyone tell me what I am doing wrong? Thanks in advance.

Edited to add more detail below:

USER_ID    USER_POSTS 
123        I like butterflies. 
234        I have found some information in this webpage [url] 
456        Find more information here [url]

Answer 1

If I understand your question correctly, this should be a workable solution:

library(stringr)
str_count(x, "\\[url\\]")
[1] 2

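The key here is that [ and ] are metacharacters in regular expressions. To match them as literal characters you need to escape them, and in an R string that takes a double backslash (\\), because R's string parser consumes one backslash before the regex engine sees the pattern.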
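To make the difference concrete, here is a small sketch. The vector `posts` is a made-up stand-in modelled on the example in the question, not the original file. Unescaped, `[url]` is a character class that matches any single `u`, `r` or `l`; escaped, it matches the literal five-character token.

```r
library(stringr)

# Hypothetical sample posts, modelled on the example in the question
posts <- c("I like butterflies.",
           "I have found some information in this webpage [url]",
           "Find more information here [url]")

# Unescaped: a character class, so every single u, r or l is counted
class_counts <- str_count(posts, "[url]")

# Escaped: matches the literal token "[url]"
literal_counts <- str_count(posts, "\\[url\\]")

# Total number of literal "[url]" tokens across all posts
sum(literal_counts)
```

Comparing `class_counts` with `literal_counts` shows how far the unescaped pattern drifts from the intended count.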

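Alternatively, str_count lets you wrap the pattern in fixed() so that it is treated as a literal string rather than a regular expression: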

str_count(x, fixed("[url]"))
[1] 2
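If stringr is not available, the same literal match can be sketched in base R. This is a swapped-in alternative, not part of the answer above; it relies on the `fixed = TRUE` argument of `gregexpr`, and `x` is redefined here only to keep the snippet self-contained.

```r
# Same sample string as in the answer's data, repeated for self-containment
x <- "USER_ID USER_POSTS 123 I like butterflies. 234 I have found some information in this webpage [url] 456 Find more information here [url]"

# fixed = TRUE makes gregexpr treat "[url]" as a literal string
matches <- regmatches(x, gregexpr("[url]", x, fixed = TRUE))

# One count per element of x; elements without a match give 0
base_count <- lengths(matches)
base_count
```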

Data:

x <- "USER_ID USER_POSTS 123 I like butterflies. 234 I have found some information in this webpage [url] 456 Find more information here [url]"
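The question works with a data frame column rather than a single string. Below is a minimal sketch of the same count on a column, assuming the CSV has been read into a data frame with a USER_POSTS column as in the example table. The data frame `tweets` is built by hand for illustration and is not the original data.

```r
library(stringr)

# Hand-built stand-in for the data read from the CSV (hypothetical)
tweets <- data.frame(
  USER_ID = c(123, 234, 456),
  USER_POSTS = c("I like butterflies.",
                 "I have found some information in this webpage [url]",
                 "Find more information here [url]"),
  stringsAsFactors = FALSE
)

# Occurrences of the literal token in each post
per_post <- str_count(tweets$USER_POSTS, fixed("[url]"))

# Total occurrences across all posts
total <- sum(per_post)

# Number of posts that contain the token at least once
n_posts <- sum(str_detect(tweets$USER_POSTS, fixed("[url]")))
```

`total` and `n_posts` differ whenever a single post contains the token more than once, so it is worth deciding which of the two the Excel figure corresponds to.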
