What syntax error is causing this specific error message?
I am using R in RStudio, and I have an R script that performs web scraping. When I run these particular lines, I get an error message:
review<-ta1 %>%
html_node("body") %>%
xml_find_all("//div[contains@class,'location-review-review']")
The error message is as follows:
xmlXPathEval: evaluation failed
Error in `*tmp*` - review : non-numeric argument to binary operator
In addition: Warning message:
In xpath_search(x$node, x$doc, xpath = xpath, nsMap = ns, num_results = Inf) :
Invalid predicate [1206]
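Note: the dplyr and rvest libraries are loaded in my R script.
I looked at the answers to the following question on Stack Overflow:
Non-numeric argument to binary operator error
I feel that my solution is related to the answer Richard Border gave to the question linked above. However, I am having trouble working out how to correct my R syntax based on that answer.
Thank you for looking into my question.
Added example of ta1: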
{xml_document}
<html lang="en" xmlns:og="http://opengraphprotocol.org/schema/">
[1] <head>\n<meta http-equiv="content-type" content="text/html; charset=utf-8">\n<link rel="icon" id="favicon" ...
[2] <body class="rebrand_2017 desktop_web Hotel_Review js_logging" id="BODY_BLOCK_JQUERY_REFLOW" data-tab="TAB ...
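I'm going to make a few assumptions here, since your post doesn't contain enough information to produce a reproducible example.
First, I'm guessing you are trying to scrape TripAdvisor, since the id and class fields match that site and your variable is called ta1.
Second, I'm assuming you are trying to get the text of each review and the number of stars for each review, since that is the relevant scrapable information that each of your classes appears to be looking for.
I need to start by getting my own version of the ta1 variable, since it can't be reproduced from your edited version.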
library(httr)
library(rvest)
library(xml2)
library(magrittr)
library(tibble)
"https://www.tripadvisor.co.uk/" %>%
paste0("Hotel_Review-g186534-d192422-Reviews-") %>%
paste0("Glasgow_Marriott_Hotel-Glasgow_Scotland.html") -> url
ta1 <- url %>% GET %>% read_html
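Now write correct XPaths for the data of interest. In XPath, contains() is a function, so its arguments must be wrapped in parentheses: contains(@class, '...') rather than contains@class,'...'. The missing parentheses are what produce the "Invalid predicate" warning.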
# xpath for elements whose text contains reviews
xpath1 <- "//div[contains(@class, 'location-review-review-list-parts-Expand')]"
# xpath for the elements whose class indicate the ratings
xpath2 <- "//div[contains(@class, 'location-review-review-')]"
xpath3 <- "/span[contains(@class, 'ui_bubble_rating bubble_')]"
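To see the difference between the malformed predicate and the corrected one without hitting the live site, here is a minimal sketch on a hand-written HTML snippet. The snippet, its class names, and the helper count_matches() are assumptions made up for illustration; they are not taken from the real page.

```r
library(xml2)

# Hypothetical stand-in for the real page (class names are illustrative only).
doc <- read_html(paste0(
  "<html><body>",
  "<div class='location-review-review-list-parts-Expand__a1'>Great stay</div>",
  "<div class='unrelated'>Footer</div>",
  "</body></html>"
))

# Hypothetical helper: count the nodes matched by an XPath query.
# Returns NA if the query raised a warning or an error while being evaluated.
count_matches <- function(doc, xpath) {
  problem <- FALSE
  n <- tryCatch(
    withCallingHandlers(
      length(xml_find_all(doc, xpath)),
      warning = function(w) {
        problem <<- TRUE
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) {
      problem <<- TRUE
      0L
    }
  )
  if (problem) NA_integer_ else n
}

broken <- "//div[contains@class,'location-review-review']"    # contains without parentheses
fixed  <- "//div[contains(@class, 'location-review-review')]" # contains() called as a function

n_broken <- count_matches(doc, broken)
n_fixed  <- count_matches(doc, fixed)
```

With the corrected predicate, n_fixed counts the one div whose class contains the substring; the malformed query does not yield a usable match count.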
We can get the text of the reviews like this:
ta1 %>%
xml_find_all(xpath1) %>% # run first query
html_text() %>% # extract text
extract(!equals(., "Read more")) -> reviews # remove "blank" reviews
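The last step of that pipeline uses magrittr's extract() and equals(), which are aliases for `[` and `==`. As a small sketch on a made-up character vector (the texts below are invented, not scraped), the pipe form and plain base R subsetting give the same result:

```r
library(magrittr)

# Hypothetical extracted texts; "Read more" stands for the expander-only elements.
texts <- c("Lovely hotel", "Read more", "Friendly staff", "Read more")

# Pipe form, as in the answer: keep everything that is not exactly "Read more".
reviews_pipe <- texts %>% extract(!equals(., "Read more"))

# Equivalent base R subsetting.
reviews_base <- texts[texts != "Read more"]
```

Either form works; the base R version avoids a name clash if another attached package also exports extract().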
And the associated star ratings like this:
ta1 %>%
xml_find_all(paste0(xpath2, xpath3)) %>%
xml_attr("class") %>%
strsplit("_") %>%
lapply(function(x) x[length(x)]) %>%
as.numeric %>%
divide_by(10) -> stars
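The rating is encoded in the class name itself: the digits after the last underscore are the star count multiplied by ten. A minimal sketch of that parsing step in base R, using invented class strings in the same pattern as the ones the XPath targets:

```r
# Hypothetical class attribute values in the "ui_bubble_rating bubble_NN" pattern.
classes <- c(
  "ui_bubble_rating bubble_10",
  "ui_bubble_rating bubble_40",
  "ui_bubble_rating bubble_35"
)

# Split on underscores and keep the final piece of each class string.
parts <- strsplit(classes, "_")
last_piece <- vapply(parts, function(x) x[length(x)], character(1))

# Convert to numbers and rescale to stars (10 means 1 star, 35 means 3.5 stars).
stars_demo <- as.numeric(last_piece) / 10
```

Using vapply() instead of lapply() returns a character vector directly, so the conversion to numeric does not rely on coercing a list.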
Our result looks like this:
tibble(rating = stars, review = reviews)
## A tibble: 5 x 2
# rating review
# <dbl> <chr>
#1 1 7 of us attended the Christmas Party on Satu~
#2 4 "We stayed 2 nights over last weekend to att~
#3 3 Had a good stay, but had no provision to kee~
#4 3 Booked an overnight for a Christmas shopping~
#5 4 Attended a charity lunch here on Friday and ~