Web scraping with R error
I am trying to scrape sainsburys.co.uk, and I am running the following code in R:
doc <- htmlTreeParse('http://www.sainsburys.co.uk/shop/gb/groceries/fruit-veg/all-fruit#langId=44&storeId=10151&catalogId=10122&categoryId=12545&parent_category_rn=12518&top_category=12518&pageSize=30&orderBy=FAVOURITES_FIRST&searchTerm')
rootNode <- xmlRoot(doc)
But I get this error:
Error in x$children[[1]] : subscript out of bounds
What am I doing wrong?
You can try the httr library:
library(XML)
library(httr)
url <- 'http://www.sainsburys.co.uk/shop/gb/groceries/fruit-veg/all-fruit#langId=44&storeId=10151&catalogId=10122&categoryId=12545&parent_category_rn=12518&top_category=12518&pageSize=30&orderBy=FAVOURITES_FIRST&searchTerm'
doc <- content(GET(url),type="text/html")
xmlValue(doc["//title"][[1]])
# [1] "All fruit | Sainsbury's"
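One caveat about the snippet above: newer httr releases parse HTML with xml2, so content() may return an xml2 document on which the XML-package indexing doc["//title"] does not work. A minimal sketch of a variant that avoids depending on that behaviour is to take the response body as plain text and parse it with XML::htmlParse yourself. The helper name page_title and the sample_html string are hypothetical and used only for illustration; they are not part of the original answer.

```r
library(XML)   # htmlParse(), xpathSApply(), xmlValue()

# Hypothetical helper (not from the original answer): return the text of the
# first <title> element in an HTML string, or NA if there is none.
page_title <- function(html) {
  doc <- htmlParse(html, asText = TRUE)            # parse from a string, not a URL
  titles <- xpathSApply(doc, "//title", xmlValue)  # empty list when nothing matches
  if (length(titles) == 0) NA_character_ else titles[[1]]
}

# Offline check: a hand-written snippet standing in for the real page.
sample_html <- "<html><head><title>All fruit | Sainsbury's</title></head><body></body></html>"
page_title(sample_html)
```

For the live page, the string would come from httr, for example resp <- httr::GET(url); httr::stop_for_status(resp); page_title(httr::content(resp, as = "text", encoding = "UTF-8")). Keeping the download and the parsing as separate steps makes it easier to see which of the two fails.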