How do I resolve this web scraping issue?
I am trying to scrape a web page with R.
page <- read_html("https://www.imdb.com/chart/top/")
header_nodes <- html_nodes(page, css = ".titleColumn a" )
rating_nodes <- html_nodes(page, css = "strong")
I am trying to extract the movie titles and ratings, but I get this error:
Error in inDL(x, as.logical(local), as.logical(now), ...) : ICU init failed: U_FILE_ACCESS_ERROR
Try this:
library(rvest)
url <- 'https://www.imdb.com/chart/top/'
webpage <- url %>% read_html()
title <- webpage %>% html_nodes('td.titleColumn a') %>% html_text()
title
#[1] "The Shawshank Redemption"
#[2] "The Godfather"
#[3] "The Godfather: Part II"
#[4] "The Dark Knight"
#[5] "12 Angry Men"
#[6] "Schindler's List"
#[7] "The Lord of the Rings: The Return of the King"
#...
To get the ratings:
ratings <- webpage %>%
  html_nodes('td.ratingColumn strong') %>%
  html_text() %>%
  as.numeric()
ratings
#[1] 9.2 9.1 9.0 9.0 8.9 8.9 8.9 .....
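If you want the titles and ratings side by side, one option is to combine the two vectors into a data frame. The sketch below is only an illustration: it runs the same selectors against a small inline HTML snippet that I made up to mimic the chart's table structure (the snippet, `sample_page`, and `movies` are assumptions, not part of the real page), so it works without a network connection. Against the live site you would use `webpage` from above instead.

```r
library(rvest)

# Hypothetical stand-in for the chart markup, so the example runs offline.
html <- '<table><tbody>
<tr>
  <td class="titleColumn"><a href="#">The Shawshank Redemption</a></td>
  <td class="ratingColumn"><strong>9.2</strong></td>
</tr>
<tr>
  <td class="titleColumn"><a href="#">The Godfather</a></td>
  <td class="ratingColumn"><strong>9.1</strong></td>
</tr>
</tbody></table>'

# read_html() also accepts a literal HTML string.
sample_page <- read_html(html)

# Same selectors as in the answer above.
titles <- sample_page %>%
  html_nodes("td.titleColumn a") %>%
  html_text()

ratings <- sample_page %>%
  html_nodes("td.ratingColumn strong") %>%
  html_text() %>%
  as.numeric()

# Guard against the selectors matching a different number of nodes.
stopifnot(length(titles) == length(ratings))

# One row per movie.
movies <- data.frame(title = titles, rating = ratings, stringsAsFactors = FALSE)
movies
```

The length check is worth keeping when you scrape the real page: if one selector matches extra nodes, `data.frame()` would otherwise fail or pair the wrong values.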