Rvest returning null values
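I'm trying to piece together how to use rvest. I thought I had it figured out, but every result I get back is empty.
I used @RonakShah's example as my starting point and thought I would try to extend it to collect the name, the phone number, and the opening hours for each day: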
site = "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"
get_phone <- function(url) {
  webpage <- site %>% read_html()
  name <- webpage %>% html_nodes('p.name') %>% html_text() %>% trimws()
  telephone <- webpage %>% html_nodes('p.telephone') %>% html_text() %>% trimws()
  monday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  tuesday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  wednesday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  thursday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  friday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  saturday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  sunday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  data.frame(telephone, monday, tuesday, wednesday, thursday, friday, saturday, sunday)
}
get_phone(site)
But I can't get any of these to work, even on their own. I can't even get it to read the days, or a wrong phone number. Can anyone point out why?
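Right-click on the webpage and select Inspect to view the page's HTML. The selectors in the question do not appear to match anything on this page, which is why every result comes back empty.
Find the element you want to extract and use a CSS selector to scrape it.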
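Before writing the whole function, it can help to test each selector on its own and check how many nodes it returns. The sketch below is only an illustration: the inline HTML is a hypothetical stand-in for the real page, whose markup may differ. It shows that a selector matching nothing yields an empty result rather than an error.

```r
library(rvest)

# Hypothetical stand-in for the page's markup; the real page may differ.
page <- read_html(
  '<p>Phone: <span itemprop="telephone">+64 800 888 386</span></p>'
)

# A selector that matches nothing gives an empty node set, so html_text()
# returns a zero-length character vector instead of raising an error.
missing <- page %>% html_nodes("p.telephone") %>% html_text()
length(missing)
# [1] 0

# A selector that matches the markup returns the element's text.
found <- page %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
found
# [1] "+64 800 888 386"
```

If `length()` is 0 for a selector, the selector is the thing to fix, not the rest of the pipeline.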
library(rvest)
site <- "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"
get_phone <- function(url) {
  # Parse the page passed in as `url`
  webpage <- url %>% read_html()
  # The phone number sits in a <span> with itemprop="telephone"
  phone <- webpage %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
  # The opening hours are JSON held in the data-times attribute
  opening_hours <- webpage %>%
    html_nodes('div.open-hours') %>%
    html_attr('data-times') %>% jsonlite::fromJSON()
  list(phone_number = phone, opening_hours = opening_hours)
}
get_phone(site)
#$phone_number
#[1] "+64 800 888 386"
#$opening_hours
#  weekday time_from time_to
#1       1     12:00   00:00
#2       2     12:00   00:00
#3       3     12:00   00:00
#4       4     12:00   00:00
#5       5     12:00   00:00
#6       6     10:00   00:00
#7       0     10:00   00:00
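The opening hours are stored as JSON in the data-times attribute, which is handy because we don't have to scrape each day separately and bind the results together.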
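If you want one column per day, as in the question, the parsed table can be reshaped. This is a rough sketch under two assumptions: the JSON string below is hand-written to mimic the columns shown in the output (shortened to three entries, with numeric weekday values), and weekday 0 is taken to mean Sunday with 1 to 6 running Monday to Saturday. The names `times_json`, `day_names`, `hours` and `hours_df` are made up for the example.

```r
library(jsonlite)

# Hand-written JSON mimicking the shape of the data-times attribute
# (an assumption; only three entries are included for brevity).
times_json <- '[
  {"weekday": 1, "time_from": "12:00", "time_to": "00:00"},
  {"weekday": 6, "time_from": "10:00", "time_to": "00:00"},
  {"weekday": 0, "time_from": "10:00", "time_to": "00:00"}
]'

opening_hours <- fromJSON(times_json)

# Assumption: weekday 0 is Sunday and 1-6 are Monday-Saturday.
day_names <- c("sunday", "monday", "tuesday", "wednesday",
               "thursday", "friday", "saturday")
opening_hours$day <- day_names[as.integer(opening_hours$weekday) + 1]

# One "from-to" string per day, named by day.
hours <- setNames(
  paste(opening_hours$time_from, opening_hours$time_to, sep = "-"),
  opening_hours$day
)

# One-row data frame with a column for each day present in the data.
hours_df <- as.data.frame(as.list(hours), stringsAsFactors = FALSE)
hours_df
```

The phone number can then be added as one more column with `cbind()` if a single-row data frame per venue is what you are after.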