Rvest returning null values

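I'm trying to piece together how to use rvest. I thought I had it figured out, but every result I get back is empty.

I used @RonakShah's example as my starting point and tried to extend it to collect the name, the phone number, and the opening hours for each day: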

site = "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"

get_phone <- function(url) {
  webpage <- site %>% read_html()
  name <- webpage %>% html_nodes('p.name') %>%html_text() %>% trimws()
  telephone <- webpage %>% html_nodes('p.telephone') %>%html_text() %>% trimws()
  monday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  tuesday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  wednesday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  thursday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  friday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  saturday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  sunday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  data.frame(telephone, monday, tuesday, wednesday, thursday, friday, saturday, sunday)
}

get_phone(site)

But I can't get any of these to work, even one at a time. I can't even get it to read the days or the phone number. Can anyone point out why?

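Right-click on the web page and select Inspect to view its HTML. Find the element you want to extract and use a CSS selector to scrape it. The selectors in the question (`p.name`, `p.telephone`, `p.day a`) do not match any element on this page, which is why every result comes back empty.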
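To see why a selector that matches nothing gives an empty result, here is a minimal sketch that runs against an inline HTML fragment instead of the live site. The fragment is a hypothetical stand-in that imitates only the phone markup; the real page contains far more.

```r
library(rvest)

# Hypothetical fragment imitating the phone markup on the page
page <- read_html('<html><body>
  <span itemprop="telephone">+64 800 888 386</span>
</body></html>')

# Selector from the question: there is no <p class="telephone">,
# so html_nodes() matches nothing and html_text() returns character(0)
guess <- page %>% html_nodes('p.telephone') %>% html_text()
length(guess)

# Selector that matches the markup actually present
phone <- page %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
phone
```

Checking `length()` on each result like this is a quick way to test a selector before wrapping it in a function.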

library(rvest)
site <- "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"

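# Scrape the phone number and opening hours from a venue page
get_phone <- function(url) {
  # Read the page passed in as `url` rather than the global `site`
  webpage <- url %>% read_html()
  # The phone number sits in a <span itemprop="telephone"> element
  phone <- webpage %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
  # The opening hours are a JSON string in the data-times attribute
  opening_hours <- webpage %>% 
                    html_nodes('div.open-hours') %>% 
                    html_attr('data-times') %>% jsonlite::fromJSON()
  list(phone_number = phone, opening_hours = opening_hours)
}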

get_phone(site)


#$phone_number
#[1] "+64 800 888 386"

#$opening_hours
#  weekday time_from time_to
#1       1     12:00   00:00
#2       2     12:00   00:00
#3       3     12:00   00:00
#4       4     12:00   00:00
#5       5     12:00   00:00
#6       6     10:00   00:00
#7       0     10:00   00:00

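The opening hours are stored as JSON in the `data-times` attribute, which is handy: we don't have to scrape each day separately and bind the results together.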
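If you want day names rather than weekday numbers, something like the following sketch might work. The JSON string is reconstructed from the output above (two of the seven rows), and the mapping 0 = Sunday through 6 = Saturday is an assumption about how the site numbers its days, so check it against the page before relying on it.

```r
library(jsonlite)

# Assumed JSON shape, reconstructed from the printed output (two rows only)
times_json <- '[{"weekday":1,"time_from":"12:00","time_to":"00:00"},
                {"weekday":0,"time_from":"10:00","time_to":"00:00"}]'
opening_hours <- fromJSON(times_json)

# Assumption: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# as.integer() guards against the weekday values arriving as strings
opening_hours$day <- day_names[as.integer(opening_hours$weekday) + 1]
opening_hours
```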