Separate strings returned by html_text in rvest

I'm trying to extract a hotel's amenities using rvest.

library(rvest)
hotel_url <- "https://www.tripadvisor.com/Hotel_Review-g187791-d13494726-Reviews-Palazzo_Caruso-Rome_Lazio.html"
hotel <- read_html(hotel_url)
amenities <- hotel %>%
    html_node(".hotels-hr-about-amenities-AmenityGroup__amenitiesList--3MdFn") %>%
    html_text()

The resulting text does not separate one amenity from the next:

[1] "Paid private parking nearbyFree High Speed Internet (WiFi)Coffee shopBicycle toursWalking toursCar hireFax / photocopyingBaggage storageFree internetWifiPublic wifiInternetBreakfast availableBreakfast in the roomConciergeExecutive lounge accessNon-smoking hotelSun terrace24-hour front deskPrivate check-in / check-outLaundry service"

Is there a way to add a separator (e.g. ";") between the amenities?

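You need to go one or two levels deeper into the HTML structure to pull the text out as separate strings. The html_children() function lets you do that.
See the comments in the code for details: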

library(rvest)
library(xml2)  # xml_length() comes from xml2, not rvest

hotel_url <- "https://www.tripadvisor.com/Hotel_Review-g187791-d13494726-Reviews-Palazzo_Caruso-Rome_Lazio.html"
hotel <- read_html(hotel_url)

# Select the amenities list and take its direct child nodes
amenities <- hotel %>%
  html_node(".hotels-hr-about-amenities-AmenityGroup__amenitiesList--3MdFn") %>%
  html_children()

# The last child node holds the unhighlighted amenities.
# Children with exactly one child element are the highlighted amenities: get their text directly
highlighted <- amenities[xml_length(amenities) == 1] %>% html_text()
# The unhighlighted amenities are nested one level deeper: drill down once more
unhighlighted <- amenities[xml_length(amenities) > 1] %>% html_children() %>% html_text()



> highlighted
[1] "Paid private parking nearby"     "Free High Speed Internet (WiFi)" "Coffee shop"                     "Bicycle tours"                  
[5] "Walking tours"                   "Car hire"                        "Fax / photocopying"              "Baggage storage"                
> unhighlighted
 [1] "Free internet"                "Wifi"                         "Public wifi"                  "Internet"                    
 [5] "Breakfast available"          "Breakfast in the room"        "Concierge"                    "Executive lounge access"     
 [9] "Non-smoking hotel"            "Sun terrace"                  "24-hour front desk"           "Private check-in / check-out"
[13] "Laundry service"
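To try the idea without fetching the live page, here is a minimal offline sketch. The HTML snippet and its `amenitiesList` class are hypothetical, only loosely modelled on the structure the answer assumes (single-child items followed by one item that nests several more amenities). It also shows the step the question actually asked for: once the amenities are separate strings, base R's `paste(..., collapse = "; ")` joins them with a separator.

```r
library(rvest)  # read_html(), html_node(), html_children(), html_text(), %>%
library(xml2)   # xml_length()

# Hypothetical markup (not the real TripAdvisor HTML): two "highlighted"
# items with one child each, then one item wrapping two more amenities.
# No whitespace between tags, so html_text() output is exact.
page <- read_html(paste0(
  '<div class="amenitiesList">',
  '<div><span>Coffee shop</span></div>',
  '<div><span>Bicycle tours</span></div>',
  '<div><span>Concierge</span><span>Sun terrace</span></div>',
  '</div>'
))

list_node <- html_node(page, ".amenitiesList")

# html_text() on the container concatenates all descendant text,
# which is the behaviour described in the question
glued <- html_text(list_node)

# Work on the children instead, as in the answer
amenities <- html_children(list_node)
highlighted <- amenities[xml_length(amenities) == 1] %>% html_text()
unhighlighted <- amenities[xml_length(amenities) > 1] %>%
  html_children() %>%
  html_text()

# Join everything into one string with "; " between amenities
joined <- paste(c(highlighted, unhighlighted), collapse = "; ")
print(joined)
```

On the real page the same `paste()` call can be applied to `c(highlighted, unhighlighted)` from the answer's code.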