R: Scrape some info from an HTML

Question

I have a file in HTML format:

<div id='1' class='location element' style='width:100px; top:5068px; left: 3332px;'><div class='position'></div><div class='time'></div><div class='age'></div>Name</div>

I want to retrieve the string from the first div's class (in this case 'location') and the name.

So far, I can retrieve the name using the id:

library(rvest)
html_file %>%
  html_nodes("#1") %>%
  html_text()

How can I retrieve the first field, 'class'? Thanks
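The question doesn't show how html_file was created. A minimal reproducible setup, assuming html_file is simply the snippet above parsed with rvest's read_html() (rather than loaded from a file on disk):

```r
library(rvest)

# Assumption: parse the snippet from the question as a literal HTML string;
# read_html() treats a string containing "<" as markup, not as a path.
html_file <- read_html("<div id='1' class='location element' style='width:100px; top:5068px; left: 3332px;'><div class='position'></div><div class='time'></div><div class='age'></div>Name</div>")

# Same pipeline as in the question: select the div with id 1, get its text
html_file %>%
  html_nodes("#1") %>%
  html_text()
```

The nested divs are empty, so the only text inside the outer div is "Name".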

Answer 1

Use html_attr():

library(rvest)
library(dplyr)
html_file %>%
    html_nodes("#1") %>%
    html_attr("class")

[1] "location element"
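The class attribute holds two space-separated class names, while the question asks for 'location' together with the name. A minimal sketch that keeps only the first class name, assuming the same single-div snippet and that class names are separated by a single space:

```r
library(rvest)

# Assumption: the snippet from the question, parsed as a literal string
html_file <- read_html("<div id='1' class='location element' style='width:100px; top:5068px; left: 3332px;'><div class='position'></div><div class='time'></div><div class='age'></div>Name</div>")

# html_node() returns the first matching node (here the only one)
node <- html_node(html_file, "#1")

# Split "location element" on the space and keep the first class name
first_class <- strsplit(html_attr(node, "class"), " ")[[1]][1]

# The text of the outer div is the name
name_text <- html_text(node)

c(class = first_class, name = name_text)
```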

Note: if you use html_attrs(), you get all of the attributes and can pick the one you need from there:

library(rvest)
library(dplyr)
html_file %>%
    html_nodes("#1") %>%
    html_attrs()

[[1]]
                                      id                                    class 
                                     "1"                       "location element" 
                                   style 
"width:100px; top:5068px; left: 3332px;"
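html_attrs() returns a list with one named character vector per matched node, so a single attribute can then be pulled out by name. A minimal sketch, again assuming html_file is the snippet from the question parsed with read_html():

```r
library(rvest)

# Assumption: the snippet from the question, parsed as a literal string
html_file <- read_html("<div id='1' class='location element' style='width:100px; top:5068px; left: 3332px;'><div class='position'></div><div class='time'></div><div class='age'></div>Name</div>")

# One named character vector of attributes per node matched by "#1"
attrs <- html_file %>%
  html_nodes("#1") %>%
  html_attrs()

# First (and only) node, attribute looked up by name
attrs[[1]][["class"]]  # "location element"
attrs[[1]][["style"]]
```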

r

rvest