Reuters data scraping in R with rvest: finding the CSS selector

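Yes, I know there are similar questions; I have read the answers and tried the ones I was able to implement. So apologies in advance if this question is a silly one :)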

I am scraping the ages of company board members from Reuters for a list of companies. Here is the link: http://www.reuters.com/finance/stocks/companyOfficers?symbol=MSFT

I am using the rvest library and SelectorGadget to find the right CSS selector. Here is the code:

library(rvest)
d = read_html("http://www.reuters.com/finance/stocks/companyOfficers?symbol=GAZP.RTS")

d %>% html_nodes("#companyNews:nth-child(1) td:nth-child(2)") %>% html_text()

The result is

character(0)

I think I have got the CSS selector wrong. Can you tell me how to select the table?
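One way to narrow down an empty result like the character(0) above is to count how many nodes each piece of the selector matches. The sketch below is only an illustration: the HTML snippet is a made-up stand-in for the real page (an assumption, so that the code runs offline), and "#noSuchId" is a deliberately wrong selector added for contrast.

```r
library(rvest)

# Hypothetical stand-in for the Reuters page; the real markup may differ.
page <- read_html('<div><div id="companyNews"><table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>A. Example</td><td>50</td></tr>
</table></div></div>')

# Selectors to try, from the outer container down to the cells.
selectors <- c(
  "#companyNews",
  "#companyNews table",
  "#companyNews td:nth-child(2)",
  "#noSuchId td"  # deliberately wrong, matches nothing
)

# Count the nodes each selector matches; a count of 0 shows where it breaks.
matches <- sapply(selectors, function(s) length(html_nodes(page, s)))
print(matches)
```

If even the outermost selector returns 0 on the real page, the problem is more likely in how the page was loaded than in the selector itself.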

You need to use html_session for the data to load correctly:

library(rvest)

url <- 'http://www.reuters.com/finance/stocks/companyOfficers?symbol=MSFT.O'
site <- html_session(url) %>% read_html()

site %>% html_node('#companyNews:first-child table') %>% html_table()

##                     Name Age Since                                  Current Position
## 1          John Thompson  66  2014                 Independent Chairman of the Board
## 2         Bradford Smith  57  2015                    President, Chief Legal Officer
## 3          Satya Nadella  48  2014                 Chief Executive Officer, Director
## 4          William Gates  60  2014          Founder and Technology Advisor, Director
## 5               Amy Hood  43  2013 Chief Financial Officer, Executive Vice President
## 6  Christopher Capossela  45  2014 Executive Vice President, Chief Marketing Officer
## 7         Kathleen Hogan  49  2014        Executive Vice President - Human Resources
## 8       Margaret Johnson  54  2014   Executive Vice President - Business Development
## 9           Ifeanyi Amah  NA  2016                          Chief Technology Officer
## 10         Keith Lorizio  NA  2016              Vice President - North America Sales
## 11       Teri List-Stoll  53  2014                              Independent Director
## 12       G. Mason Morfit  40  2014                              Independent Director
## 13         Charles Noski  63  2003                              Independent Director
## 14          Helmut Panke  69  2003                              Independent Director
## 15        Charles Scharf  50  2014                              Independent Director
## 16          John Stanton  60  2014                              Independent Director
## 17             Chris Suh  NA    NA              General Manager - Investor Relations
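To get from a table like this to the ages the question asks about, the Age column can be pulled out of the data frame that html_table() returns. The following is a minimal sketch rather than a definitive implementation: the inline HTML is a shortened, hypothetical copy of the table (so that the example runs without network access), and the names officers and ages are introduced here for illustration.

```r
library(rvest)

# Hypothetical, shortened stand-in for the officers page.
page <- read_html('
<div>
  <div id="companyNews">
    <table>
      <tr><th>Name</th><th>Age</th><th>Since</th><th>Current Position</th></tr>
      <tr><td>John Thompson</td><td>66</td><td>2014</td><td>Independent Chairman of the Board</td></tr>
      <tr><td>Chris Suh</td><td></td><td></td><td>General Manager - Investor Relations</td></tr>
    </table>
  </div>
</div>')

# Same selector as in the answer: the table inside the first-child #companyNews.
officers <- page %>%
  html_node("#companyNews:first-child table") %>%
  html_table()

# Coerce Age to integer; empty cells become NA regardless of how they were parsed.
ages <- suppressWarnings(as.integer(officers$Age))

# Summary statistics should skip the missing ages.
mean_age <- mean(ages, na.rm = TRUE)
```

Missing ages show up as NA, as in the output above, so any summary over the column needs na.rm = TRUE or an explicit filter.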