How do I get the link while the loop is running?

Question

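library(rvest)
library(tidyverse)

# Build one URL per results page (note the "=" in the query string)
urls <- str_c("https://news.ycombinator.com/news?p=", seq(1, 2, 1))

# Scrape the story titles from every page and stack them into one tibble
title <- urls %>%
  map(
    function(page_url) {
      read_html(page_url) %>%
        html_nodes("a.storylink") %>%
        html_text() %>%
        enframe(name = NULL)
    }
  ) %>%
  bind_rows()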

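This produces a data frame with a single column. I want to add a second column that holds, for each title, the URL of the page it was scraped from.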

# A tibble: 6 x 2
  value                                                       url                                  
  <chr>                                                       <chr>                                
1 1k True Fans? Try 100                                       https://news.ycombinator.com/news?p=1
2 FLIF – Free Lossless Image Format                           https://news.ycombinator.com/news?p=1
3 Critical Bluetooth Vulnerability in Android (CVE-2020-0022) https://news.ycombinator.com/news?p=1
4 The Rapid Growth of Io_uring                                https://news.ycombinator.com/news?p=1
5 Show HN: Building an open-source language-learning platform https://news.ycombinator.com/news?p=1
6 TV Backlight Compensation                                   https://news.ycombinator.com/news?p=1

Answer 1

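Here is one approach. As you iterate over the pages, build a two-column data frame for each one, holding the page URL and the titles found on it. map_dfr() then row-binds the per-page data frames into a single result.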

library(rvest)
library(tidyverse)

# The query string needs "=" ("?p=1"), matching the URLs shown in the question
map_dfr(.x = paste0("https://news.ycombinator.com/news?p=", 1:2),
        .f = function(x){tibble(url = x,
                                title = read_html(x) %>% 
                                        html_nodes("a.storylink") %>% 
                                        html_text()
                            )})

   url                                   title
   <chr>                                 <chr>
 1 https://news.ycombinator.com/news?p=1 1k True Fans? Try 100
 2 https://news.ycombinator.com/news?p=1 Critical Bluetooth Vulnerability in Android (CVE-2020-0022)
 3 https://news.ycombinator.com/news?p=1 FLIF – Free Lossless Image Format
 4 https://news.ycombinator.com/news?p=1 The Rapid Growth of Io_uring
 5 https://news.ycombinator.com/news?p=1 Show HN: Building an open-source language-learning platform
 6 https://news.ycombinator.com/news?p=1 Why Google Might Prefer Dropping a B Business
 7 https://news.ycombinator.com/news?p=1 TV Backlight Compensation
 8 https://news.ycombinator.com/news?p=1 This person does not exist
 9 https://news.ycombinator.com/news?p=1 Angular 9.0
10 https://news.ycombinator.com/news?p=1 Before the DNS: how yours truly upstaged the NIC's official HOSTS.TXT (2004)
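If you want to try the pattern without hitting the network, here is a minimal offline sketch. The example.com URLs, the inline HTML strings, and the helper name scrape_page() are all made up for illustration; only the idea of building one tibble per page, with the URL recycled next to each title, comes from the code above.

```r
library(rvest)
library(purrr)
library(tibble)

# Hypothetical stand-ins for two downloaded pages, keyed by their URL
pages <- c(
  "https://example.com/news?p=1" =
    "<html><body><a class='storylink'>First</a><a class='storylink'>Second</a></body></html>",
  "https://example.com/news?p=2" =
    "<html><body><a class='storylink'>Third</a></body></html>"
)

# Hypothetical helper: parse one page and keep its URL next to every title.
# The length-1 url is recycled to match the number of titles.
scrape_page <- function(html, url) {
  titles <- read_html(html) %>%
    html_nodes("a.storylink") %>%
    html_text()
  tibble(url = url, title = titles)
}

# imap_dfr() passes each element and its name (here, the URL) to the helper
# and row-binds the resulting tibbles
result <- imap_dfr(pages, scrape_page)
print(result)
```

With real pages you would pass the URL itself to read_html(), as in the answer's code, instead of an HTML string.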

If you also want hnuser, add one more column. In its simplest form, you can do the following.

map_dfr(.x = paste0("https://news.ycombinator.com/news?p=", 1:2),
        .f = function(x){
          page <- read_html(x)  # download and parse each page only once
          # tibble() needs title and hnuser to have the same length
          tibble(url = x,
                 title = page %>% 
                         html_nodes("a.storylink") %>% 
                         html_text(),
                 hnuser = page %>% 
                          html_nodes("a.hnuser") %>% 
                          html_text()
          )})
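One caveat with the simple version: if a page contains an entry without a user link, title and hnuser end up with different lengths and tibble() fails. A possible workaround, sketched below on made-up markup, is to select one container per story first and then call html_node() (singular) inside each container, which returns NA where the element is missing. The div.item wrapper is an assumption for illustration only; the real page may group its rows differently, so the container selector would need adapting.

```r
library(rvest)
library(tibble)

# Hypothetical markup: one <div class="item"> per story; the second has no user
html <- "<html><body>
  <div class='item'><a class='storylink'>Story A</a><a class='hnuser'>alice</a></div>
  <div class='item'><a class='storylink'>Job ad</a></div>
  <div class='item'><a class='storylink'>Story B</a><a class='hnuser'>bob</a></div>
</body></html>"

page  <- read_html(html)               # parse once, reuse for both columns
items <- html_nodes(page, "div.item")  # one node per story

# html_node() yields exactly one result per item, so both columns stay aligned
stories <- tibble(
  title  = items %>% html_node("a.storylink") %>% html_text(),
  hnuser = items %>% html_node("a.hnuser") %>% html_text()  # NA where absent
)
print(stories)
```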

Tags: r, web-scraping, rvest