How do I get the link while the loop is running?
library(rvest)
library(tidyverse)

# Page URLs to scrape (pages 1 and 2)
urls <- str_c("https://news.ycombinator.com/news?p=", 1:2)

# Return the story titles of one page as a one-column tibble
gettitle <- function(url) {
  read_html(url) %>%
    html_nodes("a.storylink") %>%
    html_text() %>%
    enframe(name = NULL)
}

title <- urls %>%
  map(gettitle) %>%
  bind_rows()
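This produces a data frame with a single column. I want to add a new column containing the URL of the page that each title came from, like this: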
# A tibble: 6 x 2
  value                                                       url
  <chr>                                                       <chr>
1 1k True Fans? Try 100                                       https://news.ycombinator.com/news?p=1
2 FLIF – Free Lossless Image Format                           https://news.ycombinator.com/news?p=1
3 Critical Bluetooth Vulnerability in Android (CVE-2020-0022) https://news.ycombinator.com/news?p=1
4 The Rapid Growth of Io_uring                                https://news.ycombinator.com/news?p=1
5 Show HN: Building an open-source language-learning platform https://news.ycombinator.com/news?p=1
6 TV Backlight Compensation                                   https://news.ycombinator.com/news?p=1
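Here is one way to do it. As you loop over the pages, build a two-column data frame for each page; map_dfr() then row-binds the per-page data frames into a single one.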
library(rvest)
library(tidyverse)

map_dfr(.x = paste0("https://news.ycombinator.com/news?p=", 1:2),
        .f = function(x) {
          # One tibble per page; the single url is recycled across all titles
          tibble(url = x,
                 title = read_html(x) %>%
                   html_nodes("a.storylink") %>%
                   html_text())
        })

   url                                   title
   <chr>                                 <chr>
 1 https://news.ycombinator.com/news?p=1 1k True Fans? Try 100
 2 https://news.ycombinator.com/news?p=1 Critical Bluetooth Vulnerability in Android (CVE-2020-0022)
 3 https://news.ycombinator.com/news?p=1 FLIF – Free Lossless Image Format
 4 https://news.ycombinator.com/news?p=1 The Rapid Growth of Io_uring
 5 https://news.ycombinator.com/news?p=1 Show HN: Building an open-source language-learning platform
 6 https://news.ycombinator.com/news?p=1 Why Google Might Prefer Dropping a B Business
 7 https://news.ycombinator.com/news?p=1 TV Backlight Compensation
 8 https://news.ycombinator.com/news?p=1 This person does not exist
 9 https://news.ycombinator.com/news?p=1 Angular 9.0
10 https://news.ycombinator.com/news?p=1 Before the DNS: how yours truly upstaged the NIC's official HOSTS.TXT (2004)
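If you would rather keep the question's gettitle() helper as it is, another option is to name the input vector by its URLs and let map_dfr() fill the column through its .id argument. The sketch below is only an illustration: the pages vector and its inline HTML are made up so that the example runs offline. With real pages you would pass set_names(urls) to map_dfr() instead.

```r
library(rvest)
library(tidyverse)

# Hypothetical stand-ins for two downloaded pages, named by their URL
pages <- c(
  "https://news.ycombinator.com/news?p=1" =
    "<html><body><a class='storylink' href='#'>Story A</a><a class='storylink' href='#'>Story B</a></body></html>",
  "https://news.ycombinator.com/news?p=2" =
    "<html><body><a class='storylink' href='#'>Story C</a></body></html>"
)

# Same helper as in the question: titles of one page as a one-column tibble
gettitle <- function(page) {
  read_html(page) %>%
    html_nodes("a.storylink") %>%
    html_text() %>%
    enframe(name = NULL)
}

# .id = "url" adds a column holding the name of the element each row came from
titles <- map_dfr(pages, gettitle, .id = "url")
titles
```

The result has the columns url and value, with one row per title.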
To include hnuser as well, add one more column. In short, you can do the following.
map_dfr(.x = paste0("https://news.ycombinator.com/news?p=", 1:2),
        .f = function(x) {
          page <- read_html(x)  # download and parse each page only once
          tibble(url = x,
                 title = page %>% html_nodes("a.storylink") %>% html_text(),
                 hnuser = page %>% html_nodes("a.hnuser") %>% html_text())
        })
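One caveat, which is my assumption and not something the code above handles: tibble() needs title and hnuser to have the same length, so a page where some entry has no a.hnuser link would make the call fail. A possible workaround is to select one container per story first and then use html_node() (singular) inside each container, which returns exactly one result per container and NA where nothing matches. The div.item markup below is hypothetical and only there to keep the sketch runnable offline; the container selector has to be adapted to the real page structure.

```r
library(rvest)
library(tidyverse)

# Hypothetical markup: one div.item per story, the second has no user link
page <- read_html(
  "<html><body>
     <div class='item'><a class='storylink' href='#'>Story A</a><a class='hnuser' href='#'>alice</a></div>
     <div class='item'><a class='storylink' href='#'>Job ad</a></div>
     <div class='item'><a class='storylink' href='#'>Story C</a><a class='hnuser' href='#'>carol</a></div>
   </body></html>"
)

items <- html_nodes(page, "div.item")

# html_node() returns one result per item, so both columns stay aligned;
# html_text() turns a missing match into NA
stories <- tibble(
  title  = items %>% html_node("a.storylink") %>% html_text(),
  hnuser = items %>% html_node("a.hnuser") %>% html_text()
)
stories
```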