Which CSS element should I select to get a player's nationality when scraping SoFifa.com with R?
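I have been trying to scrape player details from SoFifa.com using the rvest package, working through the table column by column. This is where I am stuck: I cannot get the players' nationality. Maybe I am selecting the wrong CSS element. I tried the SelectorGadget tool, but still had no luck. The code is below. Any help would be much appreciated!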
library(rvest)

# Website link to be scraped with selected columns.
link <- "https://sofifa.com/"
# Name of each player. This works perfectly fine, as all names are retrieved.
Name <- link %>% read_html() %>%
  html_nodes(".nowrap") %>%
  html_text()
# Nationality is not retrieved. While inspecting this section, I observed that the title of the
# <a rel="nofollow"> element under <div class="bp3-text-overflow-ellipsis"> needs to be selected.
# I need help with how to do that!
Nationality <- link %>% read_html() %>%
  html_nodes(".flag") %>%
  html_text()
# Tried .flag because SelectorGadget suggested it, but it still doesn't retrieve the nationality of a player.
You can combine two attributes to get what you want.
Try:
# <a rel="nofollow" href="/players?na=14" title="England">...</a>
# The *= operator in a CSS selector matches an attribute whose value contains the given text.
# Here is the CSS selector:
# .bp3-text-overflow-ellipsis a[rel="nofollow"][href*="players?"]
page <- read_html(link)
Nationality <- page %>%
  html_nodes('.bp3-text-overflow-ellipsis a[rel="nofollow"][href*="players?"]') %>%
  html_attr("title")
print(Nationality)
Output:
[1] "Italy" "England" "Togo" "France"
[5] "Ghana" "Brazil" "Norway" "Spain"
[9] "Nigeria" "Argentina" "Spain" "England"
[13] "Portugal" "England" "Denmark" "England"
[17] "Italy" "Argentina" "England" "Portugal"
[21] "Argentina" "Norway" "Brazil" "Norway"
[25] "Netherlands" "Germany" "England" "Uruguay"
[29] "United States" "Argentina" "Netherlands" "Czech Republic"
[33] "Brazil" "France" "Argentina" "Brazil"
[37] "Poland" "Brazil" "Italy" "Portugal"
[41] "Netherlands" "Netherlands" "Netherlands" "Morocco"
[45] "Argentina" "Spain" "Argentina" "France"
[49] "Netherlands" "Brazil" "Argentina" "France"
[53] "Canada" "Canada" "Switzerland" "Brazil"
[57] "Germany" "Netherlands" "Jamaica" "France"
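To see why the `.flag` attempt returns nothing useful while the attribute selector works, here is a minimal offline sketch. The HTML fragment, the player and team names, and the `na=54` code are hypothetical, invented only to imitate the structure described in the question; the real SoFifa markup may differ. The assumption is that `.flag` matches an image element with no text content, so `html_text()` has nothing to return, whereas the nationality lives in the `title` attribute of the enclosing link.

```r
library(rvest)

# Hypothetical markup for illustration only; it imitates the structure
# described in the question and is NOT the real SoFifa page.
sample_html <- '
<table>
  <tr><td><div class="bp3-text-overflow-ellipsis">
    <a href="/player/1/">Player A</a>
    <a rel="nofollow" href="/players?na=14" title="England"><img class="flag"></a>
    <a rel="nofollow" href="/team/5/">Team A</a>
  </div></td></tr>
  <tr><td><div class="bp3-text-overflow-ellipsis">
    <a href="/player/2/">Player B</a>
    <a rel="nofollow" href="/players?na=54" title="Brazil"><img class="flag"></a>
    <a rel="nofollow" href="/team/6/">Team B</a>
  </div></td></tr>
</table>'

page <- read_html(sample_html)

# .flag matches the <img> elements, which contain no text,
# so html_text() yields only empty strings.
flag_text <- page %>% html_nodes(".flag") %>% html_text()

# Combining rel="nofollow" with href*="players?" skips the player and team
# links; html_attr() then reads the title attribute instead of the text.
nationality <- page %>%
  html_nodes('.bp3-text-overflow-ellipsis a[rel="nofollow"][href*="players?"]') %>%
  html_attr("title")

print(nationality)
```

The general point is that `html_text()` reads the text between tags and `html_attr()` reads a named attribute, so the choice depends on where the page stores the value.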