Obtaining twitter screen names from a twitter list

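I would like to use R to get a list of the usernames and full names from a specific Twitter list. I can't see a function for this in any package, but this code works: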

library(XML)
library(httr)


url.name <- "https://twitter.com/TwitterUK/lists/premier-league-players/members"
url.get=GET(url.name)
url.content=content(url.get, as="text")
pagehtml <- htmlParse(url.content)

screenNames <-xpathSApply(pagehtml, '//*/span[@class="username js-action-profile-name"]',xmlValue)
realName <- xpathSApply(pagehtml, '//*/strong[@class="fullname js-action-profile-name"]',xmlValue)

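However, it only returns the first 20 values (presumably what is displayed on screen), whereas the list is much longer.

If there is an rvest solution, that would be welcome too.

Cheers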

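If you want to use R with Twitter, you should take a look at the twitteR package. It doesn't have a function to retrieve the information you want, but we can take advantage of its internal functions to use OAuth and then send the correct API call. The advantage of using an API call is that you don't depend on parsing an HTML page, and you are actually doing what developers are supposed to do.

The code below assumes that you have already authenticated using setup_twitter_oauth(). Tutorials on this are easy to find, since it is one of the basics of the package. Once authenticated, let's load the packages we need: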

library(rjson)
library(httr)
# library(twitteR) Should have been loaded already of course

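Now, to perform the API call, we will use POST. The URL has a slug parameter, which is the Twitter list name, and an owner_screen_name parameter, which is the Twitter account that owns the list. We will use the internal twitteR:::get_oauth_sig() to authenticate the call.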

twlist <- "premier-league-players"
twowner <- "TwitterUK"
api.url <- paste0("https://api.twitter.com/1.1/lists/members.json?slug=",
           twlist, "&owner_screen_name=", twowner, "&count=5000")
response <- POST(api.url, config(token=twitteR:::get_oauth_sig()))
#Count = 5000 is the number of names per result page,
#        which for this case simplifies things to one page.
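
If a list has more members than fit on one page, the lists/members endpoint is cursored: each reply carries a next_cursor_str, and a value of "0" marks the last page. The following is only a sketch of that loop. It assumes a hypothetical fetch_page(cursor) function that performs the authenticated request for one page and returns the parsed JSON as an R list; here it is replaced by a mock with made-up members so the loop can be run without network access.

```r
# Collect every member from a cursored endpoint such as lists/members.
# fetch_page is a hypothetical function(cursor) returning one parsed page
# with $users and $next_cursor_str.
collect_members <- function(fetch_page) {
  cursor <- "-1"            # "-1" requests the first page
  users <- list()
  while (cursor != "0") {   # "0" means there are no more pages
    page <- fetch_page(cursor)
    users <- c(users, page$users)
    cursor <- page$next_cursor_str
  }
  users
}

# Mock standing in for the real request: two pages of made-up members.
mock_pages <- list(
  "-1" = list(users = list(list(screen_name = "user_a"),
                           list(screen_name = "user_b")),
              next_cursor_str = "42"),
  "42" = list(users = list(list(screen_name = "user_c")),
              next_cursor_str = "0")
)
mock_fetch <- function(cursor) mock_pages[[cursor]]

all_users <- collect_members(mock_fetch)
all_screennames <- vapply(all_users, function(u) u$screen_name, character(1))
print(all_screennames)
```

With the real API, fetch_page would add a cursor parameter to the URL built above and parse the reply in the same way as a single-page response.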

The request returns a JSON response, which we can read using fromJSON:

response.list <- fromJSON(content(response, as = "text", encoding = "UTF-8"))

Now we have a list in which each element is the Twitter data of one Twitter-list member. To extract their names and screen names:

users.names <- sapply(response.list$users, function(i) i$name)
users.screennames <- sapply(response.list$users, function(i) i$screen_name)

Which are:

> head(users.names)
[1] "Peter Crouch"         "barry bannan"         "Jose Leonardo Ulloa "
    "Paul McShane"         "nacho monreal"        "James Ward-Prowse"
> head(users.screennames)
[1] "petercrouch"   "bazzabannan25" "Ciclone1923"   "pmacca15"
    "_nachomonreal" "Prowsey16"

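Now, the best part of this code is that it opens up almost the entire Twitter API from R as an already authenticated request. You can inspect the response list and its sub-lists to see all the information available for each query.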
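
To make the response easier to inspect, the per-member fields can be gathered into a data frame. This is a minimal sketch: the response.list below is a small hand-made stand-in with the same shape as the parsed reply (a users element holding one list per member), not real API output.

```r
# Hand-made stand-in for the parsed API reply; the real response.list
# comes from fromJSON() as shown above.
response.list <- list(users = list(
  list(name = "Peter Crouch", screen_name = "petercrouch"),
  list(name = "barry bannan", screen_name = "bazzabannan25")
))

# vapply() fails loudly if a field is missing or is not a single string,
# whereas sapply() would silently return a list in that case.
members <- data.frame(
  name        = vapply(response.list$users, function(u) u$name, character(1)),
  screen_name = vapply(response.list$users, function(u) u$screen_name, character(1)),
  stringsAsFactors = FALSE
)
print(members)
```

Other fields of each user entry could be added as further columns in the same way.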

Molx's solution no longer seems to work. The problem seems to be in

api.url <- paste0("https://api.twitter.com/1.1/lists/members.json?slug=",
           twlist, "&owner_screen_name=", twowner, "&count=5000")

This URL doesn't seem to be valid for any twlist or twowner I have tried. Edit: I think the problem comes from authentication, since I get

{"errors":[{"code":215,"message":"Bad Authentication data."}]}

I thought I had authenticated with this:

## Twitter authentication, 
consumer_key = "xxxxx"
consumer_secret = "xxx"
access_token = "xxxxx"
access_secret = "xxx"
setup_twitter_oauth(consumer_key, consumer_secret, access_token,
access_secret)

Where is the problem?

Edit: when I enter get_oauth_sig(), I get the following result:

> twitteR:::get_oauth_sig()
<Token>
NULL
<oauth_app> twitter
  key:    XXXXXXX
  secret: <hidden>
<credentials> oauth_token, oauth_token_secret
---

Is this normal?

Edit: I solved the problem by replacing POST with GET:

library(rjson)
library(httr)     # provides GET() and config()
library(twitteR)

# Twitter authentication
consumer_key = "xxxxx"
consumer_secret = "xxx"
access_token = "xxxxx"
access_secret = "xxx"
setup_twitter_oauth(consumer_key, consumer_secret, access_token,
                    access_secret)

# https://twitter.com/ivalerio/lists/justice?lang=fr
twlist <- "d-put-s-2017-2022"
twowner <- "ivalerio"
# count = 5000 is the number of names per result page,
# which for this case simplifies things to one page.
api.url <- paste0("https://api.twitter.com/1.1/lists/members.json?slug=",
           twlist, "&owner_screen_name=", twowner, "&count=5000")
response <- GET(api.url, config(token=twitteR:::get_oauth_sig()))

# This returns a JSON response, which we can read using fromJSON:
response.list <- fromJSON(content(response, as = "text", encoding = "UTF-8"))

# Now we have a list in which each element is the Twitter data of one
# Twitter-list member. To extract their names and screen names:
users.names <- sapply(response.list$users, function(i) i$name)
users.screennames <- sapply(response.list$users, function(i) i$screen_name)

# Which are:
head(users.names)
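
Since a failed call still returns JSON (such as the code 215 error above), it may help to check for an errors element before extracting users. A minimal sketch follows; check_twitter_reply is a hypothetical helper name, and the input is the error text quoted earlier rather than a live response.

```r
library(rjson)

# Hypothetical helper: stop with the API's own message if the parsed
# reply contains an "errors" element, otherwise return the reply as-is.
check_twitter_reply <- function(reply) {
  if (!is.null(reply$errors)) {
    msgs <- vapply(reply$errors,
                   function(e) paste0(e$code, ": ", e$message),
                   character(1))
    stop("Twitter API error(s): ", paste(msgs, collapse = "; "))
  }
  reply
}

# The error body quoted earlier, parsed the same way as a real response.
bad <- fromJSON('{"errors":[{"code":215,"message":"Bad Authentication data."}]}')
result <- tryCatch(check_twitter_reply(bad),
                   error = function(e) conditionMessage(e))
print(result)
```

In the code above, this check would sit between the fromJSON() call and the two sapply() lines, so that an authentication problem is reported directly instead of producing empty name vectors.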