Tweepy: get all friends of a sample of Twitter accounts: how to handle protected users

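I want to look up all the friends (a friend being a Twitter user that an account is following) of a sample of one Twitter account's friends, to see which other friends they have in common. The problem is that I don't know how to handle protected accounts, and I keep running into this error: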

tweepy.error.TweepError: Not authorized.

Here is my code:

...
screen_name = ----
file_name = "followers_data/follower_ids-" + screen_name + ".txt"
with open(file_name) as file:
    ids = file.readlines()

num_samples = 30
ids = [x.strip() for x in ids]
friends = [[] for i in range(num_samples)]

for i in range(0, num_samples):
    id = random.choice(ids)
    for friend in tweepy.Cursor(api.friends_ids, id).items():
        print(friend)
        friends[i].append(friend)

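I have a list of all the friends of the account screen_name, and I load the friend IDs from it. I then want to sample some of them and look up their friends.

I also tried something like this: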

def limit_handled(cursor, name):
    try:
        yield cursor.next()
    except tweepy.TweepError:
        print("Something went wrong... ", name)
        pass

for i in range(0, num_samples):
    id = random.choice(ids)
    items = tweepy.Cursor(api.friends_ids, id).items()
    for friend in limit_handled(items, id):
        print(friend)
        friends[i].append(friend)

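But it seems that only one friend is stored for each sampled friend before the loop moves on to the next sample. I'm new to Python and Tweepy, so please let me know if anything looks odd.

First, a couple of comments on naming. The names file and id are built-in names in Python (file in Python 2), so you should avoid using them as variable names; I have changed these.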

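Second, when you initialise the tweepy API, if you use wait_on_rate_limit=True it will handle rate limits for you by waiting, and if you use wait_on_rate_limit_notify=True it will notify you when it is delayed because of rate limiting.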

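When you set friends = [[] for i in range(num_samples)] you also lose some information, because you can no longer associate the friends found with the account they belong to. You could use a dictionary instead, which maps each sampled ID to the friends found for it, making the results easier to work with.

My revised code is below: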
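As a minimal sketch of that idea, using made-up data (the account IDs and pages of friend IDs below are hypothetical stand-ins for what a paginated API call might return), a dictionary keeps each sampled ID next to its own friends:

```python
# Hypothetical data: each sampled account ID maps to pages of friend IDs.
fake_pages = {
    "111": [[1, 2], [3]],
    "222": [[2, 3], [4]],
}

friends = {}
for account_id, pages in fake_pages.items():
    for page in pages:
        # setdefault creates the empty list the first time an ID is seen,
        # then extend appends the friend IDs from this page to it.
        friends.setdefault(account_id, []).extend(page)

print(friends)  # {'111': [1, 2, 3], '222': [2, 3, 4]}
```

With a list of lists you would only have the friends by position, with no record of which account they came from.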

import tweepy
import random

consumer_key = '...'
consumer_secret = '...'
access_token = '...'
access_token_secret = '...'

# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Creation of the actual interface, using authentication. Use rate limits.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

screen_name = '----'
file_name = "followers_data/follower_ids-" + screen_name + ".txt"
with open(file_name) as f:
    ids = [x.strip() for x in f.readlines()]

num_samples = 30
friends = dict()

# Initialise i
i = 0

# We want to check that i is less than our number of samples, but we also need to make
# sure there are IDs left to choose from.
while i < num_samples and ids:
    current_id = random.choice(ids)

    # remove the ID we're testing from the list, so we don't pick it again.
    ids.remove(current_id)

    try:
        # try to get friends, and add them to our dictionary value if we can
        # use .get() to cope with the first loop.
        for page in tweepy.Cursor(api.friends_ids, current_id).pages():
            friends[current_id] = friends.get(current_id, []) + page
        i += 1
    except tweepy.TweepError:
        # we get a tweep error when we can't view a user - skip them and move onto the next.
        # don't increment i as we want to replace this user with someone else.
        print('Could not view user {}, skipping...'.format(current_id))

The output is a dictionary, friends, whose keys are the user IDs and whose values are the friends of each user.
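From there, one possible way to see which friends the sampled accounts have in common is to count how often each friend ID appears across the dictionary values. This is only a sketch: the dictionary below is made-up data in the same shape as friends, and the threshold of two accounts is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical output in the same shape as the friends dictionary.
friends = {
    "111": [1, 2, 3],
    "222": [2, 3, 4],
    "333": [3, 4, 5],
}

# Count in how many sampled accounts' friend lists each friend ID appears.
# set() guards against an ID being counted twice for the same account.
counts = Counter(
    friend_id
    for friend_ids in friends.values()
    for friend_id in set(friend_ids)
)

# Keep only the friends shared by at least two sampled accounts.
common = {friend_id: n for friend_id, n in counts.items() if n >= 2}

# Most widely shared first, ties broken by ID.
print(sorted(common.items(), key=lambda kv: (-kv[1], kv[0])))
# [(3, 3), (2, 2), (4, 2)]
```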