Tweepy: get all friends of a sample of Twitter accounts: how to handle protected users
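I want to look up all the friends of a sample of one Twitter account's friends (a friend being a user that the account follows), to see which other friends they have in common. The problem is that I don't know how to handle protected accounts, and I keep running into this error:
tweepy.error.TweepError: Not authorized.
Here is my code: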
...
screen_name = ----
file_name = "followers_data/follower_ids-" + screen_name + ".txt"
with open(file_name) as file:
    ids = file.readlines()
num_samples = 30
ids = [x.strip() for x in ids]
friends = [[] for i in range(num_samples)]
for i in range(0, num_samples):
    id = random.choice(ids)
    for friend in tweepy.Cursor(api.friends_ids, id).items():
        print(friend)
        friends[i].append(friend)
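I have a list of all the friends of the account screen_name, from which I load the friend IDs. I then want to sample some of them and look up their friends.
I also tried something like this: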
def limit_handled(cursor, name):
    try:
        yield cursor.next()
    except tweepy.TweepError:
        print("Something went wrong... ", name)
        pass
for i in range(0, num_samples):
    id = random.choice(ids)
    items = tweepy.Cursor(api.friends_ids, id).items()
    for friend in limit_handled(items, id):
        print(friend)
        friends[i].append(friend)
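But this seems to store only one friend for each sampled friend before moving on to the next sample. I'm new to Python and Tweepy, so please let me know if anything looks odd.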
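First, a couple of comments on naming. The names file and id are built-ins (file in Python 2, id in both Python 2 and 3), so using them as variable names shadows the built-in. You should avoid that; I have renamed them.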
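To illustrate the naming point, here is a small sketch of what shadowing looks like in practice. The function name shadow_demo and the value assigned are made up for the example.

```python
import builtins


def shadow_demo():
    # Inside this function, `id` is rebound to a string, hiding the built-in.
    id = "12345"
    try:
        id(object())  # the name now refers to the string, not the function
    except TypeError as exc:
        return str(exc)
    return "no error"


message = shadow_demo()
print(message)

# The built-in itself is untouched and still reachable via the builtins module.
print(callable(builtins.id))
```

The code still runs while the name is shadowed, which is why this kind of mistake tends to surface later, at the point where the built-in is actually needed.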
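Secondly, when you initialise the tweepy API, it will handle rate limits for you if you pass wait_on_rate_limit=True, and it will notify you when it is delayed by a rate limit if you pass wait_on_rate_limit_notify=True.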
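You also lose some information when you set friends = [[] for i in range(num_samples)], because you cannot tie the friends you find back to the account they belong to. You could use a dictionary instead, mapping each ID you used to the friends found for it, which is easier to work with.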
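As for the second attempt storing only one friend per sample, a likely cause is that limit_handled contains a single yield with no loop around it, so the generator produces one item and then finishes. The sketch below reproduces that behaviour with a plain iterator standing in for the Tweepy cursor (no API calls are made); yield_once, yield_all and the ID values are hypothetical.

```python
def yield_once(iterator):
    # Same shape as limit_handled: one yield, no surrounding loop.
    try:
        yield next(iterator)
    except StopIteration:
        return


def yield_all(iterator):
    # Looping keeps yielding until the underlying iterator is exhausted.
    while True:
        try:
            yield next(iterator)
        except StopIteration:
            return


fake_friend_ids = [101, 102, 103]  # stand-in data, not real account IDs

once = list(yield_once(iter(fake_friend_ids)))
everything = list(yield_all(iter(fake_friend_ids)))

print(once)
print(everything)
```

With the looping version, every item from the underlying iterator reaches the caller instead of just the first one.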
My revised code is as follows:
import random

import tweepy

consumer_key = '...'
consumer_secret = '...'
access_token = '...'
access_token_secret = '...'

# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Creation of the actual interface, using authentication. Use rate limits.
# Note: wait_on_rate_limit_notify, friends_ids and TweepError are Tweepy 3.x names.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

screen_name = '----'
file_name = "followers_data/follower_ids-" + screen_name + ".txt"
with open(file_name) as f:
    ids = [x.strip() for x in f.readlines()]

num_samples = 30
friends = dict()

# Initialise i, the number of accounts sampled successfully so far
i = 0

# We want to check that i is less than our number of samples, but we also need to make
# sure there are IDs left to choose from.
while i < num_samples and ids:
    current_id = random.choice(ids)
    # remove the ID we're testing from the list, so we don't pick it again.
    ids.remove(current_id)
    try:
        # try to get friends, and add them to our dictionary value if we can
        # use .get() to cope with the first loop.
        for page in tweepy.Cursor(api.friends_ids, current_id).pages():
            friends[current_id] = friends.get(current_id, []) + page
        # only count this user once all of their pages were fetched
        i += 1
    except tweepy.TweepError:
        # we get a tweep error when we can't view a user - skip them and move onto the next.
        # don't increment i as we want to replace this user with someone else.
        print('Could not view user {}, skipping...'.format(current_id))
The output is a dictionary, friends, keyed by user ID, with each user's list of friends as the value.
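Since the original goal was to see which friends the sampled accounts have in common, one way to use such a dictionary is to count how often each friend ID appears across accounts. This is only a sketch on made-up data, assuming friends maps each sampled ID to a list of friend IDs; common_friends and min_accounts are hypothetical names.

```python
from collections import Counter


def common_friends(friends, min_accounts=2):
    """Return friend IDs followed by at least `min_accounts` sampled accounts."""
    counts = Counter()
    for friend_ids in friends.values():
        # Use a set so a repeated ID within one account is counted once.
        counts.update(set(friend_ids))
    return {fid: n for fid, n in counts.items() if n >= min_accounts}


# Made-up example data in the same shape as the `friends` dictionary.
sample_friends = {
    "1": [10, 20, 30],
    "2": [20, 30, 40],
    "3": [30, 50],
}

shared = common_friends(sample_friends)
print(shared)
```

Raising min_accounts narrows the result to friends shared by more of the sampled accounts.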