Filtering Twitter data using Tweepy

Question

I followed Marco Bonzanini's tutorial on mining Twitter data: https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/

from tweepy.streaming import StreamListener


class MyListener(StreamListener):

    def on_data(self, data):
        try:
            with open('python.json', 'a') as f:
                f.write(data)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        print(status)
        return True

I then used the "follow" parameter of the filter method to retrieve the tweets produced by this specific ID:

twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(follow=["63728193"])  # random Twitter ID

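However, this does not seem to do the job: it returns not only the tweets and retweets created by the ID, but also every tweet that mentions the ID (i.e. retweets). That is not what I want.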

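I am sure there must be a way to do this, because the JSON that Twitter provides contains a "screen_name" field. The screen_name field gives the name of the tweet's creator. I just need to find out how to filter the data on this screen_name field.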

Answer 1

This behavior is by design. Quoting the Twitter streaming API docs:

For each user specified, the stream will contain:

Tweets created by the user.

Tweets which are retweeted by the user.

Replies to any Tweet created by the user.

Retweets of any Tweet created by the user.

Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).
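To make these categories concrete, here is a rough sketch that guesses which category a parsed tweet falls into. The function name `classify` and its return labels are made up for illustration. The field names (`user`, `retweeted_status`, `in_reply_to_user_id_str`, `entities.user_mentions`) are assumed to follow the v1.1 tweet JSON layout, so check them against the payloads you actually receive.

```python
def classify(tweet, user_id):
    """Guess why a parsed tweet (a dict) showed up in a follow stream.

    user_id is the followed account's ID as a string. The labels returned
    here are illustrative, not part of any Twitter or Tweepy API.
    """
    author = tweet.get("user", {}).get("id_str")
    retweeted = tweet.get("retweeted_status")

    if author == user_id:
        # The followed user posted this status (either original or a retweet).
        return "retweet by user" if retweeted else "tweet by user"
    if retweeted and retweeted.get("user", {}).get("id_str") == user_id:
        # Somebody else retweeted one of the followed user's tweets.
        return "retweet of user"
    if tweet.get("in_reply_to_user_id_str") == user_id:
        # Somebody else replied using the reply button.
        return "reply to user"
    mentions = tweet.get("entities", {}).get("user_mentions", [])
    if any(m.get("id_str") == user_id for m in mentions):
        # Somebody else typed the @name by hand.
        return "manual reply or mention"
    return "other"
```

Only the first two labels correspond to statuses posted by the followed account itself; everything else is activity from other users.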

The best way to handle this for your purposes is to check who created each tweet as it arrives, which I believe can be done as follows:

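import json

from tweepy.streaming import StreamListener


class MyListener(StreamListener):
    def on_data(self, data):
        try:
            # data is the raw JSON string, so parse it before reading fields
            tweet = json.loads(data)
            # keep only statuses posted by the followed account;
            # id_str is the user ID as a string
            if tweet.get('user', {}).get('id_str') == "63728193":
                with open('python.json', 'a') as f:
                    f.write(data)
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        print(status)
        return True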
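The author check can also be pulled out into a small function that works on raw JSON strings, which makes it easy to test without a live stream or to reuse on a file that has already been collected. This is only a sketch: `is_authored_by` is a hypothetical helper name, and it assumes each message carries the author's ID under `user.id_str`, as in the v1.1 tweet JSON.

```python
import json


def is_authored_by(raw_tweet, user_id):
    """Return True if the raw JSON message is a status posted by user_id.

    user_id is compared as a string against the 'id_str' field of the
    message's 'user' object. Messages that are not valid JSON, or that
    have no 'user' object (for example delete notices), return False.
    """
    try:
        tweet = json.loads(raw_tweet)
    except ValueError:
        return False
    if not isinstance(tweet, dict):
        return False
    user = tweet.get("user")
    if not isinstance(user, dict):
        return False
    return user.get("id_str") == user_id
```

Inside `on_data`, the condition would then read `if is_authored_by(data, "63728193"):`. Retweets posted by the followed account still pass this check, because the account is the author of the retweet status itself. To filter on the name instead of the ID, the same pattern applies to `user.get("screen_name")`, although screen names can change while IDs do not.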
