How do I pull tweets from a user for specific dates in Python?

I am trying to download tweets from November 2019 from the Reuters (@reuters) Twitter account.

I am using tweepy in Python, and this is my code:

# Install first from a shell: pip install tweepy
import tweepy as tw

#Keys
consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

# Login
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

#Get user's tweets
tweets = tw.Cursor(api.user_timeline,
                   id="reuters",
                   lang="en",
                   since="2019-11-01",
                   until="2019-11-30").items()

all_tweets = [tweet.text for tweet in tweets]

all_tweets[:100]

The "until" parameter does not seem to work, because the tweets my code pulls include the most recent ones.

The tweepy library only supports Twitter's older standard search API at this time, and the standard search only covers 7 days of history. To search as far back as November 2019, you would need to use either the premium full-archive search API or the enterprise full-archive search. Both of these APIs are commercial, but the premium API has a free tier called "sandbox" that would also work. In Python, you could use the search-tweets library.
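To make the 7-day limit concrete, here is a minimal sketch that checks whether a start date is still reachable through standard search. The helper name `reachable_by_standard_search` and its `window_days` parameter are made up for illustration; the 7-day figure is simply the one quoted above, not something the code looks up.

```python
from datetime import date, timedelta


def reachable_by_standard_search(since, today=None, window_days=7):
    """Return True if `since` lies within the last `window_days` days.

    Hypothetical helper: window_days=7 mirrors the standard search
    history limit mentioned in this answer.
    """
    if today is None:
        today = date.today()
    return since <= today and today - since <= timedelta(days=window_days)


# November 2019 checked from March 2021 is far outside the window
print(reachable_by_standard_search(date(2019, 11, 1), today=date(2021, 3, 10)))
```

If this returns False for your start date, standard search (and therefore a tweepy search call) is unlikely to return tweets from that date.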

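The timeline approach mentioned in the other answer is also an option, but it depends on whether the tweets from November fall within the range of the timeline API, which supports up to the 3,200 most recent tweets.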
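If the November tweets are still within that range, the date filtering has to happen on your side, since the timeline call in the question ignores "since" and "until". A rough sketch, assuming the timeline yields tweets newest first and each tweet has a `created_at` datetime; `tweets_between` is a made-up helper name, and the stand-in objects below replace real API results so the snippet runs without credentials:

```python
from datetime import datetime
from types import SimpleNamespace


def tweets_between(tweets, start, end):
    """Collect tweets with start <= created_at < end.

    Assumes `tweets` is ordered newest first, so iteration can stop
    at the first tweet older than `start`.
    """
    selected = []
    for tweet in tweets:
        if tweet.created_at < start:
            break  # everything after this point is older still
        if tweet.created_at < end:
            selected.append(tweet)
    return selected


# Stand-in objects instead of real timeline results (newest first)
sample = [
    SimpleNamespace(created_at=datetime(2019, 12, 2), text="december"),
    SimpleNamespace(created_at=datetime(2019, 11, 15), text="november"),
    SimpleNamespace(created_at=datetime(2019, 10, 30), text="october"),
]
november = tweets_between(sample, datetime(2019, 11, 1), datetime(2019, 12, 1))
print([t.text for t in november])

# With tweepy you would pass the cursor from the question instead, e.g.
# tweets_between(tw.Cursor(api.user_timeline, id="reuters").items(), start, end)
```

Depending on the Tweepy version, `created_at` may be timezone-aware, in which case `start` and `end` need a tzinfo as well or the comparison will raise a TypeError.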

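Below are two simple ways to pull tweets for a specific period and a specific user.

Solution 1: Using TwitterAPI. As andy_piper stated, you need premium or sandbox access, and a premium account is too expensive. As long as you are not pulling a large corpus from Twitter, a free sandbox account is more than enough. You can enable a sandbox account at https://developer.twitter.com/en/pricing/aaa-all, which gives you access to a limited number of requests.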

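Create a dev environment label linked to your Twitter account: go to the dev environments section of your Twitter developer account and create a label for the sandbox. Once the label is configured, the code below will pull the corresponding tweets (change maxResults as needed).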

from TwitterAPI import TwitterAPI

# consumer_key, consumer_secret, access_token and access_token_secret
# are the credentials of your Twitter developer app
product = 'fullarchive'
label = 'Dev'  # the dev environment label you configured
api = TwitterAPI(consumer_key, consumer_secret, access_token, access_token_secret)
tweets = api.request('tweets/search/%s/:%s' % (product, label),
                     {'query': 'from:reuters',
                      'maxResults': '10',
                      'fromDate': '201911010000',
                      'toDate': '201911300000'})

for tweet in tweets:
    print(tweet['id'])
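The fromDate and toDate values above are timestamps in YYYYMMDDhhmm form. A small sketch for building them for a whole month, so they do not have to be typed by hand; `premium_timestamp` and `month_query` are made-up helper names. Note that a toDate of 201911300000 stops at the start of November 30, so this sketch uses midnight on the first day of the following month as the upper bound, on the assumption that you want the full month (how the API treats the boundary minute is worth confirming in Twitter's documentation).

```python
from datetime import datetime


def premium_timestamp(dt):
    """Format a datetime as YYYYMMDDhhmm, the shape of fromDate/toDate above."""
    return dt.strftime("%Y%m%d%H%M")


def month_query(username, year, month):
    """Build request parameters covering one calendar month for one user."""
    start = datetime(year, month, 1)
    # First day of the following month, rolling over the year in December
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return {
        "query": "from:%s" % username,
        "fromDate": premium_timestamp(start),
        "toDate": premium_timestamp(end),
    }


params = month_query("reuters", 2019, 11)
print(params)
```

The resulting dictionary can be merged with 'maxResults' and passed as the second argument of api.request in the code above.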

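Solution 2: Using the GetOldTweets3 API. I do not prefer this approach because I am unsure about its license and have some doubts about how it fits with Twitter's privacy policy, but it works even without a Twitter developer account. Anyway, here is the code.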

import GetOldTweets3 as got

username = 'reuters'
count = 100
# Build the search criteria: user, maximum number of tweets, and date range
tweetCriteria = (got.manager.TweetCriteria()
                 .setUsername(username)
                 .setMaxTweets(count)
                 .setSince("2019-11-01")
                 .setUntil("2019-11-30"))
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
for tweet in tweets:
    print(tweet.id, tweet.author_id, tweet.date)

References: https://pypi.org/project/GetOldTweets3/ https://github.com/geduldig/TwitterAPI/blob/master/examples/premium_search.py

I have the answer. You cannot do this without paying.

import csv

import tweepy

# Enter your credentials here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

query = ("Womens Day OR internationalwomensday OR internationalwomensday2021 "
         "OR internationalwomensday21 OR women's day OR international women's day "
         "OR IWD OR HappyInternationalWomensDay OR Happy Women's Day "
         "OR HappyWomensDay OR happywomensday OR happyinternationalwomensday")

# Open (or create) the CSV file in append mode; the with block closes it at the end
with open('tweets.csv', 'a', newline='', encoding='utf-8') as csvFile:
    csvWriter = csv.writer(csvFile)
    # api.search is the Tweepy 3.x name (renamed to search_tweets in Tweepy 4)
    for tweet in tweepy.Cursor(api.search,
                               q=query,
                               count=100000,
                               include_rts=False,
                               country_code=True,
                               coordinates=True,
                               lang="en",
                               since="2021-03-06",
                               until="2021-03-10"
                               ).items():
        print(tweet.created_at, tweet.text)
        # Write the text as str; encoding it to bytes would store b'...' in Python 3
        csvWriter.writerow([tweet.created_at, tweet.text])