How to get tweets with a hashtag at a given position

I am trying to fetch only tweets tagged #not, but only when the hashtag is at the end of the tweet rather than inside the text. I am using tweepy.Cursor.

This code already works. It gives me tweets containing #not, but it does not take the position of #not into account.

import tweepy
consumer_key = 'consumer key'
consumer_secret = 'consumer secret'
access_token = 'access token'
access_token_secret = 'access token secret'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

for tweet in tweepy.Cursor(api.search, q="#not", count=5,
                           lang="en",
                           since="2017-04-03").items():
    print(tweet.created_at, tweet.text)

Edit: You can use a regular expression to check whether your hashtag is part of a trailing group of hashtags:

import tweepy
import re

consumer_key = 'consumer key'
consumer_secret = 'consumer secret'
access_token = 'access token'
access_token_secret = 'access token secret'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Regular expression to check if the tweet ends with our hashtag, optionally followed by more hashtags
rgx = re.compile(r"#not(\s+#\w+)*$", re.IGNORECASE)
for tweet in tweepy.Cursor(api.search, q="#not", count=5,
                           lang="en",
                           since="2017-04-03").items():
    # Keep only tweets with the hashtag in the trailing group of hashtags
    if rgx.search(tweet.text):
        print(tweet.created_at, tweet.text)
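
To see what the pattern accepts without calling the Twitter API, here is a minimal sketch that runs the same regular expression over a few made-up strings. The sample texts and the helper name ends_with_tag_group are assumptions for illustration only; they are not part of tweepy.

```python
import re

# Same pattern as above: "#not", optionally followed by
# whitespace-separated hashtags, then the end of the text.
rgx = re.compile(r"#not(\s+#\w+)*$", re.IGNORECASE)


def ends_with_tag_group(text):
    """Hypothetical helper: True if #not sits in the trailing hashtag group."""
    return rgx.search(text) is not None


# Made-up sample texts, not real tweets.
samples = [
    "Great weather today #not",
    "Love Mondays #NOT #sarcasm #irony",
    "This is #not what I expected",
    "Lovely service #not http://t.co/abc",
]

for text in samples:
    print(ends_with_tag_group(text), text)
```

The first two samples match and the last two do not. The last one shows a limitation: anything after the hashtag that is not itself a hashtag, such as a link, makes the match fail, so you may want to strip such content first if it matters for your data.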

You can filter the tweets to keep only those that meet your requirement:

import tweepy
consumer_key = 'consumer key'
consumer_secret = 'consumer secret'
access_token = 'access token'
access_token_secret = 'access token secret'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

for tweet in tweepy.Cursor(api.search, q="#not", count=5,
                           lang="en",
                           since="2017-04-03").items():
    # Keep only tweets with the hashtag at the end
    if tweet.text.lower().endswith('#not'):
        print(tweet.created_at, tweet.text)
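
The endswith check is strict: trailing whitespace or any text after the hashtag makes it fail. As a rough sketch, the variant below strips trailing whitespace before comparing. The helper name ends_with_hashtag and the sample strings are made up for illustration.

```python
def ends_with_hashtag(text, tag="#not"):
    """Hypothetical helper: True if text ends with tag, ignoring case
    and trailing whitespace."""
    return text.rstrip().lower().endswith(tag.lower())


# Made-up sample texts, not real tweets.
samples = [
    "So much fun #not",
    "So much fun #NOT  \n",
    "So much fun #not #sarcasm",
    "#not fun at all",
]

for text in samples:
    print(ends_with_hashtag(text), repr(text))
```

Only the first two samples pass. The third is rejected because another hashtag follows #not, which is the case a regular expression over the trailing hashtags is meant to handle.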