Twitter scraping of older tweets

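I'm working on a project where I need to fetch tweets from Twitter. I'm using the Twitter API, but it only returns tweets from the last 7-9 days, and I also want tweets from a few months back. So I decided to scrape Twitter with BeautifulSoup and later Selenium, but when parsing, it doesn't return the elements; it returns the view-source of the whole web page instead. Please help!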

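import requests
from bs4 import BeautifulSoup  # the class is spelled BeautifulSoup, with a capital S

# requests returns the raw response body; no JavaScript is executed
f = requests.get("https://twitter.com/search?q=%23......%20until%3A2020-02-07%20since%3A2020-01-01&src=typed_query").text
soup = BeautifulSoup(f, 'html.parser')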

print(soup)

name = soup.find_all('span', class_="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0")

print(name)

The output of print(soup) is below. I don't know how to describe it, but it is the view-source rather than the actual HTML:

{"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},t.t=function(e,n){if(1&n&&(e=t(e)),8&n)return e;if(4&n&&"object"==typeof e&&e&&e.__esModule)return e;var d=Object.create(null);if(t.r(d),Object.defineProperty(d,"default",{enumerable:!0,value:e}),2&n&&"string"!=typeof e)for(var o in e)t.d(d,o,function(n){return e[n]}.bind(null,o));return d},t.n=function(e){var n=e&&e.__esModule?function(){return e.default}:function(){return e};return t.d(n,"a",n),n},t.o=function(e,n){return Object.prototype.hasOwnProperty.call(e,n)},t.p="https://abs.twimg.com/responsive-web/web/",t.oe=function(e){throw e};var i=window.webpackJsonp=window.webpackJsonp||[],c=i.push.bind(i);i.push=n,i=i.slice();for(var l=0;l<i.length;l++)n(i[l]);var u=c;d()}([]),window.__SCRIPTS_LOADED__.runtime=!0;
//# sourceMappingURL=runtime.cc3200a4.js.map

Selenium gives the same output:

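import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Raw string so the backslashes in the Windows path are kept literally
PATH = r"C:\Program Files\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://twitter.com")

email = driver.find_element(By.NAME, 'session[username_or_email]')
password = driver.find_element(By.NAME, 'session[password]')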

email.send_keys('......')
password.send_keys("......")
password.send_keys(Keys.RETURN)
time.sleep(1)

driver.get('https://twitter.com/search?q=%23....%20until%3A2020-02-07%20since%3A2020-01-01&src=typed_query')
time.sleep(1)

print(driver.page_source)
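
Both snippets above hard-code a search URL that packs a hashtag and the until:/since: operators into the q parameter. As a minimal sketch of how such a URL could be assembled rather than typed by hand, the helper below uses only the standard library. The function name build_search_url and the hashtag "python" are hypothetical placeholders (the original hashtag is elided), and this only builds the string; it does not fetch or render anything.

```python
from urllib.parse import quote, urlencode


def build_search_url(hashtag, since, until):
    """Build a twitter.com search URL for a hashtag within a date window."""
    # Search operators are space-separated inside the single `q` parameter.
    query = f"#{hashtag} until:{until} since:{since}"
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    params = urlencode({"q": query, "src": "typed_query"}, quote_via=quote)
    return f"https://twitter.com/search?{params}"


# "python" is a placeholder hashtag chosen for illustration only.
url = build_search_url("python", since="2020-01-01", until="2020-02-07")
print(url)
```

Keeping the dates as arguments makes it easier to step through several months in smaller windows.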

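GetOldTweets3 lets you extract historical tweets and filter them by several criteria (time range, location, handle, or search query) without needing any API keys.

For example: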

  import GetOldTweets3 as got

  # Tweet params
  search_term = 'china trade war'
  start_date = '2017-01-01'
  end_date = '2020-01-01'
  
  # Define historical tweets criteria
  tweet_criteria = got.manager.TweetCriteria().setUsername('reuters') \
                                            .setQuerySearch(search_term) \
                                            .setSince(start_date) \
                                            .setUntil(end_date)

  # Return tweets based on tweet criteria
  tweets = got.manager.TweetManager.getTweets(tweet_criteria)

  # getTweets returns a list, so read the text from each tweet object
  tweet_texts = [tweet.text for tweet in tweets]
 

Note that each tweet object exposes further attributes, such as hashtags and retweets, for example:

other_tweet_attributes = [[tweet.username, tweet.hashtags] for tweet in tweets]
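
If you want those attributes in a form that is easy to save or load into a table, one option is to turn each tweet into a dict. The sketch below is only an illustration: tweets_to_rows is a hypothetical helper, not part of GetOldTweets3, and the sample objects are made-up stand-ins so the code runs offline. It assumes the tweet objects carry the attributes username, date, text, hashtags and retweets.

```python
from types import SimpleNamespace


def tweets_to_rows(tweets, fields=("username", "date", "text", "hashtags", "retweets")):
    """Turn tweet-like objects into a list of dicts, one per tweet."""
    # getattr with a default keeps every row the same shape if a field is missing.
    return [{name: getattr(tweet, name, None) for name in fields} for tweet in tweets]


# Made-up stand-in objects; real input would be the list returned by
# got.manager.TweetManager.getTweets(tweet_criteria).
sample_tweets = [
    SimpleNamespace(username="reuters", date="2019-05-10", text="example text",
                    hashtags="#trade", retweets=12),
    SimpleNamespace(username="reuters", date="2019-06-01", text="another example",
                    hashtags="", retweets=3),
]

rows = tweets_to_rows(sample_tweets)
print(rows[0]["username"], rows[0]["hashtags"])
```

The resulting list of dicts can be passed to csv.DictWriter or a DataFrame constructor without further reshaping.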