How to get the location from a Twitter user profile using Beautiful Soup 4?
I am trying to get the location text from the profile of a given Twitter account:
import requests as req
from bs4 import BeautifulSoup

handles = ['IndieWire', 'AFP', 'UN']
location_list = []
for x in handles:
    url = "https://twitter.com/" + x
    try:
        html = req.get(url)
    except Exception as e:
        print(f"Failed to fetch page for url {url} due to: {e}")
        continue
    soup = BeautifulSoup(html.text, 'html.parser')
    try:
        # soup.find returns None when the span is missing, so .string raises AttributeError
        label = soup.find('span', {'class': "ProfileHeaderCard-locationText"})
        label_formatted = label.string.lstrip()
        label_formatted = label_formatted.rstrip()
        if label_formatted != "":
            location_list.append(label_formatted)
            print(x + ' : ' + label_formatted)
        else:
            location_list.append(label_formatted)
            print(x + ' : ' + 'Not found')
    except AttributeError:
        try:
            # Fallback: take the text of the first matching span
            label2 = soup.findAll('span', {"class": "ProfileHeaderCard-locationText"})[0].get_text()
            label2 = str(label2)
            label2_formatted = label2.lstrip()
            label2_formatted = label2_formatted.rstrip()
            location_list.append(label2_formatted)
            print(x + ' : ' + label2_formatted)
        except Exception:
            print(x + ' : ' + 'Not found')
    except Exception:
        print(x + ' : ' + 'Not found')
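This code used to work when I ran it a few months ago. After inspecting the Twitter page source I have changed it slightly, but I still cannot get the location. I hope you can help.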
Use the mobile version of Twitter to get the location.
For example:
import requests
from bs4 import BeautifulSoup

handles = ['IndieWire', 'AFP', 'UN']
ref = 'https://twitter.com/{h}'
headers = {'Referer': ''}
url = 'https://mobile.twitter.com/i/nojs_router?path=/{h}'

for h in handles:
    # Send the desktop profile URL as the Referer for this handle
    headers['Referer'] = ref.format(h=h)
    response = requests.post(url.format(h=h), headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    # The mobile page puts the location in an element with class "location"
    loc = soup.select_one('.location')
    if loc:
        print(h, loc.text)
    else:
        print(h, 'Not Found')
Prints:
IndieWire New York, NY
AFP France
UN New York, NY
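If you also want to collect the results in a list, as the question does with `location_list`, it may help to separate the parsing from the request. The following is only a minimal sketch: the helper name `extract_location` and the sample markup are made up for illustration, and it assumes the page still exposes the location in an element with class `location`, which may not hold if Twitter changes its markup again.

```python
from bs4 import BeautifulSoup


def extract_location(html):
    """Return the stripped text of the first `.location` element, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    loc = soup.select_one('.location')
    if loc is None:
        return None
    text = loc.get_text(strip=True)
    # Treat an element that contains only whitespace as "no location"
    return text or None


# Hypothetical markup, only for illustration; the real page may differ.
sample_pages = {
    'with_location': '<div class="location">\n  New York, NY\n</div>',
    'empty_location': '<div class="location">   </div>',
    'no_location': '<div class="bio">News agency</div>',
}

location_list = []
for name, html in sample_pages.items():
    location = extract_location(html)
    location_list.append(location)
    print(name, ':', location if location else 'Not found')
```

In the loop from the answer you would pass the response's `.content` to `extract_location` instead of the sample strings. Keeping the parsing in its own function also makes it easy to check against saved pages without hitting the network.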