Get the name of an Instagram profile and the date of a post with Python
I am learning Python 3 and trying to solve a simple task: from an Instagram link I want to get the account name and the date of the post.
import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.instagram.com/p/BuPSnoTlvTR')
soup = BeautifulSoup(html.text, 'lxml')

# Find the og:description meta tag, step back to the meta tag just before it
# (og:title, which on a profile page ends with "• Instagram photos and videos")
# and keep the part before the bullet character.
item = soup.select_one("meta[property='og:description']")
name = item.find_previous_sibling().get("content").split("•")[0]
print(name)
This code sometimes works for profile links like https://www.instagram.com/kingtop, but I need it to also handle post links such as https://www.instagram.com/p/BuxB00KFI-x/.
This is as far as I got, and it doesn't work. I also can't get the date.
Do you have any ideas? Thanks for your help.
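One idea I want to try for post links (not verified; it assumes the post page still embeds a JSON-LD script with an author field, and the field names below are only my guess at what the page exposes):

import json
import requests
from bs4 import BeautifulSoup

def get_author(post_url):
    # Post pages seem to ship a <script type="application/ld+json"> block
    # describing the media; if so, its "author" entry carries the handle.
    req = requests.get(post_url)
    req.raise_for_status()
    soup = BeautifulSoup(req.text, 'lxml')
    script = soup.find('script', type='application/ld+json')
    if script is None:
        return None  # markup changed or the page asks for a login
    data = json.loads(script.string)
    author = data.get('author', {})
    return author.get('alternateName')  # e.g. "@kingtop", if the field exists

print(get_author('https://www.instagram.com/p/BuxB00KFI-x/'))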
I found a way to get the account name. Now I am trying to find a way to get the upload date.
import time

import requests
from bs4 import BeautifulSoup
from requests.exceptions import HTTPError

start = time.time()

# Read the list of Instagram URLs, one per line.
with open('users.txt', 'r', encoding='ISO-8859-1') as file:
    urls = file.readlines()

with open('output2.txt', 'a') as output:
    for url in urls:
        url = url.strip('\n')
        try:
            req = requests.get(url)
            req.raise_for_status()
        except HTTPError:
            output.write('не найдена\n')   # "not found": the request returned an error status
        except Exception:
            output.write('не найдены\n')   # "not found": any other request failure
        else:
            soup = BeautifulSoup(req.text, 'lxml')
            # The canonical link of a profile page looks like
            # https://www.instagram.com/<username>/, so the first path
            # segment is the account name.
            the_url = soup.select("[rel='canonical']")[0]['href']
            the_url2 = the_url.replace('https://www.instagram.com/', '')
            head, sep, tail = the_url2.partition('/')
            output.write(head + '\n')
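For the date, I am thinking of reading the same JSON-LD block as above, which seems to also carry an uploadDate field (again not verified; the field name is an assumption and Instagram changes its markup often):

import json
import requests
from bs4 import BeautifulSoup

def get_upload_date(post_url):
    # Assumes the post page still embeds a JSON-LD block with an
    # "uploadDate" field; if this returns None, check the raw HTML to see
    # what the page actually contains now.
    req = requests.get(post_url)
    req.raise_for_status()
    soup = BeautifulSoup(req.text, 'lxml')
    script = soup.find('script', type='application/ld+json')
    if script is None:
        return None
    data = json.loads(script.string)
    return data.get('uploadDate')  # e.g. an ISO timestamp, if present

print(get_upload_date('https://www.instagram.com/p/BuPSnoTlvTR/'))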