How do you mimic the infinite scroll on Instagram with Python?
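I wrote a small Python program that scrapes an Instagram profile to extract data and display various statistics. I can collect data from the first 9 photos on a profile (or however many appear on the initial load), but I can't load the rest because of the infinite scroll mechanism. I've read online about scraping pages with infinite scroll, and people say you need to replicate the request that loads the additional images. So far I haven't been able to replicate that request. Can anyone help?
Thanks!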
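There's no need to write all of that code yourself; several libraries already replicate these requests.
One such library is https://github.com/ping/instagram_private_api
A solution using this library: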
from instagram_private_api import Client, ClientCompatPatch

user_name = 'YOUR_USERNAME'
password = 'YOUR_PASSWORD'
username_to_scrape = 'USERNAME_TO_SCRAPE'
all_posts = []

api = Client(user_name, password)
posts = api.username_feed(username_to_scrape)  # Gets the first 12 posts
# The response is a dict; the posts themselves are under "items"
all_posts.extend(posts.get("items", []))
# Extract next_max_id from the response; it is needed to load the next 12 posts
next_max_id = posts.get("next_max_id")

# Request the next page by passing the cursor back as max_id
if next_max_id:
    next_page_posts = api.username_feed(username_to_scrape, max_id=next_max_id)
    all_posts.extend(next_page_posts.get("items", []))
This is just a simple example to get you started.
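To collect more than two pages, the same request can be repeated until the response no longer carries a next_max_id. The following is a minimal sketch rather than anything the library ships: collect_all_items and fetch_page are hypothetical names, and the "items" and "next_max_id" keys are assumed to match the response shape used in the example above.

```python
from typing import Any, Callable, Dict, List, Optional


def collect_all_items(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    max_pages: int = 50,
) -> List[Dict[str, Any]]:
    """Follow next_max_id cursors until they run out or max_pages is reached.

    fetch_page is a hypothetical callable: given a cursor (None for the first
    page) it returns one response dict with "items" and, when more pages
    exist, "next_max_id".
    """
    items: List[Dict[str, Any]] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        items.extend(page.get("items", []))
        # A missing or empty cursor means there is nothing more to load.
        cursor = page.get("next_max_id")
        if not cursor:
            break
    return items
```

With the client from the answer, fetch_page could be a small function that calls api.username_feed(username_to_scrape) when the cursor is None and api.username_feed(username_to_scrape, max_id=cursor) otherwise. The max_pages cap is there so a large profile does not trigger an unbounded number of requests; pausing briefly between pages is probably wise as well.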
Update: saving and loading cookies
# Saving cookies
cookies = api.cookie_jar.dump()
with open("cookies.pkl", "wb") as save_cookies:
    save_cookies.write(cookies)

# Loading cookies
with open("cookies.pkl", "rb") as read_cookies:
    cookies = read_cookies.read()

# Pass cookies to Client to resume session
api = Client(user_name, password, cookie=cookies)
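On a first run the cookie file will not exist yet, so the load step needs a fallback. Here is a small sketch of one way to handle that; read_saved_cookies and write_cookies are hypothetical helper names, not part of the library, and they only deal with the file itself.

```python
import os
from typing import Optional


def read_saved_cookies(path: str) -> Optional[bytes]:
    """Return the saved cookie bytes, or None if the file is missing or empty."""
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as cookie_file:
        data = cookie_file.read()
    return data or None


def write_cookies(path: str, cookies: bytes) -> None:
    """Write the serialized cookies to disk, replacing any earlier file."""
    with open(path, "wb") as cookie_file:
        cookie_file.write(cookies)
```

One possible use: call cached = read_saved_cookies("cookies.pkl"), create the client with Client(user_name, password, cookie=cached) when cached is not None and with Client(user_name, password) otherwise, then call write_cookies("cookies.pkl", api.cookie_jar.dump()) after a fresh login. Saved cookies can expire, so a failed resume should fall back to logging in again.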