Web scraping with BeautifulSoup <span>
I'm trying to print the information inside a tag, but the output is blank.
The URL is: https://mubi.com/it/films/25-watts/cast?type=cast
I'm trying to print the names of all the actors.
This is my code:
import random
import requests
from bs4 import BeautifulSoup

url = 'https://mubi.com/it/films/25-watts/cast?type=cast'  # vincitori

def main():
    response = requests.get(url)
    html = response.text
    soup1 = BeautifulSoup(html, 'html.parser')
    cast = soup1.find_all('span', {'class': 'css-1marmfu e1a7pc1u9'})
    for tag in cast:
        print(tag)

if __name__ == '__main__':
    main()
Thanks for the support ;)
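The data you see on the page is loaded via JavaScript from an external URL, so BeautifulSoup doesn't see it: requests only fetches the initial HTML and never runs the page's scripts. You can simulate the Ajax request with the requests module. The film ID that request needs is read from the JSON embedded in the page's __NEXT_DATA__ element: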
import json
import requests
from bs4 import BeautifulSoup

url = "https://mubi.com/it/films/25-watts/cast?type=cast"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").contents[0])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

film = data["props"]["initialProps"]["pageProps"]["film"]

cast_url = "https://api.mubi.com/v3/films/{}/cast_members?sort=relevance&type=cast&page=1"
cast = requests.get(
    cast_url.format(film["id"]),
    headers={"CLIENT": "web", "Client-Country": "US"},
).json()

# print(json.dumps(cast, indent=4))

for m in cast["cast_members"]:
    print("{:<30} {:<30}".format(m["name"], m["primary_type"] or "-"))
Prints:
Daniel Hendler                 Actor
Jorge Temponi                  Actor
Alfonso Tort                   Actor
Valentín Rivero                -
Federico Veiroj                Director
Valeria Mendieta               -
Roberto Suárez                 Actor
Gonzalo Eyherabide             -
Robert Moré                    Actor
Ignacio Mendy                  -
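
The key path into the embedded JSON and the field names in the API response are whatever the site happens to serve, so they may change without notice. Below is a minimal sketch, assuming the same JSON shapes as in the answer, of how the lookups could fail gracefully instead of raising KeyError. The helper names `dig` and `format_cast` are hypothetical, and the sample dictionaries (including the film id) are made up to stand in for the real responses.

```python
def dig(obj, *keys, default=None):
    """Follow a chain of dict keys, returning `default` if any is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def format_cast(cast):
    """Return one aligned line per cast member, with '-' for a missing role."""
    lines = []
    for m in dig(cast, "cast_members", default=[]):
        name = m.get("name") or "-"
        role = m.get("primary_type") or "-"
        lines.append("{:<30} {:<30}".format(name, role).rstrip())
    return lines


# Sample data shaped like the responses in the answer (the id is made up).
sample_page = {"props": {"initialProps": {"pageProps": {"film": {"id": 1}}}}}
sample_cast = {
    "cast_members": [
        {"name": "Daniel Hendler", "primary_type": "Actor"},
        {"name": "Valentín Rivero", "primary_type": None},
    ]
}

# A valid path returns the value; a wrong path returns None instead of raising.
film_id = dig(sample_page, "props", "initialProps", "pageProps", "film", "id")
missing = dig(sample_page, "props", "pageProps", "film", "id")

lines = format_cast(sample_cast)
for line in lines:
    print(line)
```

In the real script, `dig(data, "props", "initialProps", "pageProps", "film", "id")` would replace the chained indexing, and a `None` result would signal that the page layout has changed.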