Trying to scrape SoundCloud with BeautifulSoup

Question

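I am trying to scrape data from SoundCloud and other music platforms, but I am stuck on SoundCloud: I get None, an AttributeError, or an empty list ([]). When I scrape a regular (non-music) website with the same approach, I do get data. What am I doing wrong?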

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://soundcloud.com/jujubucks').text
soup = BeautifulSoup(html_text,'lxml')
song = soup.find('li', class_='soundList__item')
print(song)

This code returns:

None, or raises an AttributeError.

Answer 1

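Look at the raw output (the soup variable in your code). The HTML that requests receives does not contain an li element with the class soundList__item, which is why find returns None.

This code extracts the raw song titles: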

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://soundcloud.com/jujubucks').text
soup = BeautifulSoup(html_text, 'lxml')
song = soup.find_all('h2', itemprop='name')
print(song)

An example of one item in the list that the code above prints:

<h2 itemprop="name"><a href="/jujubucks/squad-too-deep-ft-cool-prince" itemprop="url">Squad Too Deep Ft. Cool Prince (Outro)</a>
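
If only the title text and link are needed rather than the whole tag, the matched elements can be reduced further. The following is a minimal sketch, not a definitive implementation: it runs on a small inline sample modeled on the item above (with a closing </h2> added so it is well-formed) instead of a live request, and the helper name extract_tracks is hypothetical.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the item shown above; the closing </h2> is added
# here so the snippet is well-formed. A live page may differ.
SAMPLE_HTML = """
<h2 itemprop="name"><a href="/jujubucks/squad-too-deep-ft-cool-prince" itemprop="url">Squad Too Deep Ft. Cool Prince (Outro)</a></h2>
"""


def extract_tracks(html):
    """Return (title, relative_url) pairs for each <h2 itemprop="name"> link."""
    soup = BeautifulSoup(html, "html.parser")
    tracks = []
    for heading in soup.find_all("h2", itemprop="name"):
        link = heading.find("a", itemprop="url")
        if link is None:
            # Skip headings that do not wrap a track link.
            continue
        tracks.append((link.get_text(strip=True), link.get("href")))
    return tracks


tracks = extract_tracks(SAMPLE_HTML)
print(tracks)
```

The href values are relative, so they would need to be joined with https://soundcloud.com to form full URLs.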

However, you cannot scrape all of the data from this site without Selenium or Scrapy, because it loads its content dynamically.
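
Before reaching for a heavier tool, it can help to check whether the element you want is in the served HTML at all. The sketch below is an illustration under stated assumptions: STATIC_HTML is a hypothetical stand-in for the server response (it is not copied from SoundCloud), and the helper names in_static_html and text_or_none are made up for this example. It also shows how guarding against None avoids the AttributeError from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a server response: the track heading is present,
# but there is no <li class="soundList__item">, on the assumption that such
# elements are only created later by JavaScript in the browser.
STATIC_HTML = """
<html><body>
<section>
<h2 itemprop="name"><a itemprop="url" href="/jujubucks/squad-too-deep-ft-cool-prince">Squad Too Deep Ft. Cool Prince (Outro)</a></h2>
</section>
</body></html>
"""


def in_static_html(html, tag, **attrs):
    """Return True if a matching element exists in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(tag, **attrs) is not None


def text_or_none(html, tag, **attrs):
    """Return the element's text, or None instead of raising AttributeError."""
    element = BeautifulSoup(html, "html.parser").find(tag, **attrs)
    return element.get_text(strip=True) if element is not None else None


has_list_item = in_static_html(STATIC_HTML, "li", class_="soundList__item")
has_heading = in_static_html(STATIC_HTML, "h2", itemprop="name")
missing_text = text_or_none(STATIC_HTML, "li", class_="soundList__item")
print(has_list_item, has_heading, missing_text)
```

If a selector copied from the browser's inspector is missing from the static HTML, as in this sample, the element is most likely added after page load and will not be visible to requests alone.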

Tags: html, python, beautifulsoup, data-mining, web-scraping