如何使用 BeautifulSoup 提取这些链接?
How to extract these links with BeautifulSoup?
我正在尝试从 <a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3" data-lang="en_GB"></a>
中提取下载链接。我成功 re.findall('data-src-mp3="(.*?)"', str(content1))
.
我想请求一个 BeautifulSoup
的方法,这使得我的代码需要更少的库。以下是我的完整代码:
import requests
from bs4 import BeautifulSoup
import re
url = 'https://www.collinsdictionary.com/dictionary/english-french/graduate'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers = headers).content, 'html.parser')
for tag in soup.select('''
script,
.hcdcrt,
#ad_contentslot_1,
#ad_contentslot_2,
div.h2_entry,
div.copyright,
div.example-info,
div.share-overlay,
div.popup-overlay,
span.xr'''):
tag.extract()
content1 = ''.join(map(str, soup.select_one('.cB.cB-def.dictionary.biling').contents))
sound_url = re.findall('data-src-mp3="(.*?)"', str(content1))
print(sound_url)
我的方法的结果
['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3']
非常感谢!
试试这个:
import requests
from bs4 import BeautifulSoup
url = 'https://www.collinsdictionary.com/dictionary/english-french/graduate'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
for tag in soup.select('''
script,
.hcdcrt,
#ad_contentslot_1,
#ad_contentslot_2,
div.h2_entry,
div.copyright,
div.example-info,
div.share-overlay,
div.popup-overlay,
span.xr'''):
tag.extract()
content1 = ''.join(map(str, soup.select_one('.cB.cB-def.dictionary.biling').contents))
sound_url = [link.get('data-src-mp3') for link in soup.select('a[data-src-mp3$=".mp3"]')]
print(sound_url)
打印:
['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/EN-US-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/AR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/PT-BR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/ZH-CN-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/HR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/CS-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/DA-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/NL-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/ES-ES-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FI-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/DE-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/EL-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/IT-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/JA-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/KO-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/NO-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/PL-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/PT-PT-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/RU-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/ES-419-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/SV-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/TH-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/TR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/VI-W0037420.mp3']
我正在尝试从 <a class="hwd_sound sound audio_play_button icon-volume-up ptr" title="Pronunciation for " data-src-mp3="https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3" data-lang="en_GB"></a>
中提取下载链接。我成功 re.findall('data-src-mp3="(.*?)"', str(content1))
.
我想请求一个 BeautifulSoup
的方法,这使得我的代码需要更少的库。以下是我的完整代码:
import requests
from bs4 import BeautifulSoup
import re
url = 'https://www.collinsdictionary.com/dictionary/english-french/graduate'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers = headers).content, 'html.parser')
for tag in soup.select('''
script,
.hcdcrt,
#ad_contentslot_1,
#ad_contentslot_2,
div.h2_entry,
div.copyright,
div.example-info,
div.share-overlay,
div.popup-overlay,
span.xr'''):
tag.extract()
content1 = ''.join(map(str, soup.select_one('.cB.cB-def.dictionary.biling').contents))
sound_url = re.findall('data-src-mp3="(.*?)"', str(content1))
print(sound_url)
我的方法的结果
['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3']
非常感谢!
试试这个:
import requests
from bs4 import BeautifulSoup
url = 'https://www.collinsdictionary.com/dictionary/english-french/graduate'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
for tag in soup.select('''
script,
.hcdcrt,
#ad_contentslot_1,
#ad_contentslot_2,
div.h2_entry,
div.copyright,
div.example-info,
div.share-overlay,
div.popup-overlay,
span.xr'''):
tag.extract()
content1 = ''.join(map(str, soup.select_one('.cB.cB-def.dictionary.biling').contents))
sound_url = [link.get('data-src-mp3') for link in soup.select('a[data-src-mp3$=".mp3"]')]
print(sound_url)
打印:
['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/EN-US-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/AR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/PT-BR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/ZH-CN-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/HR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/CS-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/DA-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/NL-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/ES-ES-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FI-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/DE-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/EL-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/IT-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/JA-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/KO-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/NO-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/PL-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/PT-PT-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/RU-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/ES-419-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/SV-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/TH-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/TR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/VI-W0037420.mp3']