Extract URLs from a Mixcloud playlist with Python Selenium

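I need some help extracting URLs from the href attributes of anchor tags on a mixcloud.com user page. I know the page is generated with JavaScript, so I am using Selenium to get around that. I have successfully used a similar approach on YouTube playlists, but I can't get it to work here. This is the user page whose mix URLs I am trying to extract.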

https://www.mixcloud.com/caimanblack/


<div class="AudioCard__DetailsContainer-sc-1ltw4p1-6 euvMwc">
<div class="AudioCardTitle__Container-sc-1kxsru9-1 hGblkL">
<div class="AudioCardPlayButton__PlayButtonContainer-sc-1iib1iv-0 diYcBm AudioCardTitle__PlayButton-sc-1kxsru9-2 dDAfgc" title="Play">
<div class="AudioCardPlayButton__PlayButtonIconContainer-sc-1iib1iv-3 izFLOx">
<svg width="24" height="24" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
<title>Icon / 24 / Play Solid</title>
<path fill="#1E2337" d="M20 10.67L7.9 4 6 4.9v14.42l1.9.68L20 13.33z" fill-rule="evenodd"></path></svg></div>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" class="AudioCardPlayButton__PlayButtonRings-sc-1iib1iv-2 iIGcCU">
<circle class="ring-listened" cx="50%" cy="50%" r="22" style="stroke-dashoffset: 34.5575px; stroke-dasharray: 0px, 138.23px; stroke: rgb(243, 178, 166);"></circle>
<circle class="ring-remaining" cx="50%" cy="50%" r="22" style="stroke-dashoffset: 172.788px; stroke-dasharray: 138.23px, 0px;"></circle></svg></div>
<div class="AudioCardTitle__DetailsContainer-sc-1kxsru9-3 cTqEgM">
<a class="AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__TitleLink-sc-1kxsru9-4 jKwuem" href="/caimanblack/93-94-dark-jungle-mix-5/">93-94 Dark Jungle Mix 5</a>
<div class="AudioCardTitle__OwnerText-sc-1kxsru9-5 gxeIb">by&nbsp;
<span class="hovercard-anchor AudioCardTitle__OwnerHovercard-sc-1kxsru9-7 cicNsQ">
<a class="AudioCardTitle__PlainLink-sc-1kxsru9-0 AudioCardTitle__OwnerLink-sc-1kxsru9-6 YOGda" href="/caimanblack/">Caiman Black</a>

This is what I have tried, but I get nothing back.

for item in range(20): 
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.TAG_NAME, "body"))).send_keys(Keys.END)


mixes = driver.find_elements_by_class_name('styles__SectionContainer-obazx4-0 fKqoOc')


for mix in mixes:
    link = mix.find_element_by_xpath('.//*[@class="AudioCardTitle__Container-sc-1kxsru9-1 hGblkL"]')
    print(link)

Could you try this?

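from selenium.webdriver.common.by import By
# find_elements(By.XPATH, ...) works in Selenium 3 and 4; the find_elements_by_* helpers were removed in Selenium 4.3.
# Match the title links only: the owner link in the HTML above also has an AudioCardTitle class.
mixes = driver.find_elements(By.XPATH, "//a[contains(@class,'AudioCardTitle__TitleLink')]")
for mix in mixes:
    print(mix.get_attribute("href"))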
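
The HTML in the question shows that the owner link (`/caimanblack/`) shares the `AudioCardTitle__PlainLink` class with the title link, so depending on how broad the selector is, profile links can end up in the results. Below is a minimal sketch of a post-filter, assuming mix URLs always have exactly two path segments (`/<user>/<mix-slug>/`), as they do in the question's HTML. `keep_mix_urls` is a hypothetical helper, not part of Selenium, and other two-segment pages on the site would pass this filter as well.

```python
from urllib.parse import urlparse


def keep_mix_urls(hrefs):
    """Keep URLs shaped like /<user>/<mix-slug>/, dropping duplicates in order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # get_attribute("href") can return None for some anchors
        # Split the path into its non-empty segments, e.g. ["caimanblack", "slug"].
        segments = [s for s in urlparse(href).path.split("/") if s]
        if len(segments) == 2 and href not in seen:
            seen.add(href)
            result.append(href)
    return result


# Sample input: one mix URL, the owner's profile URL, and a duplicate.
sample = [
    "https://www.mixcloud.com/caimanblack/93-94-dark-jungle-mix-5/",
    "https://www.mixcloud.com/caimanblack/",
    "https://www.mixcloud.com/caimanblack/93-94-dark-jungle-mix-5/",
]
print(keep_mix_urls(sample))
```

With a live driver, the list passed in would be the `href` values collected from the matched elements.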

To get all the links, use WebDriverWait() and wait for visibility_of_all_elements_located() with the following CSS selector.

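from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get("https://www.mixcloud.com/caimanblack/")
# Wait up to 10 seconds for the title links to become visible, then collect their hrefs.
AllLinks=[item.get_attribute("href") for item in WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"a[class^='AudioCardTitle__PlainLink'][class$='jKwuem']")))]
print(AllLinks)
print("Total Links : {}".format(len(AllLinks)))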

Console output:

['https://www.mixcloud.com/caimanblack/93-94-dark-jungle-mix-5/', 'https://www.mixcloud.com/caimanblack/93-95-intelligent-db-mix-2/', 'https://www.mixcloud.com/caimanblack/94-95-jungle-db-rollers-mix-2/', 'https://www.mixcloud.com/caimanblack/94-95-jungle-db-rollers-mix-1/', 'https://www.mixcloud.com/caimanblack/94-95-jungle-db-mix-1/', 'https://www.mixcloud.com/caimanblack/94-95-jungle-mix-1/', 'https://www.mixcloud.com/caimanblack/93-95-intelligent-db-mix-1/', 'https://www.mixcloud.com/caimanblack/94-96-intelligent-db-mix-4/', 'https://www.mixcloud.com/caimanblack/94-96-intelligent-db-mix-3/', 'https://www.mixcloud.com/caimanblack/dub-7-king-tubby-others/', 'https://www.mixcloud.com/caimanblack/94-jungle-mix-2/', 'https://www.mixcloud.com/caimanblack/94-jungle-mix-1/', 'https://www.mixcloud.com/caimanblack/94-jungle-death/', 'https://www.mixcloud.com/caimanblack/94-94-jungle-mix-4/', 'https://www.mixcloud.com/caimanblack/93-94-jungle-mix-3/', 'https://www.mixcloud.com/caimanblack/96-99-dark-tech-db-3/', 'https://www.mixcloud.com/caimanblack/96-99-dark-tech-db-2/', 'https://www.mixcloud.com/caimanblack/96-99-dark-tech-db-1/', 'https://www.mixcloud.com/caimanblack/dub-6-king-tubby-the-upsetters-augustus-pablo-others/', 'https://www.mixcloud.com/caimanblack/dub-5-augustus-pablo-revolutionaries-aggrovators-others/', 'https://www.mixcloud.com/caimanblack/dub-4-king-tubby-the-upsetters-linval-thompson-others/', 'https://www.mixcloud.com/caimanblack/94-96-intelligent-db-mix-2/', 'https://www.mixcloud.com/caimanblack/94-96-intelligent-db-mix-1/', 'https://www.mixcloud.com/caimanblack/93-96-deep-jungle-mix-3/', 'https://www.mixcloud.com/caimanblack/93-96-deep-jungle-mix-2/', 'https://www.mixcloud.com/caimanblack/93-96-deep-jungle-mix-1/', 'https://www.mixcloud.com/caimanblack/96-99-jazz-funk-drum-bass-mix-1/', 'https://www.mixcloud.com/caimanblack/dub-90s-00s-mix-3/', 'https://www.mixcloud.com/caimanblack/dub-90s-00s-mix-2/', 
'https://www.mixcloud.com/caimanblack/dub-90s-00s-mix-1/']
Total Links : 30

You can also use the XPath below; the result is the same.

driver.get("https://www.mixcloud.com/caimanblack/")
AllLinks=[item.get_attribute("href") for item in WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.XPATH,"//a[starts-with(@class,'AudioCardTitle__PlainLink') and contains(@class,'jKwuem')]")))]
print(AllLinks)
print("Total Links : {}".format(len(AllLinks))) 

Now, if you want to iterate over the links, you can use this.

for item in AllLinks:
    print(item)
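
The question scrolls by sending Keys.END twenty times before collecting links. If the page loads more mixes as you scroll (an assumption, not confirmed anywhere in this thread), an alternative is to scroll until the document height stops growing. This is a rough sketch: `scroll_to_bottom`, `pause`, and `max_rounds` are hypothetical names, and the only driver method it relies on is `execute_script`.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until the document height stops growing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for lazily loaded content; tune as needed
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_no  # nothing new was loaded by the last scroll
        last_height = new_height
    return max_rounds
```

Calling `scroll_to_bottom(driver)` after `driver.get(...)` and before collecting the links would replace the fixed loop of twenty key presses. The fixed `pause` is a simplification; an explicit wait on the number of matched elements would be more robust.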