Extract 'href' from tag using Beautiful Soup
I have the following tags in my HTML, and I only want to extract the href content, i.e. Quatermass_2_Vintage_Movie_Poster-61-10782 and Hard_Day__039_s_Night_Vintage_Movie_Poster-61-10781.
<span class="small">
Ref.No:10782<br/>
<a href="Quatermass_2_Vintage_Movie_Poster-61-10782" title="Click for more details and a larger picture of Quatermass 2">
Click for more details and a larger picture of <b>Quatermass 2</b>
</a>
</span>, <span class="small">
Ref.No:10781<br/>
<a href="Hard_Day__039_s_Night_Vintage_Movie_Poster-61-10781" title="Click for more details and a larger picture of Hard Day's Night">
Click for more details and a larger picture of <b>Hard Day's Night</b>
</a>
</span>
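The following Python code only lets me find the entire tag:
from bs4 import BeautifulSoup

# Read the saved page and parse it with the lxml parser
with open("table2.html", "r") as f:
    contents = f.read()

soup = BeautifulSoup(contents, "lxml")

# Prints each whole <span class="small"> tag, not just the href
for name in soup.find_all("span", {"class": "small"}):
    print(name)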
But I can't select only the href. I tried:
for name in soup.find_all("span", {"class": "small"}.get(href)):
    print(name)
I also tried putting the href reference in the print statement:
for name in soup.find_all("span", {"class": "small"}):
    print(name.get('href'))
Could some kind person help?
Once you have the span tag, you need to find the a tag inside it and then get its href attribute.
Something like this will work:
for name in soup.find_all("span", {"class": "small"}):
    print(name.find("a").get("href"))
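For a self-contained check, here is a minimal sketch of the same approach. It assumes the markup from the question is held in a string and that the built-in html.parser backend is acceptable in place of lxml. The names html_doc, hrefs and hrefs_css are illustrative and not from the original post, and the extra span without a link is added only to show the guard, since find("a") returns None when no a tag is present.

```python
from bs4 import BeautifulSoup

# Markup from the question, plus one hypothetical span with no link
html_doc = """
<span class="small">
Ref.No:10782<br/>
<a href="Quatermass_2_Vintage_Movie_Poster-61-10782" title="Click for more details and a larger picture of Quatermass 2">
Click for more details and a larger picture of <b>Quatermass 2</b>
</a>
</span>, <span class="small">
Ref.No:10781<br/>
<a href="Hard_Day__039_s_Night_Vintage_Movie_Poster-61-10781" title="Click for more details and a larger picture of Hard Day's Night">
Click for more details and a larger picture of <b>Hard Day's Night</b>
</a>
</span>
<span class="small">No link in this one</span>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Walk each span, look for a nested <a>, and keep its href if it has one
hrefs = []
for span in soup.find_all("span", {"class": "small"}):
    link = span.find("a")
    if link is not None and link.get("href"):
        hrefs.append(link["href"])

# The same selection expressed as a CSS selector
hrefs_css = [a["href"] for a in soup.select("span.small a[href]")]

print(hrefs)
print(hrefs_css)
```

The explicit None check avoids an AttributeError on spans that contain no link; the selector form does the same filtering in one expression.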
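You can also use a regular expression to extract the values, as follows:
import re

# Sample input; the pattern matches href="..." made of letters, digits, _, - and .
text = 'adde <a href="coedd.com" > algo</a>'
patt = r'href="[a-zA-Z0-9_\-.]+"'
search = re.findall(patt, text, re.I)
print(search)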
This returns a list of matches.
Hope this helps.
Regards.
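If only the attribute value is wanted rather than the whole href="..." text, a capture group may help. The following is a rough sketch under the assumption that every href is wrapped in double quotes; the names pattern and values are illustrative. Regular expressions tend to be brittle on real HTML, so this is probably best kept for quick one-off checks.

```python
import re

text = 'adde <a href="coedd.com" > algo</a>'

# The parentheses capture only what sits between the double quotes
pattern = re.compile(r'href="([^"]+)"', re.IGNORECASE)

# With a single capture group, findall returns the captured values
values = pattern.findall(text)
print(values)
```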