Getting <span> attribute values
I have a large chunk of HTML and I want to extract every value of the span attribute named "data-content".
import requests
from bs4 import BeautifulSoup
# Raw string, so the backslashes in the Windows path are not read as escape sequences
with open(r"C:\Users\stasiek\Desktop\Atom-PYTHON\Python-Udemy\web-scraping\strona.html") as raw_resuls:
    results = BeautifulSoup(raw_resuls, "html.parser")
for element in results.find_all("span"):
    print(element['data-content'])
This code returns only the value of the first "data-content" in the file (just a single word) and then throws an error:
  File "niemiecki.py", line 10, in <module>
    print(element['data-content'])
  File "C:\Users\stasiek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\bs4\element.py", line 1406, in __getitem__
    return self.attrs[key]
KeyError: 'data-content'
Any idea what I'm doing wrong?
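Not every span in the file has a data-content attribute, so element['data-content'] raises KeyError on the first span that lacks it. Select only the spans that have the attribute, e.g.: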
from bs4 import BeautifulSoup
from io import BytesIO
data = b'''\
<body>
<span data-content="foo">1</span>
<span>2</span>
<span data-content="bar">3</span>
<span>4</span>
<span>5</span>
</body>
'''
f = BytesIO(data)
soup = BeautifulSoup(f, 'html.parser')
# The CSS selector span[data-content] matches only spans that carry the attribute
for span in soup.select('span[data-content]'):
    print(span['data-content'])
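If you would rather stay with find_all, here is a minimal sketch of two other ways to avoid the KeyError. The sample HTML and the variable names (html, values_filtered, values_get) are made up for illustration and are not taken from your file. The first option asks find_all to return only spans that have the attribute; the second visits every span and uses .get(), which returns None instead of raising when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real file
html = """
<body>
<span data-content="foo">1</span>
<span>2</span>
<span data-content="bar">3</span>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: attrs={"data-content": True} keeps only spans that have the attribute
values_filtered = [
    span["data-content"]
    for span in soup.find_all("span", attrs={"data-content": True})
]

# Option 2: visit every span; .get() returns None when the attribute is absent
values_get = []
for span in soup.find_all("span"):
    value = span.get("data-content")
    if value is not None:
        values_get.append(value)

print(values_filtered)
print(values_get)
```

Both lists should hold the same values. The .get() form is handy when you also need to do something with the spans that lack the attribute.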