Find div class by substring, then extract the entire class name
I'm trying to find every div whose class contains the substring 'auction-results', and then extract the full class name. Here is an example:
<div class="auction-results high-bid has-price"></div>
I can find all the divs containing 'auction-results' like this:
results = soup.select("div[class*=auction-results]")
type(results)
Out: bs4.element.ResultSet
results
Out: [<div class="auction-results high-bid has-price">
<i class="icon"></i>
<span class="lot-price"> 0,000</span>
</div>]
What I want is to store the entire class name, 'auction-results high-bid has-price', in a pandas column, like this:
class_text = ['auction-results high-bid has-price']
'auction-results high-bid has-price'
scraped_data = pd.DataFrame({'class_text': class_text})
scraped_data
class_text
0 auction-results high-bid has-price
I haven't found a solution yet and would appreciate any help. Thanks!
Try it this way:
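import pandas as pd
columns = ['class_text']
rows = []
for result in results:
    # result['class'] is a list of class tokens, so join them into one string
    rows.append(' '.join(result['class']))
# pass rows directly (not [rows]) so each matched div becomes its own row
scraped_data = pd.DataFrame(rows, columns=columns)
scraped_data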
Output:
class_text
0 auction-results high-bid has-price
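For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample markup is an assumption: only the first div comes from the question, the other two are hypothetical and are there to show that each matching div gets its own row. It relies on BeautifulSoup returning `class` as a list of tokens, which is why the join is needed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first div is taken from the question.
html_doc = """
<div class="auction-results high-bid has-price"></div>
<div class="auction-results no-bid"></div>
<div class="lot-details"></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# [class*=...] is a substring match on the class attribute value.
results = soup.select('div[class*="auction-results"]')

# BeautifulSoup exposes class as a list of tokens, so join them back
# into the full class string for each matched div.
class_text = [" ".join(div.get("class", [])) for div in results]

# One row per matched div.
scraped_data = pd.DataFrame({"class_text": class_text})
print(scraped_data)
```

Note that a substring match would also pick up a class such as `auction-results-old`. If only the exact class token is wanted, `soup.select("div.auction-results")` is probably the safer selector.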
See the example below. You can treat the markup as an HTML document and use lxml to pull out the full class attribute value.
from lxml import html
results = '<div class="auction-results high-bid has-price"><i class="icon"></i><span class="lot-price">0,000</span></div>'
tree = html.fromstring(results)
name = tree.xpath("//div[contains(@class,'auction-results')]/@class")
print(name)
It prints the full class name:
['auction-results high-bid has-price']