Beautiful Soup: How to extract text from this structure

Question

I want to access the timestamp text in the title="" attribute and get the string "23.12.2019 13:05:24".

[<div class="pull_right date details" title="23.12.2019 13:05:24">
 13:05
        </div>]

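I already know how to access the text inside this div, but that gives only the hour and minute. The full timestamp is what I need.

I am currently using this code: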

ltimestamp = []
for tag in divTag:
    tdTags = tag.find_all("div", {"class": "pull_right date details"})    
for tag in tdTags:
    ltimestamp.append(tag.text)

Answer 1

Given this element: <div class="pull_right date details" title="23.12.2019 13:05:24">13:05</div>

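Note that find_all returns a list-like ResultSet, so the following applies to each individual element tag inside tdTags, not to tdTags itself.

To get '13:05', the text inside the tag, use print(tag.text)

To get the value ('23.12.2019 13:05:24') of the attribute ('title'), use print(tag['title'])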
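
As a minimal sketch of the difference between the two lookups, assuming the HTML is available as a string and parsed with the built-in "html.parser" (the variable names html, soup, short_time and full_timestamp are illustrative and not from the question):

```python
from bs4 import BeautifulSoup

# The element from the question, used here as a standalone sample document.
html = '<div class="pull_right date details" title="23.12.2019 13:05:24"> 13:05 </div>'

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), unlike find_all(), which returns a ResultSet.
tag = soup.find("div", {"class": "pull_right date details"})

# .text is the content between the opening and closing tags; strip() drops the padding.
short_time = tag.text.strip()

# Indexing a Tag like a dictionary reads an attribute value.
full_timestamp = tag["title"]

print(short_time)
print(full_timestamp)
```

The question's loop would then append tag["title"] instead of tag.text.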

I'll come back with a Beautiful Soup link that explains this; I read it somewhere before.

Documentation URL: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes
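
Applied to the loop from the question, this could look roughly like the sketch below. It is only an illustration under assumptions: the outer "message" containers, the second timestamp, and the div without a title are made-up sample data, and the names div_tags, container and timestamps are hypothetical. The inner loop is nested so that every container is processed, and tag.get("title") is used so that a div without a title attribute is skipped instead of raising a KeyError.

```python
from bs4 import BeautifulSoup

# Made-up sample page: the "message" wrapper and the extra rows are assumptions.
html = """
<div class="message">
  <div class="pull_right date details" title="23.12.2019 13:05:24">13:05</div>
</div>
<div class="message">
  <div class="pull_right date details" title="23.12.2019 13:07:10">13:07</div>
</div>
<div class="message">
  <div class="pull_right date details">13:09</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div_tags = soup.find_all("div", {"class": "message"})

timestamps = []
for container in div_tags:
    # Nested loop: handle the matches of each container before moving on.
    for tag in container.find_all("div", {"class": "pull_right date details"}):
        # .get() returns None when the attribute is missing.
        title = tag.get("title")
        if title is not None:
            timestamps.append(title)

print(timestamps)
```

Whether skipping elements without a title is the right behavior depends on the page; tag["title"] is the stricter choice when every element is expected to carry the attribute.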

python

beautifulsoup

text-mining

web-scraping