How to find a div with a specific id and iterate over its children using lxml?
I am using lxml in Python. I tried the following code to parse the page and get the elements I want, but it just returns nothing:
from lxml import html

tree = html.fromstring(html_content)
# Select the direct div children of the element whose id is "posts"
posts = tree.xpath('//*[@id="posts"]/div')
for post in posts:
    print(post)
The HTML looks like this:
<div>
  <div>
    ...
    <div id="posts">
      <div>
        <div class="post">
          <a href="">User 1</a>
          <div class="content"> Content 1</div>
        </div>
        <div class="post">
          <a href="">User 2</a>
          <div class="content"> Content 2</div>
        </div>
        ...
      </div>
    </div>
    ...
I want to iterate over each post so that I can access its <a> tag and its <div> content. I want to print:
User 1
Content 1
User 2
Content 2
...
It might be easier to target the tags with class post directly, using similar syntax:
# Step through the wrapper div and keep only the children with class="post"
posts = tree.xpath('//*[@id="posts"]/div/*[@class="post"]')
for post in posts:
    print(post.find('a').text)
    print(post.find('div').text)  # add .strip() to clean the leading space
Output:
User 1
Content 1
User 2
Content 2
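
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. It assumes lxml is installed. The sample markup is modeled on the question with the "..." placeholders left out, and the helper name extract_posts is hypothetical, used only for illustration. It uses a descendant search (//div) under the posts element instead of spelling out the wrapper div, and findtext() with a default so that a missing tag yields an empty string rather than an error.

```python
from lxml import html

# Sample markup modeled on the question (the "..." placeholders are omitted).
html_content = """
<div>
  <div id="posts">
    <div>
      <div class="post">
        <a href="">User 1</a>
        <div class="content"> Content 1</div>
      </div>
      <div class="post">
        <a href="">User 2</a>
        <div class="content"> Content 2</div>
      </div>
    </div>
  </div>
</div>
"""


def extract_posts(markup):
    """Return a list of (user, content) pairs, one per div with class="post".

    Hypothetical helper for illustration; not part of lxml.
    """
    tree = html.fromstring(markup)
    results = []
    # Any div with class="post" anywhere below the element with id="posts"
    for post in tree.xpath('//*[@id="posts"]//div[@class="post"]'):
        # findtext() returns the text of the first matching child element
        user = post.findtext('a', default='').strip()
        content = post.findtext('div', default='').strip()
        results.append((user, content))
    return results


pairs = extract_posts(html_content)
for user, content in pairs:
    print(user)
    print(content)
```

Note that @class="post" is an exact string match, so it would not match an element whose class attribute holds several names such as "post featured"; whether that matters depends on the real page.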