How to find a div with a specific id name and iterate over its children using lxml?

I'm using the Python lxml library and tried the following code to parse the page and get the elements I want, but it just returns nothing useful:

from lxml import html
tree = html.fromstring(html_content)
posts = tree.xpath('//*[@id="posts"]/div')
for post in posts:
    print(post)

The HTML looks like this:

<div>
  <div>
    ...
     <div id="posts">
         <div>
             <div class="post"> 
                 <a href="">User 1</a>
                 <div class="content"> Content 1</div>
             </div>
             <div class="post"> 
                 <a href="">User 2</a>
                 <div class="content"> Content 2</div>
             </div>
             ...
         </div>
     </div>
   ...

I want to iterate over each post so that I can access its <a> tag and the content of its <div>. I want to print:

 User 1
 Content 1

 User 2
 Content 2

 ...

It is probably easier to target the tags by their class, post, using similar syntax:

posts = tree.xpath('//*[@id="posts"]/div/*[@class="post"]')
for post in posts:
    print(post.find('a').text)
    print(post.find('div').text)  # add .strip() to remove the leading space

Output:

User 1
 Content 1

User 2
 Content 2
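
For reference, here is a self-contained sketch of the same approach that can be run as-is. The html_content string is a trimmed copy of the markup from the question (the elided "..." parts are left out), and extract_posts is a hypothetical helper name, not part of lxml. It also applies .strip() so the leading space in the content is dropped.

```python
from lxml import html

# Sample markup modeled on the question; the real page is assumed to be larger.
html_content = """
<div>
  <div id="posts">
    <div>
      <div class="post">
        <a href="">User 1</a>
        <div class="content"> Content 1</div>
      </div>
      <div class="post">
        <a href="">User 2</a>
        <div class="content"> Content 2</div>
      </div>
    </div>
  </div>
</div>
"""

tree = html.fromstring(html_content)


def extract_posts(tree):
    """Return (user, content) pairs for each class="post" element under #posts."""
    results = []
    for post in tree.xpath('//*[@id="posts"]/div/*[@class="post"]'):
        user = post.find('a').text.strip()       # text of the first <a> child
        content = post.find('div').text.strip()  # text of the first <div> child
        results.append((user, content))
    return results


pairs = extract_posts(tree)
for user, content in pairs:
    print(user)
    print(content)
    print()
```

Note that @class="post" is an exact string match, so it will not match an element whose class attribute holds several names (for example class="post featured"); in that case something like contains(concat(" ", normalize-space(@class), " "), " post ") may be needed instead.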