lxml iterate through child divs
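I have the HTML below, from which I parse ratings, text, and so on.
How can I use lxml to iterate over every div whose class contains "posting item" inside the div with class "wrap"?
Below I select all the post divs at once: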
forumposts = tree.xpath("//div[@class='wrap']//div[contains(@class, 'posting item')]")
# Here I want to iterate through the posting items,
# so that parse() receives one post (one text/rating pair) at a time.
for post in forumposts:
    parse(post)
HTML:
<div class="wrap">
  <div class="posting item theme-international" data-postingid="1035091361">
    <div class="thread">
      <div class="js-ratings ratings">
        <div class="js-ratings-counts ratings-counts"
             data-closable-target="ratinglog-1-1035091361"
             onclick="ForumLoader.toggleRatinglog(1035091361, 1)">
          <span class="js-ratings-negative-count ratings-negative-count">6</span>
          <span class="js-ratings-positive-count ratings-positive-count">7</span>
        </div>
      </div>
      <div class="text">
        <a href="xyz" rel="nofollow">
          <strong/>
          <span>Posting text 1 </span>
        </a>
      </div>
    </div>
  </div>
  <div class="posting item theme-international" data-postingid="1035091361">
    <div class="thread">
      <div class="js-ratings ratings">
        <div class="js-ratings-counts ratings-counts"
             data-closable-target="ratinglog-1-1035091361"
             onclick="ForumLoader.toggleRatinglog(1035091361, 1)">
          <span class="js-ratings-negative-count ratings-negative-count">1</span>
          <span class="js-ratings-positive-count ratings-positive-count">11</span>
        </div>
      </div>
      <div class="text">
        <a href="xyz" rel="nofollow">
          <strong/>
          <span>Posting text 2</span>
        </a>
      </div>
    </div>
  </div>
</div>
It's not clear to me what you want, Nico. Is this it?
>>> from lxml import etree
>>> parser = etree.HTMLParser()
>>> tree = etree.parse(open('nico.htm'), parser)
>>> for s in tree.xpath('//div[@class="wrap"]//div[@class="text"]//span'):
...     s.text
...
'Posting text 1 '
'Posting text 2'
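If the goal is instead to handle one post at a time, a possible approach is to keep the loop from the question and use relative XPath expressions (starting with `.`) inside parse(), so each query is scoped to the current post element rather than the whole document. The following is only a sketch: the body of parse(), the dictionary keys, and the inline SAMPLE string (a trimmed copy of the HTML above) are my own assumptions, not anything from the original code.

```python
from lxml import etree

# Trimmed copy of the HTML from the question, inlined so the sketch is self-contained.
SAMPLE = """
<div class="wrap">
  <div class="posting item theme-international" data-postingid="1035091361">
    <div class="thread">
      <div class="js-ratings ratings">
        <div class="js-ratings-counts ratings-counts">
          <span class="js-ratings-negative-count ratings-negative-count">6</span>
          <span class="js-ratings-positive-count ratings-positive-count">7</span>
        </div>
      </div>
      <div class="text">
        <a href="xyz" rel="nofollow"><strong/><span>Posting text 1 </span></a>
      </div>
    </div>
  </div>
  <div class="posting item theme-international" data-postingid="1035091361">
    <div class="thread">
      <div class="js-ratings ratings">
        <div class="js-ratings-counts ratings-counts">
          <span class="js-ratings-negative-count ratings-negative-count">1</span>
          <span class="js-ratings-positive-count ratings-positive-count">11</span>
        </div>
      </div>
      <div class="text">
        <a href="xyz" rel="nofollow"><strong/><span>Posting text 2</span></a>
      </div>
    </div>
  </div>
</div>
"""


def parse(post):
    """Hypothetical parse(): extract the ratings and text of ONE posting element."""
    # The leading "." makes each query relative to `post`;
    # a bare "//span..." would search the entire document again.
    negative = post.xpath('.//span[contains(@class, "ratings-negative-count")]/text()')
    positive = post.xpath('.//span[contains(@class, "ratings-positive-count")]/text()')
    # string(...) returns the text content of the first matching span ('' if none).
    text = post.xpath('string(.//div[@class="text"]//span)')
    return {
        "negative": int(negative[0]) if negative else None,
        "positive": int(positive[0]) if positive else None,
        "text": text.strip(),
    }


tree = etree.HTML(SAMPLE)
forumposts = tree.xpath("//div[@class='wrap']//div[contains(@class, 'posting item')]")
results = [parse(post) for post in forumposts]

for result in results:
    print(result)
```

Each element returned by xpath() is itself an element you can query, which is why the per-post loop works. Note that contains(@class, 'posting item') is a plain substring test on the attribute value, so it depends on the two class names appearing in exactly that order.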