XPath Query Finds Elements Not Inside Selector
My XPath query is finding elements that aren't even inside the selector I call it on. For example (from my code below), business_div
contains this HTML:
<div class="foo">
    <div>
        <table>
            ...
            <a class="bar" href="A">link</a>
        </table>
    </div>
</div>
When I run the XPath query business_div.xpath("//a[@class='bar']/@href").extract()
it returns:
["A", "B", "D"] # should just be ["A"]
How can I query business_div so that it returns only "A"? The page's HTML:
<div class="foo">
    <div>
        <table>
            ...
            <a class="bar" href="A">link</a>
        </table>
    </div>
</div>
<div class="foo">
    <div>
        <table>
            ...
            <a class="bar" href="B">link</a>
        </table>
    </div>
</div>
<div class="foo">
    <div>
        <table>
            ...
            <!-- Some divs will not contain a link. So I can't do a simple query "//div[contains(@class, "foo")]//a[contains(@class, "bar")]/@href" -->
        </table>
    </div>
</div>
<div class="foo">
    <div>
        <table>
            ...
            <a class="bar" href="D">link</a>
        </table>
    </div>
</div>
My code:
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    name = "MySpider"
    ...
    def parse(self, response):
        businesses = []
        business_divs = response.xpath("//div[contains(@class, 'foo')]")
        for business_div in business_divs:
            business = MyItem()
            business["link"] = business_div.xpath("//a[@class='bar']/@href").extract()
            # business["link"] is ["A", "B", "D"]
            # I am expecting business["link"] to simply be ["A"]
            # in the first loop then ["B"] and so on
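A small change to the XPath fixes the problem. An expression that starts with // is absolute: it searches the whole document, no matter which selector it is called on. Prefix it with a dot (.//) so that it is evaluated relative to business_div:
business["link"] = business_div.xpath(".//a[@class='bar']/@href").extract()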
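To see the two scopes side by side without running a spider, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors. The PAGE markup is a made-up, well-formed version of the HTML above, and ElementTree supports only a subset of XPath, so the attribute is read with .get("href") instead of /@href and the class is matched exactly instead of with contains().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page mirroring the question's markup, written as
# well-formed XML so the standard-library parser accepts it.
PAGE = """
<body>
  <div class="foo"><div><table><a class="bar" href="A">link</a></table></div></div>
  <div class="foo"><div><table><a class="bar" href="B">link</a></table></div></div>
  <div class="foo"><div><table></table></div></div>
  <div class="foo"><div><table><a class="bar" href="D">link</a></table></div></div>
</body>
"""

root = ET.fromstring(PAGE)

# Searching from the document root: every matching link on the page.
# This is the scope the question's "//a[...]" query ended up with.
whole_page = [a.get("href") for a in root.findall(".//a[@class='bar']")]

# Searching relative to each outer div: only that div's own links.
# This is the scope that ".//a[...]" gives inside the loop.
per_div = [
    [a.get("href") for a in div.findall(".//a[@class='bar']")]
    for div in root.findall("./div[@class='foo']")
]

print(whole_page)
print(per_div)
```

A div without a link simply yields an empty list, so the loop in the spider does not need a special case for it.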