Xpath scrapy result not as expected

I'm trying to get the value of a preceding tag. Here is what I'm doing:

Structure of the HTML page:

...
<tr class="destaque no-hover">
    <td class="periodo" colspan="6">2020.1</td>
</tr>
<tr class="linhaPar">
    <td>Text1</td>
    <td align="center">01</td>
    <td align="right">312h</td>
    <td align="center">3T12</td>
</tr>
<tr class="linhaImpar">
    <td>Text2</td>
    <td align="center">01</td>
    <td align="right">12h</td>
    <td align="center">5M12</td>
</tr>
...
<tr class="destaque no-hover">
    <td class="periodo" colspan="6">2016.1</td>
</tr>
<tr class="linhaPar">
    <td>Text7</td>
    <td align="center">01</td>
    <td align="right">2h</td>
    <td align="center">2N12</td>
</tr>
<tr class="linhaImpar">
    <td>Text8</td>
    <td align="center">01</td>
    <td align="right">32h</td>
    <td align="center">4T12</td>
</tr>
...
<tr class="destaque no-hover">
    <td class="periodo" colspan="6">2014.2</td>
</tr>
<tr class="linhaPar">
    <td>TextN-1</td>
    <td align="center">01</td>
    <td align="right">2h</td>
    <td align="center">2N12</td>
</tr>
<tr class="linhaImpar">
    <td>TextN</td>
    <td align="center">01</td>
    <td align="right">32h</td>
    <td align="center">4T12</td>
</tr>

So, I'm trying to get the information from each tr with class "linhaPar" or "linhaImpar":

for i in response.xpath('//tr[@class="linhaPar" or @class="linhaImpar"]'):
    _aux = i.xpath('./td[1]')

However, I also need the matching td[@class="periodo"] for each of those rows, and this is where I'm stuck with the XPath:

# I've tried this, but it returns a list of every matching element, not just the closest one, which is what I want
    _p = _aux.xpath('./preceding::tr[td[@class="periodo"]]')

# I've also tried this, but it doesn't work
    _p = _aux.xpath('./preceding::tr[td[@class="periodo"] and position()=1]')

Solved

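Maybe I wasn't clear enough when I asked the question. The number of tr rows grouped under each periodo varies. Every search I tried returned either a list of possible results or nothing. To solve it, I followed the suggested approach of handling the periodo inside the for loop over the XPath results:
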
_p = ""
for i in response.xpath('//tr[@class="linhaPar" or @class="linhaImpar" or @class="destaque no-hover"]'):
    # Check whether this tr is a periodo header row
    if 'destaque no-hover' == i.xpath('./@class').get():
        _p = i.xpath('./td/text()').get()
        continue # Force to go to the next one

Given this XPath:

'//tr[@class="linhaPar" or @class="linhaImpar" or td[@class="periodo"]]' 

Assuming you want to store this in _p (one periodo for each tr context node):

['2020.1'], ['2020.1'], ['2020.1'], ['2020.1']

Use:

./preceding::td[@class="periodo"][1]

Assuming you want to store this in _p (one periodo for each group of data):

['2020.1'], [], ['2020.2'], []

Use:

./preceding-sibling::tr[1]/td[1][@class="periodo"]

If you need to remove the empty elements from the resulting list, use filter afterwards.

For the second case, as @Gilles Quenot mentioned, you can also change the context node to:

//tr[@class="linhaPar" or @class="linhaImpar" or @class="destaque no-hover"]

and fill your lists (where i is each tr from the loop):

_aux = i.xpath('./td[1][not(@class="periodo")]')
_p = i.xpath('./td[1][@class="periodo"]')

Or:

_aux = i.xpath('./td[1][not(starts-with(text(),"2020."))]')
_p = i.xpath('./td[1][starts-with(text(),"2020.")]')