How to parse specific content from a table with Scrapy

Question

I am trying to parse some of the content of a table like the one below:

<table class="dataTbl col-4">
    <tr>
        <th scope="row">Rent</th>
        <td>5.5</td>
        <th scope="row">Management</th>
        <td>3.3</td>
    </tr>
    <tr>
        <th scope="row">Deposit</th>
        <td>No</td>
        <th scope="row">Other</th>
        <td>No</td>
    </tr>
    <tr>
        <th scope="row">Other2</th>
        <td>No</td>
        <th scope="row">Insurance</th>
        <td>Yes</td>
    </tr>
</table>

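My goal is to find a specific row header (for example, Rent) and, if it matches, extract the content of the <td> tag that follows it (for example, 5.5).

How can I do this in Python?

I am using Python 3 and Scrapy 1.3.0.

Thanks.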

Answer 1

In [9]: Selector(text=html).xpath('//th[text()="Rent"]/following-sibling::td[1]').extract()
Out[9]: ['<td>5.5</td>']

text()="Rent" identifies the th tag whose text is Rent.
following-sibling:: selects the siblings that come after it, and [1] picks the first of them.

Answer 2

Use Python's regular expressions.

r'\>text\<.+\n +\<td\>(\d+\.\d+)'

In your case, replace text in the pattern with Rent. Also, this is a useful web page for debugging regular expressions.
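
For reference, the pattern above could be applied with the standard re module roughly as follows. This is only a sketch: the function name number_after and the HTML constant are invented for the example, and the label is substituted into the pattern with re.escape.

```python
import re

# Shortened copy of the table from the question; the pattern relies on
# each <td> sitting on its own line, indented with spaces.
HTML = """<table class="dataTbl col-4">
    <tr>
        <th scope="row">Rent</th>
        <td>5.5</td>
        <th scope="row">Management</th>
        <td>3.3</td>
    </tr>
    <tr>
        <th scope="row">Deposit</th>
        <td>No</td>
    </tr>
</table>"""


def number_after(html, label):
    """Return the decimal number in the <td> on the line after `label`, or None.

    Hypothetical helper built around the pattern from this answer.
    """
    pattern = r'>' + re.escape(label) + r'<.+\n +<td>(\d+\.\d+)'
    match = re.search(pattern, html)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(number_after(HTML, "Rent"))
    print(number_after(HTML, "Deposit"))
```

Note that this only matches values written as decimals (so Deposit, whose value is No, yields None) and it depends on the exact line breaks and indentation of the page, so it is more fragile than an XPath query.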

