Python: robust XPath for a table in tr and td tags, eliminating unwanted data

Question

I need a robust XPath for the page at this URL: "http://www.screener.com/v2/stocks/view/5131"

However, the data I want is preceded by a lot of whitespace, so my current approach is not robust.

The part I need is 11.48, 9.05, 11.53 in the HTML below:

<div class="table-responsive">
    <table class="table table-hover">
        <tr>
            <th>Financial Year</th>
            <th class="number">Revenue ('000)</th>
            <th class="number">Net ('000)</th>
            <th class="number">EPS</th>
            <th></th>
        </tr>

        <tr>
            <td>30 Nov, 2017</td>
            <td class="number">205,686</td>
            <td class="number">52,812</td>
            <td class="number">11.48</td>
            <td></td>
        </tr>

        <tr>
            <td>30 Nov, 2016</td>
            <td class="number">191,301</td>
            <td class="number">41,598</td>
            <td class="number">9.05</td>
            <td></td>
        </tr>

        <tr>
            <td>30 Nov, 2015</td>
            <td class="number">225,910</td>
            <td class="number">51,082</td>
            <td class="number">11.53</td>
            <td></td>
        </tr>

My code is as follows:

from lxml import html
import requests
page = requests.get('http://www.screener.com/v2/stocks/view/5131')
output = html.fromstring(page.content)
output.xpath('//tr/td/following-sibling::td/text()')

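How can I change the code so that it robustly extracts the three numbers from the table above?

I only want the output 11.48, 9.05, 11.53, but I can't get rid of the rest of the data in the table.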

Answer 1

Try the following XPath to get the desired output:

//div[@id="annual"]//tr/td[position() = last() - 1]/text()
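The predicate position() = last() - 1 picks the second-to-last td in each row, which is the EPS column here; the header row uses th cells, so it contributes nothing. Below is a minimal sketch of how this might be used with lxml. It runs against an inline copy of the table rather than the live page, and the wrapping div with id="annual" is an assumption inferred from the XPath (it is not visible in the snippet in the question). The extra spaces around one value are added only to show that strip() takes care of stray whitespace.

```python
from lxml import html

# Inline sample modeled on the question's table. The outer
# <div id="annual"> is an assumption taken from the XPath above;
# the real page structure may differ.
SAMPLE = """
<div id="annual">
  <div class="table-responsive">
    <table class="table table-hover">
      <tr>
        <th>Financial Year</th>
        <th class="number">Revenue ('000)</th>
        <th class="number">Net ('000)</th>
        <th class="number">EPS</th>
        <th></th>
      </tr>
      <tr>
        <td>30 Nov, 2017</td>
        <td class="number">205,686</td>
        <td class="number">52,812</td>
        <td class="number">   11.48  </td>
        <td></td>
      </tr>
      <tr>
        <td>30 Nov, 2016</td>
        <td class="number">191,301</td>
        <td class="number">41,598</td>
        <td class="number">9.05</td>
        <td></td>
      </tr>
      <tr>
        <td>30 Nov, 2015</td>
        <td class="number">225,910</td>
        <td class="number">51,082</td>
        <td class="number">11.53</td>
        <td></td>
      </tr>
    </table>
  </div>
</div>
"""

tree = html.fromstring(SAMPLE)

# Second-to-last <td> of every row inside the "annual" block.
cells = tree.xpath('//div[@id="annual"]//tr/td[position() = last() - 1]/text()')

# Strip surrounding whitespace so the result does not depend on page formatting.
eps = [cell.strip() for cell in cells]

print(",".join(eps))
```

To run this against the real page, replace SAMPLE with page.content from the requests.get call in the question, assuming the page still contains a div with that id.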

