XPath help needed for row text search column return on MorningStar

Question

I need to extract some data from HTML that looks like this:

<div class="r_tbar0 positionrelative">
    <h3>Financials</h3>
</div>
<table class="r_table1 text2" cellspacing="0" cellpadding="0">
    <thead>
        <tr>
            <th scope="row" align="left"></th>
            <th scope="col" id="Y0" align="right">2007-12</th>
            <th scope="col" id="Y1" align="right">2008-12</th>
            <!--More columns here-->
            <th scope="col" id="Y9" align="right">2016-12</th>
            <th scope="col" id="Y10" align="right">TTM</th>
        </tr>
    </thead>
    <tbody>
        <tr class="hr">
            <td colspan="12"></td>
        </tr>
        <tr>
            <th class="row_lbl" scope="row" id="i0">Revenue&nbsp;<span>USD Mil</span></th>
            <td headers="Y0 i0" align="right">5,858</td>
            <td headers="Y1 i0" align="right">5,808</td>
            <!--More cells here-->
            <td headers="Y9 i0" align="right">4,272</td>
            <td headers="Y10 i0" align="right">4,955</td>
        </tr>
        <tr class="hr">
            <td colspan="12"></td>
        </tr>
        <tr>
            <th class="row_lbl" scope="row" id="i1">Gross Margin %</th>
            <td headers="Y0 i1" align="right">37.4</td>
            <td headers="Y1 i1" align="right">39.9</td>
            <!--More cells here-->
            <td headers="Y9 i1" align="right">23.4</td>
            <td headers="Y10 i1" align="right">33.5</td>
        </tr>
        <!--More rows here-->
        <tr class="hr">
            <td colspan="12"></td>
        </tr>
    </tbody>
</table>

I want to pull the 2007 revenue figure from the Key Ratios page with an XPath that finds the "Revenue" row and then reads the 2007 column.

XPath location of the 2007 revenue:

//*[@id="financials"]/table/tbody/tr[2]/td[1]

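tr[2] is the row that holds Revenue. However, my program looks at multiple stocks, and I want to be sure the row it reads is still Revenue instead of relying on the position tr[2].

I have tried several versions of the following XPath, all of which return a NULL value. (I am using the XPath Helper Google Chrome extension.)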

//*[@id="financials"]/table/tbody/tr[contains(text(),'Revenue')]/td[1]

Outer HTML of the Revenue row header:

<th class="row_lbl" scope="row" id="i0">Revenue&nbsp;<span>USD Mil</span></th>

Outer HTML of the 2007 revenue cell:

<td align="right" headers="Y0 i0" class="">5,858</td>

Update

Based on the answer below, I wrote:

//*[@id='financials']//td[contains(@headers,'i0')][1]

This pulls the 2007 revenue figure, 5,858.

Answer 1

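In the "Financials" table, "Revenue" is the text of a th, not of the tr itself, so contains(text(),'Revenue') applied to the tr has no matching text node to test. You can get all the cells in one column or one row of the table by referencing the headers attribute of the td tags. Columns are Y0..Yn and rows are i0..in, for example:

The first column has header Y0: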

//*[@id='financials']//td[contains(@headers,'Y0')]

The first row has header i0:

//*[@id='financials']//td[contains(@headers,'i0')]

And so on.
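To avoid hard-coding i0 and Y0 across different stocks, one option is to look up the ids from the visible labels first and then match the cell through its headers attribute. The following is only a minimal sketch under several assumptions: the SAMPLE markup is a trimmed, well-formed copy of the table in the question wrapped in a div with id="financials" (the wrapper is inferred from the XPaths above), the &nbsp; entity is left out, and header_id and cell_value are hypothetical helper names. It uses the standard-library ElementTree, which supports only a subset of XPath (no contains()), so the matching is done in Python; a real page would likely need an HTML-tolerant parser instead.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the table from the question. The wrapping div
# and the omission of &nbsp; are assumptions made so that a strict XML parser
# can read the snippet.
SAMPLE = """
<div id="financials">
  <table>
    <thead>
      <tr>
        <th scope="row"></th>
        <th scope="col" id="Y0">2007-12</th>
        <th scope="col" id="Y1">2008-12</th>
        <th scope="col" id="Y10">TTM</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th scope="row" id="i0">Revenue <span>USD Mil</span></th>
        <td headers="Y0 i0">5,858</td>
        <td headers="Y1 i0">5,808</td>
        <td headers="Y10 i0">4,955</td>
      </tr>
      <tr>
        <th scope="row" id="i1">Gross Margin %</th>
        <td headers="Y0 i1">37.4</td>
        <td headers="Y1 i1">39.9</td>
        <td headers="Y10 i1">33.5</td>
      </tr>
    </tbody>
  </table>
</div>
"""


def header_id(root, scope, label):
    """Return the id of the <th> with this scope whose text starts with label."""
    for th in root.iter("th"):
        # itertext() also collects text inside nested tags such as <span>.
        text = "".join(th.itertext()).strip()
        if th.get("scope") == scope and text.startswith(label):
            return th.get("id")
    return None


def cell_value(root, row_label, col_label):
    """Find a cell by row label and column label via the headers attribute."""
    row_id = header_id(root, "row", row_label)
    col_id = header_id(root, "col", col_label)
    if row_id is None or col_id is None:
        return None
    for td in root.iter("td"):
        # Compare whole tokens so that "Y1" does not also match "Y10".
        if {row_id, col_id} <= set(td.get("headers", "").split()):
            return td.text
    return None


root = ET.fromstring(SAMPLE)
print(cell_value(root, "Revenue", "2007-12"))  # prints 5,858
```

With a full XPath 1.0 engine, the same idea can be written as a single expression, for example //*[@id='financials']//tr[th[contains(., 'Revenue')]]/td[1] to select by row label, or //*[@id='financials']//td[contains(@headers,'Y0') and contains(@headers,'i0')] to select by ids. Note that contains(@headers,'Y1') would also match Y10, which is why the sketch compares whole tokens.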

xpath

finance

web-scraping

python-3.x