Scrape info from a span title

Question

My HTML looks like this:

    <h3>Current Guide Price <span title="92">   92
    </span></h3>

The value I want to extract is 92.

Here is another HTML page from which I need the same piece of data:

    <h3>Current Guide Price <span title="4,161">    4,161
    </span></h3>

From that page I need to get 4,161.

Here is a link to the page for reference: http://services.runescape.com/m=itemdb_oldschool/viewitem?obj=1613

What I have tried:

    /h3/span[@title="92"]@title

    /h3/span[@title="92"]/text()

    /div[@class="stats"]/h3/span[@title="4,161"]@title

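Because the value I need sits inside the span tag itself, each of these expressions has to hard-code it, which makes it hard to fetch the data dynamically across many different pages.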

Answer 1

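    from lxml import html
    import requests


    baseUrl = 'http://services.runescape.com/m=itemdb_oldschool/viewitem?obj=2355'
    page = requests.get(baseUrl)

    # Parse the response body into an element tree
    tree = html.fromstring(page.content)
    # Select the span elements themselves (to read their text) ...
    price = tree.xpath('//h3/span')
    # ... or select their title attribute directly
    price2 = tree.xpath('//h3/span/@title')
    for p in price:
        # The span text is padded with whitespace and a newline
        print(p.text.strip())
    for p2 in price2:
        print(p2)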

The output is 92 in both cases.
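The key point is that the XPath never mentions the value itself, so the same expression works on every page. As a rough offline sketch, here it is run against the two snippets from the question. The names `PAGES` and `guide_price` are made up for illustration, and the `contains(...)` filter and the comma stripping assume that the heading text is always "Current Guide Price" and that the title holds a comma-grouped integer, which I have only checked against these two samples.

```python
from lxml import html

# The two snippets from the question, wrapped in a minimal document.
# PAGES and guide_price are hypothetical names used only for this sketch.
PAGES = [
    '<html><body><h3>Current Guide Price <span title="92">   92\n'
    '</span></h3></body></html>',
    '<html><body><h3>Current Guide Price <span title="4,161">    4,161\n'
    '</span></h3></body></html>',
]


def guide_price(page_source):
    """Return the guide price as an int, or None if no matching span exists."""
    tree = html.fromstring(page_source)
    # Match on the heading text instead of the title value, so nothing
    # page-specific is hard-coded in the expression.
    titles = tree.xpath('//h3[contains(., "Current Guide Price")]/span/@title')
    if not titles:
        return None
    # Assumption: the title is a comma-grouped integer such as "4,161".
    return int(titles[0].replace(',', ''))


prices = [guide_price(source) for source in PAGES]
print(prices)
```

For a live page you would pass `requests.get(url).content` to `guide_price` instead of the inline strings.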

xpath

lxml

python-3.x