Python retrieving value from URL

Question

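I'm trying to write a Python script that checks money.rediff.com for a particular stock price and prints it. I know this can be done easily with their API, but I want to learn how urllib2 works, so I'm trying to do it the old-fashioned way. However, I'm stuck on how to use urllib. Many tutorials online tell me to "Inspect element" on the value I need to return and then split the string to get it. But all the examples in the videos have values inside HTML tags that are easy to split on, whereas mine looks like this: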

<div class="f16">
<span id="ltpid" class="bold" style="color: rgb(0, 0, 0); background: rgb(255, 255, 255);">6.66</span> &nbsp; 
<span id="change" class="green">+0.50</span> &nbsp; 

<span id="ChangePercent" style="color: rgb(130, 130, 130); font-weight: normal;">+8.12%</span>
</div>

I only need the "6.66" in line 2. How do I go about this? I'm very new to urllib2 and Python. All help will be greatly appreciated. Thanks in advance.

Answer 1

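You can certainly do this with just urllib2 and perhaps a regular expression, but I'd encourage you to use better tools, namely requests and Beautiful Soup.

Here is a complete program to fetch a quote for "Tata Motors Ltd.":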

from bs4 import BeautifulSoup
import requests

html = requests.get('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').content

soup = BeautifulSoup(html, 'html.parser')
quote = float(soup.find(id='ltpid').get_text())

print(quote)

Edit

Here is a Python 2 version that uses only urllib2 and re:

import re
import urllib2

html = urllib2.urlopen('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').read()

quote = float(re.search('<span id="ltpid"[^>]*>([^<]*)', html).group(1))

print quote
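
For readers on Python 3, where urllib2 no longer exists and its urlopen lives in urllib.request, the same regex approach might look roughly like the sketch below. The helper names extract_quote and fetch_quote are made up for illustration, and decoding the page as UTF-8 is an assumption about the site's encoding. The offline check at the end uses markup based on the snippet quoted in the question, so it runs without network access.

```python
import re
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen

# Same pattern as the Python 2 version: capture the text inside the span with id="ltpid"
LTP_PATTERN = re.compile(r'<span id="ltpid"[^>]*>([^<]*)')


def extract_quote(html):
    """Return the quoted price as a float, or None if the span is absent (hypothetical helper)."""
    match = LTP_PATTERN.search(html)
    return float(match.group(1)) if match else None


def fetch_quote(url):
    """Download the page and extract the quote; needs network access (hypothetical helper)."""
    with urlopen(url) as response:
        # read() returns bytes in Python 3, so decode before applying a str pattern.
        # UTF-8 is an assumption about the page encoding.
        html = response.read().decode('utf-8', errors='replace')
    return extract_quote(html)


# Offline check against markup based on the question's snippet
sample = '<span id="ltpid" class="bold" style="color: rgb(0, 0, 0);">6.66</span> &nbsp;'
print(extract_quote(sample))  # prints 6.66
```

Returning None when the span is missing avoids the AttributeError that calling .group(1) on a failed search would raise.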

Answer 2

BeautifulSoup is good for HTML parsing.

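from bs4 import BeautifulSoup

# Use your urllib code to get the source code of the page.
# A span from the question stands in here so the example runs on its own.
source = '<span id="ltpid" class="bold">6.66</span>'
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(source, 'html.parser')
# This assumes the id 'ltpid' always marks the value you are looking for
span = soup.find('span', id="ltpid")
print(float(span.text))  # prints 6.66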

Answer 3

Use BeautifulSoup rather than regular expressions to parse HTML.
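
One way to see why a parser is sturdier than a regex: the 'html.parser' backend that Beautiful Soup can use is built on the standard library's html.parser module, and a small sketch with that module alone shows the difference. The class name IdTextExtractor and the reordered markup are invented for illustration; the regex is the one from Answer 1. This simplified extractor stops at the first closing tag, so it does not handle nested elements the way Beautiful Soup would.

```python
import re
from html.parser import HTMLParser


class IdTextExtractor(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.capturing = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, so attribute order and
        # quoting style in the source markup do not matter here
        if self.text is None and dict(attrs).get('id') == self.target_id:
            self.capturing = True
            self.text = ''

    def handle_data(self, data):
        if self.capturing:
            self.text += data

    def handle_endtag(self, tag):
        # Simplification: stop at the first closing tag (no nesting support)
        self.capturing = False


# Hypothetical variant of the question's markup: class precedes id, single quotes
reordered = "<span class='bold' id='ltpid'>6.66</span>"

# The regex expects id="ltpid" as the first attribute, so it finds nothing
regex_match = re.search('<span id="ltpid"[^>]*>([^<]*)', reordered)
print(regex_match)  # prints None

parser = IdTextExtractor('ltpid')
parser.feed(reordered)
print(float(parser.text))  # prints 6.66
```

The regex depends on the exact byte layout of the tag, while the parser works from the parsed attributes, so harmless changes to the page's markup are less likely to break it.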


Tags: html, python, urllib2