Read a URL on a Raspberry Pi

Question

I want to read data from a URL. For example, if I have this URL:

http://robolab.in/home-automation.html#ON

I want to read the status 'ON' and ignore the rest of the URL. How can I do this?
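The `#ON` at the end of the example URL is a fragment identifier. A browser keeps the fragment on the client side and does not send it to the server in the HTTP request. So if the goal is only to split `ON` off the URL string itself, no page download is needed: the standard library's URL parser returns it directly. A minimal Python 3 sketch, where `url_fragment` is a hypothetical helper name (on Python 2.7 the same `urlsplit` function lives in the `urlparse` module):

```python
from urllib.parse import urlsplit


def url_fragment(url):
    """Return the part of `url` after '#', or '' if there is no fragment."""
    # urlsplit() breaks the URL into scheme, netloc, path, query and fragment
    return urlsplit(url).fragment


print(url_fragment("http://robolab.in/home-automation.html#ON"))  # prints ON
```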

Answer 1

What you are trying to do is called web scraping. In Python you can do it with the urllib/urllib2 libraries.

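import urllib

try:
    # Download the page; the server sends back its HTML
    html = urllib.urlopen('http://robolab.in/home-automation.html#ON')
    htmltext = html.read()
except IOError:
    # urllib.urlopen raises IOError when the link cannot be opened
    print 'error opening link'
else:
    print htmltext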
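`urllib.urlopen` exists only in Python 2; in Python 3 the function moved to `urllib.request.urlopen` and returns bytes that must be decoded. A rough Python 3 port of the fetch step, assuming UTF-8 when the server declares no charset (`fetch_html` is a hypothetical name, not part of any library):

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the URL cannot be read."""
    try:
        with urlopen(url, timeout=timeout) as response:
            # Use the charset the server declares, falling back to UTF-8
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except OSError as exc:  # URLError, HTTPError and timeouts are all OSErrors
        print("error opening link:", exc)
        return None
```

Call it as `fetch_html("http://robolab.in/home-automation.html")`. `urllib.request` strips the `#ON` fragment before sending the request, so including it or leaving it off fetches the same page.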

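This prints the HTML source of the page, i.e. what your browser renders for you. It is just a string, so you can manipulate it however you like. If you have BeautifulSoup installed, though, you can strip the markup and keep only the text: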

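from bs4 import BeautifulSoup

# Name the parser explicitly; bs4 warns when none is given
soup = BeautifulSoup(htmltext, 'html.parser')
# Remove <script> and <style> elements so only the visible text is left
for script in soup(["script", "style"]):
    script.extract()
text = soup.get_text()
print text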
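If BeautifulSoup is not installed, roughly the same visible-text extraction can be done with the standard library's `html.parser`. This is a swapped-in technique, not what the answer uses. A Python 3 sketch; `VisibleTextParser` and `visible_text` are hypothetical names. Like the BeautifulSoup loop, it skips only `<script>` and `<style>` content, and it puts each non-empty text run on its own line:

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside <script> or <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script>/<style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the visible text of `html`, one non-empty text run per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(chunk.strip() for chunk in parser.chunks if chunk.strip())
```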

Running this code against your URL, I got this:

Robolab Technologies
Home Automation

OFF

From there you can easily pick out the status:

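# Keep only the non-empty lines of the page text
# (enumerate(text) would walk over single characters, not lines)
lines = [line.strip() for line in text.splitlines() if line.strip()]
# Assume, as in the output above, that the status is the last of those lines
status = lines[-1] if lines else ''
if status == 'ON':
    print "it's on"
else:
    print "it's off"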
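The code above picks the status by its position in the page text, which breaks if the page layout changes. A slightly more defensive Python 3 sketch looks for the first line that reads exactly `ON` or `OFF`, assuming the page prints the status on a line of its own as in the output shown earlier (`find_status` is a hypothetical name):

```python
def find_status(text, states=("ON", "OFF")):
    """Return the first line of `text` that is exactly one of `states`, else None."""
    for line in text.splitlines():
        line = line.strip()
        if line in states:
            return line
    return None


page_text = "Robolab Technologies\nHome Automation\n\nOFF\n"
status = find_status(page_text)
if status is None:
    print("status not found")
elif status == "ON":
    print("it's on")
else:
    print("it's off")  # taken for page_text above
```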


python

webserver

python-2.7

web

raspberry-pi