Python script to get temperature from Google search

Question

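I am writing a Python script that gets the temperature from Google by searching for the keyword "temperature". Using Inspect Element, I found that the temperature value is stored in the span with id="wob_tm":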

<div>
<div class="vk_bk sol-tmp" style="float:left;margin-top:-3px;font-size:64px"><span id="wob_tm" class="wob_t" style="display:inline">
  18
</span><span id="wob_ttm" class="wob_t" style="display:none"> … </span>
</div>

As you can see, the temperature 18 is inside the span with id="wob_tm". So my Python script is:

from bs4 import BeautifulSoup
import requests,sys,webbrowser

str="temperature"
res = requests.get('http://google.com/search?q=%s'%str)
res.raise_for_status()
examplesoup= BeautifulSoup(res.text,"lxml")    
linkelems=examplesoup.findAll("span",{"id":"wob_tm"})
print linkelems.string.strip()

It gives me this error: AttributeError: 'NoneType' object has no attribute 'string'. How can I fix it? I take this to mean that linkelems has no elements.

Answer 1

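The 0 you are printing is the length of the span tag's contents, not the contents themselves. The string attribute will give you the contents of the span tag: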

from bs4 import BeautifulSoup
s = """<div>
<div class="vk_bk sol-tmp" style="float:left;margin-top:-3px;font-size:64px">
<span id="wob_tm" class="wob_t" style="display:inline">
18
</span><span id="wob_ttm" class="wob_t" style="display:none"> … </span>
</div>"""
soup = BeautifulSoup(s, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
temperature = soup.find("span", id="wob_tm")
print(temperature.string.strip())
# 18
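
The error from the question can be reproduced offline. The following is a minimal sketch, assuming bs4 is installed and using the built-in html.parser: find() returns None when nothing matches, while find_all() always returns a list-like ResultSet, so the text has to be read from a single element after checking that it exists. The helper name span_text and the two HTML strings are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative inputs: one page with the target span, one without it
WITH_SPAN = '<div><span id="wob_tm" class="wob_t">\n  18\n</span></div>'
WITHOUT_SPAN = '<div><span class="wob_t">18</span></div>'


def span_text(html, span_id="wob_tm"):
    """Return the stripped text of the span with the given id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("span", id=span_id)  # a Tag, or None if nothing matches
    if tag is None:
        return None
    return tag.get_text(strip=True)


print(span_text(WITH_SPAN))     # 18
print(span_text(WITHOUT_SPAN))  # None

# find_all() returns a ResultSet (a list subclass), never None
matches = BeautifulSoup(WITHOUT_SPAN, "html.parser").find_all("span", id="wob_tm")
print(len(matches))             # 0
```

Checking for None before reading the text turns the AttributeError into a value the caller can handle, which helps when the page served to the script differs from the one seen in the browser.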

Answer 2

I ran this code (using Python 3 and bs4) and got the string inside the span tag.

from bs4 import BeautifulSoup
html_snippet = """<div>
<div class="vk_bk sol-tmp" style="float:left;margin-top:-3px;font-size:64px"><span id="wob_tm" class="wob_t" style="display:inline">18</span><span id="wob_ttm" class="wob_t" style="display:none"> ... </span></div>"""

soup = BeautifulSoup(html_snippet, "html.parser")  # explicit parser avoids the bs4 warning
temp = soup.find("span", id='wob_tm')

print(temp.string)

Answer 3

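Based on some experimentation, Google seems to send slightly different results depending on which browser it thinks you are using. For example, when I use Firefox I see the span with id 'wob_tm', but when your code runs with the default settings it is not there. (I did get a span with class wob_t that contained the temperature, but I also got 10 other wob_t spans.) Try setting the User-Agent to that of a popular browser, like this: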

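from bs4 import BeautifulSoup
import requests

query = "temperature"  # renamed from str to avoid shadowing the built-in

# Send a browser-like User-Agent instead of the python-requests default
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'
}

res = requests.get('http://www.google.com/search?q=%s' % query, headers=headers)
res.raise_for_status()
examplesoup=BeautifulSoup(res.text,'lxml')
linkelems=examplesoup.findAll('span', {'id': 'wob_tm'}) # This now has an element in it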
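
To see what the headers argument changes without contacting Google, a request can be prepared but not sent. This is a rough sketch, assuming the requests library is installed; the helper name prepared_user_agent is hypothetical and the browser string is only modelled on the one in this answer.

```python
import requests

# Browser-like User-Agent, modelled on the Firefox string used in the answer
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) "
    "Gecko/20100101 Firefox/40.1"
)


def prepared_user_agent(headers=None):
    """Build (but do not send) a search request and return its User-Agent."""
    session = requests.Session()
    request = requests.Request(
        "GET",
        "https://www.google.com/search",
        params={"q": "temperature"},
        headers=headers,
    )
    # prepare_request merges the session's default headers with ours
    prepared = session.prepare_request(request)
    return prepared.headers["User-Agent"]


default_ua = prepared_user_agent()
custom_ua = prepared_user_agent({"User-Agent": BROWSER_UA})
print(default_ua)  # starts with "python-requests/"
print(custom_ua)   # the browser-like string above
```

No network traffic is involved, so this only shows which header would be sent; whether Google then returns the wob_tm span still has to be checked against a live response.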

Answer 4

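Make sure you send a User-Agent header so that Google does not see your request as coming from python-requests, which is the default requests User-Agent. If you only need to extract the temperature, you can use the bs4 .select_one() method.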

>>> soup.select_one('#wob_tm').text
'85°F'

Code and an example that extracts more in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "london weather",
  "hl": "en",
}

response = requests.get('https://www.google.com/search', headers=headers, params=params).text
soup = BeautifulSoup(response, 'lxml')

temperature = soup.select_one('#wob_tm').text
print(f'Temperature: {temperature}')

---
# Temperature: 73°F
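
The scraped value is text such as '73°F', or a bare number like the 18 in the question's markup. If a number is needed, it could be split into value and unit along these lines. This is only a sketch using the standard library; parse_temperature and to_celsius are hypothetical helpers, and the accepted formats are an assumption about what the page returns.

```python
import re

# Optional sign, digits, optional decimals, then an optional "°C" or "°F"
_TEMP_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(?:°\s*([CF]))?\s*$", re.IGNORECASE)


def parse_temperature(text):
    """Split text such as '85°F' into (value, unit); unit is None if absent."""
    match = _TEMP_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised temperature text: {text!r}")
    value = float(match.group(1))
    unit = match.group(2).upper() if match.group(2) else None
    return value, unit


def to_celsius(value, unit):
    """Convert a Fahrenheit reading to Celsius; Celsius passes through."""
    if unit == "F":
        return (value - 32) * 5 / 9
    if unit == "C":
        return value
    raise ValueError(f"unknown unit: {unit!r}")


print(parse_temperature("85°F"))   # (85.0, 'F')
print(parse_temperature("  18\n"))  # (18.0, None)
print(round(to_celsius(85.0, "F"), 1))  # 29.4
```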

Alternatively, you can use the Google Direct Answer Box API from SerpApi. It is a paid API with a free plan.

Code to integrate:

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "london weather",
  "api_key": os.getenv("API_KEY"),
  "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
unit = results['answer_box']['unit']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']

forecast = results['answer_box']['forecast']

print(f'{loc}\n{weather_date}\n{weather}\n{temp}\n{unit}\n{precipitation}\n{humidity}\n{wind}\n{forecast}')

---------
'''
London, UK
Wednesday 1:00 PM
Partly cloudy
73°F
0%
55%
7 mph

[{'day': 'Wednesday', 'weather': 'Partly cloudy', 'temperature': {'high': '74', 'low': '59'}, 'thumbnail': 'https://ssl.gstatic.com/onebox/weather/48/partly_cloudy.png'}..]
'''
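
Indexing results['answer_box'][...] directly raises KeyError when a key is missing, for example when the query does not trigger a weather box. A defensive variant might look like the sketch below. It works on a plain dict shaped like the output above and makes no API call; summarize_answer_box is a hypothetical helper and the sample values are illustrative.

```python
def summarize_answer_box(results):
    """Build a one-line summary from a SerpApi-style result dict."""
    box = results.get("answer_box") or {}
    temperature = box.get("temperature")
    unit = box.get("unit", "")
    parts = [
        box.get("location"),
        box.get("date"),
        box.get("weather"),
        f"{temperature}{unit}" if temperature is not None else None,
    ]
    # Skip anything that was missing instead of raising KeyError
    return " | ".join(part for part in parts if part)


# Illustrative data only, shaped like the answer's sample output
sample = {
    "answer_box": {
        "location": "London, UK",
        "date": "Wednesday 1:00 PM",
        "weather": "Partly cloudy",
        "temperature": "73",
        # "unit" is left out on purpose to show the fallback
    }
}
print(summarize_answer_box(sample))
print(repr(summarize_answer_box({})))
```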

Disclaimer, I work for SerpApi.

python

lxml

beautifulsoup

python-requests