How to scrape all App Store apps on a Google Play Search
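I'm trying to use find_all(), but I seem to be having trouble finding the tags for specific information.
I'd like to build a wrapper so I can extract data from the app store, such as title, publisher, etc. (public HTML information).
The code is not correct, I know. The closest thing to a div identifier I could find is "c4".
Any insight helps.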
# Imports
import requests
from bs4 import BeautifulSoup
# Data Defining
url = "https://play.google.com/store/search?q=weather%20app"
# Getting HTML
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
soup.get_text()
results = soup.find_all(id="c4")
I expect output listing the different weather apps and their information:
Weather App 1
Develop Company 1
Google Weather App
Develop Company 2
Bing Weather App
Bing Developers
I got the following output from the url:
from bs4 import BeautifulSoup
import requests
url='https://play.google.com/store/search?q=weather%20app'
req=requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')
cards= soup.find_all("div",class_="vU6FJ p63iDd")
for card in cards:
    app_name = card.find("div", class_="WsMG1c nnK0zc").text
    company = card.find("div", class_="KoLSrc").text
    print("Name: " + app_name)
    print("Company: " + company)
Output:
Name: Weather app
Company: Accurate Weather Forecast & Weather Radar Map
Name: AccuWeather: Weather Radar
Company: AccuWeather
Name: Weather Forecast - Accurate Local Weather & Widget
Company: Weather Forecast & Widget & Radar
Name: 1Weather Forecasts & Radar
Company: OneLouder Apps
Name: MyRadar Weather Radar
Company: ACME AtronOmatic LLC
Name: Weather data & microclimate : Weather Underground
Company: Weather Underground
Name: Weather & Widget - Weawow
Company: weawow weather app
Name: Weather forecast
Company: smart-pro android apps
Name: The Secret World of Weather: How to Read Signs in Every Cloud, Breeze, Hill, Street, Plant, Animal, and Dewdrop
Company: Tristan Gooley
Name: The Weather Machine: A Journey Inside the Forecast
Company: Andrew Blum
Name: The Mobile Mind Shift: Engineer Your Business to Win in the Mobile Moment
Company: Julie Ask
Name: Together: The Healing Power of Human Connection in a Sometimes Lonely World
Company: Vivek H. Murthy
Name: The Meadow
Company: James Galvin
Name: The Ancient Egyptian Culture Revealed, 2nd edition
Company: Moustafa Gadalla
Name: The Ancient Egyptian Culture Revealed, 2nd edition
Company: Moustafa Gadalla
Name: Chaos Theory
Company: Introbooks Team
Name: Survival Training: Killer Tips for Toughness and Secret Smart Survival Skills
Company: Wesley Jones
Name: Kiasunomics 2: Economic Insights for Everyday Life
Company: Ang Swee Hoon
Name: Summary of We Are The Weather by Jonathan Safran Foer
Company: QuickRead
Name: Learn Swift by Building Applications: Explore Swift programming through iOS app development
Company: Emil Atanasov
Name: Weather Hazard Warning Application in Car-to-X Communication: Concepts, Implementations, and Evaluations
Company: Attila Jaeger
Name: Mobile App Development with Ionic, Revised Edition: Cross-Platform Apps with Ionic, Angular, and Cordova
Company: Chris Griffith
Name: Good Application Makes a Good Roof Better: A Simplified Guide: Installing Laminated Asphalt Shingles for Maximum Life & Weather Protection
Company: ARMA Asphalt Roofing Manufacturers Association
Name: The Secret World of Weather: How to Read Signs in Every Cloud, Breeze, Hill, Street, Plant, Animal, and Dewdrop
Company: Tristan Gooley
Name: The Weather Machine: A Journey Inside the Forecast
Company: Andrew Blum
Name: Space Physics and Aeronomy, Space Weather Effects and Applications
Company: Book 5
Name: How to Build Android Apps with Kotlin: A hands-on guide to developing, testing, and publishing your first apps with Android
Company: Alex Forrester
Name: Android 6 for Programmers: An App-Driven Approach, Edition 3
Company: Paul J. Deitel
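Note: Relying on highly dynamic, generated identifiers (such as class names) is only partly reliable.
The strategy should therefore be based on more constant identifiers such as tags and their structure, or in some cases ids: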
data = []
# Match app links by their href prefix and require a nested div carrying a title attribute
for e in soup.select('a[href^="/store/apps/details?id"]:has(div[title])'):
    data.append({
        'title': e.select_one('div[title]').get('title'),
        'company': e.find_next('a').text,
        'url': 'https://play.google.com' + e.get('href')
    })
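To see why the href prefix and the nested div[title] are enough, here is a minimal sketch of the same idea using only the standard library's html.parser instead of Beautiful Soup. The sample HTML, the class names in it, and the AppLinkParser helper are made up for illustration; the real page markup will differ.

```python
from html.parser import HTMLParser

APP_HREF_PREFIX = "/store/apps/details?id"


class AppLinkParser(HTMLParser):
    """Collect app links by href prefix plus the title of a nested <div>."""

    def __init__(self):
        super().__init__()
        self.apps = []        # finished records
        self._current = None  # record for the <a> we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith(APP_HREF_PREFIX):
            self._current = {"url": "https://play.google.com" + href, "title": None}
        elif tag == "div" and self._current is not None and attrs.get("title"):
            if self._current["title"] is None:
                self._current["title"] = attrs["title"]

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Keep the link only if it contained a div with a title,
            # which mirrors the :has(div[title]) part of the selector
            if self._current["title"] is not None:
                self.apps.append(self._current)
            self._current = None


# Hypothetical markup: random-looking class names, stable href structure
SAMPLE_HTML = """
<div class="x1"><a href="/store/apps/details?id=com.example.weather"><div class="zz9" title="Example Weather"></div></a></div>
<div class="x2"><a href="/store/apps/details?id=com.example.icononly"><img src="icon.png"></a></div>
<div class="x3"><a href="/store/books/details?id=abc"><div title="Some Book"></div></a></div>
"""

parser = AppLinkParser()
parser.feed(SAMPLE_HTML)
parser.close()
apps = parser.apps
print(apps)
```

Only the first link survives: the second has no titled div and the third is a book, not an app. None of the class names were needed to make that decision.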
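Example
Also note that a real app search should use https://play.google.com/store/search?q=weather&c=apps, and to get all of the apps you have to deal with dynamically rendered/loaded content and scrolling. That is why this example is based on selenium: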
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://play.google.com/store/search?q=weather&c=apps'
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 10)

# Keep scrolling to the footer link until the scroll position stops changing
while True:
    last_height = driver.execute_script("return window.pageYOffset + window.innerHeight")
    e = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'a[href="https://policies.google.com/privacy"]')))[-1]
    driver.execute_script("arguments[0].scrollIntoView();", e)
    time.sleep(0.5)
    if last_height == driver.execute_script("return window.pageYOffset + window.innerHeight"):
        break

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

data = []
for e in soup.select('a[href^="/store/apps/details?id"]:has(div[title])'):
    data.append({
        'title': e.select_one('div[title]').get('title'),
        'company': e.find_next('a').text,
        'url': 'https://play.google.com' + e.get('href')
    })

# to_csv() returns None when given a path, so there is nothing useful to print
pd.DataFrame(data).to_csv('app.csv', index=False)
Output
...
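The scrolling loop boils down to "scroll, wait, stop when the position no longer changes". A rough sketch of that pattern, separated from the browser so it can be tried without Selenium, could look like this. The scroll_until_stable function, the max_rounds safety cap, and the FakePage class are my own illustrative names, not part of Selenium; with a real driver you would pass callables that wrap driver.execute_script.

```python
import time


def scroll_until_stable(get_position, scroll_down, pause=0.5, max_rounds=200):
    """Scroll until the reported position stops changing.

    Returns the number of rounds used. max_rounds is a safety cap
    (an assumption added here) so an endless feed cannot loop forever.
    """
    for rounds in range(1, max_rounds + 1):
        before = get_position()
        scroll_down()
        time.sleep(pause)  # give lazily loaded content time to appear
        if get_position() == before:
            return rounds
    return max_rounds


class FakePage:
    """Stand-in for a browser page, used only to exercise the loop."""

    def __init__(self, total_height, step):
        self.total_height = total_height
        self.step = step
        self.position = 0

    def get_position(self):
        return self.position

    def scroll_down(self):
        self.position = min(self.position + self.step, self.total_height)


page = FakePage(total_height=2500, step=1000)
rounds = scroll_until_stable(page.get_position, page.scroll_down, pause=0)
print(rounds, page.position)
```

The last round is the one where nothing moved, which is what tells the loop it has reached the bottom.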
Make sure you're using a user-agent so the request looks like a "real" user request. Without a user-agent in the request headers you can sometimes receive different HTML, with different elements and selectors, or some sort of error.
Check what your user-agent is and update it when possible, because if the user-agent is old, for example Chrome 70, the website may block the request.
Also, have a look at the SelectorGadget Chrome extension, which lets you grab CSS selectors visually by clicking on the desired element in the browser.
Code and full example in the online IDE:
from bs4 import BeautifulSoup
import requests, json, re  # the "lxml" parser used below requires lxml to be installed

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": "weather",  # search query
    "c": "apps"      # display list of apps
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36",
}

html = requests.get("https://play.google.com/store/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

apps_data = []

for app in soup.select(".mpg5gc"):
    title = app.select_one(".nnK0zc").text
    company = app.select_one(".b8cIId.KoLSrc").text
    description = app.select_one(".b8cIId.f5NCO a").text
    app_link = f'https://play.google.com{app.select_one(".b8cIId.Q9MA7b a")["href"]}'
    developer_link = f'https://play.google.com{app.select_one(".b8cIId.KoLSrc a")["href"]}'
    app_id = app.select_one(".b8cIId a")["href"].split("id=")[1]
    developer_id = app.select_one(".b8cIId.KoLSrc a")["href"].split("id=")[1]

    try:
        # https://regex101.com/r/SZLPRp/1
        rating = re.search(r"\d{1}\.\d{1}", app.select_one(".pf5lIe div[role=img]")["aria-label"]).group(0)
    except (AttributeError, KeyError, TypeError):
        # rating element, its aria-label, or the number itself is missing
        rating = None

    thumbnail = app.select_one(".yNWQ8e img")["data-src"]

    apps_data.append({
        "title": title,
        "company": company,
        "description": description,
        "rating": float(rating) if rating else rating,  # float when a rating was found, otherwise None
        "app_link": app_link,
        "developer_link": developer_link,
        "app_id": app_id,
        "developer_id": developer_id,
        "thumbnail": thumbnail
    })

print(json.dumps(apps_data, indent=2, ensure_ascii=False))
Partial output:
[
{
"title": "Weather app",
"company": "Accurate Weather Forecast & Weather Radar Map",
"description": "The weather channel, tiempo weather forecast, weather radar & weather map",
"rating": 4.6,
"app_link": "https://play.google.com/store/apps/details?id=com.weather.forecast.weatherchannel",
"developer_link": "https://play.google.com/store/apps/developer?id=Accurate+Weather+Forecast+%26+Weather+Radar+Map",
"app_id": "com.weather.forecast.weatherchannel",
"developer_id": "Accurate+Weather+Forecast+%26+Weather+Radar+Map",
"thumbnail": "https://play-lh.googleusercontent.com/GdXjVGXQ90eVNpb1VoXWGT3pff2M9oe3yDdYGIsde7W9h3s2S6FDLfo1uO-gljBZ1QXO=s128-rw"
},
{
"title": "The Weather Channel - Radar",
"company": "The Weather Channel",
"description": "Weather Forecast & Snow Radar: local rain tracker, weather maps & alerts",
"rating": 4.6,
"app_link": "https://play.google.com/store/apps/details?id=com.weather.Weather",
"developer_link": "https://play.google.com/store/apps/dev?id=5938833519207566184",
"app_id": "com.weather.Weather",
"developer_id": "5938833519207566184",
"thumbnail": "https://play-lh.googleusercontent.com/RV3DftXlA7WUV7w-BpE8zM0X7Y4RQd2vBvZVv6A01DEGb_eXFRjLmUhSqdbqrEl9klI=s128-rw"
},
{
"title": "Weather - By Xiaomi",
"company": "Xiaomi Inc.",
"description": "Always with you, rain or shine. Get temperature, forecast, AQI for you city.",
"rating": 4.4,
"app_link": "https://play.google.com/store/apps/details?id=com.miui.weather2",
"developer_link": "https://play.google.com/store/apps/dev?id=5113340212256272297",
"app_id": "com.miui.weather2",
"developer_id": "5113340212256272297",
"thumbnail": "https://play-lh.googleusercontent.com/sAZ2AZ16r5ThHiYCTWg8x1UUNQOhsxexRaDrDZKDlUy1hoZlggen6QogpJmQk8BwmgI=s128-rw"
}, ... other results
]
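One thing to be aware of: split("id=")[1] keeps anything that follows the id in the href and leaves the value URL-encoded, as the first developer_id in the output shows. If that matters, a small sketch with urllib.parse is one alternative. The parse_store_href and parse_rating helpers are illustrative names of mine, the "&hl=en" suffix is a made-up example of a trailing parameter, and the aria-label wording is an assumption about what the page returns.

```python
import re
from urllib.parse import parse_qs, urljoin, urlsplit

BASE = "https://play.google.com"


def parse_store_href(href):
    """Return (absolute_url, decoded id or None) for a relative store link."""
    absolute = urljoin(BASE, href)
    ids = parse_qs(urlsplit(absolute).query).get("id", [])
    return absolute, (ids[0] if ids else None)


def parse_rating(aria_label):
    """Pull the first number out of a rating label, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", aria_label or "")
    return float(match.group(0)) if match else None


app_url, app_id = parse_store_href("/store/apps/details?id=com.weather.Weather&hl=en")
_, developer_id = parse_store_href(
    "/store/apps/developer?id=Accurate+Weather+Forecast+%26+Weather+Radar+Map"
)
rating = parse_rating("Rated 4.6 stars out of five stars")  # assumed label wording

print(app_id)
print(developer_id)
print(rating)
```

Note that parse_qs decodes the value, so the developer id comes back as readable text rather than the encoded form used in the URL. Keep the encoded form if you plan to build links from it again.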
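Another solution is to use the Google Play Store API from SerpApi. It's a paid API with a free plan.
The difference is that there's no need to create a parser from scratch, maintain it, figure out how to extract the data, or bypass blocks from Google or other search engines.
Code to integrate: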
from serpapi import GoogleSearch
import json

params = {
    "api_key": "API KEY",     # your serpapi api key
    "engine": "google_play",  # search engine
    "hl": "en",               # language
    "store": "apps",          # apps search
    "gl": "us",               # country to search from; results differ by country
    "q": "weather"            # search query
}

search = GoogleSearch(params)  # where data extraction happens
results = search.get_dict()    # JSON -> Python dictionary

apps_data = []

for apps in results["organic_results"]:
    for app in apps["items"]:
        apps_data.append({
            "title": app.get("title"),
            "link": app.get("link"),
            "description": app.get("description"),
            "product_id": app.get("product_id"),
            "rating": app.get("rating"),
            "thumbnail": app.get("thumbnail"),
        })

print(json.dumps(apps_data, indent=2, ensure_ascii=False))
Partial output (contains other data you can see in the Playground):
[
{
"title": "Weather app",
"link": "https://play.google.com/store/apps/details?id=com.weather.forecast.weatherchannel",
"description": "The weather channel, tiempo weather forecast, weather radar & weather map",
"product_id": "com.weather.forecast.weatherchannel",
"rating": 4.7,
"thumbnail": "https://play-lh.googleusercontent.com/GdXjVGXQ90eVNpb1VoXWGT3pff2M9oe3yDdYGIsde7W9h3s2S6FDLfo1uO-gljBZ1QXO=s128-rw"
},
{
"title": "The Weather Channel - Radar",
"link": "https://play.google.com/store/apps/details?id=com.weather.Weather",
"description": "Weather Forecast & Snow Radar: local rain tracker, weather maps & alerts",
"product_id": "com.weather.Weather",
"rating": 4.6,
"thumbnail": "https://play-lh.googleusercontent.com/RV3DftXlA7WUV7w-BpE8zM0X7Y4RQd2vBvZVv6A01DEGb_eXFRjLmUhSqdbqrEl9klI=s128-rw"
},
{
"title": "AccuWeather: Weather Radar",
"link": "https://play.google.com/store/apps/details?id=com.accuweather.android",
"description": "Your local weather forecast, storm tracker, radar maps & live weather news",
"product_id": "com.accuweather.android",
"rating": 4.0,
"thumbnail": "https://play-lh.googleusercontent.com/EgDT3XrIaJbhZjINCWsiqjzonzqve7LgAbim8kHXWgg6fZnQebqIWjE6UcGahJ6yugU=s128-rw"
},
{
"title": "Weather by WeatherBug",
"link": "https://play.google.com/store/apps/details?id=com.aws.android",
"description": "The Most Accurate Weather Forecast. Alerts, Radar, Maps & News from WeatherBug",
"product_id": "com.aws.android",
"rating": 4.7,
"thumbnail": "https://play-lh.googleusercontent.com/_rZCkobaGZzXN3iquPr4u2KOe7C-ljnrSkBfw6sVL1kpUfq3sBl5MoRJEisBSnxaD-M=s128-rw"
}, ... other results
]
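The nested loop above assumes every entry in "organic_results" has an "items" list. A slightly more defensive sketch of the same flattening, run against a hand-written sample dictionary instead of a live response, might look like this. The flatten_items name and the sample data are illustrative only, and the response shape is inferred from the loop above rather than from the API reference.

```python
def flatten_items(results, fields=("title", "link", "product_id", "rating")):
    """Flatten organic_results -> items into one list of small dicts.

    Missing sections, missing "items" lists, and missing fields are
    tolerated and show up as skipped entries or None values.
    """
    apps = []
    for section in results.get("organic_results", []):
        for item in section.get("items", []):
            apps.append({field: item.get(field) for field in fields})
    return apps


# Hand-written sample shaped like the loop above expects (not a real response)
sample_results = {
    "organic_results": [
        {
            "items": [
                {
                    "title": "The Weather Channel - Radar",
                    "link": "https://play.google.com/store/apps/details?id=com.weather.Weather",
                    "product_id": "com.weather.Weather",
                    "rating": 4.6,
                },
                {"title": "Weather by WeatherBug", "product_id": "com.aws.android"},
            ]
        },
        {"title": "A section without items"},
    ]
}

flat = flatten_items(sample_results)
for app in flat:
    print(app["product_id"], app["rating"])
```

Using .get() with defaults means a section without items is skipped instead of raising KeyError.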
I also have a dedicated blog post, Scrape Google Play Search Apps in Python, with a step-by-step explanation that would be too much for this answer.
Disclaimer: I work for SerpApi.