Python Scraping: Trying to scrape specific data (phone details) according to user input
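I am scraping www.gsmarena.com and want to extract specific data based on user input. This code returns all phone models and names, but I only want the details (RAM, ROM, CPU and colours) of the Samsung phones that match a given input. Please help me.
Thanks in advance.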
import requests
from bs4 import BeautifulSoup


def link_scan(link_url):
    # Walk the brand menu on the home page and scan every manufacturer page.
    c = 1
    source_code = requests.get(link_url)
    plain_text = source_code.text
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(plain_text, 'html.parser')
    for link in soup.find_all('div', {'class': 'brandmenu-v2 light l-box clearfix'}):
        for li in link.find_all('li'):
            for anc in li.find_all('a'):
                anc_src = r'http://www.gsmarena.com/' + anc.get('href')
                anc_name = anc.string
                print(c, anc_name, "\n", anc_src, "\n")
                c += 1
                inside_scan(anc_name, anc_src)


def inside_scan(name, hrefs):
    # Print every model name listed on one manufacturer page.
    i = 1
    source_code = requests.get(hrefs)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    for link in soup.find_all('div', {'class': 'makers'}):
        for li in link.find_all('li'):
            for anc in li.find_all('a'):
                for nam in (sp.find('span') for sp in anc.find_all('strong')):
                    modal_name = nam.string
                    print("\t", i, "\t", name, modal_name)
                    i += 1


link_scan(r'http://www.gsmarena.com/')
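I suggest you spend some time playing with the URLs. In your case the user will probably ask for a specific phone manufacturer, and the target URL will look like this: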
https://www.gsmarena.com/samsung-phones-9.php
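As a rough sketch of that first step, the user's input could be matched against the brand names your link_scan already reads from the brand menu. The helper name brand_url and the brand_links dictionary are made up for illustration; the only entry in it is the Samsung link shown above.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.gsmarena.com/"


def brand_url(brand_links, user_input):
    """Return the listing URL for the manufacturer the user typed, or None."""
    wanted = user_input.strip().lower()
    for name, href in brand_links.items():
        # Compare case-insensitively so "samsung" and "Samsung" both match.
        if name.strip().lower() == wanted:
            return urljoin(BASE_URL, href)
    return None


# Hypothetical mapping of brand name -> href; in the real script it would be
# filled from the anchors that link_scan finds in the brand menu.
brand_links = {"Samsung": "samsung-phones-9.php"}

print(brand_url(brand_links, "samsung"))
```

With that in place you would call inside_scan only for the returned URL instead of for every brand.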
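You are also lucky, because you can get the details of a given cell phone without following the link to its own page. In your case every cell phone on the listing page is an anchor tag whose href looks like this: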
<a href="samsung_galaxy_m31s-10333.php">
This means you can pick out the links that start with "samsung" in order to filter the results according to what the user asked for:
https://www.gsmarena.com/samsung
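Here is a minimal sketch of that prefix filter. To keep it runnable without extra packages it uses the standard library's html.parser rather than BeautifulSoup; with bs4 the same startswith check would go on anc.get('href') in your inner loop. HrefCollector and links_for_brand are hypothetical names, and the second entry in the sample HTML is invented for the demonstration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_for_brand(page_html, brand):
    """Return the hrefs that start with the brand name the user typed."""
    collector = HrefCollector()
    collector.feed(page_html)
    # Assumes model links are written as "<brand>_<model>-<id>.php".
    prefix = brand.strip().lower().replace(" ", "_") + "_"
    return [h for h in collector.hrefs if h.lower().startswith(prefix)]


sample_html = (
    '<div class="makers"><ul>'
    '<li><a href="samsung_galaxy_m31s-10333.php">Galaxy M31s</a></li>'
    '<li><a href="otherbrand_model_x-1.php">Model X</a></li>'  # made-up entry
    '</ul></div>'
)

print(links_for_brand(sample_html, "Samsung"))
```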
To get the CPU, RAM, etc., look inside the anchor tag: the title attribute of its img child holds a short summary of the specs:
<a href="samsung_galaxy_m31s-10333.php"><img src="https://fdn2.gsmarena.com/vv/bigpic/samsung-galaxy-m31s.jpg" title="Samsung Galaxy M31s Android smartphone. Announced Jul 2020. Features 6.5″ Super AMOLED display, Exynos 9611 chipset, 6000 mAh battery, 128 GB storage, 8 GB RAM, Corning Gorilla Glass 3."><strong><span>Galaxy M31s</span></strong></a>
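Once you have that title string, a few regular expressions may be enough to pull the fields out. This is only a sketch that assumes other phones word their title the same way as the sample above ("... chipset", "... mAh battery", "... storage", "... RAM"); parse_title is a made-up helper name. The sample title does not mention colours, so those would probably have to come from the phone's own page.

```python
import re


def parse_title(title):
    """Pull a few spec fields out of the img title; missing fields are None."""
    patterns = {
        "chipset": r"([^,.]+?) chipset",
        "battery": r"(\d+ mAh) battery",
        "storage": r"(\d+ (?:GB|MB|TB)) storage",
        "ram": r"(\d+(?:\.\d+)? (?:GB|MB)) RAM",
    }
    specs = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, title)
        specs[field] = match.group(1).strip() if match else None
    return specs


# Title text copied from the anchor tag shown above.
title = (
    "Samsung Galaxy M31s Android smartphone. Announced Jul 2020. "
    "Features 6.5″ Super AMOLED display, Exynos 9611 chipset, "
    "6000 mAh battery, 128 GB storage, 8 GB RAM, Corning Gorilla Glass 3."
)

print(parse_title(title))
```

With BeautifulSoup you would get the string from anc.find('img').get('title') inside your existing loop and pass it to the helper.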