Crawling raises a 'UserWarning'. What should I do?
I found this web crawler on Google. It worked fine a month ago, but now it no longer works.
I don't know what changed.
What is wrong, and how can I fix it?
Code
from urllib.request import urlopen
from urllib.request import urlretrieve
from urllib.parse import quote_plus
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

search = input('검색어:')
url = f'https://www.google.com/search?q={quote_plus(search)}&source=inms&tbm=isch&sa=X&ved=2haUKEwid64aF87LoAhUafd4KHcEtBZEQ_AUoAXoECBgQAw&biw=1536&bih=754'

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)
for i in range(500):
    driver.execute_script("window.scrollBy(0,10000)")

html = driver.page_source
soup = BeautifulSoup(html)
img = soup.select('.rg_i.Q4LuWd.tx8vtf')
n = 1
imgurl = []

for i in img:
    try:
        imgurl.append(i.attrs['src'])
    except KeyError:
        imgurl.append(i.attrs["data-src"])

for i in imgurl:
    urlretrieve(i, "크롤링 예예/" + search + str(n) + ".jpg")
    n += 1
    print(imgurl)
    if (n == 15):
        break

driver.close()
Error message
[WDM] - Cache is valid for [03/07/2020]
[WDM] - Looking for [chromedriver 83.0.4103.39 win32] driver in cache
[WDM] - Driver found in cache [C:\Users\u\.wdm\drivers\chromedriver.0.4103.39\win32\chromedriver.exe]
DevTools listening on ws://127.0.0.1:57086/devtools/browser/fc2f441e-49f8-466c-aa17-7e29c3e27ac2
yt.py:17: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 17 of the file yt.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.
soup = BeautifulSoup(html)
Thank you for your help.
You need to change this
soup = BeautifulSoup(html)
to
soup = BeautifulSoup(html, 'lxml')
and the warning should disappear.
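As a minimal sketch of the fix above, the snippet below names the parser explicitly and then collects image URLs. The sample markup and the `img.rg_i` selector are made up for illustration and stand in for `driver.page_source` and the question's selector. It uses `"html.parser"`, which ships with Python, so that it runs without extra packages. `'lxml'` works the same way if the lxml package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for driver.page_source.
html = """
<div>
  <img class="rg_i" src="https://example.com/a.jpg">
  <img class="rg_i" data-src="https://example.com/b.jpg">
</div>
"""

# Passing the parser name as the second argument avoids the
# "No parser was explicitly specified" UserWarning.
soup = BeautifulSoup(html, "html.parser")

# Prefer src and fall back to data-src, like the try/except in the question.
imgurl = [img.get("src") or img.get("data-src") for img in soup.select("img.rg_i")]
print(imgurl)
```

Note that a UserWarning does not stop the script by itself. If no images are downloaded after this change, it may be worth checking separately whether the class names in the selector still match the page.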