BeautifulSoup not working, getting NoneType error
I am using the following code (taken from "retrieve links from web page using python and BeautifulSoup"):
import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer
http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print link['href']
However, I don't understand why I am getting the following error message:
Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable
I am using BeautifulSoup 3.2.0 and Python 2.7.
Edit:
I tried the solution offered for a similar question ("Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable"), but it gives me the following error:
Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module>
    for link in BeautifulSoup(response).find_all('a', href=True):
TypeError: 'NoneType' object is not callable
First of all:
from BeautifulSoup import BeautifulSoup, SoupStrainer
You are using BeautifulSoup version 3, which is no longer maintained. Switch to BeautifulSoup version 4. Install it via:
pip install beautifulsoup4
and change your import to:
from bs4 import BeautifulSoup, SoupStrainer
Also:
Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable
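Here link is a Tag instance, and in BeautifulSoup 3 a Tag does not have a has_attr method (version 3 calls it has_key; has_attr is the BeautifulSoup 4 name). Remembering what dot notation means in BeautifulSoup, this means it tries to search for a child element named has_attr inside the link element and finds nothing. In other words, link.has_attr is None, and obviously None('href') results in an error. The same thing happens with find_all in your edit: BeautifulSoup 3 only has findAll, so BeautifulSoup(response).find_all is also None.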
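To make the mechanism concrete, here is a minimal sketch of how an attribute fallback of this kind turns a misspelled or missing method into None. ToyTag is a hypothetical stand-in written for illustration only; it is not BeautifulSoup's actual class, and the real implementation differs in detail.

```python
class ToyTag:
    """Hypothetical stand-in for a parsed tag; not BeautifulSoup's real class."""

    def __init__(self, name, attrs=None, children=None):
        self.name = name
        self.attrs = attrs or {}
        self.children = children or []

    def find(self, name):
        # Return the first direct child with the given tag name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails, so an
        # unknown method name ends up being treated as a child-tag search.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


link = ToyTag("a", attrs={"href": "/world"}, children=[ToyTag("b")])

print(link.b.name)    # a real child tag is found
print(link.has_attr)  # no child named "has_attr", so the lookup gives None

try:
    link.has_attr("href")  # equivalent to None("href")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The point of the sketch is that no AttributeError is ever raised for the unknown name, which is why the failure only shows up one step later as a TypeError at the call.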
Instead, do:
soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True))
for link in soup.find_all("a", href=True):
    print(link['href'])
FYI, here is the complete working code I used to debug your problem (using requests):
import requests
from bs4 import BeautifulSoup, SoupStrainer
response = requests.get('http://www.nytimes.com').content
# parse_only is the BeautifulSoup 4 name for the old parseOnlyThese argument
for link in BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)).find_all("a", href=True):
    print(link['href'])
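If you want to check the approach without hitting the network, something like the following should work with BeautifulSoup 4. This is only a sketch: the html string is made-up sample markup, and passing "html.parser" explicitly is my own choice to avoid depending on an extra parser such as lxml.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample markup standing in for a downloaded page.
html = """
<html><body>
  <a href="/world">World</a>
  <a name="top">anchor without href</a>
  <p>Not a link</p>
  <a href="https://example.com/politics">Politics</a>
</body></html>
"""

# Only <a> tags that carry an href attribute are parsed at all.
only_links = SoupStrainer("a", href=True)
soup = BeautifulSoup(html, "html.parser", parse_only=only_links)

# Collect the href values of the links that survived the filter.
hrefs = [link["href"] for link in soup.find_all("a", href=True)]
print(hrefs)
```

The anchor without an href and the paragraph are skipped, so only the two real links end up in hrefs.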