Web Crawler - TypeError: coercing to Unicode: need string or buffer, NoneType found

I'm new to Python. For practice, I've written my own web crawler that is supposed to scrape Yelp.

I keep getting this error and can't seem to get past the first page:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 26, in yelpSpider
TypeError: coercing to Unicode: need string or buffer, NoneType found

Here is my code:

import requests
from BeautifulSoup import BeautifulSoup
def yelpSpider(maxPages):
    page = 0
    listURL = []
    listRATE = []
    listAREA = []
    listADDRESS = []
    listType = []
    while page <= maxPages:
        url = 'https://www.yelp.com/search?find_desc=Restaurants&find_loc=Manhattan,+NY&start=' + str(page)
        sourceCode = requests.get(url)
        plainText = sourceCode.text
        soup = BeautifulSoup(plainText)
        for bizName in soup.findAll('a',{'class':'biz-name js-analytics-click'}):
            href = 'https://www.yelp.com' + bizName.get('href')
            listURL.append(href)
        for rating in soup.findAll('img',{'class':'offscreen'}):
            stars = rating.get('alt')
            listRATE.append(stars)
        for area in soup.findAll('span',{'class':'neighborhood-str-list'}):
            listAREA.append(area.string)
        for type in soup.findAll('span',{'class':'category-str-list'}):
            listType.append(type)
        for tracker in range(int(page),int(page) + 10):
            print(listURL[tracker])
            print(' ')
            print(listAREA[tracker] + ' | ' + listRATE[tracker])
        page += 10

yelpSpider(20)

Thanks for your help!

The problem is in this line: print(listAREA[tracker] + ' | ' + listRATE[tracker])

It happens when your listRATE ends up looking like this:
['4.5 star rating',
 '4.5 star rating',
 '4.5 star rating',
 '4.0 star rating',
 '4.0 star rating',
 '4.0 star rating',
 '4.0 star rating',
 '5.0 star rating',
 '4.5 star rating',
 '4.0 star rating',
 None,
 None,
 '4.0 star rating',
 '4.5 star rating',
 '4.0 star rating',
 '3.0 star rating',
 '4.0 star rating',
 '3.5 star rating',
 '4.5 star rating',
 '4.5 star rating',
 '5.0 star rating',
 '4.0 star rating',
 None,
 None]

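As you can see, the item at index 10 (tracker = 10, the first index on the second page) is None, and None can't be concatenated with a string. The message in your traceback is the one Python 2 gives when None is added to a unicode string, which is what BeautifulSoup returns for values such as area.string.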
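A minimal reproduction, independent of Yelp, may make this clearer. The helper names and sample values below are made up for illustration. On Python 3 the error message is worded differently from the Python 2 message in the traceback, but the exception type, TypeError, is the same.

```python
def join_area_and_rating(area, rating):
    # Same concatenation as the question's print() call.
    return area + ' | ' + rating


def raises_type_error(area, rating):
    """Return True if concatenating these two values raises TypeError."""
    try:
        join_area_and_rating(area, rating)
    except TypeError:
        return True
    return False


# Hypothetical sample values, not real Yelp data.
print(raises_type_error('Midtown West', '4.0 star rating'))  # False
print(raises_type_error('Midtown West', None))               # True
```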

So you have a few options. One is to use the or operator to substitute '' for None. Your code becomes:

print((listAREA[tracker] or '') + ' | ' + (listRATE[tracker] or ''))
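To see what the or version prints, here is a small sketch with hypothetical sample lists standing in for listAREA and listRATE. The expression x or '' evaluates to x when x is a non-empty string and to '' when x is None (or any other falsy value), so the + operator only ever sees strings. The helper name format_row is an assumption for illustration.

```python
def format_row(area, rating):
    # `value or ''` yields '' whenever value is None, so + always gets strings.
    return (area or '') + ' | ' + (rating or '')


# Hypothetical stand-ins for listAREA and listRATE.
sample_areas = ['Midtown West', None, 'SoHo']
sample_rates = ['4.5 star rating', None, None]

rows = [format_row(a, r) for a, r in zip(sample_areas, sample_rates)]
print(rows)  # ['Midtown West | 4.5 star rating', ' | ', 'SoHo | ']
```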

Another option is to clean up listRATE before printing:

listRATE = list(map(lambda text: text if text is not None else 'N/A', listRATE))

After running the line above, your list becomes:

['4.5 star rating',
 '4.5 star rating',
 '4.5 star rating',
 '4.0 star rating',
 '4.0 star rating',
 '4.0 star rating',
 '4.0 star rating',
 '5.0 star rating',
 '4.5 star rating',
 '4.0 star rating',
 'N/A',
 'N/A',
 '4.0 star rating',
 '4.5 star rating',
 '4.0 star rating',
 '3.0 star rating',
 '4.0 star rating',
 '3.5 star rating',
 '4.5 star rating',
 '4.5 star rating',
 '5.0 star rating',
 '4.0 star rating',
 'N/A',
 'N/A']
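The map/lambda line can also be written as a list comprehension; it is the same replacement, just spelled differently. The sketch below applies it to both listRATE and listAREA (area.string can also return None, depending on the tag's contents) and loops with zip instead of indexing with tracker, so a page that yields fewer than 10 results doesn't raise an IndexError. Note that zip stops at the shortest list: it avoids the IndexError but does not realign entries that are out of step across lists. The helper fill_missing and the sample data are hypothetical.

```python
def fill_missing(values, placeholder='N/A'):
    # List-comprehension form of:
    # list(map(lambda text: text if text is not None else 'N/A', values))
    return [text if text is not None else placeholder for text in values]


# Hypothetical stand-ins for one page of scraped results.
list_url = ['/biz/example-1', '/biz/example-2']
list_area = ['Midtown West', None]
list_rate = [None, '4.0 star rating']

list_area = fill_missing(list_area)
list_rate = fill_missing(list_rate)

# zip() stops at the shortest list, so there is no IndexError on short pages.
lines = []
for url, area, rate in zip(list_url, list_area, list_rate):
    lines.append(url)
    lines.append(area + ' | ' + rate)

print('\n'.join(lines))
```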