
Getting a not subscriptable when running a web scraping script

I'm practicing web scraping and I'm using this code, experimenting with for loops.

import requests
from bs4 import BeautifulSoup

name=[]
link=[]
address=[]
for i in range (1,11):
  i=str(i)
  url = "https://forum.iktva.sa/exhibitors-list?&page="+i+"&searchgroup=37D5A2A4-exhibitors"
  soup = BeautifulSoup(requests.get(url).content, "html.parser")

  for a in soup.select(".m-exhibitors-list__items__item__header__title__link"):
      company_url = "https://forum.iktva.sa/" + a["href"].split("'")[1]

      soup2 = BeautifulSoup(requests.get(company_url).content, "html.parser")
      n=soup2.select_one(".m-exhibitor-entry__item__header__title").text

      l=soup2.select_one("h4+a")["href"]
      a=soup2.select_one(".m-exhibitor-entry__item__body__contacts__address").text
      name.append(n)
      link.append(l)
      address.append(a)

When I run the program, I get this error:

  l=soup2.select_one("h4+a")["href"]
TypeError: 'NoneType' object is not subscriptable

I'm not sure how to fix the problem.

select_one returns None when the selector matches nothing, and subscripting None raises exactly that TypeError. You just need to replace the failing line with a None check:

l = soup2.select_one("h4+a")
if l:
    l = l["href"]
else:
    l = "Website not available"

As you can see, this happens because no website link is available on this exhibitor's page: https://forum.iktva.sa/exhibitors/sanad
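The same guard can also be written as a conditional expression. Here is a minimal, self-contained sketch of the failure and the fix; the HTML string is a stand-in for a page with no link after the h4, not the real site:

```python
from bs4 import BeautifulSoup

# Stub page: an <h4> with no <a> following it, like the Sanad exhibitor page.
html = "<div><h4>Website</h4></div>"
soup2 = BeautifulSoup(html, "html.parser")

tag = soup2.select_one("h4+a")  # returns None when nothing matches
# tag["href"] here would raise TypeError: 'NoneType' object is not subscriptable
l = tag["href"] if tag is not None else "Website not available"
print(l)  # → Website not available
```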

Or you can handle all such errors in one place, like this:

import requests
from bs4 import BeautifulSoup


def get_object(obj, attr=None):
    # Return the tag's attribute (or its text), or a placeholder when the
    # tag is None or the attribute is missing.
    try:
        if attr:
            return obj[attr]
        else:
            return obj.text
    except (TypeError, AttributeError, KeyError):
        return "Not available"


name = []
link = []
address = []
for i in range(1, 11):
    url = f"https://forum.iktva.sa/exhibitors-list?&page={i}&searchgroup=37D5A2A4-exhibitors"
    soup = BeautifulSoup(requests.get(url).text, features="lxml")

    for a in soup.select(".m-exhibitors-list__items__item__header__title__link"):

        company_url = "https://forum.iktva.sa/" + a["href"].split("'")[1]
        soup2 = BeautifulSoup(requests.get(company_url).content, "html.parser")

        n = soup2.select_one(".m-exhibitor-entry__item__header__title")
        n = get_object(n)

        l = soup2.select_one("h4+a")
        l = get_object(l, 'href')

        a = soup2.select_one(".m-exhibitor-entry__item__body__contacts__address")
        a = get_object(a)

        name.append(n)
        link.append(l)
        address.append(a)
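As a quick sanity check, the helper degrades gracefully for both a present tag and a missing one. A standalone sketch using a condensed version of the get_object helper above, against a stub HTML string rather than the live site:

```python
from bs4 import BeautifulSoup

def get_object(obj, attr=None):
    # Condensed form of the helper above: placeholder on any failure.
    try:
        return obj[attr] if attr else obj.text
    except (TypeError, AttributeError, KeyError):
        return "Not available"

soup = BeautifulSoup("<h4>Website</h4>", "html.parser")
print(get_object(soup.select_one("h4")))            # → Website
print(get_object(soup.select_one("h4+a"), "href"))  # → Not available
```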