I need help with a "list index out of range" error

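Basically, the idea of my project is to scrape emails by reading a csv file that contains about 200 URLs; I want to scrape all the emails for the URLs in that (csv) file. The problem I am facing is an IndexError.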

The error is:

/home/jawad/PycharmProjects/beautifulsoup/venv/bin/python /home/jawad/Pycharm/pycharm-community-2021.1/plugins/python-ce/helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 127.0.0.1 --port 42197 --file /home/jawad/PycharmProjects/beautifulsoup/emailhunter/emailhunter/spiders/emailscrapping.py
Connected to pydev debugger (build 211.6693.115)
Traceback (most recent call last):
  File "/home/jawad/PycharmProjects/beautifulsoup/emailhunter/emailhunter/spiders/emailscrapping.py", line 7, in <module>
    a= line[0].split('\t')[4]
IndexError: list index out of range
python-BaseException

My Python code is:

from bs4 import BeautifulSoup
import requests, re
import csv
with open('Clients.csv','r', encoding="utf16", errors='ignore') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        a= line[0].split('\t')[4]
        print(a)
        # for url in a:

        def get_email(a):
            response = requests.get(a, headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36',
                'Upgrade-Insecure-Requests': '1', 'x-runtime': '148ms'}, allow_redirects=True).content

            soup = BeautifulSoup(response, "html.parser")
            email = soup(text=re.compile(r'[A-Za-z0-9\.\+_-]+@[A-Za-z0-9\._-]+\.[a-zA-Z]*'))

            _emailtokens = str(email).replace("\t", "").replace("\n", "").split(' ')

            if len(_emailtokens):
                print([match.group(0) for token in _emailtokens for match in
                       [re.search(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", str(token.strip()))] if
                       match])

    get_email(a)

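When indexing a list, you should always either catch IndexError in a 'try - except' block or check beforehand that the index is in range (unless you are handling the IndexError in the calling function, or in one of its 'parents').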
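As a minimal sketch of those two approaches, here are two small helpers. The names `field_by_check` and `field_by_except` are made up for illustration and are not part of the question's code; both assume a non-negative index and a row shaped like the one `csv.reader` produces in the question (a list whose first item holds the tab-separated text).

```python
def field_by_check(row, index, sep="\t"):
    """Return the index-th field of row[0], or None if it does not exist.

    Checks the lengths before indexing, so no exception is raised.
    """
    if not row:  # csv.reader yields an empty list for a blank line
        return None
    fields = row[0].split(sep)
    if index >= len(fields):
        return None
    return fields[index]


def field_by_except(row, index, sep="\t"):
    """Same result as field_by_check, but lets the indexing fail and catches it."""
    try:
        return row[0].split(sep)[index]
    except IndexError:
        # Raised either by row[0] (empty row) or by [index] (too few fields)
        return None
```

Either helper returns None for a row that is too short, so the caller can decide whether to skip the row or report it.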

For debugging purposes, I would try one of the following:

# Option 1: catch the IndexError and print what was read
try:
    print("Line = {}".format(line))
    print("Line[0] = {}".format(line[0]))
    print("Split line = {}".format(line[0].split('\t')))
    print("5th element = {}".format(line[0].split('\t')[4]))
    a = line[0].split('\t')[4]
except IndexError as e:
    print("Index Error: {}".format(e))

# Option 2: check the lengths before indexing
print("Line = {}".format(line))
correct = True
if len(line) > 0:
    print("Line[0] = {}".format(line[0]))
    fields = line[0].split('\t')
    print("Split line = {}".format(fields))
    # Index 4 only exists when there are at least 5 fields
    if len(fields) > 4:
        print("5th element = {}".format(fields[4]))
    else:
        print("Only {} elements in tab split list".format(len(fields)))
        correct = False
else:
    print("Line length = {}".format(len(line)))
    correct = False
if correct:
    a = fields[4]
else:
    raise IndexError
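
Once the debugging output shows which lines are too short, one possible follow-up is to skip those lines instead of raising. The sketch below assumes the file really is tab-delimited with the URL in the fifth column, which is only what `split('\t')[4]` in the question suggests. The function name `iter_urls` and the sample data are hypothetical. It also lets `csv.reader` do the splitting through its `delimiter` argument rather than splitting `line[0]` by hand.

```python
import csv
import io


def iter_urls(csv_file, column=4):
    """Yield the value in the given column, skipping rows that are too short."""
    reader = csv.reader(csv_file, delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        if len(row) <= column:
            # Blank lines arrive as [] and short lines have too few fields
            print("Skipping line {}: only {} field(s)".format(line_number, len(row)))
            continue
        yield row[column]


# Made-up sample standing in for Clients.csv: one good row, one blank line, one short row
sample = io.StringIO("id\tname\tcity\tphone\thttp://example.com\n\nshort\trow\n")
urls = list(iter_urls(sample))
print(urls)
```

With the real file, the same function could be given the file object opened in the question, and `get_email` called once per yielded URL.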