Python 2.7: AttributeError: 'list' object has no attribute 'get'

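I've built a script that scrapes the list of UK courts, generates a list of links to each court's address page, and then is meant to scrape the address from each of those pages.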

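So far it works well, but I'm stuck on the "write to csv" bit. I think it has something to do with iteritems() lacking a get method. I know an iterator doesn't have the same methods as an iterable (I'm using an iterator in my code), but that hasn't helped me solve my particular problem.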

Here is my code:

import csv
import time
import random
import requests
from bs4 import BeautifulSoup as bs

# lambda expression to request url and parse it through bs
soup = lambda url: bs((requests.get(url)).text, "html.parser")


def crawl_court_listings(base, buff, char):
    """  """
    # common URL segment + buffer URL segment + end character -> URL
    url = base + buff + str(chr(char))

    # soup lambda expression -> grab first unordered list
    links = (soup(url)).find('div', {'class': 'content inner cf'}).find('ul')

    # empty dictionary
    results = {}

    # loop through links, get link title and href
    for item in links.find_all('a', href=True):
        court_link = item['href']
        title = item.string

        # generate full court address page url from href
        full_court_link = base + court_link

        # save title and full URL to results
        results[title] = full_court_link

        # increment char var by 1
        char += 1

    # return results dict and incremented char value
    return results, char


def get_court_address(court_name, full_court_link):
    """ """

    # get horrible chunk of poorly formatted address(es)
    address_blob = (soup(full_court_link)).find('div', {'id': 'addresses'}).text

    # clean the blob
    clean_address = ("\n".join(line.strip() for line in address_blob.split("\n")))

    # write to csv
    with open('court_addresses.csv', 'w') as csvfile:
        fieldnames = [court_name, full_court_link, clean_address]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writerow(fieldnames)


if __name__ == "__main__":

    base = 'https://courttribunalfinder.service.gov.uk/'
    buff = 'courts/'

    # 65 = "A". Starting from char "A", retrieve a list of titles and links for court addresses. Return char + 1
    results, char = crawl_court_listings(base, buff, 65)

    # 90 = "Z". Until Z, pass title and list from results into get_court_address(), then wait a few seconds
    while char <= 90:
        for t, l in results.iteritems():
            get_court_address(t, l)
            time.sleep(random.randint(0,5))

When I run this, I get the following:

Traceback (most recent call last):
  File ".\CourtScraper.py", line 63, in <module>
    get_court_address(t, l)
  File ".\CourtScraper.py", line 49, in get_court_address
    writer.writerow(fieldnames)
  File "c:\python27\Lib\csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "c:\python27\Lib\csv.py", line 149, in _dict_to_list
    return [rowdict.get(key, self.restval) for key in self.fieldnames]
AttributeError: 'list' object has no attribute 'get'

Even with the error, the csv file gets created, and cells A1 and A2 are filled with title and full_court_link, but not with address. The address (when printed) looks like this:

Write to us:


1st Floor

Piccadilly Exchange

Piccadilly Plaza

Manchester

Greater Manchester

M1 4AH

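So my first thought was that I was trying to write multi-line text into a single cell and that this caused the error, but I'm not sure how to confirm this. I used print(type(address)) and it returned unicode rather than list, so I don't think that's what's causing the problem. I don't understand where the list in the error is coming from, if that makes sense.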

If the problem is caused by the iteritems() method, how do I fix it?

Can someone explain the error and point me in the right direction to fix it?
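One way to test the multi-line theory from the question is to round-trip a multi-line value through the csv module on its own. This is a minimal sketch in Python 3 (it uses an in-memory io.StringIO buffer, which Python 2.7's csv module does not accept); the court name is made-up sample data and the address is the one printed above. If the value reads back unchanged, multi-line text is not what triggers the AttributeError: the writer simply quotes the field.

```python
import csv
import io

# Sample data: made-up court name, address text taken from the printout above.
court_name = "Example Court"
address = "Write to us:\n1st Floor\nPiccadilly Exchange\nPiccadilly Plaza\nManchester\nM1 4AH"

# Write one row to an in-memory buffer; newline="" as the csv docs recommend.
buf = io.StringIO(newline="")
csv.writer(buf).writerow([court_name, address])

# The multi-line field is wrapped in double quotes in the CSV text...
print(repr(buf.getvalue()))

# ...and reading it back yields the original string as a single field.
buf.seek(0)
rows = list(csv.reader(buf))
print(rows[0][1] == address)  # True
```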

For every row you write, you need to pass in a dict; you're passing in the header list.

https://docs.python.org/2/library/csv.html#csv.DictWriter

# write to csv
with open('court_addresses.csv', 'w') as csvfile:
    fieldnames = [court_name, full_court_link, clean_address]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow(fieldnames)
                    ^^^^^^^^^^^  This should be a dict 

The dict needs to look like this (its keys must match the names in fieldnames):

{'court_name': X, 'full_court_link': Y, 'clean_address': Z}
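The traceback shows why a list fails: DictWriter._dict_to_list looks each field name up on the row with rowdict.get(...), and a list has no get method. The sketch below reproduces that with the same call as the question, then writes a proper dict row. It is written for Python 3 against an in-memory buffer (on Python 2.7 the DictWriter calls are the same, but write to a file opened in 'wb' mode); on Python 3 the message may name a different missing attribute, such as keys, but it is still an AttributeError. The sample values are hypothetical.

```python
import csv
import io

# Field names are plain strings: they become the header and the dict keys.
fieldnames = ["court_name", "full_court_link", "clean_address"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()

# Same mistake as in the question: passing the list of field names as a row.
list_rejected = False
try:
    writer.writerow(fieldnames)
except AttributeError as exc:
    list_rejected = True
    print("list rejected:", exc)

# A dict keyed by the field names is what DictWriter expects.
writer.writerow({
    "court_name": "Example Court",  # hypothetical sample values
    "full_court_link": "https://example.com/courts/example-court",
    "clean_address": "1st Floor\nM1 4AH",
})
print(buf.getvalue())
```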

HTH

Your problem is here:

writer.writerow(fieldnames)

"fieldnames" is a list of field names. You need to pass a dict of key-value pairs. So it should look more like this:

# write to csv
with open('court_addresses.csv', 'w') as csvfile:
    # note - these are strings, not variables
    fieldnames = ['court_name', 'full_court_link', 'clean_address']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow({"court_name" : court_name,
                     "full_court_link" : full_court_link,
                     "clean_address" : clean_address})

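PSST: you have another problem. You're re-opening the output file for every court you parse, and because 'w' mode truncates the file each time it is opened, earlier rows are lost. You probably want to open the file once (under __main__) and pass the handle to get_court_address().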

with open('court_addresses.csv', 'w') as csvfile:
    fieldnames = ['court_name', 'full_court_link', 'clean_address']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow({'court_name': court_name, 'full_court_link': full_court_link, 'clean_address': clean_address})
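Following the PSST advice, a sketch of opening the file once and passing the handle along might look like this. write_courts and fetch_address are hypothetical names: fetch_address stands in for a version of get_court_address() that returns the cleaned address instead of writing it. The header is written once and every court becomes one row of the same file.

```python
import csv
import random
import time

FIELDNAMES = ["court_name", "full_court_link", "clean_address"]


def write_courts(csvfile, results, fetch_address, delay=0):
    """Write one CSV row per court to an already-open file handle.

    results maps court name -> court page URL (as built by
    crawl_court_listings); fetch_address is a hypothetical callable that
    takes a URL and returns the cleaned address string.
    """
    writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
    writer.writeheader()  # header row written once, at the top of the file
    for court_name, full_court_link in results.items():
        writer.writerow({
            "court_name": court_name,
            "full_court_link": full_court_link,
            "clean_address": fetch_address(full_court_link),
        })
        # Optional pause between requests, as in the original loop.
        time.sleep(random.randint(0, delay))


# Usage in the script's __main__ block (Python 3), opening the file only once:
# with open("court_addresses.csv", "w", newline="") as csvfile:
#     write_courts(csvfile, results, fetch_address, delay=5)
```

On Python 2.7, open the file with open('court_addresses.csv', 'wb') instead, since the 2.x csv docs ask for binary mode and open() there has no newline argument.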