CSV Text Extraction Beautifulsoup

Question

I'm new to Python, and this is my first practice script using BeautifulSoup. I haven't yet learned creative solutions for specific data-extraction problems.

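The program prints fine, but I'm having trouble extracting the data to a CSV file. It takes the first element but leaves all the others behind. My only guess is that some whitespace, delimiter, or something else is causing the code to stop extracting after the initial text?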

I tried to write each item to the CSV row by row, but I'm clearly stuck. Thanks for any help and/or suggestions.

from urllib.request import urlopen  
from bs4 import BeautifulSoup
import csv  

price_page = 'http://www.harryrosen.com/footwear/c/boots'
page = urlopen(price_page)
soup = BeautifulSoup(page, 'html.parser')
product_data = soup.findAll('ul', attrs={'class': 'productInfo'})

for item in product_data:

    brand_name=item.contents[1].text.strip()
    shoe_type=item.contents[3].text.strip()
    shoe_price = item.contents[5].text.strip()
    print (brand_name)
    print (shoe_type)
    print (shoe_price)

with open('shoeprice.csv', 'w') as shoe_prices:
    writer = csv.writer(shoe_prices)
    writer.writerow([brand_name, shoe_type, shoe_price])

Answer 1

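Here's one way to solve the problem:

Collect the results into a list of dictionaries with a list comprehension
Write the results to the CSV file with csv.DictWriter and a single .writerows() call

Implementation: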

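import csv

# product_data is the list of <ul class="productInfo"> tags from the question's code
data = [{
    'brand': item.li.get_text(strip=True),
    'type': item('li')[1].get_text(strip=True),  # item('li') is shorthand for item.find_all('li')
    'price': item.find('li', class_='price').get_text(strip=True)
} for item in product_data]

# newline='' is what the csv module docs recommend for files handed to a csv writer
with open('shoeprice.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['brand', 'type', 'price'])
    writer.writerows(data)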

If you also want to write the CSV header row, add a writer.writeheader() call before writer.writerows(data).
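A minimal sketch of the header variant, assuming rows shaped like the data list above. The brand/type/price values are made up for illustration, and io.StringIO stands in for the real file so the output can be inspected directly:

```python
import csv
import io

# Hypothetical rows shaped like the `data` list above (illustrative values, not scraped).
data = [
    {'brand': 'Brand A', 'type': 'Chelsea Boot', 'price': '$395'},
    {'brand': 'Brand B', 'type': 'Lace-up Boot', 'price': '$450'},
]

# StringIO stands in for a real file opened with newline=''.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['brand', 'type', 'price'])
writer.writeheader()    # writes the fieldnames as the first row
writer.writerows(data)  # writes one row per dictionary, in fieldnames order

print(buffer.getvalue())
```

The first line of the output is the header brand,type,price, followed by one line per product.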

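Note that you could also use a regular csv.writer with a list of lists (or tuples), but in this case I like the explicitness and added readability of dictionaries.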
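For comparison, here is a sketch of the plain csv.writer alternative with a list of tuples. The sample values are hypothetical. The difference is that column order now depends on each tuple's position, and any header row has to be written by hand, whereas DictWriter maps values by key:

```python
import csv
import io

# Hypothetical rows as (brand, type, price) tuples; values are illustrative.
rows = [
    ('Brand A', 'Chelsea Boot', '$395'),
    ('Brand B', 'Lace-up Boot', '$450'),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['brand', 'type', 'price'])  # header written manually
writer.writerows(rows)                       # column order is purely positional

print(buffer.getvalue())
```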

Also note that I've improved the locators used in the loop. I don't think it's a good idea to use the .contents list and pick out the product's children by index.
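A short sketch of why index-based .contents access is fragile, using hypothetical markup modelled on the question's ul.productInfo (the real page's structure may differ). With html.parser, the newline text nodes between the li tags are kept as children, which is why the question's code had to use indices 1, 3 and 5:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's <ul class="productInfo">.
html = '''<ul class="productInfo">
<li>Brand A</li>
<li>Chelsea Boot</li>
<li class="price">$395</li>
</ul>'''

ul = BeautifulSoup(html, 'html.parser').find('ul', class_='productInfo')

# .contents includes the whitespace strings between tags, so the <li>
# elements sit at odd indices and shift if the markup's whitespace changes.
print([type(child).__name__ for child in ul.contents])

# Tag-based lookups skip those whitespace nodes entirely.
items = ul.find_all('li')
brand = items[0].get_text(strip=True)
shoe_type = items[1].get_text(strip=True)
price = ul.find('li', class_='price').get_text(strip=True)
print(brand, shoe_type, price)
```

The first print shows NavigableString and Tag children alternating; the tag-based lookups return the three values regardless of surrounding whitespace.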

Answer 2

import csv

# product_data comes from the question's code above
with open('shoeprice.csv', 'w', newline='') as shoe_prices:
    writer = csv.writer(shoe_prices)
    for item in product_data:
        brand_name = item.contents[1].text.strip()
        shoe_type = item.contents[3].text.strip()
        shoe_price = item.contents[5].text.strip()
        print(brand_name, shoe_type, shoe_price, sep='\n')

        # Writing inside the loop produces one CSV row per product
        writer.writerow([brand_name, shoe_type, shoe_price])

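Put the file-opening with block outside the loop and call writer.writerow() inside it. That way every item gets its own row, and the file is opened only once instead of once per iteration.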
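A minimal sketch contrasting the two patterns, using hypothetical in-memory products instead of the scraped page and io.StringIO instead of a real file. In the question's version the loop only reassigns brand_name, shoe_type and shoe_price, so the single writerow() after the loop writes just the last item's values:

```python
import csv
import io

# Hypothetical scraped values; in the question these come from product_data.
products = [
    ('Brand A', 'Chelsea Boot', '$395'),
    ('Brand B', 'Lace-up Boot', '$450'),
    ('Brand C', 'Hiking Boot', '$300'),
]

# Question's pattern: the loop prints every item, but writerow() runs once
# afterwards, so only the last values assigned in the loop reach the CSV.
after_loop = io.StringIO()
for brand_name, shoe_type, shoe_price in products:
    print(brand_name, shoe_type, shoe_price)
csv.writer(after_loop).writerow([brand_name, shoe_type, shoe_price])

# Answer 2's pattern: one writer, writerow() called on every iteration.
inside_loop = io.StringIO()
writer = csv.writer(inside_loop)
for brand_name, shoe_type, shoe_price in products:
    writer.writerow([brand_name, shoe_type, shoe_price])

print(after_loop.getvalue().splitlines())   # a single row
print(inside_loop.getvalue().splitlines())  # one row per product
```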

csv

beautifulsoup

python-3.x

export-to-csv