Python BeautifulSoup web scraping: no information written to the output

Question

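I am a beginner learning BeautifulSoup and want to get information from a website, but nothing is written to the output. I don't know what I did wrong. Please help me.

This is my code: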

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Chrome("C:\chromedriver\chromedriver.exe")
products=[] #store name of the product
prices=[] #store price of the product
ratings=[] #store rating of the product
driver.get("http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html")
content = driver.page_source
soup = BeautifulSoup(content,features="html.parser")
for a in soup.findAll('a',href=True, attrs={'container-fluid page'}):
    name=a.find('div', attrs={'class':'col-sm-6 product_main'})
    price=a.find('div', attrs={'class':'col-sm-6 product_main'})
    rating=a.find('div', attrs={'class':'star-rating Three'})
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)
df = pd.DataFrame({'Product Name': products, 'Price': prices, 'Rating': ratings})
df.to_csv('D:\products.csv', index=False, encoding='utf-8')

It does not report any error; I just get a CSV file with no information in it.

Product Name,Price,Rating

Answer 1

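Note that there is a lot going on in your code; I would suggest keeping it simple. Your strategy should be to select by id, tag, class, in that order, moving from the more static information to the more dynamic. In new code, use find_all() rather than the old syntax findAll().

The main problem is that your selection soup.findAll('a',href=True, attrs={'container-fluid page'}) does not find anything, so the loop body never runs and the result is empty. In fact, this page contains only a single product, so none of those lists are needed.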

...
soup = BeautifulSoup(content,"html.parser")

df = pd.DataFrame([{
    'Product Name': soup.h1.text, 
    'Price': soup.find('p',{"class": "price_color"}).text, 
    'Rating': soup.find('p',{"class": "star-rating"})['class'][-1].lower()}])
...
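
To see why the original selection comes back empty, here is a minimal sketch on a hand-written HTML fragment. The fragment is an assumption: a simplified stand-in for the product page, not its real markup. It shows that the class "container-fluid page" sits on a div, so searching a tags for it matches nothing, while the simpler selectors from the answer do match.

```python
from bs4 import BeautifulSoup

# Simplified, hand-written stand-in for the product page (an assumption,
# not the real markup of books.toscrape.com).
html = """
<div class="container-fluid page">
  <div class="col-sm-6 product_main">
    <h1>A Light in the Attic</h1>
    <p class="price_color">£51.77</p>
    <p class="star-rating Three"></p>
    <a href="/reviews">Reviews</a>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# No <a> tag carries this class, so the loop in the question never executes.
no_match = soup.find_all("a", class_="container-fluid page")

# The class belongs to a <div>, so searching the right tag finds it.
containers = soup.find_all("div", class_="container-fluid page")

# The rating word is the last entry in the tag's class list.
rating = soup.find("p", class_="star-rating")["class"][-1].lower()

print(len(no_match), len(containers), soup.h1.text, rating)
```

Because the loop iterates over an empty result, no exception is raised and the lists stay empty, which matches the header-only CSV in the question.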

Example

You do not need selenium here; you could also take a look at requests. The process of cooking your soup is almost the same:

import requests
import pandas as pd  # needed for pd.DataFrame below
from bs4 import BeautifulSoup

URL = 'http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html'
content = requests.get(URL).content

soup = BeautifulSoup(content,"html.parser")

df = pd.DataFrame([{
    'Product Name': soup.h1.text, 
    'Price': soup.find('p',{"class": "price_color"}).text, 
    'Rating': soup.find('p',{"class": "star-rating"})['class'][-1].lower()}])
df
#or to save as csv -> df.to_csv(r'D:\products.csv', index=False, encoding='utf-8')

Output

Product Name	Price	Rating
A Light in the Attic	£51.77	three
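
Since the original script failed silently, it can help to fail loudly when a selector matches nothing. The following is only a sketch under stated assumptions: extract_book is a hypothetical helper name, and the sample string is a hand-written fragment rather than the real page. It relies on find() returning None when nothing matches.

```python
from bs4 import BeautifulSoup

def extract_book(html):
    """Hypothetical helper: return name, price and rating, or raise if a selector misses."""
    soup = BeautifulSoup(html, "html.parser")
    found = {
        "h1": soup.find("h1"),
        "p.price_color": soup.find("p", class_="price_color"),
        "p.star-rating": soup.find("p", class_="star-rating"),
    }
    # find() returns None when nothing matches; report that instead of
    # silently writing an empty file.
    missing = [selector for selector, tag in found.items() if tag is None]
    if missing:
        raise ValueError(f"selectors matched nothing: {missing}")
    return {
        "Product Name": found["h1"].get_text(strip=True),
        "Price": found["p.price_color"].get_text(strip=True),
        "Rating": found["p.star-rating"]["class"][-1].lower(),
    }

# Hand-written sample fragment (an assumption, not the real page).
sample = (
    '<h1>A Light in the Attic</h1>'
    '<p class="price_color">£51.77</p>'
    '<p class="star-rating Three"></p>'
)
book = extract_book(sample)
print(book)
```

The returned dict can be passed to pd.DataFrame([book]) in the same way as in the answer above.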


