How to merge two rows of results in a .csv file in Python (BeautifulSoup)

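I'm trying to scrape data from a website, but I'm stuck on how to handle either an "index out of range" error or results that end up split across two separate rows in the .csv file. By "index out of range" I mean that some records on this site may have empty values, and I don't know how to put the right condition into the loop. I have followed some guides, but got nowhere.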

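import ssl
import certifi
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = uReq('website', context=ssl.create_default_context(cafile=certifi.where()))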

uClient = my_url
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

containers = page_soup.select('div.header__title, div.info__cta')
container = containers[0]
filename = "products.csv"
f = open(filename,"w")

headers="Product_Name, PriceWithVAT, PriceWithoutVAT, Stock\n"
f.write(headers)

for container in containers:
    
    productName = container.findAll("span", {"class":"sku"})
    name = productName[0].text if container.findAll("span", {"class":"sku"}) else "lack name"
    
    priceWithVAT = container.findAll("span", {"class":"price-intax"})
    price = priceWithVAT[0].text if container.findAll("span", {"class":"price-intax"}) else "lack price"
    
    priceWithoutVAT = container.findAll("span", {"class":"price-extax"})
    priceNot = priceWithoutVAT[0].text if container.findAll("span", {"class":"price-extax"}) else "lack price2"
    
    stock = container.findAll("p", {"class":"stock in-stock"})
    stock = stock[0].text if container.findAll("p", {"class":"stock in-stock"}) else "lack on stock"
    
    f.write(name + "," + price + "," + priceNot + "," + stock + "\n" + "\n")
    
f.close()

Then in the .csv file I get results for the whole page, but each product is split across two rows, like this:

CORRECT,lack price,lack price2,lack on stock

lack name,CORRECT,CORRECT,CORRECT

My expected output:

CORRECT, CORRECT, CORRECT, CORRECT

(CORRECT means that the correct data was scraped from the website)

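When I remove if container.findAll("span", {"class":"sku"}) else "lack name" and the similar conditions in the loop, I get an index out of range error, which is expected, because some of the values are empty.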

Could you help me figure out how to change the code?

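You need to change your logic slightly here. Instead of treating each container as either the product name or the product info, I would grab the whole container that holds all of the information. You'll notice that each product sits in an <li> tag, under the <ul class="products ..."> tag.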
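To make the cause of the split rows concrete, here is a minimal sketch. The HTML snippet is a hypothetical simplification of the page (the real markup may differ); it only assumes that the name and the price sit in two sibling <div>s inside one <li>.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <li> per product, with the name and the
# price details in two sibling <div>s (an assumption for illustration).
html = """
<ul class="products columns-4">
  <li>
    <div class="header__title"><span class="sku">ZZ 90*105*4 VAY</span></div>
    <div class="info__cta"><span class="price-intax">14.86</span></div>
  </li>
</ul>
"""

page = BeautifulSoup(html, "html.parser")

# The original selector returns the two <div>s as separate containers,
# so each loop iteration only ever sees half of a product.
halves = page.select("div.header__title, div.info__cta")
found = [
    (h.find("span", class_="sku") is not None,
     h.find("span", class_="price-intax") is not None)
    for h in halves
]
print(found)  # [(True, False), (False, True)]

# Looping over the enclosing <li> sees both fields in one iteration.
items = page.select("ul.products > li")
whole = [
    (li.find("span", class_="sku") is not None,
     li.find("span", class_="price-intax") is not None)
    for li in items
]
print(whole)  # [(True, True)]
```

Each half produces one row with the other fields filled by the "lack ..." defaults, which matches the two-row pattern in the question.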

So let's first get the <ul> tag whose class starts with 'products'. From there, get all of the <li> tags. Then we'll iterate over each one and extract the data we need.
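One thing to watch with this approach: find_all('li') searches recursively, so if a product ever contained a nested list, its <li> tags would be counted as products too. A small sketch on made-up HTML (the nested list is purely hypothetical) shows how recursive=False limits the search to direct children:

```python
import re
from bs4 import BeautifulSoup

# Made-up HTML: the first product contains a nested list.
html = """
<ul class="products columns-4">
  <li><span class="sku">A</span><ul><li>nested note</li></ul></li>
  <li><span class="sku">B</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Match the <ul> whose class starts with 'products'.
product_list = soup.find("ul", {"class": re.compile("^products")})

all_items = product_list.find_all("li")                      # includes the nested <li>
direct_items = product_list.find_all("li", recursive=False)  # direct children only

print(len(all_items), len(direct_items))  # 3 2
```

If soup.find returns None (no matching <ul>), the chained .find_all call raises AttributeError, so it may be worth checking for None before continuing.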

As you said, some tags are missing, so we'll use try/except. It will try to get the data, and if that fails, it falls back to the default set in the except block.
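As an alternative to repeating try/except for every field, the same fallback can be written as a small helper. This is only a sketch: text_or_default is a hypothetical name, not part of BeautifulSoup, and the HTML is a made-up single product.

```python
from bs4 import BeautifulSoup


def text_or_default(parent, name, css_class, default):
    """Return the stripped text of the first matching tag, or `default`."""
    tag = parent.find(name, class_=css_class)
    # find() returns None when nothing matches, so test before reading text.
    return tag.get_text(strip=True) if tag is not None else default


# Made-up product that has a name but no price.
html = '<li><span class="sku"> ZZ 90*105*4 VAY </span></li>'
product = BeautifulSoup(html, "html.parser").li

name = text_or_default(product, "span", "sku", "lack name")
price = text_or_default(product, "span", "price-intax", "lack price")
print(name, "|", price)  # ZZ 90*105*4 VAY | lack price
```

The explicit None check avoids both the IndexError from indexing an empty findAll result and the AttributeError from calling .text on None.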

Also, pandas is a very good and useful library to use and learn, so I went with that rather than writing to the csv file by hand as you did.

Code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

url = 'https://specjal.com/sklep/'
response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

products = soup.find('ul', {'class':re.compile('^products')}).find_all('li')


rows = []
for product in products:
    # find() returns None when the tag is missing, so .text raises AttributeError
    try:
        productName = product.find('span',{'class':'sku'}).text
    except AttributeError:
        productName = 'lack name'
    
    try:
        priceWithVAT = product.find('span',{'class':'price-intax'}).text
    except AttributeError:
        priceWithVAT = 'lack price'
    
    try:
        priceWithoutVAT = product.find('span',{'class':'price-extax'}).text
    except AttributeError:
        priceWithoutVAT = 'lack price2'
    
    # the stock text may also be empty or non-numeric
    try:
        stock = int(product.find('p',{'class':'stock in-stock'}).text.split()[0])
    except (AttributeError, IndexError, ValueError):
        stock = 'lack on stock'
        # consider changing the above line to stock = 0
    
    row = {
        'productName':productName, 
        'priceWithVAT':priceWithVAT, 
        'priceWithoutVAT':priceWithoutVAT, 
        'stock':stock}
    
    rows.append(row)
    
    
df = pd.DataFrame(rows)
df.to_csv('products.csv', index=False)

Output:

print(df)
             productName   priceWithVAT    priceWithoutVAT          stock
0        ZZ 90*105*4 VAY   14.86zł/szt.   12.08 zł bez VAT             10
1        ZZ 85*100*5 VAY   13.76zł/szt.   11.19 zł bez VAT             10
2         ZZ 80*95*4 VAY   12.66zł/szt.   10.29 zł bez VAT             20
3         ZZ 75*90*4 VAY   11.01zł/szt.    8.95 zł bez VAT             20
4         ZZ 70*85*4 VAY    9.91zł/szt.    8.06 zł bez VAT             20
5         ZZ 65*80*5 VAY    9.36zł/szt.    7.61 zł bez VAT             20
6         ZZ 65*80*4 VAY    9.36zł/szt.    7.61 zł bez VAT             20
7         ZZ 60*75*5 VAY    8.25zł/szt.    6.71 zł bez VAT             14
8         ZZ 55*65*4 VAY    7.71zł/szt.    6.27 zł bez VAT             10
9         ZZ 50*60*4 VAY    6.61zł/szt.    5.37 zł bez VAT             20
10        ZZ 45*55*4 VAY    6.05zł/szt.    4.92 zł bez VAT             20
11        ZZ 40*50*4 VAY    5.39zł/szt.    4.38 zł bez VAT             17
12        ZZ 35*45*4 VAY     4.8zł/szt.     3.9 zł bez VAT             30
13        ZZ 30*40*4 VAY    4.26zł/szt.    3.46 zł bez VAT             20
14            XPA 710 CT   39.61zł/szt.    32.2 zł bez VAT  lack on stock
15           UCP 202 KBF    19.7zł/szt.   16.02 zł bez VAT  lack on stock
16        U298/U291 SET9  188.04zł/szt.  152.88 zł bez VAT  lack on stock
17             U 64*80*8    11.8zł/szt.    9.59 zł bez VAT              2
18              U 6*10*3    2.51zł/szt.    2.04 zł bez VAT              4
19        U 45*53*10 RSB    7.55zł/szt.    6.14 zł bez VAT  lack on stock
20     U 30*40*7 K21 NBR       8zł/szt.     6.5 zł bez VAT              5
21      U 180*200*14 K50   37.74zł/szt.   30.68 zł bez VAT  lack on stock
22     U 16*24*5,5 NI300    8.56zł/szt.    6.96 zł bez VAT             13
23      U 140*160*14 K50   21.92zł/szt.   17.82 zł bez VAT  lack on stock
24      U 140*160*14 K23   23.71zł/szt.   19.28 zł bez VAT              3
25          TR16*4*540MM   38.27zł/szt.   31.11 zł bez VAT  lack on stock
26          TP 600 8M/20   156.7zł/szt.   127.4 zł bez VAT  lack on stock
27             TP 15*1,5   27.56zł/szt.   22.41 zł bez VAT  lack on stock
28           ST 3568 LFT   94.34zł/szt.    76.7 zł bez VAT  lack on stock
29           SC07A87CS32   47.32zł/szt.   38.47 zł bez VAT  lack on stock
30        SC04B19CS31PX2    46.3zł/szt.   37.64 zł bez VAT              3
31                 R28-9   96.05zł/szt.   78.09 zł bez VAT              2
32           R 2-6 ZZ SS   13.47zł/szt.   10.95 zł bez VAT  lack on stock
33         QJ 213 MPA C3  412.06zł/szt.  335.01 zł bez VAT  lack on stock
34               PJ 1219    5.97zł/szt.    4.85 zł bez VAT  lack on stock
35       OW1 115*94*8,1    15.72zł/szt.   12.78 zł bez VAT              2
36       OGNIWO 08B-3 CL    7.23zł/szt.    5.88 zł bez VAT              7
37      NU 2311 ETVP2 C3  408.34zł/szt.  331.98 zł bez VAT  lack on stock
38         NJ 2210 ET C4  195.19zł/szt.  158.69 zł bez VAT              4
39           NJ 209 ETVP  101.89zł/szt.   82.84 zł bez VAT              2
40           NA 4901 CZH   11.64zł/szt.    9.46 zł bez VAT  lack on stock
41          MR 16277 2RS      32zł/szt.   26.02 zł bez VAT              4
42        ŁAŃCUCH 08 B-3   76.38zł/szt.    62.1 zł bez VAT             20
43            KP 16 L100   33.86zł/szt.   27.53 zł bez VAT  lack on stock
44          K 81130 SRBF  132.45zł/szt.  107.68 zł bez VAT              2
45      JL 68145/111 NAF   17.59zł/szt.    14.3 zł bez VAT  lack on stock
46  HTF O 45-7 A G5 N C3     lack price        lack price2  lack on stock
47          HRC 35*45*45   37.08zł/szt.   30.15 zł bez VAT              6
48             HK 3520 B   22.39zł/szt.    18.2 zł bez VAT  lack on stock
49           HGY 15*21*1    0.74zł/szt.     0.6 zł bez VAT              8
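Note that some product names in the output above contain commas (for example U 16*24*5,5 NI300). Joining fields manually with "," as in the question would shift the columns for those rows, whereas df.to_csv quotes such fields. If you would rather not depend on pandas, a rough sketch with the standard csv module behaves the same way; the row below is copied from the output above, and the in-memory buffer merely stands in for a real file.

```python
import csv
import io

rows = [
    {"productName": "U 16*24*5,5 NI300", "priceWithVAT": "8.56zł/szt.",
     "priceWithoutVAT": "6.96 zł bez VAT", "stock": 13},
]

# Naive joining splits the name at its comma: 5 fields instead of 4.
naive = ",".join(str(value) for value in rows[0].values())
print(len(naive.split(",")))  # 5

# csv.DictWriter quotes any field that contains the delimiter.
buffer = io.StringIO()  # stands in for open("products.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Reading the text back recovers the original name in one column.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed[0]["productName"])  # U 16*24*5,5 NI300
```

When writing to a real file, passing newline="" to open() is what the csv module documentation recommends, and an explicit encoding such as utf-8 keeps the "zł" characters intact.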