Remove NoneType from BeautifulSoup

I am trying to remove the commas from numbers extracted with the following code:

with requests.Session() as s:
    url = 'https://www.zoopla.co.uk/for-sale/property/london/paddington/?q=Paddington%2C%20London&results_sort=newest_listings&search_source=home'
    r = s.get(url, headers=req_headers)
    soup = BeautifulSoup(r.content, 'lxml')
    prices = []
    for price in soup.find_all('a', {"class":"listing-results-price text-price"}):
        prices.append(price.text)
        if price is None:
            print('none')
    df['price'] = prices
    df['price'] = df['price'].str.extract('(\d+([\d,]?\d)*(\.\d+)?)', expand=True) #remove extract numbers with commas
    df['price'] = df['price'].replace(',','', inplace = True)

This returns a column in which every value is None. Is there a way to get rid of this NoneType problem?

Before I run the last line, the DataFrame looks like this:

         price
0          NaN
1    1,875,000
2    4,950,000
3      500,000
4      675,000
5      980,000
6      475,000
7      849,950
8    1,050,000
9    1,050,000
10     650,000
11   1,100,000
12   1,300,000
13     895,000
14   1,000,000
15  26,800,000
16   1,600,000
17     695,000
18   2,100,000
19     510,000
20   1,200,000
21   3,000,000
22     599,000
23  26,800,000
24   1,550,000
25     750,000
26   1,600,000
27   1,025,000

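df['price'].replace(',','', inplace = True) does the replacement in place, so it does not return anything. Assigning that result back to df['price'] is what fills the column with None. Note also that Series.replace matches whole cell values by default, whereas .str.replace works on substrings inside each string.

You need: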

df['price'] = df['price'].str.replace(',','')

Output:

0        NaN
1    1875000
2    4950000
3     500000
4     675000
5     980000
6     475000
7     849950
8    1050000
9    1050000
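
To make the difference concrete, here is a minimal sketch on a small made-up Series (the values only imitate the question's column, they are not scraped). It assumes a reasonably recent pandas, where Series.str.replace accepts regex=False, and it adds one optional step the answer does not mention: pd.to_numeric, in case you want numbers rather than strings.

```python
import pandas as pd

# Illustrative sample shaped like the question's column (not scraped data).
prices = pd.Series([None, "1,875,000", "4,950,000", "500,000"], name="price")

# Series.replace compares whole cell values by default, so no cell matches ",".
whole_value = prices.replace(",", "")

# Series.str.replace works on substrings inside each string.
cleaned = prices.str.replace(",", "", regex=False)

# Optional extra step: convert to numbers; the missing entry becomes NaN.
numeric = pd.to_numeric(cleaned)

print(whole_value.tolist())
print(cleaned.tolist())
print(numeric.tolist())
```

Because of the missing first entry, the numeric column ends up as floats; if you need integers you would have to drop or fill that row first.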

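For reference, take a look at the docs.

I suggest handling this at the end of the data extraction step, before you build the DataFrame. You can build your list like this: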

from bs4 import BeautifulSoup
import requests
url = 'https://www.zoopla.co.uk/for-sale/property/london/paddington/?q=Paddington%2C%20London&results_sort=newest_listings&search_source=home'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
# Keep the first line of each price link, drop the pound sign and commas, then convert to int
res_lis = [
    int(price.text.strip().split('\n')[0].replace('£', '').replace(',', ''))
    for price in soup.find_all('a', {"class": "listing-results-price text-price"})
    if price
]
print(res_lis)

Result:

[2000000, 549950, 1050000, 500000, 675000, 980000, 475000, 849950, 1050000, 1050000, 650000, 1100000, 1300000, 895000, 1000000, 26800000, 1600000, 695000, 2100000, 510000, 3000000, 1200000, 599000, 26800000, 1550000, 750000, 1600000, 1025000]
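
One caveat with the list comprehension above: int() raises ValueError if a listing's text contains no number, and the NaN in row 0 of the question's DataFrame suggests that can happen. Below is a hedged sketch of a more forgiving parser. The helper name parse_price and the sample strings are hypothetical, made up for illustration; they are not part of bs4, pandas, or the scraped page.

```python
import re
from typing import Optional

# Hypothetical helper: parse_price is an illustrative name, not a library function.
_NUMBER = re.compile(r"\d[\d,]*")


def parse_price(text: Optional[str]) -> Optional[int]:
    """Return the first comma-grouped integer found in text, or None if there is none."""
    if not text:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    # Strip the thousands separators before converting.
    return int(match.group().replace(",", ""))


# Made-up strings in the shape the listing text might take.
samples = ["£1,875,000\nOffers over", "POA", "  £500,000 ", None]
print([parse_price(s) for s in samples])
```

In the scraping code you could then call parse_price(price.text) for each tag and decide separately whether to keep or drop the None entries.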

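If you construct and manipulate all of your data to match your requirements as far as possible before storing it, which would be your data extraction phase, then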