Real estate website scrape with links using Python
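I am trying to scrape a commercial real estate website with Python and BeautifulSoup, and I want the corresponding href of each listing to appear in the final csv as well. However, the link column always comes out empty. How can I extract the href, and how can I schedule this task to run weekly across the whole website? Thanks in advance!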
from bs4 import BeautifulSoup
import requests
from csv import writer
import re

url = "https://objektvision.se/lediga_lokaler/stockholm/city"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('a', class_="ov--list-item d-flex")

with open('lokal_stockholm_city_v11.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['title', 'location', 'area', 'link']
    thewriter.writerow(header)
    for list in lists:
        title = list.find('div', class_="font-weight-bold text-ov street-address").text.replace('\r\n','')
        location = list.find('div', class_="text-ov-dark-grey area-address").text.replace('\r\n','')
        area = list.find('div', class_="font-weight-bold size").text.replace('\r\n','')
        link = list.find('a', attrs_={'href': re.compile("^https://objektvision.se/Beskriv/")})
        info = [title, location, area, link]
        thewriter.writerow(info)
The final csv looks like this
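Focus on: getting the href
There are two points you should be aware of. First, the href values in your soup do not start with the domain; they are relative. Second, you do not have to find the <a>, because you are already processing it: each item in your ResultSet is the <a> itself.
So to get your href, call .get('href') or ['href'] directly on the element and concatenate it with the base url: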
link = 'https://objektvision.se/'+e['href']
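Plain string concatenation works, but it leaves a doubled slash when the href already starts with "/". As a minimal sketch of one alternative, the standard library's urljoin resolves a relative href against the base url. The names BASE_URL and absolute_link are hypothetical helpers introduced only for this illustration.

```python
from urllib.parse import urljoin

# Assumed site root; the relative hrefs are resolved against it.
BASE_URL = "https://objektvision.se/"


def absolute_link(href: str) -> str:
    """Resolve a relative href against the site root (hypothetical helper)."""
    return urljoin(BASE_URL, href)


print(absolute_link("/Beskriv/218003079?IsPremium=True"))
# https://objektvision.se/Beskriv/218003079?IsPremium=True
```

An href that is already absolute is returned unchanged, so the same helper can be applied to every link without checking it first.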
Example
Note: Do not use list as a variable name, since it shadows the built-in. It is changed to e (for element) here.
from bs4 import BeautifulSoup
import requests
from csv import writer

url = "https://objektvision.se/lediga_lokaler/stockholm/city"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
# Each item in this ResultSet is already the <a> tag of one listing
lists = soup.find_all('a', class_="ov--list-item d-flex")

with open('lokal_stockholm_city_v11.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['title', 'location', 'area', 'link']
    thewriter.writerow(header)
    for e in lists:
        title = e.find('div', class_="font-weight-bold text-ov street-address").text.replace('\r\n','')
        location = e.find('div', class_="text-ov-dark-grey area-address").text.replace('\r\n','')
        area = e.find('div', class_="font-weight-bold size").text.replace('\r\n','')
        # The href is relative, so read it from the element and prepend the base url
        link = 'https://objektvision.se/'+e['href']
        info = [title, location, area, link]
        thewriter.writerow(info)
Output
title | location | area | link |
---|---|---|---|
Kungsgatan 49 | City , Stockholm | 923 m² | https://objektvision.se//Beskriv/218003079?IsPremium=True |
Sveavägen 20 | City , Stockholm | 1 000 - 2 200 m² | https://objektvision.se//Beskriv/218017049?IsPremium=True |
Sergelgatan 8-14/Sveavägen 5-9 /Mäste... | City , Stockholm | 1 373 m² | https://objektvision.se//Beskriv/218030745?IsPremium=True |
Adolf Fredriks Kyrkogata 13 | Stockholm | 191 m² | https://objektvision.se//Beskriv/218031939 |
Arena Sergel - Malmskillnadsgatan 36 | City , Stockholm | 1 - 3 000 m² | https://objektvision.se//Beskriv/218006788 |
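To see why the link column was empty in the question, the two points from the answer can be checked offline. This is only a sketch: the html string below is hypothetical, simplified markup that reuses the class names from the question, not the site's real page. Searching inside an element only looks at its descendants, so asking the listing's <a> for another <a> returns None, while the href can be read straight from the element.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup mimicking a single listing card.
html = """
<a class="ov--list-item d-flex" href="/Beskriv/218031939">
  <div class="font-weight-bold text-ov street-address">
    Adolf Fredriks Kyrkogata 13
  </div>
</a>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find("a", class_="ov--list-item d-flex")

# find() searches descendants only; the card contains no nested <a>.
nested = card.find("a")
print(nested)  # None

# The href is an attribute of the card itself.
href = card["href"]                  # raises KeyError if the attribute is absent
same_href = card.get("href")         # returns None if the attribute is absent
missing = card.get("data-missing")   # None, no exception

# get_text(strip=True) trims the surrounding whitespace and newlines.
title = card.find("div", class_="street-address").get_text(strip=True)
print(href, title)
```

Using .get('href') may be the safer choice when some listings could lack the attribute, since it yields None instead of raising.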