Writing web-scraped data into CSV
Using the code below, I can scrape product information from two websites. My goal is to write the scraped data to a CSV, with column A holding the spans of class "label" and column B the spans of class "value".
Can anyone help me achieve this?
from bs4 import BeautifulSoup
import requests
import pandas as pd

url_list = ["https://21shares.com/product/abtc", "https://21shares.com/product/aeth/"]

for link in url_list:
    r = requests.get(link)
    r.encoding = 'uft-8'
    html_content = r.text
    soup = BeautifulSoup(html_content, "lxml")
    datas = soup.find('div', {'class': 'product-sidebar-container'})
    for data in datas:
        soup.findAll("span", {"class": "Label", "Value": True})
        print(data.getText(separator=("\n")))
You have a typo there: it should be 'utf-8'.

Initialize an empty list, and while looping over the matched label spans build a dict and append it to the list. Once you have that list, it is easy to convert it into a DataFrame and write it to CSV with pandas. Your code, modified along those lines:
from bs4 import BeautifulSoup
import requests
import pandas as pd

url_list = ["https://21shares.com/product/abtc", "https://21shares.com/product/aeth/"]

rows = []
for link in url_list:
    r = requests.get(link)
    r.encoding = 'utf-8'
    html_content = r.text
    soup = BeautifulSoup(html_content, "lxml")
    # Restrict the search to the sidebar so unrelated spans elsewhere on
    # the page are ignored
    datas = soup.find('div', {'class': 'product-sidebar-container'})
    for each in datas.findAll("span", {"class": "label"}):
        label = each.text
        # The matching value is the next 'value' span after each label
        value = each.find_next('span', {'class': 'value'}).text
        rows.append({'label': label, 'value': value})

df = pd.DataFrame(rows)
df.to_csv('file.csv', index=False)
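As a side note, the final two steps work in isolation: a list of dicts converts directly into a two-column DataFrame (the dict keys become the column headers), and index=False keeps the header row but drops pandas' numeric row index. The label/value pairs below are made-up placeholders, not real scraped data:

import pandas as pd

# One dict per scraped label/value pair (placeholder values here)
rows = [
    {'label': 'Ticker', 'value': 'ABTC'},
    {'label': 'Currency', 'value': 'USD'},
]

df = pd.DataFrame(rows)
csv_text = df.to_csv(index=False)  # returns the CSV as a string when no path is given
print(csv_text)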