How do I use Python and BeautifulSoup to scrape data from an HTML table?

Question

If you look at this page, https://metals-api.com/currencies, there is an HTML table with 2 columns. I would like to extract all the rows from column 1 into a list/array. How do I go about this?

import requests
from bs4 import BeautifulSoup

URL = "https://metals-api.com/currencies"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

with open('outpu2t.txt', 'w', encoding='utf-8') as f:
    f.write(soup.text)

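To clarify, I am not looking to run price-fetching commands against these codes. I am trying to compile a list of the codes so that I can add them to a drop-down menu in my application.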

Answer 1

If I understand the question correctly, you can try the following example:

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://metals-api.com/currencies"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

# Collect the text of the first cell (the code) in every table body row.
data = []
for cell in soup.select('.table tbody tr td:nth-child(1)'):
    data.append(cell.text)

df = pd.DataFrame(data, columns=['code'])
# df.to_csv('code.csv', index=False)  # uncomment to store the data
print(df)

Output:

     code
0     XAU
1     XAG
2     XPT
3     XPD
4     XCU
..    ...
209  LINK
210   XLM
211   ADA
212   BCH
213   LTC

[214 rows x 1 columns]
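
Since the question asks for a plain list rather than a DataFrame, the same CSS selector can feed a list comprehension directly. Below is a minimal sketch run against a small inline HTML sample. The markup in SAMPLE_HTML and the function name extract_codes are assumptions made for illustration, not the real page; against the live site you would pass page.content instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<table class="table">
  <thead><tr><th>Code</th><th>Name</th></tr></thead>
  <tbody>
    <tr><td> XAU </td><td>Gold</td></tr>
    <tr><td>XAG</td><td>Silver</td></tr>
    <tr><td>XPT</td><td>Platinum</td></tr>
  </tbody>
</table>
"""


def extract_codes(html):
    """Return the text of the first cell of every table body row as a list."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.select(".table tbody tr td:nth-child(1)")
    # get_text(strip=True) drops surrounding whitespace from each cell.
    return [cell.get_text(strip=True) for cell in cells]


codes = extract_codes(SAMPLE_HTML)
print(codes)
```

The resulting list can be handed straight to whatever populates the drop-down menu, with no pandas dependency.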

Answer 2

I stand corrected. I initially just tried pd.read_html("https://metals-api.com/currencies"), which usually works but did not here. With a small workaround, though, it works fine:

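import pandas as pd
import requests

URL = "https://metals-api.com/currencies"
# Workaround: fetch the page with requests, then pass the HTML to pandas.
page = requests.get(URL)
# read_html returns a list of DataFrames, one per table; take the first.
df = pd.read_html(page.content)[0]
print(df)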

Output:

     Code                                               Name
0     XAU  1 Ounce of 24K Gold. Use Carat endpoint to dis...
1     XAG                                             Silver
2     XPT                                           Platinum
3     XPD                                          Palladium
4     XCU                                             Copper
..    ...                                                ...
209  LINK                                          Chainlink
210   XLM                                            Stellar
211   ADA                                            Cardano
212   BCH                                       Bitcoin Cash
213   LTC                                           Litecoin

[214 rows x 2 columns]
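
To get from this DataFrame to the list the question asks for, the Code column can be converted with tolist(). The sketch below uses a small hand-built DataFrame as a stand-in for the result of read_html, so the three rows shown are illustrative sample data, not the full table.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pd.read_html(...)[0].
df = pd.DataFrame(
    {
        "Code": ["XAU", "XAG", "XPT"],
        "Name": ["Gold", "Silver", "Platinum"],
    }
)

# Select the first column by name and convert it to a plain Python list.
codes = df["Code"].tolist()
print(codes)
```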

python

beautifulsoup