Grabbing links inside the td
The script below works, but I'd like to add each item's href link to produce better data output. Any help is appreciated. Thanks.
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}
url = "https://bscscan.com/token/generic-tokenholders2?m=normal&a=0x0D0b63b32595957ae58D4dD60aa5409E79A5Aa96"

s = requests.Session()
r = s.get(url, headers=headers, timeout=5)
soupblockdetails = BeautifulSoup(r.content, 'html.parser')

# Select only rows that contain <td> cells; max value is 50
for row in soupblockdetails.select("tr:has(td)")[:3]:
    cells = row.find_all("td")
    item1 = cells[0].text.strip()  # rank
    item2 = cells[1].text.strip()  # holder label or address
    item3 = cells[2].text.strip()  # quantity
    print("{:<2} {:<43} {:>25}".format(item1, item2, item3))
Current output:
1 KIPS: Locked Wallet 1,870.828693386970691791
2 0xe72d1910c07420a99a2649f40910f692cd87309e 6.849012043043023775
3 0x138fe04c8f7da181765bde237ef5e78546677f5f 2.153134069327832213
Required output:
1 KIPS: Locked Wallet 1,870.828693386970691791 0x81e0ef68e103ee65002d3cf766240ed1c070334d
2 0xe72d1910c07420a99a2649f40910f692cd87309e 6.849012043043023775 0xe72d1910c07420a99a2649f40910f692cd87309e
3 0x138fe04c8f7da181765bde237ef5e78546677f5f 2.153134069327832213 0x138fe04c8f7da181765bde237ef5e78546677f5f
From the second <td>, access the <a> and use .get('href') to extract the href value. To get only the parameter value, just split the URL:
item4 = row.find_all("td")[1].a.get('href').split('a=')[-1]
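If the href ever carries more than one query parameter, splitting on 'a=' could pick up trailing text. A minimal sketch of an alternative, using the standard library's urllib.parse to read the `a` parameter by name. The helper name `address_from_href` and the sample href are hypothetical; the sketch assumes the links look roughly like `/token/<contract>?a=<holder>`, which the page itself does not guarantee.

```python
from urllib.parse import urlparse, parse_qs


def address_from_href(href, param="a"):
    """Return the first value of query parameter `param` in href, or None if absent."""
    query = urlparse(href).query
    values = parse_qs(query).get(param)
    return values[0] if values else None


# Hypothetical href, shaped like the links assumed above
sample = "/token/0x0D0b63b32595957ae58D4dD60aa5409E79A5Aa96?a=0x81e0ef68e103ee65002d3cf766240ed1c070334d"
print(address_from_href(sample))
```

Inside the loop this could stand in for the split call, for example `item4 = address_from_href(row.find_all("td")[1].a.get('href'))`.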
In your loop:
for row in soupblockdetails.select("tr:has(td)")[:3]:  # max value is 50
    cells = row.find_all("td")
    item1 = cells[0].text.strip()
    item2 = cells[1].text.strip()
    item3 = cells[2].text.strip()
    # The address is the value of the "a" parameter in the link's href
    item4 = cells[1].a.get('href').split('a=')[-1]
    print("{:<2} {:<43} {:>25} {}".format(item1, item2, item3, item4))
Output:
1 KIPS: Locked Wallet 1,870.828693386970691791 0x81e0ef68e103ee65002d3cf766240ed1c070334d
2 0xe72d1910c07420a99a2649f40910f692cd87309e 6.849012043043023775 0xe72d1910c07420a99a2649f40910f692cd87309e
3 0x138fe04c8f7da181765bde237ef5e78546677f5f 2.153134069327832213 0x138fe04c8f7da181765bde237ef5e78546677f5f
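The column layout comes from the format specification: `<2` and `<43` left-align the rank and label, `>25` right-aligns the quantity, and the final `{}` appends the address unpadded. A small offline sketch that applies the same specification to values copied from the first output row; `format_row` is a hypothetical helper name, not part of the original script.

```python
def format_row(rank, label, quantity, address):
    """Format one holder row with the same column widths as the loop above."""
    return "{:<2} {:<43} {:>25} {}".format(rank, label, quantity, address)


# Values copied from the first row of the output above
line = format_row(
    "1",
    "KIPS: Locked Wallet",
    "1,870.828693386970691791",
    "0x81e0ef68e103ee65002d3cf766240ed1c070334d",
)
print(line)
```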