Grabbing a td element when web scraping in Python
I'm trying to capture an HTML element like the following:
<td align="right" valign="top" class="tabletext2" nowrap="nowrap"> <strong>Program Element Code(s):</strong></td>
The website is http://www.nsf.gov/awardsearch/showAward?AWD_ID=1227110&HistoricalAwards=false
The Python script is as follows:
import requests
from bs4 import BeautifulSoup

i = 1300138
url = "http://www.nsf.gov/awardsearch/showAward?AWD_ID=" + str(i) + "&HistoricalAwards=false"
r = requests.get(url)
# webbrowser.open(url, new=new)
sp = BeautifulSoup(r.content)
gd = sp.findAll('td', {'class': 'tabletext2'}, nowrap="nowrap")
for item in gd:
    print(item.text)
    # Exact comparison against the label text
    if item.text == "Program Element Code(s):":
        print(item.contents)
But I can't get it to work. I need to get the IDs listed next to the Program Reference Code(s) label.
Any help is appreciated. Thanks.
One way is to find the right "class":"tabletext2" td and grab the next td after it:
import requests
from bs4 import BeautifulSoup

url = "http://www.nsf.gov/awardsearch/showAward?AWD_ID=1227110&HistoricalAwards=false"
r = requests.get(url)
# Collect the label cells, then take the td that follows the matching label.
tds = BeautifulSoup(r.content, "html.parser").find_all("td", {"class": "tabletext2"})
print([td.find_next("td").text.strip() for td in tds if td.text.strip().startswith("Program Reference Code(s)")])
[u'131E, 113E, 8048, 7433']
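The same idea can be tried offline against a small HTML snippet. This is only a sketch: `SAMPLE_HTML` is a hypothetical stand-in for the award page (the real markup may differ, and the element codes in it are placeholders), and `value_after_label` is a helper name made up for illustration. It also shows a likely reason the exact `==` comparison in the question never matches: the cell text starts with a space, so the label has to be stripped before comparing.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the award page. Label cells carry class
# "tabletext2"; each value is assumed to sit in the td right after its label.
# "0000, 1111" is a placeholder, not real data.
SAMPLE_HTML = """
<table>
  <tr>
    <td align="right" valign="top" class="tabletext2" nowrap="nowrap"> <strong>Program Reference Code(s):</strong></td>
    <td>131E, 113E, 8048, 7433</td>
  </tr>
  <tr>
    <td align="right" valign="top" class="tabletext2" nowrap="nowrap"> <strong>Program Element Code(s):</strong></td>
    <td>0000, 1111</td>
  </tr>
</table>
"""


def value_after_label(html, label):
    """Return the stripped text of the td following the label cell, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for td in soup.find_all("td", {"class": "tabletext2"}):
        # get_text(strip=True) removes the leading space that makes an
        # exact == comparison against the raw .text fail.
        if td.get_text(strip=True).startswith(label):
            value_td = td.find_next("td")
            return value_td.get_text(strip=True) if value_td else None
    return None


print(value_after_label(SAMPLE_HTML, "Program Reference Code(s)"))
print(value_after_label(SAMPLE_HTML, "Program Element Code(s)"))
```

To run it against the live page, pass `requests.get(url).content` instead of `SAMPLE_HTML`; whether the value cell really follows the label cell there is something to check in the page source.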