Python, Beautiful Soup: how to get the desired element
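I'm trying to reach a specific element while parsing a site's source code.
Here is a snippet of the part I'm trying to parse (shown here only up to Friday); the markup is the same for every day of the week: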
<div id="intForecast">
<h2>Forecast for Rome</h2>
<table cellspacing="0" cellpadding="0" id="nonCA">
<tr>
<td onclick="showDetails('1');return false" id="day1" class="on">
<span>Thursday</span>
<div class="intIcon"><img src="http://icons.wunderground.com/graphics/conds/2005/sunny.gif" alt="sunny" /></div>
<div>Clear</div>
<div><span class="hi">H <span>22</span>°</span> / <span class="lo">L <span>11</span>°</span></div>
</td>
<td onclick="showDetails('2');return false" id="day2" class="off">
<span>Friday</span>
<div class="intIcon"><img src="http://icons.wunderground.com/graphics/conds/2005/partlycloudy.gif" alt="partlycloudy" /></div>
<div>Partly Cloudy</div>
<div><span class="hi">H <span>21</span>°</span> / <span class="lo">L <span>15</span>°</span></div>
</td>
</tr>
</table>
</div>
..... and so on for the remaining days
I do actually get my result, but in what I think is an ugly way:
forecastFriday= soup.find('div',text='Friday').findNext('div').findNext('div').string
As you can see, I dig down through the elements by repeating .findNext('div') until I finally reach .string.
What I want is the "Partly Cloudy" value for Friday.
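The chain works because find_next walks forward through the document in source order, not only through siblings. Below is a minimal sketch that traces each step, assuming Beautiful Soup 4 (bs4, where find_next is the newer spelling of findNext and string= is the newer name for text=) with the built-in html.parser, run on a trimmed copy of the Friday cell. In bs4, combining a tag name with string= only matches tags whose own .string equals the value, so the sketch passes string= alone.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the Friday cell from the question (image URL omitted).
SAMPLE_HTML = """
<td id="day2" class="off"><span>Friday</span>
<div class="intIcon"><img alt="partlycloudy" /></div>
<div>Partly Cloudy</div>
<div><span class="hi">H <span>21</span>°</span> / <span class="lo">L <span>15</span>°</span></div>
</td>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

label = soup.find(string='Friday')    # the NavigableString inside <span>
icon_div = label.find_next('div')     # 1st <div> after it: <div class="intIcon">
cond_div = icon_div.find_next('div')  # next <div> in source order: the condition
print(cond_div.string)

# Sturdier variant: climb to the enclosing cell, then index its direct <div> children.
cell = label.find_parent('td')
print(cell.find_all('div', recursive=False)[1].string)
```

Climbing to the enclosing <td> with find_parent ties the lookup to that day's own cell instead of counting <div>s forward through the rest of the page.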
So is there a more Pythonic way to do this?
Thanks!
Just find all the <td> elements and iterate over them:
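from bs4 import BeautifulSoup

soup = BeautifulSoup(your_html, 'html.parser')  # your_html: the page source
div = soup.find('div', id='intForecast')        # the forecast container
tds = div.find('table').find_all('td')          # one <td> per day
for td in tds:
    day = td('span')[0].text       # first <span>: the day name
    forecast = td('div')[1].text   # second <div>: the condition text
    print(day, forecast)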
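The same loop can be packaged so that one lookup answers "what is Friday's forecast?". Below is a minimal sketch, assuming Beautiful Soup 4 with its CSS select() support (provided by the soupsieve package that recent bs4 releases install) and a trimmed copy of the question's snippet. The helpers condition_of, forecasts_by_day and forecast_for_cell are hypothetical names, and forecast_for_cell assumes the dayN id pattern seen in the snippet (day1, day2) continues for the other days.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's snippet (image URLs omitted).
SAMPLE_HTML = """
<div id="intForecast">
<table id="nonCA"><tr>
<td id="day1" class="on"><span>Thursday</span>
<div class="intIcon"><img alt="sunny" /></div>
<div>Clear</div>
<div><span class="hi">H <span>22</span>°</span> / <span class="lo">L <span>11</span>°</span></div>
</td>
<td id="day2" class="off"><span>Friday</span>
<div class="intIcon"><img alt="partlycloudy" /></div>
<div>Partly Cloudy</div>
<div><span class="hi">H <span>21</span>°</span> / <span class="lo">L <span>15</span>°</span></div>
</td>
</tr></table>
</div>
"""


def condition_of(td):
    """Condition text of one day cell: 2nd direct <div> child (icon, condition, temps)."""
    return td.find_all('div', recursive=False)[1].get_text(strip=True)


def forecasts_by_day(soup):
    """Map every day name to its condition, e.g. {'Friday': 'Partly Cloudy'}."""
    return {
        td.find('span').get_text(strip=True): condition_of(td)
        for td in soup.select('#intForecast td')  # one <td> per day
    }


def forecast_for_cell(soup, cell_id):
    """Look up one day directly by its <td> id (assumes the dayN id pattern)."""
    return condition_of(soup.find('td', id=cell_id))


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(forecasts_by_day(soup)['Friday'])
print(forecast_for_cell(soup, 'day2'))
```

Building the dictionary once keeps the "which div holds the condition" knowledge in a single helper, so a change in the page layout only needs one fix.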