Parsing table for a link

Question

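I've managed to isolate a single row in an HTML table using Beautiful Soup in Python 2.7. It's been a learning experience, and I'm glad to have gotten this far. Unfortunately, I'm now a bit stuck on the next step.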

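I need to get the link that follows the "Select document Remittance Report I format XLS" input. Because the order in which the formats appear can change, the lookup needs to be dynamic. I'm not sure how to find that input and then get the link that comes after it.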

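I've been experimenting with the findAll and nextSibling methods, but my inexperience with Python and Beautiful Soup is holding me back. The BeautifulSoup documentation is great, but it makes my head hurt a bit.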

<tr class="odd">
 <td header="c1">
  Report Download
 </td>
 <td header="c2">
  <input aria-label="Select Report format PDF" id="documentChkBx0" name="documentChkBx" type="checkbox" value="5446"/>
  <a href="/a/document.html?key=5446">
   <img alt="Portable Document Format" src="/img/icons/icon_PDF.gif">
   </img>
  </a>
  <input aria-label="Select Report format XLS" id="documentChkBx1" name="documentChkBx" type="checkbox" value="5447"/>
  <a href="/a/document.html?key=5447">
   <img alt="Excel Spreadsheet Format" src="/img/icons/icon_XLS.gif">
   </img>
  </a>
 </td>
 <td header="c4">
  04/27/2015
 </td>
 <td header="c5">
  05/26/2015
 </td>
 <td header="c6">
  05/26/2015 10:00AM EDT
 </td>
</tr>

Answer 1

Find the input by checking its aria-label attribute, then get the a element that follows it as a sibling:

label = soup.find("input", {"aria-label": "Select Report format XLS"})
link = label.find_next_sibling("a", href=True)["href"]
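
For reference, here is a minimal self-contained sketch of the same lookup that can be run on its own. It assumes the bs4 package (Beautiful Soup 4) is installed and uses a trimmed copy of the row from the question; the helper name find_link_after_input and the None fallback for a missing input or link are illustrative additions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question, wrapped in a table for context.
HTML = """
<table>
<tr class="odd">
 <td header="c2">
  <input aria-label="Select Report format PDF" id="documentChkBx0" name="documentChkBx" type="checkbox" value="5446"/>
  <a href="/a/document.html?key=5446"><img alt="Portable Document Format" src="/img/icons/icon_PDF.gif"/></a>
  <input aria-label="Select Report format XLS" id="documentChkBx1" name="documentChkBx" type="checkbox" value="5447"/>
  <a href="/a/document.html?key=5447"><img alt="Excel Spreadsheet Format" src="/img/icons/icon_XLS.gif"/></a>
 </td>
</tr>
</table>
"""


def find_link_after_input(soup, aria_label):
    """Return the href of the first <a> sibling after the matching input.

    Hypothetical helper: returns None if the input or the link is missing.
    """
    checkbox = soup.find("input", {"aria-label": aria_label})
    if checkbox is None:
        return None
    # find_next_sibling skips whitespace text nodes and other tags,
    # so the order of the PDF/XLS pairs within the cell does not matter.
    anchor = checkbox.find_next_sibling("a", href=True)
    return anchor["href"] if anchor is not None else None


soup = BeautifulSoup(HTML, "html.parser")
xls_link = find_link_after_input(soup, "Select Report format XLS")
print(xls_link)
```

Because the lookup keys on the aria-label text rather than on position, it should keep working if the PDF and XLS entries swap places, as long as each link stays immediately after its own checkbox within the same cell.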

html

python

beautifulsoup