How can I assign web scraping outputs to an array using Python?

Question

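I want to scrape this page and get all the values of the title and href attributes. The code runs and I do get all the data I need, but I want to assign the output to an array. When I try to assign it, I only get the last instance of the attribute in the HTML.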

from bs4 import BeautifulSoup
import urllib

r = urllib.urlopen('http://www.genome.jp/kegg-bin/show_pathway?map=hsa05215&show_description=show').read()
soup = BeautifulSoup((r), "lxml")
for area in soup.find_all('area', href=True):
    print area['href']
for area in soup.find_all('area', title=True):
    print area['title']

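In case it helps: I'm doing this because I will later build a list from this data. I have only just started learning, so any extra explanation is much appreciated.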
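The question does not show the assignment attempt, so this is an assumption: the usual cause of the "only the last instance" symptom is rebinding a variable inside the loop, so each iteration replaces the previous value. A minimal sketch, using plain dicts as hypothetical stand-ins for the area tags so it runs without network access or BeautifulSoup:

```python
# Hypothetical stand-ins for the <area> tags returned by soup.find_all('area').
# A bs4 Tag also supports tag['href'] lookups, so the loop logic is the same.
areas = [
    {'href': '/page1', 'title': 'First area'},
    {'href': '/page2', 'title': 'Second area'},
    {'href': '/page3', 'title': 'Third area'},
]

# Rebinding a name inside the loop: each pass overwrites the previous value,
# so only the last href survives after the loop ends.
for area in areas:
    last_href = area['href']

# Appending to a list created before the loop keeps every value.
hrefs = []
for area in areas:
    hrefs.append(area['href'])

print(last_href)  # /page3
print(hrefs)      # ['/page1', '/page2', '/page3']
```

The append version and a list comprehension produce the same list; the comprehension is just the shorter form.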

Answer 1

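Use list comprehensions. They collect a value from every tag the loop visits into a new list, instead of overwriting a single variable on each pass: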

links = [area['href'] for area in soup.find_all('area', href=True)]
titles = [area['title'] for area in soup.find_all('area', title=True)]
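The comprehensions above can be tried offline on a small HTML snippet. This sketch assumes beautifulsoup4 is installed; it uses the built-in "html.parser" instead of "lxml" so no extra parser is needed, and the HTML string is invented for illustration. To fetch the real page under Python 3, urllib.urlopen becomes urllib.request.urlopen.

```python
from bs4 import BeautifulSoup

# Invented HTML standing in for the KEGG pathway page (an assumption).
html = """
<map name="demo">
  <area shape="rect" coords="0,0,10,10" href="/page1" title="First area">
  <area shape="rect" coords="10,0,20,10" href="/page2" title="Second area">
  <area shape="rect" coords="20,0,30,10" href="/page3">
</map>
"""

soup = BeautifulSoup(html, "html.parser")

# One list per attribute, as in the answer.
links = [area['href'] for area in soup.find_all('area', href=True)]
titles = [area['title'] for area in soup.find_all('area', title=True)]

# The two lists can differ in length when some <area> lacks a title,
# so build (href, title) pairs only from tags that have both attributes.
pairs = [(area['href'], area['title'])
         for area in soup.find_all('area', href=True, title=True)]

print(links)   # ['/page1', '/page2', '/page3']
print(titles)  # ['First area', 'Second area']
print(pairs)   # [('/page1', 'First area'), ('/page2', 'Second area')]
```

Filtering on both attributes in a single find_all call keeps each href next to its own title, which is safer than zipping two independently filtered lists.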


urllib2

beautifulsoup

web-scraping

python-2.7