Parsing div children with Beautiful Soup

Question

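I am using Beautiful Soup to find and parse street addresses on a page. Eventually, I want to write the street addresses to an Excel document.

This is the page I am trying to parse: https://montreal.lufa.com/en/pick-up-points

The page has div elements listed at the same level under a class. I can't parse the individual lines. Instead, my code just spits out everything under the class.

My code so far: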

from urllib.request import urlopen  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup

URL = "https://montreal.lufa.com/en/pick-up-points"
html = urlopen(URL).read().decode("utf-8")

soup = BeautifulSoup(html, "html5lib")

business = soup.find("div", class_="info")

print(business)

Any help would be greatly appreciated!

Answer 1

I would do the following: for each business, find the "days" div (the opening days) and collect every sibling that comes before it:

for business in soup.find_all('div', class_="info"):
    days = business.find("div", class_="days")

    print(" ".join(sibling.get_text(strip=True) 
                   for sibling in reversed(days.find_previous_siblings())))

Prints:

1600, René-Lévesque west 1600, René-Lévesque west Montreal, Quebec H3H 1P9
555 Chabanel Street West 555 Chabanel Street West Montreal, Quebec H2N 2H8
À la Boîte à Fleurs 3266 Saint-Rose Boulevard Laval, Quebec H7P 4K8
Allez Up Centre d'escalade 1555 St-Patrick Montreal, Quebec H3K 2B7
...
YMCA Cartierville 11885 Laurentien Boulevard Montreal, Quebec H4J 2R5
Zone, Real estate Agency 200 rue St-Jean Longueuil, Quebec J4H 2X5
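Since the live page may change, here is a small self-contained sketch of the same previous-siblings idea run against inline markup. Only the class names "info" and "days" come from the code above; the sample HTML, the other class names, and the helper name extract_addresses are made up for illustration, and the built-in "html.parser" is used just to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the "info" and "days" class names come from the
# answer; the other class names and all of the text are invented.
SAMPLE_HTML = """
<div class="info">
  <div class="name">Example Cafe</div>
  <div class="street">123 Example Street</div>
  <div class="city">Montreal, Quebec H0H 0H0</div>
  <div class="days">Tuesday, Thursday</div>
  <div class="hours">4 pm to 8 pm</div>
</div>
<div class="info">
  <div class="name">Sample Gym</div>
  <div class="street">456 Sample Avenue</div>
  <div class="city">Laval, Quebec H1H 1H1</div>
  <div class="days">Monday</div>
</div>
"""


def extract_addresses(html):
    """Return one 'name street city' string per div.info block."""
    soup = BeautifulSoup(html, "html.parser")
    addresses = []
    for business in soup.find_all("div", class_="info"):
        days = business.find("div", class_="days")
        if days is None:
            # Skip blocks that do not have the expected "days" div.
            continue
        # find_previous_siblings() returns the nearest sibling first,
        # so reverse the result to get the tags back in page order.
        parts = [sibling.get_text(strip=True)
                 for sibling in reversed(days.find_previous_siblings())]
        addresses.append(" ".join(parts))
    return addresses


addresses = extract_addresses(SAMPLE_HTML)
for address in addresses:
    print(address)
```

Anything that comes after the "days" div (the made-up "hours" div here) is left out, which is why this approach picks up only the name and address lines.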

Answer 2

Great answer, alecxe! Here is what I did to get it running on my machine...

#1) In the console:
pip install lxml

#2) Run the script below:
from urllib.request import urlopen  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup

URL = "https://montreal.lufa.com/en/pick-up-points"
html = urlopen(URL).read().decode("utf-8")

soup = BeautifulSoup(html, "lxml")

# Loop over every business block instead of the single soup.find() call
for business in soup.find_all("div", class_="info"):
    days = business.find("div", class_="days")

    # Siblings come back nearest-first, so reverse them into page order
    print(" ".join(sibling.get_text(strip=True)
                   for sibling in reversed(days.find_previous_siblings())))
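The question also mentions getting the addresses into Excel. Neither answer covers that, so the following is only a rough sketch of one option: writing a CSV file, which Excel can open, using the standard library. This swaps CSV in for a native .xlsx workbook (which would need a third-party package). The helper name write_addresses and the "address" column header are assumptions, and the two sample rows are copied from the output shown in Answer 1.

```python
import csv
import io


def write_addresses(addresses, handle):
    """Write one address per row to an open text handle, with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["address"])  # hypothetical column name
    for address in addresses:
        # Each address contains commas, so csv.writer quotes it automatically.
        writer.writerow([address])


# Sample rows taken from the output printed in Answer 1.
addresses = [
    "555 Chabanel Street West 555 Chabanel Street West Montreal, Quebec H2N 2H8",
    "YMCA Cartierville 11885 Laurentien Boulevard Montreal, Quebec H4J 2R5",
]

# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_addresses(addresses, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a real file, pass a handle such as open("addresses.csv", "w", newline="", encoding="utf-8-sig") instead of the buffer; the "utf-8-sig" encoding usually helps Excel display accented names like "René-Lévesque" correctly.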


Tags: html, python, beautifulsoup, html-parsing, python-3.x