Select multiple elements with BeautifulSoup and manage them individually

Question

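I'm using BeautifulSoup to parse a web page of poetry. The poetry is split into h3 elements for the poem titles and .line elements for each line of verse. I can get both kinds of element and add them to a list. But I want to convert each h3 to uppercase, follow it with a line break, and then insert it into the list of lines.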

    linesArr = []
    for lines in full_text:
        booktitles = lines.select('h3')
        for booktitle in booktitles:
            linesArr.append(booktitle.text.upper())
            linesArr.append('')
        for line in lines.select('h3, .line'):
            linesArr.append(line.text)

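This code appends all the book titles to the beginning of the list and then goes on to collect the h3 and .line items. I tried inserting the check into the loop like this: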

    linesArr = []
    for lines in full_text:
        for line in lines.select('h3, .line'):
            if line.find('h3'):
                linesArr.append(line.text.upper())
                linesArr.append('')
            else:
                linesArr.append(line.text)

Answer 1

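I'm not sure exactly what you're trying to do, but here is a way to get a list containing the title in uppercase followed by all of your lines: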

    #!/usr/bin/python3
    # coding: utf8

    from bs4 import BeautifulSoup
    import requests

    page = requests.get("https://quod.lib.umich.edu/c/cme/CT/1:1?rgn=div2;view=fulltext")
    soup = BeautifulSoup(page.text, 'html.parser')

    # First h3 on the page is the title; every div.line is one line of verse
    title = soup.find('h3')
    full_lines = soup.find_all('div',{'class':'line'})

    linesArr = []
    linesArr.append(title.get_text().upper())
    for line in full_lines:
        linesArr.append(line.get_text())

    # Print full array with the title and text
    print(linesArr)

    # Print text here with line break
    for linea in linesArr:
        print(linea + '\n')
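If the page has several h3 titles and each one should stay in place among its lines, one option is to keep the single `select('h3, .line')` loop from the question and test the tag name of each element. `line.find('h3')` looks for an h3 *inside* the element, so it returns `None` when the element is itself the h3; `element.name` checks the element's own tag. Below is a minimal sketch of that idea. The `SAMPLE_HTML` markup and the `collect_lines` helper are made up for illustration and only assume the page looks roughly like this:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's structure may differ.
SAMPLE_HTML = """
<div class="poem">
  <h3>The Knight's Tale</h3>
  <div class="line">Whilom, as olde stories tellen us,</div>
  <div class="line">Ther was a duc that highte Theseus;</div>
  <h3>The Miller's Tale</h3>
  <div class="line">Whilom ther was dwellynge at Oxenford</div>
</div>
"""


def collect_lines(html):
    """Return titles (uppercased, followed by an empty entry) and lines in page order."""
    soup = BeautifulSoup(html, "html.parser")
    lines_arr = []
    # select() yields matches in document order, so titles stay next to their lines
    for element in soup.select("h3, .line"):
        if element.name == "h3":
            # The element itself is an h3: uppercase it and add a blank entry after it
            lines_arr.append(element.get_text(strip=True).upper())
            lines_arr.append("")
        else:
            lines_arr.append(element.get_text(strip=True))
    return lines_arr


result = collect_lines(SAMPLE_HTML)
for entry in result:
    print(entry)
```

To run this against the real page, `page.text` from the `requests.get` call above could be passed to `collect_lines` in place of `SAMPLE_HTML`, assuming the titles and lines there really are `h3` and `.line` elements.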
