Using BeautifulSoup, how do I get text only from a specific selector, without the text of its children?

Question

I can't figure out how to make BeautifulSoup return only the text of the selected tag. I also get the text of its child(ren)!

For example:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<div id="left"><ul><li>"I want this text"<a href="someurl.com"> I don\'t want this text</a><p>I don\'t want this either</li><li>"Good"<a href="someurl.com"> Not Good</a><p> Not Good either</li></ul></div>', "html5lib") 
x = soup.select('ul > li')
for i in x:
    print(i.text)

Output:

"I want this text" I don't want this textI don't want this either

"Good" Not Good Not Good either

Desired output:

"I want this text"

"Good"

Answer 1

One option is to take the first element of the tag's contents list:

for i in x:
    print(i.contents[0])

Another is to find the first text node:

for i in x:
    print(i.find(text=True))

Both print:

"I want this text"
"Good"
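
Both options above return only the first text node. If a tag's own text might be split across several direct text nodes (for example, text both before and after a child tag), a minimal sketch along these lines collects all of them. The helper name own_text is hypothetical, and the sample uses the built-in html.parser with a well-formed version of the question's markup instead of html5lib:

```python
from bs4 import BeautifulSoup, NavigableString

html = (
    '<div id="left"><ul>'
    '<li>"I want this text"<a href="someurl.com"> I don\'t want this text</a>'
    "<p>I don't want this either</p></li>"
    '<li>"Good"<a href="someurl.com"> Not Good</a><p> Not Good either</p></li>'
    "</ul></div>"
)

def own_text(tag):
    """Join the text nodes that are direct children of tag (hypothetical helper)."""
    # tag.children yields only direct children, so text nested inside
    # <a> or <p> is never visited.
    parts = [child for child in tag.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()

soup = BeautifulSoup(html, "html.parser")
results = [own_text(li) for li in soup.select("ul > li")]
for line in results:
    print(line)
```

For the sample markup this prints the same two lines as the options above, since each li has exactly one text node of its own.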

Answer 2

from bs4 import BeautifulSoup
from bs4 import NavigableString
soup = BeautifulSoup('<div id="left"><ul><li>"I want this text"<a href="someurl.com"> I don\'t want this text</a><p>I don\'t want this either</li><li>"Good"<a href="someurl.com"> Not Good</a><p> Not Good either</li></ul></div>', "html5lib")
x = soup.select('ul > li')
for i in x:
    if isinstance(i.next_element, NavigableString):  # the element parsed right after <li> is a string
        print(i.next_element)
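
One thing to be aware of with next_element: it follows parse order, so for a tag with no children it points at whatever was parsed next, which can be text outside the tag. The following is a small sketch of a stricter check that looks only at the tag's own first child. The helper name first_own_string and the sample markup are made up for illustration:

```python
from bs4 import BeautifulSoup, NavigableString

def first_own_string(tag):
    """Return the tag's first child if it is a text node, else None (hypothetical helper)."""
    if tag.contents and isinstance(tag.contents[0], NavigableString):
        return tag.contents[0]
    return None

# Illustrative markup: an empty <li> followed by stray text inside the <ul>.
soup = BeautifulSoup(
    '<ul><li></li>outside<li>"Good"<a href="someurl.com"> Not Good</a></li></ul>',
    "html.parser",
)
empty_li, good_li = soup.select("ul > li")

# next_element leaves the empty <li> and lands on the text that follows it.
print(repr(empty_li.next_element))
# The stricter check reports that the empty <li> has no text of its own.
print(first_own_string(empty_li))
print(first_own_string(good_li))
```

For the markup in the question the two checks agree, because every li starts with a text node.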

python

beautifulsoup

html-parsing

web-scraping