"Type 'lxml.etree._ElementUnicodeResult' cannot be serialized"

Question

I am using lxml to extract data from a web page, but I cannot convert the resulting _ElementUnicodeResult objects to strings. Here is my code:

import requests
from lxml import html
from lxml import etree

url = 'https://www.imdb.com/title/tt5848272/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2413b25e-e3f6-4229-9efd-599bb9ab1f97&pf_rd_r=9S5A89ZHEXE4K8SZBC40&pf_rd_s=right-2&pf_rd_t=15061&pf_rd_i=homepage&ref_=hm_otw_t0'

# Pass the url variable, not the string literal 'url'
page = requests.get(url)
tree = html.fromstring(page.content)

# The trailing text() makes XPath return text results, not elements
a = tree.xpath('//div[@class="credit_summary_item"]/a[../h4/text() = "Directors:"]/text()')
mynewlist = []
for i in a:
    # Raises the TypeError below: i is a string result, not an element
    b = etree.tostring(i, method="text")
    mynewlist.append(b)

This is the error I get:

TypeError: Type 'lxml.etree._ElementUnicodeResult' cannot be serialized.

Any help would be appreciated.

Answer 1

The i variable is an _ElementUnicodeResult object (a special type of string). You cannot use it as an argument to tostring().
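To illustrate the difference, here is a minimal sketch, assuming a small inline HTML snippet (made up for this example) instead of the real page. It shows tostring() working on an element and failing on an XPath text() result. The names link, text_result, as_text and error are hypothetical.

```python
from lxml import etree, html

# Hypothetical sample markup; the real page is not fetched here.
doc = html.fromstring('<p><a href="/x">Jane <b>Doe</b></a> directs</p>')
link = doc.xpath('//a')[0]                # an element
text_result = link.xpath('text()')[0]     # a string result, not an element

# tostring() accepts an element; with_tail=False drops the text after </a>.
as_text = etree.tostring(link, method="text", encoding="unicode", with_tail=False)
print(as_text)

# Passing the string result instead reproduces the error from the question.
error = None
try:
    etree.tostring(text_result, method="text")
except TypeError as exc:
    error = exc
print(error)
```

So if tostring() is really wanted, one option is to drop the trailing /text() from the XPath and serialize the matched elements.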

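The a variable (the result of the XPath evaluation) is already the list of strings you want. If the elements of this list must be plain strings rather than _ElementUnicodeResult objects, you can use a list comprehension: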

newlist = [str(s) for s in a]
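A self-contained sketch of that conversion, assuming an inline snippet that imitates the page structure targeted by the XPath in the question (the names and links in SAMPLE are made up):

```python
from lxml import html

# Hypothetical stand-in for the relevant part of the page.
SAMPLE = """
<div class="credit_summary_item">
  <h4>Directors:</h4>
  <a href="/name/nm1">Jane Doe</a>,
  <a href="/name/nm2">John Roe</a>
</div>
"""

tree = html.fromstring(SAMPLE)
a = tree.xpath(
    '//div[@class="credit_summary_item"]/a[../h4/text() = "Directors:"]/text()'
)

# The results are instances of a str subclass, so they already behave as text.
already_strings = all(isinstance(s, str) for s in a)

# str() yields plain built-in strings when the exact type matters.
newlist = [str(s) for s in a]
print(newlist)
```

The conversion is only needed when some other code checks for the exact str type; for printing, comparing, or writing to a file, the XPath results can be used as they are.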

Answer 2

I also could not convert 'lxml.etree._ElementUnicodeResult' to a string.

Then I found the following link.

https://lxml.de/api/lxml.etree._ElementUnicodeResult-class.html

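You can see that _ElementUnicodeResult inherits most of its functionality from unicode (str in Python 3).

I used the __str__() function to convert it to a string type.

It also directly supports some other string operations, which you can look up at the link. Hope this helps ;)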
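As a rough illustration of what such a result supports, the sketch below uses ordinary string methods together with the extra getparent(), is_text and is_tail members that lxml attaches to XPath string results. The markup and variable names are made up for the example.

```python
from lxml import html

# Hypothetical sample markup.
doc = html.fromstring('<p><a href="/x">Jane Doe</a> and others</p>')

text_node = doc.xpath('//a/text()')[0]   # text inside <a>
tail_node = doc.xpath('//p/text()')[0]   # text following </a>

# Ordinary string methods work directly on the result.
upper = text_node.upper()
words = text_node.split()

# Extra information about where the text came from.
parent_tag = text_node.getparent().tag
print(upper, words, parent_tag, text_node.is_text, tail_node.is_tail)

# Calling __str__() works, but str(text_node) is the usual spelling.
plain = str(text_node)
```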

Answer 3

text = ''.join([str(s) for s in elementUnicodeResult])
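A short sketch of how this one-liner behaves, assuming made-up sample markup. When the variable holds the list returned by xpath(), the join concatenates the pieces; when it holds a single result, iterating yields its characters and the join rebuilds the same text as a plain str.

```python
from lxml import html

# Hypothetical sample markup.
tree = html.fromstring('<div><a>Jane Doe</a><a>John Roe</a></div>')
names = tree.xpath('//a/text()')

# Joining a list of results, here with a separator for readability.
joined = ', '.join(str(s) for s in names)

# Joining a single result character by character gives back the same text.
single = ''.join(str(ch) for ch in names[0])

print(joined)
print(single)
```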

python

lxml

web-scraping