Store lxml.etree._ElementTree objects in dataframe: TypeError: can't pickle lxml.etree._ElementTree objects

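I am trying to store lxml.etree._ElementTree objects in a DataFrame. Unfortunately, pandas fails when it tries to pickle a DataFrame that contains them. Is there a way to store them in a DataFrame anyway, or is there another way to keep all the information in a single file with good read/write speed and file size?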

Here is an example that reproduces the error:

import pandas as pd

from lxml import etree

s = '''<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>'''

# Parse the string into a root element, then wrap it in an ElementTree.
doc = etree.fromstring(s)
root = etree.ElementTree(doc)

df = pd.DataFrame(data=[["name1", "date1", root]], columns=["name", "date", "root"])
df.to_pickle(r"D:\test\test.pkl")
# TypeError: can't pickle lxml.etree._ElementTree objects

Traceback:

Traceback (most recent call last):

  File "<...>", line 2, in <module>
    df.to_pickle(r"D:\test\test.pkl")

  File "...\Anaconda\envs\...\lib\site-packages\pandas\core\generic.py", line 2771, in to_pickle
    to_pickle(self, path, compression=compression, protocol=protocol)

  File "...\Anaconda\envs\...\lib\site-packages\pandas\io\pickle.py", line 76, in to_pickle
    f.write(pickle.dumps(obj, protocol=protocol))

TypeError: can't pickle lxml.etree._ElementTree objects

For future readers: I fixed this by serializing each etree to an XML string before saving, and parsing it again after loading:

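# Before saving: serialize each ElementTree to XML bytes, which pickle can handle.
df["root"] = df["root"].map(lambda x: etree.tostring(x, encoding='utf8', method='xml'))
df.to_pickle(r"D:\test.pkl")

# After loading: parse the bytes and wrap each root element in an ElementTree again.
df = pd.read_pickle(r"D:\test.pkl")
df["root"] = df["root"].map(etree.fromstring).map(etree.ElementTree)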
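
For anyone who wants to try the round trip without touching the disk, here is a minimal self-contained sketch of the same idea. It assumes pandas and lxml are installed. The helper names tree_to_bytes and bytes_to_tree are made up for this example, and it pickles into memory with pickle.dumps rather than writing to the path used above.

```python
import pickle

import pandas as pd
from lxml import etree


def tree_to_bytes(tree):
    """Serialize an lxml ElementTree to UTF-8 encoded XML bytes (hypothetical helper)."""
    return etree.tostring(tree, encoding="utf-8", method="xml")


def bytes_to_tree(data):
    """Parse XML bytes back into an lxml ElementTree (hypothetical helper)."""
    return etree.ElementTree(etree.fromstring(data))


tree = etree.ElementTree(etree.fromstring("<note><to>Tove</to><from>Jani</from></note>"))
df = pd.DataFrame([["name1", "date1", tree]], columns=["name", "date", "root"])

# Replace the unpicklable trees with plain bytes, then pickle the frame in memory.
df["root"] = df["root"].map(tree_to_bytes)
payload = pickle.dumps(df)

# Load the frame again and rebuild the trees from the stored bytes.
restored = pickle.loads(payload)
restored["root"] = restored["root"].map(bytes_to_tree)
restored_to = restored.loc[0, "root"].getroot().findtext("to")
```

Because the column holds plain bytes while it is stored, the same conversion should also work with other storage formats that accept binary or string columns, not only pickle.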