Unable to parse a list of values into a list of strings

I need to parse a list of values in Python and one-hot encode them for feature engineering. Below is one sample value from the 'amenities' column of my feature set.

x = {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}

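The problem is that it has curly braces "{}" and also contains values that should be in double quotes but are not (see Kitchen and Heating in the example above). If I could turn the above into a single string, I would know how to strip the braces and split it into a list.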

I need to convert the above into a list of items in which the values that are not in double quotes also become strings.

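The input data looks corrupted. Still, the simplest approach is to remove the double quotes and then split on commas (I have left out the curly braces here, since they can be removed just as easily):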

s = '"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"'

print(s.replace('"','').split(","))

Result:

['Wireless Internet', 'Air conditioning', 'Kitchen', 'Heating', 'Family/kid friendly', 'Essentials', 'Hair dryer', 'Iron', 'translation missing: en.hosting_amenity_50']

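Of course, if a value itself contains a comma, you're out of luck: once the quotes are gone there is no way to tell a comma inside a field from a separator comma. The missing quotes are also why ast.literal_eval cannot be used here; if every value were quoted, it could have parsed the data directly.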
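If the values that contain commas happen to still be wrapped in double quotes, the standard-library csv module may cope where the plain split does not, because it treats a quoted comma as part of the field. The following is only a sketch under that assumption; the value "Cable TV, premium" is a made-up example and does not appear in the original data.

```python
import csv

# Hypothetical input: one quoted value contains a comma, others are unquoted.
s = '"Wireless Internet","Air conditioning",Kitchen,"Cable TV, premium",Iron'

# csv.reader accepts any iterable of lines, so a one-element list holds our row.
# Quoted fields keep their embedded commas; unquoted fields are read as-is.
items = next(csv.reader([s]))

print(items)
```

This does not help when a value with a comma is itself unquoted; in that case the ambiguity described above remains.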

Stripping the curly braces entirely takes a bit more dirty work, but it is doable:

s = 'x = {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}'

print(s.replace('"','').split("{")[1].rstrip('}').split(","))
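Since the stated goal is one-hot encoding, here is a minimal sketch of how the parsing step could feed into it using plain Python. It assumes each cell of the column is just the braced text (without the "x = " prefix) and that no value contains a comma; the helper name parse_amenities and the sample rows are hypothetical, chosen only for illustration.

```python
def parse_amenities(raw):
    """Turn '{"A",B,"C"}' into ['A', 'B', 'C'] (assumes no embedded commas)."""
    inner = raw.strip().strip("{}")  # drop surrounding whitespace and braces
    if not inner:
        return []  # an empty '{}' cell has no amenities
    return inner.replace('"', '').split(",")

# Hypothetical sample rows standing in for the 'amenities' column.
rows = [
    '{"Wireless Internet",Kitchen,Heating}',
    '{Kitchen,"Hair dryer"}',
    '{}',
]

parsed = [parse_amenities(r) for r in rows]

# Sorted list of every distinct amenity; one output column per entry.
vocabulary = sorted({item for items in parsed for item in items})

# One row of 0/1 flags per input row, aligned with `vocabulary`.
one_hot = [[1 if name in items else 0 for name in vocabulary] for items in parsed]

print(vocabulary)
for flags in one_hot:
    print(flags)
```

In a real pipeline a library encoder would likely replace the last step; the point here is only that the cleaned lists are enough to build the indicator columns.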