Unable to parse a list of values into a list of strings
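I need to parse a list of values in Python and one-hot encode them for feature engineering. Below is the value of one sample from the 'amenities' column of my feature set.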
x = {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}
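The problem is that it has curly braces "{}" as well as values that should be double-quoted but are not (see Kitchen and Heating in the example above). If I could convert the above into a string, I would know how to remove the braces and split it into a list.
I need to convert the above into a list of items in which the values that are not in double quotes also become strings.
The input data looks corrupted. Still, the simplest approach is to remove the double quotes and then split on commas (I have left the curly braces out here, since they can be removed just as easily):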
s = '"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"'
print(s.replace('"','').split(","))
Result:
['Wireless Internet', 'Air conditioning', 'Kitchen', 'Heating', 'Family/kid friendly', 'Essentials', 'Hair dryer', 'Iron', 'translation missing: en.hosting_amenity_50']
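Of course, if the data itself contains commas, you're out of luck: because of the missing quotes, there is no way to tell a comma inside a field from a separator comma (otherwise ast.literal_eval could parse it).
Stripping the curly braces as well takes a bit more dirty work, but it is doable: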
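If the fields that contain commas are at least consistently double-quoted (an assumption about the data, not something the sample confirms), the standard library's csv module may be a safer alternative to replace-and-split, since it accepts a mix of quoted and unquoted fields. A minimal sketch; the helper name parse_amenities and the extra "Cable TV, premium" value are made up for illustration:

```python
import csv

def parse_amenities(raw):
    """Hypothetical helper: turn '{a,"b c",d}' into a list of strings."""
    # Drop surrounding whitespace and the outer curly braces.
    inner = raw.strip().strip("{}")
    if not inner:
        return []
    # csv.reader treats "..." as one field, so commas inside quotes survive.
    return next(csv.reader([inner]))

# Made-up sample with a comma inside a quoted field.
sample = '{"Wireless Internet",Kitchen,Heating,"Cable TV, premium"}'
print(parse_amenities(sample))
```

A comma inside an unquoted field would still be split, so this only helps with the quoted case.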
s = 'x = {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}'
print(s.replace('"','').split("{")[1].rstrip('}').split(","))
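Once every row has been turned into a list of strings this way, the one-hot encoding mentioned in the question could be sketched roughly as follows. This is a plain-Python illustration only; the function name one_hot and the two sample rows are assumptions, and a library such as pandas or scikit-learn would normally be used for real feature engineering:

```python
def one_hot(rows):
    """Hypothetical helper: rows is a list of lists of amenity names."""
    # Collect every distinct amenity and fix a stable column order.
    vocab = sorted({item for row in rows for item in row})
    index = {name: i for i, name in enumerate(vocab)}
    matrix = []
    for row in rows:
        vec = [0] * len(vocab)
        for item in row:
            vec[index[item]] = 1  # mark the amenity as present
        matrix.append(vec)
    return vocab, matrix

# Made-up rows, as produced by the split shown above.
rows = [["Kitchen", "Heating"], ["Iron", "Kitchen"]]
vocab, matrix = one_hot(rows)
print(vocab)
print(matrix)
```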