How to correctly read a CSV file with single quotes?
I am trying to read a .csv file whose rows look like this:
gif,940ff2312-4325-8898dfs-9ce1ca56c5sfb,'[{"mid": "/m/083dsf", "description": "buff", "probability": 0.9663228988647461, "topic": 0.9663228988647461}]'
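I need to read these rows and put them into two lists: gif and bif. Each list must contain tuples of two items: the first string ('gif' in my example) and the list of dictionaries (the third element, in single quotes, in my example).
I don't know how to parse this properly, because read_csv raises an error. I tried plain string methods, which works, but fixing up the list of dictionaries is complicated and I don't think it is a good or optimal approach. I also tried JSON, which did not work.
Here is my approach: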
import json

gif = []
bif = []
with open('file.csv', 'r', encoding='utf-8') as file:
    lines = file.readlines()
    for line in lines:
        obj = line[:line.find(',')]
        arr = line[line.find('['):-2]
        json_acceptable_string = arr.replace("'", "\"")
        arr = json.loads(json_acceptable_string)
        if obj == 'gif':
            gif.append((obj, arr))
        elif obj == 'bif':
            bif.append((obj, arr))
Is there a better solution? Maybe I have misunderstood something, or there is a good trick in pandas?
Update: I also tried this:
import csv
gif = []
bif = []
with open('file.csv', 'rt', encoding='utf-8') as file:
    csv_reader = csv.reader(file, delimiter=',', quotechar="'")
    for line in csv_reader:
        for obj, Id, objArr in line:  # here I'm trying to split it in 3 objects
            if obj == 'gif':
                gif.append((obj, arr))
            elif obj == 'bif':
                bif.append((obj, arr))
But it raises:
ValueError: too many values to unpack (expected 3)
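The traceback is consistent with the inner loop iterating over the fields of a single row rather than over rows: each field is a string, and unpacking a string splits it into characters. The sketch below reproduces this with the sample row held in memory (the `SAMPLE` and `error_message` names are illustrative, not from the original post) and shows that the row can be unpacked directly instead.

```python
import csv
import io

# The sample row from the question, held in memory instead of file.csv.
SAMPLE = """gif,940ff2312-4325-8898dfs-9ce1ca56c5sfb,'[{"mid": "/m/083dsf", "description": "buff", "probability": 0.9663228988647461, "topic": 0.9663228988647461}]'\n"""

reader = csv.reader(io.StringIO(SAMPLE), delimiter=",", quotechar="'")
row = next(reader)

# `row` is already a list of three strings, so it can be unpacked directly.
obj, Id, obj_arr = row

# Iterating over `row` yields one field (a string) at a time. Unpacking a
# string splits it into characters: 'gif' happens to have exactly three,
# but the much longer id does not, which triggers the ValueError.
error_message = None
try:
    for a, b, c in row:
        pass
except ValueError as exc:
    error_message = str(exc)

print(obj)
print(error_message)
```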
You can use the quotechar formatting parameter to correctly parse the single-quoted JSON string:
import csv
with open('file.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',', quotechar="'")
    for row in reader:
        print(row)
        # If you want to parse the json, you can do:
        # `json.loads(row[-1])` (requires the json module)
        # Kudos to @juanpa.arrivillaga for the suggestion!
With the sample data you provided, this produces the following output, as desired:
['gif',
'940ff2312-4325-8898dfs-9ce1ca56c5sfb',
'[{"mid": "/m/083dsf", "description": "buff", "probability": 0.9663228988647461, "topic": 0.9663228988647461}]']
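Putting the pieces together, here is a minimal sketch of how the two lists from the question could be built with csv.reader and json.loads. It assumes every row has exactly three fields. The `split_rows` helper and the `SAMPLE` text are illustrative names, and the second sample row (the 'bif' one) is made up purely to exercise the second branch.

```python
import csv
import io
import json

# First row: the sample from the question.
# Second row: hypothetical data, invented only to exercise the 'bif' branch.
SAMPLE = (
    """gif,940ff2312-4325-8898dfs-9ce1ca56c5sfb,'[{"mid": "/m/083dsf", "description": "buff", "probability": 0.9663228988647461, "topic": 0.9663228988647461}]'\n"""
    """bif,00000000-0000,'[{"mid": "/m/000000", "description": "example", "probability": 0.5, "topic": 0.5}]'\n"""
)


def split_rows(lines):
    """Return (gif, bif): two lists of (label, list_of_dicts) tuples."""
    gif, bif = [], []
    reader = csv.reader(lines, delimiter=",", quotechar="'")
    # Each row is a list of three strings; the id is not needed here.
    for label, _id, payload in reader:
        pair = (label, json.loads(payload))
        if label == "gif":
            gif.append(pair)
        elif label == "bif":
            bif.append(pair)
    return gif, bif


gif, bif = split_rows(io.StringIO(SAMPLE))
print(gif)
print(bif)
```

With a real file, the same function should accept an open file object, for example `open('file.csv', newline='', encoding='utf-8')`, in place of the `io.StringIO` wrapper.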
The data in the CSV file is:
gif,940ff2312-4325-8898dfs-9ce1ca56c5sfb,'[{"mid": "/m/083dsf", "description": "buff", "probability": 0.9663228988647461, "topic": 0.9663228988647461}]'
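To handle the single-quoted array, the pandas read_csv function accepts a quotechar parameter (quotechar="'").
Read the file into a DataFrame:
import pandas as pd

df = pd.read_csv("touch.csv", header=None, quotechar="'", names=['key', 'code', 'arr'])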
One approach is to parse each value as JSON:
import json

for each in df['arr']:
    my_json = json.loads(each)  # each cell is JSON text; loads turns it into a list of dicts
    print(my_json)
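json.dumps and json.loads are easy to mix up here, so the following sketch contrasts them on the third field of the sample row. The `cell`, `encoded`, and `parsed` names are illustrative only.

```python
import json

# The third field of the sample row, as csv or pandas would hand it over.
cell = '[{"mid": "/m/083dsf", "description": "buff", "probability": 0.9663228988647461, "topic": 0.9663228988647461}]'

# dumps serializes: a str goes in and a JSON string literal (still a str,
# now wrapped in double quotes with inner quotes escaped) comes out.
encoded = json.dumps(cell)

# loads parses: the JSON text becomes a Python list of dicts.
parsed = json.loads(cell)

print(type(encoded).__name__)
print(type(parsed).__name__)
print(parsed[0]["description"])
```

For turning the column into Python objects, loads is the direction that matters; dumps is only useful when writing the data back out.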
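Another approach is to interpret the value as a dictionary data structure; the ast module comes in handy here. Read it as a string, then convert it to a dictionary.
import ast

# Stripping the brackets assumes each cell holds exactly one dictionary
# and that no value inside it contains '[' or ']'.
my_list_of_dictionary = [
    ast.literal_eval(each.replace("[", "").replace("]", ""))
    for each in df['arr']
]
for each_dict in my_list_of_dictionary:
    print(f"Type:{type(each_dict)} value: {each_dict}")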
Output: