Converting a .csv.gz to .csv in Python 2.7

Question

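I have read the documentation and a few other posts on SO and elsewhere, but I can't quite figure out this concept:

When you call csvFilename = gzip.open(filename, 'rb') and then reader = csv.reader(open(csvFilename)), isn't reader a valid csv file?

I am trying to solve the problem outlined below, and I am getting a coercing to Unicode: need string or buffer, GzipFile found error on line 41 and line 7 (highlighted below), which leads me to believe that gzip.open and csv.reader do not work as I had previously thought.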

The problem I am trying to solve

I am trying to convert results.csv.gz to results.csv so that I can turn results.csv into a Python dictionary and then combine it with another Python dictionary.

File 1:

alertFile = payload.get('results_file')
alertDataCSV = rh.dataToDict(alertFile) # LINE 41
alertDataTotal = rh.mergeTwoDicts(splunkParams, alertDataCSV)

Calls file 2:

import gzip
import csv

def dataToDict(filename):
    csvFilename = gzip.open(filename, 'rb')
    reader = csv.reader(open(csvFilename)) # LINE 7
    alertData={}
    for row in reader:
        alertData[row[0]]=row[1:]
    return alertData

def mergeTwoDicts(dictA, dictB):
    dictC = dictA.copy()
    dictC.update(dictB)
    return dictC

*Edit: also, please forgive my non-PEP naming style in Python.

Answer 1

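gzip.open returns a file-like object (the same kind of thing plain open returns), not the name of the decompressed file. The error comes from passing that object to open, which expects a path string. Simply pass the result of gzip.open directly to csv.reader and it will work (csv.reader will receive the decompressed lines). csv does expect text, though, so on Python 3 you need to open the file for reading as text. On Python 2, 'rb' is fine: the gzip module doesn't deal with encodings, but then, neither does the csv module. Simply change: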

csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename))

to:

# Python 2
csvFile = gzip.open(filename, 'rb')
reader = csv.reader(csvFile)  # No reopening involved

# Python 3
csvFile = gzip.open(filename, 'rt', newline='')  # Open in text mode, not binary, no line ending translation
reader = csv.reader(csvFile)  # No reopening involved
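
For reference, here is a minimal sketch of how the question's dataToDict could look with this change applied. The name data_to_dict is just a PEP 8 rename used for illustration, and the version check, the blank-row skip, and the explicit close are additions that are not part of the original code.

```python
import csv
import gzip
import sys


def data_to_dict(filename):
    """Map the first column of each row in a gzipped CSV to the remaining columns."""
    if sys.version_info[0] >= 3:
        # Python 3: csv needs text, so open in text mode with no newline translation
        csv_file = gzip.open(filename, 'rt', newline='')
    else:
        # Python 2: csv works on byte strings, so binary mode is fine
        csv_file = gzip.open(filename, 'rb')
    try:
        reader = csv.reader(csv_file)  # pass the file object, not a file name
        alert_data = {}
        for row in reader:
            if row:  # assumption: blank lines should be skipped
                alert_data[row[0]] = row[1:]
        return alert_data
    finally:
        csv_file.close()
```

The result can then be passed to mergeTwoDicts from the question unchanged, since it is an ordinary dictionary.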

Answer 2

The following worked for me with python==3.7.9:

import gzip

my_filename = 'my_compressed_file.csv.gz'

with gzip.open(my_filename, 'rt') as gz_file:
    data = gz_file.read() # read decompressed data
    with open(my_filename[:-3], 'wt') as out_file:
        out_file.write(data) # write decompressed data

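my_filename[:-3] strips the trailing .gz, so the output file keeps the original name (my_compressed_file.csv) instead of getting an arbitrary one.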
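
Calling read() loads the entire decompressed file into memory, which may be a problem for large inputs. A possible alternative, sketched here under the assumption that the bytes only need to be copied to disk unchanged, is to stream the data with shutil.copyfileobj. The function name gunzip_file is hypothetical, and the code should run on both Python 2.7 and Python 3.

```python
import gzip
import shutil


def gunzip_file(gz_path):
    """Decompress gz_path next to itself and return the path of the output file."""
    if not gz_path.endswith('.gz'):
        raise ValueError('expected a path ending in .gz: %r' % gz_path)
    out_path = gz_path[:-3]  # 'results.csv.gz' -> 'results.csv'
    # Binary mode on both sides copies the decompressed bytes as-is, chunk by chunk
    with gzip.open(gz_path, 'rb') as gz_file:
        with open(out_path, 'wb') as out_file:
            shutil.copyfileobj(gz_file, out_file)
    return out_path
```

Because nothing is decoded, this variant does not depend on the file's text encoding or line endings.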
