Format arbitrary data in CSV

I have a file in an arbitrary format:

Name:pod1
Image:image1
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi
Name:pod2
Image:image2
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi

It is actually a list of pods from a Kubernetes cluster.

I need to convert the data to CSV like this:

Name,Image,cpu,memory
pod1,image1,2,1000Mi
pod1,image1,300m,1000Mi
pod2,image2,2,1000Mi
pod2,image2,300m,1000Mi

The first two values (Name and Image) on rows 2 and 4 repeat those on rows 1 and 3.

I'd prefer a bash solution combining grep/sed/awk, but that's just me being picky. I'm perfectly fine with any solution in Python or even PowerShell.

Thanks a lot!

Assuming the order is fixed: when I see the "memory" line, I print a complete record.

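awk '
    BEGIN {FS = ":"; OFS = ","; print "Name","Image","cpu","memory"}
    # remember the latest value seen for each key
    {record[$1] = $2}
    # a "memory" line completes a record, so print it
    $1 == "memory" {print record["Name"], record["Image"], record["cpu"], record["memory"]}
' file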

A generic but Python-based solution:

from pandas import DataFrame

text = """Name:pod1
Image:image1
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi
Name:pod2
Image:image2
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi"""

lines = text.splitlines()
record = dict()
records = list()
for line in lines:
    # split on the first colon only, in case a value contains ':'
    field, value = line.split(':', 1)
    if field in record:
        # we have seen this field before, thus this record is complete
        # and the field/value pair belongs to the next record
        records.append(record)
        # create a new empty record
        record = dict()
    # set the value for the current record
    record[field] = value

# the loop never appends the last record, so add it here
if record:
    records.append(record)

dataframe = DataFrame(records)
dataframe.to_csv('here_we_go.csv', index=False)

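Note: this solution also works with missing or unsorted fields, but a combination of both may break it. As written, however, every new record starts empty, so Name and Image are not copied onto a pod's second cpu/memory row, and a Name:pod2 line that follows a cpu/memory pair ends up in that pair's record; for the sample input, the rows therefore do not match the expected CSV.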
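A minimal sketch of one way to repeat Name and Image while still keeping field names out of the code: when a field repeats, emit the current record, then keep only the fields that were set before the repeated one, so the pod-level values carry over to the next row. It assumes each record lists its fields in a consistent order, swaps pandas for the standard csv module, and the helper names records_from_lines and to_csv are hypothetical.

```python
import csv
import io


def records_from_lines(lines):
    """Group 'field:value' lines into records (hypothetical helper).

    A repeated field closes the current record. Fields set before the
    repeated one (e.g. Name, Image) carry over into the next record;
    the repeated field and everything after it start fresh.
    Assumes fields appear in a consistent order within each record.
    """
    records = []
    record = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        field, value = line.split(':', 1)  # values may contain ':'
        if field in record:
            records.append(dict(record))
            keys = list(record)
            # keep only the fields inserted before the repeated one
            record = {k: record[k] for k in keys[:keys.index(field)]}
        record[field] = value
    if record:
        records.append(record)  # the last record has no successor
    return records


def to_csv(records):
    # collect column names in order of first appearance
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


text = """Name:pod1
Image:image1
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi
Name:pod2
Image:image2
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi"""

print(to_csv(records_from_lines(text.splitlines())), end='')
```

The generic part comes from relying on dict insertion order rather than a list of known keys. The price is the order assumption: a container that listed memory before cpu would be emitted with the previous container's cpu value.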

Since the order is always consistent, you can write a row whenever you see the memory key:

import csv

with open('input.txt', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=['Name', 'Image', 'cpu', 'memory'])
    csv_output.writeheader()
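    # block is never cleared, so Name and Image carry over to later cpu/memory rows
    block = {}

    for row in csv.reader(f_input, delimiter=':'):
        if len(row) == 2:   # skip blank lines
            block[row[0]] = row[1]

            # a memory line completes a record
            if row[0] == 'memory':
                csv_output.writerow(block)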

This gives:

Name,Image,cpu,memory
pod1,image1,2,1000Mi
pod1,image1,300m,1000Mi
pod2,image2,2,1000Mi
pod2,image2,300m,1000Mi
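The csv.reader call above splits on every ':', so a value that itself contains a colon (for example a hypothetical tagged image such as nginx:1.19) yields three fields, and the len(row) == 2 check silently drops that line. A minimal sketch, assuming the same fixed order and that only the first colon on a line separates key from value, uses str.partition instead; it works on an in-memory string so it can be tried without files, and the name convert is hypothetical.

```python
import csv
import io

FIELDNAMES = ['Name', 'Image', 'cpu', 'memory']


def convert(text):
    """Emit one CSV row per 'memory' line (hypothetical helper).

    Assumes the fixed order Name, Image, then cpu/memory pairs.
    Only the first ':' on a line separates key from value.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, lineterminator='\n')
    writer.writeheader()
    block = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if not sep:
            continue  # skip blank or malformed lines
        block[key] = value  # Name/Image persist until the next pod
        if key == 'memory':
            writer.writerow(block)
    return out.getvalue()


sample = """Name:pod1
Image:nginx:1.19
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi"""

print(convert(sample), end='')
```

Run on the original sample input, it yields the same four data rows as shown above.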