Format arbitrary data in CSV
I have a file in an arbitrary format:
Name:pod1
Image:image1
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi
Name:pod2
Image:image2
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi
It is actually a list of pods from a Kubernetes cluster.
I need to convert the data to CSV like this:
Name,Image,cpu,memory
pod1,image1,2,1000Mi
pod1,image1,300m,1000Mi
pod2,image2,2,1000Mi
pod2,image2,300m,1000Mi
Rows 2 and 4 repeat the first two values of rows 1 and 3.
I would prefer a bash solution combining grep/sed/awk, but that is just me being picky. I am perfectly happy with a solution in Python or even PowerShell.
Many thanks!
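Assuming the order is fixed: whenever I see a "memory" line, I print a complete record.
awk '
    BEGIN {FS = ":"; OFS = ","; print "Name","Image","cpu","memory"}
    # remember the latest value of every key; Name and Image persist across records
    {record[$1] = $2}
    # a memory line closes a record, so print the current state
    $1 == "memory" {print record["Name"], record["Image"], record["cpu"], record["memory"]}
' file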
A more generic, but Python-based, solution:
from pandas import DataFrame

text = """Name:pod1
Image:image1
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi
Name:pod2
Image:image2
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi"""

lines = text.splitlines(keepends=False)
record = dict()
records = list()
for line in lines:
    # split on the first colon only, so a value may itself contain colons
    field, value = line.split(':', 1)
    if field in record:
        # we have seen this field before thus this record is complete
        # and the field value pair belongs to the next record
        records.append(record)
        # create a new empty record
        record = dict()
    # set the value for the current record
    record[field] = value
# the loop only flushes a record when a field repeats, so append the last one
if record:
    records.append(record)

dataframe = DataFrame(records)
dataframe.to_csv('here_we_go.csv', index=False)
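Note: this solution also works for missing or unsorted fields, but a combination of the two may break it. It starts a fresh record whenever a field repeats and does not carry Name or Image forward, so on the sample input it will not reproduce the requested rows exactly.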
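For comparison, here is a minimal sketch that carries `Name` and `Image` forward and emits a row at every `memory` line, which is what the requested output needs. The names `rows_from_text` and `to_csv` and the `terminator` parameter are made up for illustration, and the sketch assumes that every record ends with a `memory` line.

```python
import csv
import io


def rows_from_text(text, fields=("Name", "Image", "cpu", "memory"), terminator="memory"):
    """Return one dict per terminator line, carrying earlier values forward."""
    current = {}
    rows = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and lines without a colon
        key = key.strip()
        current[key] = value.strip()
        if key == terminator:
            # snapshot the current state; later lines overwrite only what changes
            rows.append({name: current.get(name, "") for name in fields})
    return rows


def to_csv(rows, fields=("Name", "Image", "cpu", "memory")):
    """Serialize the row dicts to a CSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = """Name:pod1
Image:image1
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi
Name:pod2
Image:image2
cpu:2
memory:1000Mi
cpu:300m
memory:1000Mi"""

print(to_csv(rows_from_text(sample)), end="")
```

Unlike the repeated-field rule, this relies on knowing which key closes a record, so it is less generic but matches the layout shown in the question.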
Since the order is always consistent, you can write a row whenever the memory key is seen:
import csv

with open('input.txt', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=['Name', 'Image', 'cpu', 'memory'])
    csv_output.writeheader()
    block = {}

    for row in csv.reader(f_input, delimiter=':'):
        if len(row) == 2:   # skip blank lines
            block[row[0]] = row[1]

            if row[0] == 'memory':
                csv_output.writerow(block)
Giving:
Name,Image,cpu,memory
pod1,image1,2,1000Mi
pod1,image1,300m,1000Mi
pod2,image2,2,1000Mi
pod2,image2,300m,1000Mi
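One caveat worth sketching: a value that itself contains a colon is split into more than two parts by `csv.reader(..., delimiter=':')`, so the `len(row) == 2` check would silently skip that line. The value `nginx:1.25` below is a made-up example and does not come from the question; splitting on the first colon only is one possible way around it.

```python
import csv
import io

line = "Image:nginx:1.25"  # hypothetical value containing a colon

# csv.reader splits on every colon, giving three fields instead of two
row = next(csv.reader(io.StringIO(line), delimiter=":"))
print(row)

# splitting once keeps everything after the first colon as the value
key, value = line.split(":", 1)
print(key, value)
```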