Python pandas: help converting a text file to a custom format
I am looking for help in Python to convert the following into columns.
Data in the text file:
---- [ Job Information : 2926 ] ----
Name : Run26
User : abc
Account : xyz
Partition : q_24hrs
Nodes : node3
Cores : 36
State : COMPLETED
ExitCode : 0:0
Submit : 2020-12-15T10:23:22
Start : 2020-12-15T10:23:22
End : 2020-12-15T14:13:50
Waited : 00:00:00
Reserved walltime : 1-00:00:00
Used walltime : 03:50:28
Used CPU time : 00:00:00
Desired output (keep this header unchanged):
Job id,Name,User,Account,Partition,Nodes,Cores
2926,abc,xyz,q_24hrs,node3,36
Thanks in advance.
You can use this example of how to parse the text file with the re module:
import re

import pandas as pd

# Read the whole file into one string
with open("your_file.txt", "r") as f_in:
    data = f_in.read()

# Each findall returns one value per job block, in file order
job_ids = re.findall(r"Job Information : (\d+)", data)
names = re.findall(r"Name\s*:\s*(.*)", data)
users = re.findall(r"User\s*:\s*(.*)", data)
accounts = re.findall(r"Account\s*:\s*(.*)", data)
partitions = re.findall(r"Partition\s*:\s*(.*)", data)
nodes = re.findall(r"Nodes\s*:\s*(.*)", data)
cores = re.findall(r"Cores\s*:\s*(.*)", data)

# zip combines the i-th value of every list into one row
df = pd.DataFrame(
    zip(job_ids, names, users, accounts, partitions, nodes, cores),
    columns=[
        "Job id",
        "Name",
        "User",
        "Account",
        "Partition",
        "Nodes",
        "Cores",
    ],
)
print(df)
df.to_csv("data.csv", index=False)
This creates data.csv:
Job id,Name,User,Account,Partition,Nodes,Cores
2926,Run26,abc,xyz,q_24hrs,node3,36
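One caveat with the approach above: the separate findall lists are matched up purely by position, so if the file holds several jobs and one of them lacks a field, zip will silently shift values into the wrong rows. A minimal sketch of a per-block variant follows, assuming every job starts with a "---- [ Job Information : N ] ----" line as in the question. The names SAMPLE, FIELDS, parse_jobs and to_csv are hypothetical, the second job in SAMPLE is made up to show a missing field, and the sketch uses only the standard library (csv instead of pandas) so it runs without extra installs.

```python
import csv
import io
import re

# Columns wanted after "Job id" (taken from the header in the question)
FIELDS = ["Name", "User", "Account", "Partition", "Nodes", "Cores"]

# Hypothetical sample: job 2927 is invented and has no "Account" line
SAMPLE = """\
---- [ Job Information : 2926 ] ----
Name : Run26
User : abc
Account : xyz
Partition : q_24hrs
Nodes : node3
Cores : 36
State : COMPLETED
---- [ Job Information : 2927 ] ----
Name : Run27
User : def
Partition : q_24hrs
Nodes : node4
Cores : 18
"""


def parse_jobs(text):
    """Return one dict per job block, with "" for any missing field."""
    # With one capture group, re.split returns
    # [preamble, id1, body1, id2, body2, ...]
    parts = re.split(r"-+ \[ Job Information : (\d+) \] -+", text)
    rows = []
    for job_id, body in zip(parts[1::2], parts[2::2]):
        row = {"Job id": job_id}
        for field in FIELDS:
            # Anchor at line start so only the "<field> : value" line matches
            m = re.search(rf"^\s*{field}\s*:\s*(.*)$", body, flags=re.MULTILINE)
            row[field] = m.group(1).strip() if m else ""
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with the header from the question."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["Job id"] + FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = parse_jobs(SAMPLE)
csv_text = to_csv(rows)
print(csv_text)
```

If you prefer to stay with pandas, the same list of dicts can be passed to pd.DataFrame(rows) and written with df.to_csv("data.csv", index=False), as in the code above.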