Splitting elements of a list in Python

Question

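I have a list that contains the history of a set of files. I need to split each element of the list into several columns and save the result to a CSV file. The columns I need are "commit_id, filename, committer, date, time, line_number, code". I tried splitting on spaces, but that does not work for the committer and the code. I also need to remove the opening parenthesis before the committer name and the closing parenthesis after the line number. Suppose this is my list: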

my_list = [
 'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   1) /**',
 'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   2)  *',
 'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   3)  * Licensed to the Apache Software Foundation (ASF) under one',
 'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   4)  * or more contributor license agreements.',
 ...
 'd6ed1130d51 master/ActiveMasterManager.java              (Michael Stack      2011-04-28 19:51:25 +0000 281) }'
 ]

Desired CSV output:

commit_id   | filename                         | committer     | date       | time     | line_number | code 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
f5213095324 | master/ActiveMasterManager.java  | Michael Stack | 2010-08-31 | 23:51:44 | 1           | /**
f5213095324 | master/ActiveMasterManager.java  | Michael Stack | 2010-08-31 | 23:51:44 | 2           | *
f5213095324 | master/ActiveMasterManager.java  | Michael Stack | 2010-08-31 | 23:51:44 | 3           | * Licensed to the Apache Software Foundation (ASF) under one
f5213095324 | master/ActiveMasterManager.java  | Michael Stack | 2010-08-31 | 23:51:44 | 4           | * or more contributor license agreements.
........
d6ed1130d51 | master/ActiveMasterManager.java  | Michael Stack | 2011-04-28 | 19:51:25 | 281         | }

I tried to build a new list with str(my_list).replace(" ",'').split(" ") before saving it to a CSV file, but it did not work. Any help would be greatly appreciated. Thanks.

Answer 1

I think your file is a TSV (tab-separated values) file.

Try this.

import csv
with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter='\t', quotechar='|')
    for row in spamreader:
        print(' | '.join(row))

If this doesn't help, you will probably have to use a regular expression, because your values contain spaces and the file is also space-separated.
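Short of a regular expression, the parentheses in each line can also serve as split points. The following is a minimal sketch, not a definitive implementation: `split_blame_line` is a hypothetical helper name, and it assumes that neither the filename nor the committer name contains parentheses and that every line carries a timezone offset such as +0000. It writes to an in-memory buffer instead of a real file.

```python
import csv
import io


def split_blame_line(line):
    """Split one annotated line into the seven requested fields."""
    # Everything before the first "(" is "<commit_id> <filename>".
    head, _, rest = line.partition("(")
    # Everything up to the first ")" is
    # "<committer> <date> <time> <tz> <line_number>"; the rest is the code.
    meta, _, code = rest.partition(")")
    commit_id, filename = head.split(None, 1)
    # Split from the right so spaces inside the committer name survive.
    committer, date, time, _tz, line_number = meta.rsplit(None, 4)
    return [commit_id, filename.strip(), committer.strip(),
            date, time, line_number, code.strip()]


sample = [
    'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   1) /**',
    'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   3)  * Licensed to the Apache Software Foundation (ASF) under one',
]

buffer = io.StringIO()  # in-memory stand-in for a real CSV file
writer = csv.writer(buffer)
writer.writerow(['commit_id', 'filename', 'committer', 'date', 'time',
                 'line_number', 'code'])
writer.writerows(split_blame_line(line) for line in sample)
print(buffer.getvalue())
```

Because only the first closing parenthesis is used as a split point, parentheses that appear later in the code column, such as "(ASF)", are left untouched.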

Answer 2

Here is a regular-expression solution:

import re
import csv

my_list = [
     'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   1) /**',
     'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   2)  *',
     'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   3)  * Licensed to the Apache Software Foundation (ASF) under one',
     'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   4)  * or more contributor license agreements.',
     'd6ed1130d51 master/ActiveMasterManager.java              (Michael Stack      2011-04-28 19:51:25 +0000 281) }'
     ]


pat = re.compile(r'(?P<commit_id>\w+)\s+(?P<filename>[^\s]+)\s+\((?P<commiter>.+)\s+(?P<date>\d{4}-\d\d-\d\d)\s+(?P<time>\d\d:\d\d:\d\d).+(?P<line_number>\b\d+\b)\)\s+(?P<code>.+)')

with open('somefile.csv', 'w+', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['commit_id', 'filename', 'commiter', 'date', 'time', 'line_number', 'code'])
    for line in my_list:
        writer.writerow([field.strip() for field in pat.match(line).groups()])

You may want to experiment with csv.writer to get the prettified output you want. This ends up with:

commit_id,filename,commiter,date,time,line_number,code
f5213095324,master/ActiveMasterManager.java,Michael Stack,2010-08-31,23:51:44,1,/**
f5213095324,master/ActiveMasterManager.java,Michael Stack,2010-08-31,23:51:44,2,*
f5213095324,master/ActiveMasterManager.java,Michael Stack,2010-08-31,23:51:44,3,* Licensed to the Apache Software Foundation (ASF) under one
f5213095324,master/ActiveMasterManager.java,Michael Stack,2010-08-31,23:51:44,4,* or more contributor license agreements.
d6ed1130d51,master/ActiveMasterManager.java,Michael Stack,2011-04-28,19:51:25,281,}
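If the pipe-aligned layout from the question is wanted for display, one option is to pad each column to its widest value once the fields have been parsed. This is only a sketch under assumptions: `format_table` is a hypothetical helper, every cell is assumed to be a string, and the result is a human-readable table rather than a CSV file that other tools can parse.

```python
def format_table(header, rows):
    """Return header and rows as pipe-separated, column-aligned text lines."""
    table = [header] + rows
    # Width of each column is the length of its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]

    def format_row(row):
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        return " | ".join(cells).rstrip()

    header_line = format_row(header)
    separator = "-" * len(header_line)
    return [header_line, separator] + [format_row(row) for row in rows]


header = ['commit_id', 'filename', 'committer', 'date', 'time',
          'line_number', 'code']
rows = [
    ['f5213095324', 'master/ActiveMasterManager.java', 'Michael Stack',
     '2010-08-31', '23:51:44', '1', '/**'],
    ['d6ed1130d51', 'master/ActiveMasterManager.java', 'Michael Stack',
     '2011-04-28', '19:51:25', '281', '}'],
]

lines = format_table(header, rows)
for line in lines:
    print(line)
```

Writing with csv.writer stays the better choice when the file will be read back by a program, since it quotes fields that contain the delimiter.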

Answer 3

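This may be a bit messy, but it runs on Python 2.7 and gives nearly the format you asked for (the printed result still contains the +0000 timezone offset as an extra column, and parentheses inside the code are removed as well). It is based on my own knowledge and a few Stack Overflow search results.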

my_list = [
     'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   1) /**',
     'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   2)  *',
     'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   3)  * Licensed to the Apache Software Foundation (ASF) under one',
     'f5213095324 master/ActiveMasterManager.java              (Michael Stack      2010-08-31 23:51:44 +0000   4)  * or more contributor license agreements.',
     'd6ed1130d51 master/ActiveMasterManager.java              (Michael Stack      2011-04-28 19:51:25 +0000 281) }'
     ]

import re
import csv
from time import sleep
def SpaceToDelimit(Str, orig, new, Nright):
    li = Str.rsplit(orig, Nright)
    return new.join(li)
def nth_repl(s, sub, repl, nth):
    find = s.find(sub)
    # if find is not -1 we have found at least one match for the substring
    i = find != -1
    # loop until we find the nth or we find no match
    while find != -1 and i != nth:
        # find + 1 means we start at the last match start index + 1
        find = s.find(sub, find + 1)
        i += 1
    # if i is equal to nth and the last search succeeded, replace that match
    if find != -1 and i == nth:
        return s[:find]+repl+s[find + len(sub):]
    return s
# notice my input was from your my_list above
spamreader = csv.reader(my_list, delimiter='\t', quotechar='|')
print "commit_id   | filename                        | committer     | date       | time     | line_number | code "
print "---------------------------------------------------------------------------"
for row in spamreader:
    row = str(row)
    row = re.sub(' +',' ',row)
    rowz = (''.join(row))
    # strip the leading "['" and trailing "']" left over from str(row)
    nl= rowz[2:-2]
    nl = nl.replace(" ", " | ", 8) 
    nl = nl.replace("(","") 
    nl = nl.replace(")","")
    TEXT = nth_repl(nl, " | ", " ", 3)
    print TEXT

Printed result:

commit_id   | filename                        | committer     | date       | time     | line_number | code 
-------------------------------------------------------------------------------------------------------------
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | +0000 | 1 | /**
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | +0000 | 2 | *
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | +0000 | 3 | * Licensed to the Apache Software Foundation ASF under one
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | +0000 | 4 | * or more contributor license agreements.
d6ed1130d51 | master/ActiveMasterManager.java | Michael Stack | 2011-04-28 | 19:51:25 | +0000 | 281 | }
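
The step that rejoins "Michael" and "Stack" relies on replacing only the nth occurrence of a separator. A compact Python 3 sketch of that idea follows; `replace_nth` is a hypothetical name rather than the function used above, and it returns the text unchanged when there are fewer than nth occurrences.

```python
def replace_nth(text, sub, repl, nth):
    """Replace only the nth (1-based) occurrence of sub in text."""
    if nth < 1:
        return text
    index = -1
    for _ in range(nth):
        # Continue searching just past the start of the previous match.
        index = text.find(sub, index + 1)
        if index == -1:
            return text  # fewer than nth occurrences: nothing to replace
    return text[:index] + repl + text[index + len(sub):]


row = "f5213095324 | master/ActiveMasterManager.java | Michael | Stack | 2010-08-31"
print(replace_nth(row, " | ", " ", 3))
```

Here the third " | " is the one that splits the committer name, so replacing it with a single space restores "Michael Stack" while the other separators stay in place.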
