Splitting list items into CSV columns
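I'm trying to build a transmittal application that takes file names and converts them into a CSV record of issued documents. At the moment, Python reads all the file names in a given folder, builds a list, and splits each file name into document number, revision, and title.
So far I've been able to have Python grab the file names, create a list of this information, and then split each entry into a new list of separate fields, i.e. documentnumber,revision,title.pdf to [documentnumber, revision, title].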
def getFiles():
    i = 0
    path = input("Paste in path for outgoing folder: ")
    numTitleRev = os.listdir(path)
    issueRec = []
    fileData = []
    totalList = len(numTitleRev)
    listNumber = str(totalList)
    print('\n' + "The total amount of documents in this folder is: " + listNumber + '\n')
    csvOutput = []
    while i < totalList:
        for item in numTitleRev:
            fileSplit = item.split(',', 2)
            fileTitle = fileSplit.pop(2)
            fileRev = fileSplit.pop(1)
            fileNum = fileSplit.pop(0)
        csvOutput.append([fileNum,fileRev,fileTitle])
        with open('output.csv', 'a') as writeCSV:
            writer = csv.writer(writeCSV)
            for row in csvOutput:
                writer.writerow(row)
        i += 1
    writeCSV.close()
    print("Writing complete")
The output I'm looking for is like so:
Number - Revision - Title
File1 - 01 - Title 1
File2 - 03 - Title 2 etc.
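The code above splits the list and its records on ',', which is how the file names are stored in the folder.
I think the problem with this code is that csvOutput only sends one result to the CSV: the last entry in the list.
It then writes that entry to the CSV once for every file in the folder, rather than splitting record one, sending it to the CSV, and repeating for record two.
The trouble is that I can't work out how to store this information in a variable when the number of files is not constant.
Any help would be appreciated.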
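You should initialize csvOutput = [] before the loop and update it on every iteration with csvOutput.append([fileNum,fileRev,fileTitle]).
That should fix the problem of only the last iteration's data being stored.
I assume while i < totalList: is meant to loop over your data, but you never use the counter i to pull out the matching chunk; instead you run the inner loop over the same data again and again.
If the amount of data is not constant, you can iterate over it just as you do in your inner loop. This is only a guess, though: you would need to provide the exact data structure and the problem you're facing to get a better answer.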
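As a minimal sketch of the pattern described above (initialise the list once, append inside the loop), the following may help. The function name split_names and the sample file names are made up for illustration and are not taken from the question:

```python
def split_names(names):
    """Split 'number,revision,title' file names into three-element rows."""
    rows = []  # initialised once, before the loop
    for name in names:
        # maxsplit=2 keeps any further commas inside the title
        number, revision, title = name.split(',', 2)
        rows.append([number, revision, title])  # appended on every iteration
    return rows


# Hypothetical sample data standing in for os.listdir(path)
sample = ['File1,01,Title 1.pdf', 'File2,03,Title 2.pdf']
rows = split_names(sample)
print(rows)
```

Because the for loop visits each name exactly once, no separate counter is needed, and the list can be of any length.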
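The main issue is the nested while/for loop. I restructured the code a bit so that it can be tested locally (and run simply by copy/pasting). This should also give you an idea of how to structure code so that it is easier to ask for help with.
I added a lot of comments to explain the changes I made.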
import csv
import os


# This part has been extracted from the main logic, to make the code runnable
# with sample data (see main() below)
def getFiles():
    path = input("Paste in path for outgoing folder: ")
    numTitleRev = os.listdir(path)
    print("\nThe total amount of documents in this folder is: %s\n" % len(numTitleRev))
    return numTitleRev


# This piece of logic contained the core error. The nested "while" loop was
# unnecessary. Additionally, the ".append" call was on the wrong indent-level.
# Removing the unnecessary while-loop makes this much clearer
def process_files(filenames):
    parsed = []
    for item in filenames:
        # Using "pop()" is a destructive operation (it modifies the list
        # in-place which may lead to bugs). In this case it's absolutely fine,
        # but I replaced it with a different syntax which in turn also makes
        # the code a bit nicer to read.
        fileNum, fileRev, fileTitle = item.split(',', 2)
        parsed.append([fileNum,fileRev,fileTitle])
    return parsed


# Similarly to "getFiles", I extracted this to make testing easier. Both
# "getFiles" and "write_output" are functions with "side-effects" which rely on
# external resources (the disk in this case). Extracting the main logic into
# "process_files" makes that function easily testable without the need of
# having the files really exist on the disk.
def write_output(parsed_data):
    with open('output.csv', 'a') as writeCSV:
        writer = csv.writer(writeCSV)
        for row in parsed_data:
            writer.writerow(row)
    print("Writing complete")


# This is just a simple main function to illustrate how the new functions are
# called.
def main():
    filenames = [  # <-- Some example data to make the SO answer runnable
        '0,1,this is an example.txt',
        '1,4,this is an example.txt',
        '2,200,this is an example, with a comma in the name.txt',
        '3,1,this is an example.txt',
    ]
    # filenames = getFiles()  <-- This needs to be enabled for the real code
    converted = process_files(filenames)
    write_output(converted)


# This special block prevents "import side-effects" when this Python file would
# be imported somewhere else.
if __name__ == '__main__':
    main()
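The question also asks for a Number / Revision / Title header row. One possible way to add it, sketched here under assumptions, is to pass the writer an already open text stream so the same function can be exercised without touching the disk. The name write_rows and the in-memory buffer are illustrative and not part of the answer above:

```python
import csv
import io


def write_rows(rows, stream):
    """Write a header line followed by the parsed rows to an open text stream."""
    writer = csv.writer(stream, lineterminator='\n')
    writer.writerow(['Number', 'Revision', 'Title'])  # header from the desired output
    writer.writerows(rows)


# In-memory buffer stands in for open('output.csv', 'w', newline='')
buffer = io.StringIO()
write_rows([['File1', '01', 'Title 1'], ['File2', '03', 'Title 2']], buffer)
print(buffer.getvalue())
```

When writing to a real file, the csv module documentation recommends opening it with newline=''. Note also that mode 'a' appends, so the header would be repeated on every run, whereas mode 'w' starts the file afresh each time.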