Python script to count the number of lines in all files in a directory
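I'm new to Python and I'm trying to write a script that goes through all the .txt files in a directory, counts the lines in each one (excluding lines that are blank or commented out), and writes the final output to a CSV. The final output should look like this: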
agprices, avi, adp
132, 5, 8
I'm having trouble with the syntax for saving each count as a dictionary value. Here is my code:
#!/usr/bin/env python
import csv
import copy
import os
import sys
#get current working dir, set count, and select file delimiter
d = os.getcwd()
count = 0
ext = '.txt'
#parses through files and saves to a dict
series_dict = {}
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext]
#selects all files with .txt extension
for f in txt_files:
    with open(os.path.join(d,f)) as file_obj:
        series_dict[f] = file_obj.read()

        if line.strip(): #Exclude blank lines
            continue
        else if line.startswith("#"): #Exclude commented lines
            continue
        else
            count +=1
            #Need to save count as val in dict here

#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f:
    w = csv.DictWriter(f, series_dict.keys())
    w.writeheader()
    w.writerow(series_dict)
Edit:
#!/usr/bin/env python
import csv
import copy
import os
import sys
import glob
#get current working dir, set count, and select file delimiter
os.chdir('/Users/Briana/Documents/Misc./PythonTest')
#parses through files and saves to a dict
series = {}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        series[fn] = (1 for line in f if line.strip() and not line.startswith('#'))

print series
#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f:
w = csv.DictWriter(f, series.keys())
sum(names.values())
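I'm getting an indentation error on the second-to-last line, and I'm not sure why. I'm also not sure whether I wrote the syntax in the last part correctly. Again, I just want to return a dictionary of file names and line counts, e.g. {a: 132, b:245, c:13}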
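I think you should make two changes to your script:
- Use glob.glob() to get the list of files matching the desired suffix
- Use for line in file_obj to iterate over the lines
Other problem:
- The indentation is wrong on your last few lines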
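A minimal sketch of those two suggestions combined, written for Python 3 (the code in this thread is Python 2). The function name count_lines_by_file and its parameters are made up for illustration, and it assumes, as in the question, that a comment is any line whose first character is #.

```python
import glob
import os


def count_lines_by_file(directory, pattern="*.txt"):
    """Map each matching file name to its count of non-blank, non-comment lines."""
    counts = {}
    # glob.glob() returns the paths that match the pattern
    for path in glob.glob(os.path.join(directory, pattern)):
        count = 0
        with open(path) as file_obj:
            for line in file_obj:  # iterate over the file one line at a time
                if not line.strip():      # skip blank lines
                    continue
                if line.startswith("#"):  # skip commented lines
                    continue
                count += 1
        counts[os.path.basename(path)] = count
    return counts
```

Calling count_lines_by_file(os.getcwd()) would give a dictionary of the same shape the question asks for, with file names as keys and counts as values.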
You can use this one-liner to count the lines in a file:
line_nums = sum(1 for line in open(f) if line.strip() and line[0] != '#')
This would shorten your code segment to
for f in txt_files:
    count += sum(1 for line in open(os.path.join(d,f))
                 if line[0] != '#' and line.strip())
You could try something along these lines:
os.chdir(ur_directory)
names={}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        names[fn]=sum(1 for line in f if line.strip() and not line.startswith('#'))

print(names)
That will print a dictionary similar to:
{'test_text.txt': 20, 'f1.txt': 3, 'lines.txt': 101, 'foo.txt': 6, 'dat.txt': 6, 'hello.txt': 1, 'f2.txt': 4, 'neglob.txt': 8, 'bar.txt': 6, 'test_reg.txt': 6, 'mission_sp.txt': 71, 'test_nums.txt': 8, 'test.txt': 7, '2591.txt': 8303}
You can then use that Python dict in csv.DictWriter.
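As a rough sketch of that step in Python 3, where the CSV file is opened in text mode with newline='' rather than in 'wb' mode as in the Python 2 code above. The names dictionary here is sample data and write_counts_csv is a hypothetical helper, not something from the question. The header row holds the file names and the single data row holds the counts, which matches the layout the question asks for.

```python
import csv


def write_counts_csv(counts, out_path):
    """Write one header row of file names and one row of counts."""
    # newline="" lets the csv module control line endings itself
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(counts.keys()))
        writer.writeheader()      # file names become the column headers
        writer.writerow(counts)   # counts go in the row underneath


names = {"agprices.txt": 132, "avi.txt": 5, "adp.txt": 8}  # sample data
write_counts_csv(names, "seriescount.csv")
```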
If you want the sum of those counts, just do:
sum(names.values())
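I see you want to use a dictionary to keep track of the counts. You could create one at the top like this: counts = {}
Then (once you fix your tests) you can update it for each line that is neither blank nor a comment: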
counts = {}
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext]
#selects all files with .txt extension
for f in txt_files:
    counts[f] = 0 # create an entry in the dictionary to keep track of one file's lines
    with open(os.path.join(d,f)) as file_obj:
        for line in file_obj: #Read the file one line at a time
            if line.startswith("#"): #Exclude commented lines
                continue
            elif line.strip(): #Count only non-blank lines
                counts[f] += 1