Python script to count the number of lines in all files in a directory

Question

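So I'm new to Python, and I'm trying to write a script that loops through all the .txt files in a directory, counts the lines in each file (excluding lines that are blank or commented out), and writes the final output to a CSV. The final output should look like this: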

agprices, avi, adp
132, 5, 8

I'm having trouble with the syntax for saving each count as a dictionary value. Here is my code:

#!/usr/bin/env python

import csv
import copy
import os
import sys

#get current working dir, set count, and select file delimiter
d = os.getcwd()
count = 0
ext = '.txt'

#parses through files and saves to a dict
series_dict = {}
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext] 
 #selects all files with .txt extension
for f in txt_files:
    with open(os.path.join(d,f)) as file_obj:
        series_dict[f] = file_obj.read()

            if line.strip():                #Exclude blank lines
                continue
            else if line.startswith("#"):   #Exclude commented lines
                continue
            else
                count +=1
                #Need to save count as val in dict here

#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f: 
w = csv.DictWriter(f, series_dict.keys())
w.writeheader()
w.writerow(series_dict)

Edit:

#!/usr/bin/env python

import csv
import copy
import os
import sys
import glob

#get current working dir, set count, and select file delimiter
os.chdir('/Users/Briana/Documents/Misc./PythonTest')

#parses through files and saves to a dict
series = {}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        series[fn] = (1 for line in f if line.strip() and not line.startswith('#')) 

print series

#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f: 
    w = csv.DictWriter(f, series.keys())
    sum(names.values())

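I'm getting an indentation error on the second-to-last line and I'm not quite sure why. Also, I'm not sure whether I've written the syntax in the last part correctly. Again, I'm just trying to return a dictionary containing each file name and its line count, e.g. {a: 132, b:245, c:13}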

Answer 1

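I think you should make two changes to your script:

Use glob.glob() to get a list of the files matching the desired suffix
Use for line in file_obj to iterate over the lines

Other issues:

The indentation of the last few lines is wrong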
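A minimal sketch of how the two suggested changes might fit together, assuming Python 3. The helper name count_lines is hypothetical (not from the original answer), and it takes the directory as an argument instead of calling os.chdir:

```python
import glob
import os


def count_lines(directory):
    """Map each .txt file name in `directory` to its number of lines
    that are non-blank and do not start with '#'."""
    counts = {}
    # glob.glob returns the paths that match the wildcard pattern
    for path in glob.glob(os.path.join(directory, '*.txt')):
        n = 0
        with open(path) as file_obj:
            # iterating over the file object yields one line at a time
            for line in file_obj:
                if line.strip() and not line.startswith('#'):
                    n += 1
        counts[os.path.basename(path)] = n
    return counts
```

Calling count_lines(os.getcwd()) would give a dictionary shaped like the {a: 132, b: 245, c: 13} the question asks for, keyed by file name.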

Answer 2

You can use this one-liner to count the lines in a file:

line_nums = sum(1 for line in open(f) if line.strip() and line[0] != '#')

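This shortens your snippet to the following (note that it adds the lines of every file into a single running total, count):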

for f in txt_files:
    count += sum(1 for line in open(os.path.join(d,f)) 
                 if line[0] != '#' and line.strip())
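If you want one count per file rather than a grand total, a possible sketch (assuming Python 3; count_code_lines and per_file_counts are hypothetical names) wraps the same one-liner in a with block so each file is closed after it is read:

```python
import os


def count_code_lines(path):
    """Count lines that are non-blank and do not start with '#'."""
    with open(path) as fh:  # the with block closes the file afterwards
        return sum(1 for line in fh if line.strip() and line[0] != '#')


def per_file_counts(d, ext='.txt'):
    """Return {file name: count} for every file in d with extension ext."""
    return {f: count_code_lines(os.path.join(d, f))
            for f in os.listdir(d)
            if os.path.splitext(f)[1] == ext}
```

The grand total is then still available as sum(per_file_counts(d).values()).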

Answer 3

You could try the following:

import glob
import os

os.chdir(ur_directory)  # ur_directory: the path to your directory
names = {}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        names[fn] = sum(1 for line in f if line.strip() and not line.startswith('#'))

print(names)

This will print a dictionary similar to the following:

{'test_text.txt': 20, 'f1.txt': 3, 'lines.txt': 101, 'foo.txt': 6, 'dat.txt': 6, 'hello.txt': 1, 'f2.txt': 4, 'neglob.txt': 8, 'bar.txt': 6, 'test_reg.txt': 6, 'mission_sp.txt': 71, 'test_nums.txt': 8, 'test.txt': 7, '2591.txt': 8303}

You can pass this Python dictionary to csv.DictWriter.
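For example, a minimal sketch assuming Python 3, where the CSV file is opened in text mode with newline='' instead of the 'wb' mode used in the question (that mode is Python 2 usage). The function name write_counts is hypothetical:

```python
import csv


def write_counts(counts, out_path='seriescount.csv'):
    """Write a {file name: line count} dict as one header row of
    file names followed by one row of counts."""
    with open(out_path, 'w', newline='') as fh:
        # the dict's keys become the column names, in insertion order
        writer = csv.DictWriter(fh, fieldnames=list(counts))
        writer.writeheader()   # first row: file names
        writer.writerow(counts)  # second row: the counts
```

With names from the loop above, write_counts(names) produces the two-row layout from the question. If the header should read agprices rather than agprices.txt, one option is to use os.path.splitext(fn)[0] as the dictionary key.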

If you want the total of these, just do:

sum(names.values())

Answer 4

I see you want to use a dictionary to keep track of the counts. You can create one at the top, like this: counts = {}

Then (after fixing the if tests) you can update it for each non-comment line:

import os

d = os.getcwd()
ext = '.txt'
counts = {}
# selects all files with .txt extension
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext]
for f in txt_files:
    counts[f] = 0  # create an entry in the dictionary to keep track of one file's lines
    with open(os.path.join(d, f)) as file_obj:
        for line in file_obj:
            if line.startswith("#"):   # exclude commented lines
                continue
            elif line.strip():         # count only non-blank lines
                counts[f] += 1
