Iterating through tables in text file

Question

Hi everyone.

I'd say this is the first task where I don't know where to start:

Create a text file (using an editor, not necessarily Python) containing two tab-separated columns, with each column containing a number. Then use Python to read through the file you’ve created. For each line, multiply each first number by the second, and then sum the results from all the lines. Ignore any line that doesn’t contain two numeric columns.

I've written a few lines so far, but I'm not sure where to go next:

filename = 'path'

def sum_columns(filename):
    sum = 0
    multiply = 0
    with open (filename) as f:

Should I split my file into 2 columns and create lists from them, or should I do something else?

Thanks in advance

Answer 1

Going by the exercise text, you can approach this in quite a few ways. In my opinion, the best way is to do something like this:

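filename = 'path'

def sum_columns(filename):
    total = 0
    with open(filename) as f:
        # Read every line of the file into a list
        all_lines = f.readlines()
    # The with block has already closed the file at this point
    for line in all_lines:
        splitted = line.split("\t")
        total += int(splitted[0]) * int(splitted[1])
    return total

All the lines of the file end up in all_lines. You then loop over each line, split it on the tab character, multiply the two numbers, and add the product to the running total, which was initialized to 0 and is returned at the end. As others have hinted, you can also read the file line by line without keeping every line in a list, but if the file is relatively small, my approach works fine.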
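The line-by-line alternative mentioned above could look roughly like the sketch below. It is not part of the original answer: the name `sum_columns_streaming` is made up for illustration, and skipping malformed lines is an added assumption taken from the exercise text (the code above raises an exception on such lines instead).

```python
def sum_columns_streaming(filename):
    """Sum first * second over all valid tab-separated lines, one line at a time."""
    total = 0
    with open(filename) as f:
        # Iterating over the file object yields one line at a time,
        # so the whole file is never held in memory.
        for line in f:
            fields = line.split("\t")
            if len(fields) != 2:
                # Assumption: a valid line has exactly two columns.
                continue
            try:
                total += int(fields[0]) * int(fields[1])
            except ValueError:
                # At least one column is not an integer; ignore the line.
                continue
    return total
```

Calling `sum_columns_streaming('path')` is used the same way as the function above; only the memory behavior and the handling of bad lines differ.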

Answer 2

If you have a file like this:

1   2
2   4
4   8

You can do the following:

from functools import reduce

def is_int(s):
    try: 
        int(s)
        return True
    except ValueError:
        return False

filename = 'path'

def sum_columns(filename):
    with open (filename) as f:
        lines = f.readlines()
    return sum([
        reduce(lambda x, y: x * y, map(int,line.split("\t")))
        for line in lines
        if len(list(filter(is_int, line.split("\t")))) == 2
    ])

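Explanation:

At the top I defined a helper function that determines whether a string can be converted to an int. It is used later to ignore lines that don't have 2 numbers. It is based on this answer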

def is_int(s):
    try: 
        int(s)
        return True
    except ValueError:
        return False
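
To make the helper's behavior concrete, here is a small sketch; the sample strings are made up for illustration. One detail worth knowing is that `int()` ignores surrounding whitespace, which is why the newline left on the last field of each line does not get in the way.

```python
def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

# Hypothetical sample fields, as they might come out of line.split("\t")
samples = ["2", "-7", "4\n", " 8 ", "", "abc", "1.5"]
results = {s: is_int(s) for s in samples}

for text, ok in results.items():
    print(repr(text), ok)
```

Only the first four samples count as integers; the empty string, `"abc"`, and `"1.5"` are rejected.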

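Then we open the file and read all lines into a variable. This isn't the most efficient approach, since the file could be processed line by line without storing the whole file, but for smaller files the difference is negligible.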

with open (filename) as f:
    lines = f.readlines()

Next comes a single expression that does the whole computation, but let's break it down:

First, we iterate over all lines:

for line in lines

Next, we keep only the lines that have exactly two numbers separated by a tab:

if len(list(filter(is_int, line.split("\t")))) == 2

Finally, we convert each number in the line to an int and multiply them together:

reduce(lambda x, y: x * y, map(int,line.split("\t")))

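Then we sum all of these and return the result.

Performance considerations

If performance is a concern, you can achieve the same result by reading the content line by line instead of pulling the whole file into a variable. It's less elegant but more efficient: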
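
One edge case is worth noting: the filter counts the integer fields, while the multiplication converts every field. A line such as `1\t2\tabc` therefore passes the check and then raises `ValueError` inside `int`. A stricter variant might look like the sketch below, assuming such lines should simply be skipped; the names `parse_pair` and `sum_columns_strict` are hypothetical and not from the answer.

```python
def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

def parse_pair(line):
    """Return (a, b) if the line is exactly two tab-separated ints, else None."""
    fields = line.split("\t")
    if len(fields) == 2 and all(is_int(field) for field in fields):
        return int(fields[0]), int(fields[1])
    return None

def sum_columns_strict(lines):
    """Sum a * b over the valid lines of any iterable of strings."""
    total = 0
    for line in lines:
        pair = parse_pair(line)
        if pair is not None:
            total += pair[0] * pair[1]
    return total
```

Because `sum_columns_strict` accepts any iterable of lines, it can be given either the list returned by `readlines()` or the open file object itself.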

def sum_columns(filename):
    total = 0
    with open (filename) as f:
        for line in f:
            if len(list(filter(is_int, line.split("\t")))) != 2:
                continue
            total += reduce((lambda x, y: x * y), map(int,line.split("\t")))
    return total

(Note that you still need the import and the helper from the example above.)

Answer 3

input.txt

script.py

with open('input.txt') as f:
  total = 0
  for line in f:
    # split the line (without its trailing newline) into tab-separated fields
    numbers = line.rstrip('\n').split('\t')
    try:
      line_value = int(numbers[0]) * int(numbers[1])
    except IndexError as e:
      # the line doesn't contain two numbers
      continue
    except ValueError as e:
      # a value couldn't be converted to a number
      continue
    total += line_value

Answer 4

Here is a short solution:

def sum_columns(filename):
    counter = 0
    with open(filename) as file:
        for line in file:
            try:
                a, b = [int(x) for x in line.split('\t')]
                counter += a * b
            except ValueError:
                continue
    return counter


file_name = 'myfile.txt'
print(sum_columns(file_name))

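This is what many people (@martineau was the first) suggested in the comments (and it's also something I just learned), so I decided to put it in an answer.

Basically, the loop goes over each line and builds a list of two integers from it. That is what the list comprehension is for; otherwise both numbers would be strings, and trying to multiply them would raise an error. The two values are then unpacked, which is nice because you need only one except clause: the only reasonable error raised is ValueError, either because the unpacking failed or because a field couldn't be converted to an integer. The two values are then multiplied and added to the counter, and at the end of the loop the counter is returned.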
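
To see why a single except clause is enough, the sketch below isolates the parsing step in a helper (the name `line_product` is hypothetical, chosen only for this illustration) and feeds it a few made-up lines. A wrong number of fields and a non-numeric field both surface as `ValueError`.

```python
def line_product(line):
    """Return the product of the two ints on a line, or None if the line is invalid."""
    try:
        # Unpacking into exactly two names fails with ValueError when the
        # list has fewer or more than two items; int() fails with ValueError
        # when a field is not an integer.
        a, b = [int(x) for x in line.split('\t')]
    except ValueError as exc:
        print(f"skipped {line!r}: {exc}")
        return None
    return a * b

# Hypothetical sample lines: one valid, three invalid in different ways
for sample in ["3\t4\n", "3\n", "3\t4\t5\n", "3\tx\n"]:
    print(repr(sample), line_product(sample))
```

Only the first sample yields a product; the other three are reported and skipped.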


