Reading a specific column from a file when the last few rows are not equivalent in Python

Question

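I'm having trouble reading a text file in Python. Basically, what I need is to get the column at index 4 (the fifth whitespace-separated field) into a list.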

With this small function I managed to do it without any major problems:

def load_file(filename):
    with open(filename, 'r') as f:
        # skip the first useless row
        line = list(f.readlines()[1:])

    total_sp = []
    for i in line:
        t = i.strip().split()
        total_sp.append(int(t[4]))

    return total_sp

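But now I have to handle files whose last rows contain some random numbers that do not follow the format of the rest of the file. An example of a text file that does NOT work is: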

#generated file
well10_1         3        18         6         1         2  -0.01158   0.01842       142
well5_1         1        14         6         1         2  0.009474   0.01842       141
well4_1         1        13         4         1         2  -0.01842  -0.03737       125
well7_1         3        10         1         1         2 -0.002632  0.009005       101
well3_1         1        10         9         1         2  -0.03579  -0.06368       157
well8_1         3        10        10         1         2  -0.06895   -0.1021       158
well9_1         3        10        18         1         2   0.03053   0.02158       176
well2_1         1         4         4         1         2  -0.03737  -0.03737       128
well6_1         3         4         5         1         2  -0.07053   -0.1421       127
well1_1        -2         3         1         1         2  0.006663  -0.02415       128
         1    0.9259
         2   0.07407

where the rows "1 0.9259" and "2 0.07407" must be discarded.

In fact, when I run the function above on this text file, I get the following error because of the 2 extra rows at the end:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/tmp/tmpqi8Ktw.py", line 21, in load_obh
    total_sp.append(int(t[4]))
IndexError: list index out of range
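To see why the error occurs, here is a minimal sketch using two rows copied from the sample file: it compares how many fields split() produces for a regular data row and for one of the trailing rows.

```python
# One regular data row and one trailing row, copied from the sample file
data_row = "well10_1         3        18         6         1         2  -0.01158   0.01842       142\n"
tail_row = "         1    0.9259\n"

data_fields = data_row.strip().split()
tail_fields = tail_row.strip().split()

print(len(data_fields))  # 9 fields, so index 4 exists
print(len(tail_fields))  # only 2 fields, so t[4] does not exist

try:
    tail_fields[4]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

The trailing rows simply have too few fields, so any fix has to either drop those rows or skip rows that are too short.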

How can I remove the last few rows from the line variable?

Thanks, everyone.

Answer 1

f.readlines already returns a list. Just as you give a start index when slicing it, you can use a negative index to say "2 before the end", like this:

line = f.readlines()[1:-2]

should do the trick.
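A quick check of the [1:-2] slice, as a sketch that assumes the file content is held in an io.StringIO object (it supports readlines() like a real file object) so the example runs without a file on disk. The sample is abbreviated to two data rows.

```python
import io

# Abbreviated copy of the sample file: header, two data rows, two trailing rows
sample = io.StringIO(
    "#generated file\n"
    "well10_1         3        18         6         1         2  -0.01158   0.01842       142\n"
    "well5_1         1        14         6         1         2  0.009474   0.01842       141\n"
    "         1    0.9259\n"
    "         2   0.07407\n"
)

# [1:-2] drops the header line at the start and the last two lines at the end
line = sample.readlines()[1:-2]
total_sp = [int(i.split()[4]) for i in line]
print(total_sp)  # [1, 1]
```

This assumes there are always exactly two trailing rows; if that number varies, the length check in the edited version of the function is more robust.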

Edit: to handle an arbitrary number of rows at the end:

def load_file(filename):
    with open(filename, 'r') as f:
        # skip the first useless row
        line = f.readlines()[1:]

    total_sp = []
    for i in line:
        t = i.strip().split()
        # check if enough columns were found
        if len(t) >= 5:
            total_sp.append(int(t[4]))

    return total_sp
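A usage sketch for the length-check version: it writes an abbreviated copy of the sample file to a temporary file (via the standard tempfile module) so that load_file can open it by name, then removes the file afterwards.

```python
import os
import tempfile

def load_file(filename):
    # Same logic as the edited function: skip the header, then keep
    # only rows that have at least 5 fields
    with open(filename, 'r') as f:
        line = f.readlines()[1:]

    total_sp = []
    for i in line:
        t = i.strip().split()
        if len(t) >= 5:
            total_sp.append(int(t[4]))
    return total_sp

# Abbreviated copy of the sample file from the question
sample = (
    "#generated file\n"
    "well10_1         3        18         6         1         2  -0.01158   0.01842       142\n"
    "well5_1         1        14         6         1         2  0.009474   0.01842       141\n"
    "well1_1        -2         3         1         1         2  0.006663  -0.02415       128\n"
    "         1    0.9259\n"
    "         2   0.07407\n"
)

# Write it to a temporary file so load_file can open it by name
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    result = load_file(path)
finally:
    os.remove(path)

print(result)  # [1, 1, 1]
```

Because the check is per row, it does not matter how many short rows follow the data, or whether there are any at all.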

Answer 2

There are many ways to solve this problem. One of them is to wrap the failing code in a try/except block that handles the IndexError, like this:

try:
    total_sp.append(int(t[4]))
except IndexError:
    pass

This appends to total_sp only when the index exists and skips the row otherwise. It also handles any other row that has no data at that index.
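The try/except approach can be sketched as a complete loop. The helper name column_from_lines and its index parameter are hypothetical, introduced only for this illustration; io.StringIO stands in for the opened file.

```python
import io

def column_from_lines(lines, index=4):
    # Hypothetical helper: collect int(t[index]) from each line and
    # skip any line that has too few fields (the try/except approach)
    total_sp = []
    for i in lines:
        t = i.strip().split()
        try:
            total_sp.append(int(t[index]))
        except IndexError:
            pass
    return total_sp

# Abbreviated sample; io.StringIO stands in for the opened file
sample = io.StringIO(
    "#generated file\n"
    "well7_1         3        10         1         1         2 -0.002632  0.009005       101\n"
    "well8_1         3        10        10         1         2  -0.06895   -0.1021       158\n"
    "         1    0.9259\n"
    "         2   0.07407\n"
)

lines = sample.readlines()[1:]       # skip the header row
print(column_from_lines(lines))      # [1, 1]
print(column_from_lines(lines, 8))   # [101, 158]
```

Only IndexError is caught, so a row that does have the field but holds a non-integer value there would still raise ValueError, which makes genuinely malformed data visible instead of silently dropping it.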

Alternatively, if you just want to drop the last two rows (elements), you can use the slice operator, e.g. replace line = list(f.readlines()[1:]) with line = f.readlines()[1:-2].

Answer 3

There is also a solution specific to your case:

for i in line:
    # data rows start with a well name; the trailing rows start with spaces
    if not i.startswith(' '):
        t = i.strip().split()
        total_sp.append(int(t[4]))
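A self-contained sketch of the startswith check on an abbreviated sample, again using io.StringIO in place of the opened file:

```python
import io

# Abbreviated sample; io.StringIO stands in for the opened file
sample = io.StringIO(
    "#generated file\n"
    "well9_1         3        10        18         1         2   0.03053   0.02158       176\n"
    "well2_1         1         4         4         1         2  -0.03737  -0.03737       128\n"
    "         1    0.9259\n"
    "         2   0.07407\n"
)

total_sp = []
for i in sample.readlines()[1:]:      # skip the header row
    # Data rows start with a well name; the trailing rows start with spaces
    if not i.startswith(' '):
        t = i.strip().split()
        total_sp.append(int(t[4]))

print(total_sp)  # [1, 1]
```

This relies on every unwanted row beginning with a space. An empty line or a tab-indented row would still reach t[4] and raise IndexError, so the length check is the safer choice if the file format is not guaranteed.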


python

split

readfile