Python: find and replace a sequence of numbers in a file

I want to replace a sequence of numbers in a file with another sequence of numbers. For example, I want the code to find:

5723
5724
5725
.
.

in the file and replace it with

1
2
3
.
.

The file format is as follows:

    5723    1   4  0.0530  40.8469574826  23.6497161096  71.2721134368  # hc
    5724    1   4  0.0530  41.2184192051  22.0657965663  70.7655969235  # hc
    5725    1   4  0.0530  40.1209834536  22.2320441560  72.1100610464  # hc
    5726    1   2  0.0390  38.2072673529  21.5636299564  70.4226801302  # ni
    5727    1   3  0.0080  39.1491515464  22.7414447024  70.1836001683  # c1
    5728    1   4  0.0530  38.6092690356  23.6286807105  70.4379331882  # hc
    5729    1   5 -0.1060  39.4744610200  22.9631667398  68.7099315672  # c
    5730    1   4  0.0530  39.7733681662  22.0164196098  68.2561710623  # hc
    5731    1   4  0.0530  40.3997078786  23.5957910115  68.6602988667  # hc
    5732    1   6 -0.1768  37.4127695738  20.7445960448  69.5033013922  # c5
    5733    1   7  0.1268  37.5907142     20.8480311755  68.4090824525  # h

I wrote this code to do that, but it only replaces the first number. How can I correct it?

import os
import sys
import fileinput

masir = os.curdir + '\test\'
input  = open('poly-IL9.data', 'r')
output = open('out.data', 'w')
range1 = range(5722,13193)
range2 = range(1,7472)


for i in range(len(x1)):
    for j in range(len(y1)):
        x = str(range1[i])
        y = str(range2[j])
        clean = input.read().replace(x,y)
        output.write(clean)

First, open your files with the with statement instead of opening them and never closing them.

The with statement is used to wrap the execution of a block with methods defined by a context manager.

Read more about the with statement and its usage advantage.

Here you only need to loop over the file, split each line, and replace the first element with the line number:

with open('poly-IL9.data', 'r') as inp,open('out.data', 'w') as out :
    for i,line in enumerate(inp,1):
       out.write(' '.join([str(i)]+line.split()[1:])+'\n')

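You can use enumerate to loop over the file object while keeping track of the index. Passing 1 as the second argument makes the count start at 1.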
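The enumerate approach can also be sketched as a small generator that works on any iterable of lines, which makes it easy to try without touching a real file. The name `renumber_lines` and the in-memory sample are assumptions for illustration; the two sample rows are copied from the question.

```python
import io

def renumber_lines(lines, start=1):
    """Yield each non-blank line with its first field replaced by a running counter."""
    counter = start
    for line in lines:
        fields = line.split()      # split on any run of whitespace
        if not fields:             # skip blank lines so they do not consume a number
            continue
        yield ' '.join([str(counter)] + fields[1:])
        counter += 1

# In-memory stand-in for the data file (hypothetical, for demonstration only).
sample = io.StringIO(
    "    5723    1   4  0.0530  40.8469574826  23.6497161096  71.2721134368  # hc\n"
    "    5724    1   4  0.0530  41.2184192051  22.0657965663  70.7655969235  # hc\n"
)
result = list(renumber_lines(sample))
print(result)
```

Note that joining with a single space discards the original column alignment; if the alignment matters, the fields would need to be written back with explicit widths.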

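Also, you can open the file with the csv module so that you do not have to split the lines yourself.

import csv
with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
    spamreader = csv.reader(inp, delimiter=' ')
    for i, row in enumerate(spamreader, 1):
        # runs of spaces produce empty fields, so drop them first
        fields = [f for f in row if f]
        out.write(' '.join([str(i)] + fields[1:]) + '\n')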

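Note that if your file is separated by other whitespace characters, or by a mix of them, you can use re.split() to split each line with a regular expression:

import re
with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
    for i, line in enumerate(inp, 1):
        # strip first so leading/trailing whitespace does not create empty fields
        out.write(' '.join([str(i)] + re.split(r'\s+', line.strip())[1:]) + '\n')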
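Numbering by line position assumes every line in the file is a data row. If the file also contains other lines, one possible variant is to shift only the rows whose first field falls inside the old range. The sketch below is an assumption about what the asker needs, not part of the original answers; `remap_ids` and its parameters are hypothetical names, with defaults taken from the eleven sample rows in the question.

```python
def remap_ids(lines, first_old=5723, first_new=1, count=11):
    """Shift leading integer IDs in [first_old, first_old + count) so they start at first_new.

    Lines whose first field is not an ID in that range are passed through unchanged.
    """
    offset = first_old - first_new
    for line in lines:
        fields = line.split()
        if (fields and fields[0].isdigit()
                and first_old <= int(fields[0]) < first_old + count):
            fields[0] = str(int(fields[0]) - offset)
            yield ' '.join(fields)
        else:
            yield line.rstrip('\n')

demo = [
    "Atoms\n",                 # a non-data line, left as it is
    "    5723    1   4\n",     # first ID in range, becomes 1
    "  100 1 4\n",             # ID outside the range, left as it is
    "5733 1 7\n",              # last ID in range, becomes 11
]
remapped = list(remap_ids(demo))
print(remapped)
```

Because only the first field is examined, numbers elsewhere on the line, such as coordinates, are never modified.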

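The read() call in clean = input.read().replace(x,y) reads the entire file at once, so only the first pass through the loop sees any data; every later read() returns an empty string because the file position is already at the end. That is why only one replacement happens. Try readline(), or preferably for line in file:, to process the file line by line.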
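The behaviour of read() is easy to confirm with an in-memory file. The short sketch below uses io.StringIO as a stand-in for the real file (an assumption made so the example runs anywhere) and also shows why a plain substring replace is risky for this kind of data.

```python
import io

f = io.StringIO("5723 a\n5724 b\n")

first = f.read()     # returns the whole content
second = f.read()    # returns '' because the position is already at end of file

f.seek(0)            # rewind before reading again
lines = [line for line in f]   # iterating yields one line at a time

# A plain substring replace also changes matching digits outside the first column.
naive = "5723 1 4 0.5723".replace("5723", "1")

print(repr(second), lines, naive)
```

The last line illustrates that str.replace has no notion of columns, so replacing only the first field of each line is the safer route.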

If you want to process the data, you may want to consider using the pandas library.

In pandas, use pd.read_csv to read the CSV file:
In [4]: df = pd.read_csv('temp.csv')

In [5]: df
Out[5]:
      b  c       d          e          f          g
5723  1  4  0.0530  40.846957  23.649716  71.272113
5724  1  4  0.0530  41.218419  22.065797  70.765597
5725  1  4  0.0530  40.120983  22.232044  72.110061
5726  1  2  0.0390  38.207267  21.563630  70.422680
5727  1  3  0.0080  39.149152  22.741445  70.183600
5728  1  4  0.0530  38.609269  23.628681  70.437933
5729  1  5 -0.1060  39.474461  22.963167  68.709932
5730  1  4  0.0530  39.773368  22.016420  68.256171
5731  1  4  0.0530  40.399708  23.595791  68.660299
5732  1  6 -0.1768  37.412770  20.744596  69.503301
5733  1  7  0.1268  37.590714  20.848031  68.409082

Use reset_index(drop=True) to reset the index. Here the index starts from 0:
In [6]: df.reset_index(drop=True)
Out[6]:
    b  c       d          e          f          g
0   1  4  0.0530  40.846957  23.649716  71.272113
1   1  4  0.0530  41.218419  22.065797  70.765597
2   1  4  0.0530  40.120983  22.232044  72.110061
3   1  2  0.0390  38.207267  21.563630  70.422680
4   1  3  0.0080  39.149152  22.741445  70.183600
5   1  4  0.0530  38.609269  23.628681  70.437933
6   1  5 -0.1060  39.474461  22.963167  68.709932
7   1  4  0.0530  39.773368  22.016420  68.256171
8   1  4  0.0530  40.399708  23.595791  68.660299
9   1  6 -0.1768  37.412770  20.744596  69.503301
10  1  7  0.1268  37.590714  20.848031  68.409082

You can also build your own unique index starting from 1, for example:

In [7]: df.set_index(np.arange(1, len(df)+1))
Out[7]:
    b  c       d          e          f          g
1   1  4  0.0530  40.846957  23.649716  71.272113
2   1  4  0.0530  41.218419  22.065797  70.765597
3   1  4  0.0530  40.120983  22.232044  72.110061
4   1  2  0.0390  38.207267  21.563630  70.422680
5   1  3  0.0080  39.149152  22.741445  70.183600
6   1  4  0.0530  38.609269  23.628681  70.437933
7   1  5 -0.1060  39.474461  22.963167  68.709932
8   1  4  0.0530  39.773368  22.016420  68.256171
9   1  4  0.0530  40.399708  23.595791  68.660299
10  1  6 -0.1768  37.412770  20.744596  69.503301
11  1  7  0.1268  37.590714  20.848031  68.409082

Note: there are simpler ways to modify the file. But if you want to process and analyze the data, using pandas will make your life easier.