Python: find and replace sequence of numbers in a file
I want to replace a sequence of numbers in a file with another sequence. For example, I want the code to find:
5723
5724
5725
.
.
in the file and replace them with
1
2
3
.
.
The file format is as follows:
5723 1 4 0.0530 40.8469574826 23.6497161096 71.2721134368 # hc
5724 1 4 0.0530 41.2184192051 22.0657965663 70.7655969235 # hc
5725 1 4 0.0530 40.1209834536 22.2320441560 72.1100610464 # hc
5726 1 2 0.0390 38.2072673529 21.5636299564 70.4226801302 # ni
5727 1 3 0.0080 39.1491515464 22.7414447024 70.1836001683 # c1
5728 1 4 0.0530 38.6092690356 23.6286807105 70.4379331882 # hc
5729 1 5 -0.1060 39.4744610200 22.9631667398 68.7099315672 # c
5730 1 4 0.0530 39.7733681662 22.0164196098 68.2561710623 # hc
5731 1 4 0.0530 40.3997078786 23.5957910115 68.6602988667 # hc
5732 1 6 -0.1768 37.4127695738 20.7445960448 69.5033013922 # c5
5733 1 7 0.1268 37.5907142 20.8480311755 68.4090824525 # h
I have written this code to do it, but it only replaces the first number. How can I correct this code?
import os
import sys
import fileinput
masir = os.curdir + '\\test\\'
input = open('poly-IL9.data', 'r')
output = open('out.data', 'w')
range1 = range(5722,13193)
range2 = range(1,7472)
for i in range(len(x1)):
    for j in range(len(y1)):
        x = str(range1[i])
        y = str(range2[j])
        clean = input.read().replace(x,y)
        output.write(clean)
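First, open your files with the with statement instead of opening them and never closing them.
The with statement is used to wrap the execution of a block with methods defined by a context manager.
Read more about the with statement and its advantages.
Here you only need to loop over the file, split each line, and replace the first element with the line number: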
with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
    for i, line in enumerate(inp, 1):
        out.write(' '.join([str(i)] + line.split()[1:]) + '\n')
You can use enumerate to loop over your file object while keeping track of the index.
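As a self-contained sketch of the same idea, the loop can be wrapped in a small function and tried on in-memory data before touching the real files. The name renumber_lines and the three-line sample are assumptions made for illustration, not part of the original answer. Unlike the plain enumerate loop, this version skips blank lines so that they do not consume a number.

```python
import io

def renumber_lines(lines, start=1):
    """Yield each non-blank line with its first field replaced by a running counter."""
    counter = start
    for line in lines:
        fields = line.split()
        if not fields:  # blank line: skip it without advancing the counter
            continue
        yield ' '.join([str(counter)] + fields[1:])
        counter += 1

# Hypothetical sample standing in for poly-IL9.data
sample = io.StringIO(
    "5723 1 4 0.0530 40.8469574826 23.6497161096 71.2721134368 # hc\n"
    "5724 1 4 0.0530 41.2184192051 22.0657965663 70.7655969235 # hc\n"
    "5726 1 2 0.0390 38.2072673529 21.5636299564 70.4226801302 # ni\n"
)
result = list(renumber_lines(sample))
for row in result:
    print(row)
```

Passing an open file object instead of the StringIO sample should work the same way, since both yield one line per iteration.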
Also, you can use the csv module to read the file, so that you do not have to split the lines yourself.
import csv
with open('poly-IL9.data', 'r', newline='') as inp, open('out.data', 'w') as out:
    # Pass the opened file object to the reader; each row arrives as a list of fields
    spamreader = csv.reader(inp, delimiter=' ')
    for i, row in enumerate(spamreader, 1):
        out.write(' '.join([str(i)] + row[1:]) + '\n')
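Note that if your file is separated by other whitespace characters, or by a mix of them, you can use re.split() to split the lines with a regular expression: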
import re
with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
    for i, line in enumerate(inp, 1):
        # strip() first so the trailing newline does not produce an empty last field
        out.write(' '.join([str(i)] + re.split(r'\s+', line.strip())[1:]) + '\n')
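To make the difference between the two splitting approaches concrete, here is a small comparison on a made-up line with mixed tabs and spaces (the sample string is an assumption, not taken from the real file). It suggests that str.split() with no argument already copes with mixed whitespace, while re.split() leaves empty strings at the edges unless the line is stripped first.

```python
import re

# Hypothetical line with leading spaces, a tab, and a trailing newline
line = "  5723\t1  4 0.0530 # hc\n"

# No-argument split: any run of whitespace is a separator, empty edge fields are dropped
plain = line.split()

# Regex split on the raw line: empty strings appear at both ends
raw = re.split(r'\s+', line)

# Regex split after strip(): same result as the no-argument split
stripped = re.split(r'\s+', line.strip())

print(plain)
print(raw)
print(stripped)
```

With the raw regex split, taking [1:] would drop the leading empty string rather than the old number, which is why stripping the line matters when lines may start with whitespace.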
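The read() method in clean = input.read().replace(x,y) reads the whole file at once, so it makes sense that only one replacement takes effect: after the first call the file position is at the end, and every later read() returns an empty string. Try readline(), or preferably for line in file:, to process the file line by line.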
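The behaviour of read() can be checked with a short sketch. It uses io.StringIO as an in-memory stand-in for an open file, and the two-line content is made up for the demonstration.

```python
import io

# In-memory stand-in for an open text file (hypothetical content)
f = io.StringIO("5723 a\n5724 b\n")

first = f.read()   # returns the whole content and moves the position to the end
second = f.read()  # nothing left to read, so this is an empty string

f.seek(0)          # rewind before iterating again
lines = [line.rstrip('\n') for line in f]  # line-by-line iteration

print(repr(first))
print(repr(second))
print(lines)
```

This is why calling read() inside a loop only does useful work on the first iteration.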
If you want to process the data, you may want to consider the Pandas library.
In pandas, use pd.read_csv to read the CSV file:
In [4]: df = pd.read_csv('temp.csv')
In [5]: df
Out[5]:
b c d e f g
5723 1 4 0.0530 40.846957 23.649716 71.272113
5724 1 4 0.0530 41.218419 22.065797 70.765597
5725 1 4 0.0530 40.120983 22.232044 72.110061
5726 1 2 0.0390 38.207267 21.563630 70.422680
5727 1 3 0.0080 39.149152 22.741445 70.183600
5728 1 4 0.0530 38.609269 23.628681 70.437933
5729 1 5 -0.1060 39.474461 22.963167 68.709932
5730 1 4 0.0530 39.773368 22.016420 68.256171
5731 1 4 0.0530 40.399708 23.595791 68.660299
5732 1 6 -0.1768 37.412770 20.744596 69.503301
5733 1 7 0.1268 37.590714 20.848031 68.409082
Use reset_index(drop=True) to reset the index. Here the index starts from 0:
In [6]: df.reset_index(drop=True)
Out[6]:
b c d e f g
0 1 4 0.0530 40.846957 23.649716 71.272113
1 1 4 0.0530 41.218419 22.065797 70.765597
2 1 4 0.0530 40.120983 22.232044 72.110061
3 1 2 0.0390 38.207267 21.563630 70.422680
4 1 3 0.0080 39.149152 22.741445 70.183600
5 1 4 0.0530 38.609269 23.628681 70.437933
6 1 5 -0.1060 39.474461 22.963167 68.709932
7 1 4 0.0530 39.773368 22.016420 68.256171
8 1 4 0.0530 40.399708 23.595791 68.660299
9 1 6 -0.1768 37.412770 20.744596 69.503301
10 1 7 0.1268 37.590714 20.848031 68.409082
You can also build your own unique index starting from 1, for example:
In [7]: df.set_index(np.arange(1, len(df)+1))
Out[7]:
b c d e f g
1 1 4 0.0530 40.846957 23.649716 71.272113
2 1 4 0.0530 41.218419 22.065797 70.765597
3 1 4 0.0530 40.120983 22.232044 72.110061
4 1 2 0.0390 38.207267 21.563630 70.422680
5 1 3 0.0080 39.149152 22.741445 70.183600
6 1 4 0.0530 38.609269 23.628681 70.437933
7 1 5 -0.1060 39.474461 22.963167 68.709932
8 1 4 0.0530 39.773368 22.016420 68.256171
9 1 4 0.0530 40.399708 23.595791 68.660299
10 1 6 -0.1768 37.412770 20.744596 69.503301
11 1 7 0.1268 37.590714 20.848031 68.409082
Note: there are simpler ways to modify the file. But if you want to process and analyze the data, using pandas will make your life easier.