Read every first, second and third line of a txt file and save them in 3 different lists
I have a huge fixed-format text file that looks like this:
first line
2nd line
3rd line
#Mi Mj
#Ni Nj Nk
#Pi Pj
#----------- The numeric values start here ------
M0 M1
N0 N1 N2
P0 P1
M1 M2
N1 N2 N3
P1 P2
M2 M3
N2 N2 N3
P2 P3
...
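I need to skip lines 1 to 7, then read every first, second and third line after line 7 and save them into three different lists.
I managed to do this for every first line in my code by skipping the other 2 lines, but not for every second and third line. Also, calling next(fileobject) twice does not seem like an efficient approach. So can anyone tell me how best to handle a large file?
In the end I need a result like this: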
list1 = [M0, M1, M1, M2, M2, M3]
list2 = [N0, N1, N2, N1, N2, N3, N2, N2, N3]
list3 = [P0, P1, P1, P2, P2, P3]
Here is my code:
# Python 3
myfile = open('myfile.txt', 'r')
m,n,p = [], [], []
for line in myfile:
    ll = line.strip() # string
    if not ll.startswith("#"):
        row = ll.split() # list
        print(row)
        try:
            m.append(row[0]) # append first column of every third line
        except IndexError:
            print('There is not a standard line: ', line)
        next(myfile)
        next(myfile)
print(m)
myfile.close()
Something like this should work:
import itertools
from collections import defaultdict

result = defaultdict(list)
counter = 0
with open('myfile.txt') as fh:
    for line in itertools.islice(fh, 7, None):
        row = line.strip().split()
        result[counter % 3].extend(row)
        counter += 1
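result will be a dictionary (a defaultdict) with keys 0, 1 and 2, holding the same values as your m, n and p. Of course you can change that; I only did it this way to avoid code duplication.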
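A minimal sketch of the same counter-mod-3 idea, assuming the seven header lines are always present: enumerate replaces the manual counter, and the three buckets are unpacked into m, n and p at the end. The function name split_by_mod3 is hypothetical, and io.StringIO stands in for myfile.txt so the example runs on its own.

```python
import io
import itertools
from collections import defaultdict

# Stand-in for myfile.txt, copied from the sample in the question
SAMPLE = """first line
2nd line
3rd line
#Mi Mj
#Ni Nj Nk
#Pi Pj
#----------- The numeric values start here ------
M0 M1
N0 N1 N2
P0 P1
M1 M2
N1 N2 N3
P1 P2
M2 M3
N2 N2 N3
P2 P3
"""


def split_by_mod3(fh, skip=7):
    """Distribute the tokens of each data line into 3 buckets by line index % 3."""
    result = defaultdict(list)
    # islice lazily drops the header lines; enumerate supplies the counter
    for counter, line in enumerate(itertools.islice(fh, skip, None)):
        result[counter % 3].extend(line.split())
    return result[0], result[1], result[2]


m, n, p = split_by_mod3(io.StringIO(SAMPLE))
print(m)  # ['M0', 'M1', 'M1', 'M2', 'M2', 'M3']
print(n)
print(p)
```

Only one line is held in memory at a time, so this also works for a large file opened with open('myfile.txt').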
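I assume the stumbling block here is that you are reading the lines in the loop statement itself and struggling to keep m, n and p apart. If that's the case, I suggest getting the total number of lines in the file before starting the loop and looping over range((line_count - 7) // 3). This relies on the text file following its structure very strictly, but it lets you step through all three lines of a group in each loop iteration. See the code below.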
myfile = open('myfile.txt', 'r')
m, n, p = [], [], []
# Obtain line count from text file and work out iterations needed for loop
myfile_length = len(myfile.readlines())
iterations_needed = (myfile_length - 7) // 3  # integer division: range() needs an int
myfile.seek(0)  # rewind to the start of the file
for i in range(7):  # Gets past those pesky first 7 lines
    myfile.readline()  # readline(), not readlines(), which would consume the whole file
for x in range(iterations_needed):
    ll = myfile.readline()  # first line of the group -> m
    row = ll.split()  # list
    m.append(row[0])
    m.append(row[1])
    ll = myfile.readline()  # second line of the group -> n
    row = ll.split()
    n.append(row[0])
    n.append(row[1])
    n.append(row[2])
    ll = myfile.readline()  # third line of the group -> p
    row = ll.split()
    p.append(row[0])
    p.append(row[1])
print(m)
print(n)
print(p)
myfile.close()
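Hope this helps. You will have to add the error checking back in; I removed it for simplicity. Also, I haven't run the code, so there may be syntax errors, but the core of the fix is there.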
Note that you can combine reading and splitting a line into a single statement:
row = myfile.readline().split()
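Counting the lines first means reading the whole file once just to size the range. A minimal sketch of the same three-reads-per-iteration structure without that first pass, assuming the data always comes in complete groups of three: it loops until readline() returns an empty string, which is how file objects signal end of file. The function name read_groups is hypothetical, and io.StringIO stands in for myfile.txt.

```python
import io

# Stand-in for myfile.txt, copied from the sample in the question
SAMPLE = """first line
2nd line
3rd line
#Mi Mj
#Ni Nj Nk
#Pi Pj
#----------- The numeric values start here ------
M0 M1
N0 N1 N2
P0 P1
M1 M2
N1 N2 N3
P1 P2
M2 M3
N2 N2 N3
P2 P3
"""


def read_groups(f, skip=7):
    m, n, p = [], [], []
    for _ in range(skip):  # discard the header lines
        f.readline()
    while True:
        first = f.readline()
        if not first:  # '' means end of file (a blank line would be '\n')
            break
        m.extend(first.split())
        n.extend(f.readline().split())
        p.extend(f.readline().split())
    return m, n, p


m, n, p = read_groups(io.StringIO(SAMPLE))
print(m)
print(n)
print(p)
```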
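I think the basic flaw in your approach is that you want to read all the first lines of numbers, then the second lines. It's better to read through the file once and sort the different lines as you go. One way is to use a line counter mod 3, as another answer does, or you can keep reading lines inside the for loop, like this: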
myFileList= """first line
2nd line
3rd line
#Mi Mj
#Ni Nj Nk
#Pi Pj
#----------- The numeric values start here ------
M0 M1
N0 N1 N2
P0 P1
M1 M2
N1 N2 N3
P1 P2
M2 M3
N2 N2 N3
P2 P3"""
myFakeFile = iter(myFileList.split('\n'))
m,n,p = [], [], []
m2, n2, p2 = [], [], []
# Skip the first few lines
for skip_lines in range(7):
    next(myFakeFile)
# Read the next line ...
for line in myFakeFile:
    # ... and the following two lines
    second_line = next(myFakeFile)
    third_line = next(myFakeFile)
    # Either append line directly,
    m.append(line)
    n.append(second_line)
    p.append(third_line)
    # or split the lines before appending
    m2.extend(line.split())
    n2.extend(second_line.split())
    p2.extend(third_line.split())
print('m = {}'.format(m))
print('n = {}'.format(n))
print('p = {}\n'.format(p))
print('m2 = {}'.format(m2))
print('n2 = {}'.format(n2))
print('p2 = {}\n'.format(p2))
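I wasn't quite sure whether you also wanted to split the lines themselves, so I added the m2, n2, p2 lists to show that. I also had to fake the file to get a runnable example. This code produces the following output: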
m = ['M0 M1', 'M1 M2', 'M2 M3']
n = ['N0 N1 N2', 'N1 N2 N3', 'N2 N2 N3']
p = ['P0 P1', 'P1 P2', 'P2 P3']
m2 = ['M0', 'M1', 'M1', 'M2', 'M2', 'M3']
n2 = ['N0', 'N1', 'N2', 'N1', 'N2', 'N3', 'N2', 'N2', 'N3']
p2 = ['P0', 'P1', 'P1', 'P2', 'P2', 'P3']
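One caveat with the bare next() calls: if the data section does not end in a complete group of three, next() raises StopIteration in the middle of the loop body. A sketch, assuming a short final group should simply contribute fewer values: next() accepts a default as its second argument and returns it instead of raising. The truncated TEXT below is made-up sample data that stops after the second N line.

```python
# Made-up truncated sample: the last group has no P line
TEXT = """first line
2nd line
3rd line
#Mi Mj
#Ni Nj Nk
#Pi Pj
#----------- The numeric values start here ------
M0 M1
N0 N1 N2
P0 P1
M1 M2
N1 N2 N3"""

lines = iter(TEXT.split('\n'))
m2, n2, p2 = [], [], []
for _ in range(7):  # skip the header lines
    next(lines)
for line in lines:
    second_line = next(lines, '')  # '' instead of StopIteration when data runs out
    third_line = next(lines, '')
    m2.extend(line.split())
    n2.extend(second_line.split())  # ''.split() is [], so nothing is added
    p2.extend(third_line.split())

print(m2)
print(n2)
print(p2)
```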
You need to partition the file into groups of 3 lines:
# based on the 'grouper()' recipe from the itertools documentation;
# in Python 3 the built-in zip() replaces Python 2's itertools.izip
def partition(lines, n):
    iters = [iter(lines)] * n
    return zip(*iters)
This way, list(partition("ABCDEFGHI", 3))
gives:
[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I')]
Then simply unzip and re-zip the result:
partitions = partition("ABCDEFGHI", 3)
splits = zip(*partitions)
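To see what these two steps produce, here is a small Python 3 run, where the built-in zip plays the role of izip. zip(*partitions) transposes the groups, so each resulting tuple collects the items found at the same position in every group. Note also that zip silently drops a trailing incomplete group.

```python
def partition(lines, n):
    # the list holds n references to ONE iterator, so each tuple that zip
    # builds takes n consecutive items from it
    iters = [iter(lines)] * n
    return zip(*iters)


partitions = list(partition("ABCDEFGHI", 3))
splits = list(zip(*partitions))  # transpose: same position within each group
short = list(partition("ABCDEFGH", 3))  # 8 items: the leftover 'G', 'H' are dropped

print(partitions)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I')]
print(splits)      # [('A', 'D', 'G'), ('B', 'E', 'H'), ('C', 'F', 'I')]
print(short)       # [('A', 'B', 'C'), ('D', 'E', 'F')]
```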
So your code would end up looking something like this:
from itertools import islice

def partition(lines, n):
    iters = [iter(lines)] * n
    return zip(*iters)  # Python 3: built-in zip instead of izip

with open("myfile.txt") as f:
    keep = islice(f, 7, None)    # drop the first 7 lines
    parts = partition(keep, 3)   # partition into groups of 3
    groups = zip(*parts)         # group the lines by their index % 3
    M, N, P = [sum((g.split() for g in group), []) for group in groups]
For simplicity, I've omitted error checking/handling.
Reference: https://docs.python.org/2/library/itertools.html?highlight=itertools#recipes
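sum(..., []) concatenates lists pairwise and copies the growing list at every step, so it slows down quadratically on a huge file. A sketch of the same pipeline using itertools.chain.from_iterable, which flattens in one linear pass. It still holds every line in memory, because zip(*parts) has to read all groups before it can yield its first tuple. The function name split_columns is hypothetical, and io.StringIO stands in for myfile.txt.

```python
import io
from itertools import chain, islice

# Stand-in for myfile.txt, copied from the sample in the question
SAMPLE = """first line
2nd line
3rd line
#Mi Mj
#Ni Nj Nk
#Pi Pj
#----------- The numeric values start here ------
M0 M1
N0 N1 N2
P0 P1
M1 M2
N1 N2 N3
P1 P2
M2 M3
N2 N2 N3
P2 P3
"""


def partition(lines, n):
    iters = [iter(lines)] * n
    return zip(*iters)


def split_columns(f, skip=7):
    keep = islice(f, skip, None)  # drop the header lines
    parts = partition(keep, 3)    # (first, second, third) line of each group
    groups = zip(*parts)          # all first lines, all second lines, all third lines
    # chain.from_iterable flattens the per-line token lists in a single pass
    return [list(chain.from_iterable(g.split() for g in group)) for group in groups]


M, N, P = split_columns(io.StringIO(SAMPLE))
print(M)
print(N)
print(P)
```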
fh = open('in.txt', 'r')
list1, list2, list3 = [], [], []

def get_list(num):
    return {
        0: list3,
        1: list1,
        2: list2,
    }[num]

count = 1
for i, line in enumerate(fh, 1):
    if i <= 7:  # skip the 7 header lines (enumerate counts from 1)
        continue
    get_list(count % 3).extend(line.rstrip().split())
    count += 1
fh.close()
print("{}\n{}\n{}".format(list1, list2, list3))
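The dictionary lookup in get_list is one way to rotate between the three lists. A sketch of an alternative that is not from the answer above: pair each data line with itertools.cycle over the three lists, so no counter or modulo arithmetic is needed. It assumes, like the answer, exactly 7 header lines; io.StringIO stands in for in.txt.

```python
import io
from itertools import cycle, islice

# Stand-in for in.txt, copied from the sample in the question
SAMPLE = """first line
2nd line
3rd line
#Mi Mj
#Ni Nj Nk
#Pi Pj
#----------- The numeric values start here ------
M0 M1
N0 N1 N2
P0 P1
M1 M2
N1 N2 N3
P1 P2
M2 M3
N2 N2 N3
P2 P3
"""

list1, list2, list3 = [], [], []
with io.StringIO(SAMPLE) as fh:
    # cycle repeats list1, list2, list3 forever; zip stops when the file runs out
    for target, line in zip(cycle((list1, list2, list3)), islice(fh, 7, None)):
        target.extend(line.split())

print("{}\n{}\n{}".format(list1, list2, list3))
```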