Parse text file between specific lines
Say I have a text file that looks like this, and I want to build a list for each block of data.
[Blocktype A]
thing
thing
thing

[Blocktype A]
thing
thing
thing
thing
thing

[Blocktype A]
thing
thing

[Blocktype B]
thing
thing
thing
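Basically, I'd like my code to do this:
If line == '[Blocktype A]', append the next X lines (X can vary) to a 'block/stanza' list until a blank line is reached. At that point, append this 'block' list to the overall list, empty the 'block' list, and do the same for the next Blocktype A stanza until the next blank line, and so on. I want to do the same for '[Blocktype B]'.
In the end, I'm trying to get a list whose elements are sub-lists. In other words, one list of all the [Blocktype A] blocks and one list of all the [Blocktype B] blocks:
bigListA = [['Blocktype A', 'thing', 'thing', 'thing'], ['Blocktype A', 'thing', 'thing', 'thing', 'thing', 'thing'], etc...]
bigListB = same as above
I'm not sure how to parse between specific lines like this. Any ideas? Thanks a lot!
Edit: Here is my code. The problem is that the ['B'] stanzas are being added to the list when they shouldn't be. I suspect my list-emptying step is off. Another issue I just noticed is that when I print the elements of the returned list, every element is the same (only the first block in the file... it just repeats).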
def getBlock(myFile):
    """
    blah blah blah parses by stanza
    """
    print myFile
    with open(myFile, 'r') as inFile:
        print '~~~ newfile ~~~\n\n'
        extraData = list()
        blockList = list()
        for line in inFile:
            if line.strip() == '': # skips extraData, start of data blocks
                termBlock = list()
                for line in inFile:
                    if line.strip() == '[A]' and len(termBlock) !=0: # A
                        blockList.append(termBlock) # appends termBlock to blockList
                        del termBlock[:] # ensures list is empty for new termBlock
                        termBlock.append(line.strip())
                    elif line.strip() == '[B]' and len(termBlock) !=0: # B
                        del termBlock[:]
                        termBlock.append(line.strip())
                    elif line.strip() == '': # skip line if it's blank
                        continue
                    else: # add all block data
                        termBlock.append(line.strip())
            else:
                metaData.append(line) # adds metaData
    return blockList, metaData
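Like this:
# x is assumed to hold the whole file as one string, e.g. x = open('TEXT.txt').read()
# A list (not a generator) is used so bigLists can be iterated twice below.
bigLists = [[z.strip('[').strip(']') for z in y.split('\n') if z]
            for y in x.split('\n\n') if y.strip()]
bigListA = [x for x in bigLists if x[0] == 'Blocktype A']
bigListB = [x for x in bigLists if x[0] == 'Blocktype B']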
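To try the split-based approach without a file on disk, here is a minimal self-contained sketch. It assumes blocks are separated by exactly one blank line, as the question describes; the `sample` text and the `split_blocks` helper are made up for illustration.

```python
# Hypothetical sample input; blocks are assumed to be separated by one blank line.
sample = """[Blocktype A]
thing
thing

[Blocktype A]
thing

[Blocktype B]
thing
thing
"""

def split_blocks(text):
    """Split text on blank lines and return one list of stripped lines per block."""
    blocks = []
    for chunk in text.split('\n\n'):
        # strip('[]') removes the surrounding brackets from the header line
        lines = [ln.strip().strip('[]') for ln in chunk.split('\n') if ln.strip()]
        if lines:  # ignore empty chunks, e.g. from trailing newlines
            blocks.append(lines)
    return blocks

blocks = split_blocks(sample)
big_list_a = [b for b in blocks if b[0] == 'Blocktype A']
big_list_b = [b for b in blocks if b[0] == 'Blocktype B']
print(big_list_a)
print(big_list_b)
```

Note that this only works if the separator really is a single blank line; if the file may contain runs of blank lines or Windows line endings, a line-by-line parser is probably the safer choice.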
I like to use a generator function for this:
import itertools
from pprint import pprint

def stanzas(f):
    stanza = []
    for line in f:
        line = line.strip()
        if line.startswith('['):
            if stanza:
                yield stanza
            stanza = []
        if line:
            stanza += [line]
    if stanza:
        yield stanza

with open('foo.ini') as input_file:
    all_data = stanzas(input_file)
    all_data = sorted(all_data, key = lambda x:x[0])
    all_data = itertools.groupby(all_data, key = lambda x:x[0])
    all_data = {k:list(v) for k,v in all_data}

# All of the data is in a dict in all_data. The dict keys are whatever
# stanza headers in the file there were.
# We can extract out the bits we want using []
bigListA = all_data['[Blocktype A]']
bigListB = all_data['[Blocktype B]']
pprint(bigListA)
pprint(bigListB)
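If the sort-then-groupby step feels heavy, a `collections.defaultdict` can collect the stanzas in a single pass and keeps each group in file order. This is only a sketch, not part of the answer above: `group_stanzas` is a hypothetical helper, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io
from collections import defaultdict

def stanzas(f):
    """Yield each stanza as a list of non-blank, stripped lines."""
    stanza = []
    for line in f:
        line = line.strip()
        if line.startswith('[') and stanza:
            # A new header closes the stanza collected so far.
            yield stanza
            stanza = []
        if line:
            stanza.append(line)
    if stanza:
        yield stanza

def group_stanzas(f):
    """Map each stanza's first line (its header) to the list of such stanzas."""
    groups = defaultdict(list)
    for stanza in stanzas(f):
        groups[stanza[0]].append(stanza)
    return dict(groups)

# Made-up input standing in for the real file.
fake_file = io.StringIO(
    "[Blocktype A]\nthing\n\n[Blocktype B]\nthing\nthing\n\n[Blocktype A]\nthing\nthing\n"
)
all_data = group_stanzas(fake_file)
print(all_data['[Blocktype A]'])
print(all_data['[Blocktype B]'])
```

One caveat: if the file has lines before the first header, they end up in a group keyed by whatever that first line is, so you may want to skip or handle them separately.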
This outputs exactly what you need:
def bigList(list_name,start):
    quit_ask = ""
    list_name = []
    l = []
    check = True
    started = False
    with open("TEXT.txt") as text_file:
        for line in text_file:
            line = line.strip()
            if line.startswith(start) or started == True:
                while '' in l: l.remove('')
                if line.startswith(start):
                    quit_ask = line
                    if check != True:
                        list_name.append(l)
                        l = []
                    l.append(line)
                    started = True
                elif line.startswith('[') and line != quit_ask: break
                else: l.append(line); check = False
    list_name.append(l)
    return list_name

bigListA = []
bigListB = []
bigListA = bigList(bigListA,'[Blocktype A]')
bigListB = bigList(bigListB,'[Blocktype B]')
print(bigListA)
print(bigListB)
And you're not forced to import anything!
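On the second symptom from the question (every element of the returned list looking the same): `blockList.append(termBlock)` stores a reference to the list rather than a copy, and `del termBlock[:]` then empties that same object in place, so every stored entry ends up pointing at one list. A small sketch of the effect, with made-up variable names:

```python
# Aliasing: the same list object is appended twice and mutated in between.
block = ['[A]', 'thing']
aliased = []
aliased.append(block)   # stores a reference to the list, not a copy
del block[:]            # empties that same object in place
block.append('[A]')     # so the entry already stored changes too
aliased.append(block)

# Fix: rebind the name to a fresh list instead of clearing the old one.
block2 = ['[A]', 'thing']
fixed = []
fixed.append(block2)
block2 = ['[A]']        # new object; the stored entry is left untouched
fixed.append(block2)

print(aliased)
print(fixed)
```

In the question's code, replacing `del termBlock[:]` with `termBlock = []` (or appending a copy via `list(termBlock)`) should avoid the repeated entries.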