Parse text for a matching key, then grab the first set of matching table-name rows
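I'm trying to reconcile some large, old flat text files (which are a mess, to be honest). The problem: once I find my matching key, I want to grab the first set of consecutive rows with the matching table name and ignore the rest. How would I go about reading only what I need and skipping the rest? I've toyed with break, but the logic is escaping me.
Example: if I am looking for a PK of 101 and a table name of drink, I'd like to print the following from the list below:
drink 25
drink 26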
FlatTextFile.txt
pk_tbl 23 100
food 0 0
drink 0 0
dessert 0 0
pk_tbl 101
food 0
drink 25
drink 26
dessert 0
drink 27
drink 28
drink 29
pk_tbl 102
food 0
drink 0
drink 0
drink 0
dessert 0
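Pseudocode showing where I am so far:
pk_flag = False
for row in d:  # d is assumed to hold the rows, each already split into fields
    if row[0] == 'drink' and pk_flag:
        print(row)
    if row[0] == 'pk_tbl' and row[-1] == '101':  # key taken as the last field of a pk_tbl row
        pk_flag = True
    elif row[0] == 'pk_tbl':
        pk_flag = False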
It's a bit of a mess, haha. Any help is appreciated.
Thanks!
def get_table_data(file_path='FlatTextFile.txt', table_keyword='pk_tbl', table_num='101', data_keyword='drink'):
    output_ls = []
    with open(file_path, 'r') as fh:
        table = False
        data = False
        for line in fh:
            if not line.strip():  # Ignoring blank lines
                continue
            row = line.split()
            if not table:  # Searching for table keyword and number
                if row[0] == table_keyword and row[1] == table_num:
                    table = True
            else:
                if row[0] == table_keyword:  # I'm already at next table
                    break
                if not data:  # Searching for data keyword
                    if row[0] == data_keyword:
                        data = True
                        output_ls.append(line)
                else:  # Searching for more consecutive data keywords
                    if row[0] == data_keyword:
                        output_ls.append(line)
                    else:
                        break
    return output_ls
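The function above is effectively a small state machine with three phases: skip to the table header, skip to the first data row, then collect rows until the run ends. As a rough sketch of the same idea, the phases can also be chained with `itertools.dropwhile` and `itertools.takewhile`. The name `first_run` is hypothetical, the function takes any iterable of lines rather than a file path, and it assumes the key is the last field of a `pk_tbl` row (true of the sample file, but an assumption nonetheless).

```python
from itertools import dropwhile, takewhile


def first_run(lines, table_keyword="pk_tbl", table_num="101", data_keyword="drink"):
    """Return the first consecutive run of data rows inside the matching table."""
    # Split every non-blank line into fields.
    rows = (line.split() for line in lines if line.strip())

    def is_header(row):
        # Assumption: the key is the last field of a table-header row.
        return row[0] == table_keyword and row[-1] == table_num

    # Phase 1: skip everything before the matching header, then consume the header.
    after_header = dropwhile(lambda r: not is_header(r), rows)
    next(after_header, None)

    # Never read past the start of the next table.
    in_table = takewhile(lambda r: r[0] != table_keyword, after_header)

    # Phase 2: skip rows until the first data row.
    from_data = dropwhile(lambda r: r[0] != data_keyword, in_table)

    # Phase 3: keep only the first consecutive run of data rows.
    return [" ".join(r) for r in takewhile(lambda r: r[0] == data_keyword, from_data)]


sample = """\
pk_tbl 23 100
food 0 0
drink 0 0
dessert 0 0
pk_tbl 101
food 0
drink 25
drink 26
dessert 0
drink 27
pk_tbl 102
drink 0
""".splitlines()

print(first_run(sample))  # ['drink 25', 'drink 26']
```

Because the iterators are lazy, passing an open file object instead of a list stops reading shortly after the run ends, which may matter for large files.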
Assuming the pattern in the file FlatTextFile.txt is stored as:
table name
food # (none or one or more)
drink # (none or one or more)
dessert # (none or one or more)
(food, drink, dessert pattern can repeat for a table)
(blank line)
table name (next table name)
(food, drink, dessert pattern in any order)
You want to pick the records with drink immediately after finding the table pk_tbl 101. The table name can be pk_tbl + any string or no string + 101.
Based on the above assumptions, below is the code to pick the specific drinks from table 101.
with open('FlatTextFile.txt', 'r') as f:
    table = False
    output = []
    line_count = 0
    for line in f:
        line = line.rstrip()
        x = line.split()
        if {'pk_tbl', '101'} <= set(x):  # checks if 'pk_tbl' and '101' are both in x
            table = True
            continue
        if table and 'pk_tbl' in x:  # reached the next table; stop even if no drink was found
            break
        if table and 'drink' in x:  # collects the drink rows
            line_count += 1
            output.append(line)
            continue
        if line_count > 0:  # we are past the first run of drink rows; stop processing
            break
print(output)
The output of this will be:
['drink 25', 'drink 26']
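One caveat with the set-based check: it matches `pk_tbl` and `101` anywhere on the line, in any order. If the rule "pk_tbl + any string or no string + 101" is read as "`pk_tbl` first, `101` last", a stricter test could look like the sketch below. The helper name `make_header_matcher` is hypothetical, and the positional reading of the rule is an assumption, not something the file format guarantees.

```python
import re


def make_header_matcher(table_keyword="pk_tbl", table_num="101"):
    """Build a predicate that is True for lines like 'pk_tbl 101' or 'pk_tbl 23 101'."""
    # Keyword first, optional whitespace-separated fields in between, key last.
    pattern = re.compile(
        rf"^{re.escape(table_keyword)}(?:\s+\S+)*\s+{re.escape(table_num)}\s*$"
    )
    return lambda line: pattern.match(line) is not None


is_header = make_header_matcher()

print(is_header("pk_tbl 101"))     # True
print(is_header("pk_tbl 23 101"))  # True
print(is_header("pk_tbl 23 100"))  # False
print(is_header("drink 101"))      # False
```

The predicate could replace the `{'pk_tbl', '101'} <= set(x)` test by calling it on `line` instead of on the split fields.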