Parse a text file, extracting values according to their index position
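Hi everyone, how are you? I hope all is well!
How can I parse a text file by extracting specific values at given index positions, append the values to a list, and then convert the list to a pandas DataFrame? So far I have been able to write the code below.
Text sample: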
header:0RCPF049100000084220210407
body:1927907801100032G 00sucess
1067697546140032G 00sucess
1053756666000032G 00sucess
1321723368900032G 00sucess
1037673956810032G 00sucess
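For example, the first line is the header, and from it I only need the date, which sits at the following index positions:
date_from_header = linhas[0][18:26]
The remaining values are in the body.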
import csv
import pandas as pd

headers = ["data_mov", "chave_detalhe", "cpf_cliente", "cd_clube",
           "cd_operacao", "filler", "cd_retorno", "tx_recusa"]

# This is the actual code
with open('RCPF0491.20210407.1609.txt', "r") as f:
    linhas = [linha.rstrip() for linha in f.readlines()]
    for i in range(0, len(linhas)):
        data_mov = linhas[0][18:26]
        chave_detalhe = linhas[1][0:1]
        cpf_cliente = linhas[1][1:12]
        cd_clube = linhas[1][12:16]
        cd_operacao = linhas[1][16:17]
        filler = linhas[1][17:40]
        cd_retorno = linhas[1][40:42]
        tx_recusa = linhas[1][42:100]
        data = [data_mov, chave_detalhe, cpf_cliente, cd_clube, cd_operacao, filler, cd_retorno, tx_recusa]
The expected result looks like this:
data_mov    chave_detalhe  cpf_cliente  cd_clube  cd_operacao  filler         cd_retorno  tx_recusa
'20210407'  '1'            92790780110  '0032'    'G'          'blank space'  '00'        'sucesso'
'20210407'  '1'            92790780110  '0032'    'G'          'blank space'  '00'        'sucesso'
'20210407'  '1'            92790780110  '0032'    'G'          'blank space'  '00'        'sucesso'
Using stackoverflow.com/a/10851479/1581658:
def parse_file(filename):
    indices = [0, 1, 12, 16, 17, 18, 20]  # list the indices to split on
    parsed_data = []  # returned array by line
    with open(filename) as f:
        header = next(f)  # skip the header
        data_mov = header[18:26]  # and get data_mov from header
        for line in f:  # loop through lines
            # split each line by the indices
            parts = [data_mov] + [line.rstrip()[i:j] for i, j in zip(indices, indices[1:] + [None])]
            parsed_data.append(parts)
    return parsed_data

print(parse_file("filename.txt"))
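To see what the `zip(indices, indices[1:] + [None])` pairing does, here is a minimal sketch applied to a single line. The helper name `split_at` is hypothetical, and the sample line is copied as it is displayed above, with one space of filler, which may not match the padding in the real file.

```python
def split_at(line, indices):
    """Cut `line` into consecutive fields that start at each index in `indices`."""
    # Pair every start index with the next one; None leaves the last slice open-ended.
    bounds = zip(indices, indices[1:] + [None])
    return [line[start:end] for start, end in bounds]


# First body line of the text sample, as displayed (one space of filler).
sample = "1927907801100032G 00sucess"
fields = split_at(sample, [0, 1, 12, 16, 17, 18, 20])
print(fields)
# ['1', '92790780110', '0032', 'G', ' ', '00', 'sucess']
```

Each pair `(start, end)` becomes one slice, so seven indices give seven fields; prepending `data_mov` yields the eight columns.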
Thanks to SamBob for the help. In case anyone needs it, here is the final solution:
import itertools
import pandas as pd

pd.options.display.width = 0

def parse_file(filename):
    indices = [0, 1, 12, 16, 17, 18, 42]  # list of indexes
    parsed_data = []  # return a list
    with open(filename) as f:
        header = next(f)
        data_mov = header[18:26]
        for line in itertools.islice(f, 1, 100):
            # split according to the indices
            parts = [data_mov] + [line.rstrip()[i:j] for i, j in zip(indices, indices[1:] + [None])]
            parsed_data.append(parts)
    # convert to dataframe
    cols = ['data_mov', 'chave_detalhe', 'cpf_cliente', 'cd_clube', 'cd_operacao', 'filler', 'cd_retorno', 'tx_recusa']
    df = pd.DataFrame(parsed_data, columns=cols)
    return df

df = parse_file("filename.txt")
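As a variation, the field positions can be kept in a named layout so each column is tied to its slice. This is only a sketch: `LAYOUT` and `parse_lines` are hypothetical names, the slices are the ones listed in the question, and the demo line assumes the real file pads the filler to 23 characters (the sample as displayed shows a single space).

```python
import io

# Field positions taken from the slices in the question.
LAYOUT = {
    "chave_detalhe": slice(0, 1),
    "cpf_cliente": slice(1, 12),
    "cd_clube": slice(12, 16),
    "cd_operacao": slice(16, 17),
    "filler": slice(17, 40),
    "cd_retorno": slice(40, 42),
    "tx_recusa": slice(42, 100),
}


def parse_lines(lines):
    """Return one dict per body line, with data_mov copied from the header."""
    lines = iter(lines)
    header = next(lines)
    data_mov = header[18:26]
    rows = []
    for line in lines:
        line = line.rstrip("\n")  # keep inner spaces, drop only the newline
        if not line:
            continue  # skip blank lines
        row = {"data_mov": data_mov}
        for name, span in LAYOUT.items():
            row[name] = line[span]
        rows.append(row)
    return rows


# Assumption: the filler occupies columns 17-39 (23 spaces) in the real file.
demo_text = "0RCPF049100000084220210407\n" + "1927907801100032G" + " " * 23 + "00sucess\n"
rows = parse_lines(io.StringIO(demo_text))
print(rows[0]["cpf_cliente"], rows[0]["cd_retorno"], rows[0]["tx_recusa"])
# 92790780110 00 sucess
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)`, which takes the column names from the dict keys.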